VAM and Status Quo Bias

Yesterday at This Week in Ed I wrote about the American Statistical Association’s report on value-added modeling in education:

[I]f you were to hear about this report only from the staunchest, most ideological opponents of VAM, you would think it says something else entirely. Valerie Strauss, for instance, claims the report “slammed” the use of VAM to evaluate teachers and Diane Ravitch seems to think it is a “damning indictment” of such policies.

The report itself is not nearly so hyperbolic.

For a useful summary check out Stephen Sawchuk, but the report itself is a mere seven accessible pages, so I encourage you to read it yourself.

The bottom line for the ASA is that they are optimistic about the use of “statistical methodology” to improve and evaluate educational interventions, but current value-added models have many limitations that make them difficult to interpret and apply, especially when evaluating individual teachers.
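To make the discussion concrete, here is a minimal sketch of what a value-added model typically does: regress students’ current test scores on their prior scores plus teacher indicators, and read each teacher’s coefficient as an estimate of his or her “value added.” This is an illustration only, not the ASA’s specification or any district’s actual model; the data, teacher effects, and parameters below are all invented.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: 200 students split among 4 teachers.
# The "true" teacher effects are made up for illustration.
n = 200
teacher = rng.integers(0, 4, size=n)
true_effects = np.array([0.0, 2.0, -1.0, 4.0])
prior_score = rng.normal(50, 10, size=n)
current_score = (0.8 * prior_score
                 + true_effects[teacher]
                 + rng.normal(0, 1, size=n))  # unexplained student-level noise

# Design matrix: prior score plus one dummy column per teacher.
# With no intercept, each dummy coefficient is that teacher's
# estimated contribution to current scores, holding prior scores fixed.
dummies = (teacher[:, None] == np.arange(4)).astype(float)
X = np.column_stack([prior_score, dummies])
beta, *_ = np.linalg.lstsq(X, current_score, rcond=None)

slope, vam_scores = beta[0], beta[1:]
print("prior-score slope:", round(slope, 2))
print("estimated effects (centered):", np.round(vam_scores - vam_scores.mean(), 2))
```

Even in this toy setting, the estimates shift if you change the model (add demographics, use a different prior-score functional form) or the test that generated the scores, which is exactly the instability the report flags.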

There’s been considerable confusion about the report because some people seem to be having a hard time getting their heads around the possibility that VAMs have serious limitations but may nevertheless be appropriate for use in education generally or teacher evaluation specifically. 

So, for example, it’s true – as I was told over and over on Twitter – that the report states that VAMs “typically measure correlation, not causation” and “do not directly measure potential teacher contributions toward other student outcomes [besides standardized test scores]” and offer “scores and rankings [that] can change substantially when a different model or test is used”.

Crucially, however, none of this establishes that VAMs shouldn’t be used to evaluate teachers.

I mentioned this in the post, but it bears repeating: all methods of educational evaluation have limitations. That includes the methods we currently use! The trick is to identify and understand those limitations, and then select the best method (or combination of methods) for meeting our objectives.

It just so happens that the methods we use to evaluate teachers now consist mostly of classroom observations by administrators, usually employing a lengthy rubric with varying degrees of clarity and specificity. Those observations are by no means free of limitations.

In fact, it’s hard not to suspect that if, in an alternate universe, reformers were trying to move us away from widespread use of VAMs and toward an observation-based system, many critics would be up in arms about the fact that classroom observations are subject to many of the same limitations as VAMs (plus a few of their own).

Do classroom observations “measure causation” with respect to student outcomes? Do they “directly measure teacher contributions” toward student outcomes at all? Are teacher ratings based on observations stable even when evaluators and rubrics are changed?

As with VAMs, the answer to all of those questions about classroom observations is, of course, “no”. And yet we rely – heavily! – on these observations anyway.

Because classroom observations are a ubiquitous part of the status quo today, we take them for granted and their limitations don’t seem nearly so horrendous. VAMs, on the other hand, represent a departure from the status quo, which means that we’re much more likely to turn a critical eye toward them.

That’s a natural tendency, but it’s also basically irrational. The relevant question is not just, “do VAMs have limitations?”, but “compared to what?”



  1. Posted April 15, 2014 at 5:11 PM | Permalink

    Hey Paul,

    I love your point that we should compare using VAM in evaluations to alternative forms of evaluating teachers. However, you ignore one very important difference between VAM and observations: VAM scores are output-focused while teacher observations are input-focused. A considerable amount of research suggests that incentives based on inputs are significantly more effective at motivating behavioral change (seemingly the goal of evaluations) than incentives based on outputs. Probabilistic reasoning also indicates that improved future outcomes depend on our ability to focus on our decision-making process instead of our current outcomes. Current uses of VAM in most evaluation systems fly in the face of these facts, and it’s therefore problematic to consider its limitations to be similar to the limitations (which certainly exist) of administrative evaluations.

    That said, I really appreciate your approach to the ASA’s statement.


    • Posted April 15, 2014 at 5:29 PM | Permalink

      I agree about the utility of input-focused incentive systems. (That’s one of the reasons I’m skeptical of standards- or mastery-based grading systems.)

      That said, it’s not obvious to me that the observation-based system is particularly good as an incentive system, either. What’s useful about input-focused incentives is that they are clear, specific, and plausibly attainable. Our current observation rubrics often lack those characteristics.

      Now, I’d like to see those input rubrics improved, but we’re pretty far from the point in education where we have a clear set of truly specific “best practices” that we’re going to get everybody on board to promote in evaluations. (I’d argue that’s part of the reason for the rise in outcome-focused incentive systems.)

      The more fundamental issue though is that there’s really no way to determine the “right” role for VAMs a priori: it’s a combination of value judgments and empirical questions. My personal suspicion is that there is a modest role for VAMs in teacher evaluation – say, 20% of evaluations for teachers in appropriately-tested subjects – but I’m not wedded to that as a matter of principle or prior conviction. When I talk about the “promise” of VAM I just mean I think they may be useful. They may not be, on balance!

      My goal with these posts is just to emphasize to people that we can’t use a priori arguments to rule out VAMs altogether, particularly when VAMs seem to measure some real phenomena of interest with some reliability.

      • Posted April 15, 2014 at 10:14 PM | Permalink

        I think that’s a good goal and I certainly agree that VAM may be useful. I have a different opinion about its eventual use in evaluations – I’m skeptical that it will ever make sense to use it for anything besides a data point in a cycle of inquiry (with a focus on reflection and development of an action plan instead of dinging or crediting teachers for their score as a specific percentage of the evaluation) – but I would definitely like to see continued research.

        I really like the post you linked and you’re absolutely right that input-based incentives should be “clear, specific, and plausibly attainable.” You’re also right about us not having “a clear set of truly specific ‘best practices,'” but I think that point tells us very clearly that reformers need to focus on identifying and building consensus around these inputs before they start thinking about the outputs. We’re trying to do that with our new teacher evaluation system in San Jose Unified and, if it can be done, it’s the clearest path to the outcomes everyone wants.

        My basic point is that it’s both counterproductive and unethical to base personnel decisions on outputs when we’re not giving teachers clear guidance on the actions they can take to achieve the desired outcomes.

  2. Cara Jackson
    Posted April 16, 2014 at 5:51 AM | Permalink

    I am very appreciative of this alternative universe in which internet comments are really thoughtful. Question for you: what if there’s more than one way to succeed with students? Both of you acknowledge that current rubrics might not be capturing “best practices” but I would argue that one thing VAM has in its favor is that it allows for the possibility that many different teaching styles/practices can be successful.

    Ben said that it’s unethical to base personnel decisions on outputs if we’re not giving teachers clear guidance on the actions they can take to achieve desired outcomes. If the classroom observation rubric ratings are positively correlated with value added, and if these observations are combined with feedback and advice about classroom practice, I think you could argue that teachers are being given guidance on the actions they can take to achieve desired outcomes. Of course, those are two big “ifs” and a lot would depend on the quality of feedback.

    All that said, student achievement is just part of the “desired outcomes” society seeks from the education system. Some recent work from Ferguson & Danielson suggests that the measures that predict value added differ systematically from the measures that predict happiness, effort, or inspiration. Multiple measures might allow us to capture teachers’ contributions more holistically.

    • Posted April 16, 2014 at 11:50 PM | Permalink

      While VAM certainly “allows for the possibility that many different teaching styles/practices can be successful,” so does an appropriately-defined set of best practices. To the extent that our definition of best practices excludes the use of unorthodox but effective techniques, I’d argue that the problem lies with a prohibitively narrow definition rather than our attempt to focus on teacher inputs.

      If we’ve done a good job identifying best practices and developing our value added methodology, we’d certainly expect a correlation between evaluations that rely on best practices and value added scores. However, that correlation is never going to be particularly high because so many variables influence student outcomes (I’ve still never seen a study that suggests that school-related factors, in sum, account for more than 33% of the opportunity gap). Given that reality, it is much more sensible (and, as I argued before, more ethical) to make decisions based on teacher inputs than on student outcomes that may or may not have followed directly from those inputs. I believe this paradigm is also better in other professions in which outcomes are more in employees’ proximate control.

      On a different note, I completely agree about the need to consider other components of desired outcomes in addition to test scores and personally value several other outcomes above student achievement data.
