[I]f you were to hear about this report only from the staunchest, most ideological opponents of VAM, you would think it says something else entirely. Valerie Strauss, for instance, claims the report “slammed” the use of VAM to evaluate teachers, and Diane Ravitch seems to think it is a “damning indictment” of such policies.
The report itself is not nearly so hyperbolic.
For a useful summary, check out Stephen Sawchuk, but the report itself is a mere seven accessible pages, so I encourage you to read it yourself.
The bottom line for the ASA is that they are optimistic about the use of “statistical methodology” to improve and evaluate educational interventions, but current value-added models have many limitations that make them difficult to interpret and apply, especially when evaluating individual teachers.
There’s been considerable confusion about the report because some people seem to be having a hard time getting their heads around the possibility that VAMs have serious limitations but may nevertheless be appropriate for use in education generally or teacher evaluation specifically.
So, for example, it’s true – as I was told over and over on Twitter – that the report states that VAMs “typically measure correlation, not causation,” that they “do not directly measure potential teacher contributions toward other student outcomes [besides standardized test scores],” and that they offer “scores and rankings [that] can change substantially when a different model or test is used.”
Crucially, however, none of this establishes that VAMs shouldn’t be used to evaluate teachers.
I mentioned this in the post, but it bears repeating: all methods of educational evaluation have limitations. That includes the methods we currently use! The trick is to identify and understand those limitations, and then select the best method (or combination of methods) for meeting our objectives.
It just so happens that the methods we use to evaluate teachers now consist mostly of classroom observations by administrators, usually employing a lengthy rubric with varying degrees of clarity and specificity. Those observations are by no means free of limitations.
In fact, it’s hard not to suspect that if, in an alternate universe, reformers were trying to move us away from widespread use of VAMs and toward an observation-based system, many critics would be up in arms about the fact that classroom observations are subject to many of the same limitations as VAMs (plus a few of their own).
Do classroom observations “measure causation” with respect to student outcomes? Do they “directly measure teacher contributions” toward student outcomes at all? Are teacher ratings based on observations stable even when evaluators and rubrics are changed?
As with VAMs, the answer to all of those questions about classroom observations is, of course, “no”. And yet we rely – heavily! – on these observations anyway.
Because classroom observations are a ubiquitous part of the status quo today, we take them for granted and their limitations don’t seem nearly so horrendous. VAMs, on the other hand, represent a departure from the status quo, which means that we’re much more likely to turn a critical eye toward them.
That’s a natural tendency, but it’s also basically irrational. The relevant question is not just “do VAMs have limitations?” but “limitations compared to what?”