Back in early April the American Statistical Association put out a “Statement on Using Value-Added Models for Educational Assessment”.
Last month, Raj Chetty, John Friedman, and Jonah Rockoff issued a response, in part because so many commentators seemed to misunderstand the ASA statement and in part because the ASA seemed not to have incorporated some of Chetty et al.’s most recent research.
Diane Ravitch’s unimpressed follow-up involves a few all-too-common misconceptions:
What do Chetty, Friedman, and Rockoff say about the ASA statement? Do they modify their conclusions? No. Did it weaken their arguments in favor of VAM? Apparently not. They agree with all of the ASA cautions but remain stubbornly attached to their original conclusion that one “high-value added (top 5%) rather than an average teacher for a single grade raises a student’s lifetime earnings by more than $50,000.” How is that teacher identified? By the ability to raise test scores. So, again, we are offered the speculation that one tippy-top fourth-grade teacher boosts a student’s lifetime earnings, even though the ASA says that teachers account for “about 1% to 14% of the variability in test scores…”
The argument is that if teachers account for only a small fraction of the variation in student test scores, teacher quality is probably not a useful lever by which we can improve education outcomes.
This is wrong for at least three reasons.
First, to know whether 1%-14% is a lot of variation to account for, we have to compare teachers to something else. It’s not entirely clear from her post, but Ravitch[1] seems to want to compare teachers to all other factors put together, and that comparison tells us very little. The 86%-99% of the variation in student test scores not explained by teachers is not explained by a single other factor on which we could focus all of our policy energy; it’s an aggregate of a large number of factors, each likely accounting for a much smaller fraction of the variation.
Second, even if some factors explain more variation in test scores, that doesn’t mean we have to pick just one factor to care about. We may want to prioritize, say, poverty reduction over teacher quality improvements, but that doesn’t mean only the former matters.
Third, and most fundamentally, variation accounted for by a factor is not a measure of that factor’s importance. The ASA statement actually points this out somewhat obscurely:
Research on VAMs has been fairly consistent that aspects of educational effectiveness that are measurable and within teacher control represent a small part of the total variation in student test scores or growth; most estimates in the literature attribute between 1% and 14% of the total variability to teachers. This is not saying that teachers have little effect on students, but that variation among teachers accounts for a small part of the variation in scores.
The fact that this was included in the ASA statement has not prevented considerable confusion; VAM critics have latched on to the first sentence, but seem not to understand the significance of the second.
Let’s unpack that second sentence.
If a factor doesn’t explain much of the variation in student test scores, that could mean that the factor is relatively unimportant and that even large changes in that factor would not have significant effects on scores.
Another possibility, however, is that the factor doesn’t systematically vary between students.
Consider “access to breathable oxygen”. If you crunched the numbers, you would likely find that access to breathable oxygen accounts for very little – if any – of the variation in students’ test scores. This is because all students have roughly similar access to breathable oxygen. If all students have the same access to breathable oxygen, then access to breathable oxygen cannot “explain” or “account for” the differences in their test scores.
Does this mean that access to breathable oxygen is unimportant for test scores? Obviously not. On the contrary: access to breathable oxygen is very important for kids’ test scores, and this is true even though access to breathable oxygen explains ≈0% of the variation in those scores.
Now let’s return to the importance of teachers. If teachers account for only a small fraction of variation in student test scores, that may mean that teacher quality is largely unimportant. It may also mean that teacher quality does not systematically vary much between students.
Another way to think of it is this: if every teacher were exactly as effective as every other teacher, teachers would account for exactly 0% of the variation in student test scores. This would be true regardless of whether these imaginary teachers spent the entire school day reading the newspaper or successfully taught advanced calculus to 3rd graders.
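The thought experiment above can be sketched as a toy calculation. This is only an illustration with made-up numbers, not the method used by the ASA or by Chetty et al.; the function `teacher_share_of_variance`, the helper `classroom`, and every score and "teacher effect" below are hypothetical. It assumes each teacher has the same number of students, so the share of variance "explained" by teachers is simply the variance of classroom averages divided by the total variance of scores.

```python
from statistics import mean, pvariance


def teacher_share_of_variance(scores_by_teacher):
    """Fraction of total score variance that lies between teachers.

    Assumes equal class sizes, so the variance of classroom means
    equals the between-teacher component of the total variance.
    """
    all_scores = [s for scores in scores_by_teacher.values() for s in scores]
    class_means = [mean(scores) for scores in scores_by_teacher.values()]
    return pvariance(class_means) / pvariance(all_scores)


# Hypothetical student-level differences that have nothing to do with teachers.
STUDENT_OFFSETS = [-10, -5, 0, 5, 10]


def classroom(teacher_effect):
    """Scores for one class: a baseline of 50, plus the teacher's effect,
    plus each student's own offset (all numbers are invented)."""
    return [50 + teacher_effect + offset for offset in STUDENT_OFFSETS]


# Every teacher adds nothing (the "newspaper readers").
newspaper = {t: classroom(0) for t in "ABC"}
# Every teacher adds 40 points (the "calculus teachers").
calculus = {t: classroom(40) for t in "ABC"}
# Teachers add about 40 points on average, but differ slightly from each other.
varied = {"A": classroom(38), "B": classroom(40), "C": classroom(42)}

for label, data in [("newspaper", newspaper), ("calculus", calculus), ("varied", varied)]:
    print(label, round(teacher_share_of_variance(data), 3))
```

In the first two scenarios teachers "explain" none of the variation, even though students in the second scenario score 40 points higher than students in the first. In the third, teachers explain only about 5% of the variation, even though the average teacher in that scenario is adding 40 points to every student's score.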
In other words, determining statistically how much variation is “explained” by teachers will not, by itself, tell you how important teacher quality is.
This is precisely where research like that of Chetty et al. comes in. It attempts to go beyond simple measures of “explained variation” to quantify teachers’ actual importance and impact.
We can still reasonably disagree about what that research tells us about “the importance of teachers”. What you can’t reasonably do is dismiss that research out of hand using measures of explained variation, as those are not direct measures of importance.
[1] Ravitch is by no means the only one who makes these mistakes, but she’s usefully illustrative here.