The Perils of Favoring Consistency over Validity: Are “bad” VAMs more “consistent” than better ones?
This is another stat-geeky researcher post, but I’ll try to tease out the practical implications. It comes about partly, though not directly, in response to a new Brown Center/Brookings report on evaluating teacher evaluation systems. From that report, by an impressive team of authors, one can discern two apparent preferences for evaluation systems, or more specifically for any statistical component of those systems that is based on student assessment scores:
- A preference to isolate, as precisely as is statistically feasible, the influence of the teacher on student test score gains;
- A preference to have a statistical rating of teacher effectiveness that is relatively consistent from year to year (though even the more consistent models aren’t particularly consistent; see the sketch after this list).
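To make the consistency criterion concrete, here is a minimal simulation sketch of how year-to-year consistency is usually measured: the correlation between a teacher’s estimated effect in one year and the next. All names and parameter values below are hypothetical, chosen only to illustrate the point at issue here, namely that a model which absorbs a stable source of bias (say, unadjusted classroom context) can post a higher year-to-year correlation than a less biased but equally noisy model.

```python
import numpy as np

rng = np.random.default_rng(0)
n_teachers = 1000

# Hypothetical variance components, all set to 1 for simplicity.
theta = rng.normal(0, 1, n_teachers)  # true teacher effects (stable across years)
bias = rng.normal(0, 1, n_teachers)   # stable omitted-context bias, e.g. classroom composition

def yearly_estimates(include_bias):
    # One year's estimated effects: truth plus fresh estimation noise,
    # plus the persistent bias if the model fails to adjust for it.
    noise = rng.normal(0, 1, n_teachers)
    est = theta + noise
    if include_bias:
        est = est + bias
    return est

# "Valid" model: unbiased but noisy estimates in two consecutive years.
a_y1, a_y2 = yearly_estimates(False), yearly_estimates(False)
# "Biased" model: same noise level, but it also picks up the stable bias.
b_y1, b_y2 = yearly_estimates(True), yearly_estimates(True)

print("valid model, year-to-year r: ", np.corrcoef(a_y1, a_y2)[0, 1])
print("biased model, year-to-year r:", np.corrcoef(b_y1, b_y2)[0, 1])
```

With these (arbitrary) unit variances, the unbiased model’s correlation hovers around 0.5 while the biased model’s sits near 0.67, because the stable bias counts toward “consistency” just as the stable true effect does. Judged on consistency alone, the worse model wins.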
While there shouldn’t necessarily be a conflict between identifying the best model of teacher effects and having a model that is reliable over time, I would argue that the pressure to achieve the second objective above may lead researchers – especially those developing models for direct application in school districts – to make