What Should The Results Of New Teacher Evaluations Look Like?
In a previous post, I discussed the initial results from new teacher evaluations in several states, and the fact that states with implausibly large proportions of teachers in the higher categories face a difficult situation – achieving greater differentiation while improving the quality and legitimacy of their systems.
I also expressed concern that pre-existing beliefs about the “proper” distribution of teacher ratings — in particular, how many teachers should receive the lowest ratings — might inappropriately influence the process of adjusting the systems based on the first round of results. In other words, there is a risk that states and districts will change their systems in a crude manner that lowers ratings simply for the sake of lowering ratings.
Such concerns of course imply a more general question: How should we assess the results of new evaluation systems? That’s a complicated issue, and these are largely uncharted waters. Nevertheless, I’d like to offer a few thoughts as states and districts move forward.
Statewide results mask a great deal of underlying district-level variation in results and design/implementation. This may be an obvious point, but it’s worth noting: The new systems’ results vary within and between states, as should interpretations of and expectations for those results. For instance, take the first round of ratings in Michigan, one of the states that received criticism.
Clearly, the fact that only three percent of teachers received the “minimally effective” or “ineffective” ratings is implausible, and must be addressed. However, the systems, as well as their actual results, vary quite a bit from district to district. The most useful first step here would be to look for associations between districts’ results and the design and implementation features of their systems, as in the sketch below.
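To make that step concrete, here is a minimal sketch of what such a district-level association check might look like, assuming teacher-level ratings and district design features are available as flat files. Every file name and column name here (`michigan_teacher_ratings.csv`, `district_system_features.csv`, `rating`, `district`, `growth_weight`) is a hypothetical placeholder, not a reference to any actual state data system.

```python
# Sketch: relate districts' low-rating shares to a system design feature.
# All file and column names below are hypothetical; adapt to real data.
import pandas as pd

# One row per teacher: district and final rating category.
ratings = pd.read_csv("michigan_teacher_ratings.csv")
# One row per district: design features, e.g., weight given to growth measures.
designs = pd.read_csv("district_system_features.csv")

# Share of each district's teachers in the two lowest rating categories.
low_categories = {"minimally effective", "ineffective"}
district_low_share = (
    ratings.assign(is_low=ratings["rating"].str.lower().isin(low_categories))
           .groupby("district")["is_low"]
           .mean()
           .rename("low_rating_share")
)

# Join with design features and inspect the association between, say,
# the weight a district places on student growth and its low-rating share.
merged = district_low_share.reset_index().merge(designs, on="district")
print(merged[["low_rating_share", "growth_weight"]].corr())
```

A simple correlation like this is only a starting point; with enough districts, one could go further and model rating distributions as a function of several design features at once. The point is to let variation in results inform diagnosis before any thresholds are adjusted.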