
Tuesday, July 21, 2015

Research On Teacher Evaluation Metrics: The Weaponization Of Correlations | Shanker Institute


Our guest author today is Cara Jackson, Assistant Director of Research and Evaluation at the Urban Teacher Center.
In recent years, many districts have implemented multiple-measure teacher evaluation systems, partly in response to federal pressure from No Child Left Behind waivers and incentives from the Race to the Top grant program. These systems have not been without controversy, largely owing to the perception – not entirely unfounded – that such systems might be used to penalize teachers. One ongoing controversy in the field of teacher evaluation is whether these measures are sufficiently reliable and valid to be used for high-stakes decisions, such as dismissal or tenure. That is a topic that deserves considerably more attention than a single post; here, I discuss just one of the issues that arise when investigating validity.
The diagram below is a visualization of a multiple-measure evaluation system, one that combines information on teaching practice (e.g., ratings from a classroom observation rubric) with student achievement-based measures (e.g., value-added or student growth percentiles) and student surveys. The system need not be limited to three components; the point is simply that classroom observations are not the sole means of evaluating teachers.
In validating the various components of an evaluation system, researchers often examine their correlation with other components.  To the extent that each component is an attempt to capture something about the teacher’s underlying effectiveness, it’s reasonable to expect that different measurements taken of the same teacher will be positively related.  For example, we might examine whether ratings from a classroom observation rubric are positively correlated with value-added.
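The correlation check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not an analysis from the post: the data are simulated, each measure is modeled as the same underlying effectiveness plus independent noise, and the names (`simulate_measures`, `noise_sd_a`, `noise_sd_b`) and the noise levels are hypothetical choices made for the example.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate_measures(n_teachers, noise_sd_a, noise_sd_b, seed=0):
    """Hypothetical model: two measures of one latent trait, each with its own noise."""
    rng = random.Random(seed)
    # Latent "effectiveness" per teacher (assumed standard normal).
    effectiveness = [rng.gauss(0, 1) for _ in range(n_teachers)]
    # Each measure observes the same latent trait with independent error.
    measure_a = [e + rng.gauss(0, noise_sd_a) for e in effectiveness]
    measure_b = [e + rng.gauss(0, noise_sd_b) for e in effectiveness]
    return measure_a, measure_b


# Assumed noise levels; with sd 1.5 on both measures the correlation is,
# in expectation, 1 / (1 + 1.5**2), or about 0.31.
observation_scores, value_added = simulate_measures(5000, 1.5, 1.5)
r = pearson(observation_scores, value_added)
print(f"correlation between the two simulated measures: {r:.2f}")
```

In this toy setup both measures track exactly the same underlying quantity, yet the observed correlation is modest because each one is noisy. Under these assumptions, the size of a correlation reflects measurement error as well as how much two measures overlap conceptually.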
Just how strong that relationship should be is less clear.  Recently, I attended a conference session in which one researcher had correlated student surveys and value-added, and another had correlated classroom observation data and student growth percentiles.  Both correlations were low, roughly 0.3, though positive and statistically significant.  But the researchers’ interpretations were quite different: one described the correlation as sufficient to support inclusion of student surveys in teacher evaluations, while the other argued that student growth percentiles are not appropriate for use in teacher evaluations given the low correlation. 
Is it possible for both interpretations to be correct? Maybe, if you expect that: 1) the correlations between student surveys and value-added should be positive but small (as was the case with the results of the first researcher, who interpreted the modest relationship as supporting the use of student surveys); and 2) the correlations between classroom observation data and student growth percentiles should be relatively strong (which would be consistent with the second researcher, who found a weak relationship and concluded that growth percentile estimates were not suitable for use). You might visualize that “scenario” as follows:
The small green portion of overlap between classroom observations and student surveys represents a small correlation. Perhaps we expect that these two measures are capturing quite different aspects of teacher quality, and thus have only a modest expectation that teachers who do well on one measure will do well on the other. The large orange overlap between classroom observations and student growth percentiles, on the other hand, suggests that we expect these two measures to capture pretty much the same thing: classroom observations capture the teaching practices that generate student gains.
The classroom observation rubrics I’ve seen attempt to capture a broad set of teaching practices, not all of which necessarily have a strong direct impact on student achievement gains; some may address the social-emotional climate of the classroom, for example. The inclusion of such practices contributes to the