Thursday, August 1, 2013

Shanker Blog » So Many Purposes, So Few Tests

So Many Purposes, So Few Tests

Posted on August 1, 2013


In a new NBER working paper, economist Derek Neal makes an important point, one that many people in education are aware of but that is infrequently reflected in actual policy. The point is that using the same assessment to measure both student and teacher performance often contaminates the results for both purposes.
In fact, as Neal notes, some of the very features required to measure student performance are the ones that make contamination possible when the tests are used in high-stakes accountability systems. Consider, for example, a situation in which a state or district wants to compare the test scores of a cohort of fourth graders in one year with those of fourth graders the next year. One common means of facilitating this comparability is administering some of the same questions to both groups (or to some “pilot” sample of students prior to those being tested). Otherwise, any difference in scores between the two cohorts might simply be due to differences in the difficulty of the questions. If you cannot rule that out, it’s tough to make meaningful comparisons.
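To make the role of the repeated questions concrete, here is a minimal sketch in Python. It is not Neal's method or any testing agency's actual equating procedure; the function name `cohort_gap` and all of the numbers are hypothetical, chosen only to show how shared items let you separate a change in question difficulty from a change in student performance.

```python
from statistics import mean

def cohort_gap(y1_shared, y1_unique, y2_shared, y2_unique):
    """Compare two cohorts given per-question proportions correct.

    "shared" questions were given to both cohorts; "unique" questions
    appeared on only one year's test. All inputs are hypothetical.
    """
    # Raw gap: difference in average proportion correct on the full tests.
    raw_gap = mean(y2_shared + y2_unique) - mean(y1_shared + y1_unique)
    # Shared-item gap: difference on the questions both cohorts answered,
    # so question difficulty is held constant.
    shared_gap = mean(y2_shared) - mean(y1_shared)
    # Whatever remains is attributable to the tests differing in difficulty.
    difficulty_effect = raw_gap - shared_gap
    return raw_gap, shared_gap, difficulty_effect

# Made-up data: both cohorts do equally well on the shared questions,
# but the second year's unique questions happen to be easier.
raw, shared, difficulty = cohort_gap(
    y1_shared=[0.6, 0.7], y1_unique=[0.5, 0.6],
    y2_shared=[0.6, 0.7], y2_unique=[0.7, 0.8],
)
```

In this made-up case the second cohort scores about ten percentage points higher overall, yet the gap on the shared questions is zero, which suggests the apparent improvement comes from easier questions rather than from better-prepared students. Without the shared questions, there would be no way to tell the two explanations apart.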
But it’s precisely this need to repeat questions that enables one form of so-called “teaching to the test,” in which administrators and educators use questions from prior assessments to guide their instruction for the current year.
This kind of behavior not only potentially corrupts any use of the results to measure teacher/school performance, but it may also compromise the validity of the assessment as a gauge of student performance – if the students are being “coached” in this fashion, increases in their measured performance may not reflect “true” increases in their knowledge of the subject matter.*
To address this conundrum, Neal recommends using two different assessments – one for measuring student