Student test scores: How the sausage is made and why you should care
Contrary to popular belief, modern cognitive assessments, including the new Common Core tests, produce test scores based on sophisticated statistical models rather than the simple percent of items a student answers correctly. While there are good reasons for this, it means that reported test scores depend on many decisions made by test designers, some of which have important implications for education policy. For example, all else equal, the shorter the test, the greater the fraction of students placed in the top and bottom proficiency categories, an important metric for state accountability. This happens because a shorter test measures each student's ability less precisely, and the added measurement error spreads reported scores further from the average. On the other hand, some tests report “shrunken” measures of student ability, which pull particularly high- and low-scoring students closer to the average and so understate the proportion of students in the top and bottom proficiency categories. Shrunken test scores will also understate important policy metrics such as the black-white achievement gap: if black children score lower on average than white children, then the scores of black students will be adjusted up while the scores of white students will be adjusted down.
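To make these two effects concrete, the following is a minimal sketch rather than any test maker's actual scoring model. It assumes that true ability is normally distributed with mean 0 and standard deviation 1, that measurement error variance falls in proportion to the number of items (the constant `error_scale` is a made-up value chosen so that a 40-item test has reliability 0.8), that proficiency cutoffs sit one standard deviation above and below the mean, and that shrinkage simply multiplies each observed score by the test's reliability. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def extreme_share(sd, cutoff=1.0):
    """Share of a normal(0, sd) score distribution outside +/- cutoff."""
    return 2.0 * (1.0 - NormalDist(0.0, sd).cdf(cutoff))


def score_summary(n_items, error_scale=10.0, true_gap=1.0):
    """Compare true, observed, and shrunken scores for a test of n_items.

    Assumptions (illustrative only): true ability has variance 1, and
    measurement error variance equals error_scale / n_items.
    """
    error_var = error_scale / n_items
    reliability = 1.0 / (1.0 + error_var)      # true variance / observed variance
    observed_sd = sqrt(1.0 + error_var)        # error widens the distribution
    shrunken_sd = reliability * observed_sd    # shrinkage narrows it again
    return {
        "reliability": reliability,
        "true_extreme": extreme_share(1.0),
        "observed_extreme": extreme_share(observed_sd),
        "shrunken_extreme": extreme_share(shrunken_sd),
        # Shrinking every score toward the overall mean scales any
        # difference in group means by the same reliability factor.
        "shrunken_gap": reliability * true_gap,
    }


for n_items in (20, 40, 80):
    s = score_summary(n_items)
    print(
        f"{n_items:3d} items: reliability={s['reliability']:.2f}  "
        f"true={s['true_extreme']:.3f}  observed={s['observed_extreme']:.3f}  "
        f"shrunken={s['shrunken_extreme']:.3f}  gap={s['shrunken_gap']:.2f}"
    )
```

Under these assumptions, unshrunken scores place more students in the extreme categories than their true abilities warrant, and the excess grows as the test gets shorter. Shrunken scores err in the opposite direction and also compress a true one-standard-deviation gap between groups.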
The scaling of test scores is equally important. Despite common perceptions, a 5-point gain at the bottom of the test score distribution may not represent the same amount of additional knowledge as a 5-point gain at the top of the distribution. This fact has important implications for value-added comparisons of teacher effectiveness as well as accountability rankings of schools. There are no easy solutions to these issues. Instead, there must be greater transparency about the test creation process and more robust discussion of both the tradeoffs inherent in creating test scores and how different types of test scores are used for policymaking and research.
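The scaling concern can be illustrated with a small sketch. The student scores, the two teachers, and the `rescale` function below are all hypothetical. `rescale` stands in for any alternative scale that preserves the ordering of students but stretches the top of the distribution relative to the bottom. It is not the scale used by any actual test.

```python
def rescale(score):
    """Hypothetical order-preserving transformation of a 0-100 scale."""
    return 100.0 * (score / 100.0) ** 2


def mean_gain(pairs, scale=lambda s: s):
    """Average (post - pre) gain for a list of (pre, post) score pairs."""
    return sum(scale(post) - scale(pre) for pre, post in pairs) / len(pairs)


# Made-up classrooms: teacher A teaches low scorers, teacher B high scorers.
teacher_a = [(20, 26), (22, 28)]
teacher_b = [(70, 75), (72, 77)]

for label, scale in (("reported scale", lambda s: s), ("rescaled", rescale)):
    gain_a = mean_gain(teacher_a, scale)
    gain_b = mean_gain(teacher_b, scale)
    leader = "A" if gain_a > gain_b else "B"
    print(f"{label}: A={gain_a:.2f}  B={gain_b:.2f}  larger gain: teacher {leader}")
```

Both scales rank every student in the same order, yet the teacher with the larger average gain differs between them. Neither scale is shown to be the correct one here. The point is only that gain comparisons depend on the choice of scale.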
Testing is ubiquitous in education. From placement in specialized classes to college admissions, standardized exams play a large role in a child’s educational career. The introduction of the federal No Child Left Behind (NCLB) legislation in 2001, which required states to test all students in grades 3-8 in reading and math, dramatically increased the prevalence and use of test scores for education policymaking.
Contrary to popular belief, all modern cognitive assessments, including the new Common Core tests, produce test scores based on sophisticated statistical models rather than the simple percent of items a student answers correctly. There are good reasons for this, as explained below. The downside is that what we see as consumers of test scores depends on decisions made by the designers of the tests about characteristics of those models and their implementation. These details are typically hidden in dense technical