Sample Size And Volatility In School Accountability Systems
It is well known that sample size has an important effect on measurement and, therefore, on incentives in test-based school accountability systems.
Within a given class or school, for example, there may be students who are sick on testing day, or get distracted by a noisy peer, or just have a bad day. Larger samples attenuate the degree to which unusual results among individual students (or classes) can influence results overall. In addition, schools draw their students from a population (e.g., a neighborhood). Even if the characteristics of the neighborhood from which the students come stay relatively stable, the pool of students entering the school (or tested sample) can vary substantially from one year to the next, particularly when that pool is small.
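A rough way to put numbers on this point is the standard error of a proportion. This is only a sketch: it assumes, as an idealization not stated above, that each of the n tested students is proficient independently and with the same probability p. The symbols p, n, and \hat{p} (the observed proficiency rate) are introduced here for illustration.

```latex
\[
\mathrm{SE}(\hat{p}) = \sqrt{\frac{p(1-p)}{n}}
\]
% Worked example with p = 0.5:
%   n = 100:  \sqrt{0.25 / 100} = 0.05   (5 percentage points)
%   n = 400:  \sqrt{0.25 / 400} = 0.025  (2.5 percentage points)
```

Under these assumptions, quadrupling the number of tested students halves the sampling noise in a single year's rate, even though nothing about the underlying population has changed.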
Classes and schools tend to be quite small, and test scores vary far more between students than within students (i.e., for the same student over time). As a result, testing results often exhibit a great deal of nonpersistent variation (Kane and Staiger 2002). In other words, much of the difference in test scores between schools, and over time, is fleeting, and this problem is particularly pronounced in smaller schools. One very simple, though not original, way to illustrate this relationship is to compare the results for smaller and larger schools.
Schools vary widely in size. Some are large, with hundreds or thousands of tested students, while others serve just a few dozen students or fewer. As a rule, test results for smaller schools will be more volatile over time. That is, smaller schools will tend to exhibit larger year-to-year changes, whether increases or decreases.
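The rule can be illustrated with a small simulation. The sketch below is not based on real data. It assumes every school draws two independent cohorts from the same unchanged population, with a hypothetical true proficiency rate of 35 percent, so any year-to-year change it produces is pure sampling noise. The function name and all parameter values are illustrative choices.

```python
import random
import statistics


def simulate_mean_abs_change(n_students, true_rate=0.35, n_schools=500, seed=0):
    """Mean absolute year-to-year change, in percentage points, in the
    proficiency rate of simulated schools testing `n_students` each."""
    rng = random.Random(seed)
    changes = []
    for _ in range(n_schools):
        # Two independent cohorts from the same population: the true rate
        # never changes, so every observed change is sampling noise.
        year1 = sum(rng.random() < true_rate for _ in range(n_students)) / n_students
        year2 = sum(rng.random() < true_rate for _ in range(n_students)) / n_students
        changes.append(abs(year2 - year1) * 100)
    return statistics.mean(changes)


# Compare hypothetical school sizes; smaller schools should swing more.
for n in (50, 200, 500, 1000):
    print(n, round(simulate_mean_abs_change(n), 1))
```

Because the true rate is held fixed, the simulated changes understate real volatility, which also reflects genuine changes in performance. The point is only that the noise component shrinks as the tested sample grows.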
In the scatterplot below, each dot is a New York City school. The vertical axis is each school’s change in its overall proficiency rate (positive or negative) between 2013 and 2014. The horizontal axis is the number of tested students (i.e., sample size) at each school, averaged across both years.
The “sideways cone” shape of the dots indicates that the changes among larger schools (i.e., the dots further to the right of the plot) are considerably more modest than those of smaller schools. To give a better idea of these differences, consider that roughly one in four schools in this sample has fewer than 200 tested students, while almost one in five (17 percent) has 500 or more. The mean absolute change (positive or negative) for the former schools (fewer than 200 tested students) is 6.7 percentage points, which is almost 50 percent larger than the mean absolute change (4.5 percentage points) among the latter schools (500 or more tested students).*
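The comparison described above amounts to averaging absolute changes within size bands. The sketch below shows the calculation on a handful of made-up schools. The `schools` list is hypothetical and is not the New York City data. The helper `mean_abs_change` is likewise an illustrative name. The final lines check the "almost 50 percent" figure using the two averages reported in the text.

```python
from statistics import mean

# HYPOTHETICAL data: (tested students, change in proficiency rate in points).
schools = [
    (120, 8.0), (150, -6.5), (180, 5.0),
    (320, -4.0), (410, 3.5),
    (650, 2.0), (900, -1.5),
]


def mean_abs_change(rows, min_size=0, max_size=float("inf")):
    """Mean absolute change for schools with min_size <= tested students < max_size."""
    selected = [abs(change) for n, change in rows if min_size <= n < max_size]
    return mean(selected)


small = mean_abs_change(schools, max_size=200)   # fewer than 200 tested students
large = mean_abs_change(schools, min_size=500)   # 500 or more tested students
print(f"small schools: {small:.2f} points, large schools: {large:.2f} points")

# Relative gap implied by the averages reported in the text (6.7 vs. 4.5 points).
relative_gap = (6.7 - 4.5) / 4.5
print(f"relative gap: {relative_gap:.1%}")
```

Taking absolute values before averaging matters here: increases and decreases would otherwise cancel out, and the volatility of small schools would be hidden.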
In other words, again, smaller schools exhibit much larger year-to-year changes, whether positive or negative.