When Checking Under The Hood Of Overall Test Score Increases, Use Multiple Tools

Posted on February 24, 2014
When looking at changes in testing results between years, many people are (justifiably) interested in comparing those changes for different student subgroups, such as those defined by race/ethnicity or income (subsidized lunch eligibility). The basic idea is to see whether increases are shared between traditionally advantaged and disadvantaged groups (and, often, to monitor achievement gaps).
Sometimes, people take this a step further and use the subgroup breakdowns as a crude check on whether cross-sectional score changes are due to changes in the sample of students taking the test. The logic is as follows: if the increases show up within both advantaged and disadvantaged subgroups, then the overall increase cannot be attributed to a change in the backgrounds of students taking the test, since every subgroup exhibited the same pattern. (For reasons discussed here many times before, this is a severely limited approach; the sketch below shows one reason why.)
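
To make that limitation concrete, here is a minimal sketch in Python using entirely hypothetical numbers (not DCPS or NAEP data). The overall average is just a weighted average of subgroup averages, so it can rise far more than any subgroup’s average when the mix of test takers shifts toward higher-scoring groups.

```python
# Hypothetical numbers (not DCPS data): both subgroups improve by one
# point, yet the overall average jumps by about 5.5 points because the
# mix of test takers shifts toward the higher-scoring group.

def overall_mean(group_means, group_shares):
    """Overall average = subgroup averages weighted by enrollment shares."""
    return sum(m * s for m, s in zip(group_means, group_shares))

# Subgroup means (say, FRL-eligible vs. not) in each year:
means_2011 = [240.0, 270.0]
means_2013 = [241.0, 271.0]   # each group up one point

# Share of test takers in each group shifts between years:
shares_2011 = [0.70, 0.30]
shares_2013 = [0.55, 0.45]

print(overall_mean(means_2011, shares_2011))  # ~249.0
print(overall_mean(means_2013, shares_2013))  # ~254.5
```

In this toy example, both subgroups post gains, so the crude check would “pass,” even though most of the overall increase reflects the changing composition of the tested sample rather than improvement within groups.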
Whether testing data are cross-sectional or longitudinal, these subgroup breakdowns are certainly important and necessary, but it’s wise to keep in mind that standard variables, such as eligibility for free and reduced-price lunch (FRL), are imperfect proxies for student background (in fact, FRL rates aren’t even a particularly good proxy for income). One might well reach different conclusions depending on which variables are chosen. To illustrate this, let’s take a look at results from the Trial Urban District Assessment (TUDA) for the District of Columbia Public Schools (DCPS) between 2011 and 2013, a period in which there was a large overall score change that received a great deal of media attention, and break the changes down by different characteristics.
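
One way to formalize the composition concern is a standard shift-share decomposition, which splits an overall change into a within-group component (score changes, holding composition fixed) and a composition component (share changes, holding scores fixed). The sketch below reuses the hypothetical numbers from above; it illustrates the general technique, not a calculation from the actual TUDA data.

```python
# Shift-share sketch (same hypothetical numbers as above, not DCPS data):
# split an overall mean change into a within-group piece and a
# composition piece. The two pieces sum exactly to the overall change.

def decompose(means_t0, means_t1, shares_t0, shares_t1):
    """Return (within, composition) components of the overall change.

    within:      group score changes, weighted by average group shares
    composition: group share changes, weighted by average group scores
    """
    within = sum(0.5 * (w0 + w1) * (m1 - m0)
                 for w0, w1, m0, m1 in zip(shares_t0, shares_t1, means_t0, means_t1))
    composition = sum(0.5 * (m0 + m1) * (w1 - w0)
                      for w0, w1, m0, m1 in zip(shares_t0, shares_t1, means_t0, means_t1))
    return within, composition

within, comp = decompose([240.0, 270.0], [241.0, 271.0], [0.70, 0.30], [0.55, 0.45])
print(within, comp)  # ~1.0 and ~4.5: most of the 5.5-point "gain" is compositional
```

Note that the split depends on which grouping variable you use, which is precisely why breakdowns by different characteristics can tell different stories.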
The table below presents average eighth grade math scores of DCPS students in 2011 and 2013. Asterisks denote 2011-2013 changes that are statistically significant. It’s important to note, however, that many of the estimates in the table are based on rather small samples, and statistical significance is, to a large extent, a function of sample size. Since this exercise is purely illustrative, we might ease our focus on statistical significance a bit, while still keeping in mind that these are imprecise estimates. It also bears mentioning that score changes are correlated with prior-year scores.
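
To see how strongly significance depends on sample size, consider a rough back-of-the-envelope calculation (illustrative numbers only; NAEP’s published standard errors come from its complex sampling design and plausible-values methodology, so the real computations are more involved). The same five-point change can be clearly significant with large samples and statistically indistinguishable from zero with small ones.

```python
import math

def z_for_change(change, sd, n_2011, n_2013):
    """Rough z statistic for a difference in two independent sample means,
    assuming simple random samples and a common score SD. (Illustrative
    only; NAEP variance estimation is considerably more complex.)"""
    se = sd * math.sqrt(1.0 / n_2011 + 1.0 / n_2013)
    return change / se

# The same 5-point change, with an assumed score SD of 35 points:
print(z_for_change(5, 35, 1000, 1000))  # ~3.2 -> comfortably significant
print(z_for_change(5, 35, 100, 100))    # ~1.0 -> nowhere near significant
```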
If you look at the overall results (“All students,” at the bottom), you see that there was a five-point increase between 2011 and 2013 (this was inappropriately interpreted as policy evidence by those who favor the reforms in DC, but