Monday, July 20, 2015

The Science Of Grading Teachers Gets High Marks | FiveThirtyEight

The Science Of Grading Teachers Gets High Marks

Is evaluating teachers an exact science? Many people, including many teachers and their unions, believe current methods are often too subjective and open to abuse and misinterpretation. But new tools for measuring teacher effectiveness have become more sophisticated in recent years, and several large-scale studies in New York, Los Angeles and North Carolina have given those tools more credibility. A new study released on Monday furthers their legitimacy, and as the science of grading teachers advances, it could encourage wider adoption of these tools.
This evolving science of teacher evaluation was recently thrust into public controversy when, in 2012, nine students sued the state of California, claiming its refusal to fire bad teachers was harming disadvantaged students. To claim that certain teachers were unambiguously bad, and that the state was responsible, the plaintiffs relied on relatively new measures of teacher effectiveness. In that case, Vergara v. California, several top-notch economists testified for each side as expert witnesses, arguing the merits of these complex statistics. In June 2014, the judge ruled that California’s teacher-tenure protections were unconstitutional, a victory for the plaintiffs. Gov. Jerry Brown is appealing, and a similar case has begun in New York state.
But the economists on both sides of the Vergara case are still engaged in cordial debate. On one side are Raj Chetty of Harvard University, John Friedman of Brown University and Jonah Rockoff of Columbia University (hereafter referred to as “CFR”), who authored two influential papers published last year in the American Economic Review; Chetty testified for the plaintiffs in the case. On the other side is Jesse Rothstein of the University of California at Berkeley, who published a critique of CFR’s methods and supported the state in the Vergara case.
On Monday, to come full circle, the CFR researchers published a reply to Rothstein’s criticisms.
At the center of this debate are evaluation models that try to isolate the educational value added by individual teachers, as measured by their students’ standardized-test scores relative to what one would expect given those students’ prior scores. The hard part, as Friedman says, is to “make sure that when you rate a teacher, that you actually rate what the teacher has done, and not whether they had a bunch of very poor or very rich students.”
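To make the idea concrete, here is a minimal sketch of a value-added calculation in Python. It is an illustration only, not the model used by CFR or anyone else in the debate: the function name `value_added`, the record layout and the scores are all invented for this example. It fits one straight line predicting current scores from prior scores across all students, then averages each teacher's gap between actual and predicted scores.

```python
from collections import defaultdict
from statistics import mean


def value_added(records):
    """Toy value-added estimate (hypothetical helper, not CFR's model).

    records: list of (teacher, prior_score, current_score) tuples.
    Returns {teacher: mean residual}, where a residual is a student's
    actual current score minus the score predicted from the prior score.
    """
    priors = [p for _, p, _ in records]
    currents = [c for _, _, c in records]
    p_bar, c_bar = mean(priors), mean(currents)

    # Ordinary least squares fit of current score on prior score.
    sxx = sum((p - p_bar) ** 2 for p in priors)
    sxy = sum((p - p_bar) * (c - c_bar) for p, c in zip(priors, currents))
    slope = sxy / sxx
    intercept = c_bar - slope * p_bar

    # Group each student's residual under his or her teacher.
    residuals = defaultdict(list)
    for teacher, prior, current in records:
        residuals[teacher].append(current - (intercept + slope * prior))
    return {teacher: mean(r) for teacher, r in residuals.items()}


# Invented data: teacher B's students start and finish with higher scores,
# but teacher A's students gain more over the year.
records = [
    ("A", 30, 40), ("A", 40, 50), ("A", 50, 60),
    ("B", 60, 62), ("B", 70, 72), ("B", 80, 82),
]
scores = value_added(records)
raw_means = {
    t: mean(c for teacher, _, c in records if teacher == t) for t in ("A", "B")
}
```

On this made-up data, teacher B has the higher raw average, yet teacher A comes out ahead once prior scores are taken into account. Real value-added models control for far more than a single prior score, and whether those controls are sufficient is the point in dispute.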
The CFR researchers, like the plaintiffs in the Vergara case, claim that these so-called “value added” models accurately isolate a teacher’s impact on students, but Rothstein and other critics say that value-added models, although improved, are still biased by factors outside the teacher’s control.
In their pioneering papers published last year, the CFR researchers used massive data sets, covering millions of students over decades, to test whether teachers’ value-added (VA) scores could accurately predict students’ test scores. They argue that the scores do, when calculated properly, and thus can be used to winnow the good teachers from the bad. In CFR’s method, teachers are judged from a baseline of the students’ prior-year test scores, and by linking student scores ...