Why It’s Hard to Rate Teacher Training Programs
As discussed in our recent research brief, there is a continuing debate in public policy about the use of value-added measures to evaluate teachers. While the Department of Education has in many cases agreed to reduce the weight assigned to value-added measures in evaluating teachers and schools, such measures remain, by government mandate, part of the conversation. Recently, federal regulations were published regarding the use of value-added measures in teacher education programs – teach-the-teachers, as we say at AIER.
Value-added models measure the difference between how students actually perform on a standardized test and how they were expected to perform. Such models are intended to capture the value a particular teacher adds to his or her students’ achievement.
The new regulation calls for states to publish ratings of teacher-prep programs, including those at colleges as well as independent programs such as Teach for America. One of the criteria for rating programs: you guessed it, value-added measures. Specifically, states would compute value-added scores from the test scores of students taught by a program’s recently minted teachers. The regulations also call for the publication of other data, such as the proportion of a program’s graduates who get jobs in their chosen specialties.
Although I generally believe value-added measures are useful, I am a little skeptical that they will be helpful in evaluating teacher-training programs. The argument in favor is straightforward: If value-added measures do capture something important about an individual teacher’s performance, then shouldn’t the average value-added score of the teachers minted by a training program tell us something important about the program’s performance?
Not so fast. This objection gets to the heart of what value-added measures are meant to do. Recall that traditional value-added measures compare a student’s test scores to expected scores. If one teacher has a class full of students with high past scores and another teacher has a class of students with low past scores, can we measure the teachers’ value-added based on whose students score higher? No: We would instead compare each class’s scores to the typical performance of students with similar histories. There are legitimate questions about whether such controls work, but education researchers put a great deal of effort into accounting for students’ backgrounds before they enter the teacher’s classroom.
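To make the logic concrete, here is a minimal sketch of the idea, not any state’s actual statistical model. It uses a stand-in prediction rule and invented numbers; real value-added models are fitted regressions with many more controls. The point is only that a teacher is credited with the gap between actual and expected scores, not with raw scores.

```python
# Hypothetical illustration of the value-added idea.
# All numbers, classes, and the prediction rule are invented.

def expected_score(prior_score, slope=0.9, intercept=8.0):
    # Stand-in for a fitted model: students with similar histories
    # are expected to score similarly this year.
    return intercept + slope * prior_score

def value_added(students):
    # students: list of (prior_score, current_score) pairs for one class.
    # The teacher's value-added is the average gap between actual
    # and expected scores.
    gaps = [current - expected_score(prior) for prior, current in students]
    return sum(gaps) / len(gaps)

# Two hypothetical classes: one starts high, one starts low.
class_a = [(90, 92), (85, 88), (95, 94)]   # high prior scores
class_b = [(60, 66), (55, 62), (65, 70)]   # low prior scores

# Class A's raw scores are higher, but Class B's teacher beat
# expectations by more, so Class B's value-added is higher.
print(round(value_added(class_a), 2))  # → 2.33
print(round(value_added(class_b), 2))  # → 4.0
```

Even though every student in Class A outscores every student in Class B, the comparison to expectations credits Class B’s teacher with more added value.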
The same logic must apply to evaluations of teacher-training programs. Suppose a state’s two largest education programs are the education departments of Flagship University and the less prestigious Safety State. The best aspiring teachers are probably going to attend Flagship U, and Flagship’s graduates are likely to have better value-added scores and to place into jobs in their chosen fields. But that does not necessarily mean their training was any better. By the logic of value-added measurement, to make that claim we would have to know not only that Flagship’s graduates are better teachers than Safety’s, but that they are better teachers than they would have been had they attended Safety State. For students, we base our expectations partly on prior test scores, but for new teachers we have no equivalent prior teaching evaluations. Ironically, one common attack against value-added scores, a perceived unfairness to teachers who teach disadvantaged students, may more correctly apply to schools that train less well-prepared future teachers.
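The selection problem above can be illustrated with a small simulation. Everything here is invented for the sake of argument: both hypothetical programs improve their graduates’ teaching ability by exactly the same amount, yet the program that attracts stronger applicants ends up with the higher average value-added score.

```python
# Hypothetical simulation of the selection problem in rating
# teacher-training programs. All numbers are invented.
import random

random.seed(0)

TRAINING_EFFECT = 2.0  # both programs add the SAME value to graduates

def graduate_value_added(n, applicant_mean):
    # A graduate's eventual value-added score reflects both incoming
    # ability (which differs by program) and training (which does not).
    return [random.gauss(applicant_mean, 1.0) + TRAINING_EFFECT
            for _ in range(n)]

def average(xs):
    return sum(xs) / len(xs)

flagship = graduate_value_added(1000, applicant_mean=5.0)  # stronger applicants
safety = graduate_value_added(1000, applicant_mean=3.0)    # weaker applicants

# Flagship's graduates look better on average, but the entire gap
# comes from who enrolled, not from what the program taught.
print(round(average(flagship), 1), round(average(safety), 1))
```

Ranking the programs by their graduates’ average scores would reward Flagship for its admissions pool, which is exactly the mistake that value-added models were designed to avoid at the student level.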