Monday, March 10, 2014

On the Misuse of Statistics in Testing by the NY State Department of Education: Anonymous Posting by a Friendly Statistician

The Common Core and Departments of Education: Lies, Darn Lies, Statistics and Education Statistics
Numbers have taken center stage in the discussion of education policy in the United States. Test score metrics have become a particularly critical set of numbers. They are seen as objective measuring devices, comparable across years, that provide a reliable evaluation of how students, teachers, schools, districts, and the United States as a whole are doing. But are they really objective?
The push for implementation of Common Core exams has caught the attention of the public. In New York State, as in many other states across the nation, questions have been raised about the motivations of those pushing for the roll-out of these exams and their use in high-stakes evaluations. As we will see below, such concerns are legitimate given the history of the New York State Department of Education and the Board of Regents in setting cut scores and changing exams in ways that serve political and other ends.
Let’s start with Biology, a standard course that almost every high school freshman takes. Remember dissecting that frog? In 2001 the New York State Department of Education replaced the Biology Regents with a renamed “Living Environment” exam. A rather remarkable aspect of the change was the dramatic lowering of the passing score. On the Biology exam a student needed at least 59 points (out of a total of 85 possible points) to earn a passing grade of 65. On the new Living Environment Regents students need only 40 points (out of a total of 85 possible points) to earn a passing grade of 65. In some years (e.g. 2004) a student needed only 38 out of 85 points to earn a passing grade of 65.
The story repeats itself in mathematics. Until 2002 the New York State Department of Education required students to take a “Sequential Mathematics I” exam. That test had a total value of 100 points. The conversion was simple enough: each raw point counted as one point on the final grade, and a student needed 65 points to pass. Then, in 2002, the math exam was switched to a “Mathematics A” exam. On this test students needed to score 35 out of a possible 84 points to earn a 65 and pass. Earning 42% of the possible points led to a 65. Then, in 2008, the math exam was switched again, this time to an “Integrated Algebra” exam. On this test students needed to earn 30 out of a possible 87 points to earn a 65 and pass. Earning 34% of the possible points now led to a 65.
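For readers who want to check the arithmetic, here is a minimal Python sketch that turns the cutoffs quoted above into the share of raw points needed for a scaled grade of 65. The figures are the ones cited in this post; the names CUTOFFS and share_needed are made up for illustration and do not come from any official conversion chart.

```python
# Raw points needed for a scaled passing grade of 65, as quoted in the text.
# Each entry is (exam, raw points needed to pass, maximum raw points).
CUTOFFS = [
    ("Biology", 59, 85),
    ("Living Environment", 40, 85),
    ("Living Environment (2004)", 38, 85),
    ("Sequential Mathematics I", 65, 100),
    ("Mathematics A", 35, 84),
    ("Integrated Algebra", 30, 87),
]


def share_needed(points_to_pass, max_points):
    """Return the percentage of raw points a student must earn to pass."""
    return 100 * points_to_pass / max_points


# Print one line per exam: raw cutoff, maximum, and rounded percentage.
for exam, need, total in CUTOFFS:
    print(f"{exam:28s} {need:3d}/{total:<3d} = {share_needed(need, total):.0f}%")
```

Run as written, this should reproduce the 42% and 34% figures for the two newer math exams and put the old Biology cutoff at roughly 69% of the raw points against roughly 47% for Living Environment.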
The United States and Global History exams underwent similar changes at the turn of the millennium. Before the changes students were required to write three essays accounting for 45% of their final score. After the changes students were required to write only two essays accounting for only 35% of their final score. For one of those essays students are provided with extensive information they can use in their writing.
A couple of years later the exact same process occurred with the English Regents. In 2011 the New York State Department of Education changed the exam from a two-part, six-hour test with two essays to a single-part, three-hour test with only one essay. Again the cut scores were dramatically lowered. The scales on these two exams are very different, making comparison difficult. One way to measure the change is to look at the grade a student would receive if s/he got exactly half the multiple choice questions correct and earned exactly half of the possible points on the essay(s). On the old English exam that student would have received a grade of 43. On the new English exam the same performance earns a grade of 50.
A year ago the New York State Department of Education changed things yet again. But this time they did not change the exam. They just changed the cut scores. From 2011 until 2013, out of 286 possible point combinations on the exam, an average of 74 resulted in a passing grade. Then, in June of 2013, the number of point combinations leading to a passing grade was dramatically lowered by 23%. Since then an average of 63 point combinations out of 286 has led to a passing grade.
It is disturbing that this change occurred at the very moment when the test results would first be used to evaluate teachers. The research base shows that such value-added metrics are unreliable. For example, a RAND report concluded “the research base is currently insufficient for us to recommend the use of VAM for high-stakes decisions.” A report out of Brown University concluded “the promise that value-added systems can provide such a precise, meaningful, and comprehensive picture is not supported by the data.” Nonetheless, New York State passed laws requiring school districts to use test scores in teacher evaluations. Why, at the same time, did the Department of Education quietly change the cut scores on the English Regents? Is it an attempt to ensure that more teachers are rated ineffective? This would allow certain interest groups to declare the law a success and claim that “bad teachers” are now being identified and should be fired. Is it an attempt to create evidence that there is an epidemic of failing students in New York State? This would allow certain interest groups to proclaim that the crisis can only be solved if the new Common Core Standards are implemented without delay.

Advocates of the Common Core are either ignorant of or deliberately ignore this history. A decade ago the New York State Department of Education decided that the high school graduation rate was too low. They therefore changed exams and cut scores to make them easier. The graduation rate went up. Now it seems that some powerful interests have decided that it is too easy to graduate. So they want the exams made harder and the passing cut scores raised. It is evident from the history reviewed above that playing with cut scores is not the way to improve education. After all, that just leaves us in the very place we are in today. Yet we seem to be condemned to repeat this cycle all over again. We seem to be enamored of easy solutions. Make exams harder (or easier). Raise cut scores (or lower them). What we do not seem to be willing to do as a nation is roll up our sleeves and do the really, really hard work of ensuring that every student receives a quality education.