By Bruce Torff
Recently, there was a near riot at a town hall event in Poughkeepsie, N.Y., with angry parents shouting down state education officials. Poughkeepsie is no uber-liberal antitest hotbed, but folks there (and elsewhere in the state) were upset that thousands of honor roll students had failed tough new state tests.
Among other things, parents voiced a question seldom heard in the last few decades: What’s the proof that the tests are any good? That is, how do you know these tests provide a valid and reliable measure of student performance?
Time was, decades ago, when there was real debate in our society about the utility of standardized tests, and, at times, the antitest crowd seemed to be winning. The general zeitgeist: Tests provide a poor metric and result in “teaching to the test” and other ills in schools.
But with the reform era starting in 1995 and picking up steam ever since, there has been scant debate about how well tests capture student performance.
Until now. The shouting match in Poughkeepsie indicates the question is back.
You can’t say the question is off base. In professional psychometrics, test developers shoulder the burden of demonstrating how well their tests work. No test is satisfactory just because the test developer says it is. As the ancient Romans put it, nullius in verba (take no one’s word for it).
Rather, demonstrating a test’s mettle requires psychometric validation research; you know, the stuff found in ultranerdy research-design textbooks: construct validity, internal consistency reliability, and such. These things appraise a test’s psychometric bona fides. Accordingly, professional assessments such as the GRE and SAT are subjected to continuous validation research.
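To make one of those textbook concepts concrete: internal consistency reliability is often reported as Cronbach's alpha, which asks whether a test's items hang together as a measure of one thing. The sketch below is illustrative only — the function name and data layout are my own, not anything drawn from the article or from any state's actual validation work.

```python
from statistics import pvariance

def cronbach_alpha(item_scores):
    """Estimate Cronbach's alpha for internal consistency.

    item_scores: one list per test item, each aligned by examinee,
    e.g. item_scores[i][j] is examinee j's score on item i.
    """
    k = len(item_scores)  # number of items
    # Sum of the variances of the individual items
    sum_item_vars = sum(pvariance(item) for item in item_scores)
    # Variance of each examinee's total score across all items
    totals = [sum(scores) for scores in zip(*item_scores)]
    total_var = pvariance(totals)
    return (k / (k - 1)) * (1 - sum_item_vars / total_var)

# Three items that rank examinees identically yield alpha = 1.0
print(cronbach_alpha([[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]))  # → 1.0
```

Alpha near 1.0 suggests the items measure a common construct; values well below the conventional 0.7 threshold would be exactly the sort of finding the public cannot currently inspect.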
Now, you’d think that the higher a test’s stakes, the more extensive the validation research would need to be. Would that you were right.
Rather, the tests with some of the highest stakes of all — pass-to-graduate high school exit exams such as the New York Regents — typically do not make validation results available to the public. Whatever validation data exist, state education officials and their contractors conceal from the public.
Clearly, officials fear that if the public knew how weak the tests were, there would be open revolt (worse than is already happening), and the educational reform train would be derailed. Reformers can’t have that, so they sit on the data. In fact, we do not even know how much validation research has been conducted. Could it be that tests with huge impact on students’ futures rest on inadequate-to-nonexistent validation research? We’ll never know as long as the lid stays on.
Paradoxically, validation data are withheld from the very constituency that paid for developing the tests in the first place: taxpayers. Yes, that’s right: You get to pay for developing, administering, and scoring tests, but you don’t get to see how your investment is faring.
Meanwhile, reformers often push for test results — i.e., students’ scores — to be made public, so parents can see how well schools and teachers are performing, and presumably be motivated to join the chorus calling for reform. Nice, huh? Reformers won’t let the public see how well the tests work, but they will allow the public access to students’ scores.
So how about a deal: State officials can publish test data for every district, school, and teacher in the state, but only if the test validation data are also made public.
Reformers will never take the deal, not in a million years. That says it all about their lack of faith in the psychometric utility of the tests they so prize. They probably didn’t mention that in Poughkeepsie.