Standardized Testing is "Check" Not "Mate"

March 20, 2012

By Mark Selle, Ph.D., Superintendent

Written January 22nd, 2012

I’ve enjoyed chess for nearly as long as I can remember. When I was seven, my dad’s old navy friend visited our home in Spokane, much to my father’s delight. I recall the house vividly. It was the first house address I was ever required to memorize, so I can recite it to this day. They say one can best remember firsts and lasts. I think it’s true.

Perhaps that’s why I recall so vividly my first impressions of chess. My dad and his buddy sat at the dining room table for a game. They had played many such games on the U.S.S. Dixie, a destroyer tender, during Vietnam. Theirs was no quiet or cerebral game! It was filled with bravado, laughter, and intrigue.

I was small. The board was so high above me that I couldn’t even see the pieces well. But I felt the excitement and the adrenaline. It was a battle filled with unexpected twists, turns, checks, and captures! I loved it.

Over the next 10 years, my dad taught me to play. In high school, I could finally beat him. I played for fun and competitively. Now, I pass the game on to my children.

The point of my story lies in the educational process, specifically the role played by standardized testing. My family acquired ChessMaster 9000, a computer software program, some years ago. It includes tutorials and tests on the material covered. The tests are, of course, “standardized.” That is, they are the same for everyone, and they do not include interaction between the teacher and the student during the testing process. They do not, for example, include the kinds of admonitions my dad provided to me when he was the chess examiner and I the student.

Let me further illustrate the point with “castling.” After learning the normal moves of each chess piece, a novice usually learns castling as the first special move. Under certain conditions, a player is allowed to move the king two spaces to the left or right and place the rook on the square immediately on the opposite side of the king. A standardized test, like the one built into ChessMaster 9000, can adequately test knowledge. It cannot, however, adequately examine what a student can do with that knowledge, or what that student understands.

For example, I observed my children getting the right answer when they should not have and getting no credit when they should have. On ChessMaster 9000’s castling test, my eight-year-old got several “right answers” he didn’t deserve. On tests of how the pieces move, my five-year-old didn’t get credit he did earn.

On one hand, a standardized test can count an answer as right even though a student’s reasoning is dead wrong. I discovered this while sitting beside my son. When I saw him get the “right answer,” I proudly interjected, “Do you know why?” only to discover that he didn’t. His understanding was completely incorrect. He had answered the question, “Can White castle long, short, or not at all?” correctly (the answer was “not at all”) but without understanding that the reason was that the king was in check (this is one of the conditions under which one may not castle). I observed the same sort of problem on several other questions.

On the other hand, later, while my five-year-old was energetically pacing through tests on how the pieces move, there was a glitch in the software. It caused him to have to exit and restart the program. He was so disappointed when he got back in because none of his right answers had been saved. He got the right answers, but as far as the system was concerned, he did not.

I’ve seen these kinds of problems in high-stakes tests in education too (see, for example, my 2007 article The WASL Has No Clothes). Consequently, I began once more to reflect upon standardized testing on the national and state scene. It made me consider the importance of personal interaction between a teacher giving a test and the student taking it. This kind of interplay in the educational process is seen in a teacher’s written comments on an essay or the give and take during an oral examination.

The problem here is that the scope and meaning of standardized tests are limited. Yet public policy treats their results as profoundly meaningful. It says, for example, that a student can’t graduate unless he or she passes the tests. It also says that teachers aren’t doing their jobs and that principals should be replaced if scores aren’t high enough. This kind of freight is simply too heavy for such limited assessments to carry.

California Governor Jerry Brown (D) seems to understand these limits. The Los Angeles Times reports, “In his State of the State address, Brown calls for limits on standardized tests and wants reduced roles for the U.S. and state in local schools” (see “Brown sharply differs from Obama on education policy”). It’s time for President Obama and Governor Gregoire to listen to a veteran of their own party, who reflects the growing sentiments of both Democrats and Republicans, both teachers and parents, across the country.

True measurement of learning can be accomplished only in the way my dad did it: through the subtlety and nuance of interaction with a good teacher who detects student growth in the give and take of teaching and learning. I now try to emulate his example both personally in my parenting and professionally in my educational leadership. High-stakes standardized testing as public policy is simply wrong-headed.



The WASL Has No Clothes: A Fairy Tale

By Mark Selle, Ph.D.

Sunday, March 18th, 2007—Revised March 19th, 2010

The Emperor’s New Clothes is the classic 1837 fairy tale by Hans Christian Andersen in which many of us first encountered the idea of social fear. By that phrase, I mean that we are all fearful of moving against the current of the mainstream. An excellent high school production of Arthur Miller’s The Crucible shows a contrary theme: standing for truth, even unto death; the hero who did so was hanged. A similar but more positive theme also arises from my interest in the Lewis and Clark expedition, which reveals the value of going upstream. This is exactly what the intrepid Corps of Discovery did for 2,565 miles as they sought the headwaters of the Missouri River on their quest to find the best route to the mouth of the Columbia (and the fabled Northwest Passage).

I began to see the reliability problems with the WASL in 2001, but unlike the innocent child who proclaimed publicly, “But he has nothing on!” I acted more like many of the townspeople who knew but feared to openly contradict the political power. I shared my thoughts only among my closest friends and family, never publicly and never professionally.

I first became aware of the problem when, in a large meeting of my peers, one of my colleagues openly questioned the WASL by citing a study that demonstrated problems with its “inter-rater reliability.” This reliability problem isn’t technical at all. It simply means that different evaluators will not score any given test consistently. Anyone who learned how the WASL free-response questions were actually scored naturally raised this question. My colleague pointed out that the margin of error was too large. I later asked him for the study he mentioned. I read it. He was right.

My colleague pointed out that a student who “objectively” passed this high-stakes assessment might actually be scored as failing and that this could happen within the acceptable margin for scoring error (inter-rater reliability).

Enter my daughter: the heroine of my story is a young girl, just as the hero in The Emperor’s New Clothes was a little child. I am a proud father, and that pride burst forth when, during her junior year of high school, my daughter became the only person I had ever personally met who scored a perfect 800 on the verbal portion of the SAT. She also took the SAT as a 7th grader participating in the Johns Hopkins talent search, and she was honored in a ceremony held at Eastern Washington University for her status as a top verbal scholar in the state based on her performance on this reading, reasoning, and vocabulary test. Months after that honor, she took the reading WASL. She failed.

At least the emperor said she did; however, he also said she passed every section except reading, the one area in which she obviously possessed her greatest gifts. Her high school would not routinely admit students to the honors English program unless they passed the reading and writing WASL. We successfully appealed, and in her senior year she was given the school’s top honor for her outstanding achievement in English. She then attended a rigorous private university, the cost of which was offset by her nearly $58,000 in scholarships.

I am a proud and biased father, but my daughter’s record speaks for itself. Yes, she did pass the WASL with flying colors in high school, but that does not negate her 7th grade WASL results, which demonstrated the unreliability of that assessment and the problem with all such high-stakes tests to this day. The issue is not that she passed it when it really counted; the issue is that many fail when they should have passed! And there is no way to predict who that will happen to.

Another problem is that politicians in power misuse and distort high-stakes test data for their own ends. In regard to the WASL, a glaring example is the hidden change in how it was scored between 1999 and 2004. In 1999, a student had to score at the 73rd percentile to pass the WASL. In 2004, when the score was about to become a graduation requirement, that number plummeted to the 25th percentile! As a leader in the public school system, I should have known that these changes were taking place. I should have been informed. I wasn’t. I found the data on my own when researching the MAP assessment on the NWEA website.

When my son entered high school, we decided to go ahead and have him take the high school WASL because the emperor required it for a diploma. My wife and I consciously decided to ask our son to jump through this hoop that had no value for his future. But the fairy tale fact remains. The emperor has no clothes, and high-stakes testing doesn’t either.

-Mark Selle