Wednesday, September 30, 2009

Live By The Sword, Die By The Sword?

The problem with Jay Mathews' defense ("Measuring Progress At Shaw With More Than Numbers") of a Washington, DC school principal who did not demonstrate student learning gains at his school after one year is that the principal operates within an accountability system that demands such a result. In this case, both Mathews -- and DC Schools Chancellor Michelle Rhee, as described in Mathews' WP column -- are right not to have lowered the boom on Brian Betts, principal of DC's Shaw Middle School at Garnet-Patterson, based on a single year's worth of test scores.
The state superintendent of education's Web site says Shaw dropped from 38.6 to 30.5 in the percentage of students scoring at least proficient in reading, and from 32.7 to 29.2 in math.

But those were not the numbers Rhee read to Betts over the phone.

Only 17 percent of Shaw's 2009 students had attended the school in 2008, distorting the official test score comparisons. Rhee instead recited the 2008 and 2009 scores of the 44 students who had been there both years. It didn't help much.

The students' decline in reading was somewhat smaller; it went from 34.5 to 29.7. Their math proficiency increased a bit, from 26.2 to 29.5. But Shaw is still short of the 30 percent mark, far below where Rhee and Betts want to be....

Despite the sniping at Rhee, the best teachers I know think that what happened at Shaw is a standard part of the upgrading process. I have watched Betts, his staff, students and parents for a year. The improvement of poor-performing schools has been the focus of my reporting for nearly three decades. The Shaw people are doing nearly everything that the most successful school turnaround artists have done.

They have raised expectations for students. They have recruited energetic teachers who believe in the potential of impoverished students. They have organized themselves into a team that compares notes on youngsters. They regularly review what has been learned, what some critics dismiss as "teaching to the test." They consider it an important part of their jobs.

That's how it's done, usually with a strong and engaging principal like Betts.

Mathews' take -- including consideration of contextual factors, such as the fact that only 17% of the school's students had attended the prior year and the contention that school turnaround requires more than a single year -- is how the education world should work. Embrace the complexity of learning and trying to measure it! To do so would disallow the use of single-year changes in test scores for making high-stakes decisions about schools and individual school personnel. It would also remove the unrealistic pressure on school turnarounds to bear fruit in a single year. Test scores would be used responsibly in combination with other data and evidence to paint a fuller picture about individual school contexts and inform judgments about school leadership and student success.
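To put a rough number on why a single year's change for a small group is shaky ground, here is a minimal sketch in Python. It rests on assumptions of mine, not on anything in Mathews' column: that the 44 returning Shaw students can be treated like a simple random sample, and that the usual normal approximation for a proportion applies. The function name and the 95 percent level are illustrative choices.

```python
import math


def margin_of_error(proportion, n_students, z=1.96):
    """Approximate 95% margin of error for a proficiency rate.

    Uses the normal approximation for a proportion, which is an
    assumption made for illustration, not the district's method.
    """
    return z * math.sqrt(proportion * (1 - proportion) / n_students)


# Figures from the column: 44 students tested in both years,
# reading proficiency of 34.5 percent (2008) and 29.7 percent (2009).
n_students = 44
reading_2008 = 0.345
reading_2009 = 0.297

observed_change = abs(reading_2009 - reading_2008)
margin_2009 = margin_of_error(reading_2009, n_students)

print(f"Observed change: {observed_change * 100:.1f} points")
print(f"Approximate margin of error for 2009: {margin_2009 * 100:.1f} points")
```

Under these assumptions, the margin of error on one year's rate is well over twice the size of the change being debated, which is the statistical version of Mathews' point that one year of scores cannot carry a high-stakes judgment.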

But Michelle Rhee and other education reform advocates have publicly argued that student performance as measured by test scores is basically the be-all and end-all. According to this Washington Post story ("Testing Tactics Helped Fuel D.C. School Gains"), Rhee supports strengthening No Child Left Behind to "emphasize year-to-year academic growth." Such a stance creates a problem for these reformers when they are leading a district and staking their leadership on uncomplicated test score gains. Others will assess their leadership and judge their success by this measure -- an ill-advised one in its simplest form.

I would argue that, in addition to doing the right thing (as happened in this instance), reform advocates and school leaders like Rhee also have a responsibility to say and advocate for the right thing. They have a responsibility to be honest about the complexity of student learning and the inability of student assessments to accurately capture all of the nuance going on within schools and classrooms. While the reformers' challenge of the adult-focused policies of the educational status quo is often warranted, some reforms -- accountability, chief among them -- have been taken too far. Student learning, school leadership and teaching cannot be measured and judged good or bad based on a single set of test scores. Test scores must be part of the consideration -- and supporting systems such as accountability, compensation and evaluation must be informed by such data -- but they should not single-handedly define success or failure.

The complexity presented by Mathews in his article -- and, more importantly, by existing research on year-to-year comparisons of both overall test scores and test score gains (such as by Robert Linn, Aaron Pallas, Tim Sass, and embedded within Sunny Ladd's RttT comments) -- strongly suggests that educational accountability systems should be designed more thoughtfully than they have been to date. Unfortunately, that does not seem to be the direction in which policymaking is headed at either the federal or state level. Part of being more thoughtful is moving away from NCLB-style adequate yearly progress and toward a value-added approach, but thoughtfulness also requires not making high-stakes decisions based exclusively on volatile student data. Do I hear "multiple measures"? Sure, but Sherman Dorn offers some provocative thoughts on this subject in a 2007 blog post.

With regard to educational accountability, policymakers first should do their homework -- and then they clearly have more work to do in creating a better system and undoing parts of the existing system that aren't evidence-based and succeed only in simplifying a truly complex art: learning.

-------------------

For those of you who have gotten this far, there's a related post on the New America Foundation's Ed Money Watch blog discussing a new GAO report that analyzes state spending on student assessment tests -- $640 million in 2007-08.
The increasing cost of developing and scoring assessments has also led many states to implement simpler and more cost-effective multiple choice tests instead of open response tests. In fact, although five states have changed their assessments to include more open response items in both reading and math since 2002, 11 and 13 states have removed open items from their reading and math tests, respectively over the same time period.... This reliance on multiple choice tests has forced states to limit the content and complexity of what they test. In fact, some states develop academic standards for testing separately from standards for instruction, which are often un-testable in a multiple choice system. As a result, state NCLB assessments tend to test and measure memorization of facts and basic skills rather than complex cognitive abilities.
------------

And here's a new story hot off the presses from Education Week. It discusses serious questions raised about New York City's school grading system.

Eighty-four percent of the city’s 1,058 public elementary and middle schools received an A on the city’s report cards this year, compared with 38 percent in 2008, while 13 percent received a B, city officials announced this month.

“It tells us virtually nothing about the actual performance of schools,” Aaron M. Pallas, a professor of sociology and education at Teachers College, Columbia University, said of the city’s grades.

Diane Ravitch, an education historian at New York University, was even sharper: She declared the school grades “bogus” in a Sept. 9 opinion piece for the Daily News of New York, saying the city’s report card system “makes a mockery of accountability.”

But Andrew J. Jacob, a spokesman for the New York City Department of Education, defended the ratings, even as he said the district’s demands on schools would continue to rise next year....

The city employs a complex methodology to devise its overall letter grades, with the primary driver being results from statewide assessments in reading and mathematics, which have also encountered considerable skepticism lately.

The city’s grades are based on three categories: student progress on state tests from one year to the next, which accounts for 60 percent; student performance for the most recent school year, which accounts for 25 percent; and school environment, which makes up 15 percent.

Mr. Pallas of Teachers College argues that one key flaw with the city’s rating system is that it depends heavily on what he deems a “wholly unreliable” measure of student growth on test scores from year to year that fails to account adequately for statistical error.
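The weighting described in the Education Week excerpt can be sketched roughly as follows. The 60/25/15 split comes from the article. Everything else is an assumption made for illustration: the 0-100 scale for each category, the function name, and the example scores. The city's actual methodology is described as complex and is not reproduced here.

```python
# Category weights as reported by Education Week.
WEIGHTS = {
    "progress": 0.60,     # year-to-year change on state tests
    "performance": 0.25,  # most recent year's results
    "environment": 0.15,  # school environment
}


def composite_score(category_scores):
    """Weighted sum of category scores.

    Assumes each category is scored 0-100, which is hypothetical;
    the excerpt does not specify the scale.
    """
    return sum(WEIGHTS[name] * category_scores[name] for name in WEIGHTS)


# Hypothetical school scoring 50 in every category.
baseline = {"progress": 50.0, "performance": 50.0, "environment": 50.0}

# Same school, with the progress score off by 10 points due to noise.
noisy = dict(baseline, progress=60.0)

shift = composite_score(noisy) - composite_score(baseline)
print(f"A 10-point swing in progress moves the composite by about {shift:.1f} points")
```

Because progress carries the heaviest weight, any statistical error in the year-to-year growth measure flows into the overall grade more strongly than error in the other two categories, which is the concern Pallas raises.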