Making Sense of Global Comparisons in Education

January 19, 2016

Nearly 50 years ago, the U.S. first got a snapshot of how its students compare with their peers in other countries based on a standardized test. The news was sobering.

“Look towards the bottom of this list, and see the U.S. coming in 11th out of 12 [industrialized] countries” in math, said Tom Loveless of the Brookings Institution, pointing to a chart he presented last month at an Education Writers Association seminar in Washington, D.C. “Only Sweden scored below the U.S.”

Loveless offered this brief history lesson to help dispel misperceptions that may be perpetuated when a fresh crop of international data for dozens of countries comes out this December from two major assessments at the K-12 level.

“If you hear people say there was a time when the U.S. led the world on international tests, that is absolutely false,” said Loveless, a nonresident fellow at Brookings who has written extensively on international comparisons. “The real wake-up call came in 1967. So, 50 years later, we’re still getting mediocre scores.” (England, Germany, Israel, and Japan were among the nations that scored higher than the U.S. on that exam.)

The December 2016 data will come from PISA, which tests 15-year-olds in reading, math, and science with a focus on the real-world application of knowledge and skills, and from TIMSS, which usually targets grades 4 and 8.

Loveless was joined on the panel by Marc Tucker, the president and CEO of the National Center on Education & the Economy, and a prominent advocate for studying global achievement data. Tucker echoed the point that U.S. test scores were disappointing even back in the 1960s.

“Tom is absolutely right,” he said. “It’s not that the quality of American education has fallen. … Our problem is that it has stayed absolutely flat,” even as some of the nation’s economic competitors have surged ahead.

Tucker provided a history lesson of his own, saying the U.S. was “the first country in the world to offer free elementary school public education to the masses. That was in the middle of the 1800s.”

Through the latter half of the 20th century, he said, the U.S. had the best-educated workforce as measured by high school attainment, but U.S. gains halted in the 1970s.

“Although we are now very proud of having a high school completion rate of somewhere around 80 percent, the top-performing countries have completion rates of about 95 percent,” he said. Moreover, he said the majority of U.S. students graduate high school with only 7th or 8th grade levels of literacy.

Tucker drew special attention to a recent analysis of the reading, mathematics, and problem-solving skills of so-called millennials in the American workforce, those ages 16 to 34.

“Only Spain and Italy are lower than the United States on that survey,” he said. “We come in dead last on problem-solving.”

The Shanghai Miracle?

In his presentation, Loveless also spoke at length about Shanghai, which made international headlines in 2012 for its strong performance on PISA, the Program for International Student Assessment. Shanghai outscored all other nations and education systems on PISA that year, the first time it participated. But Loveless cast doubt on the reliability of the achievement data from Shanghai. (China is the only country that has been permitted by the OECD to allow such limited participation, rather than nationwide.)

PISA tests 15-year-olds every three years in reading, math, and science, with a focus on the real-world application of knowledge and skills. When the next PISA results are issued this December, several other systems in China will also be included for the first time, among them Beijing and the provinces of Jiangsu and Guangdong, as the OECD announced in 2014 and as the BBC reported. (Also in December, results will be issued from TIMSS, the Trends in International Mathematics & Science Study, which usually targets grades 4 and 8, but this year will also have results for advanced math and science for 12th graders.)

Loveless urges caution about the PISA data from China, because the country’s longstanding Hukou system leads to the exclusion of many migrant children from public high schools in big cities like Shanghai, and thereby skews the test results. The Brookings scholar has written repeatedly on this topic.

“Migrants basically can’t attend high schools in the big cities,” he said. “There are some exceptions to that, but generally that is the case.” Because of this policy, Loveless argues that the PISA data is not a reliable barometer for achievement.

“The reason why I question Shanghai’s scores, and consider them questionable — in fact I think they’re uninterpretable — is because of the Hukou system in China,” he said.

Tucker — the author of a book on global comparisons titled “Surpassing Shanghai: An Agenda for American Education Built on the World’s Leading Systems” — did not directly respond to Loveless’s comments on China at the EWA event. But he has done so on previous occasions, including on his opinion blog, published by Education Week.

In that 2014 piece, Tucker makes several points. While he says he is not an apologist for the Hukou system, he points to some evidence that China is making strides to better serve migrant families, including opening up Shanghai’s elementary and middle schools to such children. In addition, he says it’s not clear to him that the Hukou system is necessarily skewing Shanghai’s results. (For one, he suggests that many 15-year-olds in Shanghai are enrolled in middle schools.) In any case, he argues that there is much to be learned from educational improvement efforts in that city.

“We will find out next time we get PISA results whether Shanghai is by itself in China, or isn’t, whether there are many other provinces which are performing at comparable levels,” Tucker said. “It’s a very big deal, and you should keep your eyes on it.”

Of Comparisons and Causality

Meanwhile, a recent report from the Economic Policy Institute, a Washington think tank, raises questions about the value of global comparisons in education, arguing that state-to-state analyses are far more meaningful and useful. It contends, for instance, that the global data fail to sufficiently consider differences in poverty.

Loveless and Tucker both responded to such critiques. Loveless said all comparisons are imperfect, including those that compare states. “You have to be careful with comparisons, because context is everything,” he said, noting the difficulty of comparing a state like California, with a very large non-English-speaking population, to Montana or Vermont. “It’s a completely different context.”

Tucker directed his attention to the poverty issue. “The idea that the United States is alone with respect to poverty among its kids … is simply wrong,” he said. “If you look at the performance of the Asian countries that are now at the top, in almost every case you will find that over the last 30 or 40 years they have risen from levels of poverty that are hard to imagine in the United States, anywhere.”

Tucker urged people to pay attention to PISA tables that highlight the likelihood that low-income students will achieve at high levels. “The number of countries that are outperforming us on that is very large, which means we are doing a spectacularly bad job of taking kids from the bottom of the economic index and moving them up to high performance.”

When new PISA and TIMSS test data is issued this December, Loveless warned people to be careful when they hear claims that seek to connect those results to the particular policies and strategies of participating nations.

“These tests are not designed to tell us very much at all about causality,” Loveless said. “So when you read about people saying, ‘Here’s why PISA scores went up,’ or ‘Here’s why PISA scores went down,’ they are almost always wrong. They are guessing. They are merely speculating.”

He added, “Now you can do very sophisticated analysis and tease out some hypotheses about causality, but they’re not good for that.”

But Tucker argues that the reports are more valuable than Loveless suggests.

“I have a somewhat different view of this. PISA has an enormous amount of what is often called background data, which is then correlated by many people, OECD and others, with … student performance,” he said. “There is an enormous amount that you can learn from this data.”

As an example, Tucker pointed to the question of whether higher spending on education will improve academic results.

“The data put the lie to that,” he argued. “There’s hardly any correlation at all between how much a country spends and the results that it gets once you get beyond a certain threshold.”

Tucker did offer some caveats, though.

“You have to be careful about attributing causality in some, what shall I say, definitive way,” he said. “But from our point of view, you can learn an enormous amount from the data, which is helpful in informing hypotheses which can then be tested by different kinds of research.”

Author

Erik Robelen

Deputy Director
