Goodhart’s Law Explains School Decay

Everywhere in education, you see incentives at work. The incentives, though, are so far removed from the actual goals of education that they produce perverse results. 

Goodhart’s Law is usually stated, “When a measure becomes a target, it ceases to be a good measure.” Economics textbooks often use the allegory of a nail maker who learns that his success will be judged by the number of nails he produces. He retools his factory, adjusts his resource use, and produces as many nails as possible, even though many are too thin, too small, or too bent to use. When his higher-ups decide to measure productivity by the weight of the nails instead, he makes only a few very large nails, too heavy to be used. In other words, once incentives are aligned in service of a particular metric, that metric is no longer an objective measure.
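The nail maker's choice can be sketched as a toy model. Every number below (shift length, setup time, nail sizes, the usable weight range) is a hypothetical assumption chosen only for illustration. The sketch shows that a factory optimizing for count or for weight ends up with zero usable nails, while neither metric ever reports a problem.

```python
# Toy model of the nail-maker allegory. All constants are hypothetical.
SHIFT_SECONDS = 28_800            # one 8-hour shift of machine time (assumed)
SETUP_SECONDS = 2.0               # fixed handling time per nail (assumed)
USABLE_RANGE_G = (3.0, 10.0)      # weights a carpenter could use (assumed)
NAIL_SIZES_G = [0.5, 5.0, 500.0]  # sizes the factory can choose from (assumed)


def production(nail_weight_g):
    """Return what one shift produces if every nail has the given weight."""
    # Heavier nails take longer to forge, so fewer of them fit in a shift.
    seconds_per_nail = SETUP_SECONDS + nail_weight_g / 10
    count = int(SHIFT_SECONDS / seconds_per_nail)
    low, high = USABLE_RANGE_G
    usable = count if low <= nail_weight_g <= high else 0
    return {
        "size_g": nail_weight_g,
        "count": count,
        "total_weight_g": count * nail_weight_g,
        "usable": usable,
    }


def best_size(target):
    """Pick the nail size that maximizes whichever metric is the target."""
    return max((production(size) for size in NAIL_SIZES_G),
               key=lambda result: result[target])


for target in ("count", "total_weight_g", "usable"):
    choice = best_size(target)
    print(f"target={target}: makes {choice['size_g']} g nails, "
          f"{choice['usable']} usable")
```

Only when the target is the outcome itself (usable nails) does the factory pick a sensible size. In practice that outcome is the hard thing to measure, which is why a proxy gets chosen in the first place.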

In the context of education, Goodhart’s Law predicts that when dry data, like test scores or school rankings, become the primary focus of education, or a proxy for it, the quality of actual learning suffers. Students rightly intuit that high test scores matter much more than learning and understanding the material. Teachers are motivated to focus on test-taking strategies (which have limited real-world value) and neglect important but difficult-to-quantify areas of knowledge. Grade inflation becomes commonplace, since the grade matters more than mastery of the material. The evaluation of student learning becomes unmoored from the learning itself, with predictably problematic results.

We observe this again and again in education. Bureaucrats and lawmakers attempt to set standards that look like “success” from afar, and the schooling system retools production in service of those measures. Because the actual outcomes we want from education (such as a population with good civic skills and critical faculties) are idealized and difficult to measure, central planners choose one terrible measure after another and warp the whole system’s priorities in the process.

Universities Hack the Rankings

Private, Boston-based Northeastern University made a key discovery in the 1990s: the most statistically important metric for a university’s long-term financial viability and success is its position in U.S. News & World Report’s annual “Best Colleges” rankings. At the time a third-tier commuter school struggling to keep its doors open, Northeastern decided to pour its efforts into raising its U.S. News rank. It would crack the top 100 schools by reverse engineering the statistical criteria that fed the rankings.

Northeastern capped many classes at 19 students, because the formula rewards classes under 20 students. The institution adopted an easy online application system and recruited heavily, because more students applying (at $75 each) meant that a smaller share of applicants was accepted, which improved its score for selectivity. Northeastern also started enticing lower-credentialed high schoolers to spend their first semester abroad, which excluded them from the GPA calculations for the incoming class. Whether the value of the education or the quality of the institution ever meaningfully improved is doubtful and, to some extent, beside the point. Northeastern’s ranking rose from 162 to 49 in just 17 years. Tuition prices nearly tripled.
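The arithmetic behind the application push is simple enough to sketch. The applicant and admission counts below are hypothetical, not Northeastern's actual figures. Only the $75 fee comes from the account above. With admissions held fixed, more applications lower the acceptance rate and raise fee revenue at the same time.

```python
APPLICATION_FEE_USD = 75  # fee per application, as cited in the text


def acceptance_rate(admitted, applicants):
    """Share of applicants who receive an offer of admission."""
    if applicants <= 0:
        raise ValueError("applicants must be positive")
    return admitted / applicants


# Hypothetical figures for illustration only.
ADMITTED = 5_000
APPLICANTS_BEFORE = 10_000
APPLICANTS_AFTER = 25_000

before = acceptance_rate(ADMITTED, APPLICANTS_BEFORE)
after = acceptance_rate(ADMITTED, APPLICANTS_AFTER)
extra_fees = (APPLICANTS_AFTER - APPLICANTS_BEFORE) * APPLICATION_FEE_USD

print(f"acceptance rate: {before:.0%} -> {after:.0%}")
print(f"additional fee revenue: ${extra_fees:,}")
```

Nothing about the classroom changes in this sketch. The same number of students is admitted, yet the school looks far more selective.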

Many other schools have admitted to gaming the U.S. News formula, to the detriment of student experiences. Others, including George Washington University and Emory University, have admitted to outright cheating: lying about or exaggerating the statistics that U.S. News surveys collect.

Another familiar example of the demand for data supplanting educational goals is the official college graduation rate. The graduation rate for “four-year” degrees, as reflected in government statistics, is just 33.3 percent. Unwilling to publicize that discomfiting number, record keepers began tracking the graduation rate after six years instead, roughly doubling the figure to 64 percent. To a family planning for their child’s future, that distinction is likely to be crucial, even as it is actively obscured by “transparency” data.

K-12 Schools Lose the Thread

In K-12 education, schools are far less beholden to private ranking systems but, thanks to federal oversight, even more obligated to produce positive data (if not necessarily the results to support it).

Standardized testing, interim assessments, attendance records, graduation rates: the streams of data that teachers and administrators compile should give us insight into the kind of education kids are getting in each school. But do they?

The federal No Child Left Behind legislation and the later Race to the Top program demanded “robust data” to measure student success, teacher impact, and institutional effectiveness. Quantifiable testing became more important than student wellbeing. Target goals were reduced to lines on a stat sheet. Teachers became tools, teaching to the test and ignoring the rest. Music lessons, physical education, recess, art classes, and other less-structured pursuits atrophied, because administrators didn’t see how creativity and play could help boost test scores (though evidence suggests they do help).

Our obsession with data has crowded out much of the actual education that should happen during the school year. Administrators mandate, and teachers attend, testing-focused training on development days. Testing days have become testing weeks. The utility of the data pales in comparison to the cost of collecting it, and the price of relying on it may be higher still. Making the collection broader, more frequent, and more granular serves only to distract from the true purposes of education. Teachers, administrators, and students shift to meet the needs of the data-gatherers, rather than doing what they do (teach, learn) and allowing it to be passively measured.

Under such misguided incentives, the desire to fudge the numbers (perhaps to make them more representative of on-the-ground experience) is very strong. Teachers cheat and encourage students to cheat, a pattern memorably epitomized by the Atlanta Public Schools “cheating scandal,” in which 35 educators were indicted for changing a total of a quarter-million test answers from wrong to right. While they did the wrong thing, those teachers were operating under incentive structures that made Scantron score sheets the measure of how much funding and local control teachers would have to serve kids. Faced with irrational measures and unbelievably high stakes, they gamed the system. Federal regulators had made data the master of outcomes rather than the servant.

Wherever and whenever we measure the intangible, we risk warping the incentives around it. With something as deeply human and distinctly intangible as learning, it’s easy for the lure of concrete data to supersede what we know to be valuable and meaningful. By disentangling ourselves from measures and rankings, we can return to a focus on what education does for the human mind, not just the data sheets.