Gerrymandering is clawing across courtrooms and headlines nationwide. The U.S. Supreme Court recently heard cases on the constitutionality of voting districts that allegedly entrenched a strong advantage for Republicans in Wisconsin and Democrats in Maryland but dodged direct rulings in both. Another partisan gerrymandering case from North Carolina is winding its way up with a boost from an emphatic lower court opinion in August. But so far it has been impossible to satisfy the justices with a legal framework for partisan gerrymandering. Part of the problem, as former justice Anthony Kennedy noted in a 2004 case, is that courts high and low have yet to settle on a “workable standard” for identifying a partisan gerrymander in the first place. That is where a growing number of mathematicians around the country think we can help.

Two years ago, with a few friends, I founded a working group to study the applications of geometry and computing to redistricting in the U.S. Since then, the Metric Geometry and Gerrymandering Group has expanded its scope and mission, becoming deeply engaged in research, outreach, training and consulting. More than 1,200 people have attended our workshops around the country, and many of them have become intensely involved in redistricting projects. We think the time is right to make a computational intervention. The mathematics of gerrymandering is surprisingly rich—enough to launch its own subfield—and computing power is arguably just catching up with the scale and complexity of the redistricting problem. Despite our group's technical orientation, our central goal is to reinforce and protect civil rights, and we are working closely with lawyers, political scientists, geographers and community groups to build tools and ideas in advance of the next U.S. Census and the round of redistricting to follow it.

In a country that vests power in elected representatives, there will always be skirmishes for control of the electoral process. And in a system such as that of our House of Representatives—where winner takes all within each geographical district—the delineation of voting districts is a natural battleground. American history is chock-full of egregious line-drawing schemes, from stuffing a district with an incumbent's loyalists to slicing a long-standing district three ways to suppress the political power of black voters. Many varieties of these so-called packing and cracking strategies continue today, and in the big data moment, they have grown enormously more sophisticated. Now more than ever, abusive redistricting is stubbornly difficult to even identify definitively. People think they know gerrymandering by two hallmarks—bizarre shapes and disproportionate electoral outcomes—yet neither one is reliable. So how do we determine when the scales are unfairly tipped?

## The Eyeball Test

The 1812 episode that gave us the word “gerrymander” sprang from the intuition that oddly shaped districts betray an illegitimate agenda. It is named for Elbridge Gerry, who was governor of Massachusetts at the time. Gerry had quite a Founding Father pedigree—signer of the Declaration of Independence, major player at the U.S. Constitutional Convention, member of Congress, James Madison's vice president—so it is amusing to consider that his enduring fame comes from nefarious redistricting. “Gerry-mander,” or Gerry's salamander, was the satirical name given to a curvy district in Boston's North Shore that was thought to favor the governor's Democratic-Republican party over the rival Federalists. A woodcut political cartoon ran in the *Salem Gazette* in 1813; in it, wings, claws and fangs were suggestively added to the district's contours to heighten its appearance of reptilian contortions.

So the idea that erratic districts tip us off to wrongdoing goes a long way back, and the converse notion that close-knit districts promote democratic ideals is as old as the republic. In 1787 Madison wrote in *The Federalist Papers* that “the natural limit of a democracy is that distance from the central point which will just permit the most remote citizens to assemble as often as their public functions demand.” In other words, districts should be transitable. In 1901 a federal apportionment act marked the first appearance in U.S. law of the vague desideratum that districts should be composed of “compact territory.” The word “compact” then proliferated throughout the legal landscape of redistricting but almost always without a definition.

For instance, at a 2017 meeting of the National Conference of State Legislatures, I learned that after the last Census, Utah's lawmakers took the commendable time and effort to set up a Web site, Redistrict Utah, to solicit proposed districting maps from everyday citizens. To be considered, maps were required to be “reasonably compact.” I jumped at the opportunity to find out how exactly that quality was being tested and enforced, only to learn that it was handled by just tossing the funny-looking maps. If that sounds bad, Utah is far from alone. Thirty-seven states have some kind of shape regulation on the books, and in almost every case, the eyeball test is king.

The problem is that the outline of a district tells a very partial and often misleading story. On one hand there can certainly be benign reasons for ugly shapes. Physical geography or reasonable attempts to follow county lines or unite communities of interest can influence a boundary, although just as often, legitimate priorities such as these are merely scapegoated in an attempt to defend the worst-offending districts. On the other hand districts that are plump, squat and symmetrical offer no meaningful seal of quality. Just this year a congressional redistricting plan in Pennsylvania drafted by Republicans in the state legislature achieved strong compactness scores under all five formulas specified by Pennsylvania's supreme court. Yet mathematical analysis revealed that the plan would nonetheless lock in the same extreme partisan skew as the contorted plan, enacted in 2011, that it was meant to replace. So the justices opted for the extraordinary measure of adopting an independent outsider's plan.

## Lopsided Outcomes

If shape is not a reliable indicator of gerrymandering, what about studying the extent to which elected representatives match the voting patterns of the electorate? Surely lopsided outcomes provide prima facie evidence of abuse. But not so fast. Take Republicans in my home state of Massachusetts. In the 13 federal elections for president and Senate since 2000, GOP candidates have averaged more than one third of the votes statewide. That is six times the level needed to win a seat in one of Massachusetts's nine congressional districts because a candidate in a two-way race needs a simple majority to win. Yet no Republican has won a seat in the House since 1994.

We must be looking at a gerrymander that denies Republicans their rightful opportunity districts, right? Except the mathematics here is completely exonerating. Let us look at a statewide race so that we can put uncontested seats and other confounding variables to the side. Take Kenneth Chase, the Republican challenger to Ted Kennedy for the U.S. Senate in 2006, who cracked 30 percent of the statewide vote. Proportionally, you would expect Chase to beat Kennedy in nearly three out of nine congressional districts. But the numbers do not shake out. As it turns out, it is mathematically impossible to select a single district-sized grouping of towns or precincts, even scattered around the state, that preferred Chase. His voters simply were not clustered enough. Instead most precincts went for Chase at levels close to the state average, so there were too few Chase-favoring building blocks to go around.

Any voting minority needs a certain level of nonuniformity in how its votes are distributed for our districting system to offer even a theoretical opportunity to secure representation. And the type of analysis applied to the Chase-Kennedy race does not even consider spatial factors, such as the standard requirement that each district be one connected piece. One may rightfully wonder how we can ever hold district architects accountable when the landscape of possibilities can hold so many surprises.

## Random Walks to the Rescue

The only reasonable way to assess the fairness of a districting plan is to compare it with other valid plans for cutting up the same jurisdiction because you must control for aspects of electoral outcomes that were forced by the state's laws, demographics and geography. The catch is that studying the universe of possible plans becomes an intractably big problem.

Think of a simple four-by-four grid and suppose you want to divide it into four contiguous districts of equal size, with four squares each. If we imagine the grid as part of a chessboard, and we interpret contiguity to mean that a rook should be able to visit the entire district, then there are exactly 117 ways to do it. If corner adjacency is permitted—so-called queen contiguity—then there are 2,620 ways. And they are not so straightforward to count. As my colleague Jim Propp, a professor at the University of Massachusetts Lowell and a leader in the field of combinatorial enumeration, puts it, “In one dimension, you can split paths along the way to divide and conquer, but in two dimensions, suddenly there are many, many ways to get from point A to point B.”

The issue is that the best counting techniques often rely on recursion—that is, solving a problem using a similar problem that is a step smaller—but two-dimensional spatial counting problems just do not recurse well without some extra structure. So complete enumerations must rely on brute force. Whereas a cleverly programmed laptop can classify partitions of small grids nearly instantly, we see huge jumps in complexity as the grid size grows, and the task quickly zooms out of reach. By the time you get to a grid of nine-by-nine, there are more than 700 trillion solutions for equinumerous rook partitions, and even a high-performance computer needs a week to count them all. This seems like a hopeless state of affairs. We are trying to assess one way of cutting up a state without any ability to enumerate—let alone meaningfully compare it against—the universe of alternatives. This situation sounds like groping around in a dark, infinite wilderness.

The good news is that there is an industry standard used across scientific domains for just such a colossal task: Markov chain Monte Carlo (MCMC). Markov chains are random walks in which where you go next is governed by probability, depending only on where you are now (at every position, you roll the dice to choose a neighboring space to move to). Monte Carlo methods are just estimation by random sampling. Put them together, and you get a powerful tool for searching vast spaces of possibilities. MCMC has been successfully used to decode prison messages, probe the properties and phase transitions of liquids, find provably accurate fast approximations for hard computational problems, and much more. A 2009 survey by the eminent statistician Persi Diaconis estimated that MCMC drives 10 to 15 percent of the statistical work in science, engineering and business, and the number has probably only gone up since then. Although computational analysis in redistricting goes back several decades, serious attempts to apply MCMC in that effort only started to appear publicly around 2014.

Imagine that officials in the state of Gridlandia hire you to decide if their legislature's districting plan is reasonable. If Gridlandia is a four-by-four grid of squares, and its state constitution calls for rook-contiguous districts, then you are in luck: there are exactly 117 ways to produce a compliant plan, and you can examine them all. You can set up a perfectly faithful model of this universe of districting plans by using 117 nodes to represent the valid plans and adding edges between the nodes to represent simple moves in which two squares in the grid swap their district assignments. The edges give you a way of conceptualizing how similar two plans are by simply counting the number of swaps needed to transform one to the other. (I call this structure a “metagraph” because it is a graph of ways to cut up another graph.) Now suppose that the state legislature is controlled by the Diamond party, and its rivals suspect that it has rigged the seats in its favor. To determine if that is true, one may turn to the election data. If the Diamond plan would have produced more seats for the party in the last election than, say, 114 out of 117 alternatives and if the same is true for several previous elections, the plan is clearly a statistical outlier. This is persuasive evidence of a partisan gerrymander—and you do not need MCMC for such an analysis.

The MCMC method kicks in when you have a full-sized problem in place of this small toy problem. As soon as you get past 100 or so nodes, there is a similar metagraph, but you cannot completely build it because of its forbidding complexity. That is no deal breaker, though. From any single plan, it is still easy to build out the local neighborhood by performing all possible moves. Now you can take a million, billion or trillion steps and see what you find. There is mathematics in the background (ergodic theory, to be precise) guaranteeing that if you random-walk for long enough, the ensemble of maps you collect will have properties representative of the overall universe, typically long before you have visited even a modest fraction of nodes in your state space. This lets you determine if the map you are evaluating is an extreme outlier according to various partisan metrics.

The cutting edge of scientific inquiry is to build more powerful algorithms and, at the same time, to devise new theorems that certify that we are sampling well enough to draw robust conclusions. There is an emerging scientific consensus around this method but also many directions of ongoing research.

## R.I.P. Governor Gerry

So far courts seem to be smiling on this approach. Two mathematicians—Duke University's Jonathan Mattingly and Carnegie Mellon University's Wes Pegden—have recently testified about MCMC approaches for the federal case in North Carolina and the state-level case in Pennsylvania, respectively.

Mattingly used MCMC to characterize the reasonable range one might observe for various metrics, such as seats won, across ensembles of districting plans. His random walk was weighted to favor plans that were deemed closer to ideal, along the lines of North Carolina state law. Using his ensembles, he argued that the enacted plan was an extreme partisan outlier. Pegden used a different kind of test, appealing to a rigorous theorem that quantifies how unlikely it is that a neutral plan would score much worse than other plans visited by a random walk. His method produces *p*-values, which constrain how improbable it is to find such anomalous bias by chance. Judges found both arguments credible and cited them favorably in their respective decisions.

For my part, Pennsylvania governor Tom Wolf brought me on earlier this year as a consulting expert for the state's scramble to draw new district lines following its supreme court's decision to strike down the 2011 Republican plan. My contribution was to use the MCMC framework to evaluate new plans as they were proposed, harnessing the power of statistical outliers while adding new ways to take into account more of the varied districting principles in play, from compactness to county splits to community structure. My analysis agreed with Pegden's in flagging the 2011 plan as an extreme partisan outlier—and I found the new plan floated by the legislature to be just as extreme, in a way that was not explained away by its improved appearances.

As the 2020 Census approaches, the nation is bracing for another wild round of redistricting, with the promise of litigation to follow. I hope the next steps will play out not just in the courtrooms but also in reform measures that require a big ensemble of maps made with open-source tools to be examined before any plan gets signed into law. In that way, the legislatures preserve their traditional prerogatives to commission and approve district boundaries, but they have to produce some guarantees that they are not putting too meaty a thumb on the scale.

Computing will never make tough redistricting decisions for us and cannot produce an optimally fair plan. But it can certify that a plan behaves as though selected just from the stated rules. That alone can rein in the worst abuses and start to restore trust in the system.

**Editor’s Note (10/23/18): Due to changes in the data, some of the numbers in the center and right columns of the table “How to Compare Countless Districting Plans” were adjusted after posting from those in the version that appears in November's print edition of *Scientific American*.*