In the weeks leading up to the November 2016 election, polls across the country predicted an easy sweep for Democratic nominee Hillary Clinton. From Vanuatu to Timbuktu, everyone knows what happened. Media outlets and pollsters took the heat for failing to project a victory for Donald Trump. The polls were ultimately right about the popular vote. But they missed the mark in key swing states that tilted the Electoral College toward Trump.
This time, prognosticators made assurances that such mistakes were so 2016. But as votes were tabulated on November 3, nervous viewers and pollsters began to experience a sense of déjà vu. Once again, more ballots were ticking toward President Trump than the polls had projected. The voter surveys ultimately pointed in the wrong direction for only two states, North Carolina and Florida, where polls had signaled a win for Joe Biden. But they misjudged how much of the overall vote would go to Trump in both red and blue states. In states where polls had favored Biden, the final margin shifted toward Trump by a median of 2.6 percentage points relative to the polls. And in Republican states, Trump outperformed the polls by even more: a whopping 6.4 points.
Four years ago, Sam Wang, a neuroscience professor at Princeton University and co-founder of the blog Princeton Election Consortium, which analyzes election polling, called the race for Clinton. He was so confident that he made a bet to eat an insect if Trump won more than 240 electoral votes—and ended up downing a cricket live on CNN. Wang is coy about any plans for arthropod consumption in 2020, but his predictions were again optimistic: he pegged Biden at 342 electoral votes and projected that the Democrats would have 53 Senate seats and a 4.6 percent gain in the House of Representatives.
Scientific American recently spoke with Wang about what may have gone wrong with the polls this time around—and what bugs remain to be sorted out.
[An edited transcript of the interview follows.]
How did the polling errors for the 2020 election compare with those we saw in the 2016 contest?
Broadly, there was a polling error of about 2.5 percentage points across the board in close states and blue states for the presidential race. This was similar in size to the polling error in 2016, but it mattered less this time because the race wasn’t as close.
The main thing that has changed since 2016 is not the polling but the political situation. I would say that worrying about polling is, in some sense, worrying about the 2016 problem. And the 2020 problem is ensuring there is a full and fair count and ensuring a smooth transition.
Still, there were significant errors. What may have driven some of those discrepancies?
The big polling errors in red states are the easiest to explain because there’s a precedent: in states that are historically not very close for the presidency, the winning candidate usually overperforms. It’s long been known that turnout is lower in states that aren’t competitive for the presidency because of our weird Electoral College mechanism. That effect, the winner’s bonus, might be enhanced in very red states by the pandemic. If you’re in a very red state, and you’re a Democratic voter who knows your vote doesn’t affect the outcome of the presidential race, you might be slightly less motivated to turn out during a pandemic.
That’s one kind of polling error that I don’t think we need to be concerned about. But the error we probably should be concerned about is this 2.5-percentage-point error in close states. That error happened in swing states but also in Democratic-trending states. For people who watch politics closely, the expectation was that we had a couple of roads we could have gone down [on election night]. Some states count and report votes on election night, and other states take days to report. The polls beforehand pointed toward the possibility of North Carolina and Florida coming out for Biden. That would have effectively ended the presidential race right there. But the races were close enough that there was also the possibility that things would continue. In the end, that’s what happened: we were watching more counting happen in Pennsylvania, Michigan, Wisconsin, Arizona and Nevada.
How did polling on the presidential race compare with the errors we saw with Senate races this year?
The Senate errors were a bigger deal. There were seven Senate races where the polling showed the races within three points in either direction. Roughly speaking, that meant the outcome could range anywhere from 49 to 56 Democratic seats. A small polling miss had a pretty consequential outcome because every percentage point missed would lead to, on average, another Senate seat going one way or the other. Missing a few points in the presidential race was not a big deal this year, but missing by a few points in Senate races mattered.
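To make that arithmetic concrete, here is a minimal Python sketch. The seven margins are invented for illustration and are not the actual 2020 polls. The baseline of 49 seats is inferred from the quoted range (56 minus the seven close races), and the function name `dem_seats` is hypothetical. The sketch assumes one uniform polling error applies to every race, which is a simplification.

```python
# Hypothetical polled margins (Democrat minus Republican, in points) for seven
# close Senate races. Invented for illustration; NOT the real 2020 polling.
HYPOTHETICAL_MARGINS = [-3.0, -2.0, -1.0, 0.5, 1.0, 2.0, 3.0]

# Assumption: 49 seats if Democrats lose all seven close races, inferred from
# the 49-to-56 range quoted in the interview.
BASELINE_DEM_SEATS = 49


def dem_seats(uniform_error_pts: float) -> int:
    """Democratic seat count if every poll overstated the Democrat by
    `uniform_error_pts` points (negative values mean an understatement)."""
    wins = sum(1 for m in HYPOTHETICAL_MARGINS if m - uniform_error_pts > 0)
    return BASELINE_DEM_SEATS + wins


if __name__ == "__main__":
    for error in (-4.0, 0.0, 1.0, 2.5, 4.0):
        print(f"uniform polling error {error:+.1f} pts -> {dem_seats(error)} Democratic seats")
```

Because the made-up margins are spread over about six points, each point of uniform error moves roughly one seat, which is the sensitivity described above.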
What would more accurate polling have meant for the Senate races?
The real reason polling matters is to help people determine where to put their energy. If we had a more accurate view of where the races were going to end up, it would have suggested political activists put more energy into the Georgia and North Carolina Senate races.
And it’s a weird error that the Senate polls were off by more than the presidential polls. One possible explanation would be that voters were paying less attention to Senate races than presidential races and therefore were unaware of their own preference. Very few Americans lack awareness of whether they prefer Trump or Biden. But maybe more people would be unaware of their own mental processes for, say, [Republican incumbent] Thom Tillis versus [Democratic challenger] Cal Cunningham [in North Carolina’s Senate race]. Because American politics have been extremely polarized for the past 25 years, people tend to [end up] voting [a] straight ticket for their own party.
Considering that most of the polls overestimated Biden’s lead, is it possible pollsters were simply not adequately reaching Trump supporters by phone?
David Shor, a data analyst [who was formerly head of political data science at the company Civis Analytics], recently pointed out the possibility that people who respond to polls are not a representative sample. They’re pretty weird in the sense that they’re willing to pick up the phone and stay on the phone with a pollster. He gave evidence that people are more likely to pick up the phone if they’re Democrats, more likely to pick up under the conditions of a pandemic and more likely to pick up the phone if they score high in the domain of social trust. It’s fascinating. The idea is that poll respondents score higher on social trust than the general population, and because of that, they’re not a representative sample of the population. That could be skewing the results.
This is also related to the idea that states with more QAnon followers experienced more inaccurate polling. The QAnon belief system is certainly correlated with lower social trust. And those might be people who are simply not going to pick up the phone. If you believe in a monstrous conspiracy of sex abuse involving one of the major political parties of the U.S., then you might be paranoid. One could not rule out the possibility that paranoid people would also be disinclined to answer opinion polls.
In Florida’s Miami-Dade County, we saw a surprising surge of Hispanic voters turning out for Trump. How might the polls have failed to take into account members of that demographic backing Trump?
Pollsters know Hispanic voters to be a difficult-to-reach demographic. Hispanics are also not a monolithic population. If you look at some of the exit polling, it looks like Hispanics were more favorable to Trump than they were four years ago. It’s certainly possible Hispanic support was missed by pollsters this time around.
Given that the presidential polls have been off for the past two elections, how much attention should people pay to polls?
I think polling is critically important because it is a way by which we can measure public sentiment more rigorously than any other method. Polling plays a critical role in our society. One thing we shouldn’t do is convert polling data into probabilities. That obscures the fact that polls can be a few points off. And it’s better to leave the reported data in units of opinion [as a percentage favoring a candidate] rather than try to convert it to a probability.
It’s best not to force too much meaning out of a poll. If a race looks like it’s within three or four points in either direction, we should simply say it’s a close race and not force the data to say something they can’t. I think pollsters will take this inaccuracy and try to do better. But at some level, we should stop expecting too much out of the polling data.
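As a rough illustration of the caution about converting polling data into probabilities, the Python sketch below turns a polled lead into a win probability. It assumes that polling error is normally distributed with a standard deviation of 3 points. That error model, the 3-point figure and the function name `win_probability` are assumptions for this example, not Wang’s method or any pollster’s. The sketch then recomputes the probability after the roughly 2.5-point miss discussed in the interview.

```python
from statistics import NormalDist


def win_probability(lead_pts: float, error_sd_pts: float = 3.0) -> float:
    """Chance that the true lead is above zero, ASSUMING the polling error is
    normally distributed with standard deviation `error_sd_pts` (illustrative)."""
    return 1.0 - NormalDist(mu=lead_pts, sigma=error_sd_pts).cdf(0.0)


if __name__ == "__main__":
    systematic_miss = 2.5  # approximate polling error cited in the interview
    for lead in (1.0, 3.0, 5.0):
        before = win_probability(lead)
        after = win_probability(lead - systematic_miss)
        print(f"polled lead {lead:+.1f} pts: implied win chance {before:.0%}, "
              f"or {after:.0%} if the polls were off by {systematic_miss} pts")
```

Under these assumptions, a lead of a few points reads as a comfortable probability, yet a miss of ordinary size can pull it back toward a coin flip. Reporting the margin in points keeps that uncertainty visible.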