June 11, 2014

World Cup Prediction Mathematics Explained

The World Cup is back, and everyone's got a pick for the winner. Gamblers have been predicting the outcome of sporting contests since the first foot race across the savannah, but in recent years a unique type of statistical analysis has taken over the prediction business.

By Michael Moyer

This article was published in Scientific American’s former blog network and reflects the views of the author, not necessarily those of Scientific American

The World Cup is back, and everyone’s got a pick for the winner. Gamblers have been predicting the outcome of sporting contests since the first foot race across the savannah, but in recent years a unique type of statistical analysis has taken over the prediction business. Everyone from Goldman Sachs to Bloomberg to Nate Silver’s FiveThirtyEight has an online World Cup predictor that uses numbers, not hunches, to generate precise probabilities for match outcomes. Goldman Sachs, for instance, gives host nation Brazil a 48.5 percent chance of winning it all; FiveThirtyEight puts the odds at 45 percent while Bloomberg Sports has concluded there’s just a 19.9 percent chance of a triumph for the Seleção.

Where do these numbers come from? All statistical analysis must start with data, and these soccer prediction engines draw on results from past matches. A fair bit of judgment is necessary here. Big international soccer tournaments only come around every so often, so the analysts have to choose how to weight team performance in lesser events such as international “friendlies,” where nothing of consequence is at stake. The modelers also have to decide how far back to pull data from (does Brazil’s proud soccer history matter much when its oldest player is 34?) and how to rate the performance of individual players during their time playing for club teams such as Manchester United or Real Madrid.

Wherever the data comes from, the modeler then has to incorporate it into a model. Frequently, the modeler translates the question of “who is going to win?” into the form “how many goals will team X score against team Y?” And for this, she relies on a statistical tool called a bivariate Poisson regression.

Those are three unfamiliar words. Let’s unpack them one by one. “Bivariate” means there are two interrelated variables from which we are trying to predict a single outcome: team X’s performance against team Y. “Regression” just means that we’re fitting a set of data to a model. “Poisson” is the interesting one.

Imagine that you’re standing by the side of the road and you want to know how many cars go by in a minute. First, you’d take some data. Armed with a stopwatch and a counter, you’d see that 15 cars go by in the first minute, 18 in the next and just four in the third. Do this for enough minutes and you’d begin to see a pattern build up: a Poisson distribution, named for Siméon Denis Poisson, the French mathematician who devised it in order to estimate the frequency of false convictions.
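
For readers who want the pattern in symbols, here is the standard formula for the Poisson distribution, applied to the roadside example. The notation is introduced here and is not from the original post: λ is the average number of cars per minute, N is the count in a given minute, and k is any particular count.

```latex
P(N = k) = \frac{\lambda^{k} e^{-\lambda}}{k!}, \qquad k = 0, 1, 2, \dots
```

Counts close to λ are the most likely, and the probability falls off quickly for counts far above it.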

The number of goals in a game also tends to follow a Poisson distribution. A given team may be most likely to score one or two goals, sometimes zero or three, and much less frequently four or five (or more). Modelers will map the data from a team’s previous performance onto a Poisson distribution of the number of goals it is likely to score against its opponent.
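
To make the idea concrete, here is a minimal Python sketch of how goal distributions might be turned into match probabilities. It is not the method used by Goldman Sachs, Bloomberg or FiveThirtyEight. It makes the simplifying assumption that the two teams' goal counts are independent (a full bivariate Poisson model lets them be correlated), and the expected-goals figures of 1.8 and 0.9 are hypothetical numbers chosen only for illustration.

```python
from math import exp, factorial


def poisson_pmf(k: int, rate: float) -> float:
    """Probability of exactly k goals when the expected number of goals is `rate`."""
    return rate ** k * exp(-rate) / factorial(k)


def match_probabilities(rate_x: float, rate_y: float, max_goals: int = 15):
    """Return (P(X wins), P(draw), P(Y wins)).

    Sums over every scoreline with up to `max_goals` goals per team.
    Simplifying assumption: the two goal counts are independent.
    """
    win_x = draw = win_y = 0.0
    for goals_x in range(max_goals + 1):
        for goals_y in range(max_goals + 1):
            p = poisson_pmf(goals_x, rate_x) * poisson_pmf(goals_y, rate_y)
            if goals_x > goals_y:
                win_x += p
            elif goals_x == goals_y:
                draw += p
            else:
                win_y += p
    return win_x, draw, win_y


# Hypothetical expected goals for team X and team Y (illustration only).
win_x, draw, win_y = match_probabilities(rate_x=1.8, rate_y=0.9)
print(f"X wins {win_x:.3f}, draw {draw:.3f}, Y wins {win_y:.3f}")
```

In a real model, the two rates would come out of the regression step, fitted to each team's past results rather than typed in by hand.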

And the gamblers? As of this writing the online sportsbook Betfair has Brazil as roughly a 3-to-1 favorite, which implies a 24.4 percent chance of winning. If you believe the analysts at Goldman Sachs or FiveThirtyEight, who give Brazil a nearly 50 percent chance, a betting opportunity has opened up for you. Of course, presumably all those people betting on Brazil at 3-to-1 odds have also read the Goldman Sachs and FiveThirtyEight analyses.
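
The arithmetic behind that betting opportunity can be sketched in a few lines of Python. This is an illustration only: the function names are made up for this example, the odds are treated as exactly 3-to-1 (which implies 25 percent, close to the quoted 24.4 percent), and the bookmaker's margin is ignored.

```python
def implied_probability(odds_against: float) -> float:
    """Win probability implied by fractional odds of `odds_against`-to-1."""
    return 1.0 / (odds_against + 1.0)


def expected_profit(model_probability: float, odds_against: float,
                    stake: float = 1.0) -> float:
    """Average profit on a bet of `stake`, if the model's probability is right."""
    winnings = model_probability * odds_against * stake
    losses = (1.0 - model_probability) * stake
    return winnings - losses


print(implied_probability(3.0))  # 3-to-1 odds imply 0.25

# Goldman Sachs's 48.5 percent figure against 3-to-1 odds.
print(round(expected_profit(0.485, 3.0), 2))
```

A positive expected profit only means the bet looks good if the model is right and the market is wrong, which is exactly the question at issue.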

The question becomes: What do they know that the statisticians don’t?

Image by Digo Souza on Flickr
