Public health efforts depend heavily on predicting how diseases spread across the globe, including the one caused by the 2019 novel coronavirus, which the World Health Organization has named COVID-19. During the early days of a new outbreak, when reliable data are still scarce, researchers turn to mathematical models that can predict where people who could be infected are going and how likely they are to bring the disease with them. These computational methods use known statistical equations that calculate the probability of individuals transmitting the illness.

Modern computational power allows these models to quickly incorporate multiple inputs, such as a given disease’s ability to pass from person to person and the movement patterns of potentially infected people traveling by air and land. This process sometimes involves making assumptions about unknown factors, such as an individual’s exact travel pattern. By plugging in different possible versions of each input, however, researchers can update the models as new information becomes available and compare their results to observed patterns for the illness. For example, if investigators want to study how closing a particular airport could affect a disease’s global spread, their computers can swiftly recalculate the risk of importing cases through other airports—all the humans need to do is update the network of flight routes and international travel patterns.
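As a rough illustration of the airport-closure recalculation described above, the Python sketch below treats each destination's share of outbound passengers from an outbreak city as its relative risk of importing cases, then recomputes those shares after one airport is closed. The airport codes and passenger volumes are hypothetical placeholders, and the assumption that displaced travelers rebook onto the remaining routes in proportion to their existing traffic is a simplification chosen for illustration, not how any particular research group models rerouting.

```python
"""Sketch: relative risk of importing cases through each destination airport,
recomputed after one airport is closed. All airport codes and passenger
volumes below are hypothetical placeholders, not real flight data."""


def import_risk(flows: dict[str, float]) -> dict[str, float]:
    """Share of outbound travelers from the outbreak city heading to each
    destination, used here as a simple proxy for relative import risk."""
    total = sum(flows.values())
    return {dest: volume / total for dest, volume in flows.items()}


def close_airport(flows: dict[str, float], closed: str) -> dict[str, float]:
    """Remove one destination. Its travelers are ASSUMED (hypothetically) to
    rebook onto the remaining routes in proportion to those routes' volumes."""
    remaining = {d: v for d, v in flows.items() if d != closed}
    displaced = flows.get(closed, 0.0)
    kept_total = sum(remaining.values())
    return {d: v + displaced * v / kept_total for d, v in remaining.items()}


# Hypothetical weekly passenger volumes from an outbreak city (placeholders).
weekly_passengers = {"AAA": 5000.0, "BBB": 3000.0, "CCC": 2000.0}

before = import_risk(weekly_passengers)
after = import_risk(close_airport(weekly_passengers, "AAA"))
print(before)  # {'AAA': 0.5, 'BBB': 0.3, 'CCC': 0.2}
print(after)   # {'BBB': 0.6, 'CCC': 0.4}
```

Under proportional rebooking, the remaining airports' shares simply scale up together; a model in which travelers shifted to specific nearby airports would redistribute the risk unevenly.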

But when working with incomplete data, a small error in one factor can have an outsize effect. Uncertainty about a quantity such as COVID-19’s basic reproduction number (R0), the average number of new cases caused by an infected individual, can throw off a model’s results. “If you’re wrong about this number, your estimate will be off by orders of magnitude,” says Dirk Brockmann, a physicist at the Institute for Theoretical Biology at Humboldt University of Berlin and the Robert Koch Institute in Germany. Current estimates of R0 for the novel coronavirus range from two to three, close to the R0 of two to four estimated for SARS during its 2003 outbreak but far below measles’s R0 of 12 to 18.
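A back-of-the-envelope Python sketch shows why an error in R0 compounds so quickly. It assumes the simplest possible growth picture, in which every case infects exactly R0 others per generation with no immunity, interventions or random effects, and it compares R0 values of two and three after a hypothetical ten generations of spread.

```python
"""Sketch: why a modest error in R0 compounds into a large error in case
counts. Assumes pure geometric growth (each case causes exactly R0 new cases
per generation); the number of generations is a hypothetical choice."""


def cases_in_generation(r0: float, generations: int, seed_cases: int = 1) -> float:
    """New cases in the given generation under pure geometric growth."""
    return seed_cases * r0 ** generations


generations = 10  # hypothetical: roughly ten serial intervals into an outbreak
low = cases_in_generation(2.0, generations)
high = cases_in_generation(3.0, generations)
print(low, high, high / low)  # 1024.0 59049.0 57.6650390625
```

The two projections differ by a factor of (3/2)^10, about 58, or nearly two orders of magnitude, and the gap widens by another factor of 1.5 with each further generation.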

Because each unknown factor introduces more uncertainty to a model, Brockmann and some other researchers favor focusing on a more limited model that relies on just one main factor. His group has concentrated on using international flight data—without figuring in person-to-person transmission—to predict which airports represent the highest-risk gateways for the coronavirus to spread worldwide. “This risk predicts the expected sequence of countries you would find cases in,” Brockmann explains. “The way it unfolded is very much in line with what the mobility model predicted.”
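One way to turn flight data alone into an expected arrival order is the "effective distance" that Brockmann and Dirk Helbing described in a 2013 Science paper: each route from airport m to airport n is given a length of 1 - ln P_mn, where P_mn is the fraction of passengers leaving m who fly to n, and places are ranked by their shortest effective distance from the outbreak's origin. The Python sketch below is a minimal illustration under that assumption. Whether Brockmann's COVID-19 analysis uses exactly this formulation is not stated here, and the network and passenger volumes are hypothetical placeholders.

```python
"""Sketch: ranking where an outbreak is likely to appear first using flight
data alone, via the effective-distance idea (Brockmann and Helbing, 2013).
Whether a given COVID-19 model uses exactly this form is an assumption.
The network and volumes below are hypothetical placeholders."""

import heapq
import math


def effective_distances(traffic: dict[str, dict[str, float]],
                        source: str) -> dict[str, float]:
    """Shortest-path effective distance from `source` to every reachable node.

    Each link m -> n gets length 1 - ln(P_mn), where P_mn is the fraction of
    all passengers leaving m who fly to n, so dominant routes count as short.
    """
    edge_len: dict[str, dict[str, float]] = {}
    for m, dests in traffic.items():
        out_total = sum(dests.values())
        edge_len[m] = {n: 1.0 - math.log(v / out_total) for n, v in dests.items()}

    # Dijkstra's algorithm over the effective-distance edge lengths.
    dist = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        d, m = heapq.heappop(queue)
        if d > dist.get(m, math.inf):
            continue  # stale queue entry; a shorter path was already found
        for n, length in edge_len.get(m, {}).items():
            cand = d + length
            if cand < dist.get(n, math.inf):
                dist[n] = cand
                heapq.heappush(queue, (cand, n))
    return dist


# Hypothetical passenger volumes between hypothetical airports.
traffic = {
    "Origin": {"HubA": 8000.0, "CityB": 2000.0},
    "HubA": {"CityC": 6000.0, "CityB": 1000.0, "Origin": 3000.0},
    "CityB": {"CityC": 500.0, "Origin": 500.0},
}

ranking = sorted(
    (d, node)
    for node, d in effective_distances(traffic, "Origin").items()
    if node != "Origin"
)
for d, node in ranking:
    print(f"{node}: {d:.2f}")
```

In this toy network, CityB ranks ahead of CityC even though CityC receives more than twice as many passengers overall, because CityB has a direct, if smaller, link to the origin, while most traffic to CityC must first pass through HubA.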

Flight data can come from official aviation databases, making them fairly reliable, but they do not capture people’s movements on the ground. For that information, researchers use different sources. Alessandro Vespignani, a physicist and director of the Laboratory for the Modeling of Biological and Socio-technical Systems at Northeastern University, leads a team that is simulating the novel coronavirus’s spread using official air-travel data and predicted commuting patterns among census populations. Despite not accounting for person-to-person transmission with an R0, such travel-focused models seem to have consistently and accurately predicted which countries face the highest risk of getting new cases of COVID-19. “If different models point in the same direction,” Vespignani says, “you are more confident there is some level of realism in the results.”

Another recent effort to estimate how the coronavirus is spreading, both inside China and internationally, drew on mobility data for flights and ground travel around the Lunar New Year holiday, which fell on January 25 this year, just as the outbreak was picking up steam. In a paper published in the Lancet on January 31, Hong Kong–based researchers estimated this year’s holiday travel patterns from the 2019 Lunar New Year travels of millions of people who used the WeChat app and other services owned by Chinese tech giant Tencent. Unlike the purely travel-focused models, however, this study combined official flight data and Tencent’s individual mobility data with estimates of person-to-person transmission. Its results suggest COVID-19 had already taken root in many major Chinese cities by January 25 and that those cities’ international airports helped spread the virus abroad.

In addition to combining known and uncertain factors about travel and transmission, models must reckon with the impact of public health interventions. These range from the adoption of face masks and school closures to sweeping governmental measures, such as China’s decision to quarantine entire cities, as well as international travel bans and restrictions. The Hong Kong researchers estimated that China’s quarantine of Wuhan, which started on January 23, made only a limited difference because the disease had probably already spread to other cities in the nation. Still, the authors recommended that “draconian measures that limit population mobility should be seriously and immediately considered in affected areas.” Public health experts seem uncertain about the effectiveness of such travel restrictions within and between cities, and studies of past outbreaks suggest that even harsh constraints on movement have only limited effects in delaying the international spread of diseases.
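The limited payoff of travel restrictions follows from simple arithmetic under exponential growth. The Python sketch below assumes, hypothetically, that an outbreak doubles in size every T days and that the number of cases exported abroad scales with the fraction f of travel still permitted. Solving f * 2^(t'/T) = 2^(t/T) shows that exports reach any given level later by T * log2(1/f) days. The doubling times and travel cuts are illustrative inputs, not estimates for COVID-19.

```python
"""Sketch: why cutting travel tends to delay, rather than prevent,
international spread. Assumes exponential growth with a fixed doubling time
and exported cases proportional to the fraction of travel still allowed.
Doubling times and travel cuts below are hypothetical inputs."""

import math


def delay_days(doubling_time_days: float, travel_remaining: float) -> float:
    """Extra days until exports reach the level they would have reached
    without restrictions: solve f * 2**(t'/T) = 2**(t/T) for t' - t."""
    return doubling_time_days * math.log2(1.0 / travel_remaining)


for doubling in (4.0, 7.0):  # hypothetical doubling times, in days
    for cut in (0.5, 0.9, 0.99):  # hypothetical fractions of travel removed
        print(f"doubling {doubling:.0f} d, travel cut {cut:.0%}: "
              f"delay ~ {delay_days(doubling, 1.0 - cut):.1f} days")
```

Because the delay grows only with the logarithm of 1/f, each further tenfold cut in travel buys the same fixed extra delay of about 3.3 doubling times: with a four-day doubling time, a 90 percent cut postpones exports by roughly 13 days and a 99 percent cut by roughly 27.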

Some researchers model the effects of changes in public behavior and government actions before they happen. Lauren Gardner, a civil engineer and co-director of the Center for Systems Science and Engineering at Johns Hopkins University, has been refining a model designed to help U.S. government officials decide which airports are likely to see new cases of the novel coronavirus, and should therefore screen arriving passengers with temperature checks and health questions, and which are unlikely to encounter any. This information could allow local governments to distribute resources where they are likely to be most needed. “There has been lots of interest from various regional public health offices in using these results to prioritize surveillance efforts,” Gardner says.

These teams are just a few of those working to predict the future spread of COVID-19. Physician Elizabeth Halloran, director of the Center for Inference and Dynamics of Infectious Diseases, headquartered at the Fred Hutchinson Cancer Research Center in Seattle, says that during the 1980s, she could count on her fingers the number of research groups doing such modeling work. Now there are hundreds. “We were on a phone call organized by the [U.S. Centers for Disease Control and Prevention] the other day, and there were 80 call-ins [from research groups],” she says. “There are a lot of excellent groups, and we operate together as a big network.” Still, nobody has all of the data needed to predict the outbreak’s future course with certainty.

But despite the variety of models, many ultimately agree on key points. For instance, the number of confirmed cases rose from fewer than 25,000 on February 4 to more than 28,000 on February 5. Yet even then, Vespignani points out, various models agreed that the real count was much higher. “I believe every modeling approach [was] pointing to something that [was] over 100,000 [current] cases in the best-case scenario,” he says. As this article goes to press, the number of confirmed cases is greater than 45,000.