Health care officials and aid workers attempting to trace the progression of the Ebola virus disease outbreak that has claimed more than 2,800 lives so far have come to rely heavily on a handful of disease-monitoring Web sites that act as pivotal hubs for processing information. Different sites serve slightly different functions, but for the most part they exist to manage the data glut created by countless news articles, social media feeds, medical reports and e-mailed on-scene accounts.
These sites use a combination of artificial intelligence software and human expertise to track, report and map information related to public health crises, often faster than government ministries and international watchdogs can respond. One site, HealthMap, used this hybrid model to spot signs of the emerging Ebola problem days before the World Health Organization (WHO) issued its first report.
HealthMap’s automated text-processing algorithm has been tracking the Ebola outbreak since March 14, when a Guinean news site reported “a strange fever” in the country’s Macenta prefecture “marked by anal and nasal bleeding.” Within a few days HealthMap had picked up on a report by the Standard Digital news site indicating the “mystery hemorrhagic fever” had already claimed a few dozen victims. The doctor in charge of Guinea’s health ministry stated Ebola was being considered as the culprit, although these would have been the first recorded Ebola cases in that country. Authorities had soon narrowed down the cause to either Ebola or a related disease known as Marburg hemorrhagic fever. By March 22 a Nigerian news site caught HealthMap’s attention with an article that put the words “Ebola” and “outbreak” together.
Plotting help
The Children’s Hospital Informatics Program launched HealthMap in 2006 as a way of using the growing number of digital resources (the Internet, RSS feeds and e-mail lists, to name a few) to plot information about emerging diseases worldwide on a Google Map. HealthMap flagged the current Ebola outbreak for a number of reasons, in particular its spread across borders from Guinea to Liberia, Sierra Leone and several other neighboring countries.
HealthMap automates data acquisition, filtering and characterization of information so that it flows from the source through to the Web page without any human intervention. At the same time, the site’s experts in infectious disease and public health review this content to correct and refine the automated classifications, says Clark Freifeld, a research software developer at the Children’s Hospital Informatics Program. The analysts ensure, for example, that running statistics of infections and deaths the site publishes are as accurate as possible, something software has difficulty with because different information sources report different numbers across different timeframes. “We have the technical framework in place to make [posting information] easier,” he adds, but “our approach has always been a human-in-the-loop model.”
Although this is not the first major health crisis that HealthMap has covered—the 2009 H1N1 flu pandemic was one of its largest efforts—the Ebola outbreak has pushed the site in new directions in an effort to provide information for a variety of institutions, including the WHO, the United Nations and the U.S. Centers for Disease Control and Prevention. HealthMap can now process data from tens of thousands of Web pages hourly in 15 different languages. And on Monday HealthMap debuted a projection of how the Ebola outbreak might play out over the next few months, the first time the site has attempted to forecast the spread of a disease.
Forward looking
HealthMap’s modeling tool for short-term outbreak projections can filter data by country and take into account different control scenarios. The tool itself, which projects a worst-case scenario of 14,176 Ebola cases in Guinea, Liberia and Sierra Leone by October 26, is currently built on a model developed by a Toronto-based team of medical and public health researchers, although HealthMap may try other methods down the road, says John Brownstein, HealthMap co-founder and an associate professor of pediatrics at Harvard Medical School.
The Toronto model is a mathematical approach to tracking the expansion and contraction of outbreaks—known as Incidence Decay and Exponential Adjustment (IDEA)—that considers factors that might slow epidemic growth. Such factors might include more adequate isolation of Ebola victims or the arrival of international assistance. IDEA is well suited for providing rapid assessments of outbreak growth and public health interventions, according to the researchers. “It’s more of a time series approach where we are applying modeling to derive estimates of future cases,” Brownstein says.
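For readers who want to see the arithmetic, here is a minimal sketch of an IDEA-style projection. It assumes the form published by the model's authors, I(t) = (R0 / (1 + d)^t)^t, where t counts disease generations, R0 is the basic reproduction number and d is a discount factor standing in for control measures. The formula does not appear in this article, the function names are hypothetical, and the parameter values are illustrative rather than fitted to Ebola data.

```python
def idea_incidence(r0: float, d: float, t: int) -> float:
    """New cases in generation t under the assumed IDEA form.

    r0: basic reproduction number (illustrative value, not fitted)
    d:  discount factor; d = 0 means no slowing of the epidemic
    t:  generation index, with t = 0 as the index case
    """
    return (r0 / (1.0 + d) ** t) ** t


def project_cumulative(r0: float, d: float, generations: int) -> list[float]:
    """Running total of cases for generations 0..generations."""
    totals = []
    running = 0.0
    for t in range(generations + 1):
        running += idea_incidence(r0, d, t)
        totals.append(running)
    return totals


if __name__ == "__main__":
    # Compare an uncontrolled scenario with one where growth is discounted.
    for label, d in (("no control", 0.0), ("some control", 0.05)):
        final_total = project_cumulative(2.0, d, 10)[-1]
        print(f"{label}: about {final_total:.0f} cumulative cases after 10 generations")
```

With d set to zero the curve is plain exponential growth. Any positive d bends it downward, and the effect compounds with each generation, which is how a model of this kind can represent isolation or outside assistance taking hold.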
HealthMap’s machine-learning algorithms for tracking an outbreak’s progression assign retrieved data to one of five categories: breaking news, warnings about possible outbreak conditions, references to past outbreaks, research and other contextual information, and events unrelated to any outbreak. These filters are “a key component of the system, and especially useful when we see large volumes of data around highly visible outbreaks,” Freifeld says.
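The article does not say which learning method HealthMap uses, so the sketch below substitutes a small naive Bayes text classifier purely to illustrate sorting reports into five bins. The category labels are shorthand for the five categories described above, and the class name and training sentences are invented for the example.

```python
import math
from collections import Counter, defaultdict

# Hypothetical short labels for the five categories named in the article.
CATEGORIES = ("breaking", "warning", "old_news", "context", "not_disease")


class NaiveBayesSorter:
    """Toy multinomial naive Bayes with add-one smoothing (a stand-in technique)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.doc_counts = Counter()              # label -> number of training texts
        self.vocab = set()

    def train(self, text: str, label: str) -> None:
        words = text.lower().split()
        self.word_counts[label].update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def classify(self, text: str) -> str:
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            # Log prior plus smoothed log likelihood of each word.
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in words:
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


sorter = NaiveBayesSorter()
sorter.train("new ebola cases confirmed today", "breaking")
sorter.train("officials warn flooding may spread cholera", "warning")
sorter.train("remembering the 1976 ebola outbreak", "old_news")
sorter.train("study examines ebola vaccine research", "context")
sorter.train("football team wins match", "not_disease")
```

A real system would train on far more text and, as the article notes, would still have analysts correcting the automated labels.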
Disease wiki
The site excels at natural-language processing, says Larry Madoff, founder and editor of the Program for Monitoring Emerging Diseases (ProMED), a global electronic mailing list that receives and summarizes reports on disease outbreaks and that was one of HealthMap’s first sources of data. “They [are] able to suck in our reports and with reasonable accuracy put them on a map,” he says, adding that HealthMap helped automate what ProMED had been doing since 1994.
HealthMap’s algorithm gauges the significance of information based primarily on how often the same material appears in multiple sources, although it does not rate the information based on the source itself, whether it is a New York Times article or a bulletin from a local health ministry. The site does not “take sides” with regard to its sources’ credibility, Freifeld says. Instead, he adds, it follows the logic that significant events tend to see multiple reports from a range of sources.
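That logic can be sketched in a few lines. The example assumes each report has already been reduced to a hypothetical event key (here a disease and a place) plus the name of its source. How HealthMap actually decides that two reports describe the same event is not covered in this article.

```python
from collections import defaultdict


def corroboration_scores(reports):
    """Count distinct sources per event, giving every source equal weight.

    reports: iterable of (event_key, source_name) pairs, where event_key is
    any hashable stand-in for "the same material" (an assumption here).
    """
    sources_by_event = defaultdict(set)
    for event_key, source_name in reports:
        # A set ignores repeat articles from the same outlet.
        sources_by_event[event_key].add(source_name)
    return {event: len(sources) for event, sources in sources_by_event.items()}


# Invented sample data for illustration only.
sample_reports = [
    (("ebola", "Guinea"), "national newspaper"),
    (("ebola", "Guinea"), "health ministry bulletin"),
    (("ebola", "Guinea"), "national newspaper"),
    (("cholera", "Haiti"), "local radio"),
]
scores = corroboration_scores(sample_reports)
```

Because the score counts outlets rather than ranking them, a ministry bulletin and a major newspaper each add exactly one to an event's total.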
When it became clear a few months ago that the Ebola outbreak was growing worse and would not be contained anytime soon, the HealthMap team developed a timeline interface to better organize and visualize its reports. “We’re not sure how many people would have predicted it would be this bad,” Brownstein says. The 2014 Ebola Outbreak timeline includes more than 130 entries and has had over 1 million page views since going live in mid-July. Brownstein described HealthMap as “almost like the Wikipedia of emerging infectious diseases” in a March 2010 article. HealthMap works because “it’s a very human-driven but Internet-based system,” Madoff says, adding, “It’s part social network, part news service.”
HealthMap is not the only infectious-disease surveillance site. In addition to newer sites such as Google Flu Trends, there are more established efforts such as ProMED and the Global Public Health Intelligence Network (GPHIN), first developed by Health Canada in collaboration with WHO in 1997. GPHIN software retrieves relevant articles every 15 minutes from news-feed aggregators Al Bawaba and Factiva based on specific search queries that the site updates regularly, according to WebMD’s Medscape site. In addition to software-selected articles, the GPHIN database is populated by submissions from human analysts who comb open-access Web sites in search of relevant public health information.
These and other disease-monitoring sites provide a crucial early-warning system that puts information in front of the public as soon as it becomes available, Madoff says. By the time AIDS awareness began in the 1980s, he notes, the disease’s origins stretched back 20 or 30 years before it reached the U.S. and came to the world’s attention. “Hard to believe this could happen now,” he adds.