The Cincinnati team, which includes a Semantic Web consultant, began by downloading to a workstation the databases that held relevant information but came from different origins and used incompatible formats. These databases included Gene Ontology (containing data on genes and gene products), MeSH (focused on diseases and symptoms), Entrez Gene (gene-centered information) and OMIM (human genes and genetic disorders). The investigators translated the formats into RDF and stored the information in a Semantic Web database. They then used Protégé and Jena, freely available Semantic Web software from Stanford University and HP Labs, respectively, to integrate the knowledge.
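To make the translation step concrete, here is a minimal sketch of how records from two differently shaped sources might be turned into subject-predicate-object triples and queried together. It is an illustration only, not the Cincinnati team's code: the records, the `ex:` predicate names, and every function name are hypothetical, and a plain Python set stands in for a real RDF store such as Jena.

```python
# Hypothetical input: one source yields dictionaries, the other yields rows.
# All identifiers and field names below are made up for illustration.
gene_records = [{"gene_id": "GENE_A", "symbol": "GENE_A", "chromosome": "1"}]
disease_rows = [("DISEASE_X", "GENE_A")]


def gene_record_to_triples(record):
    """Translate a gene-centered record into (subject, predicate, object) triples."""
    subject = f"gene:{record['gene_id']}"
    return {
        (subject, "rdf:type", "ex:Gene"),
        (subject, "ex:symbol", record["symbol"]),
        (subject, "ex:chromosome", record["chromosome"]),
    }


def disease_row_to_triples(row):
    """Translate a (disease, gene) row into a single linking triple."""
    disease, gene = row
    return {(f"disease:{disease}", "ex:associatedGene", f"gene:{gene}")}


def build_store(records, rows):
    """Merge triples from both sources into one set, which acts as the store."""
    store = set()
    for record in records:
        store |= gene_record_to_triples(record)
    for row in rows:
        store |= disease_row_to_triples(row)
    return store


def match(store, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def chromosomes_for_disease(store, disease):
    """Follow disease -> gene -> chromosome links across the merged data."""
    found = set()
    for _, _, gene in match(store, s=f"disease:{disease}", p="ex:associatedGene"):
        for _, _, chromosome in match(store, s=gene, p="ex:chromosome"):
            found.add(chromosome)
    return found


store = build_store(gene_records, disease_rows)
print(chromosomes_for_disease(store, "DISEASE_X"))
```

Once both sources share one triple format and one naming scheme for genes, a question that spans them (which chromosomes carry genes linked to a disease) becomes a simple two-step lookup.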
The researchers then prioritized the hundreds of genes that might be involved with cardiac function by applying a ranking algorithm somewhat similar to the one Google uses to rank Web pages of search results. They found candidate genes that could potentially play a causative role in dilated cardiomyopathy, a weakening of the heart’s pumping ability. The team instructed the software to evaluate the ranking information, as well as the genes’ relations to the characteristics and symptoms of the condition and similar diseases. The software identified four genes with a strong connection to a chromosomal region implicated in dilated cardiomyopathy. The researchers are now investigating the effects of these genes’ mutations as possible targets for new therapeutic treatments. They are also applying the semantic system to other cardiovascular diseases and expect to realize the same dramatic improvement in efficiency. The system could also be readily applied to other disease families.
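The article does not spell out the ranking algorithm beyond its resemblance to Google's. As a rough sketch of that family of methods, the code below runs a basic PageRank-style power iteration over a small directed graph. The node names, the edges, the damping factor of 0.85 and the iteration count are all assumptions chosen for illustration, not details of the researchers' system.

```python
def rank_nodes(edges, damping=0.85, iterations=50):
    """Score nodes of a directed graph with a simple PageRank-style iteration.

    edges: list of (source, target) pairs, e.g. links between genes and
    related annotations. Returns a dict mapping each node to its score.
    """
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for source, target in edges:
        out_links[source].append(target)

    # Start with equal scores that sum to 1.
    score = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        # Every node receives a small baseline share each round.
        new_score = {node: (1.0 - damping) / len(nodes) for node in nodes}
        for node in nodes:
            # A node with no outgoing links spreads its score over all nodes.
            targets = out_links[node] or nodes
            share = damping * score[node] / len(targets)
            for target in targets:
                new_score[target] += share
        score = new_score
    return score


# Hypothetical graph: g1 and g2 both point to g3, and g3 points back to g1.
example_edges = [("g1", "g3"), ("g2", "g3"), ("g3", "g1")]
scores = rank_nodes(example_edges)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

In this toy graph the node that attracts the most incoming links ends up ranked first, which is the intuition behind prioritizing well-connected candidate genes.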
Similarly, senior scientists at Eli Lilly are applying Semantic Web technologies to devise a complete picture of the most likely drug targets for a given disease. Semantic tools are allowing them to compile numerous incompatible biological descriptions into one unified file, greatly expediting the search for the next breakthrough drug. Pfizer is using Semantic Web technologies to mesh data sets about protein-protein interaction to reveal obscure correlations that could help identify promising medications. Researchers there are convinced that these technologies will increase the chance for serendipitous discoveries, accelerate the speed of delivering new drugs to market and advance the industry as a whole toward personalized medicine. “This is where the Semantic Web could help us,” says Giles Day, head of Pfizer’s Research Technology Center informatics group in Cambridge, Mass.
In each of these cases, the Semantic Web enhances drug discovery by bringing together vast and varied data from different places. New consumer services are being built in similar fashion. For example, the British firm Garlik uses Semantic Web software to compare previously incompatible databases to alert subscribers that they might be the target of an identity thief. Garlik culls disparate personal identity information from across the Web, integrates it using common vocabularies and rules, and presents subscribers with a clear (and sometimes surprising) view of their online identity.
Case Study 2: Health Care
The health care industry confronts an equally dense thicket of information. One initiative that has been deployed since 2004 was developed at the University of Texas Health Science Center at Houston to better detect, analyze and respond to emerging public health problems. The system, called SAPPHIRE (for situational awareness and preparedness for public health incidences using reasoning engines), integrates a wide range of data from local health care providers, hospitals, environmental protection agencies and scientific literature. It allows health officials to assess the information through different lenses, such as tracking the spread of influenza or the treatment of HIV cases.
Every 10 minutes in the greater Houston area, SAPPHIRE receives reports on emergency room cases, descriptions of patients’ self-reported symptoms, updated electronic health records, and clinicians’ notes from eight hospitals that account for more than 30 percent of the region’s emergency room visits. Semantic technologies integrate this information into a single view of current health conditions across the area. A key feature is an ontology that classifies unexplained illnesses that present flulike symptoms (fevers, coughs and sore throats) as potential influenza cases and automatically reports them to the Centers for Disease Control and Prevention. By automatically generating reports, SAPPHIRE has relieved nine nurses from doing such work manually, so they are available for active nursing. And it delivers reports two to three days faster than before. The CDC is now helping local health departments nationwide to implement similar systems, replacing tedious, inconsistent and decades-old paper schemes.
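The classification rule described above can be approximated in a few lines. The sketch below is a simplified stand-in for SAPPHIRE's ontology, not a description of it: the case fields, the function names and the threshold of two matching symptoms are all hypothetical choices made for the example.

```python
# Symptoms the article lists as flulike; the exact strings are an assumption.
FLULIKE_SYMPTOMS = {"fever", "cough", "sore throat"}


def is_potential_influenza(case, min_symptoms=2):
    """Flag an unexplained case that shows enough flulike symptoms.

    case: dict with a 'symptoms' list and an optional 'diagnosis'.
    min_symptoms: hypothetical threshold, not taken from the article.
    """
    if case.get("diagnosis") is not None:
        return False  # Already explained, so not a candidate.
    symptoms = {s.strip().lower() for s in case.get("symptoms", [])}
    return len(symptoms & FLULIKE_SYMPTOMS) >= min_symptoms


def cases_to_report(cases):
    """Return the IDs of cases that would be flagged for reporting."""
    return [case["case_id"] for case in cases if is_potential_influenza(case)]


# Hypothetical emergency room records.
example_cases = [
    {"case_id": "c1", "symptoms": ["Fever", "Cough"], "diagnosis": None},
    {"case_id": "c2", "symptoms": ["fever", "sore throat"], "diagnosis": "strep throat"},
    {"case_id": "c3", "symptoms": ["headache"], "diagnosis": None},
]
print(cases_to_report(example_cases))
```

A real ontology expresses such rules declaratively, so that a reasoning engine can apply them to new data sources without rewriting code; the function here only mimics the outcome for one rule.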
The nimbleness of Semantic Web technologies allows SAPPHIRE to operate effectively in other contexts as well. When Hurricane Katrina evacuees poured into Houston’s shelters, public health officials quickly became concerned about the possible spread of disease. Within eight hours after the shelters were opened, personnel at the University of Texas Health Science Center configured SAPPHIRE to help. They armed public health officials with small handheld computers loaded with health questionnaires. The responses from evacuees were then uploaded to the system, which integrated them with data from the shelters’ emergency clinics and surveillance reports from Houston Department of Health and Human Services epidemiologists in the field. SAPPHIRE succeeded in identifying gastrointestinal, respiratory and conjunctivitis outbreaks in survivors of the disaster much sooner than would have been possible before.