Consumers are also beginning to use the data language and ontologies directly. One example is the Friend of a Friend (FOAF) project, a decentralized social-networking system that is growing in a purely grassroots way. Enthusiasts have created a Semantic Web vocabulary for describing people’s names, ages, locations, jobs and relationships to one another and for finding common interests among them. FOAF users can post information and imagery in any format they like and still seamlessly connect it all, which MySpace and Facebook cannot do because their fields are incompatible and not open to translation. More than one million individuals have already interlinked their FOAF files, including users of LiveJournal and TypePad, two popular Weblog services.
As these examples show, people are moving toward building a Semantic Web where relations can be established among any online pieces of information, whether an item is a document, photograph, tag, financial transaction, experiment result or abstract concept. The data language, called Resource Description Framework (RDF), names each item, and the relations among the items, in a way that allows computers and software to automatically interchange the information. Additional power comes from ontologies and other technologies that create, query, classify and reason about those relations.
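The idea of naming items and the relations among them can be made concrete with a small sketch. The Python below is an illustration only, not an RDF library: it models each statement as a (subject, predicate, object) triple, which is the shape RDF statements take. The `ex:` names, the sample data and the `match` helper are all invented for this example.

```python
# A toy triple store. Every name prefixed "ex:" is hypothetical.
triples = {
    ("ex:photo42", "ex:depicts", "ex:alice"),
    ("ex:photo42", "ex:takenIn", "ex:Boston"),
    ("ex:alice", "ex:authorOf", "ex:report7"),
    ("ex:report7", "ex:describes", "ex:experiment3"),
}


def match(store, subject=None, predicate=None, obj=None):
    """Return the sorted triples that fit the pattern; None matches anything."""
    return sorted(
        t for t in store
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


# Everything stated about the photograph.
about_photo = match(triples, subject="ex:photo42")

# Every statement whose relation points at Alice.
linked_to_alice = match(triples, obj="ex:alice")
```

Because a photograph, a person, a report and an experiment all live in the same kind of statement, one query function can follow relations across all of them.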
The Semantic Web thus permits workers in different organizations to use their own data labels instead of trying to agree industry-wide on one rigid set; it understands that term “X” in database 1 is the same as term “Y” in database 2. What is more, if any term in database 1 changes, the other databases and the data-integration process itself will still understand the new information and update themselves automatically. Finally, the Semantic Web enables the deployment of “reasoners”—software programs that can discover relations among data sources.
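A rough sketch of both ideas, term matching and reasoning, follows. It does not show how any particular Semantic Web product works. The two databases, the field names `surname` and `family_name`, the `same_as` table and the `part_of` relation are all assumptions made up for illustration.

```python
# Two hypothetical databases that label the same kind of fact differently.
db1 = [("patient1", "surname", "Ng")]
db2 = [("patient2", "family_name", "Ruiz")]

# One declared equivalence stands in for an industry-wide naming agreement.
same_as = {"family_name": "surname"}


def integrate(*databases):
    """Merge records, rewriting each label to its declared equivalent."""
    return {
        (s, same_as.get(p, p), o)
        for db in databases
        for (s, p, o) in db
    }


def infer_transitive(facts, predicate):
    """A minimal 'reasoner': add the relations implied by transitivity."""
    inferred = set(facts)
    changed = True
    while changed:
        changed = False
        snapshot = list(inferred)  # iterate over a copy while adding
        for (a, p1, b) in snapshot:
            for (c, p2, d) in snapshot:
                if p1 == p2 == predicate and b == c:
                    new_fact = (a, predicate, d)
                    if new_fact not in inferred:
                        inferred.add(new_fact)
                        changed = True
    return inferred


merged = integrate(db1, db2)
surnames = sorted(o for (_, p, o) in merged if p == "surname")

facts = {
    ("valve", "part_of", "heart"),
    ("heart", "part_of", "circulatory_system"),
}
closure = infer_transitive(facts, "part_of")
```

In this sketch, if database 2 later renames its field, only the single entry in `same_as` has to change; the records in both databases stay as their owners wrote them. The reasoner, for its part, ends up holding a fact that neither input stated outright: that the valve is part of the circulatory system.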
Just as the HTML and XML languages have made the original Web robust, the RDF language and the various ontologies based on it are maturing, and vendors are building applications based on them. IBM, Hewlett-Packard and Nokia are promoting open-source Semantic Web frameworks—common tools for crafting polished programs. Oracle’s flagship commercial database, 10g, used by thousands of corporations worldwide, already supports RDF, and the upgrade, 11g, adds further Semantic Web technology. The latest versions of Adobe’s popular graphics programs such as Photoshop use the same technologies to manage photographs and illustrations. Smaller vendors—among them Aduna Software, Altova, @semantics, Talis, OpenLink Software, TopQuadrant and Software AG—offer Semantic Web database programs and ontology editors that are akin to the HTML browsers and editors that facilitated the Web’s vibrant growth. Semantic Web sites can now be built with virtually all of today’s major computer programming languages, including Java, Perl and C++.
We are still finding our way toward the grand vision of agents automating the mundane tasks of our daily lives. But some of the most advanced progress is taking place in the life sciences and health care fields. Researchers in these disciplines face tremendous data-integration challenges at almost every stage of their work. Case studies of real systems built by these pioneers show how powerful the Semantic Web can be.
Case Study 1: Drug Discovery
The traditional model for medicinal drugs is that one size fits all. Have high blood pressure? Take atenolol. Have anxiety? Take Valium. But because each person has a unique set of genes and lives in a particular physical and emotional environment, certain individuals will respond better than others. Today, however, a greater understanding of biology and drug activity is beginning to be combined with tools that could predict which drugs—and what doses—will work for a given individual. Such predictions should make custom-tailored, or personalized, medical treatments increasingly possible.
The challenge, of course, is to somehow meld a bewildering array of data sets: all sorts of historic and current medical records about each person and all sorts of scientific reports on a number of drugs, drug tests, potential side effects and outcomes for other patients. Traditional database tools cannot handle the complexity, and manual attempts to combine the databases would be prohibitively expensive. Just maintaining the data is difficult: each time new scientific knowledge is incorporated into one data source, others linked to it must be reintegrated, one by one.
A research team at Cincinnati Children’s Hospital Medical Center is leveraging semantic capabilities to find the underlying genetic causes of cardiovascular diseases. Traditionally, researchers would search for genes that behave differently in normal and diseased tissues, assuming that these genes could somehow be involved in causing the pathology. This exercise could yield tens or hundreds of suspect genes. Researchers would then have to pore over four or five databases for each one, trying to discern which genes (or the proteins they encode) have features most likely to affect the biology of the disorder—a painstaking task. In the end, investigators often cannot afford the hours, and the work falters.