The Internet grew out of an idea to connect disparate sources of data, delivering to researchers around the globe unprecedented access to information via their computer screens. As e-Science evolves alongside Web 2.0, however, some are pushing for a fundamental change in the way the Internet catalogues and organizes data, to make it more readily available to the growing number of interdisciplinary and highly specialized researchers who spend their working hours almost entirely online and who tend to collaborate online. Although this is not a new argument (the idea of a more intuitive "Semantic Web" has been kicked around for years), it has gotten a fresh set of legs thanks to the recent funding of a software development tool kit expected to better connect researchers with the information they seek.
The National Science Foundation (NSF) awarded a team of researchers at Rensselaer Polytechnic Institute in Troy, N.Y., $1.1 million in October to create a software programming tool kit by mid-2010 that scientists and other researchers will be able to use to make data from their work available to a larger number of their peers as well as laypeople, including educators and policymakers. The money is being provided as part of the American Recovery and Reinvestment Act of 2009.
Newer generations of researchers not schooled in more traditional, library-based (pre-Internet) research methods are used to doing keyword searches on the Internet to discover information. "But if you come from outside a given field, you don't necessarily know what those keywords are," says Alyssa Goodman, a Harvard University astronomy professor. A Semantic Web setup would enable researchers to craft their queries in more natural language. Goodman adds, however, that a fully semantic Web that can read, comprehend and categorize information beyond keywords requires a level of artificial intelligence that is currently not available, something Rensselaer's researchers are trying to address with this new tool kit.
"Earth and space science research today is moving online," says Tom Narock, a faculty research assistant at the University of Maryland, Baltimore County, and at NASA's Goddard Earth Sciences and Technology Center. For his research on solar physics, Narock often searches for measurements taken by spacecraft, data that is typically stored and managed by multiple research institutions. "The problem is there's a lot of heterogeneity among the different data sets," he says. If he needs to study images of the sun over a specific time period, for example, Narock first has to find out which spacecraft take such images, whether they were in position to take the photos he needs, and whether they were operational during that period. Although many research institutes espouse the idea of open access to their work, finding the right information takes quite a bit of trial and error, he adds.
This is in part because different organizations often store their data using one or more of a variety of data formats. "There's also a deeper semantic issue than what do the columns and rows actually represent in different databases," Narock says. As a result, sifting through different data sources in search of related information can be a very tedious task, where a researcher needs to go to individual databases and inspect files, sometimes even calling fellow researchers for clarification.
Semantic Web technology will be at the heart of the new software tool kit, says Peter Fox, a Rensselaer physics professor and a co-chair of the school's Tetherless World Constellation research team spearheading the project. (At Rensselaer, "constellations" are multidisciplinary teams of senior faculty, junior faculty, graduate students and undergraduates.) "With the new tool kit, the idea is for us to get out and train communities and create a shared resource," Fox says. "This is a tool for e-Science," he adds, using a term that essentially means open collaboration among different scientific disciplines across interconnected networks.
Rather than offering researchers a simple keyword search across a single database that returns information in pieces, the semantic approach proposes to create a more intelligent Internet infrastructure that can assign meaning to the concepts being searched and even to some degree have an understanding of the researcher's intent. Using ontologies, which are formal representations of concepts within a particular discipline and of the relationships among these concepts, searches could understand different nomenclatures that express the same ideas, providing links to related Web sites, nonprofit organizations, upcoming bills before Congress, and even multimedia podcasts, digital images and video files. "The semantic Web is the way of coming up with a shared expression for a common meaning," Fox says.
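To make the idea of an ontology concrete, here is a minimal sketch in Python. It is not the Rensselaer tool kit, and every term, concept name and record identifier in it is a hypothetical placeholder chosen for illustration. It assumes only that an ontology maps different nomenclatures onto shared concepts and records how concepts relate to broader ones, so a single search can find records labeled in different ways.

```python
# Hypothetical mini-ontology: all terms, concepts and record IDs are
# illustrative placeholders, not taken from any real archive.

# Different nomenclatures that express the same concept.
SYNONYMS = {
    "solar flare": "SolarFlare",
    "flare": "SolarFlare",
    "coronal mass ejection": "CoronalMassEjection",
    "cme": "CoronalMassEjection",
}

# Relationships among concepts: each concept points to a broader one.
BROADER = {
    "SolarFlare": "SolarEvent",
    "CoronalMassEjection": "SolarEvent",
}

# Records from two imaginary archives, each labeled in its own vocabulary.
RECORDS = [
    ("archive_a/img_001", "flare"),
    ("archive_b/obs_17", "solar flare"),
    ("archive_b/obs_42", "CME"),
]


def concept_of(term):
    """Map a free-text term to its shared concept, or None if unknown."""
    return SYNONYMS.get(term.strip().lower())


def ancestors(concept):
    """Return the concept followed by all of its broader concepts."""
    chain = [concept]
    while chain[-1] in BROADER:
        chain.append(BROADER[chain[-1]])
    return chain


def search(query):
    """Return IDs of records whose label means the query concept or a narrower one."""
    target = concept_of(query) or query  # allow querying by concept name
    return [rid for rid, label in RECORDS if target in ancestors(concept_of(label))]


print(search("solar flare"))
print(search("SolarEvent"))
```

A plain keyword search for "solar flare" would miss the record labeled only "flare"; here both are found because they resolve to the same concept, and a search on the broader concept returns all three records.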
Ideally, researchers and Web surfers alike will also have the ability to review and correct information when necessary, similar to Wikipedia's model. Access to certain data sets could also be controlled using semantic tags attached to the data. The tags would help those searching for information to more easily credit the original creator of the data they are using, and data creators could track exactly who is looking at their data, says Deborah McGuinness, a Rensselaer professor of computer science and cognitive science as well as a co-chair of the school's Tetherless World Constellation. Fox and McGuinness are developing the tool kit with the help of fellow Tetherless World co-chair and Rensselaer computer and cognitive sciences professor Jim Hendler.
A semantic interface would allow a researcher to visit a single research site, describe the information required, and then let ontology and semantics take care of the rest. "The Semantic Web has its own query language that takes advantage of meanings of concepts and their relationships," Narock says. "You ask your question at very high level, and it takes care of filling in the details for you."
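The query language Narock mentions is most likely SPARQL, the W3C standard for querying Semantic Web data stored as subject-predicate-object statements. The sketch below is not SPARQL itself and uses no real library; it is a plain-Python illustration of the underlying pattern-matching idea, applied to Narock's example of finding spacecraft that image the sun and were operational in a given period. The spacecraft names, predicates and years are all hypothetical.

```python
# Hypothetical facts as (subject, predicate, object) statements.
# Names, predicates and years are illustrative placeholders only.
TRIPLES = {
    ("SpacecraftA", "type", "SolarImager"),
    ("SpacecraftB", "type", "SolarImager"),
    ("SpacecraftC", "type", "Magnetometer"),
    ("SpacecraftA", "operationalIn", "2003"),
    ("SpacecraftB", "operationalIn", "2007"),
    ("SpacecraftC", "operationalIn", "2003"),
}


def is_var(term):
    """Terms starting with '?' are variables, a convention borrowed from SPARQL."""
    return term.startswith("?")


def match(pattern, triple, binding):
    """Try to extend a variable binding so the pattern equals the triple."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            # Bind the variable, or reject if it is already bound differently.
            if new.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return new


def query(patterns, triples=TRIPLES):
    """Return every variable binding that satisfies all patterns at once."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triple in triples
            if (extended := match(pattern, triple, binding)) is not None
        ]
    return bindings


# "Which solar imagers were operational in 2003?"
results = query([
    ("?craft", "type", "SolarImager"),
    ("?craft", "operationalIn", "2003"),
])
print(sorted(b["?craft"] for b in results))
```

The researcher states what is wanted (a solar imager, operational in 2003) rather than which database or file to open; the matching step fills in the details, which is the high-level style of questioning the quote describes.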
Such a conversion won't be easy, though. As Narock points out, people in charge of massive databases would have to develop ontologies that make the information more accessible, although Fox says Constellation's plan is to have some prepackaged ontologies for programmers to use. To make the Semantic Web work, Narock says, tools such as the one Constellation is developing need to be widely available and, just as importantly, used as data is created.
It's All Semantics: Searching for an Intuitive Internet That Knows What Is Said--And Meant
The National Science Foundation delivers $1.1 million to Rensselaer Polytech researchers to stimulate the Semantic Web