I KNOW WHAT YOU MEAN: Rensselaer Polytech Tetherless World Constellation researchers are working on a software development tool kit they hope will help usher in the Semantic Web, an approach that proposes to create a more intelligent Internet infrastructure, one that can assign meaning to the concepts being searched and, to some degree, understand the researcher's intent. Image: © ISTOCKPHOTO.COM/ANDREY PROKHOROV
The Internet grew out of an idea to connect various and disparate sources of data, delivering to researchers around the globe unprecedented access to information via their computer screens. As e-Science evolves alongside Web 2.0, however, some are pushing for a fundamental change in the way the Internet catalogues and organizes data, to make it more readily available to the growing number of interdisciplinary and highly specialized researchers who spend nearly all of their working hours online and tend to collaborate there. Although this is not a new argument (the idea of a more intuitive "Semantic Web" has been kicked around for years), it has gained fresh legs thanks to the recent funding of a software development tool kit expected to better connect researchers with the information they seek.
The National Science Foundation (NSF) awarded a team of researchers at Rensselaer Polytechnic Institute in Troy, N.Y., $1.1 million in October to create a software programming tool kit by mid-2010 that scientists and other researchers will be able to use to make data from their work available to a larger number of their peers as well as laypeople, including educators and policymakers. The money is being provided as part of the American Recovery and Reinvestment Act of 2009.
Newer generations of researchers not schooled in more traditional, library-based (pre-Internet) research methods are used to doing keyword searches on the Internet to discover information. "But if you come from outside a given field, you don't necessarily know what those keywords are," says Alyssa Goodman, a Harvard University astronomy professor. A Semantic Web setup would enable researchers to craft their queries in more natural language. Goodman adds, however, that a fully semantic Web that can read, comprehend and categorize information beyond keywords requires a level of artificial intelligence that is currently not available, something Rensselaer's researchers are trying to address with this new tool kit.
"Earth and space science research today is moving online," says Tom Narock, a faculty research assistant at the University of Maryland, Baltimore County, and at NASA's Goddard Earth Sciences and Technology Center. For his research on solar physics, Narock often searches for measurements taken by spacecraft, data that are typically stored and managed by multiple research institutions. "The problem is there's a lot of heterogeneity among the different data sets," he says. If he needs to study images of the sun over a specific time period, for example, Narock must first find out which spacecraft took such images, whether they were in position to take the photos he needs, and whether they were operational during that period. Although many research institutes espouse the idea of open access to their work, finding the right information takes quite a bit of trial and error, he adds.
This is in part because different organizations often store their data in a variety of formats. "There's also a deeper semantic issue than what do the columns and rows actually represent in different databases," Narock says. As a result, sifting through different data sources in search of related information can be tedious: a researcher must visit individual databases and inspect files, sometimes even calling fellow researchers for clarification.
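To make the heterogeneity problem concrete, here is a minimal Python sketch of the kind of manual reconciliation Narock describes. Everything in it is an assumption made for illustration: the two archives, their column names, their date formats, the spacecraft names and the helper functions are hypothetical and are not drawn from any real database or from the Rensselaer tool kit.

```python
from datetime import datetime

# Hypothetical records from two archives that describe the same kind of
# observation using different column names, date formats and status flags.
ARCHIVE_A = [
    {"craft": "Probe-1", "t_start": "2009-03-01", "t_end": "2009-03-05", "status": "operational"},
    {"craft": "Probe-2", "t_start": "2009-06-01", "t_end": "2009-06-10", "status": "safe mode"},
]
ARCHIVE_B = [
    {"SpacecraftName": "Probe-3", "Begin": "02/20/2009", "End": "03/03/2009", "Working": True},
]


def normalize_a(row):
    """Map an archive A row onto a shared (hypothetical) vocabulary."""
    return {
        "spacecraft": row["craft"],
        "start": datetime.strptime(row["t_start"], "%Y-%m-%d"),
        "end": datetime.strptime(row["t_end"], "%Y-%m-%d"),
        "operational": row["status"] == "operational",
    }


def normalize_b(row):
    """Map an archive B row onto the same shared vocabulary."""
    return {
        "spacecraft": row["SpacecraftName"],
        "start": datetime.strptime(row["Begin"], "%m/%d/%Y"),
        "end": datetime.strptime(row["End"], "%m/%d/%Y"),
        "operational": bool(row["Working"]),
    }


def find_observations(start, end):
    """Return spacecraft that were operational and observing within [start, end]."""
    records = [normalize_a(r) for r in ARCHIVE_A] + [normalize_b(r) for r in ARCHIVE_B]
    return [
        r["spacecraft"]
        for r in records
        # Two time intervals overlap when each starts before the other ends.
        if r["operational"] and r["start"] <= end and r["end"] >= start
    ]


if __name__ == "__main__":
    print(find_observations(datetime(2009, 3, 2), datetime(2009, 3, 4)))
```

Each new archive in this sketch needs its own hand-written mapping function, which is the tedium the article describes; a semantic approach would aim to have the data sources declare what their fields mean so that such mappings need not be rebuilt by every researcher.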