INFORMATION EXPLOSION: Researchers say 2002 could be considered the beginning of the digital age, the first year worldwide digital storage capacity overtook total analog capacity. As of 2007, almost 94 percent of our memory is in digital form. Image: COURTESY OF LOOPS7, VIA ISTOCKPHOTO.COM
Data are the common currency that unites all fields of science. As science progresses, data proliferate, providing points of reference, revealing trends, and offering evidence to substantiate hypotheses. Decades into the digitization of science, however, data are accumulating exponentially, at times threatening to drown knowledge and information in a sea of noise.
The journal Science examines this trend in a special report this week that, according to the editors, turns up two themes: "Most scientific disciplines are finding the data deluge to be extremely challenging, and tremendous opportunities can be realized if we can better organize and access the data." Among its articles, the report features an analysis of the challenges of understanding the reams of data being produced by climate science, neurology and genomics in particular.
One of the most interesting articles, however, attempts to quantify exactly how much data we're actually talking about and makes a key distinction between data and information. In "The World's Technological Capacity to Store, Communicate and Compute Information" Martin Hilbert, a doctoral candidate at the University of Southern California's Annenberg School for Communication and Journalism in Los Angeles, and Priscila López Pavez, a graduate student studying information and knowledge in society at the Open University of Catalonia in Santiago, Chile, report on their efforts to track 60 analog and digital technologies during the period from 1986 to 2007. The researchers found that the amount of data generated during those two decades exploded as digital technology moved into the mainstream. For example, the amount of data stored electronically in 2007 was equivalent to 61 CD-ROMs per person living on the planet at the time. For perspective, if those CDs were stacked, they would reach from Earth to the moon plus a quarter of that distance beyond.
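The arithmetic behind those comparisons is straightforward to check. Here is a minimal sketch, assuming the paper's headline figure of roughly 295 optimally compressed exabytes stored in 2007 and round values for the 2007 world population, a CD-ROM's capacity and thickness, and the Earth-to-moon distance (the constants are illustrative assumptions, not numbers quoted in this article):

```python
# Back-of-the-envelope check of the CD-ROM comparison.
STORED_BYTES_2007 = 295e18    # ~295 exabytes stored worldwide in 2007 (paper's estimate)
WORLD_POPULATION = 6.6e9      # approximate 2007 world population
CD_CAPACITY_BYTES = 730e6     # ~730 MB per CD-ROM
CD_THICKNESS_M = 1.2e-3       # ~1.2 mm per disc
EARTH_MOON_M = 384_400e3      # average Earth-to-moon distance

cds_per_person = STORED_BYTES_2007 / WORLD_POPULATION / CD_CAPACITY_BYTES
stack_height_m = cds_per_person * WORLD_POPULATION * CD_THICKNESS_M

print(f"CDs per person: {cds_per_person:.0f}")                       # ~61
print(f"Stack height:   {stack_height_m / 1e3:,.0f} km")             # ~485,000 km
print(f"Earth-moon distances: {stack_height_m / EARTH_MOON_M:.2f}")  # ~1.26
```

Under those assumptions the stack comes out to roughly 1.26 times the Earth-to-moon distance, matching the "moon plus a quarter" comparison.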
Scientific American spoke with Hilbert, who also created and coordinated the United Nations Regional Commission for Latin America and the Caribbean's Information Society Program between 2000 and 2008, about the motivations behind his project with López Pavez, their research's potential impact on Hilbert's field of social science, and how human technological innovation stacks up against Mother Nature.
[An edited transcript of this interview follows.]
What prompted you to calculate the world's technological capacity to store, communicate and compute information?
In social science we've been talking about the digital revolution and the information society for quite some time now. We know that these technologies are the driver for productivity and the economy. We know that they're very important for political freedoms—just think about what's happening in Egypt right now. We know that they change the way a family is organized—consider how family members use cell phones to communicate while away from home. They change social conduct in every aspect. However, contrary to other sciences, social science does not yet walk the talk of the information age. Our paper is basically a contribution to bring social science toward the information age, which is important because information seems to be one of the unifying variables across all areas of science. One specific interest of ours was to see how fast information is growing and how fast we're digitizing this information.
There have been other studies that focused on measuring the hardware capacity of humankind. Now that's not information, that's just data. What we did here was normalize the compression rate, and that basically converts all data into information.
What does it mean to "normalize" compression?
The theory behind our research is actually pretty old and goes back to [U.S. mathematician, electronic engineer and cryptographer] Claude Shannon's information theory, which he introduced in 1948. So, basically what Shannon said is that we define information as the opposite of uncertainty. If you have uncertainty, you don't have information. And once you receive information, uncertainty is being resolved. He defines one bit as something that reduces uncertainty by half. We converted the data contained in storage and communication hardware capacity into informational bits. We measured information as if all redundancy were removed using the most efficient compression algorithms available in 2007.
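Shannon's definition has a compact mathematical form. As a sketch of the standard textbook formula (not code from the study): a source with outcome probabilities p_i carries entropy H = -Σ p_i log2(p_i) bits, so a fair coin flip, which resolves uncertainty by exactly half, carries exactly one bit:

```python
import math

def entropy_bits(probs):
    """Shannon entropy H = -sum(p * log2(p)), in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy_bits([0.5, 0.5]))  # 1.0 bit: a fair coin flip halves uncertainty
print(entropy_bits([0.9, 0.1]))  # ~0.47 bits: a biased coin is more predictable
print(entropy_bits([0.25] * 4))  # 2.0 bits: four equally likely outcomes
```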
In a practical sense, you can think of it like this: You have a Word document, and you save it on your hard disk. Let's say it's 100 kilobytes, and then you compress it with a zip file to only 50 kilobytes. What Shannon taught us is, if you compress it and compress it and compress it to the uttermost compression rate, we approach the entropy (or the actual amount of information) in this file. A compression algorithm takes out all of the redundant data in the file and leaves you with pure information.
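An everyday compressor makes the point concrete. Below is a minimal illustration using Python's standard zlib module (an ordinary general-purpose compressor, not the "most efficient compression algorithms available in 2007" that the study normalizes to): repetitive text shrinks dramatically, whereas random bytes barely shrink at all, because they contain almost no redundancy to remove.

```python
import os
import zlib

redundant = b"the quick brown fox " * 500  # 10,000 bytes of repeated text
random_data = os.urandom(10_000)           # 10,000 bytes with no redundancy

for label, data in [("redundant", redundant), ("random", random_data)]:
    compressed = zlib.compress(data, 9)    # maximum compression level
    print(f"{label}: {len(data)} -> {len(compressed)} bytes")

# Typical output: the repeated text collapses to well under 1 percent of its
# original size, while the random bytes stay near (or slightly above) 10,000.
# Compression strips redundancy; what remains approximates the information.
```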
In your research paper, you come across as pretty savvy with regard to the terminology and technology of information. Did you have to learn a great deal about how data storage, compression, computation and other technologies work in order to pursue this project?
We had to learn a little bit. I'm an economist by training, and Priscila [López Pavez] is a telecommunications engineer, so I focused on the social statistics, the number of devices and the social interpretation of the information, whereas she focused more on the technology. Shannon was the one who taught us what information is and how to measure information. Our contribution was to take up this pretty old theory and convert it to a methodology that is useful for social science, and we applied this methodology for the first time to one concrete case—to measure how much information there is in the world, how much is stored, communicated and computed. This methodology could also be used for many other applications—for example, you could measure how much information there is in a company or in a tribe or in a society.