
INFORMATION EXPLOSION: Researchers say 2002 could be considered the beginning of the digital age, the first year worldwide digital storage capacity overtook total analog capacity. As of 2007 almost 94 percent of our memory is in digital form.
Image: COURTESY OF LOOPS7, VIA ISTOCKPHOTO.COM
Data are the common currency that unites all fields of science. As science progresses data proliferate, providing points of reference, revealing trends, and offering evidence to substantiate hypotheses. Decades into the digitization of science, however, data proliferate exponentially, at times threatening to drown knowledge and information in a sea of noise.
The journal Science examines this trend in a special report this week that, according to the editors, turns up two themes: "Most scientific disciplines are finding the data deluge to be extremely challenging, and tremendous opportunities can be realized if we can better organize and access the data." The report features among its articles an analysis of the challenges of understanding the reams of data being produced in particular by climate science, neurology and genomics.
One of the most interesting articles, however, attempts to quantify exactly how much data we're actually talking about and makes a key distinction between data and information. In "The World's Technological Capacity to Store, Communicate and Compute Information" Martin Hilbert, a doctoral candidate at the University of Southern California's Annenberg School for Communication and Journalism in Los Angeles, and Priscila López Pavez, a graduate student studying information and knowledge in society at the Open University of Catalonia in Santiago, Chile, report on their efforts to track 60 analog and digital technologies during the period from 1986 to 2007. The researchers found that the amount of data generated during those two decades exploded as digital technology moved into the mainstream. For example, the amount of data stored electronically in 2007 was equivalent to 61 CD-ROMs per person living on the planet at the time. For perspective, if those CDs were stacked, they would reach from Earth to the moon plus a quarter of that distance beyond.
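The CD-stack comparison can be roughly checked with back-of-the-envelope arithmetic. The sketch below is not from the researchers' paper: only the 61 CDs per person comes from the article, while the 2007 world population (about 6.6 billion), the disc thickness (1.2 millimeters) and the average Earth-moon distance (384,400 kilometers) are assumed inputs.

```python
# Rough check of the "Earth to the moon plus a quarter" comparison.
# Only CDS_PER_PERSON comes from the article; the other values are assumptions.
CDS_PER_PERSON = 61             # from the article
WORLD_POPULATION_2007 = 6.6e9   # assumption: roughly 6.6 billion people
CD_THICKNESS_MM = 1.2           # assumption: standard disc thickness
EARTH_MOON_KM = 384_400         # assumption: average Earth-moon distance

# Total stack height, converted from millimeters to kilometers.
stack_km = CDS_PER_PERSON * WORLD_POPULATION_2007 * CD_THICKNESS_MM / 1e6

# How many Earth-moon distances the stack would span.
ratio = stack_km / EARTH_MOON_KM

print(f"stack height: {stack_km:,.0f} km")
print(f"Earth-moon distances: {ratio:.2f}")
```

Under these assumptions the stack comes out at a little over one and a quarter Earth-moon distances, which is consistent with the comparison in the article.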
Scientific American spoke with Hilbert, who also created and coordinated the United Nations Regional Commission for Latin America and the Caribbean's Information Society Program between 2000 and 2008, about the motivations behind his project with López Pavez, their research's potential impact on Hilbert's field of social science, and how human technological innovation stacks up against Mother Nature.
[An edited transcript of this interview follows.]
What prompted you to calculate the world's technological capacity to store, communicate and compute information?
In social science we've been talking about the digital revolution and the information society for quite some time now. We know that these technologies are the driver for productivity and the economy. We know that they're very important for political freedoms—just think about what's happening in Egypt right now. We know that they change the way a family is organized—consider how family members use cell phones to communicate while away from home. They change social conduct in every aspect. However, contrary to other sciences, social science does not yet walk the talk of the information age. Our paper is basically a contribution to bring social science toward the information age, which is important because information seems to be one of the unifying variables across all areas of science. One specific interest of ours was to see how fast information is growing and how fast we're digitizing this information.
There have been other studies that focused on measuring the hardware capacity of humankind. Now that's not information, that's just data. What we did here was normalize the compression rate, and that basically converts all data into information.
What does it mean to "normalize" compression?
The theory behind our research is actually pretty old and goes back to [U.S. mathematician, electronic engineer and cryptographer] Claude Shannon's information theory, which he introduced in 1948. So, basically what Shannon said is that we define information as the opposite of uncertainty. If you have uncertainty, you don't have information. And once you receive information, uncertainty is being resolved. He defines one bit as something that reduces uncertainty by half. We converted the data contained in storage and communication hardware capacity into informational bits. We measured information as if all redundancy were removed using the most efficient compression algorithms available in 2007.
In a practical sense, you can think of it like this: You have a Word document, and you save it on your hard disk. Let's say it's 100 kilobytes, and then you compress it with a zip file to only 50 kilobytes. What Shannon taught us is, if you compress it and compress it and compress it to the uttermost compression rate, we approach the entropy (or the actual amount of information) in this file. A compression algorithm takes out all of the redundant data in the file and leaves you with pure information.
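Hilbert's Word-document example can be tried on a small scale. The sketch below is only an illustration, not the methodology of the paper: it uses Python's standard `zlib` compressor on a made-up, highly repetitive sample text, and it reports a simple byte-frequency ("order-0") entropy estimate, which is a crude stand-in for the true information content because it ignores patterns spanning several bytes.

```python
import math
import zlib
from collections import Counter


def entropy_bits_per_symbol(data: bytes) -> float:
    """Order-0 Shannon entropy: -sum(p * log2(p)) over byte frequencies."""
    counts = Counter(data)
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# One bit halves uncertainty, so choosing among 8 equally likely
# options takes log2(8) = 3 bits.
bits_for_eight_options = math.log2(8)

# A deliberately redundant "document" (hypothetical sample text).
document = b"the quick brown fox jumps over the lazy dog. " * 200

# Compress at the highest zlib level; redundancy is what gets squeezed out.
compressed = zlib.compress(document, 9)

print(f"bits to choose among 8 options: {bits_for_eight_options}")
print(f"original size:   {len(document)} bytes")
print(f"compressed size: {len(compressed)} bytes")
print(f"order-0 entropy: {entropy_bits_per_symbol(document):.2f} bits per byte")
```

Because the sample repeats one sentence 200 times, the compressed output is far smaller than the original. A real compressor only approaches the entropy limit Shannon described; it does not reach it exactly.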
In your research paper, you come across as pretty savvy with regard to the terminology and technology of information. Did you have to learn a great deal about how data storage, compression, computation and other technologies work in order to pursue this project?
We had to learn a little bit. I'm an economist by training, and Priscila [López Pavez] is a telecommunications engineer, so I focused on the social statistics, the number of devices and the social interpretation of the information, whereas she focused more on the technology. Shannon was the one who taught us what information is and how to measure information. Our contribution was to take up this pretty old theory and convert it to a methodology that is useful for social science, and we applied this methodology for the first time to one concrete case—to measure how much information there is in the world, how much is stored, communicated and computed. This methodology could also be used for many other applications—for example, you could measure how much information there is in a company or in a tribe or in a society.




4 Comments
An enormous amount of data storage capacity is allocated not to useful information but to replicated data, including copies of software program files and application installation and temporary data.
I assume that by far the majority of data storage capacity is configured on personal devices. Consider the allocation of storage on your own PC (you really have no idea how it's used, do you?). Excluding copied photos, videos, MP3s, backups, temporary files, etc. how much unique, original information is being stored? You know, data that's used in computational analyses to produce useful information... How much information do you compute, anyway?
It's absurd to consider the digital storage capacity installed on personal devices as 'data' in any sense relating to 'information systems'.
In the Star Wars saga the question arises of the difference between knowledge and wisdom. The same could be asked of the difference between information and knowledge, for information is the accumulation of the sum of knowledge spread over the course of the six million years of human experience quantified; whether this knowledge is in a compressed or uncompressed state does not change the nature of the information (knowledge) accumulated. I believe that in order for information (knowledge) to be processed it first has to go through a series of quantitative, qualitative, and then to a quantum understanding of the knowledge (information) learned. In the Foundation, Hari Seldon, the head of the Encyclopedists, devises a mathematical formula in order to save the accumulated knowledge of the Empire during the centuries of chaos.
We are in the infancy of the informational and technological age, and our collective knowledge has only begun to scratch the surface. The knowledge that will be accumulated in the next one hundred years will be far greater than the knowledge accumulated in the last one thousand years, and the rate of accumulation will increase exponentially over the course of the next thousand or five thousand years. How do you compress that?
The permutative variations in data redundancy are alone enough to discredit any attempt at creating an ultimate compression strategy in information theory on any dynamic model of existence by simply adding or deleting copies of let's say, newspaper clippings in anyone's scrapbook. Absolute notions include the persistence of environment regardless of context. That said, the ultimate religion for artificial intelligence cannot be data and power supplies alone. Machines are dumb. Biology isn't. Both are adaptive and both can wear out and with that we redefine context in myriad ways. Food for thought.
It isn't clear to me that we humans will be capable of managing the huge amount of data into useful information before we become buried in it under a scientific Tower of Babel.
Two problems arise. One is the inability to find theories which unify the mountains of data into formulas linked together with logic and mathematics and the other is the inability to develop languages which make it possible for people in different fields to communicate with each other.
Scientists exist in small hermetically sealed enclaves who communicate among themselves with exclusive jargons which are incomprehensible to outsiders.
Why should we assume that the human species, which evolved to exist in small nomadic tribes of hunters and gatherers, can adapt to the glass walled modern "megalithic" cities most of us live in today?
We use languages with grammatical structures that are illogical accretions from their long pre-literate written history, and whose evolutionary laws and etiologies are unknown.
We can't even regularize English spelling, not to mention our irregular grammatical structures.
I'm not saying that we won't continue to make progress both socially and technologically, only that it is not a certainty. Maybe that is just another way of saying "God is dead" and there is no magic power above that will make things turn out well just because we humans say the right prayers and follow the rules of the day.