In 2004 the Web 2.0 era officially began. Scores of Internet surfers have abandoned AOL, and some plucky first-movers have forsaken home pages altogether, opting instead to get their content from RSS feeds, social bookmarking sites and collaborative tagging schemes, like photo-sharing hub Flickr.

A team of physicists from the University of Rome "La Sapienza" sought to determine the underlying statistical properties of this new information paradigm by studying the behavior of tags (single words used to describe the content of a linked article or photo) on social bookmarking and collaborative tagging sites, including Connotea. [Editor's Note: Scientific American and Connotea are owned by the same holding company.]

"The idea was to try and see if we could apply complex systems science methods to modeling a system which is an IT system, but exposing, in a very explicit and complex way, the social component—the activity of people," first author Ciro Cattuto says. "In this system, the linguistic element—the word, the symbol—is a dynamical entity and plays the role of a particle in statistical mechanics."

After studying which tags were associated with a pair of selected ones (a general tag, "blog," and a specific one, "ajax," shorthand for the Web development technique Asynchronous JavaScript and XML), the researchers determined that user behavior in collaborative tagging schemes follows a power law in which a few words are highly associated with the chosen tags: "design," "web" and "news" appear most frequently with "blog," while "javascript," "web" and "xmlhttprequest" pop up most often in conjunction with "ajax." There is a precipitous drop-off after these top terms, with the remaining terms appearing much less often with the chosen tags.
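The kind of measurement described above can be illustrated with a minimal Python sketch. It is not the researchers' code: the function names and the toy posts are hypothetical, and the straight-line fit on log-log axes is only a rough way to gauge how steeply counts fall with rank, not a rigorous power-law test.

```python
import math
from collections import Counter


def cooccurrence_ranking(posts, focus_tag):
    """Rank tags by how often they share a post with focus_tag."""
    counts = Counter()
    for tags in posts:
        unique = set(tags)
        if focus_tag in unique:
            counts.update(unique - {focus_tag})
    # Most frequent first; ties broken alphabetically for a stable order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def loglog_slope(ranking):
    """Least-squares slope of log(count) against log(rank).

    A power law shows up as a roughly straight, downward-sloping line.
    """
    if len(ranking) < 2:
        raise ValueError("need at least two ranked tags")
    xs = [math.log(rank) for rank in range(1, len(ranking) + 1)]
    ys = [math.log(count) for _, count in ranking]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


# Hypothetical toy data: each inner list is the set of tags on one bookmark.
posts = [
    ["ajax", "javascript", "web"],
    ["ajax", "javascript", "xmlhttprequest"],
    ["ajax", "javascript"],
    ["ajax", "web"],
    ["blog", "design"],
]

ranking = cooccurrence_ranking(posts, "ajax")
slope = loglog_slope(ranking)
```

With real bookmarking data the ranking would run to thousands of tags, and a flatter top of the curve would correspond to the more ambiguous tags the researchers describe.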

The authors say they were not surprised by the presence of a power law in their model, because this sort of curve is "the standard signature of self-organization and of human activity." The power law does show some semantic effect at the top of the distribution, where the steepness varies depending on the ambiguity of the tag. "A very flat curve at the top of the distribution means that you're dealing with a tag in whose context a lot of other tags are co-occurring with comparable probabilities," Cattuto says, "and so this tag must be somehow vague, must be ambiguous."

Their model exemplifies two primary aspects of user behavior: preferential attachment and aging of resources. Preferential attachment can also be described as a copying attitude. Cattuto, who performed the research with Vittorio Loreto and Luciano Pietronero, offers the example of linking to a photo or article about New York City. The person posting the link can tag the item in several ways, a few of which are "nyc," "newyork_city" or "newyork." The choices of previous users, however, are likely to influence the next group of users. "There is pressure, in essence," Cattuto explains, "because if you use tags that are already widespread within the system, people are able to find your entries, so using popular tags makes your content findable and makes you more visible."

The aging of resources effect echoes a previous finding by complex network researcher Albert-László Barabási of the University of Notre Dame that information stays fresh on the Web for only about 36 to 48 hours. Similarly, the researchers found that users on collaborative tagging sites are likely to prefer recently added tags to older ones.
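The two ingredients described above, copying tags that are already in use and favoring recent ones, can be combined in a toy simulation. This is a rough sketch under stated assumptions and not the researchers' actual model: the function name, the probability `p_new` of inventing a new tag, the recency weight `1 / (age + tau)` and all parameter values are hypothetical choices made for illustration.

```python
import random
from collections import Counter


def simulate_tag_stream(steps, p_new=0.05, tau=10.0, seed=0):
    """Generate a stream of tags by copying earlier ones with a recency bias.

    Assumptions (illustrative only): with probability p_new a brand-new
    tag is invented; otherwise an earlier entry is copied, and an entry
    that is `age` steps old is chosen with weight 1 / (age + tau).
    """
    rng = random.Random(seed)
    stream = ["tag0"]
    next_id = 1
    for _ in range(steps - 1):
        if rng.random() < p_new:
            stream.append(f"tag{next_id}")
            next_id += 1
        else:
            n = len(stream)
            # The entry at index i is (n - i) steps old; newer entries weigh more.
            weights = [1.0 / ((n - i) + tau) for i in range(n)]
            stream.append(rng.choices(stream, weights=weights, k=1)[0])
    return stream


stream = simulate_tag_stream(1000)
# Tags sorted from most to least used, ready for a rank-frequency plot.
counts = Counter(stream).most_common()
```

Because popular tags occupy more slots in the history, they are copied more often (the copying attitude), while the weighting tilts each choice toward recent entries (aging). Plotting `counts` on log-log axes is one way to see whether such a stream produces a heavy-tailed distribution.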

Barabási, whose work focuses more on the entire World Wide Web, applauds the Italian researchers for being the first to address the new phenomenon of collaborative tagging and for attempting to demystify its behavior. "They are taking a new technology, which kind of enhances the usage of the Web and the underlying network structure, and they use quantitative methods to understand its properties," he says. "This paper probably will not tell you what's going to be the coolest term, but it will tell you what the fundamental structures [are] within the system."

Now that the team has a model for how tags behave through association with other words, Barabási says, they can ask the question: "Will a tag get oversaturated and become meaningless or will it grow indefinitely?"