January 23, 2007

Tag, You're It: Scientists Describe Collaborative Tagging Sites like Del.icio.us

Italian researchers determine underlying statistical structure of social bookmarking sites

By Nikhil Swaminathan   

 

COLLABORATIVE COMPUTING: Scientists in Italy have determined how tags, used on popular websites like Del.icio.us and Flickr, behave statistically
© BLOOMIMAGE/CORBIS


In 2004 the Web 2.0 era officially began. Scores of Internet surfers have abandoned AOL and some plucky first-movers have forsaken home pages altogether, opting instead to get their content from RSS feeds, social bookmarking sites—such as Digg.com—and collaborative tagging schemes, like photo-sharing hub Flickr.

A team of physicists from the University of Rome "La Sapienza" sought to determine the underlying statistical properties of this new information paradigm by studying the behaviors of tags—single words used to describe the content of a linked article or photo—on the social bookmarking/collaborative tagging sites del.icio.us and Connotea. [Editor's Note: Scientific American and Connotea are owned by the same holding company.]

"The idea was to try and see if we could apply complex systems science methods to modeling a system which is an IT system, but exposing, in a very explicit and complex way, the social component—the activity of people," first author Ciro Cattuto says. "In this system, the linguistic element—the word, the symbol—is a dynamical entity and plays the role of a particle in statistical mechanics."

After studying how other tags were associated with a pair of selected ones (a general tag, "blog," and a specific one, "ajax," shorthand for the Web development technique Asynchronous JavaScript and XML), the researchers determined that user behavior in collaborative tagging schemes follows a power law, in which a few words are highly associated with the chosen tags: "design," "web" and "news" appear most frequently with "blog," while "javascript," "web" and "xmlhttprequest" pop up most often in conjunction with "ajax." Beyond these top associates there is a precipitous drop-off, with the remaining terms appearing far less often alongside the chosen tags.
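The kind of co-occurrence count described above can be sketched in a few lines of Python. This is only an illustration, not the researchers' code: the function name `cooccurrence_ranking` and the toy list of posts are hypothetical, and real del.icio.us or Connotea data would be far larger.

```python
from collections import Counter

def cooccurrence_ranking(posts, focus_tag):
    """Rank the tags that appear alongside focus_tag, most frequent first."""
    counts = Counter()
    for tags in posts:
        unique = set(tags)  # count each tag at most once per post
        if focus_tag in unique:
            counts.update(unique - {focus_tag})
    # Sort by descending count, then alphabetically so ties are deterministic
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical toy data: each inner list is the set of tags on one bookmark
posts = [
    ["ajax", "javascript", "web"],
    ["ajax", "javascript", "xmlhttprequest"],
    ["ajax", "javascript", "web", "css"],
    ["blog", "design", "web"],
    ["ajax", "web"],
]

ranking = cooccurrence_ranking(posts, "ajax")
for tag, count in ranking:
    print(tag, count)
```

On this toy data, "javascript" and "web" each co-occur with "ajax" three times, while "css" and "xmlhttprequest" appear once each. Plotting such counts against their rank is what reveals the shape of the distribution.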

The authors say they were not surprised by the presence of a power law in their model, because this sort of curve is "the standard signature of self-organization and of human activity." The power law does show some semantic effect at the top of the distribution, where the steepness varies depending on the ambiguity of the tag. "A very flat curve at the top of the distribution means that you're dealing with a tag in whose context a lot of other tags are co-occurring with comparable probabilities," Cattuto says, "and so this tag must be somehow vague, must be ambiguous."
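A power law shows up as a roughly straight line when frequency is plotted against rank on log-log axes, and the slope of that line gives the exponent. The sketch below estimates that slope with an ordinary least-squares fit. The function name `loglog_slope` and the synthetic data are assumptions for illustration, and a least-squares fit on log-log axes is only a rough diagnostic, not a rigorous test for a power law.

```python
import math

def loglog_slope(freqs):
    """Least-squares slope of log(frequency) against log(rank).

    freqs must be positive and sorted from most to least frequent,
    so that the first entry has rank 1.
    """
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

# Synthetic frequencies that follow f(rank) = 1000 / rank exactly
freqs = [1000 / rank for rank in range(1, 51)]
slope = loglog_slope(freqs)
print(round(slope, 6))
```

For these synthetic frequencies the fitted slope is -1 up to floating-point error. A flatter stretch at the top of a real curve, as Cattuto describes for ambiguous tags, would appear as a smaller slope in magnitude over the first few ranks.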

Their model captures two primary aspects of user behavior: preferential attachment and aging of resources. Preferential attachment can also be described as a copying attitude. Cattuto, who performed the research with Vittorio Loreto and Luciano Pietronero, offers the example of linking to a photo or article about New York City. The person posting the link can tag the item in several ways, a few of which are "nyc," "newyork_city" or "newyork." The choices of previous del.icio.us users, however, are likely to influence the next group of users. "There is pressure, in essence," Cattuto explains, "because if you use tags that are already widespread within the system, people are able to find your entries. So, using popular tags makes your content findable and makes you more visible."

The aging-of-resources effect echoes an earlier finding by complex network researcher Albert-László Barabási of the University of Notre Dame, who showed that information stays fresh on the Web for only about 36 to 48 hours. Similarly, the researchers found that users on collaborative tagging sites tend to prefer recently added tags to older ones.
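The two behaviors, copying popular tags and favoring recent ones, can be combined in a toy simulation. The article does not give the authors' equations, so everything below is an assumption made for illustration: the function name `simulate_tag_stream`, the probability `p_new` of inventing a new tag, and the recency weight `1 / (age + tau)` are hypothetical choices, not the published model.

```python
import random
from collections import Counter

def simulate_tag_stream(n_steps, p_new=0.05, tau=10.0, seed=0):
    """Grow a stream of tags by copying earlier tags, favoring recent ones.

    At each step a brand-new tag is invented with probability p_new
    (an assumed parameter). Otherwise one earlier position in the stream
    is picked and its tag is copied. Tags that occur often are picked
    more often (preferential attachment), and positions are weighted by
    1 / (age + tau), an assumed kernel, so recent tags are favored (aging).
    """
    rng = random.Random(seed)
    stream = ["tag0"]
    next_id = 1
    for _ in range(n_steps):
        if rng.random() < p_new:
            stream.append(f"tag{next_id}")
            next_id += 1
        else:
            t = len(stream)
            # The most recent entry has age 1, the oldest has age t
            weights = [1.0 / ((t - i) + tau) for i in range(t)]
            stream.append(rng.choices(stream, weights=weights, k=1)[0])
    return stream

stream = simulate_tag_stream(2000)
freqs = sorted(Counter(stream).values(), reverse=True)
print(freqs[:10])
```

Sorting the resulting tag counts from most to least used gives a rank-frequency list that can be compared with the heavy-tailed curves the researchers describe. Changing `tau` shifts how strongly recent tags are favored over older ones.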

Barabási, whose work focuses more on the entire World Wide Web, applauds the Italian researchers for being the first to address the new phenomenon of collaborative tagging and attempting to demystify its behavior. "They are taking a new technology, which kind of enhances the usage of the Web and the underlying network structure, and they use quantitative methods to understand its properties," he says. "This paper probably will not tell you what's going to be the coolest term, but it will tell you what the fundamental structures [are] within the system."



© 1996-2009 Scientific American Inc. All Rights Reserved. Reproduction in whole or in part without permission is prohibited.