Scientific American Magazine, August 2011

Parsing the Twitterverse: New Algorithms Analyze Tweets

Smarter language processors are helping experts analyze millions of short-text messages from across the Internet

Image: Illustrations by Thomas Fuchs

Researchers have been trolling Twitter for insights into the human condition since shortly after the site launched in 2006. In aggregate, the service provides a vast database of what people are doing, thinking and feeling. But the research tools at scientists’ disposal are highly imperfect. Keyword searches, for example, return many hits but offer a poor sense of overall trends.

When computer scientist James H. Martin of the University of Colorado at Boulder searched for tweets about the 2010 earthquake in Haiti, he found 14 million. “You can’t hire grad students to read them all,” he says. Researchers need a more automated approach.

One promising method is to develop programs that label the words in tweets with parts of speech and grammatical roles, such as subject, verb and object, and then use those tags to determine what each tweet is about. This technique, which belongs to the field of natural-language processing, is not a new idea, but applying it to short social text is a new and growing line of research. “That is just a huge area right now,” Martin says.

Scientists at the Xerox-owned Palo Alto Research Center recently developed one such program. It relies on text processors, called parsers, which are typically tested on news articles. Parsers can distinguish between words and punctuation, label parts of speech and analyze a sentence’s grammatical structure. But “they don’t do as well on Twitter,” says Kyle Dent, one of the Palo Alto researchers. He and his co-author wrote hundreds of rules to account for hashtags, repeated letters (as in “pleaaaaaase”) and other linguistic features that are uncommon in the Wall Street Journal. They will present their work on August 8 at an Association for the Advancement of Artificial Intelligence conference in San Francisco.
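
To give a sense of what such rules might look like, here is a minimal Python sketch that cleans up a tweet before it reaches a conventional parser. The patterns, the `normalize_tweet` function and the `<URL>` and `<USER>` placeholders are assumptions made for illustration; they are not the rules Dent and his co-author wrote, which the article does not describe in detail.

```python
import re

# Hypothetical clean-up rules for illustration only; these are NOT the
# rules written by the Palo Alto researchers.
URL = re.compile(r"https?://\S+")
MENTION = re.compile(r"@(\w+)")
HASHTAG = re.compile(r"#(\w+)")
# A letter repeated three or more times in a row, as in "pleaaaaaase".
REPEAT_RUN = re.compile(r"([A-Za-z])\1{2,}")


def normalize_tweet(text):
    """Return (cleaned_text, hashtags) for one tweet.

    Links and user mentions become placeholder tokens, hashtags lose
    their '#' but are collected separately, and long runs of one
    letter are collapsed to a single letter.
    """
    text = URL.sub("<URL>", text)          # replace links first
    hashtags = HASHTAG.findall(text)       # remember the topics
    text = MENTION.sub("<USER>", text)     # hide user names
    text = HASHTAG.sub(r"\1", text)        # "#Haiti" -> "Haiti"
    text = REPEAT_RUN.sub(r"\1", text)     # "pleaaaaaase" -> "please"
    return text, hashtags


cleaned, tags = normalize_tweet(
    "pleaaaaaase help #Haiti @someone http://bit.ly/aCGs3V"
)
print(cleaned)
print(tags)
```

Collapsing every long run to a single letter is a deliberate simplification: it turns “pleaaaaaase” into “please” but would also turn “coooool” into “col,” which hints at why a practical system needs hundreds of rules rather than four.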

Dent and his colleagues also tried to use their program to distinguish between rhetorical questions and those that require a response. Businesses could use such a program to find what people are asking about their products. In a recent trial, their program classified 68 percent of 2,304 tweets correctly. “For a brand-new field, that sounds like a decent first attempt,” says Jeffrey Ellen of the Space and Naval Warfare Systems Command, which provides intelligence technology to the U.S. Navy.
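
The trial result is reported only as a rounded percentage, so the number of tweets classified correctly can be estimated but not stated exactly. The short Python calculation below works out that estimate; the variable names are illustrative, and the range assumes the 68 percent figure was rounded to the nearest whole percent.

```python
import math

TOTAL_TWEETS = 2304        # from the article
REPORTED_ACCURACY = 0.68   # from the article, assumed rounded to the nearest percent

# Best single estimate of how many tweets were classified correctly.
estimate = round(REPORTED_ACCURACY * TOTAL_TWEETS)

# Every whole-number count whose accuracy would round to 68 percent.
lowest = math.ceil(0.675 * TOTAL_TWEETS)
highest = math.floor(0.685 * TOTAL_TWEETS)

print(f"about {estimate} correct (between {lowest} and {highest})")
print(f"about {TOTAL_TWEETS - estimate} misclassified")
```

In other words, roughly 1,567 tweets were labeled correctly and about 737 were not.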

Although Twitter-trawling technology is not yet ready to deploy, as a field, “it’s getting there pretty quickly,” Martin says. Once it matures, researchers should have access to an unprecedented trove of data about human behavior. For the first time in history, “watercooler talk” is recorded and publicly available, Ellen says. “A hundred years ago we just didn’t know what everybody was thinking.”



This article was originally published with the title Parsing the Twitterverse.




2 Comments

  1. A Pedant 06:45 AM 7/22/11

    "Trolling can be phonetically confused with trawling, a different method of fishing where a net (trawl) is drawn through the water instead of lines. Trolling is used both for recreational and commercial fishing whereas trawling is used mainly for commercial fishing."

    Trolling on the Internet is something else entirely of course.

  2. jhubert 05:29 PM 7/25/11

    Good article, Francie. Companies have been using NLP to analyze unstructured content (i.e., text) for several years, although the explosion in social media has really pushed innovation into high gear. For example, Attensity’s products parse not only text from internal sources (call center notes, surveys, emails) but external sources as well, including blogs and social media. As of last October, Attensity even offers real-time access to Twitter’s “Firehose,” or complete data stream (http://bit.ly/aCGs3V), and is now working on offering the capability to parse and analyze the whole Twitterverse.
