Parsing the Twitterverse: New Algorithms Analyze Tweets

Smarter language processors are helping experts analyze millions of short-text messages from across the Internet

Researchers have been trawling Twitter for insights into the human condition since shortly after the site launched in 2006. In aggregate, the service provides a vast database of what people are doing, thinking and feeling. But the research tools at scientists’ disposal are highly imperfect. Keyword searches, for example, return many hits but offer a poor sense of overall trends.

When computer scientist James H. Martin of the University of Colorado at Boulder searched for tweets about the 2010 earthquake in Haiti, he found 14 million. “You can’t hire grad students to read them all,” he says. Researchers need a more automated approach.

One promising method is to develop programs that tag the words in tweets with their part of speech (noun or verb, for example) and their grammatical role (subject or object), and then use those tags to determine what each tweet is about. This approach, a branch of natural-language processing, is not a new idea, but applying it to short social text is new and growing. “That is just a huge area right now,” Martin says.



Scientists at the Xerox-owned Palo Alto Research Center recently developed one such program. It relies on text processors, called parsers, which are typically tested on news articles. Parsers can distinguish between words and punctuation, label parts of speech and analyze a sentence’s grammatical structure. But “they don’t do as well on Twitter,” says Kyle Dent, one of the Palo Alto researchers. He and his co-author wrote hundreds of rules to account for hashtags, repeated letters (as in “pleaaaaaase”) and other linguistic features perhaps not common in the Wall Street Journal. They will present their work on August 8 at an Association for the Advancement of Artificial Intelligence conference in San Francisco.
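To give a feel for what such rules might look like, here is a minimal Python sketch of two of them: stripping the “#” from hashtags and collapsing stretched-out letters. It is an illustration only, not the Palo Alto team’s code; the function name normalize_tweet and both patterns are assumptions made for this example.

```python
import re

# Hypothetical rules for illustration; not taken from the researchers' program.
HASHTAG = re.compile(r"#(\w+)")            # "#haiti" -> "haiti"
STRETCHED = re.compile(r"([A-Za-z])\1{2,}")  # same letter three or more times in a row


def normalize_tweet(text):
    """Rewrite tweet-specific spellings so a news-trained parser sees ordinary words."""
    # Drop the '#' so the tag is treated as a plain token.
    text = HASHTAG.sub(r"\1", text)
    # Collapse a run of three or more identical letters to a single letter.
    text = STRETCHED.sub(r"\1", text)
    return text


print(normalize_tweet("pleaaaaaase help #haiti"))  # prints "please help haiti"
```

Collapsing every long run to one letter is a crude choice: it would turn “coool” into “col,” for instance. A real system would presumably check candidates against a dictionary, which is one reason hundreds of rules can be needed.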

Dent and his colleagues also tried to use their program to distinguish between rhetorical questions and those that require a response. Businesses could use such a program to find what people are asking about their products. In a recent trial, their program classified 68 percent of 2,304 tweets correctly. “For a brand-new field, that sounds like a decent first attempt,” says Jeffrey Ellen of the Space and Naval Warfare Systems Command, which provides intelligence technology to the U.S. Navy.

Although Twitter-trawling technology is not yet ready to deploy, as a field, “it’s getting there pretty quickly,” Martin says. Once it matures, researchers should have access to an unprecedented trove of data about human behavior. For the first time in history, “watercooler talk” is recorded and publicly available, Ellen says. “A hundred years ago we just didn’t know what everybody was thinking.”

This article was published with the title “Parsing the Twitterverse: New Algorithms Analyze Tweets” in Scientific American Magazine Vol. 305 No. 2
doi:10.1038/scientificamerican082011-2O9yHmrxt9lY2uyQSN0RNK
