
Image: Illustrations by Thomas Fuchs
Researchers have been trolling Twitter for insights into the human condition since shortly after the site launched in 2006. In aggregate, the service provides a vast database of what people are doing, thinking and feeling. But the research tools at scientists’ disposal are highly imperfect. Keyword searches, for example, return many hits but offer a poor sense of overall trends.
When computer scientist James H. Martin of the University of Colorado at Boulder searched for tweets about the 2010 earthquake in Haiti, he found 14 million. “You can’t hire grad students to read them all,” he says. Researchers need a more automated approach.
One promising method is to develop programs that label the words in tweets with their grammatical categories and roles (such as noun, verb, subject and object) and then use those tags to determine what each tweet is about. This approach, a form of natural-language processing, is not a new idea, but applying it to short social text is a new and fast-growing effort. “That is just a huge area right now,” Martin says.
Scientists at the Xerox-owned Palo Alto Research Center recently developed one such program. It relies on text processors, called parsers, which are typically tested on news articles. Parsers can distinguish between words and punctuation, label parts of speech and analyze a sentence’s grammatical structure. But “they don’t do as well on Twitter,” says Kyle Dent, one of the Palo Alto researchers. He and his co-author wrote hundreds of rules to account for hashtags, repeated letters (as in “pleaaaaaase”) and other linguistic features that are rare in the Wall Street Journal. They will present their work on August 8 at an Association for the Advancement of Artificial Intelligence conference in San Francisco.
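To make the idea of such rules concrete, here is a minimal sketch in Python of a few cleanup steps that could run before a standard parser sees a tweet. It is not the Palo Alto researchers' rule set, which the article does not describe in detail. The function name `normalize_tweet`, the `USER` and `URL` placeholders, and the handling of mentions and links are assumptions made for illustration; only hashtags and repeated letters come from the article.

```python
import re

# Hypothetical rules for illustration only; not the researchers' actual rules.
URL = re.compile(r"https?://\S+")          # web links (assumed rule)
MENTION = re.compile(r"@\w+")              # @username mentions (assumed rule)
HASHTAG = re.compile(r"#(\w+)")            # hashtags, as named in the article
REPEATS = re.compile(r"([A-Za-z])\1{2,}")  # a letter repeated three or more times


def normalize_tweet(text: str) -> str:
    """Rewrite tweet-specific features into forms a news-trained parser may handle better."""
    text = URL.sub("URL", text)       # replace each link with a placeholder token
    text = MENTION.sub("USER", text)  # replace each mention with a placeholder token
    text = HASHTAG.sub(r"\1", text)   # drop the '#' but keep the word
    text = REPEATS.sub(r"\1", text)   # collapse a long run of one letter to a single letter
    return text


print(normalize_tweet("pleaaaaaase fix #twitter"))  # prints: please fix twitter
```

Collapsing every long run to a single letter is a deliberate simplification: it recovers “please” from “pleaaaaaase” but would turn “coooool” into “col,” which hints at why a realistic system needs many more rules, or a dictionary check, than this sketch contains.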
Dent and his colleagues also tried to use their program to distinguish between rhetorical questions and those that require a response. Businesses could use such a program to find out what people are asking about their products. In a recent trial, their program classified 68 percent of 2,304 tweets correctly. “For a brand-new field, that sounds like a decent first attempt,” says Jeffrey Ellen of the Space and Naval Warfare Systems Command, which provides intelligence technology to the U.S. Navy.
Although Twitter-trawling technology is not yet ready to deploy, as a field, “it’s getting there pretty quickly,” Martin says. Once it matures, researchers should have access to an unprecedented trove of data about human behavior. For the first time in history, “watercooler talk” is recorded and publicly available, Ellen says. “A hundred years ago we just didn’t know what everybody was thinking.”
This article was originally published with the title Parsing the Twitterverse.
2 Comments
"Trolling can be phonetically confused with trawling, a different method of fishing where a net (trawl) is drawn through the water instead of lines. Trolling is used both for recreational and commercial fishing whereas trawling is used mainly for commercial fishing."
Trolling on the Internet is something else entirely of course.
Good article, Francie. Companies have been using NLP to analyze unstructured content (ie, text) for several years, although the explosion in social media has really pushed innovation into high gear. For example, Attensity’s products parse not only text from internal sources (call center notes, surveys, emails) but external sources as well, including blogs and social media. As of last October, Attensity even offers real-time access to Twitter’s “Firehose” or complete data stream (http://bit.ly/aCGs3V), and is now working on offering the capability to parse and analyze the whole Twitterverse.