Five hundred million tweets are broadcast worldwide every day on Twitter. Because tweets reveal so many details about users' personal lives, the social media site is a trove of data for scientists looking to find patterns in human behavior, tease out risk factors for health conditions and track the spread of infectious diseases. By analyzing emotional cues found in the tweets of pregnant women, for instance, Microsoft researchers developed an algorithm that predicts which of them are at risk for postpartum depression. And the U.S. Geological Survey uses Twitter to track the location of earthquakes as people tweet about tremors.
Until now, most interested scientists have been working with a limited number of tweets. Although a majority of tweets are public, scientists who want to search them for free must go through Twitter's application programming interface, which currently scours only 1 percent of the archive. But that is about to change: in February the company announced that it will make all its tweets, dating back to 2006, freely available to researchers. Now that everything is up for grabs, the use of Twitter as a research tool is likely to skyrocket. With more data points to mine, scientists can ask more complex and specific questions.
The announcement is exciting, but it also raises some thorny questions. Will Twitter retain any legal rights to scientific findings? Is the use of Twitter as a research tool ethical, given that its users do not intend to contribute to research?
To address these concerns, Caitlin Rivers and Bryan Lewis, computational epidemiologists at Virginia Tech, published guidelines for the ethical use of Twitter data in February. Among other things, they suggest that scientists never reveal screen names and that they make their research objectives publicly available. Their reasoning is that although it is considered ethical to collect information from public spaces (and Twitter is a public space), it would be unethical to share identifying details about a single user without his or her consent. Rivers and Lewis argue that it is crucial for scientists to consider and protect users' privacy as Twitter-based research projects multiply. With great data comes great responsibility.