60-Second Science

Twitter Reveals the Geographic Distribution of Languages

Location-tagged tweets enabled researchers to calculate the dominant language of any given region, down to neighborhoods in New York City. Sophie Bushwick reports.

About six million people worldwide post to Twitter, producing some 650,000 new tweets daily. And one percent of these posts include geographic locations. The combination of language and location has allowed scientists to calculate the dominant language of any given region. They presented their work at the American Physical Society's March Meeting. [Delia Mocanu et al., Language Geography from Microblogging Platforms]

The researchers gained free access to a tenth of all tweets, which they ran through an automated language detector. Throwing the detected languages onto a map revealed highly accurate borders between European countries, a good proof of concept for the effort. On a much smaller scale, Twitter language geography picked out small pockets of Korean and Russian speakers within New York City.
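
The basic idea of turning language-tagged, location-tagged tweets into a "dominant language" map can be sketched in a few lines of Python. This is a minimal illustration, not the researchers' actual pipeline. It assumes each tweet already carries a detected language code, and it groups tweets into a hypothetical fixed-size latitude/longitude grid. The function names, the `cell_deg` parameter and the sample points are all made up for illustration.

```python
from collections import Counter, defaultdict
from math import floor


def cell_of(lat, lon, cell_deg=0.5):
    """Map a coordinate to a grid cell (hypothetical fixed-degree grid)."""
    return (floor(lat / cell_deg), floor(lon / cell_deg))


def dominant_languages(tweets, cell_deg=0.5):
    """Return {cell: most common language} for (lat, lon, lang) tuples.

    Assumes `lang` was already assigned by some language detector.
    """
    counts = defaultdict(Counter)
    for lat, lon, lang in tweets:
        counts[cell_of(lat, lon, cell_deg)][lang] += 1
    # Pick the most frequent language in each cell.
    return {cell: c.most_common(1)[0][0] for cell, c in counts.items()}


# Illustrative points only (roughly Paris and Berlin), not real data.
tweets = [
    (48.85, 2.35, "fr"), (48.86, 2.34, "fr"), (48.85, 2.36, "en"),
    (52.52, 13.40, "de"), (52.51, 13.41, "de"),
]
print(dominant_languages(tweets))  # {(97, 4): 'fr', (105, 26): 'de'}
```

A finer grid (a smaller `cell_deg`) would be needed to resolve neighborhood-scale pockets like those in New York City, at the cost of fewer tweets per cell.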

The Twitter tracking method has its biases. English is the dominant language of the internet, which skews the language distribution in bilingual cities like Montreal. And, obviously, the scientists can't analyze areas where people don't use Twitter. But overall, the study shows that Twitter can provide cheap and useful information about where languages are spoken. In other words, people tend to tweet in the language they speak.

—Sophie Bushwick

[The above text is a transcript of this podcast.]