60-Second Science

Twitter Reveals Language Geographic Distribution

Location-tagged tweets enabled researchers to calculate the dominant language of any given region, down to neighborhoods in New York City. Sophie Bushwick reports.

About six million people worldwide post to Twitter, producing some 650,000 new tweets daily. And one percent of these posts include geographic locations. The combination of language and location has allowed scientists to calculate the dominant language of any given region. They presented their work at the American Physical Society's March Meeting. [Delia Mocanu et al., Language Geography from Microblogging Platforms]

The researchers gained free access to a tenth of all tweets, which they ran through an automated language detector. Throwing Twitter languages onto a map revealed highly accurate borders for European countries, a good proof of concept for the effort. On a much smaller scale, Twitter language geography reflected the small pockets of Korean and Russian concentrations within New York City.

The Twitter tracking method has its biases. English is the dominant language of the internet, which skews the language distribution in bilingual cities like Montreal. And, obviously, the scientists can't analyze areas where people don't use Twitter. But overall, the study shows that Twitter can provide cheap and useful information. In other words, people tend to speak what they tweet.

—Sophie Bushwick

[The above text is a transcript of this podcast.]
