About six million people worldwide post to Twitter, producing some 650,000 new tweets daily, and about one percent of those posts include a geographic location. Combining language and location has allowed scientists to determine the dominant language of any given region. They presented their work at the American Physical Society's March Meeting. [Delia Mocanu et al., Language Geography from Microblogging Platforms]
The researchers gained free access to one-tenth of all tweets, which they ran through an automated language detector. Plotting the tweets' languages on a map revealed linguistic borders that closely matched the national borders of European countries, a good proof of concept for the effort. On a much smaller scale, Twitter's language geography picked out small pockets of Korean and Russian speakers within New York City.
The Twitter tracking method has its biases. English is the dominant language of the internet, which skews the measured language distribution in bilingual cities like Montreal. And, obviously, the scientists can't analyze areas where people don't use Twitter. But overall, the study shows that Twitter can provide cheap, useful information about where languages are spoken. In other words, people tend to speak what they tweet.
[The above text is a transcript of this podcast.]