3 Words Mislead Online Regional Mood Analysis
Analyzing keywords on Twitter can offer a loose measure of the subjective well-being of a community, as long as you don’t count three words: good, love and LOL.
You can tell a lot about people’s general state of mind based on their social media feeds. Are they always tweeting about their biggest peeves or posting pics of particularly cute kitties? Well, in a similar fashion, researchers are turning to Twitter for clues about the overall happiness of entire geographic communities. What they’re finding is that regional variation in the use of common phrases produces predictions that don’t always reflect the local state of well-being. But removing from their analyses just three specific terms—good, love and LOL—greatly improves the accuracy of the methods. Their work appears in the Proceedings of the National Academy of Sciences.
“We’re living in a crazy COVID-19 era. And now more than ever, we’re using social media to adapt to a new normal and reach out to the friends and family that we can’t meet face-to-face.”
Kokil Jaidka studies computational linguistics at the National University of Singapore.
“But our words aren’t useful just to understand what we, as individuals, think and feel. They’re also useful clues about the community we live in.”
One of the simpler methods that many scientists use to parse the data involves correlating words with positive or negative emotions. But when those tallies are compared with phone surveys that assess regional well-being, Jaidka says, they don’t paint an accurate picture of the local zeitgeist.
To find out why, Jaidka and her colleague Johannes Eichstaedt of Stanford University analyzed billions of tweets from around the United States. And they found that among the most frequently used terms on Twitter are LOL, love and good.
“And they actually throw the analysis off. In fact, when we removed these three words alone, we managed to improve upon the simpler word-counting methods—and obtain better, if not perfect, estimates of happiness.”
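To make the word-counting idea concrete, here is a minimal sketch in Python. It is not the researchers' code: the tiny word lists, the scoring formula and the sample tweets are all hypothetical stand-ins, chosen only to show how tallying "positive" and "negative" words works and how leaving out good, love and LOL can change the result.

```python
import re
from collections import Counter

# Hypothetical toy lexicon, NOT the dictionary used in the study.
POSITIVE = {"good", "love", "lol", "excited", "fun", "great", "fantastic"}
NEGATIVE = {"sad", "angry", "awful", "hate"}

# The three terms the researchers report removing from their analysis.
EXCLUDED = frozenset({"good", "love", "lol"})


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def mood_score(tweets, excluded=frozenset()):
    """Return (positive - negative) word count divided by total words.

    Words listed in `excluded` are dropped before anything is counted.
    This formula is an illustrative assumption, not the paper's metric.
    """
    counts = Counter(
        token
        for tweet in tweets
        for token in tokenize(tweet)
        if token not in excluded
    )
    total = sum(counts.values())
    if total == 0:
        return 0.0
    positive = sum(counts[word] for word in POSITIVE)
    negative = sum(counts[word] for word in NEGATIVE)
    return (positive - negative) / total


# Made-up sample tweets for illustration only.
tweets = [
    "LOL this traffic is awful",
    "lol I hate Mondays",
    "Excited for a fantastic weekend",
]

raw = mood_score(tweets)
filtered = mood_score(tweets, excluded=EXCLUDED)
print(f"raw score: {raw:.3f}, without good/love/LOL: {filtered:.3f}")
```

In this toy sample, the two sarcastic uses of LOL push the raw score above zero, while the filtered score falls to zero because the remaining positive and negative words cancel out. Real analyses work with far larger dictionaries and with regional aggregates compared against survey data.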
Why the disconnect? Well, Jaidka says one issue is ...
“Internet language is really a different beast than regular spoken language. We’ve adapted words from the English vocabulary to mean different things in different situations.”
Take, for example, LOL.
“I’ve tweeted the word LOL to flirt, express irony, annoyance and sometimes just pure surprise. When the methods for measuring LOL as a marker of happiness were created in the 1990s, it still meant laughing out loud.”
There are plenty of terms that are less misleading, says Eichstaedt.
“Our models tell us that words like excited, fun, great, opportunity, interesting, fantastic and those are better words for measuring subjective well-being, just looking at the data.”
[Kokil Jaidka et al., Estimating geographic subjective well-being from Twitter: A comparison of dictionary and data-driven language methods, Proceedings of the National Academy of Sciences]
Being able to get an accurate read on the mood of the population is no laughing matter.
“That’s particularly important now, in the time of COVID, where we’re expecting a mental health crisis—and we’re already seeing in survey data the largest diminishment in subjective well-being in 10 years at least, if not ever.”
No doubt we could all use more fantastic opportunities for great fun and excitement—give or take the LOL.
[The above text is a transcript of this podcast.]