Like sex, money is a topic that most people avoid discussing publicly. Yet we regularly leave digital traces of our economic standing—even when expressing ourselves within Twitter's 140-character limit.
In an analysis of roughly 10.8 million tweets posted by more than 5,000 users of the online social media network, the pithy messages were found to provide enough information to reveal a user's income bracket. Daniel Preoțiuc-Pietro, a postdoctoral researcher in natural language processing at the University of Pennsylvania, and his colleagues relied on self-identified profession to sort 90 percent of their sample into corresponding income groups. They then used a machine-learning model, which can learn from data and make predictions based on them, to identify features unique to each group. When they tested the savvy model on the remaining 10 percent of subjects, it successfully predicted the financial means of those users.
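For readers curious about the mechanics, the train-on-90-percent, test-on-10-percent procedure can be illustrated with a toy sketch. This is not the researchers' model: the word-counting classifier (a simple naive Bayes), the ten made-up users in `USERS` and the function names `split_users`, `train` and `predict` are all assumptions chosen only to show the general idea.

```python
import math
import random
from collections import Counter, defaultdict

# Hypothetical toy data: (tweet text, income bracket). Not from the study.
USERS = [
    ("quarterly earnings beat forecasts", "higher"),
    ("election debate on tax policy", "higher"),
    ("nonprofit board meeting tonight", "higher"),
    ("new business strategy for the election year", "higher"),
    ("policy report on earnings and markets", "higher"),
    ("beauty tips for the weekend", "lower"),
    ("so tired after my shift today", "lower"),
    ("best makeup and beauty haul", "lower"),
    ("movie night with my friends", "lower"),
    ("weekend plans with friends and family", "lower"),
]

def split_users(users, train_fraction=0.9, seed=0):
    """Shuffle users, then split them into training and held-out test sets."""
    shuffled = users[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def train(users):
    """Count how often each word appears in each income bracket."""
    word_counts = defaultdict(Counter)
    bracket_counts = Counter()
    for text, bracket in users:
        bracket_counts[bracket] += 1
        word_counts[bracket].update(text.lower().split())
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, bracket_counts, vocab

def predict(model, text):
    """Return the bracket whose word statistics best match the text."""
    word_counts, bracket_counts, vocab = model
    total_users = sum(bracket_counts.values())
    best, best_score = None, float("-inf")
    for bracket in sorted(bracket_counts):
        # Start from how common the bracket is, then add per-word evidence.
        score = math.log(bracket_counts[bracket] / total_users)
        total_words = sum(word_counts[bracket].values())
        for word in text.lower().split():
            # Add-one smoothing so unseen words do not zero out the score.
            count = word_counts[bracket][word] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best, best_score = bracket, score
    return best

train_users, test_users = split_users(USERS)
model = train(train_users)
correct = sum(predict(model, text) == bracket for text, bracket in test_users)
print(f"held-out accuracy: {correct}/{len(test_users)}")
```

With only ten invented users the held-out score means nothing; the point is the shape of the procedure, in which the model never sees the test users while it is learning.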
As the researchers described this fall in the journal PLOS ONE, those with higher incomes tended to discuss business, politics and nonprofit work. People in lower brackets stuck mostly to personal subjects, such as beauty tips and experiences. “Higher-income people are using Twitter as a means of disseminating information; lower-income people use it more for social communication,” Preoțiuc-Pietro says. The analysis also revealed that tweets from those who make more money are likelier to express fear or anger.
In previous machine-learning studies, Preoțiuc-Pietro and his colleagues were able to predict Twitter users' gender, age and political leaning. They could even detect signs of postpartum depression and post-traumatic stress disorder in tweets. The team continues to develop its model, but in the end “machine learning is only as powerful as the data we can get access to,” Preoțiuc-Pietro says. “People should be aware of how much they inadvertently disclose about themselves.”