AI Tool Predicts Whether Online Health Misinformation Will Cause Real-World Harm

A new AI-based analytical technique reveals that the specific phrasing of Reddit misinformation posts predicted real-world rejection of COVID vaccinations

The flood of misinformation online inevitably produces adverse consequences in key measures of public health—and death from COVID among unvaccinated people stands out as perhaps the most prominent example. That cause-and-effect relationship—that scrolling through endless posts about hydroxychloroquine, ivermectin and vaccine conspiracies can lead people astray—seems more than obvious. But it isn’t straightforward to determine scientifically.

Clear linkages between misinformation and adverse consequences have proved very difficult to find—partly because of the complexity of analyzing the workings of a public health system and partly because most social media companies do not usually allow independent outside parties to analyze their data. One exception to keeping data off-limits is Reddit, a platform that has begun to emerge as a place where, with the company’s blessing, social media research can flourish. Now studies using Reddit posts may be putting scientists closer to finding the misinformation missing link.

A new analytical framework that combines elements of social psychology with the computational power of a large language model (LLM) could help bridge the gap between online rhetoric and real-world behavior. The results were recently posted to the preprint server arXiv.org and presented at the Association for Computing Machinery CHI Conference on Human Factors in Computing Systems in Hawaii this week.



Eugenia Rho, a computer scientist at Virginia Tech and senior author of the new study, wanted to pin down whether a tie exists between people’s behavior and the type of language encountered on a site such as Reddit.

Along with her Ph.D. student Xiaohan Ding and their colleagues, Rho began her research by first tracking down thousands of Reddit posts from banned forums opposing vaccines and COVID prevention measures. Next, the team trained an LLM to recognize the “gist” of each post—the message’s underlying meaning, as opposed to the literal words it is composed of. “That’s sort of the secret sauce here,” says Valerie Reyna, a psychologist at Cornell University and a co-author of the study.

“Fuzzy-trace theory” suggests that people pay more attention to the implications of a piece of information than to its literal meaning. This helps explain why people are more likely to remember an anecdote about someone getting robbed than a dry statistic about crime rates or why gamblers are more apt to place a bet when folding is framed as possibly losing them money rather than potentially gaining it. “People are more moved by certain kinds of messages than others,” says Reyna, who helped pioneer fuzzy-trace theory in the 1990s.

This careful choice of wording enhances persuasiveness. “Over and over and over, studies show that language in the form of a gist is stickier,” Rho says. Her team’s analysis found that in the context of social media, this seems to be especially true for causal gists, or information that implies a direct link between two events. A post might link vaccination to getting sick using wording that packs a lot of rhetorical punch. For example, one Reddit user posted, “Had my Pfizer jab last [Wednesday] and have felt like death since.” Rho’s team found that every time the causal gists in anti-COVID posts grew stronger, COVID hospitalizations and deaths spiked nationwide, a pattern that held even after the forums themselves were banned. The researchers pulled their data from nearly 80,000 posts spanning 20 subreddits active between May 2020 and October 2021.

By using this newly developed framework to monitor social media activity, scientists might be able to anticipate the real-world health outcomes of future pandemics—or even other major events, such as elections. “In principle, it can be applied to any context in which decisions are made,” Reyna says.

But such a framework might not make equally good predictions across the board. “When there is no discernible gist, the approach might be less successful,” says Christopher Wolfe, a cognitive psychologist at Miami University in Ohio, who was not involved in the study. This could be the case when studying the behavior of people seeking treatment for common health issues, such as breast cancer, or trying to view sporadic, ephemeral events, such as auroras.

And the approach doesn’t necessarily distinguish what specific type of cause-and-effect relationship exists. “It seems that gists from social media may predict health decisions and outcomes, but the reverse is true as well,” says Rebecca Weldon, a cognitive psychologist at SUNY Polytechnic Institute, who did not contribute to the new research. The finding suggests, rather, that the relationship between social media rhetoric and real-world behavior may be more of a feedback loop, with each side strengthening and reinforcing the other.

Both Wolfe and Weldon praised the authors for their innovative analytical approach. Wolfe calls the framework a potential “game changer” for helping navigate complex information ecosystems online. And Rho’s team hopes that it can help large social media companies and public health officials come together to develop more effective strategies for moderating content. After all, being able to identify the type of misinformation most able to influence people’s behavior is the first step toward being able to combat it.
