A Brain Scanner Combined with an AI Language Model Can Provide a Glimpse into Your Thoughts

New technology gleans the gist of stories a person hears while lying in a brain scanner


A video presents a stylized depiction of a new language decoding process. A decoder generates multiple word sequences (paper strips) and predicts how similar each candidate word sequence is to the actual word sequence (beads of light) by comparing predictions of the user’s brain responses against the actual recorded responses.

Functional magnetic resonance imaging (fMRI) captures coarse, colorful snapshots of the brain in action. Although this specialized type of magnetic resonance imaging has transformed cognitive neuroscience, it isn’t a mind-reading tool: neuroscientists can’t look at a brain scan and know what someone was seeing, hearing or thinking in the scanner.

But gradually scientists have been pushing against that fundamental barrier to translate internal experiences into words through brain imaging. This technology could help people who can’t speak or otherwise outwardly communicate, such as those who have suffered strokes or are living with amyotrophic lateral sclerosis. Current brain-computer interfaces require the implantation of devices in the brain, but neuroscientists hope to use noninvasive techniques such as fMRI to decipher internal “speech” without the need for surgery.

In recent years researchers have taken a step forward by combining fMRI’s ability to monitor neural activity with the predictive power of artificial intelligence language models. The hybrid technology has resulted in a decoder that can reproduce, with a surprising level of accuracy, the stories that a person listened to or imagined telling while in the scanner. The decoder was even able to guess the story behind a short film that someone watched in the scanner, though with less accuracy.


“There’s a lot more information in brain data than we initially thought,” said Jerry Tang, a computational neuroscientist at the University of Texas at Austin and the study’s lead author, during a press briefing. The research, published in May 2023 in Nature Neuroscience, is what Tang describes as “a proof of concept that language can be decoded from noninvasive recordings of brain activity.”


The decoder technology is in its infancy. It must be trained extensively for each person who uses it, and it doesn’t construct an exact transcript of the words they heard or imagined. But it is still a notable advance. Researchers now know that the AI language system, an early relative of the model behind ChatGPT, can help them make informed guesses about the words that evoked brain activity just by looking at fMRI brain scans. Current technological limitations prevent the decoder from being widely used, for good or ill, but the study authors emphasize the need to enact proactive policies that will protect the privacy of one’s internal mental processes.

“What we’re getting is still kind of a ‘gist,’ or more like a paraphrase, of what the original story was,” says Alexander Huth, a computational neuroscientist at the University of Texas at Austin and the study’s senior author.

Here’s an example of what one participant heard, as transcribed in the paper: “I got up from the air mattress and pressed my face against the glass of the bedroom window expecting to see eyes staring back at me but instead finding only darkness.” Inspecting the person’s brain scans, the model went on to decode, “i just continued to walk up to the window and open the glass i stood on my toes and peered out i didn’t see anything and looked up again i saw nothing.”

“Overall, there is definitely a long way to go, but the current results are better than anything we had before in fMRI language decoding,” says Anna Ivanova, a neuroscientist at the Georgia Institute of Technology, who was not involved in the study.

The model misses a lot about the stories it decodes. It struggles with grammatical features such as pronouns. It can’t decipher proper nouns such as names and places, and sometimes it simply gets things wrong altogether. But it achieves a high level of accuracy compared with past methods: at 72 to 82 percent of the time points in the stories, the decoder captured the meaning of the original words more accurately than would be expected by chance.

“The results just look really good,” says Martin Schrimpf, a computational neuroscientist at EPFL, the Swiss Federal Institute of Technology in Lausanne, who was not involved in the study. Previous attempts to use AI models to decode brain activity showed some success but eventually hit a wall. Tang’s team used “a much more accurate model of the language system,” Schrimpf says. That model is GPT-1, which came out in 2018 and was the first in the series of GPT models that led to GPT-4o, the model that now underpins ChatGPT.


Neuroscientists have been working to decipher fMRI brain scans for decades to connect with people who can’t outwardly communicate. In a key 2010 study, scientists used fMRI to pose yes-or-no questions to an individual who couldn’t control his body and outwardly appeared to be unconscious.

But decoding entire words and phrases is a more imposing challenge. The biggest roadblock is fMRI itself, which doesn’t directly measure the brain’s rapid firing of neurons but instead tracks the slow changes in blood flow that supply those neurons with oxygen. Tracking these relatively sluggish changes leaves fMRI scans temporally “blurry”: picture a long-exposure photograph of a bustling city sidewalk, with people’s facial features obscured by their movement. Trying to use fMRI images to determine what happened in the brain at any particular moment is like trying to identify the individuals in that photograph. This problem is a glaring one for deciphering language, which flies by fast; a single fMRI image captures the brain’s responses to up to about 20 words.
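
As rough arithmetic suggests where that figure comes from (the numbers below are illustrative back-of-envelope values of ours, not measurements from the study):

```python
# Back-of-envelope arithmetic behind the "about 20 words" figure
# (illustrative numbers, not measurements from the study):
words_per_second = 150 / 60   # conversational speech, ~150 words per minute
smear_seconds = 8             # rough span of the sluggish blood-flow response
print(words_per_second * smear_seconds)  # -> 20.0 words blurred into one image
```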

It appears, however, that the predictive abilities of AI language models can help. In the study by Tang and his colleagues, three participants lay stock-still in an fMRI scanner for 15 sessions that totaled 16 hours. Through headphones, they listened to excerpts from podcasts and radio shows such as the Moth Radio Hour and the New York Times podcast Modern Love. Meanwhile the scanner tracked the blood flow across different language-related regions of the brain. These data were then used to train an AI model that found patterns in how each subject’s brain activated in response to certain words and concepts.
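
The article doesn’t detail that pattern-finding step, but a common approach in this line of work is a linear “encoding model”: a regression that learns to predict each brain voxel’s response from language-model features of the words a subject heard. The sketch below illustrates the idea under that assumption; every name, shape and number in it is ours, not the study’s.

```python
# A minimal sketch of the pattern-finding step, assuming it works like a
# linear "encoding model" (ridge regression) that predicts each voxel's
# response from language-model features of the words a subject heard.
# All names, shapes and numbers here are illustrative, not the study's.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Toy stand-ins for real data: 1,000 fMRI images (one every ~2 s) across
# 50 language-related voxels, plus a 256-dimensional feature vector
# summarizing the words heard around the time each image was taken.
n_images, n_voxels, n_features = 1_000, 50, 256
word_features = rng.standard_normal((n_images, n_features))
bold_responses = rng.standard_normal((n_images, n_voxels))

# Blood flow lags neural activity by several seconds, so stack delayed
# copies of the features and let the regression learn the lag structure.
# (np.roll wraps around at the edges, which is fine for a toy example.)
delays = [1, 2, 3, 4]  # in units of fMRI images, ~2 s apiece
lagged = np.hstack([np.roll(word_features, d, axis=0) for d in delays])

# One ridge regression fits all voxels at once: each voxel gets its own
# weight vector describing which word features drive its activity.
encoding_model = Ridge(alpha=10.0)
encoding_model.fit(lagged, bold_responses)

# Given candidate text, the fitted model predicts the brain response that
# text *should* evoke -- the quantity the decoder compares to real scans.
print(encoding_model.predict(lagged).shape)  # (1000, 50)
```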

After uncovering these patterns, the model took a new series of brain images and predicted what a person was hearing when they were taken. It worked gradually through the story, comparing the new scans against the brain responses predicted for a host of candidate words. To avoid checking every word in the English language, the researchers used GPT-1 to predict which words were most likely to appear in a particular context. This step created a small pool of possible word sequences, from which the most likely candidate could be chosen. The decoder then moved on to the next string of words, and so on, until it had decoded an entire story.
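
In code, that generate-and-score loop might look like the sketch below. It is a schematic reconstruction of the procedure as described, not the authors’ implementation; `propose_continuations` and `predict_bold` are hypothetical stand-ins for the GPT-1 proposal step and the trained encoding model.

```python
# Schematic reconstruction of the decoding loop described above: a language
# model proposes continuations, an encoding model predicts the brain response
# each candidate should evoke, and candidates whose predictions best match
# the new scan survive. Every function here is a hypothetical stand-in.
import numpy as np

def decode_story(scans, propose_continuations, predict_bold, beam_width=10):
    """Beam-style decoding over a sequence of fMRI images."""
    beam = [("", 0.0)]  # (candidate transcript so far, cumulative score)
    for scan in scans:  # one fMRI image covers a couple of seconds of speech
        scored = []
        for text, score in beam:
            # The GPT-style model suggests a small pool of plausible next
            # words, so the whole English vocabulary never has to be tested.
            for continuation in propose_continuations(text):
                candidate = f"{text} {continuation}".strip()
                predicted = predict_bold(candidate)  # expected brain response
                # Score by similarity (here, correlation) between predicted
                # and actually recorded activity for this scan.
                similarity = np.corrcoef(predicted, scan)[0, 1]
                scored.append((candidate, score + similarity))
        # Keep the best few candidates and move on to the next scan.
        beam = sorted(scored, key=lambda p: p[1], reverse=True)[:beam_width]
    return beam[0][0]  # the most likely transcript of the whole story

# Toy demo wiring in dummy stand-ins for the two models:
rng = np.random.default_rng(1)
dummy_scans = rng.standard_normal((5, 50))  # five images, 50 voxels each
vocabulary = ["i", "walked", "to", "the", "window", "and", "looked", "out"]
print(decode_story(
    dummy_scans,
    propose_continuations=lambda text: rng.choice(vocabulary, 5, replace=False),
    predict_bold=lambda text: rng.standard_normal(50),
))
```

Note that in this scheme the language model never reads the brain data directly: it only narrows the search, while the predicted-versus-actual comparison, as depicted in the video above, does the real decoding.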

The researchers used the same methods to decode stories that participants only imagined telling. They instructed participants to picture themselves narrating a detailed, one-minute story. The decoder was less accurate on this task, but it still performed well above chance. This outcome indicates that imagining something and actually perceiving it engage similar brain regions. The ability to translate imagined speech into words is critical for designing brain-computer interfaces for people who are unable to communicate with spoken language.

What’s more, the findings went beyond language. In the most surprising result, the researchers had people watch animated short films without sound in the scanner. Despite being trained solely on spoken language, the decoder could still decipher stories from brain scans of participants watching the silent movies. “I was more surprised by the video than the imagined speech,” Huth says, because the movies were muted. “I think we are decoding something that is deeper than language,” he said at the press briefing.

Still, the technology is many years away from being used as a brain-computer interface in everyday life. For one thing, the scanning technology isn’t portable—fMRI machines occupy entire rooms at hospitals and research institutions and cost millions of dollars. But Huth’s team is working to adapt these findings for existing brain-imaging systems that can be worn like a cap, such as functional near-infrared spectroscopy and electroencephalography.

The technology used in the study also requires intense customization, with hours of fMRI data needed for each individual. “It’s not like earbuds, where you can just put them in, and they work for you,” Schrimpf says. With each user, the AI models need to be trained to “adapt and adjust to your brain,” he adds. Schrimpf guesses that the technology will require less customization in the future as researchers uncover commonalities across people’s brains. Huth, in contrast, thinks that more accurate models will be more detailed and will therefore require even more precise customization.

The team also tested the technology to see what might happen if someone wanted to resist or sabotage the scans. A study participant could spoof the decoder simply by telling another story in their head. When the researchers asked participants to do this, the results were gibberish, Huth says. “[The decoder] just kind of fell apart entirely.”

Even at this early stage, the authors emphasize the importance of considering policies that protect the privacy of our inner words and thoughts. “This can’t work yet to do really nefarious things,” Tang says, “but we don’t want to let it get to that point before we maybe have policies in place that would prevent that.”

Allison Parshall is associate editor for mind and brain at Scientific American and she writes the weekly online Science Quizzes. As a multimedia journalist, she contributes to Scientific American's podcast Science Quickly. Parshall's work has also appeared in Quanta Magazine and Inverse. She graduated from New York University's Arthur L. Carter Journalism Institute with a master's degree in science, health and environmental reporting. She has a bachelor's degree in psychology from Georgetown University.

This article was published with the title “AI Mind Reader” in SA Special Editions Vol. 34 No. 3s, p. 49
doi:10.1038/scientificamerican092025-6ZEKJzaVdTDoTa0iSWFcCX
