Oh, to be a fly on the auditory cortex!
That, in a manner of speaking, is exactly what a group of researchers working in Berkeley and San Francisco have done. Measurements of electrical signals in the region of the brain that processes speech enabled the group to decode the words a subject was hearing—in essence, a form of neural eavesdropping.
The goal was far nobler than finding out what your boss really thinks of you or what is going on in the neighboring cubicle. The research sheds light on how the brain sorts out sounds and turns them into language. "The hope," says Brian Pasley, a post-doctoral researcher at the University of California, Berkeley, and lead author on the study, "is that this knowledge can be utilized to help restore communication in the severely disabled." The work could complement other efforts to reconstruct speech using muscle movements in the vocal tract, lips and tongue.
The researchers—who also hailed from the University of Maryland, College Park, Johns Hopkins University and the University of California, San Francisco—published their work today in PLoS Biology.
During the experiment, subjects listened to words played over a loudspeaker or piped through earbuds: sometimes isolated words like "jazz" or "property"; sometimes pseudo-words like "fook" and "nim"; and in a few cases full sentences. Later, the research team studied recordings of the brain activity these sounds evoked in the auditory cortex, the region that processes what is heard and allows the comprehension of language, along with other sounds.
The subjects, 15 volunteers with normal language skills, also happened to be undergoing neurosurgical treatments for epilepsy or brain tumors. Because their brain activity was already being monitored at the cortex's surface for seizures, the researchers could examine these direct cortical measurements for their auditory study. Pasley explains that it would have been impossible to access such direct recordings without these volunteers.
Pasley and colleagues crafted an algorithm, a computational model, to map the sound a listener was hearing to the electrodes' measurements. The model could then "learn" how to match sound to the brain's electrical signals.
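The team's actual code is not part of the paper, but a minimal sketch conveys the idea: fit a linear map between neural activity and the sound's frequency content. Everything here is a stand-in assumption: the synthetic data, the dimensions, and the use of scikit-learn's ridge regression in place of the study's own fitting procedure.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Hypothetical dimensions: T time bins, E electrodes, F frequency bands
T, E, F = 2000, 64, 32
rng = np.random.default_rng(0)

# Stand-ins for real data: cortical activity (T x E) and the spectrogram
# of the sound the subject heard (T x F)
neural = rng.standard_normal((T, E))
spectrogram = neural @ rng.standard_normal((E, F)) + 0.1 * rng.standard_normal((T, F))

# "Learn" a linear map from brain activity to the sound's spectrogram;
# each frequency band is predicted from all electrodes at once
decoder = Ridge(alpha=1.0)
decoder.fit(neural, spectrogram)

# Given brain activity, the fitted map yields an estimate of what was heard
reconstructed = decoder.predict(neural)
```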
Next, the researchers tested their model by turning the tables: starting with a listener's brain activity, they used the model to reconstruct the word that the listener had heard. Specifically, the model reconstructed a sound that resembled, but was not immediately recognizable as, a word. To close the loop, the researchers then searched a set of 47 words for the one that most closely matched the model's sound.
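A hedged sketch of that matching step: score each candidate word by how well its spectrogram correlates with the reconstruction, then keep the best scorer. The correlation measure and the toy word set below are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def best_match(reconstruction, candidates):
    """Return the candidate word whose spectrogram correlates most
    strongly with the reconstructed spectrogram."""
    flat = reconstruction.ravel()
    scores = {word: np.corrcoef(flat, spec.ravel())[0, 1]
              for word, spec in candidates.items()}
    return max(scores, key=scores.get)

# Illustrative use with made-up spectrograms for a tiny candidate set
rng = np.random.default_rng(1)
words = ["jazz", "property", "fook", "nim"]
candidates = {w: rng.standard_normal((100, 32)) for w in words}
reconstruction = candidates["jazz"] + 0.2 * rng.standard_normal((100, 32))
print(best_match(reconstruction, candidates))  # prints "jazz"
```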
Not only could the researchers successfully "eavesdrop" via cortical activity; they also created two versions of their model to account for different features of sound. One version of the computational model used a linear representation of sound, called a spectrogram, which plots sound energy across frequencies over time. The other used a nonlinear representation called a modulation model. Pasley explains that in the linear version, sound rhythms are coded by the brain's oscillations, whereas in the nonlinear version, rhythms are conveyed by the overall level of brain activity. At slow speech rhythms both models work well, but at faster rhythms the nonlinear representation yields a more accurate model.
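For a concrete sense of the linear representation, here is a short sketch of computing a spectrogram with scipy. The signal and the analysis parameters are illustrative; the modulation model goes a step further, decomposing the spectrogram into its temporal and spectral modulation rates, which is beyond this snippet.

```python
import numpy as np
from scipy.signal import spectrogram

fs = 16000  # assumed sampling rate, in Hz
t = np.arange(0, 1.0, 1 / fs)
# A stand-in for recorded speech: a tone whose pitch rises over time
audio = np.sin(2 * np.pi * (200 + 300 * t) * t)

# The linear representation: sound energy per frequency band at each moment
freqs, times, Sxx = spectrogram(audio, fs=fs, nperseg=256)
print(Sxx.shape)  # (number of frequency bands, number of time bins)
```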
This technique could eventually improve speech-recognition technology. Although smartphones do a decent job, anyone who has received a cryptic Google Voice transcription knows that speech recognition is still not perfect.