Neurological conditions that can cause paralysis, such as amyotrophic lateral sclerosis (ALS) and strokes in the brain stem, also rob many patients of their ability to speak. Assistive technologies enable keyboard control for some of these individuals (like the famed late physicist Stephen Hawking), and brain-computer interfaces make it possible for others to control machines directly with their thoughts. But both types of devices are slow and impractical for people with locked-in syndrome and other communication impairments.
Now researchers are developing tools to eavesdrop on speech-related brain activity, decode it and convert it into words spoken by a machine. A recent study used state-of-the-art machine learning and speech-synthesis technology to yield some of the most impressive results to date.
Electrical engineer Nima Mesgarani of Columbia University’s Zuckerman Institute and his colleagues studied five epilepsy patients who had electrodes implanted in or on their brains as part of their treatment. The electrodes covered regions involved in processing speech sounds. The patients listened to stories read aloud while their brain activity was recorded, and the team trained a “deep learning” neural network to match that activity with the corresponding audio. The test was then whether, given neural data it had not seen before, the system could reproduce the original speech.
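In outline, the decoding step is a supervised mapping from recorded neural activity to acoustic features that a synthesizer can turn back into sound. The sketch below illustrates that idea only; the channel count, network layout, generic acoustic targets and PyTorch framework are assumptions made for the example, not details of Mesgarani’s actual model.

```python
# Minimal sketch: learn a mapping from neural recordings to acoustic features.
# Shapes, sizes and the toy data are illustrative assumptions, not study details.
import torch
import torch.nn as nn

N_ELECTRODES = 128   # assumed number of recording channels
N_ACOUSTIC = 80      # assumed number of acoustic parameters per time frame

class NeuralToSpeech(nn.Module):
    """Regress acoustic features from windows of neural activity."""
    def __init__(self, n_in=N_ELECTRODES, n_out=N_ACOUSTIC, hidden=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_in, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_out),
        )

    def forward(self, x):        # x: (batch, frames, channels)
        return self.net(x)       # -> (batch, frames, acoustic parameters)

def train(model, neural, audio_features, epochs=10, lr=1e-3):
    """Fit the mapping on paired (neural activity, heard audio) recordings."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(neural), audio_features)
        loss.backward()
        opt.step()
    return model

# Toy stand-ins for real recordings: 8 trials, 100 time frames each.
neural = torch.randn(8, 100, N_ELECTRODES)
audio_features = torch.randn(8, 100, N_ACOUSTIC)
model = train(NeuralToSpeech(), neural, audio_features)

# At test time, unseen neural data is passed through the trained model and the
# predicted acoustic frames are handed to a synthesizer to produce audible speech.
with torch.no_grad():
    reconstructed = model(torch.randn(1, 100, N_ELECTRODES))
```

In the study itself, the network’s output drives a vocoder rather than a generic synthesizer, but the underlying logic is the same: train on paired brain activity and heard audio, then decode speech from activity alone.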
When the patients heard the digits zero through nine spoken four times each, the system transformed the neural data into values needed to drive a vocoder, a special kind of speech synthesizer. A separate group of participants heard the synthesized words and identified them correctly 75 percent of the time, according to the study, published in January in Scientific Reports. Most previous efforts have not measured how well such reconstructed speech can be understood. “We show that it’s intelligible,” Mesgarani says.
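The 75 percent figure is a simple behavioral measure: the share of synthesized digits that listeners labeled correctly. A toy calculation of that score might look like the following; the data here are invented for illustration and are not the study’s.

```python
# Toy illustration of the intelligibility score: the fraction of listener
# responses that match the digit the system was reconstructing.
def intelligibility(true_digits, listener_responses):
    """Fraction of synthesized digits that listeners identified correctly."""
    correct = sum(t == r for t, r in zip(true_digits, listener_responses))
    return correct / len(true_digits)

# Hypothetical example: 8 presentations, 6 identified correctly -> 0.75.
true_digits = [0, 1, 2, 3, 4, 5, 6, 7]
listener_responses = [0, 1, 2, 9, 4, 5, 3, 7]
print(intelligibility(true_digits, listener_responses))  # 0.75
```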
Researchers already knew it was possible to reconstruct speech from brain activity, but the new work is a step toward higher performance. “There’s a lot of room for improvement, but we know the information is there,” says neurosurgeon Edward Chang of the University of California, San Francisco, who was not involved in the study. “Over the next few years it’s going to get even better—this is a field that’s evolving quickly.”
There are some limitations. Mesgarani’s team recorded brain activity from speech-perception regions, not speech-production ones; the researchers also evaluated their system on only a small set of words instead of complete sentences drawing on a large vocabulary. (Other researchers, including Chang, are already working on these problems.) Perhaps most important, the study was designed to decode activity related to speech that was actually heard rather than merely imagined—the latter feat will be required to develop a practical device. “The challenge for all of us is actual versus imagined” speech, Mesgarani says.