Why Siri Is Still the Future

Speech-recognition software is great—unless you're trying to use it on a phone

When Apple unveiled the iPhone 4S last year, the new phone looked just like the previous one. It had a better camera and a faster chip, but it could do only one new thing: Siri.

Siri, as everyone knows by now, is a software assistant that takes spoken orders. No training necessary: just hold down the “Home” button and speak casually.

Siri lit the cultural world on fire. There were YouTube parodies, how-to guides and copycat apps for Android phones. Pundits have proposed new rules of etiquette for using phones in public now that people are speaking to them even when they're not on a call. Speech recognition became all the rage; suddenly, it popped up in television sets and, of course, rival phones. At the crest of the hype, it looked like the way we interact with our gadgets had changed forever.

And then—the backlash.

“Siri Is Apple's Broken Promise” was the headline at gadget site Gizmodo. People griped that sometimes you'd dictate a whole paragraph, the phone would think and then type—nothing at all. Now there has been a class-action lawsuit asserting that Apple made false claims. (According to Apple, Siri is still in beta testing.)

What happened? How could Siri, the savior of electronics, turn out to be such a bust?

What everybody's missing is the difference between Siri, the virtual assistant, and Siri, the speech-recognition engine. As it turns out, these two different functions have wildly different track records for success.

The assistant half of Siri comes from a company called Siri, which Apple bought. (It was a spin-off from a military artificial intelligence project that wound up at the research firm SRI. Get it?)

But the dictation feature, the speech-to-text part, is provided by Nuance, the company that brought us software such as Dragon NaturallySpeaking.

When you dictate, you generate an audio file that is transmitted to Nuance's servers; they analyze your speech and send the text back to your phone. That is why, when your Internet signal isn't great or when the cell network is congested, Siri may come up short. (When you're on Wi-Fi, dictation works far better.)

That requirement to shuttle data to and from remote servers is at the heart of Siri's frustratingly inaccurate dictation talents.

There are other challenges to the dictation feature, too. Irregular background noise, wind and variable distance from mouth to microphone all make transcription perfection on a cell phone a towering task, and the results are much less accurate than what you would get using PC dictation software. Using Siri (and the even less polished dictation feature on Android phones), you might have to correct two or three errors per paragraph.

Desktop dictation software fares much better (close to 100 percent accuracy) because it doesn't have any of those particular challenges. And on your PC, you train the software to recognize only one voice: yours. There's no training on the phone, so the software has to cope with any voice it hears, and that computational task is ridiculously hard.

The backlashers have a point. We're used to consumer technology that works every time: e-mail, GPS, digital cameras. Dictation technology that relies on cellular Internet, though, only sort of works. And that can be jarring to encounter in this day and age.

But let's not throw the Siri out with the bathwater. The “virtual assistant” portion of Siri—all those commands to set an alarm, call someone, text someone, record an appointment—works solidly. Even if all you use are basic commands such as “Wake me at,” “Call,” “Text” and “Remind me,” you save time and fumbling.

Free-form cellular dictation is a not-there-yet technology. But as an interface for controlling our electronics, voice commands make the future of speech every bit as bright as Siri promised a year ago.

Just wait till she comes out of beta.

SCIENTIFIC AMERICAN ONLINE
Eight ways to boost Siri's voice recognition: ScientificAmerican.com/aug2012/pogue
