When Apple unveiled the iPhone 4S last year, the new phone looked just like the previous one. It had a better camera and a faster chip, but it could do only one new thing: Siri.
Siri, as everyone knows by now, is a software assistant that takes spoken orders. No training necessary: just hold down the “Home” button and speak casually.
Siri lit the cultural world on fire. There were YouTube parodies, how-to guides and copycat apps for Android phones. Pundits proposed new rules of etiquette for using phones in public now that people were speaking to them even when they weren't on a call. Speech recognition became all the rage; suddenly, it popped up in television sets and, of course, rival phones. At the crest of the hype, it looked like the way we interact with our gadgets had changed forever.
And then—the backlash.
“Siri Is Apple's Broken Promise” was the headline at gadget site Gizmodo. People griped that sometimes you'd dictate a whole paragraph, and the phone would think and then type nothing at all. There has even been a class-action lawsuit asserting that Apple made false claims. (According to Apple, Siri is still in beta testing.)
What happened? How could Siri, the savior of electronics, turn out to be such a bust?
What everybody's missing is the difference between Siri, the virtual assistant, and Siri, the speech-recognition engine. As it turns out, these two different functions have wildly different track records for success.
The assistant half of Siri comes from a company called Siri, which Apple bought. (It was a spin-off from a military artificial intelligence project that wound up at the research firm SRI. Get it?)
But the dictation feature (the speech-to-text part) is provided by Nuance, the company that brought us software such as Dragon NaturallySpeaking.
When you dictate, you generate an audio file that is transmitted to Nuance's servers; they analyze your speech and send the text back to your phone. That is why, when your Internet signal isn't great or when the cell network is congested, Siri may come up short. (When you're on Wi-Fi, dictation works far better.)
That requirement to shuttle data to and from remote servers is at the heart of Siri's frustratingly inconsistent dictation talents.
There are other challenges to the dictation feature, too. Irregular background noise, wind and variable distance from mouth to microphone all make transcription perfection on a cell phone a towering task. Using Siri (and the even less polished dictation feature on Android phones), you might have to correct two or three errors per paragraph.
Desktop dictation software fares much better—close to 100 percent accuracy—because it doesn't have any of those particular challenges. And on your PC, you train the software to recognize only one voice: yours. There's no training on the phone. The computational task is ridiculously hard.
The backlashers have a point. We're used to consumer technology that works every time: e-mail, GPS, digital cameras. Dictation technology that relies on cellular Internet, though, only sort of works. And that can be jarring to encounter in this day and age.
But let's not throw the Siri out with the bathwater. The “virtual assistant” portion of Siri—all those commands to set an alarm, call someone, text someone, record an appointment—works solidly. Even if all you use are basic commands such as “Wake me at,” “Call,” “Text” and “Remind me,” you save time and fumbling.
Free-form cellular dictation is a not-there-yet technology. But as an interface for controlling our electronics, speech has a future every bit as bright as Siri promised a year ago.
Just wait till she comes out of beta.
SCIENTIFIC AMERICAN ONLINE
Eight ways to boost Siri's voice recognition: ScientificAmerican.com/aug2012/pogue