The gadget blogs may work themselves into a frenzy over megapixels and processor speed. But if you want to know what really dazzles the masses, consider a feature that's rarely called out by name: machine recognition of real-world sights and sounds.
The success stories in this category represent triumphs of computation and software. Speech transcription on laptops and desktop computers is awesomely accurate. Gestures on touch screens are generally reliable (there are, after all, a limited number of movements to recognize). The Xbox Kinect and some Samsung television sets have brought us body-movement recognition. The handwriting recognition in Windows 7 and 8 is a hidden gem, whether you print or write in cursive.
Phone apps such as Shazam and SoundHound can recognize pop songs playing in the background—and display their titles, performers and album names. Google Goggles, one of Google's apps for Android phones and the iPhone, attempts visual recognition: snap a picture of a book cover, DVD box, wine label or painting, and the program instantly shows you the Google search results for that item.
Software can even pick out faces in a video, and YouTube's copyright-protection algorithms can compare your videos against known copyrighted material to make sure you're not posting a video that originated from some TV network.
That's all fantastic. When they work, sound, image and motion recognition really seem like magic. Unfortunately, the marketers realize that. They tempt us with myriad other computer-based recognition features that work about as well as cold fusion.
For decades now, I've fallen victim repeatedly to what can only be called recognition-failure heartbreak syndrome (RFHS). You buy something, drawn by its promised ability to recognize human commands, and it just doesn't work well enough to bother with.
Remember the Clapper? As a teenager, I bought one. Sometimes your two claps turned the lamp on, and sometimes it took a few attempts. I bought a Whistle Switch, too. It could turn on your appliances by recognizing sound—in this case, a high-pitched, squeezable whistle. Oh, it turned the lights on, all right—but so did teakettles, squeaky hamster wheels and sharp sneezes.
Predictably, I also fell for the Newton: $700 for handwriting recognition that worked maybe two out of five times.
More recently, Samsung has been promising that its Galaxy S4 phone can translate speech into another language, Star Trek-style. Hold it up to a French speaker saying, “Où sont les toilettes?” and the phone is supposed to say, out loud, “Where is the bathroom?”
In fact, Samsung has just added one not-there-yet recognition technology on top of another. The S Translator app can't even recognize foreign-language speakers' utterances, let alone convert them into spoken English. (I think Samsung knows that, too. If S Translator worked, it would be a headline in the ads, not just a bullet point.)
How many times will we get our hopes up before we start giving up on these features altogether? How many products will we return before manufacturers start polishing these technologies first and advertising their “miraculous” abilities second?
Look, I sympathize; software-based recognition is no easy task. It's not a crisp problem with one correct outcome, like a spreadsheet adding numbers together. You are asking the software to process fuzzy, vague, variable inputs: sounds, pictures, movements, scrawls. That's why recognition is never 100 percent accurate, or even consistent. No wonder it so often disappoints us.
Maybe a few more decades of better sensors, faster processors, bigger data sets and experimentation will finally bring us relief from continuous RFHS.
In the meantime, perhaps both electronics companies and their customers should do a little recognizing of their own: machine recognition of our world is exciting but still evolving.
SCIENTIFIC AMERICAN ONLINE
Eight near-magic recognition apps: ScientificAmerican.com/jul2013/pogue