Software Recognition Technology Is Amazing, but Not Amazing Enough

How the dream of a perfectly cognizant computer continues to break our hearts

The gadget blogs may work themselves into a frenzy over megapixels and processor speed. But if you want to know what really dazzles the masses, consider a feature that's rarely called out by name: machine recognition of real-world sights and sounds.

The success stories in this category represent triumphs of computation and software. Speech transcription on laptops and desktop computers is awesomely accurate. Gestures on touch screens are generally reliable (there are, after all, a limited number of movements to recognize). The Xbox Kinect and some Samsung television sets have brought us body-movement recognition. The handwriting recognition in Windows 7 and 8 is a hidden gem, whether you print or write in cursive.

Phone apps such as Shazam and SoundHound can recognize pop songs playing in the background—and display their titles, performers and album names. Google Goggles, one of Google's apps for Android phones and the iPhone, attempts visual recognition: snap a picture of a book cover, DVD box, wine label or painting, and the program instantly shows you the Google search results for that item.

Software can even pick out faces in a video, and YouTube's copyright-protection algorithms can compare your videos against known copyrighted material to make sure you're not posting a video that originated from some TV network.

That's all fantastic. When sound, image and motion recognition work, they really seem like magic. Unfortunately, the marketers realize that. They tempt us with myriad other computer-based recognition features that work about as well as cold fusion.

For decades now, I've fallen victim repeatedly to what can only be called recognition-failure heartbreak syndrome (RFHS). You buy something, drawn by its promised ability to recognize human commands, and it just doesn't work well enough to bother with.

Remember the Clapper? As a teenager, I bought one. Sometimes your two claps turned the lamp on, and sometimes it took a few attempts. I bought a Whistle Switch, too. It could turn on your appliances by recognizing sound—in this case, a high-pitched, squeezable whistle. Oh, it turned the lights on, all right—but so did teakettles, squeaky hamster wheels and sharp sneezes.

Predictably, I also fell for the Newton: $700 for handwriting recognition that worked maybe two out of five times.

More recently, Samsung has been promising that its Galaxy S4 phone can translate speech into another language, Star Trek-style. Hold it up to a French speaker saying, “Où sont les toilettes?” and the phone is supposed to say, out loud, “Where is the bathroom?”

In fact, Samsung has just added one not-there-yet recognition technology on top of another. The S Translator app can't even recognize foreign-language speakers' utterances, let alone convert them into spoken English. (I think Samsung knows that, too. If S Translator worked, it would be a headline in the ads, not just a bullet point.)

How many times will we get our hopes up before we start giving up on these features altogether? How many products will we return before manufacturers start to polish these technologies a little more before advertising their “miraculous” abilities?

Look, I sympathize; software-based recognition is no easy task. It's not a crisp problem with one correct outcome, like a spreadsheet adding numbers together. You are asking the software to process fuzzy, vague, variable inputs: sounds, pictures, movements, scrawls. That's why recognition isn't 100 percent. It's not consistent. No wonder it so often disappoints us.
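
For the technically curious, here is a minimal Python sketch of that contrast. It is an illustration only, not how any real product works: the pitch values, the matching window and the function names are all made-up assumptions. Adding numbers has exactly one right answer, while a sound-activated switch has to decide how close is "close enough" to a whistle.

```python
# Illustrative sketch only. Every pitch value and name below is a
# hypothetical assumption, not a measurement of any real device.
WHISTLE_HZ = 3000.0  # assumed pitch of the intended whistle


def crisp_sum(values):
    """A crisp problem: one correct answer, every time."""
    return sum(values)


def sounds_like_whistle(pitch_hz, tolerance_hz):
    """A fuzzy problem: accept any pitch within the tolerance window."""
    return abs(pitch_hz - WHISTLE_HZ) <= tolerance_hz


# (description, assumed pitch in Hz, should the lamp turn on?)
observed = [
    ("whistle, squeezed hard", 3100.0, True),
    ("whistle, squeezed softly", 2300.0, True),
    ("teakettle", 2800.0, False),
    ("squeaky hamster wheel", 3400.0, False),
    ("doorbell", 900.0, False),
]


def score(tolerance_hz):
    """Return (missed commands, false alarms) for one tolerance setting."""
    missed = [
        name for name, hz, wanted in observed
        if wanted and not sounds_like_whistle(hz, tolerance_hz)
    ]
    false_alarms = [
        name for name, hz, wanted in observed
        if not wanted and sounds_like_whistle(hz, tolerance_hz)
    ]
    return missed, false_alarms


if __name__ == "__main__":
    print("2 + 3 =", crisp_sum([2, 3]))
    for tolerance in (150.0, 600.0, 800.0):
        missed, false_alarms = score(tolerance)
        print(tolerance, "missed:", missed, "false alarms:", false_alarms)
```

With these invented numbers, no tolerance setting gets everything right. A narrow window ignores the teakettle but also misses the softly squeezed whistle; a window wide enough to catch that whistle lets the teakettle and the hamster wheel through as well.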

Maybe a few more decades of better sensors, faster processors, bigger data sets and experimentation will finally bring us relief from continuous RFHS.

In the meantime, perhaps both electronics companies and their customers should do a little recognizing of their own: machine recognition of our world is exciting but still evolving.

SCIENTIFIC AMERICAN ONLINE
Eight near-magic recognition apps: ScientificAmerican.com/jul2013/pogue
