Say What? Google Works to Improve YouTube Auto-Captions for the Deaf

Google continues to develop speech-recognition software that can automatically generate captions for all videos posted to YouTube, but challenges remain















WORK IN PROGRESS: More than 60 million YouTube videos have been auto-captioned, even as Google works to improve the technology. Image: COURTESY OF RAY WILLIAM JOHNSON, VIA YOUTUBE

Visitors to YouTube, which now boasts the Internet's second-largest search engine, have uploaded hundreds of millions of videos since its launch in early 2005. For most people, YouTube (Google bought the video-sharing site for $1.65 billion in late 2006) is a valuable outlet for sharing personal videos, catching up on college lectures, consulting "how-to" clips and absorbing pop-culture nuggets like "Weird Al" Yankovic's parody of Lady Gaga. Until recently, however, tens of millions of deaf and hearing-impaired people in the U.S. alone could not take full advantage of YouTube because they were getting only half of the experience. Google and YouTube engineers are working to fix this by improving software that can automatically add captions to all videos, although this has been a difficult process.

Google's mission is to organize the world's information, and a lot of that information on the Web is spoken rather than written, says the company's research scientist Mike Cohen, who joined Google in 2004 to head up speech technology development. Television, which introduced closed-captioning in the early 1970s and made it more widely available throughout the 1980s, in many ways has had an edge over the Web in serving the needs of the deaf, he adds.

Throughout much of the deaf community, "there was a feeling that after spending years winning the legal battles to have television programming captioned, all of a sudden the world had moved to YouTube," Cohen says. "We wanted to re-win that battle for them in a way that's scalable; it had to be done with technology rather than using humans to input captions with each video."

Reading along
Google introduced the ability to manually add captions to videos on its Google Video site in 2006 and in 2008 added captioning to YouTube. Google introduced machine-generated automatic captions to YouTube in November 2009 and has since sought to improve the technology with the help of speech-recognition modeling software and lots of data. Thus far, more than 60 million videos have been auto-captioned, according to Google.

The company's speech-recognition model has acoustic, lexicon and language components. The acoustic portion is a statistical model of the basic sounds made in spoken language (all of the vowels and consonants, for example). This is a large and complex model because those sounds often vary based on context (that is, where a speaker was raised and the dialect spoken), Cohen says.

The lexicon is basically a list of words in a given language and data about how they are pronounced (consider the two vowel sounds that are acceptable in pronouncing the "e" in "economics," for example). "For something like voice search we have a vocabulary of about one million words with the right pronunciations for those words and the variations in pronunciation," Cohen says.
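As a rough illustration of the idea, a lexicon can be pictured as a table that maps each word to one or more acceptable pronunciations. The sketch below is a toy, not Google's implementation: the three-word vocabulary, the simplified ARPAbet-style phoneme symbols and the helper names `pronunciations` and `words_matching` are all assumptions made for the example.

```python
# Hypothetical toy lexicon (an assumption for illustration only).
# Each word maps to a list of pronunciation variants; each variant is a
# list of simplified ARPAbet-style phoneme symbols without stress marks.
LEXICON = {
    "economics": [
        ["EH", "K", "AH", "N", "AA", "M", "IH", "K", "S"],  # "eh-conomics"
        ["IY", "K", "AH", "N", "AA", "M", "IH", "K", "S"],  # "ee-conomics"
    ],
    "go": [["G", "OW"]],
    "to": [["T", "UW"], ["T", "AH"]],
}


def pronunciations(word):
    """Return every known pronunciation variant for a word (empty if unknown)."""
    return LEXICON.get(word.lower(), [])


def words_matching(phonemes):
    """Return the words that have the given phoneme sequence as a variant."""
    target = list(phonemes)
    return sorted(word for word, variants in LEXICON.items() if target in variants)


if __name__ == "__main__":
    for variant in pronunciations("economics"):
        print(" ".join(variant))
    print(words_matching(["G", "OW"]))
```

Storing several variants per word is what lets a recognizer accept either vowel sound at the start of "economics" and still resolve both to the same written word.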

The language component of Google's speech-recognition model is a statistical model of all of the phrases and sentences that might be used within a language. This helps the auto-captioning function analyze how different words are often grouped together (the word "go," for example, is often followed by the word "to") and predict probable pairings based on that information.
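The word-pairing idea described above can be sketched as a simple bigram model, which counts how often one word follows another and turns those counts into probabilities. This is a minimal sketch under stated assumptions, not the model Google uses: the three-sentence corpus and the function names `train_bigrams`, `next_word_probability` and `most_likely_next` are invented for the example, and a production system would train on vastly more text and use more sophisticated statistics.

```python
from collections import Counter, defaultdict


def train_bigrams(sentences):
    """Count how often each word is immediately followed by each other word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def next_word_probability(counts, prev, nxt):
    """Estimate P(nxt | prev) as a relative frequency; 0.0 if prev is unseen."""
    followers = counts.get(prev, Counter())
    total = sum(followers.values())
    return followers[nxt] / total if total else 0.0


def most_likely_next(counts, prev):
    """Return the word seen most often after prev, or None if prev is unseen."""
    followers = counts.get(prev)
    return followers.most_common(1)[0][0] if followers else None


# Hypothetical training text, made up for this example.
CORPUS = [
    "i want to go to the store",
    "we go to school",
    "they go home",
]
model = train_bigrams(CORPUS)

if __name__ == "__main__":
    print(most_likely_next(model, "go"))
    print(next_word_probability(model, "go", "to"))
```

In this toy corpus "go" is followed by "to" in two of its three occurrences, so the model prefers "to" when the audio after "go" is ambiguous.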

Much of the speech-recognition technology is tuned for the English language, although the company plans to expand auto-captioning to additional languages. For now, YouTube serves its global audience by translating auto-captions into more than 50 languages.

But does it work?
Auto-captioning is an easy sell to the deaf community because it affords them access to more of YouTube. Yet, this feature is often frustrating for deaf users, who find little use for video on the Web if the captions are not accurate. "I love the idea of auto-captioning because it allows me to understand many of YouTube's clips that I [otherwise] would not have," says Arielle Schacter, a 17-year-old junior at The Chapin School in New York City. Schacter, who is hard of hearing, adds, "The reality, however, is that the auto-captioning is often wrong. Instead of being able to read the actual dialogue, I am forced to view nonsensical statements or letters/numbers."




1 Comment

  1. SpoonmanWoS 01:55 PM 6/23/11

    "claims that the most recent version of its auto-captioning software has reduced error rates by 20 percent"

    Yeah, I gotta tell you, as a user of Google Voice, I can attest that a 20% increase is useless as 20% of zero is still zero. I still use the transcription feature for voice mails, but only due to the laughs it generates. I've never gotten a single transcription that was even CLOSE to what was said. It's a shame. Having used technologies like Dragon Dictate, the state of Google's voice recognition is about on par with that in 1982. Even DD on the iPhone worked a good 95% of the time, compared with zero for Google.

    Seriously, as excited as I am by the idea of self-driving cars, if this is the state of your "AI" Goog...we'd rather not see your cars on the road.
