Way back in 2009 I was ludicrously excited about Google’s effort to provide automatic captioning on YouTube videos. I thought this was going to usher in a new era of across-the-board captioning for all web videos. If Google could produce quality captions automatically, and were then prepared to offer that service to all publishers of online video, we would have the pick of any video on the Web.
Here we are, almost three years later, still with poor captions on YouTube. Take this Obama video as an example. I can’t really follow it using the subtitles: there are too many missing words and a fair share of wrong ones too. In 2012, captions still have to be created by humans.
I don’t want to get too down on Google because I know how hard speech-to-text is. We’ve had good speech-to-text systems for a number of years that use a training phase: you talk to the software and it learns your voice. After learning what you sound like, a computer does a very good job of converting your speech to text, but the Google/YouTube task is much harder. 72 hours of video are uploaded to YouTube every minute! There simply isn’t enough time to teach a computer what all the people in all those videos sound like.
The Obama video should actually be one of the simpler videos to transcribe because, for most of it, there is only one person talking at a time. Things get even harder when multiple people are talking together.
Could automated captioning ever work?
To be fair to Google, what they have right now is hugely impressive from a technical point of view: being able to interpret any speech without training on specific voices is a big step. But it’s clearly not enough, because it’s not making it easy for us to read along.
I don’t believe that truly automated captioning of good enough quality will happen any time soon. The only way I can see this working is to train Google’s computers to understand speech on a grand scale, and to do that we would need a lot of volunteers to manually caption a lot of videos. If we took a sample of, say, one million videos and captioned them all, we would have a vast amount of training data for Google to use to automate captioning on other videos. You’d have to train on all the different languages, different accents, slang words, people shouting, people whispering, female voices, male voices; the list goes on and on.
Could that even work? I don’t know. Google has indexed the text of the Web, and it has also indexed a vast number of images and videos. Can they index sounds too?
Steve Claridge has been wearing hearing aids for over 30 years. What started off as a minor hearing loss at the age of five is now a severe one, but his hearing aids help a lot. He blogs about all this at www.hearingaidknow.com.
The Limping Chicken is the world's most popular Deaf blog, and is edited by Deaf journalist and filmmaker Charlie Swinbourne.