Angus Grieve-Smith: 10 reasons why sign-to-speech technology won’t be practical anytime soon

Posted on May 4, 2016 by Editor

It’s that time again! A bunch of really eager computer scientists have a prototype that will translate sign language to speech! They’ve got a really cool video that you just gotta see!

They win an award! (from a panel that includes no signers or linguists). Technology news sites go wild! (without interviewing any linguists, and sometimes without even interviewing any deaf people).

…and we computational sign linguists, who have been through this over and over, every year or two, just *facepalm*.

The latest strain of viral computational sign linguistics hype comes from the University of Washington, where two hearing undergrads have put together a system that … supposedly recognizes isolated hand gestures in citation form. But you can see the potential! *facepalm*.

Twelve years ago, after already having a few of these *facepalm* moments, I wrote up a summary of the challenges facing any computational sign linguistics project and published it as part of a paper on my sign language synthesis prototype.

But since most people don’t have a subscription to the journal it appeared in, I’ve put together a quick summary of Ten Reasons why sign-to-speech is not going to be practical any time soon.

1. Sign languages are languages. They’re different from spoken languages. Yes, that means that if you think of a place where there’s a sign language and a spoken language, they’re going to be different. More different than English and Chinese.
2. We can’t do this for spoken languages. You know that app where you can speak English into it and out comes fluent Pashto? No? That’s because it doesn’t exist. The Army has wanted an app like that for decades, and they’ve been funding it up the wazoo, and it’s still not here. Sign languages are at least ten times harder.
3. It’s complicated. Computers aren’t great with natural language at all, but they’re better with written language than spoken language. For that reason, people have broken the speech-to-speech translation task down into three steps: speech-to-text, machine translation, and text-to-speech.
4. Speech to text is hard. When you call a company and get a message saying “press or say the number after the tone,” do you press or say? I bet you don’t even call if you can get to their website, because speech to text suuucks:

-Say “yes” or “no” after the tone.
-No.
-I think you said, “Go!” Is that correct?
-No.
-My mistake. Please try again.
-No.
-I think you said, “I love cheese.” Is that correct?
-Operator!

5. There is no text. A lot of people think that text for a sign language is the same as the spoken language, but if you think about point 1 you’ll realize that that can’t possibly be true. Well, why don’t people write sign languages? I believe it can be done, and lots of people have tried, but for some reason it never seems to catch on. It might just be the classifier predicates.
6. Sign recognition is hard. There’s a lot that linguists don’t know about sign languages already. Computers can’t even get reliable signs from people wearing gloves, never mind video feeds. This may be better than gloves, but it doesn’t do anything with facial or body gestures.
7. Machine translation is hard going from one written (i.e. written version of a spoken) language to another. Different words, different meanings, different word order. You can’t just look up words in a dictionary and string them together. Google Translate is only moderately decent because it’s throwing massive statistical computing power at the input – and that only works for languages with a huge corpus of text available.
8. Sign to spoken translation is really hard. Remember how in #5 I mentioned that there is no text for sign languages? No text, no huge corpus, no machine translation. I tried making a rule-based translation system, and as soon as I realized how humongous the task of translating classifier predicates was, I backed off. Matt Huenerfauth has been trying (PDF), but he knows how big a job it is.
9. Sign synthesis is hard. Okay, that’s probably the easiest problem of them all. I built a prototype sign synthesis system in 1997, I’ve improved it, and other people have built even better ones since.
10. What is this for, anyway? Oh yeah, why are we doing this? So that Deaf people can carry a device with a camera around, and every time they want to talk to a hearing person they have to mount it on something, stand in a well-lighted area and sign into it? Or maybe someday have special clothing that can recognize their hand gestures, but nothing for their facial gestures? I’m sure that’s so much better than decent funding for interpreters, or teaching more people to sign, or hiring more fluent signers in key positions where Deaf people need the best customer service.

So I’m asking all you computer scientists out there who don’t know anything about sign languages, especially anyone who might be in a position to fund something like this or give out one of these gee-whiz awards: Just stop. Take a minute. Step back from the tech-bling. Unplug your messiah complex. Realize that you might not be the best person to decide whether or not this is a good idea. Ask a linguist. And please, ask a Deaf person!

Note: I originally wrote this post in November 2013, in response to an article about a prototype using Microsoft Kinect. I never posted it. Now I’ve seen at least three more, and I feel like I have to post this. I didn’t have to change much.

This has been shared by kind permission from Angus’s blog.

Angus Grieve-Smith is a hearing linguist and programmer. He created one of the early sign language synthesis prototypes in 1997, and has recently been working on information extraction projects in New York University’s Computer Science Department. He teaches Linguistics at Saint John’s University.

The Limping Chicken is the world's most popular Deaf blog, and is edited by Deaf writer and photographer Charlie Swinbourne.

Our posts represent the opinions of blog authors; they do not represent the site's views or those of the site's editor. Posting a blog does not imply agreement with a blog's content.

3 Responses to “Angus Grieve-Smith: 10 reasons why sign-to-speech technology won’t be practical anytime soon”

K. Willsen

May 5, 2016

This! I saw the item about the “translation tech” and was intrigued… until I read the article. Yet another poorly-researched “breakthrough”.

sybil

May 5, 2016

Ahhhhhhhh….you get it. I expected no less from this blog.

Another group I follow posted one of the celebratory videos. After watching it, I was really conflicted: should I click ‘like’ because yay? Or maybe the laughter emoji, because this is a joke, right? In the end, I didn’t acknowledge it at all, though I was tempted to use the sad emoji:

A newscaster who had little to no sign knowledge before being asked to demonstrate the mechanism was delivering the segment on the news while wearing the device.

Her signs were English (as opposed to ASL) and were very stilted.

I totally encourage people to learn signs. I will not make fun of your mistakes, I will help you, I will practice with you.

But please don’t do crap like this. How was a novice/nervous SEE user supposed to be the best demonstrator for this device? If a person learns sign language, they can probably just sign directly to the person they want to talk to. Right?

This device is supposedly for translating a signed language (like ASL) into the spoken language of that region. Right? Or am I misunderstanding something? The best demonstration would have been to have a non-signer interview an ASL user strapped into this device. Watch that machine smoke in frustration.

(And, seriously? Hearing aids and CIs aren’t enough to make those who need assistance feel like cyborgs? Let’s strap some wires to their arms and fingers! Ignoring that facial expressions play a part in some signs. Oooohhhh…..electrodes strapped to the face: brilliant!)

ASL, for example, has so many words/signs that don’t translate to English well. There is one that I particularly like: when a hearing friend asked me to explain it……it took me a paragraph of English words, and I’m still not sure she understood the depth of the meaning.

Anyway, my take is that this isn’t supposed to make anything any easier for those who use a sign language as their primary language; it’s just another way that hearing people can say ‘we DO understand, just look what we are trying to help you with.’ Then pat themselves on the back.

Until the hearing people creating these ‘helpful’ technologies understand that sign languages are actual, separate languages, there can be no real advances. And once they DO realize that, we will all have to deal with many translations akin to ‘all your base are belong to us.’

ndbeese

May 6, 2016

Amen.
