Researchers develop artificial intelligence that lipreads better than humans. Kind of

According to a BBC News report, scientists at Oxford University have developed a machine that can lip-read better than humans, with 93% accuracy.

We’re unsure whether ‘humans’ means the typical person on the street or the best of the Deaf lipreading masses. (There’s a difference, clearly.)

But dig a little deeper and the headline isn’t quite what it first appears. The artificial intelligence system appears to match pre-existing text to lip movements. That’s quite a bit different from watching what someone’s saying with no idea what it might be, then figuring out what it is (which is what Deaf folk do every day).

However, the machine did this task quite a lot better than ‘humans’ did. And even if the technology is at an early stage right now, in a decade or so, who knows how good it might be?

And if it could offer speech recognition in noisy environments, then us Deafies might finally get those subtitled glasses we’ve been dreaming of for years. Which might make up for losing all the lipreading work we get whenever there’s a royal wedding on.

The report says:

Scientists at Oxford University have developed a machine that can lip-read better than humans.

The artificial intelligence system – LipNet – watches video of a person speaking and matches the text to the movement of their mouths with 93% accuracy, the researchers said.

Automating the process could help millions, they suggested.

But experts said the system needed to be tested in real-life situations.

Lip-reading is a notoriously tricky business with professionals only able to decipher what someone is saying up to 60% of the time.

“Machine lip-readers have enormous potential, with applications in improved hearing aids, silent dictation in public spaces, covert conversations, speech recognition in noisy environments, biometric identification and silent-movie processing,” wrote the researchers.

They said that the AI system was provided with whole sentences so that it could teach itself which letter corresponded to which lip movement.

To train the AI, the team – from Oxford University’s AI lab – fed it nearly 29,000 videos, labelled with the correct text. Each video was three seconds long and followed a similar grammatical pattern.

While human testers given similar videos had an error rate of 47.7%, the AI had one of just 6.6%.

Read the full article here: http://www.bbc.co.uk/news/technology-37911135

By Charlie Swinbourne, Editor