2019-02-23
By Marek Kohn / The Guardian

Noise, Alex Waibel tells me, is one of the major challenges that artificial speech translation has to meet. A device might be able to recognize speech in a laboratory, or a meeting room, but will struggle to cope with the kind of background noise I can hear surrounding Waibel as he speaks to me from Kyoto station.

I am struggling to follow him in English, on a scratchy line that reminds me we are nearly 10,000km apart — and that distance is still an obstacle to communication even if you are speaking the same language. We have not reached the future yet.

If we had, Waibel would have been able to speak in his native German and I would have been able to hear his words in English. He would also be able to converse hands-free and seamlessly with the Japanese people around him, with all parties speaking their native language.

At Karlsruhe Institute of Technology, where Waibel is professor of computer science, he and his colleagues already give lectures in German that their students can follow in English via an electronic translator.

The system generates text that students can read on their laptops or smartphones, so the process is somewhat akin to subtitling. It helps that lecturers speak clearly, do not have to compete with background chatter and say much the same thing each year.

The idea of artificial speech translation has been around for a long time. Waibel, who is also professor of computer science at Carnegie Mellon University in Pittsburgh, “sort of invented it. I proposed it at MIT [Massachusetts Institute of Technology] in 1978.”

Douglas Adams sort of invented it around the same time, too.

The Hitchhiker’s Guide to the Galaxy featured a life form called the Babel fish which, when placed in the ear, enabled a listener to understand any language in the universe. It came to represent one of those devices that technology enthusiasts dream of long before they become practically realizable, like portable voice communicators and TVs flat enough to hang on walls — a thing that ought to exist and so one day surely will.

Waibel’s first speech translation system, assembled in 1991, had a 500-word vocabulary, ran on large workstations and took several minutes to process what it heard.

“It wasn’t ready for prime time,” he said.

Now devices that look like prototype Babel fish have started to appear, riding a wave of advances in artificial translation and voice recognition.

Google has incorporated a translation feature into its Pixel earbuds, using Google Translate, which can also deliver voice translation via its smartphone app. Skype has a Translator feature that handles speech in 10 languages.

A number of smaller outfits, such as Waverly Labs, a Brooklyn-based start-up, have developed earpiece translators. Reviews in the tech media could reasonably be summarized as “not bad, actually.”

The systems currently available offer proof of the concept, but at this stage they seem to be regarded as eye-catching novelties rather than steps toward what Waibel calls “making a language-transparent society.”

One of the main developments driving artificial speech translation is the vogue for encouraging people to talk to their technology.

“We’re generally very early in the paradigm of voice-enabled devices, but it’s growing very rapidly and translation will be one of the key parts of this journey,” Google Translate director of product Barak Turovsky said.