Noise, Alex Waibel tells me, is one of the major challenges that artificial speech translation has to meet. A device might be able to recognize speech in a laboratory, or a meeting room, but will struggle to cope with the kind of background noise I can hear surrounding Waibel as he speaks to me from Kyoto station.
I am struggling to follow him in English, on a scratchy line that reminds me we are nearly 10,000km apart — and that distance is still an obstacle to communication even if you are speaking the same language. We have not reached the future yet.
If we had, Waibel would have been able to speak in his native German and I would have been able to hear his words in English. He would also be able to converse hands-free and seamlessly with the Japanese people around him, with all parties speaking their native language.
Illustration: Constance Chou
At Karlsruhe Institute of Technology, where Waibel is professor of computer science, he and his colleagues already give lectures in German that their students can follow in English via an electronic translator.
The system generates text that students can read on their laptops or smartphones, so the process is somewhat akin to subtitling. It helps that lecturers speak clearly, do not have to compete with background chatter and say much the same thing each year.
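Systems like this are typically built as a cascade: speech recognition turns the lecturer's audio into source-language text, machine translation converts that text, and the result is streamed to students as subtitles. The sketch below illustrates that pipeline shape only; the recognizer and the phrase-table "translator" are toy stand-ins, not the components any real lecture translator uses.

```python
# A toy sketch of the cascade behind lecture translation: recognize,
# then translate, then stream subtitle lines to the student's screen.
# Both stages here are hypothetical stand-ins for trained models.

def recognize(audio_segment: str) -> str:
    """Stand-in ASR: pretend the 'audio' is already a German transcript."""
    return audio_segment

def translate(german_text: str) -> str:
    """Stand-in MT: a tiny phrase table instead of a neural model."""
    phrase_table = {
        "guten morgen": "good morning",
        "heute sprechen wir über maschinelle übersetzung":
            "today we will talk about machine translation",
    }
    return phrase_table.get(german_text.lower(), "[untranslated]")

def subtitle_stream(audio_segments):
    """Yield one subtitle line per utterance, as a laptop would display it."""
    for segment in audio_segments:
        yield translate(recognize(segment))

lecture = ["Guten Morgen", "Heute sprechen wir über maschinelle Übersetzung"]
for line in subtitle_stream(lecture):
    print(line)
```

The cascade design is also why clear speech and predictable content help so much: errors in the first stage compound in the second.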
The idea of artificial speech translation has been around for a long time. Waibel, who is also professor of computer science at Carnegie Mellon University in Pittsburgh, said he “sort of invented it. I proposed it at MIT [Massachusetts Institute of Technology] in 1978.”
Douglas Adams sort of invented it around the same time, too.
The Hitchhiker’s Guide to the Galaxy featured a life form called the Babel fish which, when placed in the ear, enabled a listener to understand any language in the universe. It came to represent one of those devices that technology enthusiasts dream of long before they become practically realizable, like portable voice communicators and TVs flat enough to hang on walls — a thing that ought to exist and so one day surely will.
Waibel’s first speech translation system, assembled in 1991, had a 500-word vocabulary, ran on large workstations and took several minutes to process what it heard.
“It wasn’t ready for prime time,” he said.
Now devices that look like prototype Babel fish have started to appear, riding a wave of advances in artificial translation and voice recognition.
Google has incorporated a translation feature into its Pixel earbuds, using Google Translate, which can also deliver voice translation via its smartphone app. Skype has a Translator feature that handles speech in 10 languages.
A number of smaller outfits, such as Waverly Labs, a Brooklyn-based start-up, have developed earpiece translators. Reviews in the tech media could reasonably be summarized as “not bad, actually.”
The systems currently available offer proof of concept, but at this stage they seem to be regarded as eye-catching novelties rather than steps toward what Waibel calls “making a language-transparent society.”
One of the main developments driving artificial speech translation is the vogue for encouraging people to talk to their technology.
“We’re generally very early in the paradigm of voice-enabled devices, but it’s growing very rapidly and translation will be one of the key parts of this journey,” Google Translate director of product Barak Turovsky said.
Last month, Google introduced interpreter mode for its home devices.
Saying “Hey, Google, be my French interpreter” activates spoken and, on smart displays, text translation. Google suggests hotel check-in as a possible application — perhaps the obvious example of a practical alternative to speaking travelers’ English, either as a native or as an additional language.
You can do this already if you have the Translate app on your phone, albeit using an awkwardly small screen and speaker. That kind of simple public interaction accounts for much usage of the app’s conversations feature, but another popular application is what Turovsky calls “romance.”
Data logs reveal the popularity of statements such as “I love you” and “You have beautiful eyes.”
Much of this might not represent anything very new. After all, chat-up lines have been standard phrasebook content for decades.
Waverly Labs used the chat-up function as a hook for its Indiegogo funding drive, with a video in which company founder and chief executive Andrew Ochoa relates how he got the idea for a translator when he met a French woman on holiday, but could not communicate with her very well.
Trying to use a translation app was “horrible.” Phones get in the way — but earpieces are not in your face. The video shows what might have been: he presents a French woman with an earpiece, and off they go for coffee and sightseeing.
The pitch was spectacularly successful, raising US$4.4 million — 30 times the target.
One customer said the company’s Pilot earpiece had enabled him to speak to his girlfriend’s mother for the first time. Some even report that it has enabled them to speak to their spouses.
“Every once in a while, we’ll receive an e-mail from someone who says they’re using this to speak with their Spanish-speaking wife,” Ochoa said. “It baffles me how they even got together in the first place.”
We might surmise that it was through the Internet and an agency.
Ochoa acknowledges that “the technology has to improve a bit before you’ll really be able to find love through the earbud, but it’s not too far away.”
Many of the early adopters put the Pilot earpiece to entirely unromantic uses, acquiring it for use in organizations. Waverly is now preparing a new model for professional applications, which entails performance improvements in speech recognition, translation accuracy and the time it takes to deliver the translated speech.
“Professionals are less inclined to be patient in a conversation,” Ochoa said.
The new version will also feature hygienic design improvements to overcome the Pilot’s least appealing feature: for a conversation, both speakers need to have Pilots in their ears.
“We find that there’s a barrier with sharing one of the earphones with a stranger,” Ochoa said.
That cannot have been totally unexpected. The problem would be solved if earpiece translators became sufficiently prevalent that strangers would be likely to already have their own in their ears.
Whether that happens, and how quickly, probably depends not so much on the earpieces themselves, but on the prevalence of voice-controlled devices and artificial translation in general.
Here, the main driver appears to be access to emerging Asian markets. Google reckons that 50 percent of the Internet’s content is in English, but only 20 percent of the world’s population speak the language.
“If you look at areas where there is a lot of growth in Internet usage, like Asian countries, most of them don’t know English at all,” Turovsky said. “So in that regard, breaking language barriers is an important goal for everyone — and obviously for Google. That’s why Google is investing so many resources into translation systems.”
Waibel also highlights the significance of Asia, noting that voice translation has really taken off in Japan and China. There is still a long way to go, though.
Translation needs to be simultaneous, like the translator’s voice speaking over the foreign politician on the TV, rather than in packets that oblige speakers to pause after every few remarks and wait for the translation to be delivered. It needs to work offline, for situations where Internet access is not possible — and to address concerns about the amount of private speech data accumulating in the cloud, having been sent to servers for processing.
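The difference between “packet” translation and simultaneous translation comes down to when output is released. A toy sketch, using a word-by-word lookup as a hypothetical stand-in for a real translation model:

```python
# Consecutive ("packet") translation delivers nothing until the speaker
# pauses, then produces the whole segment at once; simultaneous
# translation emits output incrementally while the speaker is still
# talking. The word-level lookup below is a toy stand-in for a model.

WORD_TABLE = {"wir": "we", "reisen": "travel", "morgen": "tomorrow"}

def translate_word(word: str) -> str:
    return WORD_TABLE.get(word, word)

def consecutive(words):
    """One late, complete chunk after the full utterance."""
    return [" ".join(translate_word(w) for w in words)]

def simultaneous(words):
    """Many early, partial chunks, one per incoming word."""
    return [translate_word(w) for w in words]

utterance = ["wir", "reisen", "morgen"]
print(consecutive(utterance))   # ['we travel tomorrow']
print(simultaneous(utterance))  # ['we', 'travel', 'tomorrow']
```

Real simultaneous systems are far harder than this suggests, since word order differs between languages and the translator must sometimes commit to output before the sentence resolves.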
Systems not only need to cope with physical challenges such as noise, Waibel said, they would also need to be socially aware — to know their manners and to address people appropriately.
When I first e-mailed him, aware that he is a German professor and that continental traditions demand solemn respect for academic status, I erred on the side of formality and addressed him as “Dear Prof Waibel.”
As I expected, he replied in international English mode: “Hi Marek.”
Etiquette-sensitive artificial translators could relieve people of the need to be aware of differing cultural norms. They would facilitate interaction while reducing understanding.
At the same time, they might help to preserve local customs, slowing the spread of habits associated with international English, such as its readiness to get on first-name terms.
Professors and other professionals will not outsource language awareness to software, though. If the technology matures into seamless, ubiquitous artificial speech translation — Babel fish, in short — it will actually add value to language skills.
Automated translation would deliver a commodity product: basic, practical, low-prestige information that helps people buy things or find their way around. Whether it would help people conduct their family lives or romantic relationships is open to question — though one noteworthy possibility is that it could overcome the language barriers that often arise between generations after migration, leaving children and their grandparents without a shared language.
Whatever uses it is put to, though, it will never be as good as the real thing. Even if voice-morphing technology simulates the speaker’s voice, their lip movements will not match and they will look like they are in a dubbed movie.
The contrast would underline the value of shared languages, and the value of learning them. Making the effort to learn someone’s language is a sign of commitment and therefore of trustworthiness.
Sharing a language can also promote a sense of belonging and community, as with the international scientists who use English as a lingua franca, where their predecessors used Latin.
Immigrant shopkeepers who learn their customers’ language are not just making sales easier — they are showing that they wish to draw closer to their customers’ community and politely asserting a place in it.
When machine translation becomes a ubiquitous commodity product, human language skills will command a premium. The person who has a language in their head will always have the advantage over somebody who relies on a device, in the same way that somebody with a head for figures has the advantage over somebody who has to reach for a calculator.
Though the practical need for a lingua franca would diminish, the social value of sharing one would persist, and software can never be a substitute for the subtle, but vital understanding that comes with knowledge of a language.
That knowledge will always be needed to pick the nuances from the noise.