Noise, Alex Waibel tells me, is one of the major challenges that artificial speech translation has to meet. A device might be able to recognize speech in a laboratory, or a meeting room, but will struggle to cope with the kind of background noise I can hear surrounding Waibel as he speaks to me from Kyoto station.
I am struggling to follow him in English, on a scratchy line that reminds me we are nearly 10,000km apart — and that distance is still an obstacle to communication even if you are speaking the same language. We have not reached the future yet.
If we had, Waibel would have been able to speak in his native German and I would have been able to hear his words in English. He would also be able to converse hands-free and seamlessly with the Japanese people around him, with all parties speaking their native language.
At Karlsruhe Institute of Technology, where Waibel is professor of computer science, he and his colleagues already give lectures in German that their students can follow in English via an electronic translator.
The system generates text that students can read on their laptops or smartphones, so the process is somewhat akin to subtitling. It helps that lecturers speak clearly, do not have to compete with background chatter and say much the same thing each year.
The idea of artificial speech translation has been around for a long time. Waibel, who is also professor of computer science at Carnegie Mellon University in Pittsburgh, “sort of invented it. I proposed it at MIT [Massachusetts Institute of Technology] in 1978.”
Douglas Adams sort of invented it around the same time, too.
The Hitchhiker’s Guide to the Galaxy featured a life form called the Babel fish which, when placed in the ear, enabled a listener to understand any language in the universe. It came to represent one of those devices that technology enthusiasts dream of long before they become practically realizable, like portable voice communicators and TVs flat enough to hang on walls — a thing that ought to exist and so one day surely will.
Waibel’s first speech translation system, assembled in 1991, had a 500-word vocabulary, ran on large workstations and took several minutes to process what it heard.
“It wasn’t ready for prime time,” he said.
Now devices that look like prototype Babel fish have started to appear, riding a wave of advances in artificial translation and voice recognition.
Google has incorporated a translation feature into its Pixel earbuds, using Google Translate, which can also deliver voice translation via its smartphone app. Skype has a Translator feature that handles speech in 10 languages.
A number of smaller outfits, such as Waverly Labs, a Brooklyn-based start-up, have developed earpiece translators. Reviews in the tech media could reasonably be summarized as “not bad, actually.”
The systems currently available offer proof of the concept, but at this stage they seem to be regarded as eye-catching novelties rather than steps toward what Waibel calls “making a language-transparent society.”
One of the main developments driving artificial speech translation is the vogue for encouraging people to talk to their technology.
“We’re generally very early in the paradigm of voice-enabled devices, but it’s growing very rapidly and translation will be one of the key parts of this journey,” Google Translate director of product Barak Turovsky said.
Last month, Google introduced interpreter mode for its home devices.
Saying: “Hey, Google, be my French interpreter” activates spoken and, on smart displays, text translation. Google suggests hotel check-in as a possible application — perhaps the obvious example of a practical alternative to speaking travelers’ English, either as a native or as an additional language.
You can do this already if you have the Translate app on your phone, albeit using an awkwardly small screen and speaker. That kind of simple public interaction accounts for much usage of the app’s conversations feature, but another popular application is what Turovsky calls “romance.”
Data logs reveal the popularity of statements such as: “I love you” and “You have beautiful eyes.”
Much of this might not represent anything very new. After all, chat-up lines have been standard phrasebook content for decades.
Waverly Labs used the chat-up function as a hook for its Indiegogo funding drive, with a video in which company founder and chief executive Andrew Ochoa relates how he got the idea for a translator when he met a French woman on holiday, but could not communicate with her very well.
Trying to use a translation app was “horrible.” Phones get in the way — but earpieces are not in your face. The video shows what might have been: he presents a French woman with an earpiece, and off they go for coffee and sightseeing.
The pitch was spectacularly successful, raising US$4.4 million — 30 times the target.
One customer said the company’s Pilot earpiece had enabled him to speak to his girlfriend’s mother for the first time. Some even report that it has enabled them to speak to their spouses.
“Every once in a while, we’ll receive an e-mail from someone who says they’re using this to speak with their Spanish-speaking wife,” Ochoa said. “It baffles me how they even got together in the first place.”
We might surmise that it was through the Internet and an agency.
Ochoa acknowledges that “the technology has to improve a bit before you’ll really be able to find love through the earbud, but it’s not too far away.”
Many of the early adopters put the Pilot earpiece to entirely unromantic uses, acquiring it for use in organizations. Waverly is now preparing a new model for professional applications, which entails performance improvements in speech recognition, translation accuracy and the time it takes to deliver the translated speech.
“Professionals are less inclined to be patient in a conversation,” Ochoa said.
The new version is also to feature hygienic design improvements, to overcome the Pilot’s least appealing feature. For a conversation, both speakers need to have Pilots in their ears.
“We find that there’s a barrier with sharing one of the earphones with a stranger,” Ochoa said.
That cannot have been totally unexpected. The problem would be solved if earpiece translators became sufficiently prevalent that strangers would be likely to already have their own in their ears.
Whether that happens, and how quickly, probably depends not so much on the earpieces themselves, but on the prevalence of voice-controlled devices and artificial translation in general.
Here, the main driver appears to be access to emerging Asian markets. Google reckons that 50 percent of the Internet’s content is in English, but only 20 percent of the world’s population speak the language.
“If you look at areas where there is a lot of growth in Internet usage, like Asian countries, most of them don’t know English at all,” Turovsky said. “So in that regard, breaking language barriers is an important goal for everyone — and obviously for Google. That’s why Google is investing so many resources into translation systems.”
Waibel also highlights the significance of Asia, noting that voice translation has really taken off in Japan and China. There is still a long way to go, though.
Translation needs to be simultaneous, like the translator’s voice speaking over the foreign politician on the TV, rather than in packets that oblige speakers to pause after every few remarks and wait for the translation to be delivered. It needs to work offline, for situations where Internet access is not possible — and to address concerns about the amount of private speech data accumulating in the cloud, having been sent to servers for processing.
Systems not only need to cope with physical challenges such as noise, Waibel said, they would also need to be socially aware — to know their manners and to address people appropriately.
When I first e-mailed him, aware that he is a German professor and that continental traditions demand solemn respect for academic status, I erred on the side of formality and addressed him as “Dear Prof Waibel.”
As I expected, he replied in international English mode: “Hi Marek.”
Etiquette-sensitive artificial translators could relieve people of the need to be aware of differing cultural norms. They would facilitate interaction while reducing understanding.
At the same time, they might help to preserve local customs, slowing the spread of habits associated with international English, such as its readiness to get on first-name terms.
Professors and other professionals will not outsource language awareness to software, though. If the technology matures into seamless, ubiquitous artificial speech translation — Babel fish, in short — it will actually add value to language skills.
Automated translation would deliver a commodity product: basic, practical, low-prestige information that helps people buy things or find their way around. Whether it would help people conduct their family lives or romantic relationships is open to question — though one noteworthy possibility is that it could overcome the language barriers that often arise between generations after migration, leaving children and their grandparents without a shared language.
Whatever uses it is put to, though, it will never be as good as the real thing. Even if voice-morphing technology simulates the speaker’s voice, their lip movements will not match and they will look like they are in a dubbed movie.
The contrast would underline the value of shared languages, and the value of learning them. Making the effort to learn someone’s language is a sign of commitment and therefore of trustworthiness.
Sharing a language can also promote a sense of belonging and community, as with the international scientists who use English as a lingua franca, where their predecessors used Latin.
Immigrant shopkeepers who learn their customers’ language are not just making sales easier — they are showing that they wish to draw closer to their customers’ community and politely asserting a place in it.
When machine translation becomes a ubiquitous commodity product, human language skills will command a premium. The person who has a language in their head will always have the advantage over somebody who relies on a device, in the same way that somebody with a head for figures has the advantage over somebody who has to reach for a calculator.
Though the practical need for a lingua franca would diminish, the social value of sharing one would persist, and software can never be a substitute for the subtle, but vital understanding that comes with knowledge of a language.
That knowledge will always be needed to pick the nuances from the noise.