Designers of machine translation tools still mostly rely on dictionaries to make a foreign language understandable. However, now there is a new way: numbers.
Facebook researchers said rendering words into figures and exploiting mathematical similarities between languages is a promising avenue — even if a universal communicator a la Star Trek remains a distant dream.
Powerful automatic translation is a big priority for Internet giants. Allowing as many people as possible worldwide to communicate is not just an altruistic goal, but also good business.
Facebook, Google and Microsoft as well as Russia’s Yandex, China’s Baidu (百度) and others are constantly seeking to improve their translation tools.
Facebook has artificial intelligence (AI) experts on the job at one of its research labs in Paris.
Up to 200 languages are used on Facebook, said Antoine Bordes, European codirector of fundamental AI research for the social network.
Conventional automatic translation relies on large databases of the same texts in both languages to work from.
However, for many language pairs there just are not enough such parallel texts.
That is why researchers have been looking for another method, like the system developed by Facebook, which creates a mathematical representation for words.
Each word becomes a “vector” in a space of several hundred dimensions. Words that have close associations in the spoken language also find themselves close to each other in this vector space.
“For example, if you take the words ‘cat’ and ‘dog,’ semantically, they are words that describe a similar thing, so they will be extremely close together physically” in the vector space, said Guillaume Lample, one of the system’s designers. “If you take words like Madrid, London, Paris, which are European capital cities, it’s the same idea.”
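To make the idea of "closeness" concrete, here is a minimal sketch in Python. The three-dimensional vectors are invented for illustration (real word vectors have several hundred dimensions and are learned from large amounts of text), and cosine similarity is one common way to measure closeness, not necessarily the measure Facebook uses.

```python
import math

# Toy 3-dimensional vectors, invented for illustration only.
# Real word vectors have several hundred dimensions and are learned from text.
vectors = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "Paris": [0.1, 0.2, 0.9],
    "London": [0.2, 0.1, 0.95],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(word):
    """Return the other word whose vector points in the most similar direction."""
    return max(
        (w for w in vectors if w != word),
        key=lambda w: cosine_similarity(vectors[word], vectors[w]),
    )


print(nearest("cat"))    # the animal words sit close together
print(nearest("Paris"))  # so do the capital cities
```

With these made-up numbers, "cat" lands next to "dog" and "Paris" next to "London", which mirrors the behavior Lample describes.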
These language maps can then be linked to one another using algorithms — at first roughly, but eventually becoming more refined, until entire phrases can be matched without too many errors.
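The article does not say which algorithms perform this linking. As a rough sketch of the general idea only, the Python example below lines up two invented two-dimensional "maps" by finding the rotation that best fits one onto the other, then translates a word by looking up its nearest neighbor in the other map. Every vector and word pair is hypothetical. For simplicity the sketch also starts from three known word pairs, which is precisely the kind of dictionary help the system described here aims to do without.

```python
import math

# Hypothetical 2-D "language maps", invented for illustration.
# The Spanish map is the English one rotated by 90 degrees.
english = {"cat": (1.0, 0.0), "dog": (0.9, 0.3), "city": (0.0, 1.0), "house": (-0.5, 0.8)}
spanish = {"gato": (0.0, 1.0), "perro": (-0.3, 0.9), "ciudad": (-1.0, 0.0), "casa": (-0.8, -0.5)}

# Assumption: a few known anchor pairs are available to fit the rotation.
seed_pairs = [("cat", "gato"), ("dog", "perro"), ("city", "ciudad")]


def fit_rotation(pairs):
    """Least-squares rotation angle mapping English anchors onto Spanish ones."""
    cross = dot = 0.0
    for en, es in pairs:
        x1, y1 = english[en]
        x2, y2 = spanish[es]
        cross += x1 * y2 - y1 * x2
        dot += x1 * x2 + y1 * y2
    return math.atan2(cross, dot)


def rotate(point, angle):
    """Rotate a 2-D point counterclockwise by the given angle in radians."""
    x, y = point
    return (x * math.cos(angle) - y * math.sin(angle),
            x * math.sin(angle) + y * math.cos(angle))


theta = fit_rotation(seed_pairs)


def translate(word):
    """Map an English word into the Spanish space and return its nearest neighbor."""
    mapped = rotate(english[word], theta)
    return min(spanish, key=lambda w: math.dist(mapped, spanish[w]))


print(translate("house"))  # "house" was not among the anchor pairs
```

Here "house" was never paired with anything, yet the fitted rotation carries it onto "casa". Real vocabularies do not line up this neatly, which is why the matching starts rough and has to be refined.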
Lample said results are already promising.
For the language pair of English-Romanian, Facebook’s machine translation system is “equal or maybe a bit worse” than the word vector system, Lample said.
However, for the rarer language pair of English-Urdu, where Facebook’s traditional system does not have many bilingual texts to reference, the word vector system is already superior, he said.
Could the method allow translation from, say, Basque into the language of an Amazonian tribe?
In theory, yes, Lample said, but in practice a large body of written texts is needed to map the language, something lacking in Amazonian tribal languages.
“If you have just tens of thousands of phrases, it won’t work. You need several hundreds of thousands,” he said.
Experts at France’s national scientific center, CNRS, said the approach Lample has taken for Facebook could produce useful results, even if it does not result in perfect translations.
Thierry Poibeau of CNRS’ Lattice laboratory, which also researches machine translation, called the word vector approach “a conceptual revolution.”
He said “translating without parallel data” — dictionaries or versions of the same documents in both languages — “is something of the Holy Grail” of machine translation.
“But the question is what level of performance can be expected” from the word vector method, Poibeau said.
The method “can give an idea of the original text,” he said, but its ability to deliver a good translation every time remains unproven.
Francois Yvon, a researcher at CNRS’ Computer Science Laboratory for Mechanics and Engineering Sciences, said “the linking of languages is much more difficult” when they are far removed from one another.
“The manner of denoting concepts in Chinese is completely different from French,” he added.
However, even imperfect translations can be useful and could prove sufficient to track hate speech, a major priority for Facebook, Yvon said.