Designers of machine translation tools still mostly rely on dictionaries to make a foreign language understandable. However, now there is a new way: numbers.
Facebook researchers said rendering words into figures and exploiting mathematical similarities between languages is a promising avenue — even if a universal communicator a la Star Trek remains a distant dream.
Powerful automatic translation is a big priority for Internet giants. Allowing as many people as possible worldwide to communicate is not just an altruistic goal, but also good business.
Facebook, Google and Microsoft as well as Russia’s Yandex, China’s Baidu (百度) and others are constantly seeking to improve their translation tools.
Facebook has artificial intelligence (AI) experts on the job at one of its research labs in Paris.
Up to 200 languages are used on Facebook, said Antoine Bordes, European codirector of fundamental AI research for the social network.
Conventional automatic translation relies on large databases of the same texts in both languages.
However, for many language pairs there just are not enough such parallel texts.
That is why researchers have been looking for another method, like the system developed by Facebook, which creates a mathematical representation for words.
Each word becomes a “vector” in a space of several hundred dimensions. Words that have close associations in the spoken language also find themselves close to each other in this vector space.
“For example, if you take the words ‘cat’ and ‘dog,’ semantically, they are words that describe a similar thing, so they will be extremely close together physically” in the vector space, said Guillaume Lample, one of the system’s designers. “If you take words like Madrid, London, Paris, which are European capital cities, it’s the same idea.”
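To make the idea of closeness in vector space concrete, here is a minimal sketch using cosine similarity, a common way to compare word vectors. The article does not say which distance measure Facebook uses, so that choice is an assumption. The three-dimensional vectors are invented for illustration; real systems learn vectors with several hundred dimensions from large bodies of text.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: near 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical toy vectors, hand-picked so that related words point the same way.
TOY_VECTORS = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "Paris": [0.1, 0.2, 0.9],
    "London": [0.2, 0.1, 0.95],
}

def nearest(word, vectors):
    """Return the other word whose vector is most similar to that of `word`."""
    return max(
        (other for other in vectors if other != word),
        key=lambda other: cosine_similarity(vectors[word], vectors[other]),
    )

print(nearest("cat", TOY_VECTORS))
print(nearest("Paris", TOY_VECTORS))
```

With these toy values, the animal words end up as each other's nearest neighbors, and so do the capital cities.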
These language maps can then be linked to one another using algorithms — at first roughly, but eventually becoming more refined, until entire phrases can be matched without too many errors.
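The article does not describe the linking algorithms, so the following is only a rough sketch of one way two maps can be lined up: fitting the rotation that best carries one set of vectors onto the other (a two-dimensional case of what is known as the orthogonal Procrustes problem). It is a simplification and not Facebook's system. In particular, it cheats by using two known word pairs as a seed, whereas the approach described here aims to work without such dictionaries. All vectors are invented, and the Romanian map is built as an exact rotation of the English one.

```python
import math

def rotate(v, theta):
    """Rotate a 2D vector counterclockwise by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])

def fit_rotation(pairs):
    """Least-squares rotation angle that maps each source vector onto its target."""
    cross = sum(x[0] * y[1] - x[1] * y[0] for x, y in pairs)
    dot = sum(x[0] * y[0] + x[1] * y[1] for x, y in pairs)
    return math.atan2(cross, dot)

def cosine(u, v):
    """Cosine similarity between two 2D vectors."""
    dot = u[0] * v[0] + u[1] * v[1]
    return dot / (math.hypot(*u) * math.hypot(*v))

# Hypothetical toy maps: the Romanian one is the English one turned by 40 degrees.
ENGLISH = {"cat": (1.0, 0.2), "dog": (0.9, 0.4), "city": (-0.3, 1.0), "house": (-0.8, 0.5)}
TRUE_ANGLE = math.radians(40)
ROMANIAN = {
    "pisică": rotate(ENGLISH["cat"], TRUE_ANGLE),
    "câine": rotate(ENGLISH["dog"], TRUE_ANGLE),
    "oraș": rotate(ENGLISH["city"], TRUE_ANGLE),
    "casă": rotate(ENGLISH["house"], TRUE_ANGLE),
}

# Seed pairs (an assumption of this sketch): two translations known in advance.
SEED = [(ENGLISH["cat"], ROMANIAN["pisică"]), (ENGLISH["city"], ROMANIAN["oraș"])]
theta = fit_rotation(SEED)

def translate(word, theta):
    """Rotate the English vector into the Romanian map and pick the closest word."""
    mapped = rotate(ENGLISH[word], theta)
    return max(ROMANIAN, key=lambda w: cosine(mapped, ROMANIAN[w]))

print(translate("dog", theta), translate("house", theta))
```

Because the toy maps differ only by a rotation, the fit recovers it from the seed pairs and the two held-out words land on their counterparts. Real language maps match only approximately, which is why the linking starts rough and has to be refined.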
Lample said results are already promising.
For the language pair of English-Romanian, Facebook’s machine translation system is “equal or maybe a bit worse” than the word vector system, Lample said.
However, for the rarer language pair of English-Urdu, where Facebook’s traditional system does not have many bilingual texts to reference, the word vector system is already superior, he said.
Could the method also allow translation from, say, Basque into the language of an Amazonian tribe?
In theory, yes, Lample said, but in practice a large body of written texts is needed to map the language, something lacking in Amazonian tribal languages.
“If you have just tens of thousands of phrases, it won’t work. You need several hundreds of thousands,” he said.
Experts at France’s national scientific center, CNRS, said the approach Lample has taken for Facebook could produce useful results, even if it does not result in perfect translations.
Thierry Poibeau of CNRS’ Lattice laboratory, which also researches machine translation, called the word vector approach “a conceptual revolution.”
He said “translating without parallel data” — dictionaries or versions of the same documents in both languages — “is something of the Holy Grail” of machine translation.
“But the question is what level of performance can be expected” from the word vector method, Poibeau said.
The method “can give an idea of the original text,” but the capability for a good translation every time remains unproven.
Francois Yvon, a researcher at CNRS’ Computer Science Laboratory for Mechanics and Engineering Sciences, said “the linking of languages is much more difficult” when they are far removed from one another.
“The manner of denoting concepts in Chinese is completely different from French,” he added.
However, even imperfect translations can be useful and could prove sufficient to track hate speech, a major priority for Facebook, Yvon said.