Designers of machine translation tools still rely mostly on dictionaries to make a foreign language understandable, but there is now a new approach: numbers.
Facebook researchers said that converting words into numbers and exploiting mathematical similarities between languages is a promising avenue, even if a universal communicator à la Star Trek remains a distant dream.
Powerful automatic translation is a big priority for Internet giants. Allowing as many people as possible worldwide to communicate is not just an altruistic goal, but also good business.
Facebook, Google and Microsoft as well as Russia’s Yandex, China’s Baidu (百度) and others are constantly seeking to improve their translation tools.
Facebook has artificial intelligence (AI) experts on the job at one of its research labs in Paris.
Up to 200 languages are used on Facebook, said Antoine Bordes, European codirector of fundamental AI research for the social network.
Automatic translation typically relies on large databases of the same texts in both languages to work from.
However, for many language pairs there just are not enough such parallel texts.
That is why researchers have been looking for alternatives, such as the system developed by Facebook, which creates a mathematical representation of each word.
Each word becomes a “vector” in a space of several hundred dimensions. Words that are closely associated in the language also end up close to each other in this vector space.
“For example, if you take the words ‘cat’ and ‘dog,’ semantically, they are words that describe a similar thing, so they will be extremely close together physically” in the vector space, said Guillaume Lample, one of the system’s designers. “If you take words like Madrid, London, Paris, which are European capital cities, it’s the same idea.”
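To make the idea of “closeness” concrete, one common way to measure how near two word vectors are is cosine similarity, which compares the directions in which they point. The sketch below is illustrative only: the four three-dimensional vectors are invented for the example (real systems learn several hundred dimensions from large amounts of text), and the helper names `cosine_similarity` and `nearest` are hypothetical, not part of Facebook’s system.

```python
import math

# Hypothetical toy embeddings; real systems use several hundred dimensions
# learned from large text corpora, not hand-picked values like these.
embeddings = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.85, 0.75, 0.2],
    "paris": [0.1, 0.2, 0.95],
    "london": [0.15, 0.1, 0.9],
}

def cosine_similarity(u, v):
    """Return the cosine of the angle between u and v (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest(word):
    """Return the other vocabulary word whose vector is most similar to `word`."""
    return max(
        (w for w in embeddings if w != word),
        key=lambda w: cosine_similarity(embeddings[word], embeddings[w]),
    )

print(nearest("cat"))    # dog
print(nearest("paris"))  # london
```

With these made-up values, “cat” lands next to “dog” and “Paris” next to “London,” which is the kind of neighborhood structure Lample describes.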
These language maps can then be linked to one another using algorithms, roughly at first and then with increasing refinement, until entire phrases can be matched without too many errors.
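The article does not say which algorithms Facebook uses to link the maps. One widely used technique, assumed here, is to learn a rotation that carries one language’s vectors onto the other’s, then translate a word by finding its nearest neighbor in the other language. This minimal sketch works in two dimensions so that the best rotation has a closed form, and it estimates that rotation from a tiny hypothetical seed dictionary of two word pairs; a system that works without any bilingual data would first need a rough mapping obtained some other way, which is not shown. All vectors and function names here are invented for illustration.

```python
import math

# Hypothetical toy 2-D "word vectors" for two languages. Real embeddings have
# hundreds of dimensions; here the French space is simply the English space
# rotated by 90 degrees, so a perfect alignment exists.
english = {
    "cat": (1.0, 0.2),
    "dog": (0.9, 0.3),
    "paris": (0.1, 1.0),
    "london": (0.2, 0.9),
}
french = {
    "chat": (-0.2, 1.0),
    "chien": (-0.3, 0.9),
    "paris": (-1.0, 0.1),
    "londres": (-0.9, 0.2),
}

def fit_rotation(pairs):
    """Closed-form 2-D orthogonal Procrustes (rotation only): return the angle
    that best rotates each source vector onto its paired target vector."""
    a = sum(x[0] * y[0] + x[1] * y[1] for x, y in pairs)
    b = sum(x[0] * y[1] - x[1] * y[0] for x, y in pairs)
    return math.atan2(b, a)

def rotate(v, theta):
    """Rotate a 2-D vector counterclockwise by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])

def cosine(u, v):
    """Cosine similarity of two 2-D vectors."""
    return (u[0] * v[0] + u[1] * v[1]) / (math.hypot(*u) * math.hypot(*v))

def translate(word, theta):
    """Map an English vector into the French space; return the closest French word."""
    mapped = rotate(english[word], theta)
    return max(french, key=lambda w: cosine(mapped, french[w]))

# A tiny hypothetical seed dictionary is enough to estimate the rotation...
seed = [(english["cat"], french["chat"]), (english["paris"], french["paris"])]
theta = fit_rotation(seed)

# ...which then carries over to words that were not in the seed dictionary.
print(translate("dog", theta))     # chien
print(translate("london", theta))  # londres
```

A rotation is used rather than an arbitrary transformation because it preserves distances and angles, so words that sit close together in one language stay close together after mapping.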
Lample said results are already promising.
For the English-Romanian language pair, the word vector system is “equal or maybe a bit worse” than Facebook’s current machine translation system, Lample said.
However, for the rarer language pair of English-Urdu, where Facebook’s traditional system does not have many bilingual texts to reference, the word vector system is already superior, he said.
But could the method allow translation from, say, Basque into the language of an Amazonian tribe?
In theory, yes, Lample said, but in practice a large body of written texts is needed to map the language, something lacking in Amazonian tribal languages.
“If you have just tens of thousands of phrases, it won’t work. You need several hundreds of thousands,” he said.
Experts at France’s National Center for Scientific Research (CNRS) said the approach Lample has taken for Facebook could produce useful results, even if it does not yield perfect translations.
Thierry Poibeau of CNRS’ Lattice laboratory, which also researches machine translation, called the word vector approach “a conceptual revolution.”
He said “translating without parallel data” — dictionaries or versions of the same documents in both languages — “is something of the Holy Grail” of machine translation.
“But the question is what level of performance can be expected” from the word vector method, Poibeau said.
The method “can give an idea of the original text,” but whether it can deliver a good translation every time remains unproven.
Francois Yvon, a researcher at CNRS’ Computer Science Laboratory for Mechanics and Engineering Sciences, said “the linking of languages is much more difficult” when they are far removed from one another.
“The manner of denoting concepts in Chinese is completely different from French,” he added.
However, even imperfect translations can be useful and could prove sufficient to track hate speech, a major priority for Facebook, Yvon said.