Designers of machine translation tools still mostly rely on dictionaries to make a foreign language understandable. However, now there is a new way: numbers.
Facebook researchers said rendering words into figures and exploiting mathematical similarities between languages is a promising avenue — even if a universal communicator a la Star Trek remains a distant dream.
Powerful automatic translation is a big priority for Internet giants. Allowing as many people as possible worldwide to communicate is not just an altruistic goal, but also good business.
Facebook, Google and Microsoft as well as Russia’s Yandex, China’s Baidu (百度) and others are constantly seeking to improve their translation tools.
Facebook has artificial intelligence (AI) experts on the job at one of its research labs in Paris.
Up to 200 languages are used on Facebook, said Antoine Bordes, European codirector of fundamental AI research for the social network.
Automatic translation conventionally relies on large databases of the same texts in both languages.
However, for many language pairs there just are not enough such parallel texts.
That is why researchers have been looking for another method, like the system developed by Facebook, which creates a mathematical representation for words.
Each word becomes a “vector” in a space of several hundred dimensions. Words that have close associations in the spoken language also find themselves close to each other in this vector space.
“For example, if you take the words ‘cat’ and ‘dog,’ semantically, they are words that describe a similar thing, so they will be extremely close together physically” in the vector space, said Guillaume Lample, one of the system’s designers. “If you take words like Madrid, London, Paris, which are European capital cities, it’s the same idea.”
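The idea of words being "close together" in a vector space can be made concrete with a small sketch. The vectors below are invented three-dimensional toy values, not output from Facebook's system (which, as described above, uses several hundred dimensions), and cosine similarity is assumed here as the closeness measure because it is a common choice for word vectors.

```python
import math

def cosine_similarity(u, v):
    """Return the cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical toy vectors, made up for illustration only.
toy_vectors = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "Paris": [0.1, 0.2, 0.9],
    "London": [0.2, 0.1, 0.8],
}

def nearest(word, vectors):
    """Return the other word whose vector is most similar to `word`'s vector."""
    others = (w for w in vectors if w != word)
    return max(others, key=lambda w: cosine_similarity(vectors[word], vectors[w]))

print(nearest("cat", toy_vectors))
print(nearest("Paris", toy_vectors))
```

With these toy values, the nearest neighbor of "cat" is "dog" and the nearest neighbor of "Paris" is "London", mirroring the examples Lample gives.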
These language maps can then be linked to one another using algorithms — at first roughly, but eventually becoming more refined, until entire phrases can be matched without too many errors.
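Linking two language maps can be pictured as rotating one map until it overlays the other. The sketch below is a deliberately simplified illustration, not the method used by Facebook: it works in two dimensions, uses invented coordinates, and fits the rotation from two known word pairs (`seed_pairs`), whereas the system described in this article is meant to work without such pairs. All names and values are hypothetical.

```python
import math

def rotate(point, angle):
    """Rotate a 2-D point counterclockwise by `angle` radians."""
    x, y = point
    return (x * math.cos(angle) - y * math.sin(angle),
            x * math.sin(angle) + y * math.cos(angle))

def fit_rotation(source_points, target_points):
    """Least-squares rotation angle taking source points onto target points."""
    pairs = list(zip(source_points, target_points))
    cross = sum(sx * ty - sy * tx for (sx, sy), (tx, ty) in pairs)
    dot = sum(sx * tx + sy * ty for (sx, sy), (tx, ty) in pairs)
    return math.atan2(cross, dot)

# Hypothetical English map (made-up coordinates).
english = {"cat": (1.0, 0.1), "dog": (0.9, 0.2),
           "city": (0.1, 1.0), "river": (-0.5, 0.8)}

# Hypothetical French map: the same layout rotated by 40 degrees.
true_angle = math.radians(40)
french = {"chat": rotate(english["cat"], true_angle),
          "chien": rotate(english["dog"], true_angle),
          "ville": rotate(english["city"], true_angle),
          "rivière": rotate(english["river"], true_angle)}

# Assumption: two word pairs are known in advance and used to fit the rotation.
seed_pairs = [("cat", "chat"), ("city", "ville")]
angle = fit_rotation([english[e] for e, _ in seed_pairs],
                     [french[f] for _, f in seed_pairs])

def translate(word):
    """Map an English word into the French space and return the closest French word."""
    mapped = rotate(english[word], angle)
    return min(french, key=lambda f: math.dist(mapped, french[f]))

print(translate("dog"))
print(translate("river"))
```

Because the toy French map is an exact rotation of the English one, the words left out of the seed pairs ("dog" and "river") land on "chien" and "rivière". Real language maps only approximately share a shape, which is one reason the matching starts rough and needs refinement.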
Lample said results are already promising.
For the language pair of English-Romanian, Facebook’s machine translation system is “equal or maybe a bit worse” than the word vector system, Lample said.
However, for the rarer language pair of English-Urdu, where Facebook’s traditional system does not have many bilingual texts to reference, the word vector system is already superior, he said.
Could the method allow translation from, say, Basque into the language of an Amazonian tribe?
In theory, yes, Lample said, but in practice a large body of written texts is needed to map the language, something lacking in Amazonian tribal languages.
“If you have just tens of thousands of phrases, it won’t work. You need several hundreds of thousands,” he said.
Experts at France’s national scientific center, CNRS, said the approach Lample has taken for Facebook could produce useful results, even if it does not result in perfect translations.
Thierry Poibeau of CNRS’ Lattice laboratory, which also researches machine translation, called the word vector approach “a conceptual revolution.”
He said “translating without parallel data” — dictionaries or versions of the same documents in both languages — “is something of the Holy Grail” of machine translation.
“But the question is what level of performance can be expected” from the word vector method, Poibeau said.
The method “can give an idea of the original text,” he said, but its ability to produce a good translation every time remains unproven.
Francois Yvon, a researcher at CNRS’ Computer Science Laboratory for Mechanics and Engineering Sciences, said “the linking of languages is much more difficult” when they are far removed from one another.
“The manner of denoting concepts in Chinese is completely different from French,” he added.
However, even imperfect translations can be useful and could prove sufficient to track hate speech, a major priority for Facebook, Yvon said.