Designers of machine translation tools still rely mostly on dictionaries to make a foreign language understandable, but now there is a new way: numbers.
Facebook researchers said rendering words into figures and exploiting mathematical similarities between languages is a promising avenue — even if a universal communicator a la Star Trek remains a distant dream.
Powerful automatic translation is a big priority for Internet giants. Allowing as many people as possible worldwide to communicate is not just an altruistic goal, but also good business.
Facebook, Google and Microsoft as well as Russia’s Yandex, China’s Baidu (百度) and others are constantly seeking to improve their translation tools.
Facebook has artificial intelligence (AI) experts on the job at one of its research labs in Paris.
Up to 200 languages are used on Facebook, said Antoine Bordes, European codirector of fundamental AI research for the social network.
Conventional automatic translation relies on large databases of the same texts in both languages to work from.
However, for many language pairs there just are not enough such parallel texts.
That is why researchers have been looking for another method, like the system developed by Facebook, which creates a mathematical representation for words.
Each word becomes a “vector” in a space of several hundred dimensions. Words that have close associations in the spoken language also find themselves close to each other in this vector space.
“For example, if you take the words ‘cat’ and ‘dog,’ semantically, they are words that describe a similar thing, so they will be extremely close together physically” in the vector space, said Guillaume Lample, one of the system’s designers. “If you take words like Madrid, London, Paris, which are European capital cities, it’s the same idea.”
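The idea of words sitting "close together" can be illustrated with a minimal sketch. The four-dimensional vectors below are hypothetical toy values invented for illustration, not output from Facebook's system, which the article says uses several hundred dimensions. Closeness is measured here with cosine similarity, a common choice that the article itself does not name.

```python
import math

# Hypothetical toy vectors (assumption): real systems learn vectors with
# several hundred dimensions from large bodies of text.
TOY_VECTORS = {
    "cat":    [0.9, 0.8, 0.1, 0.0],
    "dog":    [0.8, 0.9, 0.2, 0.0],
    "Paris":  [0.1, 0.0, 0.9, 0.8],
    "London": [0.0, 0.1, 0.8, 0.9],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(word, vectors=TOY_VECTORS):
    """Return the other word whose vector is most similar to `word`'s vector."""
    return max(
        (other for other in vectors if other != word),
        key=lambda other: cosine_similarity(vectors[word], vectors[other]),
    )


print(nearest("cat"))    # the animal words cluster together
print(nearest("Paris"))  # so do the capital cities
```

With these made-up values, "cat" lands nearest to "dog" and "Paris" nearest to "London", mirroring the examples Lample gives.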
These language maps can then be linked to one another using algorithms — at first roughly, but eventually becoming more refined, until entire phrases can be matched without too many errors.
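One way to picture the linking of two language maps is as a rotation that lays one vector space over the other. The article does not say which algorithm Facebook uses, so the sketch below swaps in a standard technique, orthogonal Procrustes alignment, and assumes a handful of word pairs are already known to correspond; that assumption is for illustration only. All vectors are randomly generated toy data, and the "other language" space is constructed as an exact rotation of the English one, which real languages would only approximate.

```python
import numpy as np


def fit_rotation(source, target):
    """Orthogonal Procrustes: the orthogonal matrix W minimising ||source @ W - target||."""
    u, _, vt = np.linalg.svd(source.T @ target)
    return u @ vt


rng = np.random.default_rng(0)

# Hypothetical toy data (assumption): five "English" word vectors in 3 dimensions.
english = rng.normal(size=(5, 3))

# Build the second language's map as a hidden rotation of the English map.
true_rotation, _ = np.linalg.qr(rng.normal(size=(3, 3)))
other_language = english @ true_rotation

# Learn the rotation from the first four word pairs only.
rotation = fit_rotation(english[:4], other_language[:4])

# Map the held-out fifth English word and find its nearest neighbour
# in the other language by cosine similarity.
mapped = english[4] @ rotation
scores = (other_language @ mapped) / (
    np.linalg.norm(other_language, axis=1) * np.linalg.norm(mapped)
)
best_match = int(np.argmax(scores))
print(best_match)
```

Because the toy spaces differ by an exact rotation, the held-out word maps straight onto its counterpart. With real languages the fit would be approximate, which is consistent with the article's description of a rough first link that is later refined.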
Lample said results are already promising.
For the language pair of English-Romanian, Facebook’s machine translation system is “equal or maybe a bit worse” than the word vector system, Lample said.
However, for the rarer language pair of English-Urdu, where Facebook’s traditional system does not have many bilingual texts to reference, the word vector system is already superior, he said.
Could the method also allow translation from, say, Basque into the language of an Amazonian tribe?
In theory, yes, Lample said, but in practice a large body of written texts is needed to map the language, something lacking in Amazonian tribal languages.
“If you have just tens of thousands of phrases, it won’t work. You need several hundreds of thousands,” he said.
Experts at France’s national scientific center, CNRS, said the approach Lample has taken for Facebook could produce useful results, even if the translations are not perfect.
Thierry Poibeau of CNRS’ Lattice laboratory, which also researches machine translation, called the word vector approach “a conceptual revolution.”
He said “translating without parallel data” — dictionaries or versions of the same documents in both languages — “is something of the Holy Grail” of machine translation.
“But the question is what level of performance can be expected” from the word vector method, Poibeau said.
The method “can give an idea of the original text,” but its ability to produce a good translation every time remains unproven, he said.
Francois Yvon, a researcher at CNRS’ Computer Science Laboratory for Mechanics and Engineering Sciences, said “the linking of languages is much more difficult” when they are far removed from one another.
“The manner of denoting concepts in Chinese is completely different from French,” he added.
However, even imperfect translations can be useful and could prove sufficient to track hate speech, a major priority for Facebook, Yvon said.