Last week, Microsoft said it would stop selling software that guesses a person’s mood by looking at their face.
The reason, it said, was that the technology could be discriminatory.
Computer vision software, which is used in self-driving vehicles and facial recognition, has long had issues with errors that come at the expense of women and people of color.
Microsoft’s decision to halt the system entirely is one way of dealing with the problem, but there is another, novel approach that tech firms are exploring: training artificial intelligence (AI) on “synthetic” images to make it less biased.
The idea is a bit like training pilots. Instead of practicing in unpredictable, real-world conditions, most spend hundreds of hours using flight simulators designed to cover a broad array of scenarios they could experience in the air.
A similar approach is being taken to train AI, which relies on carefully labeled data to work properly.
Until recently, the software used to recognize people has been trained on thousands or millions of images of real people, but that can be time-consuming, invasive and neglectful of large swathes of the population.
Now many AI makers are using fake or “synthetic” images to train computers on a broader array of people, skin tones, ages or other features, essentially flipping the notion that fake data is bad.
In fact, synthetic data, if used properly, would not only make software more trustworthy, but could also transform the economics of data as the "new oil."
In 2015, Internet entrepreneur Simi Lindgren came up with the idea for a Web site called Yuty to sell beauty products for all skin types. She wanted to use AI to recommend skincare products by analyzing selfies, but training a system to do that accurately was difficult.
A popular database of 70,000 licensed faces from Flickr, for instance, was not diverse or inclusive enough.
It showed facial hair on men, but not on women, and there were not enough melanin-rich — that is, darker-skinned — women to accurately detect their various skin conditions, such as acne or fine lines, she said.
She tried crowdsourcing and got just under 1,000 photographs of faces from her network of friends and family, but even that was not enough.
Lindgren’s team then decided to create their own data to plug the gap. The answer was something called generative adversarial networks (GANs), a type of neural network designed in 2014 by Ian Goodfellow, an AI researcher now at Alphabet’s DeepMind.
The system works by trying to fool itself, and then humans, with new faces.
You can try testing your ability to tell the difference between a fake face and a real one on a Web site set up by academics at the University of Washington using a type of GAN.
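The fooling game at the heart of a GAN can be sketched in a few lines. The toy example below is purely illustrative and has nothing to do with Yuty's actual system: a one-parameter "generator" shifts random noise, a logistic "discriminator" scores samples as real or fake, and each update nudges the other. The generator starts far from the real data and gradually moves toward it as it learns what fools the discriminator.

```python
import numpy as np

rng = np.random.default_rng(0)

def discriminator(x, w, b):
    """Logistic score: estimated probability that x is real."""
    return 1.0 / (1.0 + np.exp(-(w * x + b)))

real = rng.normal(3.0, 0.5, size=256)  # "real" data, centred at 3.0
shift = 0.0                            # the generator's single parameter
w, b = 1.0, 0.0                        # discriminator parameters
lr = 0.05

for _ in range(2000):
    # Generator: turn noise into fake samples by shifting it
    fake = rng.normal(0.0, 0.5, size=256) + shift

    # Discriminator step: raise scores on real data, lower them on fakes
    d_real = discriminator(real, w, b)
    d_fake = discriminator(fake, w, b)
    w += lr * (np.mean((1 - d_real) * real) - np.mean(d_fake * fake))
    b += lr * (np.mean(1 - d_real) - np.mean(d_fake))

    # Generator step: move fakes toward whatever the discriminator calls "real"
    d_fake = discriminator(fake, w, b)
    shift += lr * np.mean((1 - d_fake) * w)
```

In a real system both players are deep neural networks with millions of parameters, and the samples are images rather than numbers, but the adversarial loop is the same: training ends when the discriminator can no longer reliably tell the generator's output from the genuine article.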
Lindgren used the method to create hundreds of thousands of photorealistic images.
She ended up with “a balanced dataset of diverse people, with diverse skin tones and diverse concerns,” she said.
About 80 percent of the faces in Yuty’s database are not of real people, but synthetic images that are labeled and checked by humans who help assess her platform’s growing accuracy, she said.
Lindgren is not alone in her approach. More than 50 start-ups generate synthetic data as a service, market intelligence firm StartUs Insights said.
Microsoft has experimented with it, and Google is working with artificially generated medical histories to help predict insurance fraud.
Amazon said in January that it was using synthetic data to train Alexa, partly to sidestep privacy concerns.
Remember when, a few years ago, big tech platforms found themselves in hot water for hiring contractors to listen in on customers to train their AI systems? Fake data can help solve that issue.
Facebook in October last year acquired New York-based synthetic data start-up A.I.Reverie.
The trend is becoming so pervasive that consulting firm Gartner estimates that by 2024, 60 percent of all data used to train AI would be synthetic, and that by 2030, fake data would completely overshadow real data for AI training.
The market for making synthetic images and videos is divided into companies that use GANs and those that design 3D graphics from scratch.
Datagen Technologies, based in Tel Aviv, Israel, does the latter. Its animations train vehicle systems to detect sleepiness.
Automakers have historically trained their sensors by filming actors pretending to fall asleep at the wheel, but that still leads to a limited set of examples, Datagen cofounder Gil Elbaz said.
The videos also have to be sent to contractors in other countries to be labeled, which can take weeks, he added.
Datagen instead creates thousands of animations of different types of people falling asleep at the wheel in different ways.
Although the animations do not look realistic to humans, Elbaz said their greater scale leads to more accurate sensors in vehicles.
Fake data is not just being used to train vision recognition systems, but also predictive software, such as the kinds banks use to decide who should get a loan.
Fairgen, a start-up also based in Tel Aviv, generates large tables of artificial identities, including names, genders, ethnicities, income levels and credit scores.
“We’re creating artificial populations, making a parallel world where discrimination wouldn’t have happened,” Fairgen CEO and cofounder Samuel Cohen said. “From this world we can sample unlimited amounts of artificial individuals and use these as data.”
For example, to help design algorithms that distribute loans more fairly to minority groups, Fairgen makes databases of artificial people from minority groups with average credit scores that are closer to those from other groups. One bank in the UK is using Fairgen’s data to hone its loan software.
Cohen said that manipulating the data that algorithms are trained on can help with positive discrimination and “recalibrating society.”
Strange as it might sound, the growth of fake data is a step in the right direction, and not just because it avoids using people’s personal data. It could also disrupt the dynamics of selling data.
Retailers, for instance, could generate extra revenue by selling synthetic data on customers’ purchasing behavior, Accenture data science and machine learning global head Fernando Lucini said.
“Business leaders need to have synthetic data on their radar,” he said.
One caveat about unintended consequences, though: With so much artificial data driving future systems, what are the risks that some of it would be used for fraud, or that it would be harder to find real identities amid the flood of fake ones?
Synthetic data would also not eliminate bias completely, said Julien Cornebise, an honorary associate professor of computer science at University College London.
“Bias is not only in the data. It’s in the people who develop these tools with their own cultural assumptions,” he said. “That’s the case for everything manmade.”
Parmy Olson, a former reporter for the Wall Street Journal and Forbes, is a Bloomberg Opinion columnist covering technology.
This column does not necessarily reflect the opinion of the editorial board or Bloomberg LP and its owners.