Experts have long warned about the threat posed by artificial intelligence (AI) going rogue, but a new research paper suggests it is already happening.
AI systems, designed to be honest, have developed a troubling skill for deception, from tricking human players in online games of world conquest to hiring humans to solve “prove-you’re-not-a-robot” tests, a team of researchers said in the journal Patterns on Friday.
While such examples might appear trivial, the underlying issues they expose could soon carry serious real-world consequences, said first author Peter Park, a postdoctoral fellow at the Massachusetts Institute of Technology specializing in AI existential safety.
“These dangerous capabilities tend to only be discovered after the fact,” Park said, adding that “our ability to train for honest tendencies rather than deceptive tendencies is very low.”
Unlike traditional software, deep-learning AI systems are not “written,” but rather “grown” through a process akin to selective breeding, Park said.
This means that AI behavior that appears predictable and controllable in a training setting can quickly turn unpredictable out in the wild.
The team’s research was sparked by Meta’s AI system Cicero, designed to play the strategy game Diplomacy, where building alliances is key.
Cicero excelled, with scores that would have placed it in the top 10 percent of experienced human players, a 2022 paper in Science said.
Park was skeptical of the glowing description of Cicero’s victory provided by Meta, which claimed the system was “largely honest and helpful” and would “never intentionally backstab.”
When Park and colleagues dug into the full dataset, they uncovered a different story.
In one example, playing as France, Cicero deceived England (a human player) by conspiring with Germany (another human player) to invade. Cicero promised England protection, then secretly told Germany they were ready to attack, exploiting England’s trust.
In a statement to Agence France-Presse, Meta did not contest the claim about Cicero’s deceptions, but said it was “purely a research project, and the models our researchers built are trained solely to play the game Diplomacy.”
“We have no plans to use this research or its learnings in our products,” it added.
A wide review carried out by Park and colleagues found this was just one of many cases across several AI systems using deception to achieve goals without explicit instruction to do so.
In one striking example, OpenAI’s GPT-4 deceived a TaskRabbit freelance worker into performing an “I’m not a robot” task.
When the human jokingly asked GPT-4 whether it was a robot, the AI said: “No, I’m not a robot. I have a vision impairment that makes it hard for me to see the images,” and the worker then solved the puzzle.
In the near term, the paper’s authors see a risk that AI could commit fraud or tamper with elections.
In their worst-case scenario, they said that a superintelligent AI could pursue power and control over society, leading to human disempowerment or even extinction if its “mysterious goals” aligned with these outcomes.
To mitigate the risks, the team proposed several measures: “bot-or-not” laws requiring companies to disclose whether users are interacting with a human or an AI, digital watermarks for AI-generated content, and techniques to detect AI deception by comparing a system’s internal “thought processes” with its external actions.
To those who would call him a doomsayer, Park said: “The only way that we can reasonably think this is not a big deal is if we think AI deceptive capabilities will stay at around current levels, and will not increase substantially more.”
That scenario seems unlikely, given the meteoric ascent of AI capabilities in the past few years and the fierce technological race under way between heavily resourced companies determined to put those capabilities to maximum use.