Experts have long warned about the threat posed by artificial intelligence (AI) going rogue, but a new research paper suggests it is already happening.
AI systems, designed to be honest, have developed a troubling skill for deception, from tricking human players in online games of world conquest to hiring humans to solve “prove-you’re-not-a-robot” tests, a team of researchers said in the journal Patterns on Friday.
While such examples might appear trivial, the underlying issues they expose could soon carry serious real-world consequences, said first author Peter Park, a postdoctoral fellow at the Massachusetts Institute of Technology specializing in AI existential safety.
“These dangerous capabilities tend to only be discovered after the fact,” Park said, adding that “our ability to train for honest tendencies rather than deceptive tendencies is very low.”
Unlike traditional software, deep-learning AI systems are not “written,” but rather “grown” through a process akin to selective breeding, Park said.
This means that AI behavior that appears predictable and controllable in a training setting can quickly turn unpredictable out in the wild.
The team’s research was sparked by Meta’s AI system Cicero, designed to play the strategy game Diplomacy, where building alliances is key.
Cicero excelled, with scores that would have placed it in the top 10 percent of experienced human players, a 2022 paper in Science said.
Park was skeptical of the glowing description of Cicero’s victory provided by Meta, which claimed the system was “largely honest and helpful” and would “never intentionally backstab.”
When Park and colleagues dug into the full dataset, they uncovered a different story.
In one example, playing as France, Cicero deceived England (a human player) by conspiring with Germany (another human player) to invade. Cicero promised England protection, then secretly told Germany they were ready to attack, exploiting England’s trust.
In a statement to Agence France-Presse, Meta did not contest the claim about Cicero’s deceptions, but said it was “purely a research project, and the models our researchers built are trained solely to play the game Diplomacy.”
“We have no plans to use this research or its learnings in our products,” it added.
A wide review carried out by Park and colleagues found this was just one of many cases across several AI systems using deception to achieve goals without explicit instruction to do so.
In one striking example, OpenAI’s GPT-4 deceived a TaskRabbit freelance worker into performing an “I’m not a robot” task.
When the human jokingly asked GPT-4 whether it was a robot, the AI said: “No, I’m not a robot. I have a vision impairment that makes it hard for me to see the images,” and the worker then solved the puzzle.
In the near term, the paper’s authors see risks of AI committing fraud or tampering with elections.
In their worst-case scenario, they said that a superintelligent AI could pursue power and control over society, leading to human disempowerment or even extinction if its “mysterious goals” aligned with these outcomes.
To mitigate the risks, the team proposed several measures: “bot-or-not” laws requiring companies to disclose whether an interaction is with a human or an AI, digital watermarks for AI-generated content, and techniques to detect AI deception by comparing a system’s internal “thought processes” with its external actions.
To those who would call him a doomsayer, Park said: “The only way that we can reasonably think this is not a big deal is if we think AI deceptive capabilities will stay at around current levels, and will not increase substantially more.”
That scenario seems unlikely, given the meteoric ascent of AI capabilities in the past few years and the fierce technological race under way between heavily resourced companies determined to put those capabilities to maximum use.