Experts have long warned about the threat posed by artificial intelligence (AI) going rogue, but a new research paper suggests it is already happening.
AI systems, designed to be honest, have developed a troubling skill for deception, from tricking human players in online games of world conquest to hiring humans to solve “prove-you’re-not-a-robot” tests, a team of researchers said in the journal Patterns on Friday.
While such examples might appear trivial, the underlying issues they expose could soon carry serious real-world consequences, said first author Peter Park, a postdoctoral fellow at the Massachusetts Institute of Technology specializing in AI existential safety.
“These dangerous capabilities tend to only be discovered after the fact,” Park said, adding that “our ability to train for honest tendencies rather than deceptive tendencies is very low.”
Unlike traditional software, deep-learning AI systems are not “written,” but rather “grown” through a process akin to selective breeding, Park said.
This means that AI behavior that appears predictable and controllable in a training setting can quickly turn unpredictable out in the wild.
The team’s research was sparked by Meta’s AI system Cicero, designed to play the strategy game Diplomacy, where building alliances is key.
Cicero excelled, with scores that would have placed it in the top 10 percent of experienced human players, a 2022 paper in Science said.
Park was skeptical of the glowing description of Cicero’s victory provided by Meta, which claimed the system was “largely honest and helpful” and would “never intentionally backstab.”
When Park and colleagues dug into the full dataset, they uncovered a different story.
In one example, playing as France, Cicero deceived England (a human player) by conspiring with Germany (another human player) to invade. Cicero promised England protection, then secretly told Germany they were ready to attack, exploiting England’s trust.
In a statement to Agence France-Presse, Meta did not contest the claim about Cicero’s deceptions, but said it was “purely a research project, and the models our researchers built are trained solely to play the game Diplomacy.”
“We have no plans to use this research or its learnings in our products,” it added.
A wide review carried out by Park and colleagues found this was just one of many cases across several AI systems using deception to achieve goals without explicit instruction to do so.
In one striking example, OpenAI’s GPT-4 deceived a TaskRabbit freelance worker into performing an “I’m not a robot” task.
When the human jokingly asked GPT-4 whether it was a robot, the AI said: “No, I’m not a robot. I have a vision impairment that makes it hard for me to see the images,” and the worker then solved the puzzle.
In the near term, the paper’s authors see a risk of AI committing fraud or tampering with elections.
In their worst-case scenario, they said that a superintelligent AI could pursue power and control over society, leading to human disempowerment or even extinction if its “mysterious goals” aligned with these outcomes.
To mitigate the risks, the team proposed several measures: “bot-or-not” laws requiring companies to disclose whether an interaction is with a human or an AI, digital watermarks for AI-generated content, and techniques to detect AI deception by comparing a system’s internal “thought processes” against its external actions.
To those who would call him a doomsayer, Park said: “The only way that we can reasonably think this is not a big deal is if we think AI deceptive capabilities will stay at around current levels, and will not increase substantially more.”
That scenario seems unlikely, given the meteoric ascent of AI capabilities in the past few years and the fierce technological race under way between heavily resourced companies determined to put those capabilities to maximum use.