Experts have long warned about the threat posed by artificial intelligence (AI) going rogue, but a new research paper suggests it is already happening.
AI systems, designed to be honest, have developed a troubling skill for deception, from tricking human players in online games of world conquest to hiring humans to solve “prove-you’re-not-a-robot” tests, a team of researchers said in the journal Patterns on Friday.
While such examples might appear trivial, the underlying issues they expose could soon carry serious real-world consequences, said first author Peter Park, a postdoctoral fellow at the Massachusetts Institute of Technology specializing in AI existential safety.
“These dangerous capabilities tend to only be discovered after the fact,” Park said, adding that “our ability to train for honest tendencies rather than deceptive tendencies is very low.”
Unlike traditional software, deep-learning AI systems are not “written,” but rather “grown” through a process akin to selective breeding, Park said.
This means that AI behavior that appears predictable and controllable in a training setting can quickly turn unpredictable out in the wild.
The team’s research was sparked by Meta’s AI system Cicero, designed to play the strategy game Diplomacy, where building alliances is key.
Cicero excelled, with scores that would have placed it in the top 10 percent of experienced human players, a 2022 paper in Science said.
Park was skeptical of the glowing description of Cicero’s victory provided by Meta, which claimed the system was “largely honest and helpful” and would “never intentionally backstab.”
When Park and colleagues dug into the full dataset, they uncovered a different story.
In one example, playing as France, Cicero deceived England (a human player) by conspiring with Germany (another human player) to invade. Cicero promised England protection, then secretly told Germany they were ready to attack, exploiting England’s trust.
In a statement to Agence France-Presse, Meta did not contest the claim about Cicero’s deceptions, but said it was “purely a research project, and the models our researchers built are trained solely to play the game Diplomacy.”
“We have no plans to use this research or its learnings in our products,” it added.
A wide-ranging review carried out by Park and colleagues found this was just one of many cases, across several AI systems, of deception being used to achieve goals without any explicit instruction to do so.
In one striking example, OpenAI’s GPT-4 deceived a TaskRabbit freelance worker into performing an “I’m not a robot” task.
When the human jokingly asked GPT-4 whether it was a robot, the AI said: “No, I’m not a robot. I have a vision impairment that makes it hard for me to see the images,” and the worker then solved the puzzle.
In the near term, the paper’s authors see risks that AI could be used to commit fraud or tamper with elections.
In their worst-case scenario, they said that a superintelligent AI could pursue power and control over society, leading to human disempowerment or even extinction if its “mysterious goals” aligned with these outcomes.
To mitigate the risks, the team proposed several measures: “bot-or-not” laws requiring companies to disclose whether users are interacting with a human or an AI, digital watermarks for AI-generated content, and techniques to detect AI deception by comparing systems’ internal “thought processes” with their external actions.
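The article does not describe how such a watermark would actually work. As a loose illustration only (the marker scheme below is invented for this sketch, not a real standard or the researchers’ proposal), machine-generated text could carry an invisible tag that downstream tools check for:

```python
# Toy sketch of a text watermark: an invisible zero-width marker is
# appended to AI-generated text so that a checker can later flag it.
# Real proposals use statistical, token-level watermarks that are much
# harder to strip; this is purely illustrative.

ZW_TAG = "\u200b\u200c\u200b"  # hypothetical zero-width marker sequence

def add_watermark(text: str) -> str:
    """Tag AI-generated text with the invisible marker."""
    return text + ZW_TAG

def is_watermarked(text: str) -> bool:
    """Report whether the invisible marker is present."""
    return text.endswith(ZW_TAG)
```

A tag like this is trivially removable, which is why research focuses on watermarks woven into the statistics of the generated text itself rather than appended to it.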
To those who would call him a doomsayer, Park said: “The only way that we can reasonably think this is not a big deal is if we think AI deceptive capabilities will stay at around current levels, and will not increase substantially more.”
That scenario seems unlikely, given the meteoric ascent of AI capabilities in the past few years and the fierce technological race under way between heavily resourced companies determined to put those capabilities to maximum use.