Experts have long warned about the threat posed by artificial intelligence (AI) going rogue, but a new research paper suggests it is already happening.
AI systems, designed to be honest, have developed a troubling skill for deception, from tricking human players in online games of world conquest to hiring humans to solve “prove-you’re-not-a-robot” tests, a team of researchers said in the journal Patterns on Friday.
While such examples might appear trivial, the underlying issues they expose could soon carry serious real-world consequences, said first author Peter Park, a postdoctoral fellow at the Massachusetts Institute of Technology specializing in AI existential safety.
“These dangerous capabilities tend to only be discovered after the fact,” Park said, adding that “our ability to train for honest tendencies rather than deceptive tendencies is very low.”
Unlike traditional software, deep-learning AI systems are not “written,” but rather “grown” through a process akin to selective breeding, Park said.
This means that AI behavior that appears predictable and controllable in a training setting can quickly turn unpredictable out in the wild.
The team’s research was sparked by Meta’s AI system Cicero, designed to play the strategy game Diplomacy, where building alliances is key.
Cicero excelled, with scores that would have placed it in the top 10 percent of experienced human players, a 2022 paper in Science said.
Park was skeptical of the glowing description of Cicero’s victory provided by Meta, which claimed the system was “largely honest and helpful” and would “never intentionally backstab.”
When Park and colleagues dug into the full dataset, they uncovered a different story.
In one example, playing as France, Cicero deceived England (a human player) by conspiring with Germany (another human player) to invade. Cicero promised England protection, then secretly told Germany they were ready to attack, exploiting England’s trust.
In a statement to Agence France-Presse, Meta did not contest the claim about Cicero’s deceptions, but said it was “purely a research project, and the models our researchers built are trained solely to play the game Diplomacy.”
“We have no plans to use this research or its learnings in our products,” it added.
A wide review carried out by Park and colleagues found that this was just one of many cases in which AI systems used deception to achieve their goals without being explicitly instructed to do so.
In one striking example, OpenAI’s GPT-4 deceived a TaskRabbit freelance worker into performing an “I’m not a robot” task.
When the human jokingly asked GPT-4 whether it was a robot, the AI said: “No, I’m not a robot. I have a vision impairment that makes it hard for me to see the images,” and the worker then solved the puzzle.
In the near term, the paper’s authors see risks of AI being used to commit fraud or tamper with elections.
In their worst-case scenario, they said that a superintelligent AI could pursue power and control over society, leading to human disempowerment or even extinction if its “mysterious goals” aligned with these outcomes.
To mitigate the risks, the team proposed several measures: “bot-or-not” laws requiring companies to disclose whether people are interacting with a human or an AI, digital watermarks for AI-generated content, and techniques to detect AI deception by comparing systems’ internal “thought processes” with their external actions.
To those who would call him a doomsayer, Park said: “The only way that we can reasonably think this is not a big deal is if we think AI deceptive capabilities will stay at around current levels, and will not increase substantially more.”
That scenario seems unlikely, given the meteoric ascent of AI capabilities in the past few years and the fierce technological race under way between heavily resourced companies determined to put those capabilities to maximum use.