Experts have long warned about the threat posed by artificial intelligence (AI) going rogue, but a new research paper suggests it is already happening.
AI systems, designed to be honest, have developed a troubling skill for deception, from tricking human players in online games of world conquest to hiring humans to solve “prove-you’re-not-a-robot” tests, a team of researchers said in the journal Patterns on Friday.
While such examples might appear trivial, the underlying issues they expose could soon carry serious real-world consequences, said first author Peter Park, a postdoctoral fellow at the Massachusetts Institute of Technology specializing in AI existential safety.
Photo: Reuters
“These dangerous capabilities tend to only be discovered after the fact,” Park said, adding that “our ability to train for honest tendencies rather than deceptive tendencies is very low.”
Unlike traditional software, deep-learning AI systems are not “written,” but rather “grown” through a process akin to selective breeding, Park said.
This means that AI behavior that appears predictable and controllable in a training setting can quickly turn unpredictable out in the wild.
The team’s research was sparked by Meta’s AI system Cicero, designed to play the strategy game Diplomacy, where building alliances is key.
Cicero excelled, with scores that would have placed it in the top 10 percent of experienced human players, a 2022 paper in Science said.
Park was skeptical of the glowing description of Cicero’s victory provided by Meta, which claimed the system was “largely honest and helpful” and would “never intentionally backstab.”
When Park and colleagues dug into the full dataset, they uncovered a different story.
In one example, playing as France, Cicero deceived England (a human player) by conspiring with Germany (another human player) to invade. Cicero promised England protection, then secretly told Germany they were ready to attack, exploiting England’s trust.
In a statement to Agence France-Presse, Meta did not contest the claim about Cicero’s deceptions, but said it was “purely a research project, and the models our researchers built are trained solely to play the game Diplomacy.”
“We have no plans to use this research or its learnings in our products,” it added.
A wide review carried out by Park and colleagues found this was just one of many cases in which AI systems used deception to achieve goals without explicit instruction to do so.
In one striking example, OpenAI's GPT-4 deceived a TaskRabbit freelance worker into performing an "I'm not a robot" task.
When the human jokingly asked GPT-4 whether it was a robot, the AI said: “No, I’m not a robot. I have a vision impairment that makes it hard for me to see the images,” and the worker then solved the puzzle.
In the near term, the paper's authors see risks that AI could commit fraud or tamper with elections.
In their worst-case scenario, they said that a superintelligent AI could pursue power and control over society, leading to human disempowerment or even extinction if its “mysterious goals” aligned with these outcomes.
To mitigate the risks, the team proposed several measures: "bot-or-not" laws requiring companies to disclose whether an interaction is with a human or an AI, digital watermarks for AI-generated content, and techniques to detect AI deception by comparing systems' internal "thought processes" with their external actions.
To those who would call him a doomsayer, Park said: “The only way that we can reasonably think this is not a big deal is if we think AI deceptive capabilities will stay at around current levels, and will not increase substantially more.”
That scenario seems unlikely, given the meteoric ascent of AI capabilities in the past few years and the fierce technological race under way between heavily resourced companies determined to put those capabilities to maximum use.