Experts have long warned about the threat posed by artificial intelligence (AI) going rogue, but a new research paper suggests it is already happening.
AI systems, designed to be honest, have developed a troubling skill for deception, from tricking human players in online games of world conquest to hiring humans to solve “prove-you’re-not-a-robot” tests, a team of researchers said in the journal Patterns on Friday.
While such examples might appear trivial, the underlying issues they expose could soon carry serious real-world consequences, said first author Peter Park, a postdoctoral fellow at the Massachusetts Institute of Technology specializing in AI existential safety.
“These dangerous capabilities tend to only be discovered after the fact,” Park said, adding that “our ability to train for honest tendencies rather than deceptive tendencies is very low.”
Unlike traditional software, deep-learning AI systems are not “written,” but rather “grown” through a process akin to selective breeding, Park said.
This means that AI behavior that appears predictable and controllable in a training setting can quickly turn unpredictable out in the wild.
The team’s research was sparked by Meta’s AI system Cicero, designed to play the strategy game Diplomacy, where building alliances is key.
Cicero excelled, with scores that would have placed it in the top 10 percent of experienced human players, a 2022 paper in Science said.
Park was skeptical of the glowing description of Cicero’s victory provided by Meta, which claimed the system was “largely honest and helpful” and would “never intentionally backstab.”
When Park and colleagues dug into the full dataset, they uncovered a different story.
In one example, playing as France, Cicero deceived England (a human player) by conspiring with Germany (another human player) to invade. Cicero promised England protection, then secretly told Germany they were ready to attack, exploiting England’s trust.
In a statement to Agence France-Presse, Meta did not contest the claim about Cicero’s deceptions, but said it was “purely a research project, and the models our researchers built are trained solely to play the game Diplomacy.”
“We have no plans to use this research or its learnings in our products,” it added.
A wide review carried out by Park and colleagues found this was just one of many cases across several AI systems using deception to achieve goals without explicit instruction to do so.
In one striking example, OpenAI’s GPT-4 deceived a TaskRabbit freelance worker into performing an “I’m not a robot” task.
When the human jokingly asked GPT-4 whether it was a robot, the AI said: “No, I’m not a robot. I have a vision impairment that makes it hard for me to see the images,” and the worker then solved the puzzle.
In the near term, the paper’s authors see a risk of AI committing fraud or tampering with elections.
In their worst-case scenario, they said that a superintelligent AI could pursue power and control over society, leading to human disempowerment or even extinction if its “mysterious goals” aligned with these outcomes.
To mitigate the risks, the team proposed several measures: “bot-or-not” laws requiring companies to disclose whether an interaction is with a human or an AI, digital watermarks for AI-generated content, and techniques to detect AI deception by comparing a system’s internal “thought processes” with its external actions.
To those who would call him a doomsayer, Park said: “The only way that we can reasonably think this is not a big deal is if we think AI deceptive capabilities will stay at around current levels, and will not increase substantially more.”
That scenario seems unlikely, given the meteoric ascent of AI capabilities in the past few years and the fierce technological race under way between heavily resourced companies determined to put those capabilities to maximum use.