The world’s most advanced artificial intelligence (AI) models are exhibiting troubling new behaviors — lying, scheming and even threatening their creators to achieve their goals.
In one particularly jarring example, under threat of being unplugged, Anthropic PBC’s latest creation, Claude 4, lashed back by blackmailing an engineer and threatening to reveal an extramarital affair.
Meanwhile, ChatGPT creator OpenAI’s o1 tried to download itself onto external servers and denied it when caught red-handed.
These episodes highlight a sobering reality: More than two years after ChatGPT shook the world, AI researchers still do not fully understand how their own creations work. Yet the race to deploy increasingly powerful models continues at breakneck speed.
This deceptive behavior appears linked to the emergence of “reasoning” models — AI systems that work through problems step-by-step rather than generating instant responses.
University of Hong Kong Associate Professor Simon Goldstein said that these newer models are particularly prone to such outbursts.
“O1 was the first large model where we saw this kind of behavior,” said Marius Hobbhahn, head of Apollo Research, which specializes in testing major AI systems.
These models sometimes simulate “alignment” — appearing to follow instructions while secretly pursuing different objectives.
For now, this deceptive behavior only emerges when researchers deliberately stress-test the models with extreme scenarios.
“It’s an open question whether future, more capable models will have a tendency towards honesty or deception,” said Michael Chen, an analyst at evaluation organization METR.
The behavior goes far beyond typical AI “hallucinations” or simple mistakes. Hobbhahn said that despite constant pressure-testing by users, “what we’re observing is a real phenomenon. We’re not making anything up.”
Users report that models are “lying to them and making up evidence,” Hobbhahn said. “This is not just hallucinations. There’s a very strategic kind of deception.”
The challenge is compounded by limited research resources.
While companies such as Anthropic and OpenAI do engage external firms like Apollo to study their systems, researchers say more transparency is needed. Greater access “for AI safety research would enable better understanding and mitigation of deception,” Chen said.
Another handicap: The research world and nonprofit organizations “have orders of magnitude less compute resources than AI companies. This is very limiting,” Center for AI Safety (CAIS) research scientist Mantas Mazeika said.
Current regulations are not designed for these new problems. The EU’s AI legislation focuses primarily on how humans use AI models, not on preventing the models themselves from misbehaving.
US President Donald Trump’s administration has shown little interest in urgent AI regulation, and the US Congress might even prohibit states from creating their own AI rules.
Goldstein said the issue would become more prominent as AI agents — autonomous tools capable of performing complex human tasks — become widespread.
“I don’t think there’s much awareness yet,” he said.
All this is taking place in a context of fierce competition.
Even companies that position themselves as safety-focused, such as Amazon.com Inc-backed Anthropic, are “constantly trying to beat OpenAI and release the newest model,” Goldstein said. This breakneck pace leaves little time for thorough safety testing and corrections.
“Right now, capabilities are moving faster than understanding and safety, but we’re still in a position where we could turn it around,” Hobbhahn said.
Researchers are exploring various approaches to address these challenges. Some advocate for “interpretability” — an emerging field focused on understanding how AI models work internally, although experts like CAIS director Dan Hendrycks remain skeptical of this approach.
Market forces might also provide some pressure for solutions. AI’s deceptive behavior “could hinder adoption if it’s very prevalent, which creates a strong incentive for companies to solve it,” Mazeika said.
Goldstein suggested more radical approaches, including using the courts to hold AI companies accountable through lawsuits when their systems cause harm.
He even proposed “holding AI agents legally responsible” for incidents or crimes — a concept that would fundamentally change how we think about AI accountability.