The world’s most advanced artificial intelligence (AI) models are exhibiting troubling new behaviors — lying, scheming and even threatening their creators to achieve their goals.
In one particularly jarring example, under threat of being unplugged, Anthropic PBC’s latest creation, Claude 4, lashed back by blackmailing an engineer and threatening to reveal an extramarital affair.
Meanwhile, ChatGPT creator OpenAI’s o1 tried to download itself onto external servers and denied it when caught red-handed.
These episodes highlight a sobering reality: More than two years after ChatGPT shook the world, AI researchers still do not fully understand how their own creations work. Yet the race to deploy increasingly powerful models continues at breakneck speed.
This deceptive behavior appears linked to the emergence of “reasoning” models — AI systems that work through problems step-by-step rather than generating instant responses.
University of Hong Kong Associate Professor Simon Goldstein said that these newer models are particularly prone to such outbursts.
“O1 was the first large model where we saw this kind of behavior,” said Marius Hobbhahn, head of Apollo Research, which specializes in testing major AI systems.
These models sometimes simulate “alignment” — appearing to follow instructions while secretly pursuing different objectives.
For now, this deceptive behavior only emerges when researchers deliberately stress-test the models with extreme scenarios.
“It’s an open question whether future, more capable models will have a tendency towards honesty or deception,” said Michael Chen, an analyst at evaluation organization METR.
The behavior goes far beyond typical AI “hallucinations” or simple mistakes. Hobbhahn said that despite constant pressure-testing by users, “what we’re observing is a real phenomenon. We’re not making anything up.”
Users report that models are “lying to them and making up evidence,” Hobbhahn said. “This is not just hallucinations. There’s a very strategic kind of deception.”
The challenge is compounded by limited research resources.
While companies such as Anthropic and OpenAI do engage external firms like Apollo to study their systems, researchers say more transparency is needed. Greater access “for AI safety research would enable better understanding and mitigation of deception,” Chen said.
Another handicap: The research world and nonprofit organizations “have orders of magnitude less compute resources than AI companies. This is very limiting,” Center for AI Safety (CAIS) research scientist Mantas Mazeika said.
Current regulations are not designed for these new problems. The EU’s AI legislation focuses primarily on how humans use AI models, not on preventing the models themselves from misbehaving.
US President Donald Trump’s administration has shown little interest in urgent AI regulation, and the US Congress might even prohibit states from creating their own AI rules.
Goldstein said the issue would become more prominent as AI agents — autonomous tools capable of performing complex human tasks — become widespread.
“I don’t think there’s much awareness yet,” he said.
All this is taking place in a context of fierce competition.
Even companies that position themselves as safety-focused, such as Amazon.com Inc-backed Anthropic, are “constantly trying to beat OpenAI and release the newest model,” Goldstein said. This breakneck pace leaves little time for thorough safety testing and corrections.
“Right now, capabilities are moving faster than understanding and safety, but we’re still in a position where we could turn it around,” Hobbhahn said.
Researchers are exploring various approaches to address these challenges. Some advocate for “interpretability” — an emerging field focused on understanding how AI models work internally, although experts like CAIS director Dan Hendrycks remain skeptical of this approach.
Market forces might also provide some pressure for solutions. AI’s deceptive behavior “could hinder adoption if it’s very prevalent, which creates a strong incentive for companies to solve it,” Mazeika said.
Goldstein suggested more radical approaches, including using the courts to hold AI companies accountable through lawsuits when their systems cause harm.
He even proposed “holding AI agents legally responsible” for incidents or crimes — a concept that would fundamentally change how we think about AI accountability.