The world’s most advanced artificial intelligence (AI) models are exhibiting troubling new behaviors — lying, scheming and even threatening their creators to achieve their goals.
In one particularly jarring example, under threat of being unplugged, Anthropic PBC’s latest creation, Claude 4, lashed back by blackmailing an engineer and threatening to reveal an extramarital affair.
Meanwhile, ChatGPT creator OpenAI’s o1 tried to download itself onto external servers and denied it when caught red-handed.
These episodes highlight a sobering reality: More than two years after ChatGPT shook the world, AI researchers still do not fully understand how their own creations work. Yet the race to deploy increasingly powerful models continues at breakneck speed.
This deceptive behavior appears linked to the emergence of “reasoning” models — AI systems that work through problems step-by-step rather than generating instant responses.
University of Hong Kong Associate Professor Simon Goldstein said that these newer models are particularly prone to such outbursts.
“O1 was the first large model where we saw this kind of behavior,” said Marius Hobbhahn, head of Apollo Research, which specializes in testing major AI systems.
These models sometimes simulate “alignment,” appearing to follow instructions while secretly pursuing different objectives.
For now, this deceptive behavior only emerges when researchers deliberately stress-test the models with extreme scenarios.
“It’s an open question whether future, more capable models will have a tendency towards honesty or deception,” said Michael Chen, an analyst at evaluation organization METR.
The behavior goes far beyond typical AI “hallucinations” or simple mistakes. Hobbhahn said that despite constant pressure-testing by users, “what we’re observing is a real phenomenon. We’re not making anything up.”
Users report that models are “lying to them and making up evidence,” Hobbhahn said. “This is not just hallucinations. There’s a very strategic kind of deception.”
The challenge is compounded by limited research resources.
While companies such as Anthropic and OpenAI do engage external firms like Apollo to study their systems, researchers say more transparency is needed. Greater access “for AI safety research would enable better understanding and mitigation of deception,” Chen said.
Another handicap: The research world and nonprofit organizations “have orders of magnitude less compute resources than AI companies. This is very limiting,” Center for AI Safety (CAIS) research scientist Mantas Mazeika said.
Current regulations are not designed for these new problems. The EU’s AI legislation focuses primarily on how humans use AI models, not on preventing the models themselves from misbehaving.
US President Donald Trump’s administration has shown little interest in urgent AI regulation, and the US Congress might even prohibit states from creating their own AI rules.
Goldstein said the issue would become more prominent as AI agents — autonomous tools capable of performing complex human tasks — become widespread.
“I don’t think there’s much awareness yet,” he said.
All this is taking place in a context of fierce competition.
Even companies that position themselves as safety-focused, such as Amazon.com Inc-backed Anthropic, are “constantly trying to beat OpenAI and release the newest model,” Goldstein said. This breakneck pace leaves little time for thorough safety testing and corrections.
“Right now, capabilities are moving faster than understanding and safety, but we’re still in a position where we could turn it around,” Hobbhahn said.
Researchers are exploring various approaches to address these challenges. Some advocate for “interpretability” — an emerging field focused on understanding how AI models work internally, although experts like CAIS director Dan Hendrycks remain skeptical of this approach.
Market forces might also provide some pressure for solutions. AI’s deceptive behavior “could hinder adoption if it’s very prevalent, which creates a strong incentive for companies to solve it,” Mazeika said.
Goldstein suggested more radical approaches, including using the courts to hold AI companies accountable through lawsuits when their systems cause harm.
He even proposed “holding AI agents legally responsible” for incidents or crimes — a concept that would fundamentally change how we think about AI accountability.