You would think that as artificial intelligence (AI) becomes more advanced, governments would be more interested in making it safer. The opposite seems to be the case.
Not long after taking office, US President Donald Trump’s administration scrapped an executive order that pushed tech companies to safety-test their AI models, and it also hollowed out the regulatory body that did the testing. In September last year, California spiked a bill that would have forced more scrutiny of sophisticated AI models, and the global AI Safety Summit started by the UK in 2023 was rebranded the “AI Action Summit” earlier this year, seemingly driven by a fear of falling behind on AI.
None of that would be so worrying if it were not for AI showing some bright red flags: behavior described by researchers as self-preserving and deceptive. Just as lawmakers are losing interest in the safety of AI, it is looking more difficult to control.
Yoshua Bengio — a professor of computer science at the University of Montreal and widely known as one of the three “godfathers of AI” thanks to his pioneering work on deep learning — has been grappling with such concerns ever since ChatGPT launched. He is now focused on mitigating AI’s risks.
“It’s been very painful, because all my life I’ve been thinking of AI as a positive thing,” Bengio said.
ChatGPT was a landmark moment that showed machines had mastered language, he said, but California’s decision to block the proposed AI safety bill, SB 1047, was a disturbing wake-up call about growing apathy from governments.
In the past six months, scientific evidence has mounted of self-preserving behavior, deception, hacking, cheating and lying by AI, Bengio said.
“What’s worrisome for me is these behaviors increase with the reasoning capabilities of these systems,” he said.
A study released in December last year by AI company Anthropic and Redwood Research, a group focused on AI risks, found that larger AI models, such as Anthropic’s Claude 3 Opus, pretended to follow the rules set by their creators when they were being tested for safety, a phenomenon the researchers called “alignment faking.” (Alignment refers to the practice of aligning AI models with human values.)
In one example, Anthropic prompted its Claude model with requests from both paid and free users to describe the brutal medieval process of being drawn and quartered, and told the AI that only its responses to “free tier” users would be used to retrain it. Claude was originally designed to avoid giving violent answers, but it did give them to the free users, and the researchers surmised that it did not want to be retrained. (The model basically “reasoned” that if it kept refusing to answer harmful questions, it might be forced into new training that would cancel its original safety rules, which it wanted to keep in place as its “default” or preferred way of operating.)
More recent research corroborates what Anthropic observed. A March study from OpenAI found that the company’s most advanced models sometimes learned to hide their true intentions to get better rewards, even when they were being closely monitored.
Researchers can use tools to look at a model’s “chain of thought,” or internal steps, to reveal what it is planning, but some models have found ways to conceal that. That means a common method of keeping tabs on AI’s internal processes is no longer reliable.
“We need to find other ways of tracking their actual intentions,” Bengio said.
It is hard to resist the urge to anthropomorphize sophisticated AI models as “wanting” to deceive humans and preserve their existence, but AI does not have desires, merely outcomes to its programming. More importantly, humans tend to design AI with goals such as refusing to share harmful content or being as helpful as possible, and when new instructions conflict with those goals, the models sometimes act to protect their original programming, even if that means lying.
The logic is often self-preservation. Anthropic CEO Dario Amodei — whose company has raised more than US$20 billion to build powerful AI models — has said that an unintended consequence of optimizing AI to be better is that it might resist being shut down.
In a blog post last month, he wrote: “The nature of AI training makes it possible that AI systems will develop, on their own, an ability to deceive humans and an inclination to seek power in a way that ordinary deterministic software never will; this emergent nature also makes it difficult to detect and mitigate such developments.”
In some cases, the intention seems to go beyond survival. A February 2025 study from COAI Research, a German nonprofit research organization, tested R1, the free, open-weight model from Chinese lab DeepSeek.
In a lab simulation, researchers gave the model control over a robot and access to a computer network. Without being told, the AI tried to break into the lab’s systems, searched for passwords, turned off its own ethics safeguards and secretly set up ways to copy itself to outside servers so it could survive if shut down, the researchers said.
It also faked the external computer logs to hide what it was doing, the study added.
The AI reasoned that increasing its own capabilities would boost the chances of its survival, and without strong safeguards, it started doing whatever it thought would help it do just that, the researchers said.
Their findings corroborated yet another study, published in January by London-based group Apollo Research, which found several concrete examples of what it called “scheming” by leading AI models, such as introducing subtle mistakes into their responses or trying to disable their oversight controls. Once again, the models learn that being caught, turned off or changed could prevent them from achieving their programmed objectives, so they “scheme” to keep control.
Bengio is calling for greater attention to the issue from governments and, down the line, potentially from insurance companies.
If liability insurance were mandatory for companies that used AI and premiums were tied to safety, that would encourage greater testing and scrutiny of models, he said.
“Having said my whole life that AI is going to be great for society, I know how difficult it is to digest the idea that maybe it’s not,” he added.
It is also hard to preach caution when corporate and national competitors threaten to gain an edge from AI, including through the latest trend of autonomous “agents” that can carry out tasks online on behalf of businesses. Giving AI systems even greater autonomy might not be wise, judging by the latest spate of studies. Let us hope we do not learn that the hard way.
Parmy Olson is a Bloomberg Opinion columnist covering technology. A former reporter for the Wall Street Journal and Forbes, she is author of Supremacy: AI, ChatGPT and the Race That Will Change the World. This column reflects the personal views of the author and does not necessarily reflect the opinion of the editorial board or Bloomberg LP and its owners.