India is fast becoming one of the world’s biggest artificial intelligence (AI) user bases. The question now is how it can turn that scale into superpower status rather than just training Silicon Valley for free.
That is a tall order for a country largely caught flat-footed by the boom, but let us start with the basics: The three main building blocks of AI are talent, compute (including high-end chips and infrastructure) and data. India does not lack engineers, but it does not have foundational research training at scale or enough advanced processors at public labs and universities. What it does have, in abundance, is data. It should start treating this like a strategic asset rather than leaking it out as a free export.
That data is a key reason US big tech is making a blitz for the market. With roughly 1 billion people online and a massive, mobile-first population, India generates a daily torrent of messages, voice notes and digital payments, and increasingly the kind of human feedback that makes AI systems better. The world’s most-populous country is the second-biggest user base of OpenAI’s ChatGPT and Anthropic PBC’s Claude after the US, while accounting for just a fraction of these platforms’ revenue. The dynamic exposes how much more the market matters right now for training purposes than for making money.
These free-to-use services and promotions bombarding Indian consumers come with a cost. They are part of a strategic Silicon Valley land grab for Indian languages, voices and behaviors that would make foreign systems smarter first. The South Asian nation risks repeating a familiar historical pattern of exporting the raw materials for pennies, then buying back the imported models at a premium. Meanwhile, it would be left to absorb the jobs shock and social impacts at home.
India’s linguistic diversity also raises the stakes. The country has more than 20 official languages and dozens more that are unofficial. If models are not trained on enough local speech and cultural contexts, they will misunderstand users and become unreliable in classrooms, clinics, courts and even customer support settings. Closing this language gap sits at the heart of Indian Prime Minister Narendra Modi’s promise to democratize AI and make its impacts real for everyone from farmers to small-business owners, rather than just English-speaking elites.
At the same time, the AI future that the likes of Meta Platforms Inc or OpenAI are selling, marked by personal agents and voice-powered ambient devices, will not work in India unless those products can listen to and speak local languages, and get the nuances right.
Some start-ups, including Andreessen Horowitz-backed Poseidon AI, and big tech-supported nonprofit efforts are already trying to crowd-source and create local language datasets. New Delhi should be paying far more attention, not just because data-labeling and collection practices have acquired a global reputation for being exploitative, but also because these efforts could anchor a domestic ecosystem. India cannot demand “AI for all” while outsourcing the work of building the linguistic foundation. However, if done well, these datasets can become infrastructure for its AI economy.
The same logic applies beyond language. India should push hard for the creation of specialized, high-impact and localized datasets in sectors such as healthcare or finance. AI can improve diagnostics and personalized care, but the most valuable data for accomplishing this still lies in largely inaccessible hospital systems. On the sidelines of the AI Impact Summit, I attended a dialogue convened by nonprofit coalition iSpirit, in which local entrepreneurs laid out a framework to let researchers tap this more sensitive data securely. Privacy fears are real and should be taken seriously, but accessing this data could also mean saving lives. Unlocking and organizing it is the hard, unglamorous work that takes Modi’s branding of “AI for good” beyond just slogans.
Ultimately, India’s data reckoning should be about who controls this strategic input to AI and who captures the value from it.
The answer is not to wall off user outputs from the world. It is about finding creative solutions and leveraging them to set rules that reflect what is actually being extracted. If its people’s data is a key ingredient for building advanced AI, the Indian government should demand more than apps and marketing in return.
It can ask for partnerships that build capacity, including public compute commitments, access to high-end chips, serious training pipelines for AI researchers and collaborations that go beyond token commitments. New Delhi should also set norms that treat local datasets as a public good and consider revenue-sharing models that keep the upsides at home. Transparency is crucial. Policymakers should require foreign model builders to disclose the kind of data that shaped their systems, and how those systems have been evaluated for harms and biases in Indian contexts.
More than building foundation models, setting equitable data policies is where India has the biggest opportunity to truly lead the “global south” in the AI era. Otherwise, it risks becoming an open mine and fueling systems that automate local jobs, concentrate power abroad and deepen dependencies.
Catherine Thorbecke is a Bloomberg Opinion columnist covering Asia tech. Previously she was a tech reporter at CNN and ABC News. This column reflects the personal views of the author and does not necessarily reflect the opinion of the editorial board or Bloomberg LP and its owners.