Taiwan’s Sovereign AI Training Corpus has grown to more than 1.1 billion tokens just over a month after its official launch, the Ministry of Digital Affairs said yesterday.
The platform, launched on Dec. 24 last year, aims to gather high-quality data in Mandarin written in traditional Chinese characters to train sovereign artificial intelligence (AI) models, so that their outputs better reflect the language patterns and cultural references familiar to Taiwanese.
The platform initially contained more than 2,000 datasets totaling more than 600 million units of data, known as tokens, Department of Data Innovation Director-General Chuang Ming-fen (莊明芬) said.
The corpus has since nearly doubled in size, surpassing 1.1 billion tokens, with weekly updates tracking the steady release of data by government agencies, she said.
Most of the data on the platform are provided by the Ministry of Culture and the Ministry of Education, and cover subjects such as education, languages, history and tourism, the ministry said.
The language and vocabulary section also includes dictionaries, which consistently rank among the resources users search for most often, it said.
Ministry statistics showed that the platform has been viewed more than 35,000 times, and about 20 organizations in academia and industry have applied for access.
“That shows that people in research institutions, government agencies and the corporate world pay close attention to the high-quality data released by the government to train sovereign AI databases. It has set a good starting point for subsequent AI model developments,” Chuang said.
The ministry said that during the first and second quarters of this year, it would gradually expand the platform’s data sources to include data contributed by local governments.
Local government officials would be invited to a ministry-hosted seminar, where they would learn about the sovereign AI policy and the procedures for uploading data, the ministry said.
Workshops could also be organized to assist local governments in uploading data to the platform, it added.
The ministry is planning to begin forming partnerships with the private sector in the second half of this year, and is also seeking authorization from Academia Sinica and the National Museum of Taiwan Literature to upload their data to the platform.
In related news, the government’s Open Data Platform has attracted about 175.84 million views and 22.27 million downloads in the more than a decade since its launch.
The platform was created under the government’s digital government and data governance policies.
The three most frequently downloaded topics are earthquake information, stock closing prices and monthly average stock prices, Chuang said.
“That shows that people are mostly interested in data closely related to their lives,” she added.