Taiwan’s Sovereign AI Training Corpus has grown to more than 1.1 billion tokens a little more than a month after its official launch, the Ministry of Digital Affairs said yesterday.
The platform, launched on Dec. 24 last year, aims to gather high-quality data in traditional Mandarin to train sovereign artificial intelligence (AI) models, ensuring that outputs better reflect the language patterns and cultural references familiar to Taiwanese.
The platform initially contained more than 2,000 datasets totaling more than 600 million units of data, also known as tokens, Department of Data Innovation Director-General Chuang Ming-fen (莊明芬) said.
The corpus has since nearly doubled in size, surpassing 1.1 billion tokens, with weekly updates tracking the steady release of data by government agencies, she said.
Most of the data on the platform are provided by the Ministry of Culture and Ministry of Education, covering subjects such as education, languages, history and tourism, the ministry said.
The language and vocabulary section also features dictionaries, a category that consistently ranks among users’ most frequently searched resources, it said.
Ministry statistics showed that the platform has been viewed more than 35,000 times, and about 20 organizations in academia and industry have applied for access.
“That shows that people in research institutions, government agencies and the corporate world pay close attention to the high-quality data released by the government to train sovereign AI databases. It has set a good starting point for subsequent AI model developments,” Chuang said.
The ministry said that it would gradually expand data sources for sovereign AI to include inputs contributed by local governments during the first and second quarters of this year.
Local government officials would be invited to join a ministry-hosted seminar, where they would learn about the policy governing sovereign AI as well as procedures they need to follow to upload data, the ministry said.
Workshops could also be organized to assist local governments in uploading data to the platform, it added.
The ministry is planning to begin forming partnerships with the private sector in the second half of this year, and is also seeking authorization from Academia Sinica and the National Museum of Taiwan Literature to upload their data to the platform.
In related news, the government’s Open Data Platform has attracted about 175.84 million views and 22.27 million downloads in the more than a decade since its launch.
The platform was created in line with the government’s policies on digital government and data governance.
The three most frequently downloaded topics are earthquake information, stock closing prices and monthly average stock prices, Chuang said.
“That shows that people are mostly interested in data closely related to their lives,” she added.