Taiwan’s Sovereign AI Training Corpus has grown to more than 1.1 billion tokens a little more than a month after its official launch, the Ministry of Digital Affairs said yesterday.
The platform, launched on Dec. 24 last year, aims to gather high-quality data in traditional Mandarin to train sovereign artificial intelligence (AI) models, ensuring that outputs better reflect the language patterns and cultural references familiar to Taiwanese.
The platform initially contained more than 2,000 datasets totaling more than 600 million units of data, also known as tokens, Department of Data Innovation Director-General Chuang Ming-fen (莊明芬) said.
Photo courtesy of the Ministry of Digital Affairs
The corpus has since nearly doubled in size, surpassing 1.1 billion tokens, with weekly updates tracking the steady release of data by government agencies, she said.
Most of the data on the platform are provided by the Ministry of Culture and Ministry of Education, covering subjects such as education, languages, history and tourism, the ministry said.
The language and vocabulary section also features dictionaries, a category that consistently ranks among users’ most frequently searched resources, it said.
Ministry statistics showed that the platform has been viewed more than 35,000 times, and about 20 organizations in academia and industry have applied for access.
“That shows that people in research institutions, government agencies and the corporate world pay close attention to the high-quality data released by the government to train sovereign AI databases. It has set a good starting point for subsequent AI model developments,” Chuang said.
The ministry said that it would gradually expand data sources for sovereign AI to include inputs contributed by local governments during the first and second quarters of this year.
Local government officials would be invited to join a ministry-hosted seminar, where they would learn about the policy governing sovereign AI as well as procedures they need to follow to upload data, the ministry said.
Workshops could also be organized to assist local governments in uploading data to the platform, it added.
The ministry is planning to begin forming partnerships with the private sector in the second half of this year, and is also seeking authorization from Academia Sinica and the National Museum of Taiwan Literature to upload their data to the platform.
In related news, the government’s Open Data Platform has attracted about 175.84 million views and 22.27 million downloads in the more than a decade since its launch.
The platform was created in accordance with the government’s policy of digital government and data governance.
The three most frequently downloaded topics are earthquake information, stock closing prices and monthly average stock prices, Chuang said.
“That shows that people are mostly interested in data closely related to their lives,” she added.