In the past 15 years, we have witnessed an explosion in the amount of digital data available — from the Internet, social media, scientific equipment, smartphones, surveillance cameras and many other sources — and in the computer technologies used to process it. “Big data,” as it is known, will undoubtedly deliver important scientific, technological and medical advances, but it also poses serious risks if it is misused or abused.
Already, major innovations such as Internet search engines, machine translation and image labeling have relied on applying machine-learning techniques to vast data sets. In the near future, big data could significantly improve government policymaking, social-welfare programs and scholarship.
However, having more data is no substitute for having high-quality data. For example, a recent article in Nature reports that election pollsters in the US are struggling to obtain representative samples of the population because they are legally permitted to call only landline telephones, whereas Americans increasingly rely on cellphones.
While one can find countless political opinions on social media, these are not reliably representative of voters, either. In fact, a substantial share of tweets and Facebook posts about politics are computer-generated.
In recent years, automated programs based on biased data sets have caused numerous scandals.
For example, in April last year, when a college student searched Google Images for “unprofessional hairstyles for work,” the results showed mostly pictures of black people; when the student changed the first search term to “professional,” Google returned mostly pictures of white people.
However, this was not the result of bias on the part of Google’s programmers; rather, it reflected how people had labeled pictures on the Internet.
A big data program that used this search result to evaluate hiring and promotion decisions might penalize black candidates who resembled the pictures in the results for “unprofessional hairstyles,” thereby perpetuating traditional social biases.
This is not just a hypothetical possibility. Last year, a ProPublica investigation of “recidivism risk models” demonstrated that a widely used methodology to determine sentences for convicted criminals systematically overestimates the likelihood that black defendants will commit crimes in the future, and underestimates the risk that white defendants will do so.
Another hazard of big data is that it can be gamed. When people know that a data set is being used to make important decisions that will affect them, they have an incentive to tip the scales in their favor. For example, teachers who are judged according to their students’ test scores may be more likely to “teach to the test,” or even to cheat.
Similarly, college administrators who want to move their institutions up in the US News & World Report rankings have made unwise decisions, such as investing in extravagant gyms at the expense of academics. Worse, they have made grotesquely unethical decisions, such as the effort by Mount Saint Mary’s University to boost its “retention rate” by identifying and expelling weaker students in the first few weeks of school.
Even Google’s search engine is not immune. Despite being driven by an enormous amount of data overseen by some of the world’s top data scientists, its results are susceptible to “search-engine optimization” and manipulation, such as “Google bombing,” “spamdexing” and other methods serving parochial interests.
A third hazard is privacy violations, because so much of the data now available contains personal information. In recent years, enormous collections of confidential data have been stolen from commercial and government sites. Researchers have also shown how people’s political opinions or even sexual preferences can be accurately gleaned from seemingly innocuous online postings, such as movie reviews — even when they are published pseudonymously.
Finally, big data poses a challenge for accountability. Someone who feels that he or she has been treated unfairly by an algorithm’s decision often has no way to appeal it, either because specific results cannot be interpreted, or because the people who wrote the algorithm refuse to provide details about how it works.
While governments or corporations might intimidate anyone who objects by describing their algorithms as “mathematical” or “scientific,” they, too, are often awed by their creations’ behavior. The EU recently adopted a measure guaranteeing people affected by algorithms a “right to an explanation,” but only time will tell how this will work in practice.
When people who are harmed by big data have no avenues for recourse, the results can be toxic and far-reaching, as data scientist Cathy O’Neil demonstrates in her recent book Weapons of Math Destruction.
The good news is that the hazards of big data can be largely avoided, but they will not be unless we zealously protect people’s privacy, detect and correct unfairness, use algorithmic recommendations prudently and maintain a rigorous understanding of algorithms’ inner workings and the data that informs their decisions.
Ernest Davis is a professor of computer science at the Courant Institute of Mathematical Sciences, New York University.
Copyright: Project Syndicate