For many people, Facebook is the Internet, and its number of users is still growing, according to Meta Platforms Inc’s latest financial results. But Meta Platforms Inc CEO Mark Zuckerberg is not just celebrating that continuing growth. He wants to take advantage of it by using data from Facebook and Instagram to create powerful, general-purpose artificial intelligence. It sounds great, and Meta is well positioned to do it, but his billions of users might end up paying the price with their privacy and more.
Here is how Zuckerberg teased his next move in AI on Thursday last week:
“The next key part of our playbook is learning from unique data and feedback loops in our products… On Facebook and Instagram, there are hundreds of billions of publicly shared images and tens of billions of public videos, which we estimate is greater than the Common Crawl dataset and people share large numbers of public text posts in comments across our services as well.”
The point that Zuck makes here about “Common Crawl” startled observers in the tech press, because that archive is already huge: 250 billion Web pages spanning 17 years. It is one of the biggest and most popular repositories of the public Internet used for training AI systems today. When OpenAI launched its GPT-3 language model in 2020, close to 60 percent of the text used to train the system came from Common Crawl.
However, Meta’s data mountain is even bigger, which means it could theoretically build “smarter” AI. That is because research has shown that training AI models on more data tends to make them more capable and accurate. That formula has worked wonders for OpenAI, which over the years has increased the amount of data used to create models like ChatGPT.
If Zuckerberg wants to make a more powerful chatbot, the pile of data he is sitting on is especially valuable because so much of it comes from comment threads. Any text that represents human dialogue is critical for training so-called conversational agents, which is why OpenAI heavily mined the Internet forum Reddit Inc to build its own popular chatbot.
It is easy to scoff whenever Zuckerberg talks about a new ambition — whether it is bots or crypto or the metaverse. His latest quixotic vision is especially grand: to build “general intelligence,” or software systems that meet or surpass human intelligence. But with all that data, Zuckerberg’s quest looks doable. The problem is what the fallout could be for the rest of us.
It is odd that in the same message where Zuckerberg said that his AI team had been working on building general intelligence “for more than a decade,” he also said that Facebook would only now turn to its users’ data to build models as “the next key part of our playbook.” Why has Meta not done that already? Perhaps because using all that data is not so straightforward. For one thing, it would represent yet another infringement on the privacy of Facebook’s 3 billion users and Instagram’s 1.5 billion users.
In the same way OpenAI has come under fire for scraping up the copyrighted data of artists and writers to train its models, Facebook stands to face reputational blowback for exploiting people’s data all over again. Not only does that raise thorny ethical questions, but doing so could also require stringent data handling practices and compliance with global data protection laws, which could raise the hackles of European regulators.
The other issue is all the bias and toxicity in the data. OpenAI had to deal with this issue with Common Crawl, whose vast trove included Web pages like adultmovietop100.com and adelaide-femaleescorts.webcam, according to a 2021 study by the University of Montreal. The same study says that between 4 percent and 6 percent of all the Web sites in Common Crawl included racial slurs, hate speech or racially charged conspiracy theories.
While Facebook’s content-moderation software has become better at blocking hate speech and conspiracy theories, it is not perfect and tends to be worse in countries outside the US. Some of the content on Facebook that gets flagged as toxic is no longer reviewed by a human and is left on the site. Worse: When Zuckerberg said that Meta’s data were bigger than Common Crawl’s, he was likely lumping in the company’s historic archive, which would include all the hyperbolic political content and fake news that were on the site before Zuckerberg took pains to clean it up.
All the work that must go into careful data handling and checking might explain why Zuckerberg has only now talked about capitalizing on the data mountain that he sits on. If he does not do it properly, he risks reliving the nightmare of public criticism about how Facebook has handled fake news and harmful content.
Still, if there is one thing we know about Zuckerberg, it is that he has a Caesar-like obsession with winning and domination. Last week, about 24 hours after he faced a crowd of parents in Washington, DC, who accused him of leading their children to self-harm or even suicide, he went on to announce one of Meta’s most successful financial quarters yet and tease how he would use people’s data to create powerful AI.
The proximity of those events should serve as a reminder: Facebook’s path to riches has hurt many. So too might its road to building powerful AI.
Parmy Olson is a Bloomberg Opinion columnist covering technology. A former reporter for the Wall Street Journal and Forbes, she is author of We Are Anonymous.
This column does not necessarily reflect the opinion of the editorial board or Bloomberg LP and its owners.