When US analysts hunting terrorists sought new ways to comb through the troves of phone records, e-mails and other data piling up as digital communications exploded over the past decade, they turned to Silicon Valley computer experts who had developed complex equations to thwart Russian mobsters intent on credit card fraud.
The partnership between the intelligence community and Palantir Technologies, a Palo Alto, California, company founded by a group of inventors from PayPal, is just one of many that the US National Security Agency (NSA) and other agencies have forged in recent years as they have rushed to unlock the secrets of “Big Data.”
Today, a revolution in software technology that allows for the highly automated and instantaneous analysis of enormous volumes of digital information has transformed the NSA, turning it into the virtual landlord of the digital assets of Americans and foreigners alike. The new technology has, for the first time, given US spies the ability to track the activities and movements of people almost anywhere in the world without actually watching them or listening to their conversations.
New disclosures that the NSA has secretly acquired the telephone records of millions of Americans, and access to e-mails, videos and other data of foreigners from nine US Internet companies, have provided a rare glimpse into the growing reach of the US’ largest spy agency.
They have also alarmed the government: On Saturday night, Shawn Turner, a spokesman for the director of national intelligence, said that “a crimes report has been filed by the NSA.”
With little public debate, the NSA has been undergoing rapid expansion in order to exploit the mountains of new data being created each day. The US government has poured billions of dollars into the agency over the past decade, building a 93,000m² fortress in the mountains of Utah, apparently to store huge volumes of personal data indefinitely. It created intercept stations across the country, according to former industry and intelligence officials, and helped build one of the world’s fastest computers to crack the codes that protect information.
While the flow of data across the Internet once appeared too overwhelming for the NSA to keep up with, the revelations of the last few days suggest that the agency’s abilities are now far greater than most outsiders believed.
“Five years ago, I would have said they don’t have the capability to monitor a significant amount of Internet traffic,” said Herbert Lin, an expert in computer science and telecommunications at the National Research Council.
Now, he said, it appears “that they are getting close to that goal.”
On Saturday, it became clear how close: Another NSA document, again cited by the Guardian, showed a “global heat map” that appeared to represent how much data the NSA sweeps up around the world.
It showed that in March there were 97 billion pieces of data collected from networks worldwide; about 14 percent of it came from Iran, much was from Pakistan and about 3 percent came from inside the US, though some of that might have been foreign data traffic routed through US-based servers.
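To put those shares in absolute terms, the arithmetic can be sketched as follows. This is a rough illustration only: the percentages are the approximate figures reported above, and the function name `share_to_count` is a hypothetical helper, not anything taken from the document described.

```python
# Total pieces of data reported for March (figure from the article).
TOTAL_ITEMS = 97_000_000_000


def share_to_count(percent: float, total: int = TOTAL_ITEMS) -> int:
    """Convert an approximate percentage share into an absolute count."""
    return round(total * percent / 100)


iran_items = share_to_count(14)  # "about 14 percent"
us_items = share_to_count(3)     # "about 3 percent"

print(f"Iran: about {iran_items / 1e9:.1f} billion pieces of data")
print(f"US:   about {us_items / 1e9:.1f} billion pieces of data")
```

On those rounded percentages, the Iranian share works out to roughly 13.6 billion items and the US share to roughly 2.9 billion.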
A SHIFT IN FOCUS
The agency’s ability to mine metadata, the data about who is calling or e-mailing, has made wiretapping and eavesdropping on communications far less vital, according to data experts. That access to data from companies that Americans depend on daily raises troubling questions about privacy and civil liberties that officials in Washington, insistent on near-total secrecy, have yet to address.
“American laws and American policy view the content of communications as the most private and the most valuable, but that is backward today,” said Marc Rotenberg, the executive director of the Electronic Privacy Information Center, a Washington group. “The information associated with communications today is often more significant than the communications itself, and the people who do the data mining know that.”
In the 1960s, when the NSA successfully intercepted the primitive car telephones used by Soviet leaders driving around Moscow in their Zil limousines, there was no chance the agency would accidentally pick up Americans.
Today, if it is scanning for a foreign politician’s Gmail account or hunting for the mobile phone number of a suspected terrorist, the possibilities for what the NSA calls “incidental” collection of Americans are far greater.
US laws restrict wiretapping and eavesdropping on the actual content of the communications of US citizens, but offer very little protection to the digital data thrown off by the telephone when a call is made. And they offer virtually no protection to other forms of nontelephone-related data, like credit card transactions.
Because of smartphones, tablets, social media sites, e-mail and other forms of digital communications, the world creates 2.5 quintillion bytes of new data daily, according to IBM. The computer giant estimates that 90 percent of the data that now exists in the world has been created in just the past two years. From now until 2020, the digital universe is expected to double every two years, according to a study by International Data Corp.
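What "doubling every two years" implies over time can be sketched with a few lines of arithmetic. The constants below restate the estimates quoted above; the function name `growth_factor` and the sample horizons are illustrative assumptions.

```python
# IBM's estimate cited above: 2.5 quintillion (2.5 x 10^18) bytes per day.
BYTES_PER_DAY = 2.5e18
# Doubling period from the International Data Corp. projection.
DOUBLING_PERIOD_YEARS = 2


def growth_factor(years: float, doubling_period: float = DOUBLING_PERIOD_YEARS) -> float:
    """Multiplier after `years` if the total doubles every `doubling_period` years."""
    return 2 ** (years / doubling_period)


exabytes_per_day = BYTES_PER_DAY / 1e18  # 1 exabyte = 10^18 bytes

for years in (2, 4, 6):
    print(f"After {years} years: {growth_factor(years):.0f}x the starting volume")
```

Under that assumption the digital universe would be eight times its starting size after six years, and the daily figure of 2.5 quintillion bytes corresponds to 2.5 exabytes.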
Accompanying that explosive growth has been rapid progress in the ability to manipulate the data.
When separate streams of data are integrated into large databases — matching, for example, time and location data from cellphones with credit card purchases or E-ZPass use — intelligence analysts are given a mosaic of a person’s life that would never be available from simply listening to their conversations. Just four data points about the location and time of a mobile phone call, a study published in Nature found, make it possible to identify the caller 95 percent of the time.
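Why a handful of time-and-place points can single out one person is easiest to see on a toy example. The sketch below uses entirely made-up traces for three hypothetical users and is not the method of the study cited above; it only shows how the pool of matching people shrinks as more points are known.

```python
from itertools import combinations

# Toy traces, invented for illustration: user -> set of (cell tower, hour) points.
traces = {
    "user_a": {("tower_1", 8), ("tower_2", 9), ("tower_3", 12), ("tower_1", 18), ("tower_4", 20)},
    "user_b": {("tower_1", 8), ("tower_2", 9), ("tower_5", 12), ("tower_1", 18), ("tower_6", 21)},
    "user_c": {("tower_1", 8), ("tower_7", 10), ("tower_3", 12), ("tower_8", 17), ("tower_4", 20)},
}


def matching_users(known_points, traces):
    """Return the users whose trace contains every known (tower, hour) point."""
    known = set(known_points)
    return sorted(user for user, points in traces.items() if known <= points)


def fraction_unique(traces, k):
    """Share of all k-point samples, drawn from each user's own trace,
    that match exactly one user in the data set."""
    unique = total = 0
    for points in traces.values():
        for sample in combinations(sorted(points), k):
            total += 1
            if len(matching_users(sample, traces)) == 1:
                unique += 1
    return unique / total


print(matching_users([("tower_1", 8)], traces))
print(matching_users([("tower_1", 8), ("tower_2", 9), ("tower_3", 12)], traces))
print(fraction_unique(traces, 1), fraction_unique(traces, 4))
```

In this toy data a single point identifies its owner in only 4 of 15 cases, while any four points from one trace match that user alone. The numbers say nothing about real populations; they only illustrate the mechanism.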
“We can find all sorts of correlations and patterns,” said one government computer scientist who spoke on condition of anonymity because he was not authorized to comment publicly. “There’ve been tremendous advances.”
When then-US president George W. Bush secretly began the NSA’s warrantless wiretapping program in October 2001, to listen in on the international telephone calls and e-mails of US citizens without court approval, the program was accompanied by large-scale data mining.
Those secret programs prompted a showdown in March 2004 between Bush White House officials and a group of top US Department of Justice and FBI officials in the hospital room of John Ashcroft, the then-attorney general.
Department of Justice lawyers, who were willing to go along with warrantless wiretapping, argued that the data mining raised greater constitutional concerns.
In 2003, after a Pentagon plan to create a data-mining operation known as the Total Information Awareness program was disclosed, a firestorm of protest forced the Bush administration to back off.
However, since then, the intelligence community’s data-mining operations have grown enormously, according to industry and intelligence experts.
The confrontation in Ashcroft’s hospital room took place just one month after a Harvard undergraduate, Mark Zuckerberg, created a startup called Facebook; Twitter would not be founded for another two years. Apple’s iPhone and iPad did not yet exist.
“More and more services like Google and Facebook have become huge central repositories for information,” said Dan Auerbach, a technology analyst with the Electronic Frontier Foundation. “That’s created a pile of data that is an incredibly attractive target for law enforcement and intelligence agencies.”
The spy agencies have long been among the most demanding customers for advanced computing and data-mining software — and even more so in recent years, according to industry analysts.
“They tell you that somewhere there is an American who is going to be blown up,” said a former technology executive, and “the only thing that stands between that and him living is you.”
In 2006, the Bush administration established a program known as the Intelligence Advanced Research Projects Activity, to accelerate the development of intelligence-related technology.
Its stated purpose was to undertake “high-risk, high-payoff research programs that have the potential to provide the US with an overwhelming intelligence advantage over future adversaries.”
IBM’s Watson, the supercomputing technology that defeated human Jeopardy! champions in 2011, is a prime example of the power of data-intensive artificial intelligence. Watson-style computing, analysts said, is precisely the technology that would make the ambitious data-collection program of the NSA seem practical. Computers could instantly sift through the mass of Internet communications data, see patterns of suspicious online behavior and thus narrow the hunt for terrorists.
Both the NSA and the CIA have been testing IBM’s Watson in the last two years, said a consultant who has advised the government and asked not to be named because he was not authorized to speak.
Industry experts say that intelligence and law enforcement agencies also use a new technology, known as trilateration, that allows tracking of an individual’s location, moment to moment. The data, obtained from cellphone towers, can track the altitude of a person, down to the specific floor in a building. There is even software that uses the cellphone data to predict a person’s most likely route.
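The geometric idea behind locating a phone from tower data can be sketched in two dimensions. This is a textbook simplification under stated assumptions: exactly three towers at known positions, exact distance estimates to each, and no altitude. It is not a description of any agency's or carrier's actual system, and the function name `trilaterate` is hypothetical.

```python
import math


def trilaterate(towers, distances):
    """Estimate (x, y) from three tower positions and the distance to each.

    Each tower defines a circle around itself. Subtracting the first circle's
    equation from the other two leaves two linear equations in x and y,
    solved here with Cramer's rule.
    """
    (x1, y1), (x2, y2), (x3, y3) = towers
    d1, d2, d3 = distances
    a11, a12 = 2 * (x2 - x1), 2 * (y2 - y1)
    a21, a22 = 2 * (x3 - x1), 2 * (y3 - y1)
    b1 = d1**2 - d2**2 + x2**2 - x1**2 + y2**2 - y1**2
    b2 = d1**2 - d3**2 + x3**2 - x1**2 + y3**2 - y1**2
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("towers are collinear; position is not determined")
    x = (b1 * a22 - b2 * a12) / det
    y = (a11 * b2 - a21 * b1) / det
    return x, y


# Made-up example: three towers and a phone at a known spot, to check the math.
towers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
true_position = (3.0, 4.0)
distances = [math.dist(true_position, tower) for tower in towers]
estimate = trilaterate(towers, distances)
print(estimate)
```

Real measurements are noisy, so a practical system would more plausibly combine many towers and fit the best position rather than solve exactly; resolving the floor of a building would need a third dimension and at least one more reference point.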
“It is extreme Big Brother,” said Alex Fielding, an expert in networking and data centers.
In addition to opening the Utah data center, reportedly scheduled for this year, the NSA has secretly enlarged its footprint inside the US, according to accounts from whistleblowers in recent years.
In Virginia, a telecommunications consultant reported, Verizon had set up a dedicated fiber-optic line running from New Jersey to Quantico, Virginia, home to a large military base, allowing government officials to gain access to all communications flowing through the carrier’s operations center.
In Georgia, an NSA official said in interviews, the agency had combed through huge volumes of routine e-mails to and from Americans.
And in San Francisco, a technician at AT&T reported on the existence of a secret room reserved for the NSA that allowed the spy agency to copy and store millions of domestic and international telephone calls routed through that station.
Nothing revealed in recent days suggests that NSA eavesdroppers have violated the law by targeting ordinary Americans. On Friday, US President Barack Obama defended the agency’s collection of telephone records and other metadata, saying it did not involve listening to conversations or reading the content of e-mails.
“Some of the hype we’ve been hearing over the past day or so — nobody has listened to the content of people’s phone calls,” Obama said.
Still, some privacy advocates say that the language used by top government officials to describe the NSA’s activities masks the full extent of its operations.
Officials say that the agency does not purposefully “collect” the private data of US citizens unless they are suspected of terrorist activity.
However, the NSA considers “collection” to refer only to the data that agency employees actually analyze and read. The agency acquires huge amounts of US data each day, and then applies what it calls “minimization” procedures to protect the information from being used in its analysis. The agency still retains the private data, and the procedures used to block access to it are of its own making.
Privacy advocates say that a national debate must take place to come up with new rules to limit the intelligence community’s access to the mountains of data.
“It is a bit of a fantasy to think that the government can seize so much information without implicating the Fourth Amendment interests of American citizens,” Rotenberg said, referring to the constitutional limits on search and seizure.