Technology and data mining in the US

June 10, 2013

The growing controversy in the United States about data mining is again on the front page of the news. In “How the U.S. Uses Technology to Mine More Data More Quickly”, the New York Times provides very useful background on how data is mined.

It provides:

WASHINGTON — When American analysts hunting terrorists sought new ways to comb through the troves of phone records, e-mails and other data piling up as digital communications exploded over the past decade, they turned to Silicon Valley computer experts who had developed complex equations to thwart Russian mobsters intent on credit card fraud.

The partnership between the intelligence community and Palantir Technologies, a Palo Alto, Calif., company founded by a group of inventors from PayPal, is just one of many that the National Security Agency and other agencies have forged as they have rushed to unlock the secrets of “Big Data.”

Today, a revolution in software technology that allows for the highly automated and instantaneous analysis of enormous volumes of digital information has transformed the N.S.A., turning it into the virtual landlord of the digital assets of Americans and foreigners alike. The new technology has, for the first time, given America’s spies the ability to track the activities and movements of people almost anywhere in the world without actually watching them or listening to their conversations.

New disclosures that the N.S.A. has secretly acquired the phone records of millions of Americans and access to e-mails, videos and other data of foreigners from nine United States Internet companies have provided a rare glimpse into the growing reach of the nation’s largest spy agency. They have also alarmed the government: on Saturday night, Shawn Turner, a spokesman for the director of national intelligence, said that “a crimes report has been filed by the N.S.A.”

With little public debate, the N.S.A. has been undergoing rapid expansion in order to exploit the mountains of new data being created each day. The government has poured billions of dollars into the agency over the last decade, building a one-million-square-foot fortress in the mountains of Utah, apparently to store huge volumes of personal data indefinitely. It created intercept stations across the country, according to former industry and intelligence officials, and helped build one of the world’s fastest computers to crack the codes that protect information.

While once the flow of data across the Internet appeared too overwhelming for N.S.A. to keep up with, the recent revelations suggest that the agency’s capabilities are now far greater than most outsiders believed. “Five years ago, I would have said they don’t have the capability to monitor a significant amount of Internet traffic,” said Herbert S. Lin, an expert in computer science and telecommunications at the National Research Council. Now, he said, it appears “that they are getting close to that goal.”

On Saturday, it became clear how close: Another N.S.A. document, again cited by The Guardian, showed a “global heat map” that appeared to represent how much data the N.S.A. sweeps up around the world. It showed that in March 2013 there were 97 billion pieces of data collected from networks worldwide; about 14 percent of it was in Iran, much was from Pakistan and about 3 percent came from inside the United States, though some of that might have been foreign data traffic routed through American-based servers.

A Shift in Focus

The agency’s ability to efficiently mine metadata, data about who is calling or e-mailing, has made wiretapping and eavesdropping on communications far less vital, according to data experts. That access to data from companies that Americans depend on daily raises troubling questions about privacy and civil liberties that officials in Washington, insistent on near-total secrecy, have yet to address.

“American laws and American policy view the content of communications as the most private and the most valuable, but that is backwards today,” said Marc Rotenberg, the executive director of the Electronic Privacy Information Center, a Washington group. “The information associated with communications today is often more significant than the communications itself, and the people who do the data mining know that.”

In the 1960s, when the N.S.A. successfully intercepted the primitive car phones used by Soviet leaders driving around Moscow in their Zil limousines, there was no chance the agency would accidentally pick up Americans. Today, if it is scanning for a foreign politician’s Gmail account or hunting for the cellphone number of someone suspected of being a terrorist, the possibilities for what N.S.A. calls “incidental” collection of Americans are far greater.

United States laws restrict wiretapping and eavesdropping on the actual content of the communications of American citizens but offer very little protection to the digital data thrown off by the telephone when a call is made. And they offer virtually no protection to other forms of non-telephone-related data like credit card transactions.

Because of smartphones, tablets, social media sites, e-mail and other forms of digital communications, the world creates 2.5 quintillion bytes of new data daily, according to I.B.M.

The company estimates that 90 percent of the data that now exists in the world has been created in just the last two years. From now until 2020, the digital universe is expected to double every two years, according to a study by the International Data Corporation.

Accompanying that explosive growth has been rapid progress in the ability to sift through the information.

When separate streams of data are integrated into large databases — matching, for example, time and location data from cellphones with credit card purchases or E-ZPass use — intelligence analysts are given a mosaic of a person’s life that would never be available from simply listening to their conversations. Just four data points about the location and time of a mobile phone call, a study published in Nature found, make it possible to identify the caller 95 percent of the time.

“We can find all sorts of correlations and patterns,” said one government computer scientist who spoke on condition of anonymity because he was not authorized to comment publicly. “There have been tremendous advances.”

Secret Programs

When President George W. Bush secretly began the N.S.A.’s warrantless wiretapping program in October 2001, to listen in on the international telephone calls and e-mails of American citizens without court approval, the program was accompanied by large-scale data mining operations.

Those secret programs prompted a showdown in March 2004 between Bush White House officials and a group of top Justice Department and F.B.I. officials in the hospital room of John Ashcroft, then the attorney general. Justice Department lawyers who were willing to go along with warrantless wiretapping argued that the data mining raised greater constitutional concerns.

In 2003, after a Pentagon plan to create a data-mining operation known as the Total Information Awareness program was disclosed, a firestorm of protest forced the Bush administration to back off.

But since then, the intelligence community’s data-mining operations have grown enormously, according to industry and intelligence experts.

The confrontation in Mr. Ashcroft’s hospital room took place just one month after a Harvard undergraduate, Mark Zuckerberg, created Facebook; Twitter would not be founded for two more years. Apple’s iPhone and iPad did not yet exist.

“More and more services like Google and Facebook have become huge central repositories for information,” observed Dan Auerbach, a technology analyst with the Electronic Frontier Foundation. “That’s created a pile of data that is an incredibly attractive target for law enforcement and intelligence agencies.”

The spy agencies have long been among the most demanding customers for advanced computing and data-mining software — and even more so in recent years, according to industry analysts. “They tell you that somewhere there is an American who is going to be blown up,” said a former technology executive, and “the only thing that stands between that and him living is you.”

In 2006, the Bush administration established a program known as the Intelligence Advanced Research Projects Activity, to accelerate the development of intelligence-related technology intended “to provide the United States with an overwhelming intelligence advantage over future adversaries.”

I.B.M.’s Watson, the supercomputing technology that defeated human Jeopardy! champions in 2011, is a prime example of the power of data-intensive artificial intelligence.

Watson-style computing, analysts said, is precisely the technology that would make the ambitious data-collection program of the N.S.A. seem practical. Computers could instantly sift through the mass of Internet communications data, see patterns of suspicious online behavior and thus narrow the hunt for terrorists.

Both the N.S.A. and the Central Intelligence Agency have been testing Watson in the last two years, said a consultant who has advised the government and asked not to be identified because he was not authorized to speak.


Industry experts say that intelligence and law enforcement agencies also use a new technology, known as trilaterization, that allows tracking of an individual’s location, moment to moment. The data, obtained from cellphone towers, can track the altitude of a person, down to the specific floor in a building. There is even software that exploits the cellphone data seeking to predict a person’s most likely route. “It is extreme Big Brother,” said Alex Fielding, an expert in networking and data centers.

In addition to opening the Utah data center, reportedly scheduled for this year, N.S.A. has secretly enlarged its footprint inside the United States, according to accounts from whistle-blowers in recent years.

In Virginia, a telecommunications consultant reported, Verizon had set up a dedicated fiber-optic line running from New Jersey to Quantico, Va., home to a large military base, allowing government officials to gain access to all communications flowing through the carrier’s operations center.

In Georgia, an N.S.A. official said in interviews, the agency had combed through huge volumes of routine e-mails to and from Americans.

And in San Francisco, a technician at AT&T reported on the existence of a secret room there reserved for the N.S.A. that allowed the spy agency to copy and store millions of domestic and international phone calls routed through that station.

Nothing revealed in recent days suggests that N.S.A. eavesdroppers have violated the law by targeting ordinary Americans. On Friday, President Obama defended the agency’s collection of phone records and other metadata, saying it did not involve listening to conversations or reading the content of e-mails. “Some of the hype we’ve been hearing over the past day or so — nobody has listened to the content of people’s phone calls,” he said.

Mr. Rotenberg, referring to the constitutional limits on search and seizure, said, “It is a bit of a fantasy to think that the government can seize so much information without implicating the Fourth Amendment interests of American citizens.”

The Guardian has run especially hard on the story (see here). The privacy issues associated with broad-ranging data mining (data dredging might be a better term) of a broad cross section of Americans, without probable cause or any real cause at all beyond a vague maybe, are immediately apparent. This issue was covered extensively in more than one seminar I attended at MIT 8.

The USA’s constitutional privacy protections are effective only in fairly limited circumstances, primarily under the 4th Amendment to the Constitution. Data protection in the USA is patchy, sectorally focused and generally quite weak. That has led to calls for increased privacy protections there (see article here).

The Australian Privacy Act ostensibly provides greater protections than those in the USA. That said, the exemptions given by other laws to law enforcement and national security agencies make its protections far from comprehensive or effective. That it only applies to businesses with a turnover of more than $3 million per year makes the coverage inadequate, as does the lack of coverage for political parties. To the extent that such data mining of innocent people’s internet and telecommunications activities occurs in Australia and is discovered, the question is whether such interference is actionable. It would be difficult for a person to bring an action for breach of confidence. What the Privacy Commissioner would do is a matter of conjecture. A statutory right of privacy would provide the appropriate framework. Of course, if there is legislative fiat to undertake such actions, then that would constitute a defence even under the tort as proposed by the Australian Law Reform Commission. Even so, it is better to have such a right than not.
