We continue to learn more about the scope of the U.S. government's data collection efforts. According to The Wall Street Journal, the National Security Agency (NSA) has relied on technology developed in the private sector to sift through the information it has collected.
The government has been using something called Apache Hadoop, the Journal reports, an open-source program that runs circles around what the government has in-house. The people who developed Hadoop call it software for "scalable, distributed computing."
Garth Gibson, a computer scientist at Carnegie Mellon University, says it's used "to process a huge amount of data in a relatively short period of time using a lot of computing resources."
Hadoop takes the data and breaks it into smaller pieces so that thousands of computers can split up the workload. It's part of Yahoo!'s search engine and it's behind Facebook's social network. Now the government can use it for surveillance and to find patterns.
Amy Apon chairs the division of computer science at Clemson University. She suggests thinking of Hadoop as a gas station. "They have rows and rows of gas pumps, and lots of cars can pull in and get gas at the same time." In the same way, more computers means more efficiency.
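To make the split-up-the-workload idea concrete, here is a minimal Python sketch. It is not Hadoop's API: the function names (split_into_chunks, map_chunk, reduce_counts, word_count) are hypothetical, and threads on a single machine stand in for the many separate computers a real Hadoop cluster would use. Each worker counts words in its own piece of the data, and the partial counts are then merged, the "map" and "reduce" pattern that Hadoop's MapReduce component is named after.

```python
# Hypothetical illustration only, not Hadoop's API.
# Threads on one machine stand in for separate computers in a cluster.
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, List


def split_into_chunks(records: List[str], n_chunks: int) -> List[List[str]]:
    """Break the input into roughly equal pieces, one per worker."""
    size = max(1, -(-len(records) // n_chunks))  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def map_chunk(chunk: Iterable[str]) -> Counter:
    """'Map' step: each worker counts words in its own piece only."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def reduce_counts(partials: Iterable[Counter]) -> Counter:
    """'Reduce' step: merge the partial results into one answer."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def word_count(records: List[str], workers: int = 4) -> Counter:
    """Split the data, process the pieces in parallel, then combine."""
    chunks = split_into_chunks(records, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce_counts(partials)


if __name__ == "__main__":
    lines = [
        "big data needs many computers",
        "many computers share the work",
        "the work is split into pieces",
    ]
    print(word_count(lines, workers=2).most_common(3))
```

Like the rows of gas pumps, adding workers lets more pieces be handled at the same time; a real cluster also has to move data between machines and cope with failures, which this sketch leaves out.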
The government relies on Hadoop and systems like it because they work well, they’ve been improved over time, and they’re not that expensive.
“The government wants to use the most cost-effective technology it can to accomplish its goals,” says tech analyst Carl Howe, with the Yankee Group. “I mean, it’s no different than any other business.”
But the government is not leading the way here. According to Ken Birman, the N. Rama Rao Professor of Computer Science at Cornell University, companies that have developed "distributed computing" programs like Hadoop have an edge. They have used open-source development to collaborate, and they have outspent the government on innovation.
"All that investment has created a very powerful technology base," he says. It's one the government just can't match.