Scraping data from public websites is legal. That’s the upshot of a decision by the Ninth Circuit Court of Appeals earlier this week.
LinkedIn lost its legal battle against data analytics company hiQ, having argued it was illegal for hiQ to “scrape” users’ profile data to analyze employee turnover rates under the federal Computer Fraud and Abuse Act (CFAA).
Tiffany Li, a technology attorney and professor of law at the University of New Hampshire, joins “Marketplace Tech” host Meghan McCarty Carino to discuss the shape of the CFAA, which dates to 1986, and how the law might be updated.
Tiffany Li: So the CFAA is something that gets to the heart of a lot of cybersecurity claims, a lot of claims of people stealing data, hacking websites and so on. What’s important for us here is that the CFAA has a specific limitation that says you cannot intentionally access a computer without authorization, or exceed authorized access and then obtain information from that computer. The CFAA did come about before what we consider, maybe, the modern internet. So we maybe didn’t have as many cases of these large websites full of millions of people’s data and these large commercial scraping companies relying on that data. Many lawyers do not like this law, because it is not necessarily super updated. But it is what we have.
Meghan McCarty Carino: In your mind, what should an update to the CFAA look like?
Li: I think there are two ways that we can update the CFAA. The first is within the text of the statute. We could try to clarify specific terms like “access.” We can also clarify terms like “computer,” which is actually an interesting definition of the statute, because computers have really changed in the past few decades. Now we have to look at systems like, say, whether or not accessing the memory of a small mobile device counts as a computer, or whether or not a website storage counts as a computer. In the future, we might have to think about something very futuristic, things like considering if you have data embedded in DNA, for example. Or if you have different cloud applications, how would you define and delineate computer systems when something is very distributed? So it’s a little futuristic and out there, but I do think, eventually, we’re going to have to clarify what “computer” means.
McCarty Carino: So who might be cheering this ruling? And why? I mean, I know journalists, for instance, often use scraped data for investigations and things like that. Who else could be affected?
Li: So people might be happy about the ruling if they are related to … say, freedom of information advocates, so people who care about being able to access information freely, being able to use data freely: artists, journalists, writers, people who are involved with libraries or archives. There are also some people who may be upset about the ruling, because perhaps there might be some privacy issues at play. Maybe this ruling doesn’t protect the privacy of the LinkedIn users.
McCarty Carino: For the average person that is interacting with the internet, you know, has a profile on LinkedIn, Facebook, social media, et cetera, what does this ruling mean? Does it change at all how we understand how we interact with these services?
Li: Your information might not be as private as you think it is. Anyone’s social media profile could be used by pretty much any company out there. And it could be someone who has relatively good faith, you know, like hiQ arguably is. Or it could be a company like Clearview AI. They have scraped millions, perhaps billions, of face photos by now for use in facial recognition technology, and we’re not completely sure who they sell that technology to. But aside from that, in general, there isn’t any direct immediate impact. But we should be considering these privacy issues.
Related Links: More insight from Meghan McCarty Carino
TechCrunch points out that the ruling in the LinkedIn case is a notable win for archivists and researchers. It’s also helpful for long-running projects to archive websites that have gone offline, as well as efforts to use publicly available web data for academic research.
As Tiffany Li noted, scraped data can have some more controversial applications, such as its use by facial recognition software company Clearview AI, which The New York Times dug into back in 2020.
Clearview uses billions of photos scraped from Facebook, YouTube and dozens of other sites to train its facial recognition AI. According to the Times’ reporting, it was being used by more than 600 law enforcement agencies and multiple private companies for security purposes.
As for the Computer Fraud and Abuse Act — which Professor Li said could probably use some updating — it’s coming up in the current discourse on Netflix and its password-sharing problem.
The practice apparently could be considered a federal crime under the CFAA. A 2016 ruling, widely interpreted as a threat to streaming freeloaders, found that the sharing of employee login information violated the law.
Plenty of legal analysis, like this piece from cybersecurity professor Josephine Wolff in Slate, threw cold water on that alarmism. Wolff did point out, however, that when it comes to the outdated CFAA, legal questions almost never have clear answers.
So, streaming account moochers … you’ve been warned.