Internet infrastructure company Cloudflare said this week it’s launching a system that lets its clients block bots from scraping their sites, or at least charge AI companies for access. These AI bots collect data from all over the internet, sometimes to train large language models and sometimes to answer questions asked of a chatbot or, for that matter, a search engine.
But the increase in scraping by AI bots has publishers concerned that it could fundamentally break the business model of the internet.
Web crawlers have been doing their thing since practically the beginning of the World Wide Web, according to Chirag Shah, an adjunct professor at the University of Washington. Before search engines indexed the web with bots, it was pretty hard to navigate.
“It was mutually beneficial because website builders had a way to be found through being crawled,” he said.
And eventually, websites had a way to make money with advertising. “We’ve developed this whole ecosystem around that where a publisher can earn revenue by being seen, being crawled.”
That ecosystem could be collapsing. AI bots crawl far more pages and rarely send users back to those sites.
“They’re just using the content from that site to generate the answer right there,” said Shah.
“What is the incentive going to look like for people that are going to decide to create high-impact content that feeds these machines?” said Daniel Newman, an analyst at Futurum Group.
Some publishers, like the New York Times, have made deals with AI companies to license their content, he said, but there are only so many New York Timeses.
“You look at the trickle, and the trickle gets really, really to a point there’s only a few people left to make money,” he said.
AI companies have a stake in keeping internet publishing alive, Newman added, because the models are only as good as the content they crawl.