Date: 2023-04-19

The so-called front page of the internet is erecting a paywall around its data. Well, actually a blend of a paywall and one of those “Are you a robot?” captcha things.

On Tuesday, the social media platform Reddit announced it’s going to start charging big companies to access its absolute avalanche of 13 billion (and counting) posts and comments.

There are two big reasons for this: One, Reddit is supposed to go public soon. But two — and maybe more importantly — Reddit is a huge source of data for AI platforms like ChatGPT.

Now, you know how you’ll hear a couple of teenagers talking, and sometimes it feels like it’s a different language. They’ll be like, “That’s so cringe,” and you’re like, “What does that mean?” Then they’re like, “OK, boomer,” and you respond in a huff, “I’m 38!”

Well, large language models like ChatGPT could empathize (if they had feelings, that is).

“Language models learn about the meaning of ‘cringe’ from the context from which it’s used. So, if you don’t have examples of ‘cringe’ being used in the new context, it’s not going to know anything about it,” said Amin Ahmad, chief technology officer at an AI company called Vectara.

Language changes quickly online, he said. “If you take Reddit out of the equation, you’ve taken out a huge percentage of the conversations that are going on on the internet,” Ahmad said. “There’s still other sources left, but I’m not sure that compares with Reddit in either volume or quality.”

Reddit conversations tend to be fairly civil, conversational back-and-forths (at least relative to other social media). And there are literally billions of them.

Until now, all that was free. But expect that to change, said Nathan Lambert at the AI company Hugging Face — and not just at Reddit.

“It’s a smart business move,” he said. “Platforms that have very large amounts of data on the internet will likely see that they can make a decent amount of money on this.”

So beyond selling user data to advertisers, social media companies can sell that data to AI companies. And if you’re generally worried about how AI will respond to being fed a steady diet of social media, just think: We fed an entire generation of humans on social media. And it’s not like it was harmful or anything.