What does an AI chatbot know about you?
Apr 26, 2023

Silver Keskküla created the website Have I Been Encoded to let people check what AI chatbots say about them and raise awareness of the widespread desire to opt out.

It’s the new Googling yourself — querying your name with an artificial intelligence chatbot and seeing what it spits out.

Many large language models like ChatGPT and Bard, developed by OpenAI and Google respectively, are trained on vast amounts of data from the internet, so they’ve encoded text about individuals, especially public-facing ones.

But, as we know, they don’t always stick to the facts, and that’s particularly troubling when it comes to your good name on the internet.

That’s why engineer Silver Keskküla founded the website Have I Been Encoded.

It makes it easy to check in one place what different chatbots are saying about you. So Marketplace’s Meghan McCarty Carino checked what these bots were saying about her.

Some of these answers were far from correct. Others could be boiled down to “No comment.”

McCarty Carino spoke with Keskküla about why people might want to remain outside the scope of the chatbots and how the tech industry, as well as regulators, might help make that happen.

The following is an edited transcript of their conversation.

Silver Keskküla: These large language models have a lot of potential. So it’s already obvious that they’re going to change the search space. They’re actually saving people a lot of time because you don’t have to go through all the links when you search on Google or something like that. So I think they will become very popular. The things that are coming out of the large language models are not always true. And so what does that mean for me as an individual? First of all, am I OK with the companies that are building large language models actually responding to questions about me? And secondly, when they do, is that information correct? And do I have any say in whether that information is correct?

Meghan McCarty Carino: So what are the repercussions of not knowing what these large language model tools are saying about us?

Keskküla: I think, right now, perhaps there are no large repercussions, but the more people start to use these AIs to actually ask questions and the more these models actually give out information about people, the higher the risk that some of these hallucinations can actually be detrimental to you. If those models say something that actually affects your career, for example, then that’s certainly a point to be concerned about. One reason why I made it about the people themselves is that they’re the specialists in this particular domain, they know the most about themselves. So it’s a really easy way to see, oh, this is what all these models are getting wrong. And it’s also, like, my attempt to bring this topic to the attention of people that, hey, aligning these AIs and getting them to do exactly what we want is not exactly an easy problem.

McCarty Carino: I mean, this seems to raise some obvious questions about whether we need mechanisms to correct or delete what these things say about us.

Keskküla: I think certainly so because, first of all, I think there’s a large group of individuals who already have rights that kind of apply in this case as well. So people in the European Union have the “right to be forgotten.” It’s not yet clear how well this will work. But, like, one of my ideas is that I think there should be just a one-stop shop where you say, “Hey, I’m not comfortable with AI answering questions about me as a person, and I’d like to opt out.” I think when we have enough people who are giving a clear signal that this is what they want, at some point the policies might actually start speaking to this. And so I can give you some stats, for example. So from all the people that are signing up to the service, 33% of them have said, “I do not want AIs to give responses about me.” So I think that’s a relatively large number. Specifically for people who say that they are also in the EU, out of these people 44% say they would not like the models to give outputs about them. So maybe not everyone is really excited about getting those models to start answering questions about them, at least at this point in time.

McCarty Carino: But at this point, there’s not really any way to opt out, right?

Keskküla: Yeah, exactly. So initially, [this website] is just to establish, like, what is the sort of general feeling? Are they OK with these models actually answering about them or not? This is a much easier conversation when you have data to back up your claims. So if I go to the language model companies, then I can have quite a serious conversation if I have a list of thousands of people who are saying that they want to opt out of something. There is an example of something very similar working in the context of generative models in visual space. So the [image-generating] Stable Diffusion model as such, that creates this beautiful art based on just these text inputs, there was actually a group, what they did was create a site called Have I Been Trained. And artists can sign up and just say that they don’t want their work to be used in training these models. And one of the big companies leading that space, Stability AI, said that they will respect those decisions. When people sign up there, they will actually remove their data from the training process. And so there is already an example of this working in a space where the regulation is really unclear. Like we don’t know what the rights are to those outputs. But in the context of information about people, we already have laws in place both in, like, California and in the European Union that already cover this topic. So I would be very surprised if this is something that can be just ignored. So it will happen at one point where these regulators or these companies will maybe need to start accepting the inputs from the users as well. And so we’ll see how it goes.

Keskküla’s website and the other one he mentioned, Have I Been Trained?, which tells you if your image has been used to train an AI art model, both take their names from an older website, Have I Been Pwned?, which tells you if your personal information shows up in any data breaches.

And speaking of data protection, Keskküla brought up Europe’s General Data Protection Regulation, often called GDPR, which includes a “right to be forgotten” clause that gives individuals the right to request that organizations delete their data.

We don’t really have anything like that in the U.S. California’s landmark consumer data privacy law provides something similar, though it’s a bit more limited.

As to whether the companies that make these chatbots could be held liable for any misinformation that’s spread about individuals, that is kind of an open question — one we talked about a couple of months ago with Matt Perault of the University of North Carolina at Chapel Hill.

He said that unlike social media platforms, which have been shielded from liability under federal law, chatbots do not just host content.

There’s an argument that they create content, which means people might actually be able to sue for defamation.

The team

Daisy Palacios Senior Producer
Daniel Shin Producer
Jesús Alvarado Associate Producer