How private images captured by a robot vacuum ended up online
Jan 11, 2023

A beta version of a Roomba snapped a photo of a person on a toilet and sent it to the cloud. That and other leaked photos shed light on the data supply chain that powers smart home automation.

Sure, robot vacuums are convenient and they make for great cat videos. But these devices — like many other “connected home” technologies — have the potential to collect a lot of data from the private setting of our homes.

Images of children’s faces, the layout of a house, even someone sitting on the toilet were all captured by iRobot vacuum test models in North America, Europe and Asia. Those photos found their way into a private Facebook group for Venezuelan gig workers, where they were then leaked to journalists at MIT Technology Review.

Marketplace’s Meghan McCarty Carino spoke to Eileen Guo, a senior reporter at MIT Technology Review who has been investigating this.

She said the images weren’t collected from consumers, but rather as part of iRobot’s product development process to train the artificial intelligence used by the vacuums to recognize obstacles in a home.

The following is an edited transcript of their conversation.

Eileen Guo: The images were captured in the homes of beta testers and sent back to iRobot’s servers. From there, they were shared with service providers, like Scale AI, that do the outsourced labeling of training data. That work is performed by what are called data labelers or data annotators: essentially gig workers whose job is to take all sorts of training data, like images, video, text or audio, and give it extra context so that an algorithm can understand what it’s looking at or listening to. The images that we received were from data labelers in Venezuela, who then shared them in these Facebook and Discord groups where, to be clear, they weren’t trying to violate anyone’s privacy. They were just trying to get help identifying some of the strange shapes in homes and countries with very different setups than what they have in Venezuela. So, it’s part of this massive data supply chain: what happens with our data when it is collected by companies, shared internally, used internally and shared with third-party service providers. It’s a whole data supply chain that consumers really have no idea exists.

Meghan McCarty Carino: Tell me more about this process of data annotation. Who does it and what is it?

Guo: With artificial intelligence, and specifically machine learning, you have to teach these machine learning models how to recognize patterns. When we think of a robot vacuum being able to recognize dog waste, for example, you have to ask how it recognizes that pet waste. So, these robot vacuums record this image, or the raw data, and the algorithm is then able to recognize that something is pet waste because you have given it a lot of pictures of pet waste to look at and compare it to. That’s what machine learning is. So, to get to the point where the machine can recognize it, you actually need a human to teach it what it’s looking at. There’s so much nuance that algorithms and computers won’t understand on their own without these human workers who are sitting in a room somewhere, actually clicking manually.

McCarty Carino: Now, in this specific case that you investigated, this data was collected, ostensibly, from consenting product testers, not consumers. In your follow-up report, you did speak to some of those testers of this specific series of smart vacuums for iRobot. What did they have to say?

Guo: One of iRobot’s key points was that they recorded everything with consent, but after we published our first story, 10 people who had participated in various tests from 2019 to 2022 reached out and disputed this idea of consent and what it means. A lot of them did understand that their robot vacuums would be recording them, for example, but they didn’t know that humans would be looking at the images, or they didn’t understand the extent of the annotation. There is this conception, even among people who understand AI, that humans only step in when the algorithm gets confused. And with consent policies, privacy policies or end user license agreements, whatever it is, most of the time no one reads them. Beyond that, even when you do read those policies, there are so many gaps in the information companies are required to share with us. There’s so much nuance in what we’re being told, and in how we’re being told it, about privacy. That is really a big problem.

McCarty Carino: Of course, not everyone is going to be testing products. But there’s clearly a growing appetite among consumers of all stripes for these smart home devices. What do you think your investigation says about the tension between privacy and convenience when it comes to relying on these kinds of tools?

Guo: One thing that multiple people who work on robot vacuums told us was that no one is trying to violate privacy on purpose. That’s their point of view. But you have to be able to make products better and live up to consumer expectations, and to do that, when AI is involved, you need to give it real data. When I started this investigation, I was concerned that this robot vacuum company was using customer data to train its image-recognition algorithms. So, it was actually a little bit of a relief to learn that in this case, iRobot is not using customer data. But I think the bottom line is that, as more technology products incorporate artificial intelligence, unless there are very significant changes in how we think about and enforce privacy, the likelihood of our data being used to train AI is really going to grow.

McCarty Carino: How did iRobot respond to your investigation?

Guo: iRobot confirmed that the 15 images were theirs. They told us that they did inform those 15 individuals that their images made it online. They started an investigation into how the images were leaked and ended up terminating their contract with Scale AI. They said they’re taking measures to ensure this doesn’t happen again, but they haven’t responded to any of our questions about what those measures are. One of the things I found really interesting is that they didn’t see anything wrong with sharing faces with their contractors. We saw a minor in the images and blocked out his face in the images that we published. But it was such a striking image because it’s this kid who’s maybe 8 or 9, and he has this expression on his face of absolute curiosity and interest. We also saw the face of a woman sitting on the toilet, and in the screenshot that was shared with us, there’s a note for the data labelers that says, “Don’t tag this image, it’s only here for your context.” So, to me, that’s really telling that iRobot doesn’t see this as an issue. What they see as an issue is that the photo ended up online, against the nondisclosure agreements and other agreements they have with Scale AI. But the way the system was set up was not secure if you’re depending on gig workers you can’t control.

McCarty Carino: This is a specific case with a specific company’s specific device, but what do you think are the implications for consumers of technology in general?

Guo: I think the implication for consumers is that we really need to understand that when we give our data to a company, we have to trust not just that company but also every third-party service provider it works with. We just don’t know who those are right now, and that’s a really scary thing.

Eileen Guo recently wrote a follow-up article for MIT Technology Review where she heard directly from beta testers of this particular smart vacuum. Some of them felt misled after they found out their data could have ended up online.

We reached out to iRobot for comment. Here’s an abridged version of what they had to say:

“The images are not from production robots in consumers’ homes. They are from development robots used by paid data collectors and employees in 2020. These development robots are modified with software and hardware expressly for data collection to support machine learning efforts. The modifications are not present on production robots that consumers purchase. The development robots are affixed with a clearly visible sticker stating, ‘video recording in progress.’ Data collectors are informed and acknowledge how the data will be collected. Our production/retail Roomba j Series robot that can detect and avoid obstacles, like shoes and socks, is programmed to automatically and immediately delete any image detecting a person. iRobot takes data privacy and security very seriously — not only with our customers but in every aspect of our business, including research and development.”

In an unrelated matter, iRobot is also getting some attention from the Federal Trade Commission after a little company called Amazon made a deal to acquire it for about $1.7 billion last September.

The FTC is still reviewing whether Amazon’s acquisition of yet another smart home product that sucks up personal data would harm competition in the space.


The team

Daniel Shin
Jesus Alvarado Assistant Producer