Researchers at Johns Hopkins University tested hundreds of AI models’ ability to interpret how humans were interacting with each other in short videos and found they were largely unable to consistently describe what was happening. Leyla Isik, professor of cognitive science at Johns Hopkins University and senior scientist on the study, explains what the findings could mean for the future of AI development.
Last week, Marketplace’s Stephanie Hughes met with Leyla Isik, a professor of cognitive science at Johns Hopkins University, at her lab, where she's got some watercolors of brains on the wall.
Isik is also a senior scientist on a new study looking at how good AI is at reading social cues. She and her research team took short videos of people doing things, such as two people chatting, two babies on a playmat, and two people doing a synchronized skate routine, and showed them to human participants. Afterward, the participants were asked questions like: Are these two communicating with each other? Is it a positive or negative interaction?
Then they gave the same videos to more than 350 open-source AI models, a large set, though one that didn't include all the latest and greatest models out there. Isik and her team found that the AI models were far worse than humans at understanding what was going on.
The following is an edited transcript of their conversation.
Leyla Isik: One thing we found was that none of the models could do a good job of matching human behavior or brain responses on these different social attributes, like: Are people communicating? Surprisingly, none of them could even do a great job of telling us things like whether people are facing each other. We had a feeling there would be elements of this that the AI couldn't capture, but we were pretty surprised by the generally poor performance.
Stephanie Hughes: And so basically, the AIs across the board couldn’t tell if people were communicating, or if they were facing each other?
Isik: There was some variety. Like I said, we tested 350 models. Some models were better at it than others, which yielded some interesting insights, but no single model could match all the human behaviors we tested.
Hughes: Why does this matter? Why would it be helpful for AI to be good at this?
Isik: Yeah, well, I think anytime you want to have AI interacting with humans, you want to know what those humans are doing, what they’re doing with each other, and what they’re about to do next. And I think this just really highlights how far a lot of these systems are from being able to do that.
Hughes: What do your findings mean for possible business applications for artificial intelligence?
Isik: Yeah, I think the businesses where this is most closely being applied, or currently being applied, are things like self-driving cars. People, the drivers and the pedestrians, have this intentionality, and you have to be able to understand that. For example, I think it’s very hard for self-driving cars to make an unprotected left turn.
Hughes: It’s hard for humans too.
Isik: It’s hard for humans too, sometimes. When you do that, you have to really look around and think about who is doing what next, those sorts of things. I think this just highlights how much more work needs to be done in the development of these systems to improve them, but it also highlights some new ways to stress-test these systems against humans.
Hughes: I think some people envision this future where we all work alongside our AI colleagues or buddies, and I wonder what your findings mean, for the short term at least, about AI’s ability to do that. Like, will it be the Michael Scott of “The Office”?
Isik: Perhaps, but I think there are even more baseline problems than that. Like I said, you want it to be able to tell what a person is doing, what they’re close to, who’s close to whom, and it seems to be lacking even in those things, which are more basic than reading intentions.
Hughes: You know, I’m a grown-up and I’m still learning how to pick up on social cues; it’s a lifelong process. Do you think the AI will get there?
Isik: Yeah, you mentioned you were a grown-up. I mean, I think it’s really striking how much of this even little babies can do, though. Not to the full, sophisticated level that we keep developing and refining through our lifetimes, but there are some basic abilities that seem to be present from at least very early in childhood. I think AI should be able to get there, and the progress AI has made over the last decade or so has been really amazing. But some of these problems might require a fundamentally different approach than the brute-force, just-get-more-data-and-bigger-networks solutions that have taken us pretty far. I think there might be limits to that.
Hughes: Another place that AI is being used is in customer service. I wonder what your findings could mean for customer service and AI’s use there.
Isik: Yeah, I think right now a lot of those applications are text-based, like chatbot-type things. But if you really wanted to scale that up, or build any sort of assistive robot or AI, you would want them to interact with people based on visual cues. We use visual cues all the time to interact with each other. So I think this has important implications anytime you want an AI to be interacting with humans.
Hughes: Do you have any advice for AI makers?
Isik: Yeah, I mean, historically, and still loosely to some extent, AI has drawn a lot of inspiration from humans, from cognitive science, from neuroscience. In the latest AI boom, those fields have sort of diverged. But I think this is an important point for them to start coming back together, where the things we know humans care about, and the sort of structure we bring to the world, can help improve these AI models.