A new machine learning model could help public health officials get ahead of the next crisis
Nov 4, 2022

NYU researcher Daniel Neill says the algorithm, based on emergency department visits, can identify illness clusters. But epidemiologists would still need to interpret the data and decide how to respond.

Diagnosing and containing a disease outbreak, or the health effects of a disruptive event like a natural disaster, can be a huge task. A study out Friday from New York University suggests that a new machine learning model could improve health officials’ ability to respond to future pandemics and other public health crises.

The research was done in partnership with Carnegie Mellon University and New York City’s Department of Health and Mental Hygiene.

Marketplace’s Kimberly Adams speaks with Daniel Neill, a computer science professor at NYU and the director of its Machine Learning for Good Laboratory, which released the study. He explains how this machine learning model works. The following is an edited transcript of their conversation.

Daniel Neill: Our approach uses textual data from emergency department visits — in particular, the main thing that the patient has come to the emergency department for. And that textual data contains much richer information than just "a person has flulike symptoms." We might know exactly what kind of symptoms they have or what they've been exposed to, and so by detecting patterns in this textual data, we can surface new outbreaks — things that public health was not already looking for — as well as other sorts of events.
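To make the idea concrete, here is a minimal sketch of what "surfacing clusters" in free-text chief complaints could look like. This is an illustration only — the sample complaints, the word-level counting, and the spike threshold are all invented for this example and are not the study's actual method, which is far more sophisticated:

```python
from collections import Counter

# Hypothetical free-text chief complaints: a historical baseline and today's visits.
baseline = ["flu like symptoms", "leg injury", "chest pain", "shortness of breath",
            "flu like symptoms", "headache", "leg injury", "chest pain"]
today = ["blue nose discoloration", "chest pain", "blue nose discoloration",
         "blue nose discoloration", "leg injury", "flu like symptoms"]

def term_counts(visits):
    """Count how often each word appears across the visit descriptions."""
    counts = Counter()
    for text in visits:
        counts.update(text.lower().split())
    return counts

def surface_clusters(baseline, today, min_count=2):
    """Flag terms whose frequency today far exceeds the baseline rate."""
    base, cur = term_counts(baseline), term_counts(today)
    flagged = []
    for term, count in cur.items():
        # Scale the baseline count to today's visit volume.
        expected = base.get(term, 0) / max(len(baseline), 1) * len(today)
        if count >= min_count and count > 2 * max(expected, 0.5):
            flagged.append((term, count))
    return sorted(flagged, key=lambda item: -item[1])

print(surface_clusters(baseline, today))
```

Run on the toy data above, the novel phrase "blue nose discoloration" is flagged because its words never appear in the baseline — the kind of never-before-seen pattern Neill describes below that syndrome-category systems would fold into an existing bucket and miss.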

Kimberly Adams: How might this algorithm be deployed in a health department to maybe identify new or unidentified disease outbreaks?

Neill: The hope is that public health departments would actually be running this sort of approach on a daily basis, where each day the algorithm would surface clusters in the past, let's say, 24 hours of data that public health could look at and, if necessary, respond to. It can also help public health deal with the myriad things they have to address on a daily basis — a cluster of cases coming in from smoke inhalation, some sort of chemical exposure, or a new cluster of drug overdoses because of some new synthetic drug. So, again, the goal is to give them day-to-day awareness of everything that's going on in their jurisdiction.

Adams: So maybe you might spot, I don’t know, an outbreak of something like Legionnaires’ disease earlier than you would otherwise?

Neill: Yeah, that’s right. That’s a nice example of something with rare symptoms. And you can also imagine if something comes along with novel symptoms, things we’ve never seen before, like it’s causing people’s noses to turn blue and fall off. Now, it shouldn’t take very many cases of something like that for us to realize that we’ve got something new and different that public health needs to deal with. But the irony is that typical disease surveillance systems will just map those to your existing syndrome categories and essentially miss the fact that there actually is something novel there. So what we provide is a safety net to catch all of those sorts of events that other systems might miss.

Adams: What happens if there’s a lapse in the data, or you just don’t have people talking about their symptoms?

Neill: That’s right. That’s absolutely a limitation of the system: it’s dependent on data quality, data availability and data timeliness. So, for example, if a jurisdiction is not getting emergency department data from local hospitals in a timely fashion, that’s going to impact its ability to respond to any patterns in that data. Similarly, if there were major errors in the way data was collected, those have the potential to propagate into what we can detect using that data. Also, you’re absolutely right, things that might not result in emergency department visits would not necessarily be detectable through this particular data source. There is, however, a wide variety of data sources that public health does use for outbreak detection.

Adams: One of the ways you all tested out this algorithm was looking at data that came into hospitals after Hurricane Sandy. Can you walk me through what you saw and how the algorithm responded to it?

Neill: Sure. We found a very interesting progression of clusters of cases in New York City emergency departments. In the day or two after Sandy hit, we saw kind of what we’d expect, which is a lot of acute cases — people coming in with leg injuries or shortness of breath, other things that are direct results of the hurricane’s impact. A couple of days after that, we started seeing clusters of cases related more to mental health issues — people coming in with things like depression and anxiety. And then a few days after that, we saw yet another type of case: people coming into the emergency department for things like dialysis or methadone maintenance. These are all things that typically would not be dealt with in a hospital emergency department. But because all of the outpatient clinics were closed, people essentially had to use the ED for those reasons as well. So what this really shows us is the progression of different stresses on an emergency department in the aftermath of a natural disaster. And I think it’s very informative for hospital ED personnel to know what they might need to anticipate and be ready for, so they have adequate resources to address all of these different sorts of problems.

Adams: Why is using machine learning a better tool for this particular set of public health problems than the way that we’ve been doing it before?

Neill: By no means is this a task where [artificial intelligence] should be replacing humans. So what our system does is it makes the humans, health epidemiologists, aware of events that are emerging in the data that they might not otherwise see. So surfacing what is important in all of this massive, complex data that a human might care about and might want to respond to is really the key.

