The Centers for Disease Control and Prevention estimates that up to 23 million Americans have been affected by long COVID. That term describes a wide variety of conditions, from brain fog and chronic fatigue to neurological problems and blood clots, that persist for months or even years after infection.
But clear answers about exactly how often this happens, who’s most at risk and why, are still elusive.
Marketplace’s Meghan McCarty Carino speaks with Emily Pfaff, an assistant professor of medicine at the University of North Carolina at Chapel Hill who uses artificial intelligence to analyze electronic health records, looking for patterns that might better identify the syndrome and treat patients. Pfaff said some of the most common markers the algorithm detects are fatigue, shortness of breath and frequent doctor visits.
The following is an edited transcript of their conversation.
Emily Pfaff: In order to best understand long COVID, the best thing that we have in our toolbox is clinical research. But you can’t perform clinical research on patients with long COVID if you can’t figure out who has long COVID. And so, we are using a machine-learning model or artificial intelligence to take patients that we know have long COVID, or we’re pretty sure they have long COVID, and find other patients in our large data set that look like those patients. So machine learning is really useful in being able to find patterns and then match those patterns in people it hasn’t seen before. And that’s exactly how we’re using it here.
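As a rough illustration of the idea Pfaff describes (learn from patients believed to have long COVID, then find unseen patients who resemble them), here is a minimal sketch in Python. It uses a simple nearest-neighbor comparison, which is an assumption made for illustration and not necessarily the researchers' actual model. The feature names, the toy records and the `knn_score` and `looks_like_long_covid` helpers are all hypothetical.

```python
from math import dist

# Hypothetical toy records, invented for illustration only:
# (doctor visits per month, shortness of breath 0/1, fatigue 0/1)
# Label 1 = known or likely long COVID, label 0 = not.
LABELED = [
    ((6.0, 1, 1), 1),
    ((5.0, 1, 0), 1),
    ((4.0, 0, 1), 1),
    ((1.0, 0, 0), 0),
    ((0.5, 0, 0), 0),
    ((1.5, 0, 1), 0),
]


def knn_score(features, labeled=LABELED, k=3):
    """Return the fraction of the k most similar labeled patients who are cases."""
    # Sort labeled patients by Euclidean distance to the new patient.
    nearest = sorted(labeled, key=lambda rec: dist(rec[0], features))[:k]
    return sum(label for _, label in nearest) / k


def looks_like_long_covid(features, threshold=0.5):
    """Flag a patient whose nearest labeled neighbors are mostly cases."""
    return knn_score(features) > threshold


if __name__ == "__main__":
    unseen_patient = (5.5, 1, 1)  # frequent visits, both symptoms
    print(knn_score(unseen_patient), looks_like_long_covid(unseen_patient))
```

A real model would draw on far more features and far more patients. The point of the sketch is only the pattern-matching step: score someone the model has never seen by comparing them with people it has.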
Meghan McCarty Carino: Tell me more about some of the hallmark symptoms the algorithm identified.
Pfaff: So age is actually the most important feature in our model, which of course is not a symptom. But it is indicative of the fact that the age you are really does have an effect on your likelihood of getting long COVID. It has a nonlinear relationship with whether you have long COVID or not. So it’s not easy to say, you know, if you’re 17 years old, you’re more likely than if you’re 25 years old. But certainly age does have an impact on the model output. The top features that the model uses to match up people that it doesn’t know about to people that it has seen as potentially having long COVID are things like shortness of breath and fatigue. That’s certainly a hallmark symptom that you’ve probably seen in plenty of video reports about long COVID. That’s a huge indicator. How often you go to the doctor is a very important feature. So people who are maybe bouncing around between specialists or just going into the office a lot because they feel terrible, those folks are clearly more likely to be classified as long COVID. And then new prescriptions for things like asthma inhalers, such as albuterol, and other sorts of asthma-related medications are also important, as well as female sex.
McCarty Carino: Something we talk a lot about on this show is that algorithms are only as smart as the data that they’re trained on. Yours relies on these electronic health records. What kind of biases are inherent to this data?
Pfaff: That’s something that we spend a lot of time thinking about because it’s so important to consider. The one that I feel the strongest about when it comes to electronic health records is the fact that the data that we have are data about people who were able to go to the doctor. So if you don’t have insurance, if you can’t get time off of work, if you don’t have child care, you’re not going to be in my data set. One other thing that we have heard from some of the patient advocates in the long COVID community is that many long COVID patients have gotten frustrated with care, just because it’s so difficult to find treatments that work and that make people feel better. And so that’s actually another group of people that’s not represented in our data set. And that’s a group that I’m really afraid of losing out on as well.
McCarty Carino: And then I guess on the flip side, you know, people who are more likely to be persistent in seeking care may also share certain characteristics.
Pfaff: Absolutely. And something that we talk about in using electronic health records for any use case, not just long COVID, is that they’re always going to be biased towards sicker people because sicker people use care more often. And all of these things are necessary to keep in the back of your mind when you’re working with electronic health record data. It’s not to say that the data shouldn’t be used or that they’re not useful. But if you don’t keep the caveats in mind, you’re likely to come up with skewed conclusions.
McCarty Carino: Now, how do you see this tool being used in real-world health care settings?
Pfaff: My dream would be for AI models like this to be able to be run over the entire data set of a health care system to potentially identify patients that could be good matches for treatment trials, that would be good matches for potentially long COVID specialty care. And so these kinds of models, they’re not going to be right 100% of the time, probably not even quite close to that. But they are going to narrow down the set of people that are likely highly enriched for folks that do have long COVID and could benefit from getting a phone call or getting a recruitment call for a trial.
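To make the "narrowing down" idea concrete, here is a minimal Python sketch of turning model scores into a recruitment shortlist and checking how enriched that shortlist is for true cases. Everything in it is an assumption for illustration: the patient IDs, the scores, the set of confirmed cases and the `shortlist` and `enrichment` helpers are hypothetical and do not come from the study.

```python
def shortlist(scores, top_n):
    """Return the IDs of the top_n highest-scoring patients (ties broken by ID)."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [patient_id for patient_id, _ in ranked[:top_n]]


def enrichment(selected, true_cases, all_ids):
    """Ratio of the case rate in the shortlist to the case rate overall."""
    rate_selected = sum(pid in true_cases for pid in selected) / len(selected)
    rate_overall = len(true_cases) / len(all_ids)
    return rate_selected / rate_overall


# Hypothetical model scores (0 to 1) for eight patients, invented for illustration.
SCORES = {
    "p1": 0.91, "p2": 0.15, "p3": 0.78, "p4": 0.40,
    "p5": 0.05, "p6": 0.66, "p7": 0.22, "p8": 0.10,
}
# Hypothetical ground truth; in practice this would be unknown for most patients.
TRUE_CASES = {"p1", "p3", "p4"}

if __name__ == "__main__":
    calls = shortlist(SCORES, top_n=3)
    print(calls)
    print(round(enrichment(calls, TRUE_CASES, SCORES.keys()), 2))
```

In this toy data the shortlist misses one true case and includes one non-case, which mirrors the point that the model will not be right every time. The shortlist still contains a higher share of cases than the full population, and that is what makes it useful for targeted phone calls or trial recruitment.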
McCarty Carino: And what about other diseases beyond long COVID? Maybe other post-viral syndromes which we know exist but have not been studied to the extent that long COVID has or even other chronic or rare diseases.
Pfaff: Absolutely. So I think that this kind of model and the idea, by the way, of using machine learning to phenotype diseases in electronic data, that’s not a new idea. And I think that it applies really well to things like long COVID, which is the innovation here, because long COVID is a new disease. So I think that this kind of methodology works really quite well with, and is an appealing option for, new diseases and, as you said, rare diseases that may not have good indicators of the hallmark symptoms, as well as diseases that have kind of diffuse symptoms the way that long COVID does, where one lab test isn’t going to tell you whether a patient has disease X or does not. It’s more like a constellation of features that AI is really good at synthesizing. Humans, not so much.
Related links: More insight from Meghan McCarty Carino
We’ve got the full published article by Emily Pfaff and her partners, along with a summary from the National Institutes of Health, which supported Pfaff’s research. In fact, the electronic health records she used are part of a public data set the NIH has made available to aid research that improves our understanding of long COVID.
Electronic health records were also used in that CDC research I mentioned that found up to 23 million Americans, about 1 in 5 adult COVID survivors, have experienced some symptom of long COVID. Now we should note that study didn’t consider vaccination status and was done in the first 18 months of the pandemic.
As Pfaff pointed out, analyzing electronic health records does present some limitations, but there are a lot of opportunities there.
The Mayo Clinic recently launched a startup incubator for health care AI companies. They’ll be given access to the health care network’s anonymized database of 10 million patients. The first cohort of companies is working to improve care for patients with chronic conditions like diabetes and epilepsy and better predict the needs of patients who message their providers through these systems.
Not gonna lie, there is something a little freaky about the proliferation of AI in health care, given the recent news of Google’s maybe sentient chatbot. I mean, HAL 9000 in “2001: A Space Odyssey” had great bedside manner at first … but we all know how that ended.