On the first full day of his administration, President Joe Biden signed an executive order designed to ensure a data-driven response to COVID-19 and future public health threats. The administration already faces a big choice around COVID-19 data. In July, the Trump administration directed hospitals to stop sending data to the Centers for Disease Control and Prevention, and instead to send it to the Department of Health and Human Services. And HHS used the data analysis company Palantir to harmonize all that data.
At first, it was a hot mess, said Alexis Madrigal, who runs the COVID-19 Tracking Project at The Atlantic. But by the fall, the system was working, and now it’s tracking hospitalizations really well. Madrigal said the Biden administration should try to ignore the messy politics in favor of the good data. The following is an edited transcript of our conversation.
Alexis Madrigal: Part of it is [that the] CDC is seen as a less political organization, versus HHS, which is seen as more a part of the administration and not standing apart. One of the things I really learned in this reporting is that that description is not totally accurate in all cases. The people who built this new hospital data system, they’re all career civil servants. And so it really takes getting pretty deep inside these agencies to really understand the dynamics within these very complex organizations.
Molly Wood: So what choices does the Biden administration face around this data collection?
Madrigal: They could try to push the data collection for hospitalizations back into CDC, into this less flexible and older system, or they could keep it inside HHS. And so one of the things that I’ve been really worried about is that, to achieve this sort of abstract good of having the data collection be in the place where it belongs, you actually take away this system that’s working really well right now and that’s incredibly transparent for the public.
Wood: Do you have any sense from the one week the Biden administration has existed of which way they might go?
Madrigal: I think the real thing is that the vaccine data is on fire. We saw CDC Director [Rochelle] Walensky saying she wasn’t sure about the vaccine data right now. And I’ve just got to say, I mean, that was a system that was designed and built inside the CDC.
Wood: I mean, this is arguably the first major pandemic to exist in the Big Data age, right? It sounds like you’re saying there’s just a skill set there that might not have been developed?
Madrigal: Yeah, and it’s not as if the CDC doesn’t collect data. Of course the CDC collects tons of data. But it’s kind of for different purposes. It’s one thing to need rough and ready data to make decisions today. It’s another thing to collect data for research projects over time in which you really want precise answers, but you have a lot of time to develop those datasets and the questions to ask, the processes that you build. In the pandemic response, I would say my primary criticism of the CDC on a bunch of different levels is they’ve just moved too slowly. It hasn’t seemed enough like a crisis. I mean, the early example of that, for me, was in the very early days when the CDC had put up their COVID-19 tracking apparatus, they just didn’t update it on the weekends at a time when cases were doubling. So they’d stop updating on Friday, they’d update on Monday, and there’d be twice as many cases as when they stopped. And I just thought to myself, “Guys, everyone is working the weekend right now. We need to know what’s happening, the public needs to understand what’s happening. You can’t just take the weekend off.” And I’m happy to say that the vaccine tracking that the CDC is doing, they’re updating it over the weekend. So maybe this is a good sign that the Biden administration CDC is reinvigorated and has some renewed sense of purpose and is treating this like the crisis that it really is.
Wood: Talk to me a little bit about Palantir and its role in this data collection, because Palantir is a name that inspires some dread, either in [“The Lord of the Rings”] or with respect to privacy and transparency. What do we know about its role in this data collection and how much transparency there is and what they can use this data for?
Madrigal: So Palantir, it was co-founded by Peter Thiel, who, I think, for a lot of Democrats has become sort of a Republican supervillain. And Peter Thiel and Palantir have a lot of government contracts … and I think people are rightfully worried about the extent of their reach into the federal government. But here’s the thing: People used the fact that HHS’s data system, which is called HHS Protect, was built by Palantir as a reason to [argue we should] move data out of HHS. The problem is that HHS Protect actually grew out of a CDC system, also built by Palantir. Also, [the National Institutes of Health], they also use Palantir. So we have a system in which Palantir is pretty thoroughly threaded throughout our public health surveillance infrastructure. In my mind, is this the way that I would set it up if I were doing this? Probably not. On the other hand, it’s not really an issue of HHS versus CDC. They both use Palantir. Palantir says that they don’t use the data that’s flowing into the system they built for anything else, that they basically just built the database and their hands are off of it, for what it’s worth.
Wood: Do you think we have that in writing somewhere in a federal contract and for taxpayers to see?
Madrigal: Let’s hope so.
Wood: I want to ask you about all of this data and your data, because you can’t, of course, track what you don’t measure. Are you going to keep the COVID-19 Tracking Project going? Do you think there’s still a need for that?
Madrigal: I think eventually we’ll stop data collection. And I think we’ll continue to do a lot of the really deep research that’s necessary to understand these metrics, because I think there’s really two pieces to what we do. One is, we go state by state with this army of volunteers and paid staffers, and we collect all this data and we make sense of it into national summary statistics and analysis. The other thing that we do, though, is we spend a lot of time reaching out to states. I mean, we probably had hundreds or even thousands of contacts with state officials at this time. We also work a lot with understanding the federal bureaucracy and data systems. And that work, I think, will continue long into the future, because we don’t actually know what happened in many states. And we don’t really know how well a lot of these data systems actually are working on the ground. And there’s so much work left to be done there. All that said, the right place for this data collection and publication is the federal government. It is a function of a functional state to do this. And I think, eventually, we shouldn’t do it. We’ve said that from the first day of the project, and I think we’ve seen some encouraging signs over the last few months that the feds are really, really stepping up with the amount of data they’re publishing, the quality of that data and the transparency of the methods that are used to create that data.
Wood: I do want to go back to, though, what you said about vaccine data being “on fire.” What is going on there? What are we missing?
Madrigal: Well, we don’t know precisely what the problems are with those vaccine data systems. But it strikes me that the most likely thing is that the lag time of understanding how many vaccines have been used, actually administered, is throwing off some of the complex, supposedly real-time logistics of getting more vaccines out. And we probably, as Americans, rely too much on the idea of precise data, as opposed to building systems that are resilient to rough data. And I think that we actually have a lot to learn from some of the poorer countries of the world during this pandemic, where they just didn’t expect to have perfect data, so they didn’t build systems that were reliant on it. That means you need to build in other things, but it’s doable. You don’t always need precise data in order to have an effective public health response as we see, for example, in Vietnam.
Related links: More insight from Molly Wood
Right now, Biden’s executive order is asking for a review of data-collecting procedures and, notably, asks the federal government to figure out how to make more COVID-19 data available to the public.
Here’s Madrigal’s piece from The Atlantic about the HHS system, how it functions and how it got to be working as well as it is. And here’s also a link to that executive order I mentioned, which, after talking with Madrigal, definitely seems to validate the idea that we’re going to try to create perfect data systems to tackle our next big crisis, especially since creating data-driven responses to future public health threats is in the title of the order. But we’re still at the information-gathering stage, so we’ll reserve judgment for now.
And we should be clear that while the HHS data is doing a good job at tracking hospitalizations, we still don’t have solid data around testing, or even remotely enough testing, so we’re not counting cases accurately. And despite the appalling death count, it’s likely that it’s too low, maybe by 100,000, according to the CDC.