NASA’s data is headed for the cloud
Oct 25, 2021

NASA's earth science archive stores about 40 petabytes of data and is growing. It's a challenge to find a place to put it all.

A lot of NASA’s work has to do with what happens here on Earth, especially as we work to track and respond to the climate crisis. And at NASA’s earth sciences division, much of the data collected by future missions will be stored in the cloud, hosted by Amazon.

That’s partly because the agency needs more, well, space.

It has troves of images and readings from generations of satellites. The earth science archive stores about 40 petabytes of data. In the next four years, as NASA launches more missions focused on the Earth and its climate, the archive is expected to hold more than five times that.

For those not familiar with a petabyte, just one is enough space to store about 250,000 full-length movies.
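To put those figures side by side, here is a rough back-of-the-envelope sketch in Python. It uses only the numbers quoted above (40 petabytes, "more than five times that," and 250,000 movies per petabyte). The decimal definition of a petabyte and the function names are assumptions made for illustration, not anything NASA has published.

```python
# Back-of-the-envelope arithmetic using the figures quoted in the article.
# Assumption: a decimal petabyte (10**15 bytes); the article does not say which definition it uses.
PETABYTE_BYTES = 10**15
MOVIES_PER_PETABYTE = 250_000   # from the article
CURRENT_ARCHIVE_PB = 40         # from the article
GROWTH_FACTOR = 5               # "more than five times", so treat results as a lower bound


def bytes_per_movie() -> int:
    """Size of one 'full-length movie' implied by the article's comparison."""
    return PETABYTE_BYTES // MOVIES_PER_PETABYTE


def projected_archive_pb() -> int:
    """Lower bound on the archive size in petabytes after the projected growth."""
    return CURRENT_ARCHIVE_PB * GROWTH_FACTOR


def movies_equivalent(petabytes: int) -> int:
    """How many full-length movies would fit in the given number of petabytes."""
    return petabytes * MOVIES_PER_PETABYTE


if __name__ == "__main__":
    print(f"Implied size per movie: {bytes_per_movie() / 10**9:.0f} GB")
    print(f"Archive today: about {movies_equivalent(CURRENT_ARCHIVE_PB):,} movies")
    print(f"Projected archive: at least {projected_archive_pb()} PB, "
          f"or {movies_equivalent(projected_archive_pb()):,} movies")
```

Under these assumptions, the comparison implies roughly 4 gigabytes per movie, and the projected archive comes out to at least 200 petabytes.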

Kevin Murphy is the chief science data officer for NASA. He says it may be more expensive to store data in the cloud, but it should make the data easier to access and easier for researchers, businesses and maybe you and me to use. The following is an edited transcript of our conversation.

Kevin Murphy (Courtesy Karen Michael)

Kevin Murphy: So, we launched missions to address very specific science questions, so that’s one type of user. But this information is freely and openly available for anybody to use to conduct citizen science activities, to help plan how we work within an environment, how we make new discoveries on Mars, or from the sun. That information has so many other purposes that we need to make it broadly available.

Kimberly Adams: Part of this is you’re moving a lot of NASA’s data from various physical locations that are run and controlled by NASA and universities into the cloud. Why are you doing that?

Murphy: We’re doing that for a variety of different reasons. One, we can take advantage of new types of technologies like [artificial intelligence] or machine learning in those environments a bit more easily. The second thing is that we can, especially with these very large amounts of data, remove some of the burden from the users of having to download the information and then kind of organize and manage it themselves. If you think about how hard it is to find photos in your photo album about specific events or activities, we can apply similar types of AI to do similarity searches for interesting events in this giant pile of data.

Adams: I want to try to help people wrap their heads around this by having you walk me through the data story of a mission. I was looking at the specific example of the SWOT program, Surface Water and Ocean Topography. Can you tell me about that mission and the type and scale of data you expect it to generate?

Murphy: This is going to be one of the first missions that we have that’s really able to look at large rivers and lakes, and map those over time to see how they change. So it’s going to be really important for a variety of different things that actually impact people on a pretty regular basis. And this is going to be one of the first satellite data streams that we have that are really cloud native. This will allow people to access the multiple petabytes that it collects each year in a much more interactive way.

Adams: For people who don’t necessarily spend a lot of time thinking about data and petabytes, and how much NASA’s collating and figuring out how to sort through, what do you think is the most important thing for them to know about the work that you do?

Murphy: The most important thing to know is that these investments that we place in scientific instruments, in satellites and rovers, the data that comes back from there is incredibly valuable. There are things that we don’t know today about the information that we collected before, and through well-managed data programs, we can maintain that information for future generations to make their own discoveries.

Related links: More insight from Kimberly Adams

Kevin Murphy is NASA’s very first chief science data officer. But probably not the last.

I asked Murphy if he was worried that by working with a private company to store data, he might also be subjecting NASA’s data to its whims. For example, Amazon suspended hosting for the social network Parler earlier this year.

This chart shows the projected growth of the Earth Science Division database from 2015 until 2025. The orange area shows the order of magnitude increase. (Courtesy NASA’s Earth Observing System Data and Information System).

Murphy said NASA has always dealt with private data companies, and this isn’t that different, but also that the agency doesn’t put all of its eggs in one basket: backup data might be with a different vendor or stored on a NASA-owned hard drive somewhere.

NASA’s decision to move to the cloud, as Murphy explained, was motivated by the need for space. This chart shows just how much more data your average mission brings home nowadays compared to the past.

NASA also has a pilot for a cloud computing platform called Nebula that’s basically a server farm in a shipping container, so they can move the cloud around as needed.

And if you want to learn more about that SWOT mission I was talking to Murphy about, here’s a link to the mission page. Countdown to launch is just about a year and a month away.

The team

Molly Wood Host
Michael Lipkin Senior Producer
Stephanie Hughes Producer
Jesus Alvarado Assistant Producer