How machine learning is unfolding the mysteries of proteins
Aug 10, 2022

A public database containing millions of predicted protein structures generated by a machine learning program could speed drug discovery, vaccine development and more.

Understanding proteins — like the spike protein of the coronavirus — is superimportant for the study of diseases and the development of drugs and vaccines.

So there’s a lot of excitement about the AlphaFold Protein Structure Database, built by the artificial intelligence lab DeepMind with the European Molecular Biology Laboratory. Researchers there have used machine learning to predict and map more than 200 million protein structures from all kinds of organisms.

Meghan McCarty-Carino of “Marketplace Tech” spoke with Matthew Higgins, professor of molecular parasitology at the University of Oxford. He studies malaria parasites for a potential vaccine, and he said the database has sped up that work.

The following is an edited transcript of their conversation.

Matthew Higgins: So machine learning starts with all of the protein structures which are already known, and then it learns from those protein structures. It looks at how protein molecules fold up to predict how proteins whose structures we don’t know fold too. And this is really helpful. There are two main methods by which we could work out the structure of a protein. And one of these is called the electron microscopy method. It might give us quite a fuzzy view. And yet, we can take the predicted structure from a machine learning approach and dock it into that fuzzy view, see how well these two fit together, and that can allow us to generate a much sharper and much more detailed map.

Meghan McCarty-Carino: How did this machine learning protein database help you in your work on the malaria parasite molecule?

Matthew Higgins (Courtesy University of Oxford)

Higgins: Yeah, absolutely. We’ve been working on a particular molecule for a number of years. And we found it really hard to work out the structure. The postdoctoral researcher in my lab was really banging her head against the wall trying to work out how to do this. And then the AlphaFold database came along. And suddenly, she could see a great match between the detailed AlphaFold model and the fuzzy view that we were getting from our experimental information, and she could put them together. And she could understand straightaway how this molecule worked and its structure and its architecture. So it really helped us accelerate that project and move straight on to the next phase of the project, which is to test their ability as vaccines in this kind of preclinical assay. And I know from colleagues around the world that AlphaFold is really helping them to push projects like this forward really quickly, to make sure that they can move on to the next stage, which is to test things as vaccines or to design drugs.

McCarty-Carino: So this AlphaFold database is also open to the public. I mean, what’s the significance of that, especially for researchers like yourself?

Higgins: So it’s valuable for the sort of work that we do. But it’s also particularly valuable for people who are doing large-scale comparative studies. Let’s say you want to target a particular molecule from a bacterium with a drug and the human body has a similar-looking molecule. You can see how similar it is and in which ways it’s similar or different by comparing these models. And that will allow you to work out, for example, how you would change your drug molecule so it doesn’t bind to the human enzyme, it only binds to the bacterial enzyme, reducing the chances of off-target effects from your drug molecules.

McCarty-Carino: And would you expect this database to sort of usher in kind of an era of accelerated advancements in the broader biotech sector?

Higgins: Yes, absolutely. I mean, biotech companies for many years or decades have been using structural information in order to design drug molecules. I mean, the other thing that is a huge strength of these deep-learning approaches is actually to create proteins totally from scratch. So for example, you could design an enzyme which degrades plastic or which gets rid of a waste product using the AlphaFold models in the protein predictions.

Higgins told me that searching the database is basically like doing a Google search. And I should note the AI company that developed it — DeepMind — is a subsidiary of Google’s parent company, Alphabet.

AlphaFold isn’t the only machine learning program that’s being used to predict these protein structures. Scientists at the University of Washington created their own tool called RoseTTAFold, which, they say, can predict a protein’s structure “in as little as ten minutes on a single gaming computer.”

You might notice the “fold” theme in the naming. That’s because the chain of amino acids that make up proteins folds, sometimes sort of like a Slinky, giving proteins their structure.

And as I mentioned earlier, a protein we all know and really don’t love is the spike protein of the coronavirus. Scientists at the University of California, San Francisco, have released a working paper about how they used the AlphaFold database to study how COVID works and how to design new drugs to fight it.

The team

Daniel Shin Producer
Jesús Alvarado Associate Producer