Machine learning streamlines the complexities of making better proteins

The AI framework predicts how proteins will function with several interacting mutations


A new machine learning framework called MULTI-evolve dramatically condenses the protein engineering process.

Eugene Mymrin/Moment/Getty Images

Making high-performance proteins for medicines or consumer products can take trial after trial of tweaks, experiments and fine-tuning. A new machine learning framework squeezes all that into a single round of testing.

The technique, called MULTI-evolve, predicts how proteins will behave when several of their amino acids are swapped for others. MULTI-evolve blends laboratory experiments with machine learning to find these upgraded proteins, researchers report February 19 in Science.

Specially crafted proteins play a role in everyday products such as medicines, biofuels and even laundry detergent. To boost a protein’s performance, scientists usually need to swap out multiple amino acids during the design process. But replacing one amino acid with another can change how the next swap affects the protein’s function, so finding combinations of swaps that work well together often requires many iterative rounds of modification and laboratory testing. “It’s this very high-dimensional search problem where we effectively do guess and check,” says Patrick Hsu, a bioengineer at the University of California, Berkeley, and the Arc Institute in Palo Alto, Calif.

Hsu and colleagues built the MULTI-evolve workflow to cut out most of those iterations and predict high-performing proteins carrying multiple swaps, or mutations, in a single round of testing. To do that, they needed information about how different mutations affect one another. For each protein the team targeted, the workflow had three steps. First, the researchers used either existing data or machine learning techniques to predict how single amino acid swaps would affect protein function. Next, to establish how the mutations interact, they made a series of proteins in the lab, each carrying two of those mutations, and tested how well each one worked. Finally, they trained a machine learning model on that laboratory data and asked it to predict how well the target protein would function with five or more mutations.
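To make the three steps concrete, here is a minimal Python sketch of the underlying idea. It is not MULTI-evolve’s actual model. It assumes a deliberately simple additive-plus-pairwise scoring rule: each single mutation contributes its predicted effect (step 1), each lab-tested pair contributes the gap between its measured result and the sum of its two single effects (step 2), and every five-mutation combination is then scored and ranked (step 3). The mutation names, the numbers and the helper functions `epistasis`, `predict` and `rank_combinations` are all hypothetical and made up for illustration.

```python
from itertools import combinations

# Hypothetical step 1 output: predicted effect of each single mutation,
# on a log scale where the original protein scores 0.0. Made-up numbers.
single = {"A12V": 0.4, "K45R": 0.3, "D78N": 0.2,
          "S101T": 0.5, "L130F": 0.1, "G150A": 0.3}

# Hypothetical step 2 output: lab-measured scores of double mutants.
double = {("A12V", "S101T"): 0.6,
          ("K45R", "D78N"): 0.7,
          ("S101T", "L130F"): 0.9,
          ("A12V", "G150A"): 0.5}

def epistasis(single, double):
    """Interaction of each measured pair: measured score minus the
    sum of its two single-mutation effects (0 would mean purely additive)."""
    return {frozenset(pair): value - single[pair[0]] - single[pair[1]]
            for pair, value in double.items()}

def predict(combo, single, eps):
    """Score a combination as the sum of its single effects plus every
    pairwise interaction; unmeasured pairs are assumed to contribute 0."""
    total = sum(single[m] for m in combo)
    total += sum(eps.get(frozenset(p), 0.0) for p in combinations(combo, 2))
    return total

def rank_combinations(single, double, k=5, top=3):
    """Step 3 stand-in: score every k-mutation combination, best first."""
    eps = epistasis(single, double)
    scored = [(predict(c, single, eps), c)
              for c in combinations(sorted(single), k)]
    scored.sort(reverse=True)
    return scored[:top]

for score, combo in rank_combinations(single, double, k=5, top=2):
    print(f"{score:.2f}", combo)
```

With these made-up numbers, the top-ranked five-mutation set leaves out A12V even though A12V has one of the largest single effects, because its measured pairings with S101T and G150A score worse than the additive expectation. That is the kind of interaction the double-mutant experiments are meant to expose.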

The team tested MULTI-evolve on three proteins, including an antibody relevant to autoimmune diseases and a protein used in CRISPR gene editing. In each case, the model found several combinations of mutations that in laboratory tests outperformed the original proteins, suggesting the model could pick out a set of swaps that work well together.

Among the many protein engineering tasks MULTI-evolve could streamline, Hsu highlighted two: using one protein to track another’s movement inside a cell, and building better gene therapies for people whose bodies don’t produce certain enzymes. “We’re excited about this work,” Hsu says. “I think there’s tremendous interest in how this actually changes the practice of science.”

Skyler Ware was the 2023 AAAS Mass Media Fellow with Science News. She has a Ph.D. in chemistry from Caltech, where she studied chemical reactions that use or create electricity.