Artificial intelligence crowdsources data to speed up drug discovery

The system would help pharmaceutical companies share info while keeping it secret

DRUG DATA A new computing system lets pharmaceutical companies pool data to train AI programs for discovering new medications — without having to share confidential information with competitors.


By Maria Temming

October 18, 2018 at 2:00 pm

A new cryptographic system could allow pharmaceutical companies and academic labs to work together to develop new medications more quickly — without revealing any confidential data to their competitors.

The centerpiece of this computing system is an artificial intelligence program known as a neural network. The AI studies information about which drugs interact with various proteins in the human body to predict new drug-protein interactions.

More training data make for a smarter AI, but gathering enough data has been a challenge because drug developers generally don’t share their results due to intellectual property concerns. The new system allows an AI to crowdsource data while keeping that information private, which could encourage partnerships for speedier drug development, researchers report in the Oct. 19 Science.

Identifying new drug-protein interactions can uncover potential new treatments for various diseases. It could also reveal whether drugs interact with unintended protein targets, which might indicate whether a medication is likely to cause particular side effects, says Ivet Bahar, a computational biologist at the University of Pittsburgh not involved in the work.

In the new AI-training system, data pooled from research groups get divvied up among multiple servers, and the owner of each server sees what appear to be only random numbers. “That’s where the crypto-magic happens,” says computer scientist David Wu of the University of Virginia in Charlottesville, who wasn’t involved in the work. Although no individual participant can see the millions of drug-protein interactions that compose the training set, the servers can collectively use that information to teach a neural network to predict the interactivity of previously unseen drug-protein combinations.
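The article does not name the cryptographic technique involved. One common way to get the behavior described, where each server holds only random-looking numbers yet the servers can still compute together, is additive secret sharing. The following is a minimal sketch under that assumption, not the researchers' implementation; the modulus, the function names and the two example values are all hypothetical.

```python
import secrets

# Assumption: all arithmetic is done modulo a fixed large prime.
MODULUS = 2**61 - 1


def share(value, n_servers):
    """Split an integer into n_servers additive shares modulo MODULUS.

    Every share except the last is drawn uniformly at random; the last
    is chosen so that all shares sum to the original value.
    """
    shares = [secrets.randbelow(MODULUS) for _ in range(n_servers - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def reconstruct(shares):
    """Recover the original value by summing every share."""
    return sum(shares) % MODULUS


def add_shared(a_shares, b_shares):
    """Each server adds its own two shares locally, seeing neither input."""
    return [(a + b) % MODULUS for a, b in zip(a_shares, b_shares)]


# Two hypothetical labs each contribute one private number to three servers.
lab_a = share(42, 3)
lab_b = share(58, 3)

# The servers compute on shares; only the combined result is reconstructed.
total = reconstruct(add_shared(lab_a, lab_b))
```

Here `total` equals the sum of the two private inputs, even though no single server ever held either one. Training a neural network this way also requires multiplying shared values, which takes additional protocol steps not shown here.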

“This work is visionary,” says Jian Peng, a computer scientist at the University of Illinois at Urbana-Champaign not involved in the study. “I think [it] will lay the groundwork for the future of collaborations in biomedicine.”

MIT computational biologist Bonnie Berger and colleagues Brian Hie and Hyunghoon Cho evaluated their system’s accuracy by training a neural network on about 1.4 million drug-protein pairs. Half of these pairs were drawn from the STITCH database of known drug-protein interactions; the other half comprised drug-protein pairs that don’t interact. When shown new drug-protein pairs known to interact or not, the AI picked out which pairs interacted with 95 percent accuracy.
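The evaluation setup described above, with equal numbers of interacting and noninteracting pairs and a score on pairs held back from training, can be sketched roughly as follows. This is an illustration only: the helper names, the toy labels and the sampling approach are assumptions, not details from the study.

```python
import random


def balanced_pairs(interacting, non_interacting, seed=0):
    """Build a labeled set with equal numbers of positives and negatives.

    Label 1 marks a pair known to interact, label 0 a pair that does not.
    """
    rng = random.Random(seed)
    n = min(len(interacting), len(non_interacting))
    positives = rng.sample(interacting, n)
    negatives = rng.sample(non_interacting, n)
    labeled = [(pair, 1) for pair in positives] + [(pair, 0) for pair in negatives]
    rng.shuffle(labeled)
    return labeled


def accuracy(predictions, labels):
    """Fraction of held-out pairs whose predicted label matches the true one."""
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return correct / len(labels)


# Toy, made-up example: four held-out pairs, one predicted incorrectly.
true_labels = [1, 1, 0, 0]
predicted = [1, 0, 0, 0]
score = accuracy(predicted, true_labels)
```

Because the set is balanced, a model that guessed at random would land near 50 percent, which gives the reported 95 percent a clear baseline for comparison.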

To test whether the system could identify hitherto unknown drug-protein interactions, Berger’s team then trained the neural network on nearly 2 million drug-protein pairs: the entire STITCH dataset of known interactions, plus the same number of noninteracting pairs. The fully trained AI suggested several interactions that had never before been reported or that had been reported but were not in the STITCH database.

For instance, the AI identified an interaction between estrogen receptor proteins and droloxifene, a drug developed to treat breast cancer. The neural network also found a never-before-seen interaction between the leukemia medication imatinib and the protein ErbB4, which is thought to be involved in different types of cancer. The researchers confirmed this second interaction with lab experiments.

This secure computing network may also encourage more collaboration in areas outside of pharmaceutical development. Hospitals could share confidential health records to train AI programs that predict patient prognoses or devise treatment strategies, Peng says.

“Whenever you want to do a study on a large number of people on behavior, on genomics, on medical records, legal records, financial records — anything that’s privacy-sensitive, these kinds of techniques can be very useful,” Wu says.