Hunting for New Drugs with AI

The pharmaceutical industry is in a drug-discovery slump. How much can AI help?

THERE ARE MANY REASONS that promising drugs wash out during pharmaceutical development, and one of them is cytochrome P450. A set of enzymes mostly produced in the liver, CYP450, as it is commonly called, is involved in breaking down chemicals and preventing them from building up to dangerous levels in the bloodstream. Many experimental drugs, it turns out, inhibit the production of CYP450—a vexing side effect that can render such a drug toxic in humans.

Drug companies have long relied on conventional tools to try to predict whether a drug candidate will inhibit CYP450 in patients, such as by conducting chemical analyses in test tubes, looking at CYP450 interactions with better-understood drugs that have chemical similarities, and running tests on mice. But their predictions are wrong about a third of the time. In those cases, CYP450-related toxicity may come to light only during human trials, resulting in millions of dollars and years of effort going to waste. This costly inaccuracy can, at times, feel like “the bane of our existence,” says Saurabh Saha, senior vice president of research and development and translational medicine at Bristol-Myers Squibb.

Inefficiencies such as this one contribute to a larger problem: the $1-trillion global pharmaceutical industry has been in a drug development and productivity slide for at least two decades. Pharmaceutical companies are spending more and more—the 10 largest ones now pay nearly $80 billion a year—to come up with fewer and fewer successful drugs. Ten years ago every dollar invested in research and development saw a return of 10 cents; today it yields less than two cents. In part, that is because the drugs that are easiest to find and that safely and effectively treat common disorders have all been found; what is left is hunting for drugs that address problems with complex and elusive solutions and that would treat disorders affecting only tiny portions of the population—and thus could return far less in revenue.

Because finding new, successful drugs has become so much harder, the average cost of bringing one to market nearly doubled between 2003 and 2013 to $2.6 billion, according to the Tufts Center for the Study of Drug Development. These same challenges have increased the lab-to-market time line to 12 years, with 90 percent of drugs washing out in one of the phases of human trials.

It’s no wonder, then, that the industry is enthusiastic about artificial-intelligence tools for drug development. These tools do not work by having expert-developed analytical techniques programmed into them; rather users feed them sample problems (a molecule) and solutions (how the molecule ultimately behaves as a drug) so that the software can develop its own computational approaches for producing those same solutions.

Most AI-based drug-discovery applications take the form of a technique called machine learning, including a subset of the approach called deep learning. Most machine-learning programs can work with small data sets that are organized and labeled, whereas deep-learning programs can work with raw, unstructured data and require much larger volumes. Thus, a machine-learning program might learn to recognize the different features of a cell after being shown tens of thousands of examples of photographs of cells in which the parts are already labeled. A deep-learning version can figure out those parts on its own from unlabeled cell images, but it might need to look at a million of them to do it.

Many scientists in the field think that AI will ultimately improve drug development in several ways: by identifying more promising drug candidates; by raising the “hit rate,” or the percentage of candidates that make it through clinical trials and gain regulatory approval; and by speeding up the overall process. A machine-learning program recently deployed by Bristol-Myers Squibb, for instance, was trained to find patterns in data that correlate with CYP450 inhibition. Saha says the program boosted the accuracy of its CYP450 predictions to 95 percent—a sixfold reduction in the failure rate compared with conventional methods. These results help researchers quickly screen out potentially toxic drugs and focus instead on candidates that have a stronger shot at making it all the way through multiple human trials to U.S. Food and Drug Administration approval. “Where AI can make a huge difference is having drugs that fail early on, before we make all that investment in them,” says Vipin Gopal, chief data and analytics officer at Eli Lilly.
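The "sixfold" figure can be roughly checked with a few lines of arithmetic. The sketch below assumes that conventional predictions being "wrong about a third of the time" means an error rate of about 33 percent, and that 95 percent accuracy means a 5 percent error rate. The variable names are illustrative only and do not come from Bristol-Myers Squibb.

```python
# Assumed error rates, read off the figures quoted in the article.
conventional_error = 1 / 3   # "wrong about a third of the time"
ai_error = 1 - 0.95          # 95 percent accuracy leaves 5 percent wrong

# How many times smaller the AI error rate is than the conventional one.
reduction = conventional_error / ai_error

print(f"Error rate falls by a factor of about {reduction:.1f}")
```

Under those assumptions the ratio lands between six and seven, which is consistent with the reduction Saha describes.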

Resources are now piling into the field. AI-based drug-discovery start-ups raised more than $1 billion in funding in 2018, and as of last September, they were on track to raise $1.5 billion in 2019. Every one of the major pharmaceutical companies has announced a partnership with at least one such firm. Only a few AI-discovered drugs are actually in the human-testing pipeline, however, and none has begun phase 3 human trials, the gold-standard test for experimental drugs. Saha concedes that it will be several years before he can say for sure whether the company’s hit rates will go up as a result of the AI prediction rate of CYP450 inhibition. For all the hype in the industry, it is far from certain that early results will translate to more and better drugs.

SIFTING THROUGH MILLIONS OF MOLECULES

EMERGING AI PROGRAMS are not exactly a revolutionary update in the drug industry, which has for some time been building sophisticated analytical solutions that aid with drug development. The rise of powerful statistical and biophysical modeling programs well over a decade ago as part of the growth of the field of bioinformatics—the quest to use computational tools to derive biological insights from large amounts of data—led to tools that can predict the properties of molecules. But these programs have been limited by scientists’ incomplete understanding of how molecules interact: they cannot tell conventional software how to find insights in data when they do not know what elements of the data are most important and how they relate to one another. Imbued with the ability to derive their own insights into which data elements matter, newer AI programs can extract better predictions for a wider range of variables.

AI tools tackle different aspects of drug discovery in several ways. Some AI companies, for example, are focusing on the problem of designing a drug that can safely and effectively work on a known target—usually a specific, well-studied protein that is associated with a disease. The goal is typically to come up with a molecule that can chemically bind to the target protein and modify it so that it no longer contributes to the disease or its symptoms. Cyclica, a Canadian firm, puts its software to work on matching the biophysical structures and biochemical properties of millions of molecules to the structures and properties of some 150,000 proteins to uncover molecules likely to bind to target proteins, as well as those to avoid.

But molecules that are good candidates as drugs still have to jump through other hoops. Those include making it through the gut into the bloodstream without being immediately broken down by the liver or metabolic processes; working in a particular organ such as the kidney without disrupting other organs; avoiding binding to and incapacitating any of the thousands of other proteins in the human body that are important to health; and breaking down and leaving the body before drug levels become potentially dangerous. Cyclica’s AI software takes all those requirements into consideration. “A molecule that can interact with a protein target can usually interact with upward of 300 proteins,” Cyclica’s CEO Naheed Kurji says. “If you’re designing a molecule, it behooves you to consider the other 299 interactions that could have disastrous effects in humans.”

There is growing recognition among biomedical researchers that complex diseases such as cancer and Alzheimer’s involve hundreds of proteins, and hitting just one of them is not likely to be disruptive enough. Cyclica is attempting to find individual compounds that can interact with dozens of target proteins yet avoid interacting with hundreds of other proteins, Kurji explains. Currently under development, he adds, is the incorporation of a wealth of anonymized global genetic data about variations in proteins, so that the software can specify which patients the candidate drugs would work best on. Kurji claims that together these features will eventually be able to shave five years off the typical seven-year-long time frame for bringing a candidate drug from initial identification to human trials.

Merck and Bayer are among the big pharma companies that have announced partnerships with Cyclica. As is the case with most AI-pharma partnerships, the companies are not releasing much insight into exactly what AI-generated drug candidates may be coming out of the collaborations. But Cyclica has shared some details of its successes in identifying a key target protein linked to already FDA-approved drugs for systemic scleroderma, an autoimmune disease of the skin and other organs, as well as one linked to the Ebola virus. Each drug is already FDA-approved for the treatment of other disorders—HIV and depression, respectively—which means they both could be quickly “repurposed” for the new applications if the research continues to pan out.

Sometimes researchers identify a target protein that might play a critical role in disease but find that—as is true of about 90 percent of the proteins in the human body—not much is known about its structure and properties. With little data to go on, most machine- and deep-learning programs will not be able to figure out how to “drug” the protein—that is, come up with compounds that will bind to it and meet the other criteria for safety and efficacy. A handful of AI companies are focusing on these kinds of “small data” problems, including Exscientia, which uses its software to hunt down molecules that might work with a target protein. It can produce useful insights with as few as 10 pieces of data about a protein, says the company’s CEO, Andrew Hopkins, a professor of medicinal informatics at the University of Dundee in Scotland.

Exscientia’s algorithms compare the limited information available about a target protein against a database of about a billion protein interactions. This step narrows down the list of possible compounds that might work and specifies what additional data would help further refine the focus. Such data might come from looking at tissue samples to learn more about how the protein behaves in the body, for example. The resulting new data are then fed into the software, which pares the list again and suggests another round of needed data. This process is repeated until the software is ready to generate a manageable list of compounds that are favorable drug candidates for the target.
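The narrow-then-refine cycle described above can be pictured with a toy loop. This is a sketch of the general idea only, not Exscientia's software: the compound names, the hidden "affinity" values, the noise model, and the function `narrow_candidates` are all hypothetical stand-ins. Each round represents one batch of new data, which makes the scores a little less noisy before the list is pared again.

```python
import random

rng = random.Random(0)  # fixed seed so the toy run is repeatable

# Hypothetical hidden quality of each compound (unknown to the "software").
true_affinity = {f"compound_{i}": rng.random() for i in range(1000)}

def noisy_score(compound, round_number):
    # Assumption: each round of new data shrinks the measurement noise.
    return true_affinity[compound] + rng.gauss(0, 0.3 / round_number)

def narrow_candidates(candidates, score, keep_fraction=0.5,
                      target_size=5, max_rounds=20):
    """Re-score and pare the list each round until it is manageable."""
    shortlist = list(candidates)
    rounds = 0
    while len(shortlist) > target_size and rounds < max_rounds:
        rounds += 1
        # Rank with the current (noisy) estimates, best first.
        ranked = sorted(shortlist, key=lambda c: score(c, rounds), reverse=True)
        # Keep the top fraction, but never drop below the target size.
        keep = max(target_size, int(len(ranked) * keep_fraction))
        shortlist = ranked[:keep]
    return shortlist, rounds

shortlist, rounds = narrow_candidates(true_affinity, noisy_score)
print(f"{len(shortlist)} candidates left after {rounds} rounds")
```

In a real pipeline the scoring step would be a trained model and each round would involve laboratory work, so the halving rule and the noise schedule here are purely illustrative.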

Hopkins claims that Exscientia’s process can cut the time spent in discovery from 4.5 years to as little as one year, reduces discovery costs by 80 percent and results in one-fifth the number of synthesized compounds as is normally needed to produce a single winning drug. Exscientia is partnering with biotech giant Celgene in an effort to find new potential drugs for three targets.

Meanwhile an Exscientia partnership with GlaxoSmithKline has led to what the companies say is a promising molecule targeting a novel pathway to treat chronic obstructive pulmonary disease. But as with any AI company addressing drug development, Exscientia simply has not been in the game long enough to have generated enough new candidates that could have made it through to late-stage trials—a process that typically takes five to eight years. Hopkins claims one of the candidates Exscientia has identified may reach human trials as early as this year. “At the end of the day we’ll be judged on the drugs we deliver,” he says.

THE NEED FOR NEW TARGETS

FINDING A MOLECULE to hit a new target is not the only major challenge in drug discovery. There is also the need to identify targets in the first place. To spot proteins that might have roles in diseases, biopharma company Berg applies AI to sift through information derived from human tissue samples. This approach aims to solve two problems that hang over most research into drug targets, according to Berg’s CEO Niven R. Narain: the efforts tend to be based on a researcher’s theory or hunch, which can bias the results and overly restrict the pool of candidates, and they often turn up targets that are correlated to the disease but do not ultimately prove causative, which means drugging them will not help.

Berg’s approach involves plugging in every piece of data that can be wrung out of a patient’s tissue samples, organ fluids and bloodwork. These extracted data include genomics, proteomics, metabolomics, lipidomics, and more—an unusually broad range to consider in a hunt for targets. Samples are taken from people with and without a particular disease and at different stages of disease progression. Living cells from the samples are exposed in the laboratory to various compounds and conditions, such as low levels of oxygen or high levels of glucose. This method produces data on corresponding changes ranging from a cell’s ability to produce energy to the rigidity of its membrane.

All the data are then run through a set of deep-learning programs that search for any differences between nondisease and disease states, with an eye to eventually focusing on proteins whose presence seems to have an impact on the disease. In some cases, those proteins become candidates as targets, at which point Berg’s software can start searching for compounds to drug those targets. What is more, because the software can discern when the target seems to cause disease in only a subset of patients, it can look for distinguishing characteristics of those patients, such as certain genes. That paves the way for a precision-medicine approach, meaning patients can be tested before they take the drug to determine whether it is likely to be effective for them.

The most exciting drug to come out of Berg’s work—and perhaps the most exciting to emerge from any drug-discovery-related AI effort to date—is a cancer drug called BPM31510. It recently completed a phase 2 trial for patients with advanced pancreatic cancer, which is extremely aggressive and difficult to treat. Phase 1 trials often do not indicate much about a drug’s potential except whether it is dangerously toxic at a given dose, but BPM31510’s phase 1 trial against other cancers provided some verification of the ability of Berg’s software to predict the roughly 20 percent of patients who were likely to respond to it, as well as those who were more likely to experience adverse reactions.

Additionally, tissue-sample analysis from the trial led Berg’s software to predict, counterintuitively, that the drug would work best against more aggressive cancers because it attacks mechanisms that play a larger role in those cancers. Should the drug gain approval, Berg might do a postmarket analysis of perhaps one out of 100 patients taking it, “so that we can keep improving how it’s used,” Narain says.

Berg is partnering with pharma giant AstraZeneca to seek targets for Parkinson’s and other neurological diseases and with Sanofi Pasteur to pursue improved flu vaccines. It is also working with the U.S. Department of Veterans Affairs and the Cleveland Clinic on targets for prostate cancer. The software has already identified mechanisms for diagnostic tests that could differentiate prostate cancer from benignly enlarged prostates, which currently is often difficult to do without surgery.

GETTING BEYOND THE HYPE

BIG PHARMA’S INTEREST in injecting these kinds of AI efforts into drug discovery can be gauged by the fact that at least 20 separate partnerships have been reported between the major companies and AI-drug-discovery tech companies. Pfizer, GlaxoSmithKline and Novartis are among the pharma companies said to have also built substantial AI expertise in-house, and it is likely that others are in the process of doing the same.

Although research executives at these companies have expressed enthusiasm for some of the early results, they are quick to admit that AI is no sure thing for the bottom line given how few new AI-aided candidates have made it to the animal-testing stage of drug development, let alone to human trials. The jury is out on whether AI will successfully make drug discovery more efficient, says Sara Kenkare-Mitra, senior vice president of development sciences at Roche subsidiary Genentech, and even if it does, “we can’t yet say whether it will be an incremental improvement or an exponential leap.” If many of the drugs that result from AI efforts make it well into human testing, this question will still not be answered fully unless the drugs progress all the way through to FDA approval.

Bristol-Myers Squibb’s Saha suggests that AI-aided drugs’ rate of entry into the market is likely to remain low for some time. That rate could pick up dramatically, however, if the processes for testing and approval were streamlined to take into account the ability of machine- and deep-learning systems to more accurately predict which drugs are highly likely to be safe and effective and which patients they are best suited for. “When regulatory agencies see the same value we see in AI, the floodgates could open,” he says. “In some cases, we might be allowed to pass over animal models and go straight to human testing once we show these drugs can hit their targets with no toxicity.” But those changes are probably many years away, he admits. He adds that it is wrong to imply that AI replaces scientists and conventional research—whereas AI supports and amplifies human efforts, it still depends on humans to generate novel biological insights, set research directions and priorities, guide and validate results, and produce needed data.

The breathless hype around AI-based drug discovery might actually be damaging, Berg’s Narain says, because overpromising could lead to disappointment and backlash. “These are early days, and we need to be sober about the fact that these are tools that can help—they’re not solutions yet,” he says. Cyclica’s Kurji points the finger at AI companies that make what he says are exaggerated marketing claims, such as having reduced the many years and billions of dollars it takes to develop a drug to a few weeks and a few hundred thousand dollars. “It’s simply not true,” he says. “And it’s irresponsible and destructive to say so.”

But if hype hurts, Kurji insists he also knows what will give the AI-drug-discovery industry a big boost: more high-quality information to feed the various programs. “We rely on three things: data, data and more data,” he says. That sentiment is echoed by Enoch Huang, vice president of medicinal sciences at Pfizer, who says that having the right algorithm isn’t the most important factor.

The need to feed AI software with large volumes of relevant data is actually starting to change science, as researchers run more experiments specifically with the production of AI-relevant data in mind. Genentech’s Kenkare-Mitra notes that this has already happened in immunotherapy drug research. “There aren’t always enough data from the clinic to use with machine learning,” she says. “But we can [often] generate that data in vitro and feed them to the system.”

That kind of approach could lead to a virtuous cycle in drug discovery in which AI helps elucidate areas where researchers need to look for targets and drugs. Moreover, the resulting research provides larger, more relevant data sets that allow the software to point to even more fertile research avenues. “It’s not so much AI we believe in,” Kenkare-Mitra says, “as a human-AI partnership.”
