There is an enormous chemical universe of small-molecule compounds—an estimated 10 to the 33rd power (or a billion trillion trillion) molecules—that can be manufactured using organic chemistry. Trillions of these molecules could lead to therapeutic advances. But due to the size of this chemical universe, brute-force exploration of its contents is doomed to failure because synthesis is time consuming and resource intensive. Another strategy is needed.
The goal of modern drug discovery is to develop drugs that are safely administered with minimal side effects while strongly binding to disease-associated proteins in our bodies. And yet, despite a growing list of validated target proteins in the wake of the Human Genome Project, the rate of new FDA approvals for novel small-molecule drugs has stagnated over the past 15 years.
While multiple factors are at play in this stagnation, one stands out: lack of exploration of the chemical universe. This lack of exploration limits the size and diversity of pharmaceutical companies' compound collections so severely that the number of chemically distinct, synthesized, testable compounds is estimated to be fewer than ten million, a tiny portion of the chemical universe.
Drug discovery suffers from a bottleneck. Currently, a compound must be physically synthesized before it can be tested in the lab. The only way to tap the uncharted chemical universe is to use methods that bypass initial physical synthesis when evaluating a compound.
Enter computation. The central idea of modern computational drug discovery is the accurate prediction of the binding of drug-like molecules to a disease-causing target protein. Computation can then peer deep into the chemical universe without the need for prior synthesis. Binding a small-molecule drug to a protein, however, is a fiendishly complicated modeling problem, involving complex interactions of the many atoms of the protein and a drug-like small molecule surrounded by water. Without a solution to this problem there is no way to find the undiscovered compounds that could lead to future ground-breaking treatments.
In computational drug discovery, AI-based methods have a data problem: there is not enough of it. Existing experimental data is not remotely up to the task of training AI, because it can come only from the small number of already-synthesized compounds, which are not representative of the many unknown compounds that lie outside our current, local base of chemical knowledge. The result is that AI tends to predict drug molecules that are similar to ones that have already been explored, preventing novel discoveries.
Meanwhile, physics-based molecular modeling faces a complexity problem. A full quantum-level description of the binding between a protein and a small molecule is practically impossible, because the complexity grows exponentially as the number of atoms increases. Approximations can make physics-based molecular modeling more computationally efficient, but at the cost of the reduced accuracy that has bedeviled the drug industry for years.
Fortunately, recent improvements in computational power and innovative breakthroughs in molecular modeling show promise. Still, advancements in both physics-based molecular modeling and AI-based methods are sorely needed for new breakthroughs. There is growing consensus that a combination of the two will prove to be the solution, with the strengths of one making up for the weaknesses of the other.
It is imperative that a computational solution be found so that we can develop the small-molecule treatments of tomorrow. Otherwise, we will forever be confined to the small portion of chemicals in the neighborhood of what we know now, unable to explore the rest of the vast chemical universe that teems with possibilities.
Mr. Kita is CSO of Verseon.