Despite all the medical advances we have achieved over the last hundred years, there are still too many diseases we cannot treat, or can treat only with a long list of undesirable side effects. To state the obvious, humanity needs to find ways to accelerate the development of new medicines. Within the field of pharmaceutical medicines, small molecule drugs have been the mainstay of the modern pharmacopeia for good reason: these low-molecular-weight organic compounds can typically be administered orally and can enter cells to interrupt disease processes that involve intracellular targets. But the systematic design and development of novel small molecule drugs has proved dauntingly difficult.
High throughput screening (HTS), a collection of brute-force trial-and-error experiments on a pool of pre-synthesized compounds, remains the foundation of current small molecule drug discovery, and it fundamentally limits both the speed of discovery and our ability to explore the completely new drug-like compounds the world needs. The total number of distinct chemotypes across the world’s HTS libraries is less than 10 million. The time and resources required to synthesize novel chemotypes create a very practical synthesis bottleneck, ruling out any hope of dramatically increasing that number in a reasonable timeframe. By comparison, the total number of distinct, synthesizable, drug-like compounds is estimated to be more than 10³³.¹ Many, if not most, drugs of the future with highly desirable therapeutic profiles lie in that uncharted chemical ocean of possibilities. Current drug discovery methods are not even exploring a tide pool by the side of that ocean; they are perhaps exploring a droplet.
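To make the scale of the mismatch concrete, here is a back-of-the-envelope calculation using only the two figures cited above (purely illustrative):

```python
# Back-of-envelope comparison of HTS library coverage versus the estimated
# size of drug-like chemical space, using the figures cited in the text.
hts_chemotypes = 1e7    # distinct chemotypes across the world's HTS libraries
druglike_space = 1e33   # estimated distinct synthesizable drug-like compounds

fraction = hts_chemotypes / druglike_space
print(f"Fraction of chemical space covered by HTS libraries: {fraction:.0e}")
# prints 1e-26, i.e. one part in 10^26
```

One part in 10²⁶ is the "droplet versus ocean" comparison made quantitative.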
Many players in the pharmaceutical industry now hope to feed currently available experimental data to AI and thereby solve all the problems of small molecule drug discovery. Unfortunately, using AI in this manner does nothing to help find completely novel drugs for the diseases we cannot adequately treat today. To understand why, we need to step back and analyze the problem in a bit more depth.
The success of machine learning depends heavily on the availability and quality of large, dense data sets for training. In general, the more training data available, the better most AI models perform. The problem with small molecule drug discovery is that the space of potential small molecule drugs that can bind to disease-causing proteins is many orders of magnitude larger than what is represented in available data. And AI can only predict things that are highly similar to what is in its training set; it cannot predict a novel drug molecule from the uncharted chemical ocean if it has not been trained on anything similar. This limitation is evident in the molecules advanced to date by companies using so-called pure AI-driven methods: human chemists often create greater variations on existing drug structures than these AI models can. The models can, however, generate large numbers of molecular structures that are similar to known molecules and thereby speed up the drug optimization work done by medicinal chemists. But finding novel drugs in the uncharted ocean remains outside the reach of any AI model trained on existing data.
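To make the notion of "similarity" concrete: chemists commonly quantify how close two molecules are using the Tanimoto coefficient over molecular fingerprints. A minimal sketch, using toy fingerprints represented as sets of bit indices (the molecules and bit values here are hypothetical; real workflows use cheminformatics toolkits such as RDKit):

```python
def tanimoto(fp_a: set, fp_b: set) -> float:
    """Tanimoto similarity of two fingerprints given as sets of set-bit indices."""
    if not fp_a and not fp_b:
        return 1.0
    return len(fp_a & fp_b) / len(fp_a | fp_b)

# Toy fingerprints for three hypothetical molecules
known_drug     = {1, 4, 7, 9, 12}
close_analog   = {1, 4, 7, 9, 15}  # one substituent changed
novel_scaffold = {2, 3, 20, 31}    # shares nothing with the known drug

print(tanimoto(known_drug, close_analog))    # ~0.67: well inside charted space
print(tanimoto(known_drug, novel_scaffold))  # 0.0: the "uncharted ocean" case
```

A model trained only on molecules like `known_drug` has something to interpolate from for `close_analog`, but nothing at all for `novel_scaffold`; that is the limitation described above.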
What if we could computationally design novel synthesizable drug-like molecules and assess their binding to a target protein using molecular physics modeling?
This approach would simulate the actual physical experiment without relying on AI or training data, eliminating the need to first synthesize a novel molecule and then perform a laboratory screening assay. If a process like this really worked, its steps could be replicated billions of times to explore novel molecules in the uncharted chemical ocean. But this is easier said than done. The idea is certainly not new: people have attempted so-called virtual library screening for the last couple of decades. There are two problems: the virtual libraries are not very novel and usually mimic existing HTS libraries, and the molecular physics modeling is so woefully inaccurate that it cannot reliably replace actual laboratory screening. Simple-minded docking, scoring, and molecular dynamics simulations using current tools have fallen far short of delivering on the promise of this approach. This manner of computer-aided drug design has been hamstrung by the inability of current modeling tools to overcome the complexity of the problem.
Based on two decades of work, my colleagues and I have seen that this complexity challenge can actually be overcome. But it requires fundamental advances in modeling the interactions of a fully flexible protein and drug in water. Properly assessing the binding free energy of such interactions is fiendishly complicated, and the commercially available and open-source tools generally used across industry and academia are entirely inadequate for the task. Exploring novel chemical space also requires breakthroughs in the computational design of synthetically tractable drug-like molecules. The required advances in multiple distinct areas of science are highly non-trivial, but they open the door to exploring the uncharted chemical ocean. When promising new molecules are identified via physics modeling, they can be synthesized and put through extensive biological characterization. AI trained on data from these novel molecules can then be used for further optimization of drug candidates. Such synergistic integration of physics advances and purpose-built AI tools working on new data can deliver multiple chemically distinct candidates with uniquely desirable therapeutic profiles for every program. Finally, these candidates can be advanced through AI-driven adaptive trials that take personalization of medicines to a new level.
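To see why accuracy in binding free energy is so unforgiving: the standard thermodynamic relation Kd = exp(ΔG/RT) means that even a 1 kcal/mol modeling error shifts the predicted dissociation constant by roughly a factor of five at room temperature. A quick illustration (the specific ΔG values below are hypothetical):

```python
import math

R = 1.987e-3  # gas constant, kcal/(mol*K)
T = 298.15    # room temperature, K

def kd_from_dg(dg_kcal_per_mol: float) -> float:
    """Dissociation constant (in molar) implied by a binding free energy:
    Kd = exp(dG / RT), with dG negative for favorable binding."""
    return math.exp(dg_kcal_per_mol / (R * T))

print(kd_from_dg(-10.0))  # tight binder, roughly 5e-8 M
print(kd_from_dg(-9.0))   # a 1 kcal/mol weaker prediction
print(kd_from_dg(-9.0) / kd_from_dg(-10.0))  # ~5.4x change in predicted Kd
```

A factor-of-five error per kcal/mol is why a scoring function that is off by even a few kcal/mol cannot reliably rank candidate binders.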
Given the recent interest in the potential applicability of AI to almost every sphere of human activity, some additional words on the utility of AI in small molecule drug discovery are warranted. Much has been made recently of the use of AI in protein target identification. AI can be helpful in building so-called knowledge graphs and generating potential hypotheses for new targets in a disease pathway. Such target hypotheses must then be rigorously validated in follow-up biological experiments; trusting the hypotheses, or perhaps the hype, without thorough in vitro and in vivo validation is a formula for eventual disappointment in a drug program. While such use of AI in generating target hypotheses has utility and should serve as an aid in ongoing disease pathway analysis, we currently live in a world with an enormous number of validated targets for which we cannot find any reasonable drug candidates. Use of AI notwithstanding, biologists across the world have worked assiduously to find and validate disease targets since the early days of the genomics revolution, and for most of those targets we have no viable drugs. This brings us to the pressing need for a systematic process of exploring new chemical space for interesting drug candidates. Once a target is chosen, the physics, computational chemistry, and AI-driven process described above can do just that.
The public discourse on the use of AI in medicine also conflates many disparate applications of AI, leading to much confusion. Recent advances in protein structure prediction are often cited as a major revolution in drug discovery. For starters, protein structure prediction is an entirely different problem from discovering small molecule drugs. Evolutionary rules and homology models are critical to the improvements in protein structure prediction; no such aid exists for predicting protein–drug interactions. Further, most current drug programs start with readily available target protein structures from the Worldwide Protein Data Bank. Even in the case of many GPCR targets, high-quality structures built using homology and NMR data have been available for quite some time. The recent improvements in protein structure prediction help fill in the small percentage of cases where good structures were not yet available. The real problem of drug discovery starts after a target has been chosen.
Some recent discussions lump together the discovery of biologics and small molecule drugs. Here again the two problems are entirely different: AI can have much greater utility in the optimization of biologics, thanks to the same evolutionary rules that aid protein structure prediction.
The start of a small molecule drug discovery campaign must rely either on HTS and its associated data or on highly accurate, extremely computationally intensive physics simulations. When certain types of AI applications are layered within such simulations, they can help reduce the computing burden of characterizing the quantum behavior of such systems. But this is a very different type of AI application, one that falls well outside how most companies are trying to use AI in medicine today.
Moving forward through the steps of small molecule drug development, and given the availability of relevant data as described earlier, AI can be used to analyze structure–activity relationships and optimize drug candidates. Once more, the details matter. In most small molecule drug optimization settings we operate in a realm of “small” data, not big data; what counts as small or big depends on context. Given the number of parameters involved in a typical drug optimization problem, the amount of available data is often small and sparse. The traditional tools of big-data-driven AI and deep learning do not work well in this regime. While most current practitioners simply try to utilize existing software libraries, what is needed are AI tools purpose-built for such small-data scenarios. Building such tools from scratch is important, but falls far beyond the capabilities of most teams working in this arena.
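As a sketch of what working in the small-data regime looks like: with only a handful of compounds, leave-one-out cross-validation is about the only honest way to estimate model error, since there is no data to spare for a held-out test set. The toy example below applies it to a 1-nearest-neighbour activity predictor; every compound, descriptor value, and label here is made up for illustration:

```python
def loo_1nn(points, labels):
    """Leave-one-out error rate of a 1-nearest-neighbour classifier
    (squared Euclidean distance on descriptor vectors)."""
    errors = 0
    for i in range(len(points)):
        # Hold out compound i and predict its label from its nearest neighbour
        rest = [(p, y) for j, (p, y) in enumerate(zip(points, labels)) if j != i]
        nearest = min(rest, key=lambda py: sum((a - b) ** 2
                                               for a, b in zip(py[0], points[i])))
        errors += nearest[1] != labels[i]
    return errors / len(points)

# Hypothetical 2-descriptor vectors for six compounds, with active(1)/inactive(0) labels
X = [(1.2, 3.1), (1.3, 3.0), (4.0, 2.2), (4.1, 2.3), (0.9, 3.3), (3.8, 2.1)]
y = [1, 1, 0, 0, 1, 0]
print(loo_1nn(X, y))  # 0.0 on this cleanly separated toy set
```

With six compounds and two descriptors this works; with six compounds and hundreds of descriptors, nearly any flexible model can memorize the data, which is exactly why purpose-built small-data tools are needed.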
Perhaps the takeaway message is that one cannot gloss over the details. Any one tool is just a tool among many, and not a panacea. Small molecule drug discovery is a complex problem that requires significant advancements across diverse disciplines with many different resulting tools that must come together and work in concert.
Thoughtful integration of advances in physics, AI and many other areas of science and technology can unlock the potential of the uncharted chemical ocean and allow the discovery of small molecule treatments of tomorrow. And we’ve seen the benefits of exploring entirely novel chemical entities: our new drug candidates display unique therapeutic profiles and promise to change the standard of care for the diseases they address. A steady stream of such drugs is needed to give humanity the healthier future it deserves. We are now entering an exciting new era in drug discovery and development with immense implications for human health.
1. CReM: chemically reasonable mutations framework for structure generation. Journal of Cheminformatics. Available at: https://jcheminf.biomedcentral.com/articles/10.1186/s13321-020-00431-w. Accessed May 2, 2023.
Adityo Prakash started Verseon to change the way the world finds new medicines. He enjoys building fundamental science-based solutions to major business problems that impact society. Since the company’s inception, Adityo has guided the development of Verseon’s drug discovery platform, novel drug pipeline, and overall business strategy. Previously, he was the CEO of Pulsent Corporation and is the primary inventor of technology at the heart of all video streaming today. He is an inventor on 40 patent families and received his BS in Physics and Mathematics from Caltech.