Molecular de novo Design Through Deep Generative Models
| Main Authors: | Engkvist, Ola; Arús-Pous, Josep; Bjerrum, Esben Jannik; Chen, Hongming |
|---|---|
| Format: | Book publication-section Journal |
| Published: | 2020 |
| Subjects: | |
| Online Access: | https://zenodo.org/record/3628194 |
Table of Contents:
- Machine learning (ML) and artificial intelligence (AI) have had a renaissance during the last few years and have become a hot topic not only in drug discovery but in society as a whole. There are many reasons for the comeback: access to larger volumes of data through automation, faster computers (e.g. GPUs) and methodological progress within deep learning. Drug discovery has also benefited from these trends and, as shown in this book, ML and AI are becoming much more prominent [1]. Besides impacting areas that have been using ML for many years, such as QSAR modelling, completely new areas have opened up with deep learning (DL). One is synthesis prediction, where rule-based methods have been replaced by ML methods [2,3]. Most importantly, DL has transformed molecular de novo generation, which is discussed in this chapter. The goal of DL-based de novo molecular design is to be able to sample the whole chemical space. Estimates of the size of the chemical space vary wildly, but the most common estimate is that it consists of 10^60 molecules [4]. Irrespective of how large the chemical space is, everyone agrees that it is too large to be explicitly enumerated. Historically, molecular de novo design has been done in several ways [5]. Most commonly, when structure-based or ligand-based constraints are given, molecules are generated in silico to fulfil them. This can be done by brute force: fully enumerating a virtual library and scoring each compound on how well it fulfils the constraints. The best-scoring molecules are then prioritized for synthesis. These virtual libraries are mainly constructed from in-house or commercially available building blocks and reactions that are assumed to be robust. There have also been efforts to search non-enumerable libraries with search techniques such as genetic algorithms.
While many successes using these techniques have been reported in the literature, it is also possible to point out drawbacks [6]. The main difference from the deep learning approaches described in this chapter is that these earlier techniques lack prior knowledge of what a drug-like molecule should look like. This concept is present neither in combinatorial enumeration of libraries nor in genetic-algorithm-type approaches. DL-based molecular de novo generation has recently been reviewed extensively [7]. The prospective user needs to make several choices on how to generate molecules. The first is to decide whether the generation will be string-based or graph-based. Another is to decide which architecture to use. A main advantage of DL is that a huge array of architectures can be used, for example recurrent neural networks (RNN), variational auto-encoders (VAE) or generative adversarial networks (GAN). In this chapter both string-based and graph-based methods are reviewed, and the different DL architectures are discussed. With the explosion of articles describing DL-based molecular de novo generation in the last 2-3 years, there has been an increased awareness that benchmarks are needed to measure the diversity of generated molecules and their coverage of the chemical space. That is why a large part of the chapter focuses on the latest developments in benchmarking. It is important that benchmarks cover both explorative aspects, which correspond to identifying a new chemical series, and exploitative aspects, which correspond to optimizing a chemical series.
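The brute-force approach mentioned above (enumerate a virtual library, score every compound against the constraints, prioritize the best) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the chapter: the building-block fragments, the string concatenation standing in for a robust reaction, and the toy constraint (a target heavy-atom count). A real workflow would use a cheminformatics toolkit and a meaningful scoring function.

```python
from itertools import product

# Hypothetical building blocks written as SMILES-like fragments.
# Joining an amine fragment (ending in N) to an acyl fragment
# (starting with C(=O)) stands in for a robust amide coupling.
AMINES = ["N", "CN", "CCN"]
ACYLS = ["C(=O)C", "C(=O)CC", "C(=O)c1ccccc1"]


def enumerate_library(amines, acyls):
    """Fully enumerate every amine/acyl combination."""
    return [amine + acyl for amine, acyl in product(amines, acyls)]


def heavy_atom_count(smiles):
    """Count heavy atoms; valid only for single-letter element symbols."""
    return sum(1 for ch in smiles if ch.isalpha())


def score(smiles, target_heavy_atoms=6):
    """Toy constraint: the closer to the target size, the higher the score."""
    return -abs(heavy_atom_count(smiles) - target_heavy_atoms)


# Enumerate, score every compound, and keep the best-scoring ones.
library = enumerate_library(AMINES, ACYLS)
top = sorted(library, key=score, reverse=True)[:2]

for smiles in top:
    print(smiles, score(smiles))
```

The library size is the product of the building-block counts, which is why this strategy stops being feasible long before it approaches the size of chemical space discussed above.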