      Low Data Drug Discovery with One-Shot Learning

      research-article

          Abstract

          Recent advances in machine learning have made significant contributions to drug discovery. Deep neural networks in particular have been demonstrated to provide significant boosts in predictive power when inferring the properties and activities of small-molecule compounds (Ma, J. et al. J. Chem. Inf. Model. 2015, 55, 263–274). However, the applicability of these techniques has been limited by the requirement for large amounts of training data. In this work, we demonstrate how one-shot learning can be used to significantly lower the amount of data required to make meaningful predictions in drug discovery applications. We introduce a new architecture, the iterative refinement long short-term memory, that, when combined with graph convolutional neural networks, significantly improves learning of meaningful distance metrics over small molecules. We open source all models introduced in this work as part of DeepChem, an open-source framework for deep learning in drug discovery (Ramsundar, B. deepchem.io. https://github.com/deepchem/deepchem, 2016).

          Abstract

          We demonstrate how one-shot learning can lower the amount of data required to make meaningful predictions in drug discovery. Our architecture, the iterative refinement long short-term memory, permits the learning of meaningful distance metrics on small-molecule space.
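
          The one-shot setting described in the abstract can be illustrated with a minimal sketch: a query molecule is labeled by comparing its embedding with a small support set of labeled molecules under a distance metric. The code below is an illustration only, not the iterative refinement long short-term memory or the DeepChem implementation. It assumes cosine similarity as the metric and softmax attention over the support set, and the function names (`cosine_similarity`, `softmax`, `one_shot_predict`) and the two-dimensional toy embeddings are hypothetical; in the article the embeddings would be learned by graph convolutional networks.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def softmax(scores):
    """Numerically stable softmax: turns similarity scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def one_shot_predict(query, support_embeddings, support_labels):
    """Estimate the probability that the query is active (label 1).

    The estimate is a similarity-weighted average of the support-set labels.
    """
    sims = [cosine_similarity(query, s) for s in support_embeddings]
    weights = softmax(sims)
    return sum(w * y for w, y in zip(weights, support_labels))


# Toy data (assumed, hand-written): two active compounds and one inactive compound.
support_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
support_labels = [1, 1, 0]

# The query lies close to the two actives, so most of the weight falls on label 1.
query = [0.8, 0.2]
p_active = one_shot_predict(query, support_embeddings, support_labels)
print(f"estimated probability of activity: {p_active:.3f}")
```

          With only three labeled compounds the prediction depends entirely on how well the embedding places similar molecules near each other, which is why the quality of the learned distance metric matters in the low-data regime.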

          Most cited references (13)

          New types of deep neural network learning for speech recognition and related applications: an overview

            Maximum unbiased validation (MUV) data sets for virtual screening based on PubChem bioactivity data.

            Refined nearest neighbor analysis was recently introduced for the analysis of virtual screening benchmark data sets. It constitutes a technique from the field of spatial statistics and provides a mathematical framework for the nonparametric analysis of mapped point patterns. Here, refined nearest neighbor analysis is used to design benchmark data sets for virtual screening based on PubChem bioactivity data. A workflow is devised that purges data sets of compounds active against pharmaceutically relevant targets from unselective hits. Topological optimization using experimental design strategies monitored by refined nearest neighbor analysis functions is applied to generate corresponding data sets of actives and decoys that are unbiased with regard to analogue bias and artificial enrichment. These data sets provide a tool for Maximum Unbiased Validation (MUV) of virtual screening methods. The data sets and a software package implementing the MUV design workflow are freely available at http://www.pharmchem.tu-bs.de/lehre/baumann/MUV.html.

              Molecular Graph Convolutions: Moving Beyond Fingerprints

              Molecular "fingerprints" encoding structural information are the workhorse of cheminformatics and machine learning in drug discovery applications. However, fingerprint representations necessarily emphasize particular aspects of the molecular structure while ignoring others, rather than allowing the model to make data-driven decisions. We describe molecular "graph convolutions", a machine learning architecture for learning from undirected graphs, specifically small molecules. Graph convolutions use a simple encoding of the molecular graph---atoms, bonds, distances, etc.---which allows the model to take greater advantage of information in the graph structure. Although graph convolutions do not outperform all fingerprint-based methods, they (along with other graph-based methods) represent a new paradigm in ligand-based virtual screening with exciting opportunities for future improvement.

                Author and article information

                Journal: ACS Central Science (ACS Cent Sci)
                Publisher: American Chemical Society
                ISSN: 2374-7943, 2374-7951
                Publication dates: 03 April 2017, 26 April 2017
                Volume: 3
                Issue: 4
                Pages: 283-293
                Affiliations
                †Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139-4307, United States
                ‡Department of Computer Science and §Department of Chemistry, Stanford University, Stanford, California 94305, United States
                Article
                DOI: 10.1021/acscentsci.6b00367
                PMCID: PMC5408335
                PMID: 28470045
                Copyright © 2017 American Chemical Society

                This is an open access article published under an ACS AuthorChoice License, which permits copying and redistribution of the article or any adaptations for non-commercial purposes.

                History
                30 November 2016
                Categories
                Research Article