
      QBMG: quasi-biogenic molecule generator with deep recurrent neural network

      research-article


          Abstract

          Biogenic compounds are important materials for drug discovery and chemical biology. In this work, we report a quasi-biogenic molecule generator (QBMG) that composes virtual quasi-biogenic compound libraries using recurrent neural networks built from gated recurrent units (GRUs). The generated library includes stereochemical information, a crucial feature of natural products. QBMG can reproduce the property distribution of the underlying training set while also generating realistic, novel molecules outside the training set. Furthermore, these compounds are associated with known bioactivities. By combining this approach with transfer learning, a focused compound library based on a given chemotype or scaffold can also be generated. The approach can be used to generate virtual compound libraries for pharmaceutical lead identification and optimization.
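The abstract describes generating molecules with a GRU-based recurrent neural network and checking that outputs are novel relative to the training set. The pure-Python sketch below illustrates only those generic mechanics, under loudly stated assumptions: a hypothetical toy SMILES-like character vocabulary, untrained random weights (no training or transfer learning is shown), GRU biases omitted, and novelty measured by exact string comparison. The names `GRUCell`, `sample`, and `novelty` are hypothetical and are not taken from the QBMG code.

```python
import math
import random

# Hypothetical toy vocabulary of SMILES-like characters (not the QBMG vocabulary).
# "^" is a start token and "$" an end token.
VOCAB = ["^", "$", "C", "c", "N", "O", "(", ")", "=", "1", "@"]
START, END = 0, 1


def rand_matrix(rows, cols, rng):
    """Random weights in [-0.5, 0.5]; a trained model would learn these."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(logits):
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


class GRUCell:
    """One GRU step (biases omitted for brevity): update gate z, reset gate r,
    candidate state h_tilde, and h_new = (1 - z) * h + z * h_tilde."""

    def __init__(self, n_in, n_hid, rng):
        self.wz, self.uz = rand_matrix(n_hid, n_in, rng), rand_matrix(n_hid, n_hid, rng)
        self.wr, self.ur = rand_matrix(n_hid, n_in, rng), rand_matrix(n_hid, n_hid, rng)
        self.wh, self.uh = rand_matrix(n_hid, n_in, rng), rand_matrix(n_hid, n_hid, rng)

    def step(self, x, h):
        z = [sigmoid(a + b) for a, b in zip(matvec(self.wz, x), matvec(self.uz, h))]
        r = [sigmoid(a + b) for a, b in zip(matvec(self.wr, x), matvec(self.ur, h))]
        rh = [ri * hi for ri, hi in zip(r, h)]
        h_tilde = [math.tanh(a + b) for a, b in zip(matvec(self.wh, x), matvec(self.uh, rh))]
        return [(1 - zi) * hi + zi * ti for zi, hi, ti in zip(z, h, h_tilde)]


def sample(cell, w_out, n_hid, rng, max_len=40, temperature=1.0):
    """Autoregressively sample one string, feeding each sampled token back in."""
    h = [0.0] * n_hid
    token, out = START, []
    for _ in range(max_len):
        x = [1.0 if i == token else 0.0 for i in range(len(VOCAB))]  # one-hot input
        h = cell.step(x, h)
        probs = softmax([v / temperature for v in matvec(w_out, h)])
        probs[START] = 0.0  # never emit the start token mid-sequence
        token = rng.choices(range(len(VOCAB)), weights=probs)[0]
        if token == END:
            break
        out.append(VOCAB[token])
    return "".join(out)


def novelty(generated, training):
    """Fraction of unique, non-empty generated strings absent from the training set.
    Assumes strings are already canonical; real pipelines canonicalize SMILES first."""
    unique = {s for s in generated if s}
    if not unique:
        return 0.0
    train = set(training)
    return sum(1 for s in unique if s not in train) / len(unique)


if __name__ == "__main__":
    rng = random.Random(0)
    n_hid = 16
    cell = GRUCell(len(VOCAB), n_hid, rng)
    w_out = rand_matrix(len(VOCAB), n_hid, rng)
    samples = [sample(cell, w_out, n_hid, rng) for _ in range(5)]
    print(samples)  # untrained weights: these strings are not valid molecules
    print(novelty(samples, ["CCO", "c1ccccc1"]))
```

With trained weights, transfer learning would typically mean continuing to train the same parameters on a smaller, scaffold-focused set before sampling. A realistic pipeline would also parse, validate, and canonicalize the generated SMILES with a cheminformatics toolkit before computing novelty.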

          Electronic supplementary material

          The online version of this article (10.1186/s13321-019-0328-9) contains supplementary material, which is available to authorized users.


                Author and article information

                Contributors
                yanxin_0736@hotmail.com
                junxu@biochemomes.com
                Journal
                J Cheminform
                Journal of Cheminformatics
                Springer International Publishing (Cham)
                ISSN: 1758-2946
                17 January 2019
                2019
                Volume: 11
                Article number: 5
                Affiliations
                [1] ISNI 0000 0001 2360 039X, GRID grid.12981.33, Research Center for Drug Discovery, School of Pharmaceutical Sciences, Sun Yat-Sen University, 132 East Circle at University City, Guangzhou, 510006 China
                [2] ISNI 0000 0001 2375 7370, GRID grid.500400.1, School of Computer Science and Technology, Wuyi University, 99 Yingbin Road, Jiangmen, 529020 China
                [3] ISNI 0000 0001 2360 039X, GRID grid.12981.33, National Supercomputer Center in Guangzhou and School of Data and Computer Science, Sun Yat-Sen University, 132 East Circle at University City, Guangzhou, 510006 China
                Author information
                http://orcid.org/0000-0002-1075-0337
                Article
                328
                10.1186/s13321-019-0328-9
                6689867
                30656426
                42de6975-e903-4fb8-a05b-a40c29802189
                © The Author(s) 2019

                Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

                History
                Received: 2 July 2018
                Accepted: 9 January 2019
                Funding
                Funded by: National Science & Technology Major Project of the Ministry of Science and Technology of China
                Award ID: 2018ZX09735010
                Funded by: GD Frontier & Key Techn. Innovation Program
                Award ID: 2015B010109004
                Funded by: GD-NSF
                Award ID: 2016A030310228
                Funded by: National Science Foundation of China
                Award ID: U1611261
                Award ID: 81473138
                Funded by: National Natural Science Foundation of China (FundRef http://dx.doi.org/10.13039/501100001809)
                Award ID: U1611261
                Funded by: Guangdong Introducing Innovative and Entrepreneurial Teams
                Award ID: 2016ZT06D211
                Categories
                Research Article

                Chemoinformatics
                deep learning, recurrent neural networks, natural product, virtual library
