      Is Open Access

      67 million natural product-like compound database generated via molecular language processing

      data-paper


          Abstract

          Natural products are a rich resource of bioactive compounds for valuable applications across multiple fields such as food, agriculture, and medicine. For natural product discovery, high throughput in silico screening offers a cost-effective alternative to traditional resource-heavy assay-guided exploration of structurally novel chemical space. In this data descriptor, we report a characterized database of 67,064,204 natural product-like molecules generated using a recurrent neural network trained on known natural products, demonstrating a significant 165-fold expansion in library size over the approximately 400,000 known natural products. This study highlights the potential of using deep generative models to explore novel natural product chemical space for high throughput in silico discovery.
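The figures in the abstract can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers quoted above; the exact size of the reference set of known natural products is not stated here (the abstract says only "approximately 400,000"), so the baseline it prints is an inferred value, not a reported one. The function names are illustrative.

```python
# Figures quoted in the abstract.
GENERATED = 67_064_204   # natural product-like molecules in the database
FOLD_EXPANSION = 165     # reported expansion over known natural products
APPROX_KNOWN = 400_000   # rounded size of the known natural product set


def implied_baseline(generated: int, fold: float) -> float:
    """Size of the reference set implied by a reported fold expansion."""
    return generated / fold


def fold_expansion(generated: int, known: int) -> float:
    """Fold expansion of a generated library over a reference set."""
    return generated / known


# Inferred, not reported: the reference set size consistent with "165-fold".
baseline = implied_baseline(GENERATED, FOLD_EXPANSION)

# Fold expansion if the rounded 400,000 figure is taken literally.
rounded_fold = fold_expansion(GENERATED, APPROX_KNOWN)

print(f"Implied baseline: {baseline:,.0f} molecules")
print(f"Fold expansion over exactly 400,000: {rounded_fold:.1f}")
```

Under these assumptions the implied baseline is a little above 400,000 molecules, which is consistent with the abstract's "approximately 400,000", while dividing by the rounded figure gives a slightly larger ratio than 165.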


                Author and article information

                Contributors
                dillon_tay@isce2.a-star.edu.sg
                ang_shi_jun@ihpc.a-star.edu.sg
                Journal
                Scientific Data (Sci Data)
                Publisher: Nature Publishing Group UK (London)
                ISSN: 2052-4463
                Published: 19 May 2023
                Volume: 10
                Article number: 296
                Affiliations
                [1] Institute of Sustainability for Chemicals, Energy and Environment (ISCE2), Agency for Science, Technology and Research (A*STAR), 8 Biomedical Grove, #07-01 Neuros Building, Singapore 138665, Republic of Singapore (GRID grid.185448.4; ISNI 0000 0004 0637 0221)
                [2] Hwa Chong Institution, 661 Bukit Timah Road, Singapore 269734, Republic of Singapore (GRID grid.512261.3; ISNI 0000 0004 0637 0440)
                [3] National Junior College, 37 Hillcrest Road, Singapore 288913, Republic of Singapore
                [4] Synthetic Biology Translational Research Program, Yong Loo Lin School of Medicine, National University of Singapore, 10 Medical Drive, Singapore 117597, Republic of Singapore (GRID grid.4280.e; ISNI 0000 0001 2180 6431)
                [5] Institute of High Performance Computing (IHPC), Agency for Science, Technology and Research (A*STAR), 1 Fusionopolis Way, #16-16 Connexis, Singapore 138632, Republic of Singapore (GRID grid.418742.c; ISNI 0000 0004 0470 8006)
                Author information
                http://orcid.org/0000-0002-4831-5525
                http://orcid.org/0000-0002-3962-2080
                Article
                Publisher article ID: 2207
                DOI: 10.1038/s41597-023-02207-x
                PMCID: PMC10199072
                PMID: 37208372
                187edb13-b27f-4acc-b33c-4f428bf8f2b2
                © The Author(s) 2023

                Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

                History
                Received: 27 March 2023
                Accepted: 21 April 2023
                Funding
                Funded by: FundRef https://doi.org/10.13039/501100001348, Agency for Science, Technology and Research (A*STAR);
                Award ID: #21719
                Categories
                Data Descriptor
                Custom metadata
                © Springer Nature Limited 2023

                Keywords: cheminformatics, combinatorial libraries, sustainability
