      Is Open Access

      GEOM, energy-annotated molecular conformations for property prediction and molecular generation

      data-paper
      Scientific Data
      Nature Publishing Group UK
      Computational chemistry, Quantum chemistry


          Abstract

          Machine learning (ML) outperforms traditional approaches in many molecular design tasks. ML models usually predict molecular properties from a 2D chemical graph or a single 3D structure, but neither of these representations accounts for the ensemble of 3D conformers that are accessible to a molecule. Property prediction could be improved by using conformer ensembles as input, but there is no large-scale dataset that contains graphs annotated with accurate conformers and experimental data. Here we use advanced sampling and semi-empirical density functional theory (DFT) to generate 37 million molecular conformations for over 450,000 molecules. The Geometric Ensemble Of Molecules (GEOM) dataset contains conformers for 133,000 species from QM9, and 317,000 species with experimental data related to biophysics, physiology, and physical chemistry. Ensembles of 1,511 species with BACE-1 inhibition data are also labeled with high-quality DFT free energies in an implicit water solvent, and 534 ensembles are further optimized with DFT. GEOM will assist in the development of models that predict properties from conformer ensembles, and generative models that sample 3D conformations.
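          The idea of predicting a property from a conformer ensemble rather than a single structure can be illustrated with Boltzmann weighting. The following is a minimal sketch, not the authors' method or the GEOM data format: the function names, the example energies, and the per-conformer property values are all hypothetical, and relative energies are assumed to be in kcal/mol at 298.15 K.

```python
import math

# Molar gas constant in kcal/(mol*K); used to convert energies to Boltzmann factors.
R_KCAL_PER_MOL_K = 0.0019872041


def boltzmann_weights(rel_energies_kcal, temperature=298.15):
    """Return normalized Boltzmann populations for a list of conformer energies.

    Energies are shifted by their minimum first, which leaves the weights
    unchanged but avoids overflow in the exponential.
    """
    kt = R_KCAL_PER_MOL_K * temperature
    e_min = min(rel_energies_kcal)
    factors = [math.exp(-(e - e_min) / kt) for e in rel_energies_kcal]
    total = sum(factors)
    return [f / total for f in factors]


def ensemble_average(values, weights):
    """Population-weighted average of a per-conformer property."""
    return sum(v * w for v, w in zip(values, weights))


# Hypothetical three-conformer ensemble (illustrative numbers only).
energies = [0.0, 0.5, 2.0]   # relative energies, kcal/mol (assumed)
dipoles = [1.2, 2.0, 3.1]    # per-conformer property values (assumed)

weights = boltzmann_weights(energies)
average = ensemble_average(dipoles, weights)
print(weights, average)
```

          In this sketch the lowest-energy conformer dominates but does not fully determine the average, which is the kind of information a single 3D structure or a 2D graph cannot carry.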

          Measurement(s): Conformer geometries and properties
          Technology Type(s): Computational Chemistry



                Author and article information

                Contributors
                rafagb@mit.edu
                Journal: Scientific Data (Sci Data)
                Publisher: Nature Publishing Group UK (London)
                ISSN: 2052-4463
                Published: 21 April 2022
                Volume: 9
                Article number: 185
                Affiliations
                [1] Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA 02138, USA (GRID grid.38142.3c, ISNI 0000 0004 1936 754X)
                [2] Department of Materials Science and Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA (GRID grid.116068.8, ISNI 0000 0001 2341 2786)
                Author information
                http://orcid.org/0000-0002-9495-8599
                Article
                1288
                10.1038/s41597-022-01288-4
                9023519
                35449137
                bc3098bc-ae32-42ca-93b0-d8b67d0d25bc
                © The Author(s) 2022

                Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

                History
                Received: 12 February 2021
                Accepted: 4 March 2022
                Funding
                Funded by: XSEDE COVID-19 HPC Consortium, project CHE200039
                Categories
                Data Descriptor
