
      Pushing the limits of solubility prediction via quality-oriented data selection

      research-article


          Summary

          Accurate prediction of the solubility of chemical substances in solvents remains a challenge. The sparsity of high-quality solubility data is recognized as the biggest hurdle in the development of robust data-driven methods for practical use. Nonetheless, the effects of the quality and quantity of data on aqueous solubility predictions have not yet been scrutinized. In this study, the roles of data set size and quality in the performance of solubility prediction models are unraveled, and the concepts of actual and observed performance are introduced. To narrow the gap between actual and observed performance, a quality-oriented data selection method is designed, which evaluates the quality of the data and extracts the most accurate part of it through statistical validation. Applying this method to the largest publicly available solubility database and using a consensus machine learning approach, a top-performing solubility prediction model is achieved.
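The summary does not spell out how the quality of a data point is evaluated. As a rough illustration only, the sketch below assumes that each compound has several replicate logS measurements and treats agreement between replicates as a stand-in for "statistical validation". The function name `select_high_quality`, the thresholds `max_sd` and `min_count`, and the example values are all hypothetical and are not taken from the article.

```python
from statistics import mean, stdev


def select_high_quality(measurements, max_sd=0.5, min_count=2):
    """Keep compounds whose replicate logS values agree within max_sd.

    Assumed criterion for illustration; not the authors' procedure.
    min_count must be at least 2 because stdev needs two data points.
    """
    selected = {}
    for compound, values in measurements.items():
        if len(values) < min_count:
            continue  # a single value gives no evidence of reproducibility
        if stdev(values) <= max_sd:
            selected[compound] = mean(values)  # use the replicate average
    return selected


# Hypothetical replicate measurements (logS); not data from the article.
measurements = {
    "compound_A": [-2.10, -2.05, -2.20],  # replicates agree closely
    "compound_B": [-4.00, -1.50],         # replicates disagree strongly
    "compound_C": [-3.30],                # only one measurement
}
curated = select_high_quality(measurements)
```

With these example values only `compound_A` survives the filter. The trade-off is the one the summary describes: a stricter threshold yields cleaner but smaller training and test sets.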


          Highlights

          • Consensus machine learning models perform better than singular models

          • Quality-oriented data selection yields better results than using all data

          • The uncertainty of test data determines the theoretical limit of a model's performance

          • The concepts of actual and observed performances of solubility models are introduced
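One way to read the third and fourth highlights is that a model's error measured against noisy test labels (observed performance) cannot fall below the noise in those labels, even if its error against the true values (actual performance) is zero. The simulation below is a minimal sketch of that reading under stated assumptions: Gaussian label noise with a hypothetical standard deviation `noise_sd`, synthetic logS values, and an idealized model that reproduces the true values exactly. None of the numbers come from the article.

```python
import math
import random


def rmse(predicted, reference):
    """Root-mean-square error between two equally long sequences."""
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))


rng = random.Random(0)  # fixed seed so the run is reproducible
noise_sd = 0.5          # assumed uncertainty of the test labels (log units)

# Synthetic "true" logS values and their noisy experimental counterparts.
true_logs = [rng.uniform(-8.0, 1.0) for _ in range(20000)]
measured_logs = [t + rng.gauss(0.0, noise_sd) for t in true_logs]

# Idealized model: predicts the true value for every compound.
predictions = list(true_logs)

actual_rmse = rmse(predictions, true_logs)        # error against the truth
observed_rmse = rmse(predictions, measured_logs)  # error against noisy labels

print(f"actual RMSE:   {actual_rmse:.3f}")
print(f"observed RMSE: {observed_rmse:.3f}")
```

In this setup the actual RMSE is zero, while the observed RMSE settles near `noise_sd`. Reducing the uncertainty of the test data is therefore the only way to lower that floor.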

          Keywords

          Chemistry; Analytical Reagents; Computational Chemistry; Artificial Intelligence


                Author and article information

                Journal
                iScience
                Elsevier
                ISSN 2589-0042
                17 December 2020
                22 January 2021
                Volume 24, Issue 1, Article 101961
                Affiliations
                [1] DIFFER - Dutch Institute for Fundamental Energy Research, De Zaale 20, 5612 AJ Eindhoven, the Netherlands
                [2] CCER - Center for Computational Energy Research, De Zaale 20, 5612 AJ Eindhoven, the Netherlands
                [3] Department of Applied Physics, Eindhoven University of Technology, 5600 MB Eindhoven, the Netherlands
                Author notes
                Corresponding author: s.er@differ.nl
                [4] Lead contact

                Article
                PII: S2589-0042(20)31158-5
                Article number: 101961
                DOI: 10.1016/j.isci.2020.101961
                PMCID: PMC7788089
                PMID: 33437941
                3327adf2-c32d-4dc9-bd3c-af0d10780ca1
                © 2020 The Authors

                This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

                History
                16 October 2020
                18 November 2020
                15 December 2020
                Categories
                Article
