
      Development and Evaluation of Conformal Prediction Methods for Quantitative Structure–Activity Relationship

      research-article


          Abstract

          The quantitative structure–activity relationship (QSAR) regression model is a commonly used technique for predicting the biological activities of compounds from their molecular descriptors. Besides accurate activity estimation, a measure of prediction uncertainty, such as a prediction interval, is highly desirable. Quantifying prediction uncertainty is an active research area in statistical and machine learning (ML), but its implementation for QSAR remains challenging: most ML algorithms with high predictive performance require add-on companions to estimate the uncertainty of their predictions. Conformal prediction (CP) is a promising approach because its main components are agnostic to the prediction model, and it produces valid prediction intervals under weak assumptions on the data distribution. We propose computationally efficient CP algorithms tailored to the most widely used ML models, including random forests, deep neural networks, and gradient boosting. The algorithms use a novel approach to deriving nonconformity scores from the estimates of prediction uncertainty generated by ensembles of point predictions. The validity and efficiency of the proposed algorithms are demonstrated on a diverse collection of QSAR data sets as well as in simulation studies. The provided software implementing our algorithms can be used stand-alone or easily incorporated into other ML software packages for QSAR modeling.
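
          To make the idea of deriving nonconformity scores from ensemble-based uncertainty estimates concrete, here is a minimal sketch of generic split conformal prediction with normalized scores. It is not the authors' algorithm or software: the function names, the `beta` smoothing term, the synthetic data, and the bootstrap ensemble of linear models (a stand-in for a random forest or neural network ensemble) are all assumptions made for illustration.

```python
import numpy as np

def fit_bootstrap_ensemble(X, y, n_members=20, seed=0):
    """Fit least-squares linear models on bootstrap resamples (stand-in ensemble)."""
    rng = np.random.default_rng(seed)
    Xb = np.column_stack([X, np.ones(len(X))])  # append intercept column
    coefs = []
    for _ in range(n_members):
        idx = rng.integers(0, len(X), size=len(X))  # sample rows with replacement
        w, *_ = np.linalg.lstsq(Xb[idx], y[idx], rcond=None)
        coefs.append(w)
    return np.array(coefs)  # shape: (n_members, n_features + 1)

def ensemble_predict(coefs, X):
    """Return the ensemble mean and the spread (std) of member predictions."""
    Xb = np.column_stack([X, np.ones(len(X))])
    preds = Xb @ coefs.T  # shape: (n_samples, n_members)
    return preds.mean(axis=1), preds.std(axis=1)

def conformal_quantile(scores, alpha):
    """Finite-sample quantile: the ceil((n + 1)(1 - alpha))-th smallest score."""
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:  # too few calibration points for this alpha
        return np.inf
    return np.sort(scores)[k - 1]

def conformal_intervals(coefs, X_cal, y_cal, X_new, alpha=0.1, beta=1e-8):
    """Split conformal intervals scaled by the ensemble spread.

    Nonconformity score (an assumed, commonly used form):
        |y - mean prediction| / (ensemble std + beta)
    """
    mu_cal, sd_cal = ensemble_predict(coefs, X_cal)
    scores = np.abs(y_cal - mu_cal) / (sd_cal + beta)
    q = conformal_quantile(scores, alpha)
    mu, sd = ensemble_predict(coefs, X_new)
    half_width = q * (sd + beta)
    return mu - half_width, mu + half_width

# Synthetic data standing in for descriptors (X) and activities (y).
rng = np.random.default_rng(1)
X = rng.normal(size=(600, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=1.0, size=600)

# Disjoint training, calibration, and test splits.
X_tr, y_tr = X[:300], y[:300]
X_cal, y_cal = X[300:500], y[300:500]
X_te, y_te = X[500:], y[500:]

coefs = fit_bootstrap_ensemble(X_tr, y_tr)
lo, hi = conformal_intervals(coefs, X_cal, y_cal, X_te, alpha=0.1)
coverage = np.mean((y_te >= lo) & (y_te <= hi))
print(f"empirical coverage at alpha=0.1: {coverage:.2f}")
```

          Under exchangeability of calibration and test points, intervals built this way cover the true value with probability at least 1 - alpha on average; scaling by the ensemble spread only changes how the interval width varies from compound to compound.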


                Author and article information

                Journal
                ACS Omega
                Publisher: American Chemical Society
                ISSN: 2470-1343
                27 June 2024
                09 July 2024
                Volume: 9
                Issue: 27
                Pages: 29478-29490
                Affiliations
                Early Development Statistics, Merck & Co., Inc., Rahway, New Jersey 07065, United States
                Modeling and Informatics, Merck & Co., Inc., Rahway, New Jersey 07033, United States
                Author notes
                Author information
                https://orcid.org/0000-0003-2091-3854
                Article
                DOI: 10.1021/acsomega.4c02017
                11238240
                60748485-92a0-47fb-a8ac-39f9e5170d3f
                © 2024 Merck & Co., Inc., Rahway, NJ, USA and its affiliates. Published by American Chemical Society

                Permits non-commercial access and re-use, provided that author attribution and integrity are maintained, but does not permit creation of adaptations or other derivative works (https://creativecommons.org/licenses/by-nc-nd/4.0/).

                History
                29 February 2024
                12 June 2024
                10 June 2024
                Categories
                Article
