
      Large scale comparison of QSAR and conformal prediction methods and their applications in drug discovery

      research-article


          Abstract

          Structure–activity relationship modelling is frequently used in the early stage of drug discovery to assess the activity of a compound on one or several targets, and can also be used to assess the interaction of compounds with liability targets. QSAR models have been used for these and related applications over many years, with good success. Conformal prediction is a relatively new QSAR approach that provides information on the certainty of a prediction, and so helps in decision-making. However, it is not always clear how best to make use of this additional information. In this article, we describe a case study that directly compares conformal prediction with traditional QSAR methods for large-scale predictions of target-ligand binding. The ChEMBL database was used to extract a data set comprising data from 550 human protein targets with different bioactivity profiles. For each target, a QSAR model and a conformal predictor were trained and their results compared. The models were then evaluated on new data published since the original models were built to simulate a “real world” application. The comparative study highlights the similarities between the two techniques but also some differences that it is important to bear in mind when the methods are used in practical drug discovery applications.
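To make concrete how a conformal predictor attaches a certainty to each prediction, here is a minimal sketch of a class-conditional (Mondrian) inductive conformal predictor, the variant named in the article keywords. It is not the authors' implementation: the function names, the choice of nonconformity score (one minus the predicted probability of a label), and the toy calibration values are all assumptions made for illustration.

```python
from collections import defaultdict


def mondrian_p_values(cal_scores, cal_labels, test_scores_by_label):
    """Class-conditional (Mondrian) p-values for one test compound.

    cal_scores: nonconformity scores of the calibration compounds
    cal_labels: their true classes (e.g. "active" / "inactive")
    test_scores_by_label: nonconformity of the test compound under each
        candidate label (assumed here: 1 - predicted probability)
    """
    by_class = defaultdict(list)
    for score, label in zip(cal_scores, cal_labels):
        by_class[label].append(score)

    p_values = {}
    for label, alpha in test_scores_by_label.items():
        scores = by_class[label]
        # Count calibration compounds of this class that are at least as
        # nonconforming as the test compound would be under this label.
        n_ge = sum(1 for s in scores if s >= alpha)
        p_values[label] = (n_ge + 1) / (len(scores) + 1)
    return p_values


def prediction_set(p_values, significance):
    """Labels whose p-value exceeds the chosen significance level."""
    return {label for label, p in p_values.items() if p > significance}


# Hypothetical calibration data: four compounds per class.
cal_scores = [0.1, 0.2, 0.3, 0.6, 0.05, 0.15, 0.4, 0.7]
cal_labels = ["active"] * 4 + ["inactive"] * 4

# Hypothetical test compound with predicted P(active) = 0.8.
p = mondrian_p_values(
    cal_scores, cal_labels, {"active": 0.2, "inactive": 0.8}
)
print(p)
print(prediction_set(p, 0.2), prediction_set(p, 0.1), prediction_set(p, 0.85))
```

Unlike a traditional QSAR classifier, which always returns a single label, the prediction set depends on the significance level chosen by the user: it can contain one label, both labels, or none. That is the additional information the abstract refers to.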

          Electronic supplementary material

          The online version of this article (10.1186/s13321-018-0325-4) contains supplementary material, which is available to authorized users.

          Most cited references (43)


          Extended-connectivity fingerprints.

          Extended-connectivity fingerprints (ECFPs) are a novel class of topological fingerprints for molecular characterization. Historically, topological fingerprints were developed for substructure and similarity searching. ECFPs were developed specifically for structure-activity modeling. ECFPs are circular fingerprints with a number of useful qualities: they can be very rapidly calculated; they are not predefined and can represent an essentially infinite number of different molecular features (including stereochemical information); their features represent the presence of particular substructures, allowing easier interpretation of analysis results; and the ECFP algorithm can be tailored to generate different types of circular fingerprints, optimized for different uses. While the use of ECFPs has been widely adopted and validated, a description of their implementation has not previously been presented in the literature.

            ChEMBL: towards direct deposition of bioassay data

            ChEMBL is a large, open-access bioactivity database (https://www.ebi.ac.uk/chembl), previously described in the 2012, 2014 and 2017 Nucleic Acids Research Database Issues. In the last two years, several important improvements have been made to the database and are described here. These include more robust capture and representation of assay details; a new data deposition system, allowing updating of data sets and deposition of supplementary data; and a completely redesigned web interface, with enhanced search and filtering capabilities.

              The ChEMBL database in 2017

              ChEMBL is an open large-scale bioactivity database (https://www.ebi.ac.uk/chembl), previously described in the 2012 and 2014 Nucleic Acids Research Database Issues. Since then, alongside the continued extraction of data from the medicinal chemistry literature, new sources of bioactivity data have also been added to the database. These include: deposited data sets from neglected disease screening; crop protection data; drug metabolism and disposition data and bioactivity data from patents. A number of improvements and new features have also been incorporated. These include the annotation of assays and targets using ontologies, the inclusion of targets and indications for clinical candidates, addition of metabolic pathways for drugs and calculation of structural alerts. The ChEMBL data can be accessed via a web-interface, RDF distribution, data downloads and RESTful web-services.

                Author and article information

                Contributors
                nbosc@ebi.ac.uk
                Journal
                J Cheminform
                Journal of Cheminformatics
                Springer International Publishing (Cham)
                1758-2946
                10 January 2019
                2019; 11: 4
                Affiliations
                ISNI 0000 0000 9709 7726, GRID grid.225360.0, Chemogenomics Team, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
                Author information
                http://orcid.org/0000-0003-3562-1328
                Article
                325
                10.1186/s13321-018-0325-4
                6690068
                30631996
                dbe1b7c8-7156-4693-8b2c-2d96ee726886
                © The Author(s) 2019

                Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

                History
                : 14 September 2018
                : 24 December 2018
                Funding
                Funded by: FundRef http://dx.doi.org/10.13039/100011272, FP7 Health;
                Award ID: 602156
                Funded by: H2020 Research and Innovation
                Award ID: 654248
                Funded by: FundRef http://dx.doi.org/10.13039/100004440, Wellcome Trust;
                Award ID: WT104104/Z/14/Z
                Funded by: FundRef http://dx.doi.org/10.13039/100013060, European Molecular Biology Laboratory;
                Categories
                Research Article

                Chemoinformatics
                qsar, mondrian conformal prediction, chembl, classification models, cheminformatics
