35
views
0
recommends
+1 Recommend
0 collections
    0
    shares
      • Record: found
      • Abstract: not found
      • Article: not found

      Similarity methods in chemoinformatics

      Annual Review of Information Science and Technology
      Wiley-Blackwell

      Read this article at

      ScienceOpenPublisher
      Bookmark
          There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

          Related collections

          Most cited references276

          • Record: found
          • Abstract: found
          • Article: not found

          Random forest: a classification and regression tool for compound classification and QSAR modeling.

          A new classification and regression tool, Random Forest, is introduced and investigated for predicting a compound's quantitative or categorical biological activity based on a quantitative description of the compound's molecular structure. Random Forest is an ensemble of unpruned classification or regression trees created by using bootstrap samples of the training data and random feature selection in tree induction. Prediction is made by aggregating (majority vote or averaging) the predictions of the ensemble. We built predictive models for six cheminformatics data sets. Our analysis demonstrates that Random Forest is a powerful tool capable of delivering performance that is among the most accurate methods to date. We also present three additional features of Random Forest: built-in performance assessment, a measure of relative importance of descriptors, and a measure of compound similarity that is weighted by the relative importance of descriptors. It is the combination of relatively high prediction accuracy and its collection of desired features that makes Random Forest uniquely suited for modeling in cheminformatics.
            Bookmark
            • Record: found
            • Abstract: not found
            • Article: not found

            The problem of overfitting.

              Bookmark
              • Record: found
              • Abstract: found
              • Article: not found

              A fast flexible docking method using an incremental construction algorithm.

              We present an automatic method for docking organic ligands into protein binding sites. The method can be used in the design process of specific protein ligands. It combines an appropriate model of the physico-chemical properties of the docked molecules with efficient methods for sampling the conformational space of the ligand. If the ligand is flexible, it can adopt a large variety of different conformations. Each such minimum in conformational space presents a potential candidate for the conformation of the ligand in the complexed state. Our docking method samples the conformation space of the ligand on the basis of a discrete model and uses a tree-search technique for placing the ligand incrementally into the active site. For placing the first fragment of the ligand into the protein, we use hashing techniques adapted from computer vision. The incremental construction algorithm is based on a greedy strategy combined with efficient methods for overlap detection and for the search of new interactions. We present results on 19 complexes of which the binding geometry has been crystallographically determined. All considered ligands are docked in at most three minutes on a current workstation. The experimentally observed binding mode of the ligand is reproduced with 0.5 to 1.2 A rms deviation. It is almost always found among the highest-ranking conformations computed.
                Bookmark

                Author and article information

                Journal
                Annual Review of Information Science and Technology
                Ann. Rev. Info. Sci. Tech.
                Wiley-Blackwell
                00664200
                2009
                February 16 2009
                : 43
                : 1
                : 1-117
                Article
                10.1002/aris.2009.1440430108
                5e0bc8a6-0e96-443f-b2ce-35d747c52cd1
                © 2009

                http://doi.wiley.com/10.1002/tdm_license_1.1

                History

                Comments

                Comment on this article