
      ProDis-ContSHC: learning protein dissimilarity measures and hierarchical context coherently for protein-protein comparison in protein database retrieval

      research-article
      BMC Bioinformatics
      BioMed Central
      The 2011 International Conference on Intelligent Computing (ICIC 2011)
      11-14 August 2011


          Abstract

          Background

          The need to retrieve or classify protein molecules using structure-based or sequence-based similarity measures underlies a wide range of biomedical applications. Traditional protein search methods rely on a pairwise dissimilarity/similarity measure to compare two proteins. Such pairwise measures neglect the distribution of the other proteins in the database and therefore cannot deliver the accuracy that retrieval systems require. Recent work in the machine learning community has shown that exploiting the global structure of the database and learning contextual dissimilarity/similarity measures can improve retrieval performance significantly. However, most existing contextual dissimilarity/similarity learning algorithms are unsupervised and do not use the known class labels of the proteins in the database.

          Results

          In this paper, we propose a novel protein-protein dissimilarity learning algorithm, ProDis-ContSHC. ProDis-ContSHC regularizes an existing dissimilarity measure d_ij by considering the contextual information of the proteins. The context of a protein is defined by its neighboring proteins. The basic idea is that, for a pair of proteins (i, j), if their contexts N(i) and N(j) are similar to each other, the two proteins should also have a high similarity. We implement this idea by regularizing d_ij by a factor learned from the contexts N(i) and N(j).
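
The idea of regularizing d_ij by a context-dependent factor can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the context N(i) is taken to be protein i together with its k nearest neighbours, context similarity is measured with the Jaccard index, and the factor 1 - alpha * Jaccard(N(i), N(j)) is a hand-picked stand-in for the learned factor. The names knn_context, context_regularize and alpha are hypothetical.

```python
from typing import List, Set

Matrix = List[List[float]]


def knn_context(D: Matrix, i: int, k: int) -> Set[int]:
    """Assumed context N(i): protein i plus its k nearest neighbours under D."""
    others = sorted((j for j in range(len(D)) if j != i), key=lambda j: D[i][j])
    return {i, *others[:k]}


def context_regularize(D: Matrix, k: int, alpha: float = 0.5) -> Matrix:
    """Shrink d_ij when the contexts N(i) and N(j) overlap.

    The factor 1 - alpha * Jaccard(N(i), N(j)) is an illustrative choice,
    not the factor learned by ProDis-ContSHC.
    """
    n = len(D)
    contexts = [knn_context(D, i, k) for i in range(n)]
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # keep the diagonal at zero
            overlap = len(contexts[i] & contexts[j])
            union = len(contexts[i] | contexts[j])
            jaccard = overlap / union
            out[i][j] = D[i][j] * (1.0 - alpha * jaccard)
    return out


# Toy data: four "proteins" placed on a line, forming two tight groups.
positions = [0.0, 1.0, 10.0, 11.0]
D = [[abs(a - b) for b in positions] for a in positions]
R = context_regularize(D, k=1, alpha=0.5)
```

In this toy setting, pairs inside the same group share their whole context, so their dissimilarity is halved, while pairs from different groups share nothing and keep their original value.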

          Moreover, we divide the context into hierarchical sub-contexts and obtain a contextual dissimilarity vector for each protein pair. Using the class label information of the proteins, we select relevant protein pairs (pairs with the same class label) and irrelevant protein pairs (pairs with different labels), and train an SVM model to distinguish between their contextual dissimilarity vectors. The SVM model is then used to learn a supervised regularizing factor. Finally, with the new Supervised learned Dissimilarity measure, we update the Protein Hierarchical Context Coherently in an iterative algorithm, ProDis-ContSHC.
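
A rough sketch of how the training data for such a classifier might be assembled is given below. The abstract does not define the sub-contexts, so this sketch assumes nested neighbourhoods of increasing size (the hypothetical levels parameter) and uses one Jaccard-based dissimilarity per level as the vector entry. The function names are hypothetical, and the SVM training step itself is left out.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def ranked_neighbours(D: Matrix, i: int) -> List[int]:
    """All other proteins, ordered by increasing dissimilarity to protein i."""
    return sorted((j for j in range(len(D)) if j != i), key=lambda j: D[i][j])


def contextual_vector(D: Matrix, i: int, j: int, levels: Sequence[int]) -> List[float]:
    """One entry per assumed sub-context level: 1 - Jaccard of the level-k contexts."""
    ri, rj = ranked_neighbours(D, i), ranked_neighbours(D, j)
    vec = []
    for k in levels:
        ci = {i, *ri[:k]}  # level-k context of i (assumed to include i itself)
        cj = {j, *rj[:k]}
        vec.append(1.0 - len(ci & cj) / len(ci | cj))
    return vec


def training_pairs(
    D: Matrix, labels: Sequence[str], levels: Sequence[int]
) -> Tuple[List[List[float]], List[int]]:
    """Build (X, y): y is 1 for relevant pairs (same label), 0 for irrelevant pairs."""
    X, y = [], []
    for i, j in combinations(range(len(D)), 2):
        X.append(contextual_vector(D, i, j, levels))
        y.append(1 if labels[i] == labels[j] else 0)
    return X, y


# Toy data: two groups of two "proteins" on a line, with class labels.
positions = [0.0, 1.0, 10.0, 11.0]
D = [[abs(a - b) for b in positions] for a in positions]
X, y = training_pairs(D, labels=["a", "a", "b", "b"], levels=(1, 2))
```

The resulting X and y could then be passed to any SVM implementation, for example scikit-learn's `sklearn.svm.SVC`, and the dissimilarity matrix recomputed from the classifier output before the contexts are rebuilt in the next iteration. How ProDis-ContSHC turns the SVM output into a regularizing factor is not specified in the abstract, so it is not sketched here.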

          We test the performance of ProDis-ContSHC on two benchmark sets, the ASTRAL 1.73 database and the FSSP/DALI database. Experimental results show that retrieval systems using our supervised contextual dissimilarity measures significantly outperform those using context-free dissimilarity/similarity measures or unsupervised contextual dissimilarity measures that do not use class label information.

          Conclusions

          Using the contextual proteins with their class labels in the database, we can improve the accuracy of the pairwise dissimilarity/similarity measures dramatically for the protein retrieval tasks. In this work, for the first time, we propose the idea of supervised contextual dissimilarity learning, resulting in the ProDis-ContSHC algorithm. Among different contextual dissimilarity learning approaches that can be used to compare a pair of proteins, ProDis-ContSHC provides the highest accuracy. Finally, ProDis-ContSHC compares favorably with other methods reported in the recent literature.


                Author and article information

                Conference
                BMC Bioinformatics
                BioMed Central
                1471-2105
                8 May 2012
                Volume 13, Suppl 7, S2
                Affiliations
                [1] King Abdullah University of Science and Technology (KAUST), Mathematical and Computer Sciences and Engineering Division, Thuwal, 23955-6900, Saudi Arabia
                [2] Shanghai Institute of Applied Physics, Chinese Academy of Sciences, 2019 Jialuo Road, Jiading District, Shanghai 201800, China
                [3] Shanghai Key Laboratory of Intelligent Information Processing, School of Computer Science, Fudan University, Shanghai 200433, China
                Article
                1471-2105-13-S7-S2
                10.1186/1471-2105-13-S7-S2
                3348016
                22594999
                928e118d-8ef6-4ecf-9a8c-60bd880b9742
                Copyright ©2012 Wang et al.; licensee BioMed Central Ltd.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

                The 2011 International Conference on Intelligent Computing (ICIC 2011)
                Zhengzhou, China
                11-14 August 2011
                Categories
                Proceedings

                Bioinformatics & Computational biology