138
views
0
recommends
+1 Recommend
0 collections
    0
    shares
      • Record: found
      • Abstract: found
      • Article: found
      Is Open Access

      Predicting Protein Ligand Binding Sites by Combining Evolutionary Sequence Conservation and 3D Structure

      research-article

      Read this article at

      Bookmark
          There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

          Abstract

          Identifying a protein's functional sites is an important step towards characterizing its molecular function. Numerous structure- and sequence-based methods have been developed for this problem. Here we introduce ConCavity, a small molecule binding site prediction algorithm that integrates evolutionary sequence conservation estimates with structure-based methods for identifying protein surface cavities. In large-scale testing on a diverse set of single- and multi-chain protein structures, we show that ConCavity substantially outperforms existing methods for identifying both 3D ligand binding pockets and individual ligand binding residues. As part of our testing, we perform one of the first direct comparisons of conservation-based and structure-based methods. We find that the two approaches provide largely complementary information, which can be combined to improve upon either approach alone. We also demonstrate that ConCavity has state-of-the-art performance in predicting catalytic sites and drug binding pockets. Overall, the algorithms and analysis presented here significantly improve our ability to identify ligand binding sites and further advance our understanding of the relationship between evolutionary sequence conservation and structural and functional attributes of proteins. Data, source code, and prediction visualizations are available on the ConCavity web site ( http://compbio.cs.princeton.edu/concavity/).

          Author Summary

          Protein molecules are ubiquitous in the cell; they perform thousands of functions crucial for life. Proteins accomplish nearly all of these functions by interacting with other molecules. These interactions are mediated by specific amino acid positions in the proteins. Knowledge of these “functional sites” is crucial for understanding the molecular mechanisms by which proteins carry out their functions; however, functional sites have not been identified in the vast majority of proteins. Here, we present ConCavity, a computational method that predicts small molecule binding sites in proteins by combining analysis of evolutionary sequence conservation and protein 3D structure. ConCavity provides significant improvement over previous approaches, especially on large, multi-chain proteins. In contrast to earlier methods which only predict entire binding sites, ConCavity makes specific predictions of positions in space that are likely to overlap ligand atoms and of residues that are likely to contact bound ligands. These predictions can be used to aid computational function prediction, to guide experimental protein analysis, and to focus computationally intensive techniques used in drug discovery.

          Related collections

          Most cited references28

          • Record: found
          • Abstract: found
          • Article: not found

          CASTp: computed atlas of surface topography of proteins with structural and topographical mapping of functionally annotated residues

          Cavities on a proteins surface as well as specific amino acid positioning within it create the physicochemical properties needed for a protein to perform its function. CASTp () is an online tool that locates and measures pockets and voids on 3D protein structures. This new version of CASTp includes annotated functional information of specific residues on the protein structure. The annotations are derived from the Protein Data Bank (PDB), Swiss-Prot, as well as Online Mendelian Inheritance in Man (OMIM), the latter contains information on the variant single nucleotide polymorphisms (SNPs) that are known to cause disease. These annotated residues are mapped to surface pockets, interior voids or other regions of the PDB structures. We use a semi-global pair-wise sequence alignment method to obtain sequence mapping between entries in Swiss-Prot, OMIM and entries in PDB. The updated CASTp web server can be used to study surface features, functional regions and specific roles of key residues of proteins.
            Bookmark
            • Record: found
            • Abstract: found
            • Article: not found

            The Catalytic Site Atlas: a resource of catalytic sites and residues identified in enzymes using structural data.

            The Catalytic Site Atlas (CSA) provides catalytic residue annotation for enzymes in the Protein Data Bank. It is available online at http://www.ebi.ac.uk/thornton-srv/databases/CSA. The database consists of two types of annotated site: an original hand-annotated set containing information extracted from the primary literature, using defined criteria to assign catalytic residues, and an additional homologous set, containing annotations inferred by PSI-BLAST and sequence alignment to one of the original set. The CSA can be queried via Swiss-Prot identifier and EC number, as well as by PDB code. CSA Version 1.0 contains 177 original hand- annotated entries and 2608 homologous entries, and covers approximately 30% of all EC numbers found in PDB. The CSA will be updated on a monthly basis to include homologous sites found in new PDBs, and new hand-annotated enzymes as and when their annotation is completed.
              Bookmark
              • Record: found
              • Abstract: found
              • Article: not found

              Comparison of site-specific rate-inference methods for protein sequences: empirical Bayesian methods are superior.

              The degree to which an amino acid site is free to vary is strongly dependent on its structural and functional importance. An amino acid that plays an essential role is unlikely to change over evolutionary time. Hence, the evolutionary rate at an amino acid site is indicative of how conserved this site is and, in turn, allows evaluation of its importance in maintaining the structure/function of the protein. When using probabilistic methods for site-specific rate inference, few alternatives are possible. In this study we use simulations to compare the maximum-likelihood and Bayesian paradigms. We study the dependence of inference accuracy on such parameters as number of sequences, branch lengths, the shape of the rate distribution, and sequence length. We also study the possibility of simultaneously estimating branch lengths and site-specific rates. Our results show that a Bayesian approach is superior to maximum-likelihood under a wide range of conditions, indicating that the prior that is incorporated into the Bayesian computation significantly improves performance. We show that when branch lengths are unknown, it is better first to estimate branch lengths and then to estimate site-specific rates. This procedure was found to be superior to estimating both the branch lengths and site-specific rates simultaneously. Finally, we illustrate the difference between maximum-likelihood and Bayesian methods when analyzing site-conservation for the apoptosis regulator protein Bcl-x(L).
                Bookmark

                Author and article information

                Contributors
                Role: Editor
                Journal
                PLoS Comput Biol
                plos
                ploscomp
                PLoS Computational Biology
                Public Library of Science (San Francisco, USA )
                1553-734X
                1553-7358
                December 2009
                December 2009
                4 December 2009
                : 5
                : 12
                : e1000585
                Affiliations
                [1 ]Department of Computer Science, Princeton University, Princeton, New Jersey, United States of America
                [2 ]Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, United States of America
                [3 ]European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
                Max-Planck-Institut für Informatik, Germany
                Author notes

                Conceived and designed the experiments: JAC MS TAF. Performed the experiments: JAC TAF. Analyzed the data: JAC MS TAF. Wrote the paper: JAC MS TAF. Provided methodological input: RAL JMT.

                Article
                09-PLCB-RA-0521R3
                10.1371/journal.pcbi.1000585
                2777313
                19997483
                133f1285-64c0-4419-b139-4d3adb29d878
                Capra et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
                History
                : 11 May 2009
                : 30 October 2009
                Page count
                Pages: 18
                Categories
                Research Article
                Computational Biology
                Computational Biology/Macromolecular Sequence Analysis
                Computational Biology/Macromolecular Structure Analysis
                Computer Science
                Molecular Biology/Bioinformatics

                Quantitative & Systems biology
                Quantitative & Systems biology

                Comments

                Comment on this article