      Calpain Cleavage Prediction Using Multiple Kernel Learning

      research-article

          Abstract

          Calpain, an intracellular Ca2+-dependent cysteine protease, is known to play a role in a wide range of metabolic pathways through limited proteolysis of its substrates. However, only a limited number of these substrates are currently known, and the exact mechanism of substrate recognition and cleavage by calpain is still largely unknown. While previous research has successfully applied standard machine-learning algorithms to accurately predict substrate cleavage by other similar types of proteases, these approaches do not extend well to calpain, possibly due to its particular mode of proteolytic action and the limited amount of experimental data. Through the use of Multiple Kernel Learning, a recent extension to the classic Support Vector Machine framework, we were able to train complex models based on rich, heterogeneous feature sets, leading to significantly improved prediction quality (6% over the highest AUC score produced by state-of-the-art methods). In addition to producing a stronger machine-learning model for the prediction of calpain cleavage, we were able to highlight the importance and role of each feature of substrate sequences in defining specificity: primary sequence, secondary structure and solvent accessibility. Most notably, we showed that significant specificity differences exist across calpain sub-types, despite previous assumptions to the contrary. Prediction accuracy was further successfully validated using, as an unbiased test set, mutated sequences of calpastatin (endogenous inhibitor of calpain) modified to no longer block calpain's proteolytic action. An online implementation of our prediction tool is available at http://calpain.org.
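
The core idea of Multiple Kernel Learning mentioned in the abstract, one kernel per feature type combined into a single weighted kernel, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the kernel choices (a k-mer spectrum kernel for sequence and secondary structure, a Gaussian kernel for solvent accessibility), the toy windows, and the fixed weights are all assumptions. In actual MKL the weights would be learned jointly with the SVM rather than set by hand.

```python
import math
from collections import Counter

def spectrum_kernel(a, b, k=2):
    """Count shared k-mers between two strings (a simple string kernel)."""
    ca = Counter(a[i:i + k] for i in range(len(a) - k + 1))
    cb = Counter(b[i:i + k] for i in range(len(b) - k + 1))
    return float(sum(ca[m] * cb[m] for m in ca))

def normalized(kernel):
    """Cosine-normalize a kernel so that k(x, x) == 1 for every x."""
    def k(a, b):
        return kernel(a, b) / math.sqrt(kernel(a, a) * kernel(b, b))
    return k

def rbf_kernel(x, y, gamma=1.0):
    """Gaussian kernel on numeric vectors (here: per-residue accessibility)."""
    d2 = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-gamma * d2)

def combined_kernel(s, t, weights):
    """Weighted sum of one kernel per feature type."""
    string_kernel = normalized(spectrum_kernel)
    return (weights["seq"] * string_kernel(s["seq"], t["seq"])
            + weights["ss"] * string_kernel(s["ss"], t["ss"])
            + weights["acc"] * rbf_kernel(s["acc"], t["acc"]))

def gram_matrix(samples, weights):
    """Pairwise combined-kernel values for all samples."""
    return [[combined_kernel(s, t, weights) for t in samples] for s in samples]

# Hypothetical windows around candidate cleavage sites (made-up data):
# residues, secondary-structure states, and relative solvent accessibility.
samples = [
    {"seq": "LKPAGTSV", "ss": "CCHHHHCC", "acc": [0.8, 0.6, 0.7, 0.2]},
    {"seq": "LKPSGTAV", "ss": "CCHHHCCC", "acc": [0.7, 0.6, 0.5, 0.3]},
    {"seq": "WWDEFNQR", "ss": "EEEEECCC", "acc": [0.1, 0.2, 0.1, 0.4]},
]

# Hand-picked weights (an assumption); non-negative and summing to 1.
weights = {"seq": 0.5, "ss": 0.3, "acc": 0.2}

K = gram_matrix(samples, weights)
for row in K:
    print([round(v, 3) for v in row])
```

The resulting matrix `K` could be passed to any kernel classifier that accepts a precomputed Gram matrix. Because each base kernel is normalized and the weights sum to 1, the weights can be read as the relative contribution of each feature type, which is the kind of interpretation the abstract refers to.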

          Most cited references (51)

          The calpain system.

          The calpain system originally comprised three molecules: two Ca2+-dependent proteases, mu-calpain and m-calpain, and a third polypeptide, calpastatin, whose only known function is to inhibit the two calpains. Both mu- and m-calpain are heterodimers containing an identical 28-kDa subunit and an 80-kDa subunit that shares 55-65% sequence homology between the two proteases. The crystallographic structure of m-calpain reveals six "domains" in the 80-kDa subunit: (1) a 19-amino acid NH2-terminal sequence; (2) and (3) two domains that constitute the active site, IIa and IIb; (4) domain III; (5) an 18-amino acid extended sequence linking domain III to domain IV; and (6) domain IV, which resembles the penta EF-hand family of polypeptides. The single calpastatin gene can produce eight or more calpastatin polypeptides ranging from 17 to 85 kDa by use of different promoters and alternative splicing events. The physiological significance of these different calpastatins is unclear, although all bind to three different places on the calpain molecule; binding to at least two of the sites is Ca2+ dependent. Since 1989, cDNA cloning has identified 12 additional mRNAs in mammals that encode polypeptides homologous to domains IIa and IIb of the 80-kDa subunit of mu- and m-calpain, and calpain-like mRNAs have been identified in other organisms. The molecules encoded by these mRNAs have not been isolated, so little is known about their properties. How calpain activity is regulated in cells is still unclear, but the calpains ostensibly participate in a variety of cellular processes including remodeling of cytoskeletal/membrane attachments, different signal transduction pathways, and apoptosis. Deregulated calpain activity following loss of Ca2+ homeostasis results in tissue damage in response to events such as myocardial infarcts, stroke, and brain trauma.

            Genetic variation in the gene encoding calpain-10 is associated with type 2 diabetes mellitus.

            Type 2 or non-insulin-dependent diabetes mellitus (NIDDM) is the most common form of diabetes worldwide, affecting approximately 4% of the world's adult population. It is multifactorial in origin with both genetic and environmental factors contributing to its development. A genome-wide screen for type 2 diabetes genes carried out in Mexican Americans localized a susceptibility gene, designated NIDDM1, to chromosome 2. Here we describe the positional cloning of a gene located in the NIDDM1 region that shows association with type 2 diabetes in Mexican Americans and a Northern European population from the Botnia region of Finland. This putative diabetes-susceptibility gene encodes a ubiquitously expressed member of the calpain-like cysteine protease family, calpain-10 (CAPN10). This finding suggests a novel pathway that may contribute to the development of type 2 diabetes.

              A statistical framework for genomic data fusion.

              During the past decade, the new focus on genomics has highlighted a particular challenge: to integrate the different views of the genome that are provided by various types of experimental data. This paper describes a computational framework for integrating and drawing inferences from a collection of genome-wide measurements. Each dataset is represented via a kernel function, which defines generalized similarity relationships between pairs of entities, such as genes or proteins. The kernel representation is both flexible and efficient, and can be applied to many different types of data. Furthermore, kernel functions derived from different types of data can be combined in a straightforward fashion. Recent advances in the theory of kernel methods have provided efficient algorithms to perform such combinations in a way that minimizes a statistical loss function. These methods exploit semidefinite programming techniques to reduce the problem of finding optimizing kernel combinations to a convex optimization problem. Computational experiments performed using yeast genome-wide datasets, including amino acid sequences, hydropathy profiles, gene expression data and known protein-protein interactions, demonstrate the utility of this approach. A statistical learning algorithm trained from all of these data to recognize particular classes of proteins--membrane proteins and ribosomal proteins--performs significantly better than the same algorithm trained on any single type of data. Supplementary data at http://noble.gs.washington.edu/proj/sdp-svm
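
The step of weighting and combining kernels derived from different data types can be illustrated with a small sketch. Note the substitution: the framework summarized above finds the weights through semidefinite programming, whereas the code below uses kernel-target alignment, a much simpler heuristic, purely for illustration. The Gram matrices, labels, and function names are hypothetical.

```python
import math

def frobenius(a, b):
    """Frobenius inner product of two square matrices given as nested lists."""
    return sum(x * y for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def alignment(K, y):
    """Kernel-target alignment: cosine between K and the ideal kernel y y^T."""
    Y = [[yi * yj for yj in y] for yi in y]
    return frobenius(K, Y) / math.sqrt(frobenius(K, K) * frobenius(Y, Y))

def alignment_weights(kernels, y):
    """Weight each kernel by its (non-negative) alignment, normalized to sum to 1."""
    scores = {name: max(alignment(K, y), 0.0) for name, K in kernels.items()}
    total = sum(scores.values())
    if total == 0.0:  # no kernel is informative: fall back to uniform weights
        return {name: 1.0 / len(kernels) for name in kernels}
    return {name: s / total for name, s in scores.items()}

def combine(kernels, weights):
    """Convex combination of Gram matrices; stays positive semidefinite."""
    n = len(next(iter(kernels.values())))
    return [[sum(weights[name] * K[i][j] for name, K in kernels.items())
             for j in range(n)] for i in range(n)]

# Toy labels for four proteins: two in the class of interest, two outside it.
y = [1, 1, -1, -1]

# Hypothetical Gram matrices from two data types (made-up values).
kernels = {
    # Similar within each class, dissimilar across classes.
    "informative": [[1.0, 0.9, 0.1, 0.1],
                    [0.9, 1.0, 0.1, 0.1],
                    [0.1, 0.1, 1.0, 0.9],
                    [0.1, 0.1, 0.9, 1.0]],
    # Equally similar everywhere: carries no class information.
    "noisy": [[1.0, 0.5, 0.5, 0.5],
              [0.5, 1.0, 0.5, 0.5],
              [0.5, 0.5, 1.0, 0.5],
              [0.5, 0.5, 0.5, 1.0]],
}

weights = alignment_weights(kernels, y)
K_combined = combine(kernels, weights)
print({name: round(w, 3) for name, w in weights.items()})
```

In this toy setting the kernel whose similarity structure matches the labels receives the larger weight. A learned combination, as in the SDP formulation, optimizes the weights against the classifier's loss instead of scoring each kernel independently.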

                Author and article information

                Contributors
                Role: Editor
                Journal: PLoS ONE
                Publisher: Public Library of Science (San Francisco, USA)
                ISSN: 1932-6203
                Published: 3 May 2011
                Volume 6, Issue 5, e19035
                Affiliations
                [1] Bioinformatics Center, Kyoto University, Uji, Kyoto, Japan
                [2] Calpain Project, Rinshoken, Tokyo, Japan
                Kyushu Institute of Technology, Japan
                Author notes

                Conceived and designed the experiments: DdV YO HS HM. Performed the experiments: DdV. Analyzed the data: DdV. Wrote the paper: DdV YO.

                Article
                Manuscript ID: PONE-D-11-01595
                DOI: 10.1371/journal.pone.0019035
                PMCID: PMC3086883
                PMID: 21559271
                duVerle et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
                History
                Received: 14 January 2011
                Accepted: 23 March 2011
                Page count
                Pages: 9
                Categories
                Research Article
                Biology
                Computational Biology
                Sequence Analysis
                Proteomics
                Protein Interactions
                Sequence Analysis