Calpain Cleavage Prediction Using Multiple Kernel Learning


Abstract

Calpain, an intracellular Ca2+-dependent cysteine protease, is known to play a role in a wide range of metabolic pathways through limited proteolysis of its substrates. However, only a limited number of these substrates are currently known, and the exact mechanism by which calpain recognizes and cleaves its substrates remains largely unknown. While previous research has successfully applied standard machine-learning algorithms to accurately predict substrate cleavage by other, similar types of proteases, those approaches do not extend well to calpain, possibly because of its particular mode of proteolytic action and the limited amount of experimental data. Using Multiple Kernel Learning, a recent extension of the classic Support Vector Machine framework, we were able to train complex models on rich, heterogeneous feature sets, leading to significantly improved prediction quality (6% over the highest AUC score produced by state-of-the-art methods). In addition to producing a stronger machine-learning model for predicting calpain cleavage, we were able to highlight the importance and role of each substrate-sequence feature in defining specificity: primary sequence, secondary structure and solvent accessibility. Most notably, we showed that significant specificity differences exist across calpain sub-types, despite previous assumptions to the contrary. Prediction accuracy was further validated on an unbiased test set: mutated sequences of calpastatin (the endogenous inhibitor of calpain) modified so that they no longer block calpain's proteolytic action. An online implementation of our prediction tool is available at http://calpain.org.
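The core idea of Multiple Kernel Learning, as used above, is to give each feature view its own kernel and let the SVM work on a weighted combination K(x, y) = Σ_m β_m K_m(x, y), with β_m ≥ 0 and Σ_m β_m = 1. The minimal sketch below illustrates only that combination step. It assumes a hypothetical representation of each candidate cleavage site as a fixed-length window with three parallel views (amino-acid sequence, secondary-structure labels, per-residue solvent accessibility). The kernel choices, the function names (`match_kernel`, `rbf_kernel`, `combined_kernel`, `gram_matrix`) and the toy windows are illustrative assumptions, not the authors' feature definitions, and the weights are fixed by hand rather than learned.

```python
import math
from typing import Callable, Dict, List, Sequence, Union

# Hypothetical representation of one candidate cleavage site: a fixed-length
# window around the putative scissile bond, seen through three parallel views.
Site = Dict[str, Union[str, Sequence[float]]]
Kernel = Callable[[Site, Site], float]


def match_kernel(view: str) -> Kernel:
    """Position-wise identity kernel on a symbolic view ("seq" or "ss").

    Returns the fraction of window positions whose symbols agree; it is a
    scaled sum of one-hot inner products, so it is positive semidefinite.
    """
    def k(a: Site, b: Site) -> float:
        x, y = a[view], b[view]
        return sum(1.0 for p, q in zip(x, y) if p == q) / len(x)
    return k


def rbf_kernel(view: str, gamma: float) -> Kernel:
    """Gaussian (RBF) kernel on a numeric view such as solvent accessibility."""
    def k(a: Site, b: Site) -> float:
        x, y = a[view], b[view]
        return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(x, y)))
    return k


def combined_kernel(kernels: List[Kernel], weights: List[float]) -> Kernel:
    """Convex combination K = sum_m beta_m * K_m, beta_m >= 0, sum beta_m = 1."""
    if len(kernels) != len(weights):
        raise ValueError("need one weight per kernel")
    if any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative and not all zero")
    total = sum(weights)
    betas = [w / total for w in weights]  # normalize onto the simplex

    def k(a: Site, b: Site) -> float:
        return sum(beta * km(a, b) for beta, km in zip(betas, kernels))
    return k


def gram_matrix(kernel: Kernel, sites: List[Site]) -> List[List[float]]:
    """Precomputed kernel matrix, the form an SVM solver would consume."""
    return [[kernel(a, b) for b in sites] for a in sites]


# Toy, made-up 8-residue windows (not data from the study).
sites: List[Site] = [
    {"seq": "LKVAGSTE", "ss": "CCHHHHCC",
     "acc": [0.8, 0.6, 0.2, 0.1, 0.1, 0.3, 0.7, 0.9]},
    {"seq": "LKVSGSTA", "ss": "CCHHHECC",
     "acc": [0.7, 0.5, 0.2, 0.2, 0.1, 0.4, 0.6, 0.9]},
]

K = combined_kernel(
    [match_kernel("seq"), match_kernel("ss"), rbf_kernel("acc", gamma=1.0)],
    weights=[1.0, 1.0, 1.0],  # fixed by hand here; MKL would learn these
)
G = gram_matrix(K, sites)
```

Passing `G` to an SVM implementation that accepts precomputed kernels (for example scikit-learn's `SVC(kernel="precomputed")`) would train a fixed-weight combination. A Multiple Kernel Learning solver instead optimizes the weights β_m jointly with the SVM, and the learned weights are one way the relative contribution of each feature view can be read off.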

Author and article information

Contributors

Role: Editor

Journal

Journal ID (nlm-ta): PLoS One

Journal ID (publisher-id): plos

Journal ID (pmc): plosone

Title: PLoS ONE

Publisher: Public Library of Science (San Francisco, USA)

ISSN (Electronic): 1932-6203

Publication date Collection: 2011

Publication date (Electronic): 3 May 2011

Volume: 6

Issue: 5

Electronic Location Identifier: e19035

Affiliations

[1] Bioinformatics Center, Kyoto University, Uji, Kyoto, Japan

[2] Calpain Project, Rinshoken, Tokyo, Japan

Kyushu Institute of Technology, Japan

Author notes

* E-mail: dave@kuicr.kyoto-u.ac.jp

Conceived and designed the experiments: DdV YO HS HM. Performed the experiments: DdV. Analyzed the data: DdV. Wrote the paper: DdV YO.

Article

Publisher ID: PONE-D-11-01595

DOI: 10.1371/journal.pone.0019035

PMC ID: 3086883

PubMed ID: 21559271

SO-VID: 98eaa1b2-bd7a-483e-9bff-8f7e11cceba0

Copyright © duVerle et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

History

Date received : 14 January 2011

Date accepted : 23 March 2011

Page count

Pages: 9
