A New Method for Predicting the Subcellular Localization of Eukaryotic Proteins with Both Single and Multiple Sites: Euk-mPLoc 2.0


Abstract

Information on the subcellular locations of proteins is important for in-depth studies of cell biology. It is also very useful for proteomics, systems biology, and drug development. However, most existing methods for predicting protein subcellular location cover only 5 to 12 location sites. They are also limited to single-location proteins and hence fail for multiplex proteins, which can simultaneously exist at, or move between, two or more location sites. Multiplex proteins of this kind usually possess important biological functions that deserve special attention. A new predictor called “Euk-mPLoc 2.0” was developed by hybridizing gene ontology information, functional domain information, and sequential evolutionary information through three different modes of pseudo amino acid composition. It can be used to identify eukaryotic proteins among the following 22 locations: (1) acrosome, (2) cell wall, (3) centriole, (4) chloroplast, (5) cyanelle, (6) cytoplasm, (7) cytoskeleton, (8) endoplasmic reticulum, (9) endosome, (10) extracell, (11) Golgi apparatus, (12) hydrogenosome, (13) lysosome, (14) melanosome, (15) microsome, (16) mitochondria, (17) nucleus, (18) peroxisome, (19) plasma membrane, (20) plastid, (21) spindle pole body, and (22) vacuole. Compared with existing methods for predicting eukaryotic protein subcellular localization, the new predictor is much more powerful and flexible, particularly in dealing with proteins with multiple locations and proteins without available accession numbers. For a newly constructed stringent benchmark dataset, which contains both single- and multiple-location proteins and in which none of the proteins has pairwise sequence identity to any other in the same location, the overall jackknife success rate achieved by Euk-mPLoc 2.0 is more than 24% higher than that of any existing predictor.
As a user-friendly web server, Euk-mPLoc 2.0 is freely accessible at http://www.csbio.sjtu.edu.cn/bioinf/euk-multi-2/. For a query protein sequence of 400 amino acids, the web server takes about 15 seconds to return the predicted result; longer sequences generally take more time. It is anticipated that the novel approach and the powerful predictor presented in this paper will have a significant impact on molecular cell biology, systems biology, proteomics, bioinformatics, and drug development.
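
The abstract reports an overall jackknife success rate on a benchmark that mixes single- and multiple-location proteins. As a rough illustration of what a jackknife (leave-one-out) test over multi-location data can look like, the Python sketch below holds out each protein in turn, predicts its location set from the remaining proteins, and counts how many true (protein, location) pairs are recovered. Everything in it is an assumption made for illustration: the toy feature vectors, the nearest-neighbor stand-in predictor, the function names, and the exact definition of the success rate are hypothetical and are not taken from the Euk-mPLoc 2.0 paper.

```python
from typing import Callable, List, Sequence, Set, Tuple

# One sample = (feature vector, set of true location names). Hypothetical layout.
Sample = Tuple[Sequence[float], Set[str]]


def nearest_neighbor_predict(train: List[Sample], query: Sequence[float]) -> Set[str]:
    """Toy stand-in predictor: copy the location set of the closest training sample."""
    def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    _, locations = min(train, key=lambda s: squared_distance(s[0], query))
    return set(locations)


def jackknife_success_rate(
    data: List[Sample],
    predict: Callable[[List[Sample], Sequence[float]], Set[str]] = nearest_neighbor_predict,
) -> float:
    """Leave-one-out test: each sample is predicted from all the others.

    Assumed metric: recovered true (protein, location) pairs divided by
    all true (protein, location) pairs. Extra predicted locations are
    not penalized by this simplified definition.
    """
    hits = 0
    total = 0
    for i, (features, true_locations) in enumerate(data):
        train = data[:i] + data[i + 1:]          # hold out sample i
        predicted = predict(train, features)
        hits += len(true_locations & predicted)  # correctly recovered locations
        total += len(true_locations)
    return hits / total if total else 0.0


# Made-up toy data; the second protein is a multiplex (two-location) protein.
toy_data: List[Sample] = [
    ([0.0, 0.0], {"cytoplasm"}),
    ([0.1, 0.0], {"cytoplasm", "nucleus"}),
    ([1.0, 1.0], {"mitochondria"}),
    ([1.1, 1.0], {"mitochondria"}),
]

rate = jackknife_success_rate(toy_data)
print(f"jackknife success rate: {rate:.2f}")
```

Because every protein is used once as the test case and never appears in its own training set, the result does not depend on a random split, which is the usual reason for preferring a jackknife test on small benchmark datasets. A real evaluation would replace the toy predictor and may use a stricter metric that also penalizes over-predicted locations.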

Author and article information

Contributors

Role: Editor

Journal

Journal ID (nlm-ta): PLoS One

Journal ID (publisher-id): plos

Journal ID (pmc): plosone

Title: PLoS ONE

Publisher: Public Library of Science (San Francisco, USA)

ISSN (Electronic): 1932-6203

Publication date Collection: 2010

Publication date (Electronic): 1 April 2010

Volume: 5

Issue: 4

Electronic Location Identifier: e9931

Affiliations

[1] Gordon Life Science Institute, San Diego, California, United States of America

[2] Institute of Image Processing & Pattern Recognition, Shanghai Jiaotong University, Shanghai, China

Institute of Infectious Disease and Molecular Medicine, South Africa

Author notes

* E-mail: kcchou@gordonlifescience.org

Conceived and designed the experiments: KCC HBS. Performed the experiments: KCC HBS. Analyzed the data: KCC HBS.

Article

Publisher ID: 10-PONE-RA-15948R1

DOI: 10.1371/journal.pone.0009931

PMC ID: 2848569

PubMed ID: 20368981

SO-VID: 82898ba0-5922-483f-b84a-43c5ddc0c622

Copyright © Chou, Shen. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

History

Date received: 1 February 2010

Date accepted: 8 March 2010

Page count

Pages: 9

