
      mGOASVM: Multi-label protein subcellular localization based on gene ontology and support vector machines

      research-article
      BMC Bioinformatics
      BioMed Central


          Abstract

          Background

          Although many computational methods have been developed to predict protein subcellular localization, most of them are limited to single-location proteins. Multi-location proteins are either not considered or assumed not to exist. However, proteins with multiple locations are particularly interesting because they may have special biological functions, which are essential to both basic research and drug discovery.

          Results

          This paper proposes an efficient multi-label predictor, mGOASVM, for predicting the subcellular localization of multi-location proteins. Given a protein, the accession numbers of its homologs are obtained via a BLAST search. The original accession number and the homologous accession numbers are then used as keys to search the Gene Ontology (GO) annotation database and obtain a set of GO terms. Given a set of training proteins, a set of T relevant GO terms is obtained by finding all of the GO terms in the GO annotation database that are associated with the training proteins. These relevant GO terms form the basis of a T-dimensional Euclidean space in which the GO vectors lie. A support vector machine (SVM) classifier with a new decision scheme is proposed to classify the multi-label GO vectors. The mGOASVM predictor has the following advantages: (1) it uses the frequency of occurrence of GO terms for feature representation; (2) it selects the relevant GO subspace, which can substantially speed up prediction without compromising performance; and (3) it adopts an efficient multi-label SVM classifier that significantly outperforms other predictors. On two recently published datasets, one of virus proteins and one of plant proteins, mGOASVM achieves actual accuracies of 88.9% and 87.4%, respectively, which are significantly higher than those achieved by state-of-the-art predictors such as iLoc-Virus (74.8%) and iLoc-Plant (68.1%).
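
The feature construction described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the protein identifiers, the GO term lists, and the function names are hypothetical, and the BLAST and GO database lookups are assumed to have already produced a list of GO terms per protein. The abstract does not spell out the new decision scheme, so `decide_labels` shows only one plausible rule (keep every location whose SVM score is positive, otherwise fall back to the top-scoring location) and is labeled as an assumption.

```python
from collections import Counter


def build_go_space(training_annotations):
    """Collect the distinct GO terms seen across all training proteins.

    The sorted result fixes the order of the T dimensions.
    """
    terms = set()
    for go_terms in training_annotations.values():
        terms.update(go_terms)
    return sorted(terms)


def go_frequency_vector(go_terms, go_space):
    """Count occurrences of each relevant GO term for one protein.

    GO terms that are not in the training-derived space are ignored.
    """
    counts = Counter(go_terms)
    return [counts[term] for term in go_space]


def decide_labels(scores):
    """ASSUMED decision rule, not taken from the abstract.

    `scores` maps each location to a per-class SVM score. Every location
    with a positive score is predicted; if none is positive, the single
    highest-scoring location is returned.
    """
    positive = {loc for loc, score in scores.items() if score > 0}
    if positive:
        return positive
    return {max(scores, key=scores.get)}


# Hypothetical toy data: GO terms retrieved for each training protein
# (the protein itself plus its homologs), repeats included.
training = {
    "P1": ["GO:0005634", "GO:0005634", "GO:0005737"],
    "P2": ["GO:0005739"],
}
go_space = build_go_space(training)

# A query protein; "GO:0016020" is outside the training space and is dropped.
query_vector = go_frequency_vector(
    ["GO:0005634", "GO:0005739", "GO:0005739", "GO:0016020"], go_space
)
```

Here `go_space` has T = 3 dimensions, and `query_vector` would be the input to the per-location SVMs. Training those SVMs is outside the scope of this sketch.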

          Conclusions

          mGOASVM can efficiently predict the subcellular locations of multi-label proteins. The mGOASVM predictor is available online at http://bioinfo.eie.polyu.edu.hk/mGoaSvmServer/mGOASVM.html.



                Author and article information

                Contributors
                Journal
                BMC Bioinformatics
                BioMed Central
                ISSN: 1471-2105
                Published: 6 November 2012
                Volume: 13
                Article number: 290
                Affiliations
                [1 ]Department of Electronic and Information Engineering, The Hong Kong Polytechnic University, Hong Kong SAR, China
                [2 ]Department of Electrical Engineering, Princeton University, New Jersey, USA
                Article
                Publisher ID: 1471-2105-13-290
                DOI: 10.1186/1471-2105-13-290
                PMCID: PMC3582598
                PMID: 23130999
                979614e7-17c1-4ef3-9058-bbf3df332f36
                Copyright ©2012 Wan et al.; licensee BioMed Central Ltd.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                Received: 3 July 2012
                Accepted: 24 October 2012
                Categories
                Methodology Article

                Bioinformatics & Computational biology
