Using support vector machine combined with auto covariance to predict protein–protein interactions from protein sequences

Abstract

Compared to the available protein sequences of different organisms, the number of revealed protein–protein interactions (PPIs) is still very limited. Consequently, many computational methods have been developed to facilitate the identification of novel PPIs. Methods that use only protein sequence information are more universally applicable than those that depend on additional information or predictions about the proteins. In this article, a sequence-based method is proposed that combines a new feature representation using auto covariance (AC) with a support vector machine (SVM). AC accounts for the interactions between residues a certain distance apart in the sequence, so this method adequately takes the neighbouring effect into account. When applied to the PPI data of the yeast Saccharomyces cerevisiae, the method achieved a very promising prediction result. An independent data set of 11 474 yeast PPIs was used to evaluate this prediction model, and the prediction accuracy was 88.09%. The performance of this method is superior to that of the existing sequence-based methods, so it can be a useful supplementary tool for future proteomics studies. The prediction software and all data sets used in this article are freely available at http://www.scucic.cn/Predict_PPI/index.htm.
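The abstract describes AC only in words, so the following is a minimal sketch of how an auto covariance encoding of a protein pair could look, not the authors' implementation. It assumes the common definition in which each residue is mapped to a numeric property value, the values are mean-centred, and products of values `lag` positions apart are averaged. The names `TOY_SCALE`, `auto_covariance`, `encode_sequence` and `encode_pair`, the placeholder property values, and the choice of lags are all hypothetical.

```python
from typing import Dict, List

# Hypothetical placeholder scale: NOT a real physicochemical property.
# A real encoding would use measured residue descriptors instead.
TOY_SCALE: Dict[str, float] = {
    aa: float(i) for i, aa in enumerate("ACDEFGHIKLMNPQRSTVWY")
}


def auto_covariance(values: List[float], max_lag: int) -> List[float]:
    """Return the auto covariance of `values` for lags 1..max_lag.

    Assumed definition: mean-centre the values, then average the product
    of every pair of positions that are `lag` residues apart.
    """
    n = len(values)
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the sequence length")
    mean = sum(values) / n
    centred = [v - mean for v in values]
    return [
        sum(centred[i] * centred[i + lag] for i in range(n - lag)) / (n - lag)
        for lag in range(1, max_lag + 1)
    ]


def encode_sequence(
    seq: str, scales: List[Dict[str, float]], max_lag: int
) -> List[float]:
    """Concatenate the AC values of one sequence over all property scales."""
    features: List[float] = []
    for scale in scales:
        values = [scale[residue] for residue in seq]
        features.extend(auto_covariance(values, max_lag))
    return features


def encode_pair(
    seq_a: str, seq_b: str, scales: List[Dict[str, float]], max_lag: int
) -> List[float]:
    """Represent a candidate protein pair by joining both AC vectors."""
    return encode_sequence(seq_a, scales, max_lag) + encode_sequence(
        seq_b, scales, max_lag
    )


if __name__ == "__main__":
    # Two made-up short sequences, one toy scale, lags 1..3.
    pair_vector = encode_pair("ACDEFGHIK", "LMNPQRSTVW", [TOY_SCALE], 3)
    print(len(pair_vector), pair_vector)
```

Each sequence yields a fixed-length vector of `len(scales) * max_lag` values regardless of its length, which is what lets proteins of different sizes be fed to a classifier such as an SVM. The resulting pair vector would be the input to that classifier; the training step itself is not sketched here.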

Most cited references 52

A comprehensive two-hybrid analysis to explore the yeast protein interactome.

T Ito, T Chiba, R Ozawa … (2001)

Protein-protein interactions play crucial roles in the execution of various biological functions. Accordingly, their comprehensive description would contribute considerably to the functional interpretation of fully sequenced genomes, which are flooded with novel genes of unpredictable functions. We previously developed a system to examine two-hybrid interactions in all possible combinations between the approximately 6,000 proteins of the budding yeast Saccharomyces cerevisiae. Here we have completed the comprehensive analysis using this system to identify 4,549 two-hybrid interactions among 3,278 proteins. Unexpectedly, these data do not largely overlap with those obtained by the other project [Uetz, P., et al. (2000) Nature (London) 403, 623-627] and hence have substantially expanded our knowledge on the protein interaction space or interactome of the yeast. Cumulative connection of these binary interactions generates a single huge network linking the vast majority of the proteins. Bioinformatics-aided selection of biologically relevant interactions highlights various intriguing subnetworks. They include, for instance, the one that had successfully foreseen the involvement of a novel protein in spindle pole body function as well as the one that may uncover a hitherto unidentified multiprotein complex potentially participating in the process of vesicular transport. Our data would thus significantly expand and improve the protein interaction map for the exploration of genome functions that eventually leads to thorough understanding of the cell as a molecular system.

Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry.

Yuen Ho, Albrecht Gruhler, Adrian Heilbut … (2002)

The recent abundance of genome sequence data has brought an urgent need for systematic proteomics to decipher the encoded protein networks that dictate cellular function. To date, generation of large-scale protein-protein interaction maps has relied on the yeast two-hybrid system, which detects binary interactions through activation of reporter gene expression. With the advent of ultrasensitive mass spectrometric protein identification methods, it is feasible to identify directly protein complexes on a proteome-wide scale. Here we report, using the budding yeast Saccharomyces cerevisiae as a test case, an example of this approach, which we term high-throughput mass spectrometric protein complex identification (HMS-PCI). Beginning with 10% of predicted yeast proteins as baits, we detected 3,617 associated proteins covering 25% of the yeast proteome. Numerous protein complexes were identified, including many new interactions in various signalling pathways and in the DNA damage response. Comparison of the HMS-PCI data set with interactions reported in the literature revealed an average threefold higher success rate in detection of known complexes compared with large-scale two-hybrid studies. Given the high degree of connectivity observed in this study, even partial HMS-PCI coverage of complex proteomes, including that of humans, should allow comprehensive identification of cellular networks.

DIP, the Database of Interacting Proteins: a research tool for studying cellular networks of protein interactions.

I Xenarios (2002)

The Database of Interacting Proteins (DIP: http://dip.doe-mbi.ucla.edu) is a database that documents experimentally determined protein-protein interactions. It provides the scientific community with an integrated set of tools for browsing and extracting information about protein interaction networks. As of September 2001, the DIP catalogs approximately 11 000 unique interactions among 5900 proteins from >80 organisms; the vast majority from yeast, Helicobacter pylori and human. Tools have been developed that allow users to analyze, visualize and integrate their own experimental data with the information about protein-protein interactions available in the DIP database.

Author and article information

Journal

Journal ID (nlm-ta): Nucleic Acids Res

Journal ID (iso-abbrev): Nucleic Acids Res

Journal ID (publisher-id): nar

Journal ID (hwp): nar

Title: Nucleic Acids Research

Publisher: Oxford University Press

ISSN (Print): 0305-1048

ISSN (Electronic): 1362-4962

Publication date (Print): May 2008

Publication date (Electronic): 4 April 2008

Publication date PMC-release: 4 April 2008

Volume: 36

Issue: 9

Pages: 3025-3030

Affiliations

¹College of Chemistry, Sichuan University, Chengdu 610064 and ²State Key Laboratory of Biotherapy, Sichuan University, Chengdu 610041, P.R. China

Author notes

*To whom correspondence should be addressed. +86 28 89005151; +86 28 85412356; liml@scu.edu.cn

Article

Publisher ID: gkn159

DOI: 10.1093/nar/gkn159

PMC ID: 2396404

PubMed ID: 18390576

SO-VID: 00dc2a9e-6a60-4af3-836f-d0988272e28e

License:

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License ( http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

History

Date received: 10 January 2008

Date revision received: 3 March 2008

Date accepted: 20 March 2008
