Abstract

With genome sequences complete for human and model organisms, it is essential to understand how individual genes and proteins are organized into biological networks. Much of this organization is revealed by proteomics experiments that now generate torrents of data. Extracting relevant complexes and pathways from high-throughput proteomics data sets remains a challenge, however, and new methods to identify and extract networks are essential. We focus on the problem of building pathways starting from known proteins of interest. We have developed an efficient, greedy algorithm, SEEDY, that extracts biologically relevant networks from protein-protein interaction data, building out from selected seed proteins. The algorithm relies on our previous study establishing statistical confidence levels for interactions generated by two-hybrid screens and inferred from mass spectrometric identification of protein complexes. We demonstrate the ability to extract known yeast complexes from high-throughput protein interaction data with a tunable parameter that governs the trade-off between sensitivity and selectivity. DNA damage repair pathways are presented as a detailed example. We highlight the ability to join heterogeneous data sets, in this case protein-protein interactions and genetic interactions, and the appearance of cross-talk between pathways caused by re-use of shared components.

SIGNIFICANCE AND COMPARISON: The SEEDY algorithm is significant in three respects: it is fast, with running time O[(E + V) log V] for V proteins and E interactions; a single adjustable parameter controls the size of the pathways that are generated; and an associated P-value indicates the statistical confidence that the pathways are enriched for proteins with a coherent function. Previous approaches have focused on extracting sub-networks by identifying motifs enriched in known biological networks. SEEDY provides the complementary ability to perform a directed search based on proteins of interest. SEEDY software (Perl source), data tables and confidence score models (R source) are freely available from the author.
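
The abstract does not spell out the details of SEEDY, so the following is only a minimal sketch of one way a greedy, seed-based expansion with an O[(E + V) log V] running time could look. It assumes (and this is an assumption, not the authors' stated method) that each interaction carries a confidence in (0, 1], that a path's confidence is the product of its edge confidences, and that a single threshold, here the hypothetical parameter `min_confidence`, decides how far the network grows. The function name `grow_network` and the toy protein labels are likewise illustrative.

```python
import heapq
from collections import defaultdict


def grow_network(edges, seeds, min_confidence):
    """Best-first (Dijkstra-style) expansion from seed proteins.

    edges: iterable of (protein_a, protein_b, confidence), 0 < confidence <= 1
    seeds: iterable of starting proteins
    min_confidence: hypothetical threshold; a protein is kept only if its
        best path from some seed has a confidence product >= this value
    Returns a dict mapping each kept protein to its best path confidence.
    """
    # Undirected adjacency list built from the interaction table.
    graph = defaultdict(list)
    for a, b, conf in edges:
        graph[a].append((b, conf))
        graph[b].append((a, conf))

    best = {}
    # heapq is a min-heap, so confidences are stored negated to pop
    # the highest-confidence candidate first.
    heap = [(-1.0, seed) for seed in seeds]
    heapq.heapify(heap)

    while heap:
        neg_conf, protein = heapq.heappop(heap)
        if protein in best:
            continue  # already finalized via a path at least as good
        best[protein] = -neg_conf
        for neighbor, edge_conf in graph[protein]:
            path_conf = -neg_conf * edge_conf
            if neighbor not in best and path_conf >= min_confidence:
                heapq.heappush(heap, (-path_conf, neighbor))
    return best


# Toy interaction table with made-up labels and confidences.
toy_edges = [("S", "A", 0.9), ("A", "B", 0.8), ("B", "C", 0.3), ("S", "D", 0.5)]
strict = grow_network(toy_edges, ["S"], min_confidence=0.6)
loose = grow_network(toy_edges, ["S"], min_confidence=0.2)
print(sorted(strict), sorted(loose))
```

Each interaction is pushed onto the heap at most twice and each heap operation costs O(log V) up to constants, which is consistent with the running time quoted in the abstract. Lowering the threshold yields larger networks, mirroring the sensitivity versus selectivity trade-off described there; the P-value for functional enrichment is not modeled in this sketch.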

Author and article information

PubMed ID: 14555618

DOI: 10.1093/bioinformatics/btg358

ScienceOpen disciplines: Chemistry

Keywords: Algorithms; Combinatorial Chemistry Techniques; Computer Simulation; DNA Repair; physiology; Models, Biological; Protein Binding; Protein Interaction Mapping; methods; Proteins; chemistry; metabolism; Proteome; Quality Control; Signal Transduction; User-Computer Interface

Greedily building protein networks with confidence.
