InterProScan 5: genome-scale protein function classification.

Abstract

Robust large-scale sequence analysis is a major challenge in modern genomic science, where biologists are frequently trying to characterize many millions of sequences. Here, we describe a new Java-based architecture for the widely used protein function prediction software package InterProScan. Developments include improvements and additions to the outputs of the software and the complete reimplementation of the software framework, resulting in a flexible and stable system that can use multiprocessor machines, conventional clusters, or both to achieve scalable distributed data analysis. InterProScan is freely available for download from the EMBL-EBI FTP site, and the open-source code is hosted at Google Code.

Most cited references 8

Gene Ontology: tool for the unification of biology

Michael Ashburner, Catherine A. Ball, Judith Blake … (2000)

Genomic sequencing has made it clear that a large fraction of the genes specifying the core biological functions are shared by all eukaryotes. Knowledge of the biological role of such shared proteins in one organism can often be transferred to other organisms. The goal of the Gene Ontology Consortium is to produce a dynamic, controlled vocabulary that can be applied to all eukaryotes even as knowledge of gene and protein roles in cells is accumulating and changing. To this end, three independent ontologies accessible on the World-Wide Web (http://www.geneontology.org) are being constructed: biological process, molecular function and cellular component.

Reorganizing the protein space at the Universal Protein Resource (UniProt)

Emmanuel Boutet, Claire O'Donovan (2011)

The mission of UniProt is to support biological research by providing a freely accessible, stable, comprehensive, fully classified, richly and accurately annotated protein sequence knowledgebase, with extensive cross-references and querying interfaces. UniProt is comprised of four major components, each optimized for different uses: the UniProt Archive, the UniProt Knowledgebase, the UniProt Reference Clusters and the UniProt Metagenomic and Environmental Sequence Database. A key development at UniProt is the provision of complete, reference and representative proteomes. UniProt is updated and distributed every 4 weeks and can be accessed online for searches or download at http://www.uniprot.org.

A new generation of homology search tools based on probabilistic inference.

Sean R. Eddy (2009)

Many theoretical advances have been made in applying probabilistic inference methods to improve the power of sequence homology searches, yet the BLAST suite of programs is still the workhorse for most of the field. The main reason for this is practical: BLAST's programs are about 100-fold faster than the fastest competing implementations of probabilistic inference methods. I describe recent work on the HMMER software suite for protein sequence analysis, which implements probabilistic inference using profile hidden Markov models. Our aim in HMMER3 is to achieve BLAST's speed while further improving the power of probabilistic inference based methods. HMMER3 implements a new probabilistic model of local sequence alignment and a new heuristic acceleration algorithm. Combined with efficient vector-parallel implementations on modern processors, these improvements synergize. HMMER3 uses more powerful log-odds likelihood scores (scores summed over alignment uncertainty, rather than scoring a single optimal alignment); it calculates accurate expectation values (E-values) for those scores without simulation using a generalization of Karlin/Altschul theory; it computes posterior distributions over the ensemble of possible alignments and returns posterior probabilities (confidences) in each aligned residue; and it does all this at an overall speed comparable to BLAST. The HMMER project aims to usher in a new generation of more powerful homology search tools based on probabilistic inference methods.

Author and article information

Journal

Journal ID (iso-abbrev): Bioinformatics

Title: Bioinformatics (Oxford, England)

Publisher: Oxford University Press (OUP)

ISSN (Electronic): 1367-4811

ISSN (Print): 1367-4803

Publication date (Electronic): May 01 2014

Volume: 30

Issue: 9

Affiliations

[1] European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton CB10 1SD and Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK.

Article

Publisher Item ID: btu031

DOI: 10.1093/bioinformatics/btu031

PMC ID: 3998142

PubMed ID: 24451626

SO-VID: 2868b700-e75c-43fb-9b9b-7b938fa6e4be
