Inparanoid: a comprehensive database of eukaryotic orthologs

O'Brien, Kevin P.; Remm, Maido; Sonnhammer, Erik L. L.

doi:10.1093/nar/gki107


Author(s): Kevin P. O'Brien, Maido Remm¹, Erik L. L. Sonnhammer

Publication date (Electronic): 17 December 2004

Journal: Nucleic Acids Research

Publisher: Oxford University Press


Abstract

The Inparanoid eukaryotic ortholog database (http://inparanoid.cgb.ki.se/) is a collection of pairwise ortholog groups between 17 whole genomes: Anopheles gambiae, Caenorhabditis briggsae, Caenorhabditis elegans, Drosophila melanogaster, Danio rerio, Takifugu rubripes, Gallus gallus, Homo sapiens, Mus musculus, Pan troglodytes, Rattus norvegicus, Oryza sativa, Plasmodium falciparum, Arabidopsis thaliana, Escherichia coli, Saccharomyces cerevisiae and Schizosaccharomyces pombe. Complete proteomes for these genomes were derived from Ensembl and UniProt and compared pairwise using Blast, followed by a clustering step using the Inparanoid program. An Inparanoid cluster is seeded by a reciprocally best-matching ortholog pair, around which inparalogs (should they exist) are gathered independently, while outparalogs are excluded. The ortholog clusters can be searched on the website using Ensembl gene/protein or UniProt identifiers, annotation text or by Blast alignment against our protein datasets. The entire dataset can be downloaded, as can the Inparanoid program itself.
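
The seeding and gathering steps described in the abstract can be illustrated with a minimal Python sketch. This is not the Inparanoid program itself: the function names, the toy protein identifiers and the scores are all hypothetical, and the inparalog rule used here (a same-species protein joins the cluster when it scores higher against the seed protein than the seed pair scores across species) is a simplifying assumption, with no confidence values or cluster merging.

```python
def best_hits(scores):
    """Map each query protein to its highest-scoring hit.

    `scores` maps (query, hit) pairs to a similarity score,
    for example a Blast bit score (hypothetical toy input).
    """
    best = {}
    for (query, hit), score in scores.items():
        if query not in best or score > scores[(query, best[query])]:
            best[query] = hit
    return best


def seed_pairs(a_vs_b, b_vs_a):
    """Return reciprocally best-matching (a, b) pairs, used as cluster seeds."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


def gather_inparalogs(seed, seed_score, within_species):
    """Collect same-species proteins that score higher against the seed
    than the seed pair scores across species (assumed simplified rule)."""
    return sorted(
        hit
        for (query, hit), score in within_species.items()
        if query == seed and hit != seed and score > seed_score
    )


# Toy scores between species A and species B (all values invented).
a_vs_b = {("A1", "B1"): 200, ("A1", "B2"): 90, ("A2", "B1"): 150, ("A3", "B2"): 120}
b_vs_a = {("B1", "A1"): 198, ("B1", "A2"): 150, ("B2", "A3"): 118, ("B2", "A1"): 90}
# Toy scores within species A.
a_vs_a = {("A1", "A2"): 260, ("A1", "A3"): 60}

seeds = seed_pairs(a_vs_b, b_vs_a)
inparalogs_of_a1 = gather_inparalogs("A1", a_vs_b[("A1", "B1")], a_vs_a)

print(seeds)
print(inparalogs_of_a1)
```

In this toy data, A2 is closer to A1 than A1 is to its ortholog B1, so it is gathered as an inparalog of the A1/B1 seed, while A3 scores below the seed pair and is left out of that cluster.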


Most cited references 7


The Bioperl toolkit: Perl modules for the life sciences.

Jason E Stajich, David Block, Kris Boulez … (2002)

The Bioperl project is an international open-source collaboration of biologists, bioinformaticians, and computer scientists that has evolved over the past 7 yr into the most comprehensive library of Perl modules available for managing and manipulating life-science information. Bioperl provides an easy-to-use, stable, and consistent programming interface for bioinformatics application programmers. The Bioperl modules have been successfully and repeatedly used to reduce otherwise complex tasks to only a few lines of code. The Bioperl object model has been proven to be flexible enough to support enterprise-level applications such as EnsEMBL, while maintaining an easy learning curve for novice Perl programmers. Bioperl is capable of executing analyses and processing results from programs such as BLAST, ClustalW, or the EMBOSS suite. Interoperation with modules written in Python and Java is supported through the evolving BioCORBA bridge. Bioperl provides access to data stores such as GenBank and SwissProt via a flexible series of sequence input/output modules, and to the emerging common sequence data storage format of the Open Bioinformatics Database Access project. This study describes the overall architecture of the toolkit, the problem domains that it addresses, and gives specific examples of how the toolkit can be used to solve common life-sciences problems. We conclude with a discussion of how the open-source nature of the project has contributed to the development effort.


Automatic clustering of orthologs and in-paralogs from pairwise species comparisons.

M Remm, C. E. V. Storm, E. Sonnhammer (2001)

Orthologs are genes in different species that originate from a single gene in the last common ancestor of these species. Such genes have often retained identical biological roles in the present-day organisms. It is hence important to identify orthologs for transferring functional information between genes in different organisms with a high degree of reliability. For example, orthologs of human proteins are often functionally characterized in model organisms. Unfortunately, orthology analysis between human and e.g. invertebrates is often complex because of large numbers of paralogs within protein families. Paralogs that predate the species split, which we call out-paralogs, can easily be confused with true orthologs. Paralogs that arose after the species split, which we call in-paralogs, however, are bona fide orthologs by definition. Orthologs and in-paralogs are typically detected with phylogenetic methods, but these are slow and difficult to automate. Automatic clustering methods based on two-way best genome-wide matches on the other hand, have so far not separated in-paralogs from out-paralogs effectively. We present a fully automatic method for finding orthologs and in-paralogs from two species. Ortholog clusters are seeded with a two-way best pairwise match, after which an algorithm for adding in-paralogs is applied. The method bypasses multiple alignments and phylogenetic trees, which can be slow and error-prone steps in classical ortholog detection. Still, it robustly detects complex orthologous relationships and assigns confidence values for both orthologs and in-paralogs. The program, called INPARANOID, was tested on all completely sequenced eukaryotic genomes. To assess the quality of INPARANOID results, ortholog clusters were generated from a dataset of worm and mammalian transmembrane proteins, and were compared to clusters derived by manual tree-based ortholog detection methods. 
This study led to the identification with a high degree of confidence of over a dozen novel worm-mammalian ortholog assignments that were previously undetected because of shortcomings of phylogenetic methods. A WWW server that allows searching for orthologs between human and several fully sequenced genomes is installed at http://www.cgb.ki.se/inparanoid/. This is the first comprehensive resource with orthologs of all fully sequenced eukaryotic genomes. Programs and tables of orthology assignments are available from the same location. Copyright 2001 Academic Press.


Orthology, paralogy and proposed classification for paralog subtypes.

Erik L.L. Sonnhammer, Eugene V. Koonin (2002)


Author and article information

Journal

Journal ID (nlm-ta): Nucleic Acids Res

Journal ID (iso-abbrev): Nucleic Acids Res

Title: Nucleic Acids Research

Publisher: Oxford University Press

ISSN (Print): 0305-1048

ISSN (Electronic): 1362-4962

Publication date (Print): 1 January 2005

Publication date (Electronic): 17 December 2004

Volume: 33

Issue: Database Issue

Pages: D476-D480

Affiliations

Center for Genomics and Bioinformatics, Karolinska Institutet, S-171 77 Stockholm, Sweden and ¹Estonian Biocentre and Department of Bioinformatics, Institute of Molecular and Cell Biology, University of Tartu, Estonia

Author notes

[*] To whom correspondence should be addressed. Tel: +46 8 52486395; Fax: +46 8 337983; Email: Erik.Sonnhammer@cgb.ki.se

The online version of this article has been published under an open access model. Users are entitled to use, reproduce, disseminate, or display the open access version of this article for non-commercial purposes provided that: the original authorship is properly and fully attributed; the Journal and Oxford University Press are attributed as the original place of publication with the correct citation details given; if an article is subsequently reproduced or disseminated not in its entirety but only in part or as a derivative work this must be clearly indicated. For commercial re-use permissions, please contact journals.permissions@oupjournals.org.

Article

Publisher ID: gki107

DOI: 10.1093/nar/gki107

PMC ID: 540061

PubMed ID: 15608241


History

Date received: 15 August 2004

Date revision received: 18 October 2004

Date accepted: 18 October 2004
