Reorganizing the protein space at the Universal Protein Resource (UniProt).

There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

Abstract

The mission of UniProt is to support biological research by providing a freely accessible, stable, comprehensive, fully classified, richly and accurately annotated protein sequence knowledgebase, with extensive cross-references and querying interfaces. UniProt is comprised of four major components, each optimized for different uses: the UniProt Archive, the UniProt Knowledgebase, the UniProt Reference Clusters and the UniProt Metagenomic and Environmental Sequence Database. A key development at UniProt is the provision of complete, reference and representative proteomes. UniProt is updated and distributed every 4 weeks and can be accessed online for searches or download at http://www.uniprot.org.

Related collections

Most cited references 11

Record: found
Abstract: found
Article: not found

UniRef: comprehensive and non-redundant UniProt reference clusters.

Baris Suzek, Hongzhan Huang, Peter McGarvey … (2007)

Redundant protein sequences in biological databases hinder sequence similarity searches and make interpretation of search results difficult. Clustering of protein sequence space based on sequence similarity helps organize all sequences into manageable datasets and reduces sampling bias and overrepresentation of sequences. The UniRef (UniProt Reference Clusters) provide clustered sets of sequences from the UniProt Knowledgebase (UniProtKB) and selected UniProt Archive records to obtain complete coverage of sequence space at several resolutions while hiding redundant sequences. Currently covering >4 million source sequences, the UniRef100 database combines identical sequences and subfragments from any source organism into a single UniRef entry. UniRef90 and UniRef50 are built by clustering UniRef100 sequences at the 90 or 50% sequence identity levels. UniRef100, UniRef90 and UniRef50 yield a database size reduction of approximately 10, 40 and 70%, respectively, from the source sequence set. The reduced redundancy increases the speed of similarity searches and improves detection of distant relationships. UniRef entries contain summary cluster and membership information, including the sequence of a representative protein, member count and common taxonomy of the cluster, the accession numbers of all the merged entries and links to rich functional annotation in UniProtKB to facilitate biological discovery. UniRef has already been applied to broad research areas ranging from genome annotation to proteomics data analysis. UniRef is updated biweekly and is available for online search and retrieval at http://www.uniprot.org, as well as for download at ftp://ftp.uniprot.org/pub/databases/uniprot/uniref. Supplementary data are available at Bioinformatics online.

0 comments Cited 593 times – based on 0 reviews      Review now

Bookmark

Record: found
Abstract: found
Article: not found

Ensembl 2009

T. J. P. Hubbard, B. L. Aken, S. Ayling … (2009)

The Ensembl project (http://www.ensembl.org) is a comprehensive genome information system featuring an integrated set of genome annotation, databases, and other information for chordate, selected model organism and disease vector genomes. As of release 51 (November 2008), Ensembl fully supports 45 species, and three additional species have preliminary support. New species in the past year include orangutan and six additional low coverage mammalian genomes. Major additions and improvements to Ensembl since our previous report include a major redesign of our website; generation of multiple genome alignments and ancestral sequences using the new Enredo-Pecan-Ortheus pipeline and development of our software infrastructure, particularly to support the Ensembl Genomes project (http://www.ensemblgenomes.org/).

0 comments Cited 366 times – based on 0 reviews      Review now

Bookmark

Record: found
Abstract: found
Article: not found

The International Protein Index: an integrated database for proteomics experiments.

Paul Kersey, Jorge Duarte, Allyson Williams … (2004)

Despite the complete determination of the genome sequence of several higher eukaryotes, their proteomes remain relatively poorly defined. Information about proteins identified by different experimental and computational methods is stored in different databases, meaning that no single resource offers full coverage of known and predicted proteins. IPI (the International Protein Index) has been developed to address these issues and offers complete nonredundant data sets representing the human, mouse and rat proteomes, built from the Swiss-Prot, TrEMBL, Ensembl and RefSeq databases.

0 comments Cited 191 times – based on 0 reviews      Review now

Bookmark

All references

Author and article information

Journal

Journal ID (iso-abbrev): Nucleic Acids Res

Title: Nucleic acids research

Publisher: Oxford University Press (OUP)

ISSN (Electronic): 1362-4962

ISSN (Print): 0305-1048

Publication date (Electronic): Jan 2012

Volume: 40

Issue: Database issue

Affiliations

[1 ] The EMBL Outstation, The European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.

Article

Publisher Item ID: gkr981

DOI: 10.1093/nar/gkr981

PMC ID: 3245120

PubMed ID: 22102590

SO-VID: 9b17224a-97d3-47c4-b609-eff68ea62660

History

Data availability:

Reorganizing the protein space at the Universal Protein Resource (UniProt).

Read this article at

Abstract

Related collections

Resource Identification

Most cited references 11

UniRef: comprehensive and non-redundant UniProt reference clusters.

Ensembl 2009

The International Protein Index: an integrated database for proteomics experiments.

Author and article information

Journal

Affiliations

Article

History

Comments

Comment on this article

Similar content 132

Cited by 442