Pfam: The protein families database in 2021.

There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

Abstract

The Pfam database is a widely used resource for classifying protein sequences into families and domains. Since Pfam was last described in this journal, over 350 new families have been added in Pfam 33.1 and numerous improvements have been made to existing entries. To facilitate research on COVID-19, we have revised the Pfam entries that cover the SARS-CoV-2 proteome, and built new entries for regions that were not covered by Pfam. We have reintroduced Pfam-B which provides an automatically generated supplement to Pfam and contains 136 730 novel clusters of sequences that are not yet matched by a Pfam family. The new Pfam-B is based on a clustering by the MMseqs2 software. We have compared all of the regions in the RepeatsDB to those in Pfam and have started to use the results to build and refine Pfam repeat families. Pfam is freely available for browsing and download at http://pfam.xfam.org/.

Related collections

Most cited references 26

Record: found
Abstract: found
Article: found

Is Open Access

UniProt: a worldwide hub of protein knowledge

(2018)

Abstract The UniProt Knowledgebase is a collection of sequences and annotations for over 120 million proteins across all branches of life. Detailed annotations extracted from the literature by expert curators have been collected for over half a million of these proteins. These annotations are supplemented by annotations provided by rule based automated systems, and those imported from other resources. In this article we describe significant updates that we have made over the last 2 years to the resource. We have greatly expanded the number of Reference Proteomes that we provide and in particular we have focussed on improving the number of viral Reference Proteomes. The UniProt website has been augmented with new data visualizations for the subcellular localization of proteins as well as their structure and interactions. UniProt resources are available under a CC-BY (4.0) license via the web at https://www.uniprot.org/.

0 comments Cited 2677 times – based on 0 reviews      Review now

Bookmark

Record: found
Abstract: found
Article: found

Is Open Access

The Pfam protein families database in 2019

Sara El-Gebali, Jaina Mistry, Alex Bateman … (2018)

Abstract The last few years have witnessed significant changes in Pfam (https://pfam.xfam.org). The number of families has grown substantially to a total of 17,929 in release 32.0. New additions have been coupled with efforts to improve existing families, including refinement of domain boundaries, their classification into Pfam clans, as well as their functional annotation. We recently began to collaborate with the RepeatsDB resource to improve the definition of tandem repeat families within Pfam. We carried out a significant comparison to the structural classification database, namely the Evolutionary Classification of Protein Domains (ECOD) that led to the creation of 825 new families based on their set of uncharacterized families (EUFs). Furthermore, we also connected Pfam entries to the Sequence Ontology (SO) through mapping of the Pfam type definitions to SO terms. Since Pfam has many community contributors, we recently enabled the linking between authorship of all Pfam entries with the corresponding authors’ ORCID identifiers. This effectively permits authors to claim credit for their Pfam curation and link them to their ORCID record.

0 comments Cited 1612 times – based on 0 reviews      Review now

Bookmark

Record: found
Abstract: found
Article: not found

Protein homology detection by HMM-HMM comparison.

Johannes Söding (2005)

Protein homology detection and sequence alignment are at the basis of protein structure prediction, function prediction and evolution. We have generalized the alignment of protein sequences with a profile hidden Markov model (HMM) to the case of pairwise alignment of profile HMMs. We present a method for detecting distant homologous relationships between proteins based on this approach. The method (HHsearch) is benchmarked together with BLAST, PSI-BLAST, HMMER and the profile-profile comparison tools PROF_SIM and COMPASS, in an all-against-all comparison of a database of 3691 protein domains from SCOP 1.63 with pairwise sequence identities below 20%.Sensitivity: When the predicted secondary structure is included in the HMMs, HHsearch is able to detect between 2.7 and 4.2 times more homologs than PSI-BLAST or HMMER and between 1.44 and 1.9 times more than COMPASS or PROF_SIM for a rate of false positives of 10%. Approximately half of the improvement over the profile-profile comparison methods is attributable to the use of profile HMMs in place of simple profiles. Alignment quality: Higher sensitivity is mirrored by an increased alignment quality. HHsearch produced 1.2, 1.7 and 3.3 times more good alignments ('balanced' score >0.3) than the next best method (COMPASS), and 1.6, 2.9 and 9.4 times more than PSI-BLAST, at the family, superfamily and fold level, respectively.Speed: HHsearch scans a query of 200 residues against 3691 domains in 33 s on an AMD64 2GHz PC. This is 10 times faster than PROF_SIM and 17 times faster than COMPASS.

0 comments Cited 982 times – based on 0 reviews      Review now

Bookmark

All references

Author and article information

Journal

Journal ID (iso-abbrev): Nucleic Acids Res

Title: Nucleic acids research

Publisher: Oxford University Press (OUP)

ISSN (Electronic): 1362-4962

ISSN (Print): 0305-1048

Publication date (Electronic): January 08 2021

Volume: 49

Issue: D1

Affiliations

[1 ] European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK.

[2 ] Department of Biochemistry and Biophysics, Science for Life Laboratory, Stockholm University, Box 1031, 17121 Solna, Sweden.

[3 ] Department of Biomedical Sciences, University of Padua, 35131 Padova, Italy.

Article

Publisher Item ID: 5943818

DOI: 10.1093/nar/gkaa913

PMC ID: 7779014

PubMed ID: 33125078

SO-VID: db8cfe38-db7e-4f1b-855c-33112323e1b6

History

Data availability:

Comments

Comment on this article

scite_

Cited by 1,964

See all cited by

Most referenced authors 939

See all reference authors

- Version 1

Pfam: The protein families database in 2021.

Read this article at

Abstract

Related collections

Novel Coronavirus Disease COVID-19

Most cited references 26

UniProt: a worldwide hub of protein knowledge

The Pfam protein families database in 2019

Protein homology detection by HMM-HMM comparison.

Author and article information

Journal

Affiliations

Article

History

Comments

Comment on this article

Similar content 456

Cited by 1,964

Most referenced authors 939