Reanalysis of ProteomicsDB Using an Accurate, Sensitive, and Scalable False Discovery Rate Estimation Approach for Protein Groups

Abstract

Estimating false discovery rates (FDRs) of protein identification continues to be an important topic in mass spectrometry–based proteomics, particularly when analyzing very large datasets. One performant method for this purpose is the Picked Protein FDR approach, which is based on a protein-level target-decoy competition strategy that ensures that FDRs scale to large datasets. Here, we present an extension to this method that can also deal with protein groups, that is, proteins that share common peptides, such as protein isoforms of the same gene. To obtain well-calibrated FDR estimates that preserve protein identification sensitivity, we introduce two novel ideas: first, the picked group target-decoy strategy and, second, the rescued subset grouping strategy. Using entrapment searches and simulated data for validation, we demonstrate that the new Picked Protein Group FDR method produces accurate protein group-level FDR estimates regardless of the size of the dataset. The validation analysis also uncovered that applying the commonly used Occam’s razor principle leads to anticonservative FDR estimates for large datasets. This is not the case for the Picked Protein Group FDR method. Reanalysis of deep proteomes of 29 human tissues showed that the new method identified up to 4% more protein groups than MaxQuant. Applying the method to the reanalysis of the entire human section of ProteomicsDB led to the identification of 18,000 protein groups at 1% protein group-level FDR. The analysis also showed that about 1250 genes were represented by ≥2 identified protein groups. To make the method accessible to the proteomics community, we provide a software tool, including a graphical user interface, that enables merging results from multiple MaxQuant searches into a single list of identified and quantified protein groups.
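The protein-level target-decoy competition mentioned in the abstract can be illustrated with a minimal sketch. It shows the general picked idea only (each target protein competes with its paired decoy, and only the higher-scoring one is kept before FDRs are estimated), not the authors' Picked Protein Group FDR implementation. The function name `picked_protein_qvalues`, the dictionary inputs keyed by target protein name, and the rule that ties favor the target are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def picked_protein_qvalues(
    target_scores: Dict[str, float],
    decoy_scores: Dict[str, float],
) -> List[Tuple[str, float, float]]:
    """Return (protein, score, q_value) for targets that survive the pick.

    Both dictionaries are keyed by the target protein name; a decoy score is
    the score of the decoy counterpart of that target (hypothetical layout).
    """
    picked = []  # entries of (score, is_decoy, protein)
    for protein in set(target_scores) | set(decoy_scores):
        t = target_scores.get(protein)
        d = decoy_scores.get(protein)
        # Keep only the better-scoring member of each target-decoy pair.
        # Assumption: ties are resolved in favor of the target.
        if d is None or (t is not None and t >= d):
            picked.append((t, False, protein))
        else:
            picked.append((d, True, protein))

    # Best scores first; on equal scores, targets sort before decoys.
    picked.sort(key=lambda entry: (-entry[0], entry[1]))

    # Running FDR estimate: decoys accepted / targets accepted.
    fdrs = []
    n_targets = n_decoys = 0
    for _, is_decoy, _ in picked:
        if is_decoy:
            n_decoys += 1
        else:
            n_targets += 1
        fdrs.append(n_decoys / max(n_targets, 1))

    # q-value: the smallest FDR at this or any more permissive threshold.
    qvalues = [0.0] * len(picked)
    running_min = float("inf")
    for i in range(len(picked) - 1, -1, -1):
        running_min = min(running_min, fdrs[i])
        qvalues[i] = min(running_min, 1.0)

    return [
        (protein, score, q)
        for (score, is_decoy, protein), q in zip(picked, qvalues)
        if not is_decoy
    ]


example = picked_protein_qvalues(
    {"A": 10.0, "B": 8.0, "C": 3.0, "D": 2.0},
    {"A": 1.0, "C": 5.0, "D": 1.0},
)
```

In this toy input, protein C loses to its decoy and is dropped, so the decoy counts once against the remaining targets. Because each pair contributes at most one entry, the number of decoys cannot grow faster than the number of pairs, which is the intuition behind the scaling behavior described in the abstract.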

Highlights

• Evaluating protein group FDR estimation methods with entrapment and simulated data.
• Accurate & sensitive protein group FDR method on databases with protein isoforms.
• Tool for combining multiple large-scale MaxQuant searches on protein group-level.
• Analysis on ProteomicsDB identified >1200 human genes with multiple protein groups.

In Brief

Distinguishing between protein products of a gene is complicated by the many peptides that such isoforms have in common. Grouping indistinguishable proteins alleviates this issue but leads to problems with estimating false discovery rates (FDRs) in large-scale experiments as false positives accumulate. Here, protein group FDR estimation methods were evaluated on accuracy and sensitivity. Our new Picked Protein Group FDR method performed best, and reanalysis of the draft human proteome in ProteomicsDB found >1200 genes with multiple protein products.
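As a rough illustration of what grouping indistinguishable proteins means, the sketch below places proteins with exactly the same set of identified peptides into one group. This is the simplest form of grouping and is not the rescued subset grouping strategy of the paper; proteins whose peptides are only a subset of another protein's peptides are left as separate groups here. The function name `group_indistinguishable` and the example protein and peptide identifiers are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set


def group_indistinguishable(protein_peptides: Dict[str, Set[str]]) -> List[List[str]]:
    """Group proteins that are supported by an identical set of peptides."""
    groups = defaultdict(list)
    for protein, peptides in protein_peptides.items():
        # A frozenset is hashable, so identical peptide sets share one key.
        groups[frozenset(peptides)].append(protein)
    # Sort members and groups so the output order is deterministic.
    return sorted(sorted(members) for members in groups.values())


# Hypothetical example: two isoforms share all identified peptides.
protein_groups = group_indistinguishable(
    {
        "GENE1-iso1": {"AAK", "CCR"},
        "GENE1-iso2": {"AAK", "CCR"},
        "GENE2": {"DDK"},
    }
)
```

Here the two isoforms of the hypothetical GENE1 cannot be told apart from the peptide evidence, so they are reported as a single protein group, while GENE2 forms its own group.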

Author and article information

Contributors

Matthew The

Mathias Wilhelm

Journal

Journal ID (nlm-ta): Mol Cell Proteomics

Journal ID (iso-abbrev): Mol Cell Proteomics

Title: Molecular & Cellular Proteomics : MCP

Publisher: American Society for Biochemistry and Molecular Biology

ISSN (Print): 1535-9476

ISSN (Electronic): 1535-9484

Publication date PMC-release: 01 November 2022

Publication date Collection: December 2022

Publication date (Electronic): 01 November 2022

Volume: 21

Issue: 12

Electronic Location Identifier: 100437

Affiliations

[1] Chair of Proteomics and Bioanalytics, Technical University of Munich (TUM), Freising, Germany

[2] Bavarian Biomolecular Mass Spectrometry Center (BayBioMS), Technical University of Munich (TUM), Freising, Germany

[3] Computational Mass Spectrometry, Technical University of Munich (TUM), Freising, Germany

Author notes

[∗] For correspondence: Matthew The, matthew.the@tum.de; Mathias Wilhelm, mathias.wilhelm@tum.de

Article

Publisher Item ID: S1535-9476(22)00245-6 Publisher ID: 100437

DOI: 10.1016/j.mcpro.2022.100437

PMC ID: 9718969

PubMed ID: 36328188

SO-VID: 77f1b441-47aa-4aad-85e7-20cfe9dec4cc

License:

This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

History

Date received: 1 June 2022

Date revision received: 16 October 2022
