      Spectral Prediction Features as a Solution for the Search Space Size Problem in Proteogenomics

      research-article


          Abstract

          Proteogenomics approaches often struggle to distinguish true from false peptide-to-spectrum matches as the database size grows. However, features extracted from tandem mass spectrometry intensity predictors can increase the peptide identification rate and provide extra confidence for peptide-to-spectrum matching in a proteogenomics context. To that end, features from the spectral intensity pattern predictors MS²PIP and Prosit were combined with the canonical scores from MaxQuant in the Percolator postprocessing tool, for protein sequence databases constructed from ribosome profiling and nanopore RNA-Seq analyses. The presented results provide evidence that this approach enhances both the identification rate and the validation stringency in a proteogenomic setting.
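To make the idea of intensity-derived features concrete, the following is a minimal sketch of how an observed fragment intensity vector might be compared with a predicted one to yield similarity features for rescoring. It is not the feature set used in the article: the function names, the choice of three features, and the assumption that both vectors are already aligned to the same fragment ions are all illustrative. The spectral angle follows one commonly used normalized definition, 1 - 2·arccos(cosine similarity)/π.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long intensity vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant vector
    return cov / math.sqrt(var_x * var_y)


def spectral_angle(x, y):
    """Normalized spectral angle: 1 for identical direction, 0 for orthogonal."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    cosine = max(-1.0, min(1.0, dot / (norm_x * norm_y)))
    return 1.0 - 2.0 * math.acos(cosine) / math.pi


def max_normalize(x):
    """Scale intensities so that the most intense peak equals 1."""
    top = max(x)
    return [v / top for v in x] if top > 0 else list(x)


def intensity_features(observed, predicted):
    """Hypothetical per-PSM feature dictionary (illustrative only)."""
    if len(observed) != len(predicted):
        raise ValueError("vectors must cover the same fragment ions")
    obs = max_normalize(observed)
    pred = max_normalize(predicted)
    mean_abs_diff = sum(abs(a - b) for a, b in zip(obs, pred)) / len(obs)
    return {
        "pearson": pearson(obs, pred),
        "spectral_angle": spectral_angle(obs, pred),
        "mean_abs_diff": mean_abs_diff,
    }
```

In a rescoring setup, features of this kind would be appended to the search engine scores for each peptide-to-spectrum match before postprocessing; how the article's pipeline does this in detail is not specified in the abstract.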

          Highlights

          • First proteogenomics with PSM rescoring using machine learning–predicted spectra

          • Demonstrated on both ribosome profiling and nanopore RNA-Seq–derived databases

          • Rescoring leads to elevated stringency and increased identification rates

          • Rescoring compensates for the search space size issues in proteogenomics

          In Brief

          Proteogenomics suffers from statistical issues because the sequencing information inflates the database size. To compensate, rescoring with the machine learning–based spectrum predictors MS²PIP and Prosit was implemented in a proteogenomics approach and demonstrated for both ribosome profiling and nanopore RNA-Seq–derived databases. Postprocessing with Percolator showed that these techniques recover, and often even elevate, stringency levels and identification rates. The approach thereby allows novel proteoforms to be validated through proteogenomics with unsurpassed confidence levels.
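The statistical issue can be illustrated with a simple target-decoy estimate of the false discovery rate. The sketch below is a generic textbook-style calculation, not Percolator's implementation and not the article's pipeline; the function names and the `(score, is_decoy)` input format are assumptions made for illustration. A larger database tends to produce more high-scoring random matches, which in this picture shows up as decoys ranking higher and fewer target matches passing a fixed q-value threshold.

```python
def q_values(psms):
    """Estimate q-values from (score, is_decoy) pairs; higher score is better.

    Returns (score, is_decoy, q_value) tuples sorted by descending score.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = 0
    decoys = 0
    fdrs = []
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Simple FDR estimate at this score threshold: decoys / targets.
        fdrs.append(min(1.0, decoys / max(targets, 1)))

    # q-value: the lowest FDR at which this PSM would still be accepted.
    qvals = []
    running_min = 1.0
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        qvals.append(running_min)
    qvals.reverse()
    return [(score, is_decoy, q)
            for (score, is_decoy), q in zip(ranked, qvals)]


def count_accepted(psms, threshold=0.01):
    """Number of target PSMs with a q-value at or below the threshold."""
    return sum(1 for _, is_decoy, q in q_values(psms)
               if not is_decoy and q <= threshold)


# Toy data: three target matches and one decoy match.
example = [(10.0, False), (9.0, False), (8.0, True), (7.0, False)]
accepted_strict = count_accepted(example, threshold=0.01)
accepted_loose = count_accepted(example, threshold=0.5)
```

With the toy data, the lowest-scoring target ranks below a decoy and therefore only passes at the looser threshold. Rescoring aims to improve the separation between target and decoy scores so that more true matches are retained at a strict threshold.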


                Author and article information

                Journal: Molecular & Cellular Proteomics (Mol Cell Proteomics)
                Publisher: American Society for Biochemistry and Molecular Biology
                ISSN: 1535-9476, 1535-9484
                Published: 03 April 2021
                Volume: 20
                Article number: 100076
                Affiliations
                [1 ]BioBix, Lab of Bioinformatics and Computational Genomics, Department of Mathematical Modeling, Statistics and Bioinformatics, Faculty of Bioscience Engineering, Ghent University, Ghent, Belgium
                [2 ]OHMX.bio, Ghent, Belgium
                [3 ]Chair of Proteomics and Bioanalytics, Technical University of Munich, Freising, Germany
                [4 ]Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, Ghent, Belgium
                [5 ]VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium
                Author notes
                For correspondence: Gerben Menschaert, Gerben.Menschaert@UGent.be
                Article
                PII: S1535-9476(21)00049-9
                DOI: 10.1016/j.mcpro.2021.100076
                PMCID: PMC8214147
                PMID: 33823297
                6e6be7f3-452a-48b3-9581-5af4c3a1bc58
                © 2021 The Authors

                This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

                History: 27 November 2020; 4 March 2021
                Categories
                Technological Innovation and Resources

                Molecular biology
                Keywords: proteogenomics, spectrum predictor, nanopore sequencing, ribosome profiling, RNA-Seq, intensity features, machine learning, proteoform, random forest, deep learning
                Abbreviations: cDNA, complementary DNA; FDR, false discovery rate; MS/MS, tandem mass spectrometry; PEP, posterior error probability; PSM, peptide-to-spectrum match
