
      Accounting for multiple imputation-induced variability for differential analysis in mass spectrometry-based label-free quantitative proteomics


          Abstract

          Imputing missing values is common practice in label-free quantitative proteomics. Imputation aims to replace a missing value with a user-defined one. However, downstream analyses rarely account for the imputation itself: imputed datasets are often treated as if they had always been complete, so the uncertainty due to the imputation is not adequately taken into account. We provide a rigorous multiple imputation strategy that, thanks to Rubin’s rules, leads to a less biased estimation of the variability of the parameters. The resulting variance estimator of the peptide intensities is then moderated using Bayesian hierarchical models, and is finally included in moderated t-test statistics to produce differential analysis results. This workflow can be used at both the peptide and the protein level in quantification datasets: an aggregation step is included to obtain protein-level results from peptide-level quantification data. Our methodology, named mi4p, was compared to the state-of-the-art limma workflow implemented in the DAPAR R package, on both simulated and real datasets. We observed a trade-off between sensitivity and specificity, while mi4p outperformed DAPAR overall in terms of F-score.
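As a rough illustration of the pooling step named in the abstract, Rubin's rules combine the estimates obtained on m imputed datasets into a single estimate whose variance adds a between-imputation term to the average within-imputation variance. The sketch below is a minimal standard-library Python version under stated assumptions, not the mi4p implementation (which is an R package): the function name `pool_rubin` and the input numbers are hypothetical.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one parameter across m imputed datasets with Rubin's rules.

    estimates: point estimate obtained on each imputed dataset
    variances: squared standard error of that estimate on each dataset
    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")
    pooled = mean(estimates)        # average of the m point estimates
    within = mean(variances)        # W: average within-imputation variance
    between = variance(estimates)   # B: sample variance across imputations
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical mean difference for one peptide, estimated on 3 imputed datasets.
est, tot = pool_rubin([1.0, 1.2, 1.4], [0.04, 0.05, 0.03])
print(est, tot)
```

Treating a single imputed dataset as if it were complete amounts to keeping only the within-imputation term. In this made-up example that would give a variance of 0.04, whereas the pooled total is about 0.093.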

          Author summary

          Statistical inference methods commonly used in quantitative proteomics are based on the measurement of peptide intensities. They allow protein abundances to be deduced, provided that enough peptides per protein are available. However, they do not satisfactorily handle peptides or proteins whose intensities are missing under certain conditions, even though these are particularly interesting from a biological or medical point of view, since they may explain a difference between the groups being compared. Some state-of-the-art software for the statistical processing of proteomics data imputes these missing values, while other tools simply remove proteins with too many missing peptides. When imputation methods are used, notably multiple imputation techniques, the statistical treatment is not entirely satisfactory: even though these tools are relevant in this context, the imputed datasets are treated in subsequent analyses as if they had always been complete, so the uncertainty caused by the imputation is not taken into account. These analyses generally conclude with a study of the differences in protein abundances between conditions, using either Student’s or Welch’s t-test in the most rudimentary approaches, or moderated t-test techniques based on empirical Bayes approaches. We therefore propose a new methodology that starts by imputing missing values at the peptide level and estimating the uncertainty associated with this imputation, and then naturally extends current moderated variance estimation techniques by incorporating this uncertainty.
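To make the final testing step more concrete, the following sketch shows how a per-peptide variance can be shrunk toward a prior value before forming a t-statistic, in the spirit of the empirical Bayes moderation used by limma. It is only an illustration under assumptions: the prior degrees of freedom `d0` and the prior variance `s0_sq` are supplied by hand rather than estimated from all peptides, the function names are hypothetical, and the way mi4p folds the imputation variance into the moderated estimator may differ in detail.

```python
from math import sqrt


def moderated_variance(s_sq, d, s0_sq, d0):
    """Shrink a sample variance s_sq (d degrees of freedom) toward a
    prior variance s0_sq (d0 prior degrees of freedom)."""
    return (d0 * s0_sq + d * s_sq) / (d0 + d)


def moderated_t(diff, s_sq, d, s0_sq, d0, n1, n2):
    """Two-group t-statistic computed with the shrunken variance.

    diff: difference in mean log-intensity between the two conditions
    n1, n2: number of replicates in each condition
    """
    s_tilde_sq = moderated_variance(s_sq, d, s0_sq, d0)
    return diff / sqrt(s_tilde_sq * (1 / n1 + 1 / n2))


# Hypothetical peptide: 3 replicates per condition, hence d = 4 residual
# degrees of freedom; the prior values are assumed, not estimated.
t = moderated_t(diff=1.2, s_sq=0.02, d=4, s0_sq=0.10, d0=4, n1=3, n2=3)
print(round(t, 3))
```

With such a small sample variance, the unmoderated statistic would be about 10.4. Shrinking the variance toward the assumed prior brings it down to 6.0, which is the stabilizing effect that moderation is meant to provide when there are few replicates.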


                Author and article information

                Contributors
                Roles: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Resources, Software, Validation, Visualization, Writing – original draft, Writing – review & editing
                Roles: Conceptualization, Funding acquisition, Project administration, Supervision, Validation, Writing – original draft, Writing – review & editing
                Roles: Conceptualization, Funding acquisition, Methodology, Project administration, Software, Supervision, Writing – original draft, Writing – review & editing
                Role: Editor
                Journal
                PLoS Computational Biology (PLoS Comput Biol)
                Public Library of Science (San Francisco, CA, USA)
                ISSN: 1553-734X, 1553-7358
                29 August 2022
                Volume 18, Issue 8, e1010420
                Affiliations
                [1] Institut de Recherche Mathématique Avancée, UMR 7501, CNRS-Université de Strasbourg, Strasbourg, France
                [2] Laboratoire de Spectrométrie de Masse Bio-Organique, Institut Pluridisciplinaire Hubert Curien, UMR 7178, CNRS-Université de Strasbourg, Strasbourg, France
                [3] Laboratoire Mathématiques appliquées à Paris 5, UMR 8145, CNRS-Université Paris Cité, Paris, France
                [4] Infrastructure Nationale de Protéomique ProFi - FR2048, 67087 Strasbourg, France
                [5] Laboratoire Informatique et Société Numérique, Université de Technologie de Troyes, Troyes, France
                University of California San Diego, UNITED STATES
                Author notes

                The authors have declared that no competing interests exist.

                Author information
                https://orcid.org/0000-0001-8956-8388
                https://orcid.org/0000-0002-0079-319X
                https://orcid.org/0000-0002-0837-8281
                Article
                Manuscript number: PCOMPBIOL-D-22-00385
                DOI: 10.1371/journal.pcbi.1010420
                PMCID: PMC9462777
                PMID: 36037245
                ebfe04f7-0272-4d54-8033-aa431be32e58
                © 2022 Chion et al

                This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

                History
                Received: 11 March 2022
                Accepted: 21 July 2022
                Page count
                Figures: 11, Tables: 6, Pages: 26
                Funding
                Funded by: Agence Nationale de la Recherche (FR)
                Award ID: ANR-11-LABX-0055_IRMIA
                This work was funded through a PhD grant (2018-2021) awarded to MC and received by FB and CC from the Agence Nationale de la Recherche (ANR) through the Labex IRMIA [ANR-11-LABX-0055 IRMIA]. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
                Categories
                Research Article
                Research and Analysis Methods > Mathematical and Statistical Techniques > Statistical Methods > Maximum Likelihood Estimation
                Physical Sciences > Mathematics > Statistics > Statistical Methods > Maximum Likelihood Estimation
                Research and Analysis Methods > Animal Studies > Experimental Organism Systems > Model Organisms > Arabidopsis Thaliana
                Research and Analysis Methods > Model Organisms > Arabidopsis Thaliana
                Biology and Life Sciences > Organisms > Eukaryota > Plants > Brassica > Arabidopsis Thaliana
                Research and Analysis Methods > Animal Studies > Experimental Organism Systems > Plant and Algal Models > Arabidopsis Thaliana
                Research and Analysis Methods > Animal Studies > Experimental Organism Systems > Model Organisms > Saccharomyces Cerevisiae
                Research and Analysis Methods > Model Organisms > Saccharomyces Cerevisiae
                Biology and Life Sciences > Organisms > Eukaryota > Fungi > Yeast > Saccharomyces > Saccharomyces Cerevisiae
                Research and Analysis Methods > Animal Studies > Experimental Organism Systems > Yeast and Fungal Models > Saccharomyces Cerevisiae
                Research and Analysis Methods > Database and Informatics Methods > Biological Databases > Proteomic Databases
                Biology and Life Sciences > Biochemistry > Proteomics > Proteomic Databases
                Research and Analysis Methods > Simulation and Modeling
                Computer and Information Sciences > Software Engineering > Computer Software
                Engineering and Technology > Software Engineering > Computer Software
                Research and Analysis Methods > Research Design > Experimental Design
                Research and Analysis Methods > Mathematical and Statistical Techniques > Statistical Methods > Multivariate Analysis > Principal Component Analysis
                Physical Sciences > Mathematics > Statistics > Statistical Methods > Multivariate Analysis > Principal Component Analysis
                Custom metadata
                vor-update-to-uncorrected-proof
                2022-09-09
                Our mi4p algorithm is implemented in R in the mi4p package, which is publicly available on CRAN. The development version, as well as the R scripts that produced the results presented, can also be found in a GitHub repository (https://github.com/mariechion/mi4p). The spiked yeast dataset and the spiked Arabidopsis thaliana dataset are public and accessible on the ProteomeXchange website under the identifiers PXD003841 and PXD027800.

                Quantitative & Systems biology