      Double-check: validation of diagnostic statistics for PLS-DA models in metabolomics studies

      research-article


          Abstract

          Partial Least Squares-Discriminant Analysis (PLS-DA) is a PLS regression method with a special binary ‘dummy’ y-variable, and it is commonly used for classification and biomarker selection in metabolomics studies. Several statistical approaches are currently used to validate the outcomes of PLS-DA analyses, e.g. double cross-validation procedures or permutation testing. However, there is great inconsistency in the optimization and performance assessment of PLS-DA models, because many different diagnostic statistics are currently employed in metabolomics data analyses. In this paper, the properties of four diagnostic statistics of PLS-DA are discussed: the number of misclassifications (NMC), the Area Under the Receiver Operating Characteristic (AUROC), Q² and Discriminant Q² (DQ²). All four diagnostic statistics are used in the optimization and performance assessment of PLS-DA models of three metabolomics data sets of different sizes, obtained with two different types of analytical platforms and with different levels of known differences between two groups: control and case. The statistical significance of the obtained PLS-DA models was evaluated with permutation testing. PLS-DA models obtained with NMC and AUROC are more powerful in detecting very small differences between groups than models obtained with Q² and DQ². The reproducibility of the obtained PLS-DA model outcomes, model complexity and permutation test distributions are also investigated to explain this phenomenon. DQ² and Q² (in contrast to NMC and AUROC) prefer PLS-DA models with lower complexity and require a higher number of permutation tests and submodels to accurately estimate the statistical significance of the model performance. NMC and AUROC seem to be more efficient and more reliable diagnostic statistics and should be recommended in two-group discrimination metabolomic studies.

          Electronic supplementary material

          The online version of this article (doi:10.1007/s11306-011-0330-3) contains supplementary material, which is available to authorized users.
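The four diagnostic statistics named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code. It assumes 0/1 class labels, a fixed classification threshold of 0.5 for NMC, and a DQ² definition in which predictions lying beyond the class label (below 0 for class 0, above 1 for class 1) contribute no error. The function names and the `permutation_p_value` helper are hypothetical. In a real analysis, `y_hat` would hold cross-validated PLS-DA predictions and the null values would come from refitting the model on label-permuted data, neither of which is shown here.

```python
import numpy as np

def nmc(y, y_hat, threshold=0.5):
    """Number of misclassifications for 0/1 labels (threshold is an assumption)."""
    y, y_hat = np.asarray(y), np.asarray(y_hat, dtype=float)
    return int(np.sum((y_hat > threshold).astype(int) != y))

def auroc(y, y_hat):
    """AUROC as the fraction of (case, control) pairs ranked correctly; ties count 0.5."""
    y, y_hat = np.asarray(y), np.asarray(y_hat, dtype=float)
    pos, neg = y_hat[y == 1], y_hat[y == 0]
    diff = pos[:, None] - neg[None, :]
    return float((np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size)

def q2(y, y_hat):
    """Q2 = 1 - PRESS / TSS, computed on the supplied predictions."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    press = np.sum((y - y_hat) ** 2)
    tss = np.sum((y - y.mean()) ** 2)
    return float(1.0 - press / tss)

def dq2(y, y_hat):
    """Discriminant Q2 (assumed definition): predictions beyond the label are not penalised."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    resid = y - y_hat
    beyond = ((y == 0) & (y_hat < 0)) | ((y == 1) & (y_hat > 1))
    resid = np.where(beyond, 0.0, resid)  # zero the error for "over-confident" predictions
    press = np.sum(resid ** 2)
    tss = np.sum((y - y.mean()) ** 2)
    return float(1.0 - press / tss)

def permutation_p_value(observed, null_values, higher_is_better=True):
    """Share of permuted-label results at least as good as the observed one (+1 correction)."""
    null_values = np.asarray(null_values, dtype=float)
    if higher_is_better:
        extreme = np.sum(null_values >= observed)
    else:
        extreme = np.sum(null_values <= observed)
    return float((extreme + 1) / (null_values.size + 1))

if __name__ == "__main__":
    # Toy labels and predictions, invented purely for illustration.
    y = [0, 0, 0, 1, 1, 1]
    y_hat = [-0.2, 0.1, 0.6, 0.4, 0.9, 1.3]
    print(nmc(y, y_hat), auroc(y, y_hat), q2(y, y_hat), dq2(y, y_hat))
```

NMC is better when lower, while AUROC, Q² and DQ² are better when higher, which is why the helper takes a `higher_is_better` flag. Under the assumed definition, DQ² can never be lower than Q² on the same predictions, because it only removes error terms.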


                Author and article information

                Contributors
                +31-20-5256546, +31-20-5256971, J.A.Westerhuis@uva.nl
                Journal
                Metabolomics
                Springer US (Boston)
                ISSN: 1573-3882, 1573-3890
                8 July 2011
                June 2012
                Volume 8, Suppl 1, pages 3-16
                Affiliations
                [1 ]Netherlands Metabolomics Centre, Einsteinweg 55, 2333 CC Leiden, The Netherlands
                [2 ]Biosystems Data Analysis, Swammerdam Institute for Life Sciences, University of Amsterdam, Science Park 904, 1098 XH Amsterdam, The Netherlands
                Article
                330
                DOI: 10.1007/s11306-011-0330-3
                PMCID: PMC3337399
                PMID: 22593721
                © The Author(s) 2011
                History
                Received: 7 March 2011
                Accepted: 23 June 2011
                Categories
                Original Article
                Custom metadata
                © Springer Science+Business Media, LLC 2012

                Molecular biology
                metabolomics,q2,auroc,pls-da,life sciences,developmental biology,cell biology,biochemistry, general,misclassifications,molecular medicine,dq2,diagnostic statistics,biomedicine general
