      Characterization of missing values in untargeted MS-based metabolomics data and evaluation of missing data handling strategies

      research-article


          Abstract

          Background

          Untargeted mass spectrometry (MS)-based metabolomics data often contain missing values that reduce statistical power and can introduce bias in biomedical studies. However, the various sources of missing values, and the strategies for handling them, have rarely been assessed systematically. Missing values can arise systematically, e.g. from run day-dependent limits of detection (LOD), or at random, e.g. as a consequence of sample preparation.
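The two mechanisms named above can be illustrated with a small simulation. The following is a minimal sketch, not the authors' simulation framework: the function name, the way each run day receives its own detection limit (a randomly drawn quantile of each metabolite's distribution), and the default rates are all assumptions made for illustration.

```python
import numpy as np

def simulate_missingness(X, run_day, max_lod_quantile=0.3, mcar_rate=0.02, seed=0):
    """Return a copy of X with NaNs from two hypothetical mechanisms.

    X       : (n_samples, n_metabolites) array of abundances
    run_day : (n_samples,) label of the measurement day of each sample
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    run_day = np.asarray(run_day)
    X_miss = X.copy()

    # Systematic mechanism (assumed form): every run day gets its own
    # detection limit per metabolite; values below that limit are lost.
    days = np.unique(run_day)
    day_q = rng.uniform(0.0, max_lod_quantile, size=days.size)
    for day, q in zip(days, day_q):
        rows = run_day == day
        lod = np.quantile(X, q, axis=0)  # one LOD per metabolite for this day
        X_miss[rows] = np.where(X[rows] < lod, np.nan, X[rows])

    # Random mechanism: entries are lost independently of their value.
    X_miss[rng.random(X.shape) < mcar_rate] = np.nan
    return X_miss
```

With such a generator, LOD-driven gaps cluster in low-abundance values on particular run days, while the random gaps are spread evenly, which is the distinction the abstract draws.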

          Methods

          We investigated patterns of missing data in an MS-based metabolomics experiment of serum samples from the German KORA F4 cohort (n = 1750). We then evaluated 31 imputation methods in a simulation framework and biologically validated the results by applying all imputation approaches to real metabolomics data. We examined the ability of each method to reconstruct biochemical pathways from data-driven correlation networks, and the ability of the method to increase statistical power while preserving the strength of established metabolic quantitative trait loci.

          Results

          Run day-dependent LOD-based missing data accounts for most missing values in the metabolomics dataset. Although multiple imputation by chained equations performed well in many scenarios, it is computationally and statistically challenging. K-nearest neighbors (KNN) imputation on observations with variable pre-selection showed robust performance across all evaluation schemes and is computationally more tractable.
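The abstract does not spell out the KNN variant, so the following is only a rough sketch of what "KNN on observations with variable pre-selection" could look like. Neighbors are samples (observations), and distances use only those metabolites that correlate with the metabolite being imputed. The function names, the correlation threshold, the value of k, and the unweighted neighbor mean are assumptions, and the data are assumed to be on a comparable scale (e.g. log-transformed and standardized).

```python
import numpy as np

def _abs_corr(a, b):
    """Absolute Pearson correlation over samples where both are observed."""
    both = ~np.isnan(a) & ~np.isnan(b)
    if both.sum() < 3:
        return 0.0
    with np.errstate(invalid="ignore", divide="ignore"):
        r = np.corrcoef(a[both], b[both])[0, 1]
    return 0.0 if np.isnan(r) else abs(r)

def knn_obs_sel_impute(X, k=10, min_abs_corr=0.2):
    """Impute NaNs column by column from the k nearest samples."""
    X = np.asarray(X, dtype=float)
    X_imp = X.copy()
    n_vars = X.shape[1]
    for j in range(n_vars):
        missing = np.isnan(X[:, j])
        if not missing.any() or missing.all():
            continue
        # Variable pre-selection: keep metabolites correlated with column j.
        sel = np.array([c for c in range(n_vars)
                        if c != j and _abs_corr(X[:, j], X[:, c]) >= min_abs_corr],
                       dtype=int)
        donors = np.flatnonzero(~missing)  # samples with column j observed
        for i in np.flatnonzero(missing):
            diff = X[donors][:, sel] - X[i, sel]
            n_shared = (~np.isnan(diff)).sum(axis=1)
            with np.errstate(invalid="ignore", divide="ignore"):
                # Root mean squared difference over jointly observed variables.
                dist = np.sqrt(np.nansum(diff ** 2, axis=1) / n_shared)
            dist[n_shared == 0] = np.inf
            usable = np.isfinite(dist)
            if not usable.any():
                # Fallback (assumption): column mean when no neighbor is comparable.
                X_imp[i, j] = np.nanmean(X[:, j])
                continue
            nearest = donors[usable][np.argsort(dist[usable])[:k]]
            X_imp[i, j] = X[nearest, j].mean()
    return X_imp
```

Restricting the distance to correlated metabolites keeps unrelated, noisy variables from dominating the neighbor search, which is one plausible reason such a variant would behave robustly.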

          Conclusion

          Missing data in untargeted MS-based metabolomics data occur for various reasons. Based on our results, we recommend KNN-based imputation on observations with variable pre-selection, since it showed robust results in all evaluation schemes.

          Electronic supplementary material

          The online version of this article (10.1007/s11306-018-1420-2) contains supplementary material, which is available to authorized users.


                Author and article information

                Contributors
                +49 89 3187-3578 , g.kastenmueller@helmholtz-muenchen.de
                +49 89 3187-3641 , jan.krumsiek@helmholtz-muenchen.de
                Journal
                Metabolomics
                Springer US (New York)
                ISSN: 1573-3882, 1573-3890
                20 September 2018
                Volume 14, Issue 10, Article 128
                Affiliations
                [1] ISNI 0000 0004 0483 2525, GRID grid.4567.0, Institute of Computational Biology, Helmholtz-Zentrum München, Neuherberg, Germany
                [2] ISNI 0000 0004 0483 2525, GRID grid.4567.0, Institute of Epidemiology II, German Research Center for Environmental Health, Helmholtz Zentrum München, Neuherberg, Germany
                [3] ISNI 0000 0004 0483 2525, GRID grid.4567.0, Research Unit of Molecular Epidemiology, German Research Center for Environmental Health, Helmholtz Zentrum München, Neuherberg, Germany
                [4] GRID grid.452622.5, German Center for Diabetes Research (DZD e.V.), Neuherberg, Germany
                [5] ISNI 0000 0004 0483 2525, GRID grid.4567.0, Institute of Bioinformatics and Systems Biology, Helmholtz-Zentrum München, Neuherberg, Germany
                [6] ISNI 0000 0004 0483 2525, GRID grid.4567.0, Institute of Experimental Genetics, Genome Analysis Center, Helmholtz Zentrum München, Neuherberg, Germany
                [7] ISNI 0000000123222966, GRID grid.6936.a, Lehrstuhl für Experimentelle Genetik, Technische Universität München, Freising, Germany
                [8] German Center for Cardiovascular Disease Research (DZHK e.V.), Munich, Germany
                [9] ISNI 0000 0004 0582 4340, GRID grid.416973.e, Department of Physiology and Biophysics, Weill Cornell Medical College in Qatar, Education City, Doha, Qatar
                [10] ISNI 0000 0004 0483 2525, GRID grid.4567.0, Institute of Genetic Epidemiology, Helmholtz Zentrum München–German Research Center for Environmental Health, Neuherberg, Germany
                [11] ISNI 0000 0004 1936 973X, GRID grid.5252.0, Chair of Genetic Epidemiology, Institute of Medical Informatics, Biometry and Epidemiology, Ludwig-Maximilians-University, Munich, Germany
                [12] ISNI 0000000121885934, GRID grid.5335.0, MRC Epidemiology Unit, University of Cambridge, Cambridge, UK
                [13] ISNI 0000000123222966, GRID grid.6936.a, Department of Mathematics, Technische Universität München, Garching, Germany
                [14] ISNI 000000041936877X, GRID grid.5386.8, Institute for Computational Biomedicine, Englander Institute for Precision Medicine, Department of Physiology and Biophysics, Weill Cornell Medicine, New York, USA
                Article
                1420
                10.1007/s11306-018-1420-2
                6153696
                30830398
                76ed0a61-0621-4690-9332-e3b87988825d
                © The Author(s) 2018

                Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

                History
                : 11 April 2018
                : 24 August 2018
                Funding
                Funded by: Bundesministerium für Bildung und Forschung (FundRef http://dx.doi.org/10.13039/501100002347)
                Award ID: 01ZX1313C
                Award ID: 03IS2061B
                Funded by: European Union’s Seventh Framework Programme [FP7-Health-F5-2012]
                Award ID: 305280
                Funded by: European Research Council (FundRef http://dx.doi.org/10.13039/501100000781)
                Award ID: LatentCauses
                Funded by: Weill Cornell Medical College Qatar
                Award ID: Biomedical Research Program funds
                Funded by: German Research Center for Environmental Health
                Funded by: Medical Research Council (FundRef http://dx.doi.org/10.13039/501100000265)
                Award ID: MC_PC_13048
                Award ID: MC_UU_12015/1
                Categories
                Original Article
                Custom metadata
                © Springer Science+Business Media, LLC, part of Springer Nature 2018

                Molecular biology
                Keywords: untargeted metabolomics, missing values imputation, limit of detection, batch effects, MICE, k-nearest neighbor, mass spectrometry
