Characterization of missing values in untargeted MS-based metabolomics data and evaluation of missing data handling strategies

Abstract

Background

Untargeted mass spectrometry (MS)-based metabolomics data often contain missing values that reduce statistical power and can introduce bias in biomedical studies. However, a systematic assessment of the various sources of missing values and of strategies to handle them has received little attention. Missing values can arise systematically, e.g. from run day-dependent effects due to limits of detection (LOD), or at random, for instance as a consequence of sample preparation.
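The two missingness mechanisms named above can be illustrated with a small simulation. The following is a minimal sketch, not the authors' simulation framework: the function names, the per-run-day quantile threshold, and its jitter of 0.05 are all hypothetical choices made for illustration.

```python
import numpy as np


def add_lod_missingness(X, run_day, lod_quantile=0.2, rng=None):
    """Systematic (LOD-type) missingness: values below a run-day-specific
    threshold are set to NaN. The quantile-based threshold is an assumption."""
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float).copy()
    run_day = np.asarray(run_day)
    for day in np.unique(run_day):
        rows = run_day == day
        # Hypothetical: each run day gets its own detection limit,
        # drawn as a quantile jittered around lod_quantile.
        q = float(np.clip(lod_quantile + rng.normal(0.0, 0.05), 0.0, 1.0))
        block = X[rows]                          # copy of this day's samples
        thresh = np.quantile(block, q, axis=0)   # one threshold per metabolite
        block[block < thresh] = np.nan
        X[rows] = block
    return X


def add_random_missingness(X, fraction=0.1, rng=None):
    """Random missingness: every value is dropped with equal probability,
    independent of its magnitude and of the run day."""
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float).copy()
    X[rng.random(X.shape) < fraction] = np.nan
    return X
```

Under the LOD mechanism the missing entries are always the smallest values of a metabolite within a run day, whereas under the random mechanism missingness carries no information about the underlying concentration.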

Methods

We investigated patterns of missing data in an MS-based metabolomics experiment of serum samples from the German KORA F4 cohort (n = 1750). We then evaluated 31 imputation methods in a simulation framework and biologically validated the results by applying all imputation approaches to real metabolomics data. We examined the ability of each method to reconstruct biochemical pathways from data-driven correlation networks, and the ability of the method to increase statistical power while preserving the strength of established metabolic quantitative trait loci.

Results

Run day-dependent LOD-based missing data accounts for most missing values in the metabolomics dataset. Although multiple imputation by chained equations performed well in many scenarios, it is computationally and statistically challenging. K-nearest neighbors (KNN) imputation on observations with variable pre-selection showed robust performance across all evaluation schemes and is computationally more tractable.
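The idea behind KNN imputation on observations with variable pre-selection can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function name `knn_obs_sel_impute`, the defaults `k=10` and `min_abs_corr=0.2`, the z-scoring step, and the mean fallback are all choices made here for the example. For each metabolite with missing values, only metabolites correlated with it are used to measure the distance between samples, and the missing value is replaced by the average over the nearest samples in which that metabolite was observed.

```python
import numpy as np


def _pairwise_corr(a, b):
    """Pearson correlation over rows where both variables are observed."""
    ok = ~np.isnan(a) & ~np.isnan(b)
    if ok.sum() < 3 or a[ok].std() == 0 or b[ok].std() == 0:
        return 0.0
    return float(np.corrcoef(a[ok], b[ok])[0, 1])


def knn_obs_sel_impute(X, k=10, min_abs_corr=0.2):
    """Impute NaNs in X (samples x metabolites) from the k nearest samples,
    with distances computed only on pre-selected, correlated metabolites.
    k and min_abs_corr are assumed defaults, not values from the article."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    X_imp = X.copy()

    # z-score each metabolite (ignoring NaN) so distances are comparable
    mu = np.nanmean(X, axis=0)
    sd = np.nanstd(X, axis=0)
    sd[~(sd > 0)] = 1.0
    Z = (X - mu) / sd

    for j in range(p):
        miss = np.isnan(X[:, j])
        if not miss.any() or miss.all():
            continue
        col_mean = np.nanmean(X[:, j])
        # variable pre-selection: keep metabolites correlated with j
        sel = [m for m in range(p) if m != j
               and abs(_pairwise_corr(X[:, j], X[:, m])) >= min_abs_corr]
        if not sel:
            X_imp[miss, j] = col_mean  # fallback (assumption): column mean
            continue
        donors = np.flatnonzero(~miss)  # samples where metabolite j is observed
        Zs = Z[:, sel]
        for i in np.flatnonzero(miss):
            diff = Zs[donors] - Zs[i]
            n_shared = (~np.isnan(diff)).sum(axis=1)
            dist = np.full(donors.size, np.inf)
            has = n_shared > 0
            # root mean squared difference over jointly observed metabolites
            dist[has] = np.sqrt(np.nansum(diff[has] ** 2, axis=1) / n_shared[has])
            nearest = np.argsort(dist)[:k]
            nearest = nearest[np.isfinite(dist[nearest])]
            if nearest.size == 0:
                X_imp[i, j] = col_mean
            else:
                X_imp[i, j] = X[donors[nearest], j].mean()
    return X_imp
```

Distances are computed from the original, non-imputed values, so the order in which metabolites are processed does not affect the result.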

Conclusion

Missing data in untargeted MS-based metabolomics data occur for various reasons. Based on our results, we recommend performing KNN-based imputation on observations with variable pre-selection, since it showed robust results in all evaluation schemes.

Electronic supplementary material

The online version of this article (10.1007/s11306-018-1420-2) contains supplementary material, which is available to authorized users.

Author and article information

Contributors

Gabi Kastenmüller: +49 89 3187-3578, g.kastenmueller@helmholtz-muenchen.de

Jan Krumsiek: +49 89 3187-3641, jan.krumsiek@helmholtz-muenchen.de

Journal

Journal ID (nlm-ta): Metabolomics

Journal ID (iso-abbrev): Metabolomics

Title: Metabolomics

Publisher: Springer US (New York)

ISSN (Print): 1573-3882

ISSN (Electronic): 1573-3890

Publication date (Electronic): 20 September 2018

Publication date PMC-release: 20 September 2018

Publication date (Print): 2018

Volume: 14

Issue: 10

Electronic Location Identifier: 128

Affiliations

[1] ISNI 0000 0004 0483 2525, GRID grid.4567.0, Institute of Computational Biology, Helmholtz-Zentrum München, Neuherberg, Germany

[2] ISNI 0000 0004 0483 2525, GRID grid.4567.0, Institute of Epidemiology II, German Research Center for Environmental Health, Helmholtz Zentrum München, Neuherberg, Germany

[3] ISNI 0000 0004 0483 2525, GRID grid.4567.0, Research Unit of Molecular Epidemiology, German Research Center for Environmental Health, Helmholtz Zentrum München, Neuherberg, Germany

[4] GRID grid.452622.5, German Center for Diabetes Research (DZD e.V.), Neuherberg, Germany

[5] ISNI 0000 0004 0483 2525, GRID grid.4567.0, Institute of Bioinformatics and Systems Biology, Helmholtz-Zentrum München, Neuherberg, Germany

[6] ISNI 0000 0004 0483 2525, GRID grid.4567.0, Institute of Experimental Genetics, Genome Analysis Center, Helmholtz Zentrum München, Neuherberg, Germany

[7] ISNI 0000000123222966, GRID grid.6936.a, Lehrstuhl für Experimentelle Genetik, Technische Universität München, Freising, Germany

[8] German Center for Cardiovascular Disease Research (DZHK e.V.), Munich, Germany

[9] ISNI 0000 0004 0582 4340, GRID grid.416973.e, Department of Physiology and Biophysics, Weill Cornell Medical College in Qatar, Education City, Doha, Qatar

[10] ISNI 0000 0004 0483 2525, GRID grid.4567.0, Institute of Genetic Epidemiology, Helmholtz Zentrum München–German Research Center for Environmental Health, Neuherberg, Germany

[11] ISNI 0000 0004 1936 973X, GRID grid.5252.0, Chair of Genetic Epidemiology, Institute of Medical Informatics, Biometry and Epidemiology, Ludwig-Maximilians-University, Munich, Germany

[12] ISNI 0000000121885934, GRID grid.5335.0, MRC Epidemiology Unit, University of Cambridge, Cambridge, UK

[13] ISNI 0000000123222966, GRID grid.6936.a, Department of Mathematics, Technische Universität München, Garching, Germany

[14] ISNI 000000041936877X, GRID grid.5386.8, Institute for Computational Biomedicine, Englander Institute for Precision Medicine, Department of Physiology and Biophysics, Weill Cornell Medicine, New York, USA

Article

Publisher ID: 1420

DOI: 10.1007/s11306-018-1420-2

PMC ID: 6153696

PubMed ID: 30830398

SO-VID: 76ed0a61-0621-4690-9332-e3b87988825d

License:

Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

History

Date received: 11 April 2018

Date accepted: 24 August 2018

Funding

Funded by: FundRef http://dx.doi.org/10.13039/501100002347, Bundesministerium für Bildung und Forschung

Award ID: 01ZX1313C

Award ID: 03IS2061B

Award Recipient: Kieu Trinh Do, Gabi Kastenmüller

Funded by: European Union’s Seventh Framework Programme [FP7-Health-F5-2012]

Award ID: 305280

Award Recipient: Jan Krumsiek

Funded by: FundRef http://dx.doi.org/10.13039/501100000781, European Research Council

Award ID: LatentCauses

Award Recipient: Fabian J. Theis

Funded by: Weill Cornell Medical College Qatar

Award ID: Biomedical Research Program funds

Award Recipient: Karsten Suhre

Funded by: German Research Center for Environmental Health

Funded by: FundRef http://dx.doi.org/10.13039/501100000265, Medical Research Council

Award ID: MC_PC_13048

Award ID: MC_UU_12015/1

Custom metadata

ScienceOpen disciplines: Molecular biology

Keywords: untargeted metabolomics, missing values imputation, limit of detection, batch effects, mice, k-nearest neighbor, mass spectrometry