We have come a long way in recent years in developing analytical strategies in
metabolomics. We have seen huge progress in tackling multiplatform
measurement, data analysis, data integration, and interpretation.
1
Mass spectrometry (MS) is the unrivaled technology
in the field. Following a divide and conquer strategy, successful
approaches defined and addressed sub-omes individually. Recursively
solving technical “subproblems” also with regard to
the analytical tasks of quantification and identification allowed
us to make significant progress. However, some of the challenges,
as imposed by the metabolome’s complexity (molecules <1500
Da), are not entirely overcome to date.
Indeed, the physicochemical
space occupied by this building block of life is vastly heterogeneous,
spanning concentration ranges from (high) fM to mM
2
and forming dynamic complex reaction networks. The complete
scope of metabolic networks remains to be elucidated. This holds true
for “simple” organisms such as bacteria with relatively
small-sized metabolomes (in the hundreds) and even more so for the human
body metabolome, which consists of hundreds of different metabolomes
depending on body fluid, cell type, status, and tissue.
Beyond the endogenous human pathways, numerous metabolites exist
that are transformed and/or circulated through the complex interplay
with trillions of microbes constituting the “ecosystem”
of the human body.
3
Additionally, certain
disease-specific metabolites (e.g. methylated amino acids) with biological
function may occur, defining the so-called epi-metabolome.
4
Damaged and repaired metabolites can be the result
of enzymatic impairment.
5
Finally, the
human metabolome is highly dependent on nutrition and the surrounding
environment. More than 200,000 food-derived metabolites and 10,000
xenobiotics exist that are potentially circulating.
6
Consequently, we have not yet reached the ultimate
aim, which is to comprehensively identify and quantify all metabolites
with one or at least a few analytical runs. Metabolome coverage, selectivity,
sensitivity and throughput remain conflicting goals that we have to
navigate.
7
While this fact limits the pace
of experimental assessment of metabolome inventories
8
regarding different cell types and model organisms, there
has been significant progress in customizing workflows, with the aim
of providing a pragmatic base for informative metabolite measurements.
9
The virtuous cycle of the global metabolomics
workflow starts with discoveries by nontargeted analysis. Over the
last few years, this analytical strategy has seen a tremendous impact
across different metabolomics applications and beyond. At the same
time, analytical chemists, embracing this novel omics-type of measurement,
have been and continue to be challenged regarding quality control (QC),
method standardization, and harmonization. Evidently, the computational
methods for data processing and data analysis are by far more complex
than in target analysis. Establishing metrics and guidelines for nontargeted
analysis is not straightforward,
10
especially
compared to the well-established validation practice in target analysis.
Experimental design and data quality
11
are
key to fully exploit the potential of nontargeted analysis with regard
to e.g. biomarker discovery
12,13
and beyond. The integration
of reference materials in nontargeted workflows is still under debate.
The complexity of omics-reference material production, following
stringent metrological criteria, results in high costs, which conflicts
with the idea of affordable discoveries in large-scale studies. The
authors assume that this lack of general acceptance has in turn reduced
the pace of material development, and today we still have only a few
biological matrix reference standards available. Finally, whether
a discovery can be standardized might be debatable; however, a finding
should be validated. In fact, a metabolomics experiment should not
end with nontargeted methods, but the results should be validated
both analytically and biologically.
14
Thus, the final analytical step of our ideal virtuous metabolomics
cycle includes targeted measurements using authentic standards. Typical
sample numbers in metabolomics range from tens or hundreds up to a
few thousand, depending on the study design.
15
The more diverse the study cohort, the more samples must be analyzed
in order to generate a meaningful hypothesis. Following the golden
rules of step-wise discovery and stringent analytical validation is
more demanding for large scale studies. Time spans between sampling,
analysis, interpretation, discovery, and final validation together
with the limited availability of authentic standards pose practical
limitations towards this approach. A major aim of analytical development
remains increasing the throughput of measurement. Regarding compound annotation, virtually
every current study accepts annotations with a varying
but defined degree of certainty. This holds true for metabolomics
and lipidomics, where annotation is facilitated by rule-based MS data
interpretation as enabled by the structural templates of lipids. It
is common practice in both applications to report levels of annotation.
16,17
However, estimating the proportion of potentially false assignments
is still an exciting field of research.
18
Finally, analytical validation should include the quantitative dimension
of discoveries in nontargeted analysis. Despite significant progress
in harmonization, standardization, and advanced statistical analysis,
19
large scale multicenter studies remain challenging.
Recent applications resort to small scale studies for hypothesis generation,
followed by a (wide) targeted large scale study for hypothesis validation.
20
Biological validation is dependent on
the scope of the study. In metabolic phenotyping, biological probability
checks are facilitated by massive joint efforts to deploy open-source
metabolic atlases for a number of different organisms. Comparisons
with both experimental data and predictions (reactions, rules, and
enzymes) support the findings.
21
The complexity
of biological validation increases dramatically in the case of a hypothesized
biological function. Then, validation of the generated hypothesis
does not only address the mere presence/up- or downregulation of a
certain metabolite/pathway, but the hypothesized biological function
needs to be corroborated. For example, in functional metabolomics,
22
cutting-edge multi-omics analysis
23
together with biochemical assays unravels molecular
functions and associated modulatory mechanisms of perturbed metabolism
in relation to phenotype.
Undoubtedly, accepting multiple lines
of evidence in nontargeted discoveries (with reported degree of confidence)
has accelerated metabolomics research. The question to which degree
analytical validation can be reduced or even entirely replaced by
advanced computational methods and biological validation experiments
needs to be addressed in the overwhelmingly interdisciplinary science
of metabolomics. Reporting on the accurate assessment and the resulting
degree of confidence alone is a minimum requirement.
24
On the other hand, lines of evidence beyond strict
analytical validation might accelerate the measurement
step itself. High-throughput technologies proved to be fit-for-purpose
in dedicated applications despite limited selectivity.
25,26
This review will focus on recurring topics in MS-based metabolomics
measurement (including lipids). We will emphasize the role of stable
isotopes for both target and nontargeted analysis giving an overview
on different standard materials derived from isotopically labeled
biomass and strategies enabled by these materials. We will discuss
the current state of the art of quantification, validation, and harmonization
with respect to both metabolomics and lipidomics. We will include
strategies enabling various ways of scientific evidence regarding
the metabolite/lipid annotation task. Finally, we will survey the
rationales of workflow design, which straddle coverage and throughput.
Nearly five years have passed since Cajka and Fiehn published their
review on the state of the art of metabolomics/lipidomics, proposing
at the same time a vision of merging targeted and nontargeted analysis.
9
Since then, many studies have realized the potential
of simultaneous unanticipated discovery and quantification of a selected
metabolite pool, a strategy enabled by high-resolution mass spectrometry
(HRMS). We report on the progress of “merging” ideas.
We think that lipidomics and metabolomics need to be integrated into
one workflow. We will discuss the potential of chromatographic solutions
as compared to recent high-throughput technologies for the simultaneous
analysis of the two sub-omes, as a first key step.
Established Concepts
of Quantification in Metabolomics/Lipidomics
For accurate
absolute quantification, guidelines on bioanalytical method validation
from the United States Food & Drug Administration (U.S. FDA)
27
or European Medicines Agency (EMA)
28
establish gold standards and metrological frames.
However, application to omics-type analysis is challenged by the sheer
number of analytes within one measurement, the lack of standards,
and the need for an actual analyte-free matrix. In the following,
we will give a brief tutorial summary on absolute quantification strategies
currently established in the field of metabolomics and lipidomics.
The term quantitative assessment in MS-based omics studies often refers
to relative quantification of differences between sample groups, while
here we refer to absolute quantification requiring proper standardization
and analytical validation. A brief introduction will emphasize the
need for standards and reference materials, in the form of both multi-mix
standards and biological matrix material.
Recommended Absolute Quantification
Approaches
The method of highest metrological order in MS-based
analysis is isotope dilution established by matrix-matched multi-level
external calibration with internal standardization. The internal standard
(ISTD) should be added as early as possible in the analytical process, and equilibration
between sample and spike should be ensured prior to extraction. Multilevel
calibration is preferred, as the working range (given by the lower limit
of quantification (LLOQ) and the upper limit of quantification (ULOQ))
is assessed and controlled along with the quantification exercise.
This is not the case when isotope dilution is based on a single spike
level (one-point calibration). Next to this gold standard, other external
calibration strategies could meet the recommendations of widely accepted
(bio-) analytical method validation guidelines, as well, as long as
they properly employ internal standardization. As internal standards,
either standards of similar structure or standards of matching retention time
(RT), and thus co-ionization, are commonly used. Spiking the same amount
of ISTD to external calibrants and samples allows us to use ISTDs
without certified concentration. Figure 1
dissects the calibration method into four
major components and discusses their relevance. According to the guidelines,
the analysis of biological matrix blanks is mandatory. The conceptualization
of such a blank sample, i.e. a biological matrix free of endogenous
metabolites, is challenging. Knockout experiments for specific metabolites,
albeit tedious, offer a solution. However, most studies resort to
simplifying approaches using extraction blanks or protein mixtures.
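To make the calibration concept concrete, the following minimal Python sketch illustrates multi-point external calibration with internal standardization; all peak areas and concentrations are hypothetical, and a real workflow would additionally verify the working range (LLOQ/ULOQ), matrix matching, and regression weighting:

```python
# Minimal sketch of multi-point external calibration with internal
# standardization (illustrative values, not from any real assay).
import numpy as np

# Calibrant concentrations spiked into a matrix-matched blank, each
# measured as analyte/ISTD peak-area ratio; same ISTD amount in all.
cal_conc = np.array([0.1, 0.5, 1.0, 5.0, 10.0])          # µM
cal_ratio = np.array([0.021, 0.098, 0.205, 1.01, 2.02])  # area(analyte)/area(ISTD)

slope, intercept = np.polyfit(cal_conc, cal_ratio, 1)    # linear calibration model

def quantify(sample_ratio: float) -> float:
    """Back-calculate concentration from the analyte/ISTD area ratio."""
    return (sample_ratio - intercept) / slope

print(f"{quantify(0.48):.2f} µM")  # sample measured at area ratio 0.48
```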
Figure 1
Accurate
absolute quantification according to the U.S. FDA guideline. Four
requirements need to be fulfilled for calibration: 1, matrix-matched;
2, multipoint; 3, external standardization; 4, internal standardization.
Additionally, their control point, the challenge, and a practical
solution for omics-experiments are given. *The ranking of ISTD follows
the levels of quantification of the Lipidomics Standards Initiative
(LSI).
29
The gold standard of quantification, if applied to omics-type
analysis, requires a high number of external standards (ESTDs) and
ISTDs, which are stable isotope labeled. Fully labeled standards are
expensive but simplify data evaluation and validation. State-of-the-art
wide-targeted assays in metabolomics implement hundreds of standards.
In research practice, the need for fit-for-purpose methods has led
to the implementation of alternative quantification strategies with
the aim of reducing the overall number of standards, measurements,
and costs involved. In lipidomics, calibration strategies resorting
to few standards per lipid class have successfully been established
as enabled by the structural templates of lipids.
30
Moreover, recent developments concern the use of partial
isotopic labeling for standard production.
31
Figure 2
provides
an overview on established quantification methods in metabolomics/lipidomics
compared to the gold standard (matrix-matched multi-point external
calibration including internal standardization).
Figure 2
Fit-for-purpose internal
standard-based quantification strategies established in the field
of metabolomics and lipidomics. Colors in the graphs symbolize values
from the sample (purple), compound-specific standards (green), and
surrogate standards (orange).
As previously mentioned, isotope dilution using a known amount of
isotopically labeled ISTD with characterized concentration (traceable)
offers a method of high metrological order. Both fully labeled and
partially labeled ISTDs can be used. In the latter case, concentrations
are calculated using multiple linear regressions. The “single
spike” isotope dilution method is accurate, provided that (1)
spike and sample are equilibrated upon extraction, (2) the blend
ratio is within the linear dynamic range, and (3) the blend ratio differs
significantly from the natural ratio. Thus, additional validation experiments are
required. For highest metrological order, reversed isotope dilution
experiments are necessary to characterize the spike with every experiment.
These steps are mostly omitted in -omics measurements. The validation
process is accelerated by kit solutions and commercial availability
of ISTD mixtures with concentration levels tailored for specific applications.
If no compound-specific calibrant is available, surrogate calibration
is accomplished by structurally similar standards, either using isotopically
labeled ISTD or non-endogenous ISTDs. Structurally similar standards
are preferred over RT matched standards, which ensure co-ionization
only. Surrogate internal standardization drastically reduces the number
of necessary standards. It is executed as multi-point calibration
32
or as one-point calibration.
33
In lipidomics, surrogate calibration is accepted, provided that
lipid class co-ionization and the use of response factors
34
are ensured. If lipid surrogate quantification
is performed on the MS2 level, variations in signal intensities between
the different fatty acyl chain fragments have to be mathematically
corrected for. Schuhmann et al. recently published a model based on
commercially available lipid standards to correct systematic errors
(up to 60%) for common glycerophospholipids due to the differences
in (1) the sn-1/2 positions of the glycerol backbone, (2) the length
of the hydrocarbon chain, and (3) the number and location of double
bonds.
35
Stable Isotope Labeling
In contrast to radionuclides, stable isotopes have stable nuclei and hence
represent a safe alternative for labeling approaches. The overall
abundance of heavy stable isotopes in nature is low (<5%). Owing to
the isotope effect, i.e. isotopic fractionation upon chemical
reactions and biological processes, the natural abundance varies to
a small degree, forming the basis for natural tracer studies in geochronology,
ecology, archeology, or climatology. The low natural abundance facilitates
the production of pure stable isotope labeled compounds, either via
chemical synthesis or via in vivo synthesis.
36
Stable isotopes and stable isotope labeling
have a well-documented history in MS, which was exquisitely outlined
for life sciences by Lehmann.
37
In this
review, we emphasize the pivotal role of stable isotope labeled biomass.
Today, in vivo synthesized stable isotope labeled
compounds have become essential tools for mass spectrometry-based
identification or quantification in metabolomics (including lipidomics).
The important application of supplied stable isotope tracers in metabolomics
for flux and tracer studies is comprehensively covered elsewhere.
38−41
Labeled biomass was used early on in quantitative omics workflows,
e.g. amino acid labeling to monitor proteome changes upon system perturbation.
Relative quantification in proteomics studies using cell culture based
labeling
42
was performed, but also successful
labeling of higher organisms such as Caenorhabditis elegans, Drosophila melanogaster,
and mice was reported.
43−45
However, it is important to note that higher organisms require complex
nutrients and media compositions, so that in most cases
only specific amino acids (SILAC approach) were labeled, leading to
amino acid labeling efficiencies of up to 98%.
44
Only when fully labeled microorganisms such as Escherichia coli 98% enriched in 15N
were fed
to worms (C. elegans) or fruit flies
(D. melanogaster) were protein extracts
with a labeling degree of up to 94% obtained.
45
However, the small number of nitrogen atoms in most metabolites limits
the use of 15N labeling in metabolomics or lipidomics; thus, carbon or deuterium labeling
is preferred. Already in 2005, absolute quantification based on internal
standardization by uniformly 13C-labeled yeast cell extracts
was introduced, paving the way for absolute quantification of a high-numbered
analyte panel.
46
At that time no enrichment
degrees were reported for metabolites or lipids. The use of labeled
biomass for quantification tasks in metabolomics was facilitated by
fully labeled E. coli grown in shaking flasks as
pioneered by the group of Rabinowitz
47
and
further extended for eukaryotic uniformly labeled yeast grown in fermenters
by Canelas et al.
48
Enrichment Degree and Isotopologue
Distribution
Isotopically labeled standards are characterized
by the enrichment degree—often used interchangeably with the
term labeling efficiency—which refers to the probability of
finding a labeled atom at any possible label site. One has to be aware
that the actual relative abundance of the heaviest isotopologue, i.e.
the fully labeled isotopologue, is lower than the enrichment degree
and depends on enrichment, the number of labeling sites, elemental
composition, and mass resolution (see Figure 3
A–D). A simplified assumption of
100% abundance of the fully labeled isotopologue leads to errors in
actual relative abundance in the mass spectra (examples for isoleucine
and phosphatidylcholine (PC) 34:2 can be found in Figure 3
E). This is relevant in absolute
quantification relying on ISTDs with known concentration and especially
crucial if the labeled compound is used as surrogate ISTD as e.g.
often performed in lipidomics.
33
In this
case, either all isotopologues are summed up (after they have been
checked for interferences) or the actual value is corrected e.g. similar
to isotope correction Type 1 for natural unlabeled lipids.
33
A useful tool for fast prediction of isotopologue
distributions from molecular formulas is enviPat, which is available
as a web version and an R package.
49
Overall,
in order to enable omics-type analysis, knowledge of the enrichment degree
is of paramount importance. Spike materials of high enrichment degree
(>99%) are preferred as they lead to more distinct isotopologue
signals, reduced spectral overlay, and more straightforward data interpretation.
Figure 3
Difference
between enrichment degree and the relative isotopic abundance of a
fully labeled isotopologue. (A) Isoleucine with 6 carbon atoms is
used as an example. (B) Calculation of abundances for carbon as a di-isotopic
element is based on the binomial formula. Other elements with more
than one isotope (e.g. H, N) influence the final abundance according
to their natural abundance, also based on a binomial formula. Polyisotopic
elements (O) are described by polynomial terms. Usually, the contribution
of H, N, and O to the overall difference is minimal (here 1–2%),
but other elements must be considered (e.g. Cl, Br, S). (C) Determination
of the coefficients of the binomial formula for each term according to
the n + 1 line in Pascal’s triangle (for n = 6: 1, 6, 15, 20,
15, 6, 1). (D) Binomial formula for n = 6. Each term is the relative
abundance of the corresponding isotopologue without the consideration
of other elemental isotopes. The last term corresponds to the fully
labeled isotopologue. The sum of all isotopologues is always 100%.
(E) Exemplarily, the effect of a 1% enrichment difference (99%, darker
color; 98%, lighter color) on the abundance is shown for PC 34:2
(n = 42, blue) and isoleucine (n = 6, grey). The bar chart shows the
distribution from the fully labeled isotopologue (M′) to
M′ – 4 for both molecules. The difference of the fully
labeled isotopologue from 100% is already 12% for the 98% labeled isoleucine
and 58% for PC 34:2. But even for a better enrichment (99%), the error
for PC 34:2 is still 36%, highlighting the importance of considering
the relative abundance in quantification workflows.
50
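The binomial calculation behind Figure 3 can be reproduced in a few lines. The sketch below considers carbon only (neglecting the small H, N, and O contributions mentioned in the caption), which is why its values deviate slightly from those in the figure:

```python
# Carbon-only isotopologue distribution of a uniformly 13C-labeled compound
# (sketch of the binomial calculation in Figure 3; contributions of H, N,
# and O isotopes are neglected here).
from math import comb

def isotopologue_abundances(n_carbons, enrichment):
    """Relative abundances of M', M'-1, ..., M'-n for enrichment p."""
    p = enrichment
    return [comb(n_carbons, k) * p ** (n_carbons - k) * (1 - p) ** k
            for k in range(n_carbons + 1)]

for name, n in [("isoleucine", 6), ("PC 34:2", 42)]:
    for p in (0.99, 0.98):
        fully_labeled = isotopologue_abundances(n, p)[0]
        print(f"{name} at {p:.0%} enrichment: M' = {fully_labeled:.1%}")
# isoleucine at 98%: M' ≈ 88.6% (~11% below the naive 100% assumption)
# PC 34:2 at 98%:    M' ≈ 42.8%; at 99%: M' ≈ 65.6%
```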
Suite of In Vivo Synthesized Isotopically Labeled Materials
In the last
decades, labeled organisms such as bacteria, yeast, or plants have
been grown to create huge libraries of stable isotope labeled (13C, 15N, 34S, 2H)
endogenous
metabolites.
51−53
Controlled growth of E. coli or Pichia pastoris was
particularly successful, as enrichment degrees higher than 99% were
achieved, leading to the simultaneous production of hundreds of biologically
relevant labeled metabolites covering the highly conserved primary
metabolome.
51,54−56
Some of the
materials are already commercially available (such as labeled E. coli, yeast, and
algae products; details
on the materials can be found in Table 1
).
Table 1. Overview on Labeled Biomass Materials

Organism | Kingdom | Isotope | Enrichment degree | Feed | Reference
Escherichia coli | Bacteria | 13C | >98% | Glucose | Mahieu and Patti 2017 (57)
Escherichia coli | Bacteria | 15N | >98% | (NH4)2SO4 | Krüger et al. 2008 (43)
Arthrospira platensis (spirulina) | Bacteria | 13C | >97% | CO2 | Berthold et al. 1991 (58)
Chlamydomonas reinhardtii (alga) | Protist | 13C | >98% | CO2 | Behrens et al. 1994 (59)
Chlorella vulgaris (alga) | Protist | 13C | >98% | CO2 | Behrens et al. 1994 (59)
Nannochloropsis oculata (alga) | Protist | 13C | >85% | CO2 | Doomun et al. 2020 (60)
Pichia pastoris (yeast) | Fungi | 13C | >98% | Glucose | Neubauer et al. 2012 (56)
Pichia pastoris (yeast) | Fungi | 34S | >95% | Na2SO4 | Hermann et al. 2016 (61)
Saccharomyces cerevisiae (yeast) | Fungi | 15N | >94% | (NH4)2SO4 | Krüger et al. 2008 (43)
Fusarium graminearum | Fungi | 13C | >99.5% | Glucose | Bueschl et al. 2014 (52)
Arabidopsis thaliana | Plantae | 13C | >95% | CO2 | Giavalisco et al. 2009 (53)
Triticum durum (wheat) | Plantae | 13C/15N | >96%/>95% | CO2/NO3 salts | Ćeranić et al. 2020 (62)
Caenorhabditis elegans (worm) | Animalia | 15N | >98% | E. coli | Krüger et al. 2008 (43)
Drosophila melanogaster (fly) | Animalia | 15N | >94% | S. cerevisiae | Krüger et al. 2008 (43)
Rattus norvegicus domestica (rat) | Animalia | 15N | >94% | Spirulina | McClatchy et al. 2007 (63)
Mus musculus (mouse) | Animalia | 13C | 6–75% | Ralstonia eutropha | Dethloff et al. 2018 (64)
Homo sapiens (HeLa cells) | Animalia | 2H | 0–5% | 5% D2O | Kim et al. 2019 (31)
Homo sapiens (HCT116 cells) | Animalia | 13C | 0–99% | Glucose and AAs | Grankvist et al. 2018 (65)
The list of
labeled organisms keeps growing. For example, uniformly 13C-labeled lipids derived
from the microalga Nannochloropsis
oculata were measured via MS/MS to calculate 13C enrichment for both the whole molecule
and the different building
blocks of a lipid.
60
Such information can
be useful to follow labeling of the head group versus fatty acids
and might help to study lipid synthesis and remodeling processes.
Advances in stable isotope labeling of plants using customized closed
growth chambers enabled an increase of the enrichment degree to 96–98%
for 13C and 95–99% for 15N, adding a complex
compound panel of primary and secondary metabolites.
62
Still missing is a fully labeled mammalian organism. The
complex feed or media and the resulting high costs limit the production
to partial labeling approaches, which have been used successfully
for relative quantification. For example, growing HeLa cells on a
5% deuterium oxide enriched medium, together with a deconvolution algorithm
facilitating classical isotope dilution approaches, enabled improved
relative quantification for lipids.
Even
mice can be partially labeled (6–75% enrichment depending on
the metabolite) by feeding a commercially available 13C-labeled bacterial diet (Ralstonia
eutropha). This
strategy was also applied for relative quantification, improving precision
from 27% to less than 10%.
64
Table 1
summarizes the labeled biomass
materials, the isotopes used, enrichment degrees, feeds, and literature references.
Applications of Stable Isotope Labeled Biomass
Isotopically
labeled biomass has three major applications in metabolomics and lipidomics,
namely (1) credentialing by identification of biological metabolites
using labeled and nonlabeled metabolite pairs, (2) validation of isotopologue
distributions, and (3) standardization and normalization for quantification
workflows.
Credentialing: Isotopically Labeled Biomass for Identification
Credentialing-type approaches involve the analysis of samples containing
analytes in an unlabeled as well as a stable-isotope labeled form.
Mixing of extracts from uniformly labeled organisms with those from
unlabeled organisms allows us to distinguish metabolic features with
biological origin from background contaminants by the occurrence of
shifted m/z and MS/MS spectra and, in approaches implementing liquid
chromatography (LC–MS), also matching RTs. An early application
of comprehensive incorporation of stable isotope labeled biomass was
published by Giavalisco et al.,
53
who applied 13C labeling of Arabidopsis thaliana in order
to recognize biological features and improve the molecular formula
annotation of their flow injection (FI-) Fourier-transform ion cyclotron
resonance (FTICR) and reversed-phase (RP)-LC-FTICR analysis. The first
open-source software MetExtract, capable of automating the assignment
of LC-MS peaks originating from 13C-labeled compounds to
their endogenous counterparts, was published by Bueschl et al.
66
Later, other tools mostly relying on differential
incorporation of isotopic labels into metabolites have been introduced,
which simplify this type of analysis and include tracer analysis (MAVEN,
67
mzMatch-ISO,
68
X13CMS,
69
isoMETLIN,
70
george,
71
and ALLocator
72
).
The isotopic ratio outlier analysis (IROA) approach demonstrated the
introduction of highly specific isotopologue patterns to further improve
specificity and quantification capabilities using labeled organisms.
73
In 2014, Mahieu et al.
74
coined the term “credentialing” and further emphasized
the importance of this type of approach for the recognition of real
biological features and the comparison and fine tuning of metabolomics
workflows. Later they used stable isotope labeling combined with other
feature grouping and noise removal approaches to show that the number
of biological features in an E. coli extract can account for less than 5% of all features
detected via
nontargeted peak detection.
57
MetExtract
was later updated to MetExtract II to remove mismatches and group
different ion-species as well as employ stable isotope patterns for
the purpose of LC-MS peak detection, annotation/noise removal in fragmentation
spectra, molecular formula elucidation, and isotopic tracer studies.
75
This presented a significant step in harvesting
the full potential of stable isotope labeling. In 2019, Wang et al.
employed not only 13C but also 15N isotopically
labeled organisms (Saccharomyces cerevisiae and E. coli).
76
As
in the original credentialing approach, they combined stable isotope
labeling with other noise reduction and feature grouping approaches
in order to recognize biological features. Using this approach, they
found a comparable number of biological features (only 4% of the peaks
were annotated as apparent metabolites). Moreover, systematic annotation
of peaks and discrimination of biological compounds (including isotopic
variants) from adducts, fragments and MS artifacts was established.
In fact, the correct annotation of adducts was identified as a
major bottleneck for elucidating the number of true sample molecules.
Subsequently, the integration of stable isotope labeled buffers
in LC-HRMS improved cost efficiency and introduced a universal stable
isotope labeling approach for the corroboration and annotation of
real chemical features in any kind of sample.
77
The disadvantage of doubled measurement time is compensated by the
comparable performance (for noise removal and annotation) to other
credentialing approaches.
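The core pairing logic shared by these credentialing approaches can be sketched as follows; published tools such as MetExtract II implement far more (isotopologue patterns, RT alignment, MS/MS checks), and the feature list here is purely hypothetical:

```python
# Sketch of the core pairing step in credentialing: find unlabeled/13C-labeled
# feature pairs that co-elute and differ by n * (13C - 12C) mass units.
C13_SHIFT = 1.003355  # Da per incorporated 13C atom

features = [  # (m/z, retention time in s) from a mixed U-12C/U-13C extract
    (180.0634, 312.0), (186.0835, 311.8),   # hexose pair, n = 6
    (132.1019, 255.1), (138.1220, 255.3),   # leucine pair, n = 6
    (279.1591, 401.2),                      # unpaired -> likely background
]

def credential(features, n_carbons, mz_tol=0.005, rt_tol=2.0):
    """Return feature pairs consistent with n_carbons 13C labels."""
    pairs = []
    for mz_l, rt_l in features:
        for mz_h, rt_h in features:
            if (abs(mz_h - mz_l - n_carbons * C13_SHIFT) < mz_tol
                    and abs(rt_h - rt_l) < rt_tol):
                pairs.append(((mz_l, rt_l), (mz_h, rt_h)))
    return pairs

print(credential(features, n_carbons=6))  # returns the two credentialed pairs
```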
Isotopically Labeled Biomass for Validated
Isotopologue Distribution Elucidations
Another way to harvest
stable-isotope labels in metabolomics is the investigation of differential
incorporation of labels into organisms, which is comprehensively reviewed elsewhere.
38−40
However, we want to highlight the application of labeled biomass
with controlled labeling pattern
78−80
to validate isotope
tracer analysis workflows.
81
In the past,
it was shown that 13C tracer and flux experiments demand
dedicated validation tools. Spectral accuracy, i.e. an instrument’s
ability to truly measure the fractional abundance of the different
isotopologues, is crucial. Metabolite standards with natural isotopic
pattern (as well as fully labeled standards) are not well suited to
assess the accuracy of carbon isotopologue distribution in tracer
studies. Due to the low natural abundance of 13C, heavy
natural isotopologues are below the limit of detection. Using in vivo synthesis, tailored
carbon isotopologue distributions
of primary metabolites can be obtained, which serve as ideal references.
The isotopologue distributions of stable isotope-labeled compounds
can be assessed with excellent precisions of <1% and trueness bias
as small as 0.01–1%.
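As an illustration, spectral accuracy against such a tailored reference distribution could be assessed as in the following sketch; the certified and measured fractional abundances are hypothetical:

```python
# Sketch: assessing spectral accuracy against a reference carbon
# isotopologue distribution (CID) from a tailored labeled standard.
import numpy as np

reference = np.array([0.10, 0.20, 0.40, 0.20, 0.10])   # certified CID (M+0..M+4)
replicates = np.array([                                 # measured fractions
    [0.101, 0.198, 0.402, 0.199, 0.100],
    [0.099, 0.201, 0.399, 0.201, 0.100],
    [0.100, 0.199, 0.401, 0.200, 0.100],
])

mean = replicates.mean(axis=0)
bias = (mean - reference) / reference * 100           # trueness bias, %
rsd = replicates.std(axis=0, ddof=1) / mean * 100     # precision, % RSD
print("bias %:", np.round(bias, 2), "RSD %:", np.round(rsd, 2))
```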
Isotopically Labeled Biomass
for Quantification
Starting in the 1980s, stable isotope-labeled
ISTDs and isotope dilution approaches in combination with LC- and
gas chromatography (GC)-MS/MS were used to improve quantification
of small molecules.
37
In metabolomics,
internal standardization is widely adopted for absolute quantification,
as the analytical process consists of multiple steps and requires
normalization. Chemical synthesis of isotope-labeled standards is prohibitive for
omics-type analysis, as hundreds of ISTDs would be required, making
isotopically labeled biomass a promising alternative. These cost-effective in vivo synthesized
metabolite standards are characterized
with respect to their isotope labeling degree but not their concentrations.
Thus, normalization between samples (relative quantification) or internal
standardization of external calibration (absolute quantification)
47,48,54,56
is accomplished by spiking known amounts of labeled biomass into
the samples. The benefits of these quantification workflows are well
documented. Overall, improved analytical figures of merit (trueness,
precision, and linearity) have been observed upon the integration
of labeled yeast extracts.
54,55,82
The use of HRMS together with stable isotope labeled standards supports
workflows merging absolute quantification and nontargeted unanticipated
discoveries (relative quantification and annotation) in one analytical
run. This powerful strategy has been addressed in metabolomics and
lipidomics.
54,82
In lipidomics, only a slight
decrease of identified lipids (∼10%) was observed in the presence
of labeled biomass.
82
This can be explained
by ion competition in complex matrices when applying data-dependent
fragmentation and can be further optimized by deep metabolite profiling
or data-independent acquisition. Stable isotope labeled materials
used as intermediates have to be chosen on the basis of sufficient metabolite/lipid
class coverage and biomass availability/costs. Labeled yeast, e.g. P. pastoris, offers
a reasonable compromise for quantitative
studies, as it is a eukaryotic organism that can be easily cultivated
under controlled conditions on a sole carbon source. Yeasts share
a high metabolome and lipidome overlap with humans, including the evolutionarily
conserved primary metabolome, e.g. amino acids, nucleotides, organic
acids, and metabolites of the central carbon metabolism. Lipids are
also covered, as shown by Natter et al.
83
Wolrab et al. summarized the most frequently up- and downregulated
lipids in oncology including the classes phosphatidylcholines (PC),
phosphatidylethanolamines (PE), phosphatidylinositols (PI), phosphatidylserines
(PS), lysophosphatidylcholines (LPC), lysophosphatidylethanolamines
(LPE), lysophosphatidic acids (LPA), free fatty acids (FA), triacylglycerols
(TG), diacylglycerols (DG), cholesterol esters (CE), sphingomyelins
(SM), ceramides (Cer), monosialodihexosylganglioside (GM3), and sulfatides
(SHexCer) in both tissue and body fluids,
84
and except for CE, SM, GM3, and SHexCer, all of the listed classes
are present in yeast. In the past, P. pastoris yeast extracts were successfully spiked
to human plasma (including
Standard Reference Material (SRM) 1950 from the National Institute
of Standards and Technology (NIST), USA), different cell extracts,
and yeast, either as ISTD based on ethanolic extracts or chloroform
based lipidome isotope labeling of yeast (LILY) extracts, for metabolites
and lipids, respectively (Figure 4
A, B).
Figure 4
Current in-house library of annotated metabolites and
lipids found in Pichia pastoris (yeast). (A) Metabolite
classes in ethanolic yeast extract
85
classified
using the ClassyFire
86
annotation system.
(B) Lipid classes annotated in chloroformic yeast extract.
87
GPL, glycerophospholipids; GL, glycerolipids;
SL, sphingolipids; ST, sterols; PR, prenols; Hex1Cer, hexosyl ceramides;
SPH, sphingosine bases; SE, steryl esters; Co, coenzyme Q; PG, phosphatidylglycerols;
PA, phosphatidic acids; CL, cardiolipins.
At present, a library of 206 metabolites for the ethanolic
yeast extract covering the classes of (1) organic acids and derivatives,
(2) nucleosides, nucleotides, and analogues, (3) lipids and lipid-like
molecules, (4) organic oxygen compounds, (5) organoheterocyclic compounds,
(6) organic nitrogen compounds, and (7) benzoids is established (Figure 4
A). All of the identified
metabolites were also present in the Human Metabolome Database (HMDB).
88
This can be in part attributed to the human
microbiome, but also to the evolutionary (inter-species) conservation
of the primary metabolome. With regard to the yeast and human lipidome,
major differences exist including a different sphingoid base—SPH
18:0;3 instead of SPH 18:1;2—as well as other sphingolipid
classes (inositol phosphoceramide (IPC), mannosylinositol phosphoceramide
(MIPC), and mannose-bis(inositolphospho)ceramide (M(IP)2C)) instead
of SM, ceramide 1-phosphates (CerP), and gangliosides. Yeasts also
contain a smaller diversity of fatty acids, with a maximum of three
double bonds and a lack of highly polyunsaturated fatty acids (PUFAs).
Furthermore, no ether lipids (plasmanyl (ether bond), plasmenyl (vinyl
bond)) are present and cholesterol is replaced by ergosterol in yeast.
Overall, this leads to a list of 405 lipid species (Figure 4
B) combining information from
reports on LILY from chloroform extracts by RP-LC-MS
82
and an improved preparative supercritical fluid chromatography
(SFC) workflow.
87
Optimized extraction
strategies and confirmation by authentic standards can further increase
the metabolite and lipid list in yeast.
89
Here, we want to emphasize the possibility of class- or retention-time-specific
standardization: by using labeled compounds as
class- or retention-time-specific ISTDs when the target metabolite or lipid is not
present in the yeast extract,
90
the list
of possible analytes in a quantitative approach can be further enlarged
and adapted to the sample of interest.
Harmonization
and Reference Materials
Joint efforts toward harmonized metabolomics
protocols and the definition of minimum quality requirements
are of paramount importance. There is a vibrant scientific community
working toward harmonization to raise the transparency and quality of
published results.
17,91−95
Standardized methods and reference materials provide
benchmarks, paving the way to reproducibility and most importantly
interassay commutability, with regard to both targeted and nontargeted
analysis.
Reference Materials and Interlaboratory Comparisons
Certified reference materials represent the highest metrological
order benchmarks enabling traceable and accurate quantification in
metabolomics workflows. Certification requires an inherently long
lead time, as composition and quantitative values are reported with
characterized uncertainty and stability. Certified reference materials
are provided by metrological institutions or by accredited material
producers. While the application of (certified) reference materials
in absolute quantification is well established, their integration
for nontargeted metabolomics is emerging. A recent multi-platform
study by hydrophilic interaction liquid chromatography (HILIC)/RP-LC
HRMS
96
demonstrated the power of using
high-quality benchmarks in large-scale nontargeted metabolomics. Three
pooled human plasma reference materials (Qstd3, 211 CHEAR, NIST SRM
1950) were repeatedly measured along with 3600 samples over a period
of 17 months, providing a convincing strategy for data normalization
and approximate concentration levels.
As the pace of standard
production suitable for omics-type research in national metrological
institutions is slow, international ring trials/interlaboratory initiatives
drive standardization by offering measurement protocols and consensus
values for biological matrix materials which can be distributed to
the community. For the widely adopted NIST reference material human
plasma SRM 1950, the number of consensus values assessed by international
ring trials is continuously growing. Consensus values for 250 metabolites
(amino acids, biogenic amines, acylcarnitines, glycerolipids, glycerophospholipids,
cholesteryl esters, sphingolipids, hexoses) were assessed on the basis
of the Biocrates AbsoluteIDQ p400 HR kit.
97
Interlaboratory
comparisons are of paramount importance in lipidomics, since reference
materials are lacking. In 2017, an international ring trial provided
consensus values for 339 lipids (from the major categories: fatty
acids, glycerolipids, glycerophospholipids, sphingolipids, sterols)
in SRM 1950.
98
Recently, Triebl et al.
99
further emphasized the need for reference samples
by showing that lipidomics workflows continue to suffer from limitations
associated with reproducibility and commutability of quantitative
data from different platforms, even when isotopically labeled ISTDs
were included. The authors compared direct infusion, HILIC, and RP-LC-MS
workflows for lipid analysis, showing that upon normalization to the
reference sample SRM 1950, platform-dependent quantitative bias was
successfully removed.
99
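The underlying normalization idea can be sketched in a few lines; the per-platform readings below are hypothetical, and the actual study applied full lipidomics workflows rather than single values:

```python
# Sketch: removing platform-dependent bias by normalizing each platform's
# study-sample results to its own measurement of a shared reference
# material (e.g., SRM 1950). All values are hypothetical.
srm1950_consensus = 1.00          # consensus value for one lipid (µmol/L)

platforms = {                     # per platform: (SRM 1950 result, sample result)
    "direct infusion": (1.20, 2.40),
    "HILIC-MS":        (0.85, 1.70),
    "RP-LC-MS":        (1.05, 2.10),
}

for name, (srm_measured, sample) in platforms.items():
    normalized = sample * srm1950_consensus / srm_measured
    print(f"{name}: {normalized:.2f} µmol/L")  # all converge to 2.00
```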
The frequent use
of SRM 1950 in both metabolomics and lipidomics studies
96,97,100,101
highlights its key role as a reference point for merged workflows.
Another recent interlaboratory study tested seven distinct materials
including human urine pools from four SRMs and one research-grade
test material (RGTM) provided by NIST.
102
Untargeted analytical profiles for these materials were obtained
using a variety of common metabolomics platforms (nuclear magnetic
resonance (NMR), GC- and LC-MS), leading to the conclusion that all
platforms were able to detect compositional differences despite some
platform-dependent differences.
Community-Based Guidelines
in Metabolomics
Community guidelines on how to report and
perform metabolomics workflows form the basis of standardization.
The Metabolomics Standardization Initiative (MSI) of the Metabolomics
Society
91
has worked intensively on definitions
and guidelines considering all steps of the targeted and nontargeted
analytical process for many years. This includes defining the analytical
task, sampling/analysis of data standards, data evaluation, and reporting.
17,93
The metabolomics community is currently revisiting the standards
of metabolite reporting based on the state-of-the-art level-of-confidence
scale
94
(1–3), introducing new subclasses
(A–F) for unambiguous metabolite identification, such as cis/trans configuration information.
In October 2020, a
new guideline on lipid classification, nomenclature, and shorthand
notation was published
95
including major
changes for the annotation of double bond equivalents and the number
of oxygens as well as newly delineated oxygenated lipid species. Figure 5
shows the metabolite
and lipid identification ranking according to the newly proposed guidelines
of the metabolomics community.
Figure 5
Metabolite (left) and lipid (right) identification
according to the proposed guidelines of the Metabolomics Society (A–G)
using the examples of leucine and a PC 18:0/16:2(7E,11Z)[R]. The lowest
annotation level corresponds to known accurate mass information (G)
followed by a known compound class (F), known compound sum formula
(E), known functional moieties (D), known structure (isoleucine)/double
bond position (PC 18:0/16:2(7,11)) (C), known diastereomer (B), and
the highest level corresponding to enantiomer-specific identification (A). *In lipidomics
105
3 intermediate steps are distinguished at level
D: sum of carbon and double bond number for all fatty acyl chains
(PC 34:2)/known distribution (PC 18:0_16:2) and known position of
the fatty acyl chains (PC 18:0/16:2).
Updated metabolomics repositories such as MetaboLights
103
provide openness and transparency of reported
data sets. These repositories will be essential for the development of
community-based benchmark materials and will facilitate the adoption
of accepted guidelines.
Instrument-dependent compound identification
workflows complicate cross-platform evaluations and call for harmonization
of reference libraries. A recent European interlaboratory study published
harmonization guidelines for acquisition and processing of tandem
MS data. Interestingly, it also revealed that under certain collision
energies, time-of-flight (TOF) and Orbitrap fragmentation spectra
are comparable.
104
Quality Control and Benchmarking
QC and normalization strategies are essential for successful large-scale
studies. Normalization can be performed data-driven via QC samples
or via ISTDs and is extensively summarized elsewhere.
106−109
In large-scale metabolomics and lipidomics studies, the concept
of a pooled sample for QC has gained worldwide acceptance, also allowing
us to correct for intra- and interbatch variations and to accomplish
MS/MS measurements required for annotation.
109,110
However, the production of sufficient amounts of pooled samples
can be problematic for multicenter studies in clinical metabolomics.
Additionally, if only one sample pool including all sample groups
is produced, dilution effects can mask low-abundance metabolite signals.
The production of QCs for each group represents an alternative; however,
in some cases preparing a pooled sample is simply impossible. For
example, in many large-scale investigations such as longitudinal clinical
studies or population profiling, not all samples are available at
the beginning of the analysis. Alternatively, multistandard mixes
of metabolites and/or lipids are established reference samples, which
can be either custom-made in the lab or ordered as commercially
available stocks, e.g. LSMLS or MSMLS (from IROA), including 400 metabolites
(1 mg each) or 600 metabolites (5 μg each) per well
plate. Lipid-specific kits are also offered, e.g. AbsoluteIDQ (from
Biocrates) including 180 or 400 lipids. More recently, lipid mixes
with matrix-specific concentrations have become commercially available, e.g.
SPLASH LIPIDOMIX (Avanti) products, which include one deuterated ISTD
for each major lipid class at ratios relative to human plasma. Another
possibility is to take deuterated standards from the UltimateSPLASH
(Avanti) panel from different lipid classes to prepare a customized
lipid mix. These valuable standard panels offer reference materials
for streamlined validation protocols and accelerate harmonization.
However, it should be emphasized that harmonization efforts enabled
by reference standard mixes and kit-type analysis will not replace
certified reference materials, which are fully traceable. Recently,
the concept of a cheap and easily accessible biological benchmark
material was proposed for metabolomics and lipidomics. The idea was
resumed from proteomics, where HeLa cell extracts have become the
gold standard for benchmarking instrument performance and proof-of-principle
experiments upon introduction of new analytical methods.
111−115
Yeast ethanolic extracts with a characterized metabolome not only
enable testing of the chemical space and coverage upon method implementation
and development but also enable in-house routines for instrumental
performance tests with additional potential for batch-to-batch corrections
in large-scale nontargeted metabolomics studies. The benchmark material
is obtained from P. pastoris from
fully controlled fermentations, which can be easily reproduced in
a lab with fermentor access.
85
Additionally,
these extracts are commercially available in both endogenous
and 13C-labeled form. An open-source yeast metabolite
and lipid library has been established for the material. All reported compounds
are listed in the Human Metabolome Database, showing once more
that yeast is a cost-effective benchmark material for human metabolomics.
Of the 206 metabolites, 104 were stable for several years when stored
in aliquots at −80 °C.
85
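As an illustration of QC-driven intra-batch correction, the following sketch anchors a trend on pooled-QC injections and divides all injections by it; real pipelines typically use LOESS or spline fits, while this dependency-light version uses linear interpolation on hypothetical intensities:

```python
# Sketch of QC-anchored signal-drift correction: fit the trend of one
# feature's intensity across pooled-QC injections, then divide all
# injections by the interpolated trend.
import numpy as np

injection_order = np.arange(1, 13)
intensity = np.array([1000, 990, 1015, 960, 955, 940,
                      930, 900, 910, 880, 870, 850], dtype=float)
qc_idx = np.array([0, 3, 6, 9, 11])  # positions of pooled-QC injections

# Piecewise-linear drift estimate anchored on the QC injections only
trend = np.interp(injection_order, injection_order[qc_idx], intensity[qc_idx])
corrected = intensity / trend * np.median(intensity[qc_idx])
print(np.round(corrected, 1))  # drift across the batch is flattened out
```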
Nontargeted Data Analysis—Increasing Quality by Multiple Lines
of Evidence
Nontargeted metabolomics workflows consist of
key steps that need to be addressed individually with regard to standardization.
The first step of a nontargeted experiment involves the analytical
process aspects, discussed in several reviews.
1,116,117
Data analysis constitutes the most time
consuming and complex step of nontargeted experiments. Many tools
and approaches are available for this process and have been summarized
extensively.
107,118−120
More specifically, data analysis follows stepwise data preprocessing,
feature table processing, statistical analysis (feature prioritization
and biomarker elucidation), annotation, and biological contextualization
like pathway mapping and integration with other omics data, all of
which (with the exception of statistical analysis) are discussed in
the following. We will emphasize the multiple strategies of corroborating
nontargeted read-outs and deliberately focus on different aspects
that improve quality.
Data Preprocessing
Data preprocessing
(DPP) presents the first major challenge in nontargeted metabolomics
since it facilitates the translation of raw data into the less complex
format of so-called feature tables. While approaches enabling metabolomics
DPP keep being improved, the general steps have remained unchanged
across different tools (Figure 6
) (with very few exceptions as in ref (121)). However, despite this
fact and the development of different DPP parameter optimization tools
122−124
DPP often suffers from extensive problems. These include false negative
and false positive reports of ion species as well as wrongly reported
abundance values, among other issues.
125−128
It should be noted that data
preprocessing is not challenging because it is hard to perform, but
because it is hard to perform well. This point was laid out by Sindelar
et al., who demonstrated how poor data preprocessing performance
can severely complicate downstream data analysis.
129
It is therefore essential to control the effectiveness
of this process.
Figure 6
General steps of nontargeted data preprocessing.
There are a number of advances we would like to
highlight in this context. One recent R package, named patRoon, combines
different data preprocessing and annotation algorithms into a single
framework and thereby allows us to build pipelines in the R-environment.
130
This increases flexibility in data processing
choices considerably since it allows us to combine the strengths of
many different tools and to compare them more easily. It is worth
noting that patRoon supports any HRMS platform and incorporates algorithms
from many widely used tools such as ProteoWizard
131
and XCMS.
132
Two other tools which
should be noted here are NeatMS
133
and
MetaClean,
134
which are based on deep learning
and machine learning, respectively. Both tools allow us to comprehensively
assess the peak picking quality as conducted via different tools for
experimental datasets.
133
To the best of
our knowledge these recently published works represent the only available
tools to comprehensively assess peak picking quality for all picked
peaks, which poses a significant advancement. However, RT alignment
and false negatives are not considered in this approach, which makes
further development necessary. To address this need, mzRAPP was introduced,
a tool enabling reliability assessment of different nontargeted data
preprocessing steps (under submission). It is based on automatically
validated and extended benchmarks (starting from user supplied integration
boundaries per molecular formula) and allows us to derive different
performance metrics including the proportion of false negatives, affected
isotopic ratios, and the number of alignment errors for nontargeted
DPP of any experimental datasets. It is worth noting that the use
of benchmark datasets in this context enables us to investigate the
number of false negative peaks, as they provide a so-called “ground
truth” as a reference point. While this also offers several
other apparent advantages for the benchmarking of different DPP tools,
135
benchmark datasets in metabolomics come with
significant problems. First, their curation process requires extensive
manual work and is hugely time intensive (although some do exist;
e.g., ref (136)). This,
in turn, implies that it is impractical to create benchmarks for different
types of datasets (e.g. sample complexities or choices in instrumentation
or acquisition mode such as RP-LC, HILIC, orbitrap MS, TOF MS) which
might imply different needs for applied DPP software. Secondly, it
can be problematic to consider benchmark datasets as “ground
truth” without sufficient validation. mzRAPP tackles this problem
by automatically applying a number of validation metrics to check
the consistency of user supplied benchmark candidates.
Elimination
of Redundancies and Noise from Feature Tables
As discussed,
nontargeted data preprocessing of LC-HRMS data generally leads to
aligned feature tables. Ideally (when bioinformatic noise is not
considered), rows in those feature tables correspond to chromatographic
peaks with specific mz@RT values in different samples. Hence, each
mz@RT value ideally reflects an ion species originating from a sample
molecule eluting from the chromatographic dimension and being ionized
in the electrospray. However, preprocessing workflows typically introduce
significant numbers of bioinformatic noise features into data sets.
In this context, we would like to highlight three recently published
tools allowing us to remove those noise features from datasets. MetProc
allows us to remove features based on missing value structures in
QC samples.
137
Another tool, genuMet,
relies solely on injection order to identify false positive features,
without requiring measured QC samples.
138
Finally MS-CleanR has been added to the MS-DIAL
139
workflow, allowing us to (among other things not discussed
here) filter features based on blank signals, background drifts, unusual
mass decimals and relative standard deviations (RSDs).
140
Since all of those tools offer slightly different
approaches, their compatibility for different data sets remains to
be elucidated.
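To illustrate the kind of filters these tools apply, the sketch below combines a blank-fold filter with a pooled-QC RSD filter; thresholds and intensities are hypothetical and not taken from any of the cited tools:

```python
# Sketch of two common feature-table filters (conceptually similar to the
# blank- and RSD-based filters mentioned above; not any tool's actual code).
import numpy as np

def keep_feature(sample_int, qc_int, blank_int,
                 blank_fold=3.0, max_qc_rsd=30.0):
    """Keep a feature if its mean sample intensity exceeds blank_fold x the
    mean blank intensity and its pooled-QC RSD is below max_qc_rsd (%)."""
    sample_int, qc_int, blank_int = map(np.asarray, (sample_int, qc_int, blank_int))
    above_blank = sample_int.mean() > blank_fold * max(blank_int.mean(), 1e-12)
    qc_rsd = qc_int.std(ddof=1) / qc_int.mean() * 100
    return above_blank and qc_rsd < max_qc_rsd

print(keep_feature([5e5, 4e5, 6e5], [5.1e5, 4.9e5, 5.0e5], [1e4, 2e4]))  # True
print(keep_feature([5e4, 4e4, 6e4], [9e4, 3e4, 5e4], [4e4, 3e4]))        # False
```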
Over the last few years, many authors
have discussed the challenge that the number of reported ion species
cannot be directly translated into the number of sample molecules.
11
This is due to bioinformatic noise and because
one molecule will form multiple ion species due to the presence of
different isotopologues and adducts. It has been reported that a single
metabolite can lead to more than 100 different ion species during
the ionization process.
141
More recently,
it was also shown that adduct species differ significantly in HILIC
compared to RP chromatography.
142
The same
work also highlighted the problem of in-source fragmentation, which
poses a significant risk for wrong annotation.
Over the years
a number of approaches have been developed to group those different
ion-species in order to eliminate redundancies or even gain additional
reliability for annotations. Many tools enabling this and other important
data analysis steps are summarized elsewhere.
120
An interesting experimental approach which has been shown
to allow improved and simplified annotation of adducts has been to
measure samples twice with different LC-MS buffer compositions (14NH3–acetate and
15NH3–formate buffer)
77
(in fact, this
approach has also been used by ref (142)). In both conducted studies this approach
showed
great potential for annotating adducts and eliminating noise. Unlike
credentialing approaches
74,143
this workflow is applicable
to any sample, even if it cannot be labeled via stable isotopes. However,
as it requires two measurements for each sample, it dramatically increases
measurement time and might not be applicable to small sample volumes.
Nevertheless, this approach presents significant improvement in increased
control over noise reduction and adduct annotation.
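The basic grouping logic behind adduct annotation can be sketched as follows: co-eluting features are tested for m/z differences consistent with two known adducts of the same neutral mass. The adduct masses are standard positive-mode values; the features are hypothetical:

```python
# Sketch: annotating co-eluting features as adducts of the same molecule by
# matching m/z differences against known adduct mass offsets (positive mode).
ADDUCT_MASS = {"[M+H]+": 1.00728, "[M+Na]+": 22.98922, "[M+NH4]+": 18.03383}

features = [(181.0707, 300.1), (203.0526, 300.0), (198.0972, 300.2)]  # (m/z, RT s)

def annotate_adducts(features, mz_tol=0.005, rt_tol=2.0):
    groups = []
    for i, (mz_i, rt_i) in enumerate(features):
        for j, (mz_j, rt_j) in enumerate(features):
            if i >= j or abs(rt_i - rt_j) > rt_tol:
                continue
            for a, ma in ADDUCT_MASS.items():
                for b, mb in ADDUCT_MASS.items():
                    # both point to the same neutral mass M if mz_i - ma == mz_j - mb
                    if a != b and abs((mz_i - ma) - (mz_j - mb)) < mz_tol:
                        groups.append((i, a, j, b, round(mz_i - ma, 4)))
    return groups

for g in annotate_adducts(features):
    print(g)  # all three share neutral mass ~180.063 (e.g., a hexose)
```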
The Annotation
Task
In metabolomics the term annotation refers to the assignment
of molecular information to features. This information can involve
details on contributing atoms (molecular formula e.g. C6H12O6), structural class (e.g.
steroid), atomic
connections (e.g. phenylalanine), relative stereochemistry (e.g. leucine
or isoleucine) or chirality (e.g. d-leucine). Different approaches
allow us to collect evidence for the assignment of a feature at any
of those levels. In fact, the Metabolite Identification Task Group
of the Metabolomics Society has proposed reporting standards for different
levels of identification depending on the nature of collected evidence
(Figure 5
shows the
proposed metabolite annotation). While those standards define
specific types of evidence which have to be collected for a level
to be reached (e.g. matching of acquired MS/MS scans against a mass
spectral library), there are no consensus criteria for the necessary
strength of collected evidence (e.g. what constitutes a valid spectral
match). In this context one of the most discussed topics is the adaptation
of a false discovery rate (FDR) for spectral matching as it is routinely
applied in the proteomics field. Over recent years a range of different
strategies allowing us to apply this idea also in metabolomics has
been proposed or implemented.
18,144−147
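For orientation, the score underlying most spectral matches is a cosine similarity between aligned fragment lists, as in the minimal sketch below; production library searches use weighted or modified variants and, increasingly, the FDR estimation strategies cited above:

```python
# Minimal sketch of a spectral match score: cosine similarity between two
# centroided MS/MS spectra after aligning fragments within an m/z tolerance.
# (Library search engines typically use intensity-weighted variants.)
from math import sqrt

def cosine_score(spec_a, spec_b, mz_tol=0.01):
    """spec_a, spec_b: lists of (m/z, intensity) pairs."""
    shared = [(ia, ib) for mza, ia in spec_a for mzb, ib in spec_b
              if abs(mza - mzb) <= mz_tol]        # naive greedy alignment
    dot = sum(ia * ib for ia, ib in shared)
    norm = sqrt(sum(i**2 for _, i in spec_a)) * sqrt(sum(i**2 for _, i in spec_b))
    return dot / norm if norm else 0.0

query = [(86.0964, 100.0), (132.1019, 35.0), (69.0699, 20.0)]   # hypothetical
library = [(86.0964, 95.0), (132.1019, 40.0), (44.0495, 10.0)]
print(f"{cosine_score(query, library):.3f}")  # ~0.976
```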
While their actual application is still scarce, they definitely
pose a step toward increased reliability of annotations. Another point,
which needs to be considered in this regard is the nature of reference
spectra used for spectral matching. Until now, matching against experimental
spectral libraries has been considered the gold standard for this
kind of approach. Although spectral libraries have been growing to
impressive sizes (e.g. recently METLIN reached more than 850 K standard
spectra),
148
a recent evaluation on available
reference spectra from authentic chemical standards
149
regarding the coverage of different MS spectral libraries
in different genome scale metabolic models (GSMs) revealed that on
average only <40% of metabolites in the models are represented.
Meanwhile, in silico approaches such as MetFrag
150
(a combinatorial fragmenter) and machine learning based methods
such as CFM-ID
151
(an in-silico fragmenter)
and CSI:FingerID
152
(a structure predictor)
are increasingly accepted. This is mainly due to their increased
coverage of the molecular space since they do not rely on experimental
fragmentation data but molecular structure databases such as PubChem.
153
Indeed some of those can even go beyond that
(e.g. in combination with tools like EMMF
154
).
The advantage is evident since such structure databases
are many orders of magnitude larger than any spectral library. Indeed,
this might lead to an improved FDR when using this kind of approach
as compared to matching against a spectral library with less metabolic
coverage. Another strategy worth mentioning supports annotations by exploiting the reactivities of specific functional groups. Briefly, this involves the selective derivatization of functional groups (such as amines, carboxylic acids, alcohols, etc.), targeting metabolite subsets commonly referred to as sub-omes.
155
Derivatization improves
overall ionization efficiency and enables selective separation and
enrichment using reversed-phase stationary phases. Moreover, the production of sample-specific ISTDs is facilitated: blends of samples derivatized with isotopically labeled or unlabeled reagent, respectively, have served for relative as well as absolute quantification.
156
This also enables credentialing-type approaches
(as discussed above).
157
As a drawback,
these approaches take considerable effort in terms of data analysis.
Dedicated RT and spectral libraries for identification of derivatized
molecules (available for some derivatization strategies such as dansylation
158
) are required. It should be noted that derivatization
approaches reduce throughput and require dedicated validation, due
to challenges arising from matrix effects and decreased stability.
159
Hence, derivatization strategies can potentially bring many advantages but require extensive method development and validation work.
H/D exchange on
the other hand, is more straightforward in its application and can be integrated into existing data evaluation pipelines. Recently, there
have been significant advancements in infrastructure for this type
of analysis. For example, the software MetFrag supports H/D exchange
data.
160
Although H/D exchange only allows
us to investigate acidic moieties, its potential for annotation has
been shown in multiple studies.
161,162
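The arithmetic behind this readout is simple: every labile proton exchanged for deuterium shifts the neutral mass by the exact H/D mass difference. A minimal sketch (neutral monoisotopic masses assumed; deuterons introduced by the ionization itself are neglected here):

# Each exchangeable (labile) proton replaced by deuterium shifts the
# neutral mass by m(2H) - m(1H) = 2.0141018 - 1.0078250 = 1.0062768 Da.
MASS_SHIFT_PER_EXCHANGE = 1.0062768

def n_exchangeable_protons(mass_h, mass_d):
    """Estimate the labile-proton count from masses measured with
    H2O- vs D2O-based eluents (a deliberately simplified model)."""
    return round((mass_d - mass_h) / MASS_SHIFT_PER_EXCHANGE)

# Example: citric acid (4 labile protons: 3 COOH + 1 OH)
print(n_exchangeable_protons(192.0270, 192.0270 + 4 * 1.0062768))  # -> 4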
In
cases where the annotation strategies discussed above fail to deliver
the desired insight, novel approaches based upon complex bioinformatics
algorithms fill the void. These innovations utilize molecular networking
of fragmentation spectra (spectral similarity translated to biochemical
and chemical substructures) or machine learning algorithms. In this
context MS2LDA, which was initially published in 2016,
163
associates specific fragments and/or neutral losses with chemical moieties, thereby enabling the inspection of complex structural relationships between different unknown analytes. This algorithm has been further developed to now directly enable differential analysis of chemical substructures between different samples (such as investigations on the regulation of xenobiotic derivatives across different samples).
164
More recently, feature-based molecular networking, which allows the chromatographic and/or ion mobility dimension to be considered in this type of analysis, has been introduced.
165
This way, isomers
and in-source fragments can potentially be investigated. Another tool
we wish to highlight here is CANOPUS,
166
which classifies features via their MS/MS spectra even when existing
spectral libraries do not include MS/MS scans of the class in question.
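At the core of such networking lies pairwise spectral similarity. A minimal cosine-score sketch over centroided peak lists (the tolerance and the greedy peak pairing are simplifications; production tools use a modified cosine that additionally accounts for precursor mass shifts):

import math

def cosine_score(spec_a, spec_b, mz_tol=0.01):
    """Simplified cosine similarity between two MS/MS spectra,
    each given as a list of (mz, intensity) peaks."""
    norm_a = math.sqrt(sum(i * i for _, i in spec_a))
    norm_b = math.sqrt(sum(i * i for _, i in spec_b))
    if not norm_a or not norm_b:
        return 0.0
    used, dot = set(), 0.0
    for mz_a, int_a in spec_a:
        for j, (mz_b, int_b) in enumerate(spec_b):
            if j not in used and abs(mz_a - mz_b) <= mz_tol:
                dot += int_a * int_b
                used.add(j)
                break
    return dot / (norm_a * norm_b)

Edges are then drawn between spectra whose score exceeds a chosen threshold, and the resulting connected components approximate molecular families.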
Annotation
in the Field of Lipidomics
The general annotation strategies
applied for metabolomics are often not applicable in nontargeted lipidomics.
This is reflected in a survey among lipidomics researchers from 2018
167
revealing that 60% of all researchers rely
mostly on manual (visual) annotation. Even though software tools are
available and commonly applied (e.g. LDA;
168
MS-DIAL,
169
LIFS software tools
170
), manual annotation remains an integral part
of lipid annotation highlighting the lack of adequate nontargeted
analysis tools in lipidomics. Most available software tools are based
on two approaches: library matching (MS-DIAL, LipidSearch, Greazy,
LipidDex, etc.) and decision rule-based identification (LDA, LipidXplorer,
LipidMatch, LipidHunter, etc.). Because the building blocks of lipids lead to distinct MS2 patterns within a given class, decision rule sets based on well-defined fragments (fragment rules) and their intensity relationships (intensity rules) can be defined for specific lipid classes.
171,172
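To make the notion of fragment rules concrete, a minimal sketch follows (the rule table is an illustrative subset, not a validated rule set; for instance, phosphatidylcholines in positive mode show the diagnostic phosphocholine head group fragment at m/z 184.0733, while phosphatidylethanolamines show a characteristic neutral loss of 141.0191):

# Minimal fragment-rule sketch for positive-mode MS2 (illustrative only).
FRAGMENT_RULES = {
    "PC": {"required_fragment": [184.0733]},  # phosphocholine head group
    "PE": {"required_loss": [141.0191]},      # loss of phosphoethanolamine
}

def matches_class(precursor_mz, ms2_peaks, lipid_class, tol=0.005):
    """ms2_peaks: list of (mz, intensity); True if all rules are met."""
    rule = FRAGMENT_RULES[lipid_class]
    mzs = [mz for mz, _ in ms2_peaks]
    for frag in rule.get("required_fragment", []):
        if not any(abs(mz - frag) <= tol for mz in mzs):
            return False
    for loss in rule.get("required_loss", []):
        if not any(abs((precursor_mz - mz) - loss) <= tol for mz in mzs):
            return False
    return True

Real rule sets additionally encode intensity relationships between fragments, which the decision rule-based tools listed above evaluate class by class.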
For library matching, similar principles are applied as in standard metabolomics workflows, using accurate mass, MS2 spectra, and scoring algorithms. Both experimental and in-silico databases are applied in lipidomics. Unfortunately, false discovery rate calculation has not been possible to date, and a certain level of false assignments is state of the art in nontargeted lipidomics.
Hence, it is of utmost importance to reliably estimate the proportion
of potentially false assignments. Filtering of false positive annotations can be done by relative RT, since within a homologous lipid series of the same class retention depends on the relative carbon number and/or relative double bond number.
173
Using regression models, the so-called equivalent carbon number (ECN) model can be applied for
manual annotation
174
or RT prediction
175
in order to exclude false positive hits and
confirm lipids. Additionally, Kendrick mass plots can be used to identify
homologous series in lipid data sets.
176
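The arithmetic behind such plots is straightforward. A minimal sketch using the CH2-based Kendrick mass (the example m/z values are chosen to differ by exactly one CH2 unit; members of a homologous series share nearly the same Kendrick mass defect):

CH2 = 14.01565  # exact mass of the CH2 repeat unit

def kendrick_mass_defect(mz):
    """CH2-based Kendrick mass defect; ions differing only in the
    number of CH2 units collapse onto (nearly) the same value."""
    kendrick_mass = mz * 14.0 / CH2
    return round(kendrick_mass) - kendrick_mass

# Two ions one CH2 apart yield (nearly) identical mass defects
print(kendrick_mass_defect(760.5851), kendrick_mass_defect(774.6008))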
The application of Bayesian statistics presents an interesting and
promising direction and may overcome some limitations of hand-crafted
rule sets.
177
Excellent community-based
resources provide guidelines (see ILS, LSI
92
) on criteria and characteristic fragments for MS/MS annotation.
The LIPID MAPS
178
and LSI websites continuously update information on manual inspection of MS/MS data, reporting obligatory fragment ions for the unambiguous annotation of lipids. Still, only minimum requirements have been defined (see ILS and LSI), so that openness and transparency of reported datasets remain essential to bring harmonization in lipidomics to the next level. As nontargeted lipidomics remains error-prone and still requires
expert knowledge, comprehensive information on lipid annotations is
essential. The periodicity of lipids offers further control points
in lipid identification. In our opinion, lipidomics and metabolomics annotation have to be harmonized, which is already possible using the identification levels proposed by the Metabolomics Society (Figure 5).
Retention Time
and Cross Section as Orthogonal Parameters in Nontargeted Analysis
Retention
Time for Annotation
Orthogonal data such as chromatographic RTs are key to increasing the confidence of MS-based compound annotations. So far, the poor reproducibility and commutability of experimental retention times across labs, even when restricting to reversed-phase chromatography, has precluded the wide adoption of RT libraries
179
for high quality annotation across labs. RT prediction from molecular structures is currently a very active area of research.
The most relevant developments are summarized elsewhere.
180
The most recent advances not covered in the
review are provided by the software tools Retip
181
and QSRR automator.
182
Retip
is a machine learning based tool which has been trained using more than 800 standard compounds each for RP and HILIC chromatography. Retip was integrated into the MS-DIAL toolbox. QSRR automator has
been published as a Python package and builds RT prediction models
for in-house chromatographic methods.
182
It is worth noting that RT prediction is not (yet) accurate enough to enable unambiguous identification of small molecules. However, it can be applied for the annotation of (mis)annotated in-source fragments and allows reranking of positional isomers, which can provide valuable insights.
181
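Conceptually, such tools learn a mapping from molecular descriptors to retention time. A minimal QSRR-style sketch with scikit-learn on synthetic data (the descriptor matrix and the toy structure-retention relationship are hypothetical; Retip and QSRR automator implement far more elaborate descriptor generation and model selection):

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Hypothetical training data: rows = standard compounds, columns =
# molecular descriptors (logP, TPSA, ...); y = measured RTs in min.
rng = np.random.default_rng(0)
X = rng.normal(size=(800, 20))
y = 2.0 * X[:, 0] + rng.normal(scale=0.3, size=800)  # toy relationship

model = RandomForestRegressor(n_estimators=200, random_state=0)
print(cross_val_score(model, X, y, cv=5, scoring="r2").mean())
model.fit(X, y)
predicted_rt = model.predict(X[:5])  # predict RTs for new descriptor rows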
Collision Cross Section
Value for Compound Annotation
The role of collision cross
sections (CCS) obtained from ion mobility spectrometry (IMS) for confident
compound annotation has been extensively discussed.
183−186
The pace of generating CCS databases (both experimental and in-silico
predicted) has been enormous.
187−189
Currently there are two unified
databases, CCS Compendium
186
and AllCCS.
190
Novel open-source software tools facilitate
data evaluation.
131,169
Seminal studies showed
that interlaboratory reproducibility of CCS assessment outperforms
191,192
reproducibility of chromatographic RTs. As a drawback, a CCS value
correlates with the measured accurate mass of a molecule, while chromatographic
retention offers an entirely orthogonal identifier. Due to the current
limitations in ion mobility resolution, isomer separation of small
primary metabolites is limited. In complex samples, only molecules
exhibiting CCS differences in the low % range (typically 3%) are routinely
resolved. The resolution is improved by novel advanced instrumental
concepts.
193,194
Recently, the potential of trapped
IMS (TIMS) to separate lipid isomers was shown.
195
The obtained resolving power allowed the discrimination of
lipid species exhibiting CCS differences of <1% in complex biological
mixtures. Several studies implemented ion mobility for structurally
characterizing lipids with a high degree of specificity. Information
on double bond position and geometry was obtained combining IMS with
ozonolysis and Paternò–Büchi reaction.
196,197
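In practice, CCS-supported annotation often amounts to a combined m/z and CCS tolerance filter against a database such as the CCS Compendium or AllCCS. A minimal sketch (database entries and tolerance windows are purely illustrative):

# Hypothetical database entries: (name, m/z, CCS in A^2)
CCS_DB = [
    ("candidate A", 181.0707, 139.9),
    ("candidate B", 181.0705, 148.2),
]

def annotate(mz, ccs, mz_tol_ppm=5.0, ccs_tol_pct=1.0):
    """Return candidates inside both the m/z (ppm) and CCS (%) windows."""
    hits = []
    for name, db_mz, db_ccs in CCS_DB:
        ppm = abs(mz - db_mz) / db_mz * 1e6
        pct = abs(ccs - db_ccs) / db_ccs * 100.0
        if ppm <= mz_tol_ppm and pct <= ccs_tol_pct:
            hits.append((name, round(ppm, 2), round(pct, 2)))
    return hits

print(annotate(181.0706, 140.5))  # retains only "candidate A"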
Navigating the Conflicting Goals of Metabolome
Coverage and Throughput
To date, metabolome analysis is the best fit-for-purpose compromise between coverage, selectivity,
and throughput. High coverage implies a wide interrogation window
with regard to both the chemical molecular dimension and the metabolite
abundance dimension (8 orders of magnitude concentration difference).
Major application areas of metabolomics such as, e.g., precision medicine
envisage the measurement of large cohorts (thousands of samples) in
regulated environments. The current transitory phase from small scale
experiments to large scale studies, industry- and clinical applications,
triggers exciting developments regarding streamlined workflows and
tailored solutions with advanced throughput. As the field moves forward,
economic considerations regarding cost effectiveness and automation
of the complete workflow become more important. Miniaturization accommodates
the analysis of small precious samples, bears the potential of increasing
sensitivity, and reduces solvent consumption following the principle
of green chemistry. We will discuss key aspects of current developments—from
sample preparation to analysis, advancing automation, miniaturization,
and throughput—and discuss the methods with regard to coverage
and selectivity.
Sample Preparation
High-throughput
sample preparation is still a bottleneck preventing exploitation of
the full potential of high-throughput MS-based metabolomics. A recent
review discusses the need for high-throughput technologies emphasizing
the role of sample preparation.
198
The
state of the art of sample preparation strategies for all relevant
sample matrices is comprehensively reviewed elsewhere.
1
Protein precipitation upon dilution, liquid–liquid
extraction, and solid phase extraction (SPE) are widely accepted methods
in metabolomics and lipidomics analysis which can be adopted for robotic
liquid handing systems (e.g. methyl tert-butyl ether (MTBE) extractions
in lipidomics
199
). Further advancement
of classical sample preparation strategies in metabolomics and lipidomics
is driven by emerging application fields such as biotechnological
large-scale enzyme activity screens and plate-based biomarker or drug
screening and includes the development of miniaturized green sample
pre-treatment (e.g. micro-liquid–liquid extraction, low volume
SPE), offering favorable extraction kinetics, high preconcentration
rates, and increased throughput. For example, implementation of a
commercially available, fully automated SPE system using small volume
SPE cartridges achieved a duty cycle of less than 15 seconds per sample
preparation.
200
Automated nondispersive
micro-liquid–liquid extraction allows high-throughput through
parallelization. Dispersive micro-liquid–liquid extraction
ameliorates extraction kinetics, but severe limitations regarding
automation of phase separation have been reported.
198,201
Currently, solid-phase microextraction (SPME) and electromembrane
extraction methods are “re-explored” for metabolomics,
given their potential for fully automated parallel extraction in well-plate
formats and enrichment through miniaturization.
202−204
SPME is a nondestructive and nonexhaustive extraction showing
great promise in probing and extraction of “tiny” metabolomes.
While multianalyte quantification remains a challenge, low invasiveness
of SPME and the nonexhaustive nature of extraction, together with
recently developed extractive phases, make the technique particularly
attractive for time-resolved or spatially resolved metabolomics fingerprinting.
202
For example, a high-throughput time-course
metabolomic analysis was achieved through multiple extraction of 96-well-plate
cell cultures.
205
Direct immersion (DI) in vivo sampling enabled time-resolved metabolic fingerprinting
of animal brains
206,207
and a method for the analysis
of small molecules from semi-solid tissue relying on DI-SPME and desorption electrospray ionization (DESI)-MS has been proposed, promising space-resolved analysis of tissues.
208
Non-exhaustive in vivo extraction followed by GC X GC qTOFMS analysis enabled
real-time monitoring of apple metabolism during the process of ripening
on the tree. The slim geometry of the extraction device avoided tissue
wounding and oxidative degradation of analytes seen with conventional
workflows relying on harvesting, metabolism quenching, and ex vivo extraction.
209
However,
the current selection of commercially available DI-SPME extractive
devices is very narrow, limiting the wide adoption of this technique.
202
Electromembrane extraction is a combination
of partitioning-based liquid–liquid extraction and electrophoresis.
Fundamentals of electromembrane extraction have been summarized in
a review by Douin et al.
204
Analytes move
from a donor phase, usually an aqueous sample, through a water-immiscible
organic layer acting as purification filter, into an aqueous (or optionally
organic) acceptor phase. Mass transfer is driven by an electric field
introduced between donor and acceptor phase via insertion of electrodes
and application of direct current in the milliampere range, which
speeds up the extraction process and enhances extraction yield compared
to simple partitioning-based extraction. For optimized systems, selective
analyte enrichment up to 100-fold and recoveries up to 100%
204
and excellent cleanup potential have been reported
(salt- and protein-removal,
210
phospholipid-removal
211
). The technique holds high potential for point
of care analysis as enabled by parallelization and downscaling of
analysis as well as implementation into microfluidic chips (e.g. Hansen
et al.
212
). However, the extraction principle is inherently limited to ionizable species and is not suited for molecules prone to degradation by electrolysis, and electrolysis phenomena are aggravated with decreasing acceptor volume. Moreover, electromembrane extraction is a selective extraction procedure,
204
precluding the full scope of wide coverage metabolomics.
On the other hand, high selectivity towards target analytes is a desirable
feature for specialized routine application in regulated environments
as it facilitates process validation.
Direct Analysis in Metabolomics
and Lipidomics
Flow Injection-MS
Direct analysis
has its undisputed role as a rapid first-pass metabolic fingerprinting
method. It comes with a reduced analysis time of 2–5 min, thereby
increasing the analytical throughput by one order of magnitude compared
to typical LC-MS-based metabolomics. A recent review gives an excellent
summary on successful applications and well-known limitations imposed
by matrix effects and the occurrence of isomers and in-source fragments.
213
Ion suppression and ion competition were studied
in fundamental experiments using injections of 5 μL at flow
rates <100 μL min–1, where ion competition
was shown to be a major cause for limited sensitivity in orbitrap
MS.
214
As a consequence, sensitivity could
be increased by optimizing data acquisition. The use of sequential narrow mass segments in trapping MS, with fixed m/z windows or variable sample-specific windows, proved to be a valid strategy for improving sensitivity and linear dynamic range.
214
A recent study combined FI-HRMS with online fractionation
improving the metabolome coverage and reducing matrix effects.
215
The fully automated sequential fractionation
was based on solid-phase extraction on complementary ion-exchange
and reversed-phase chemistries. Fast and high coverage screening (3
min per polarity) was thoroughly validated for targeted analysis of
50 diagnostic and explorative biomarkers in plasma samples, including
amino acids, amines, purines, sugars, acylcarnitines, organic acids,
and fatty acids. The sensitivity of FI was significantly improved.
LLOQ values comparable to conventional LC-MS/MS were reported. FI-HRMS
for quantification of high-abundant cholesterol and cholesteryl esters utilizing compound-specific response factors proved to be fit for purpose for cultured cells, tissue homogenates, and serum samples.
34
IMS offers a rapid (millisecond-regime)
post-ionization separation dimension,
216
which makes it particularly attractive for FI analysis. Its benefit
for both targeted and nontargeted metabolomics has been investigated.
217,218
Compared to FI-MS alone, FI-IMS-MS offers improved linearity and
reduced noise level. Nonetheless, ionization suppression due to matrix
effects remains a major obstacle with detrimental impact on sensitivity,
peak capacity, and consequently, coverage.
219
It is therefore unlikely that IMS will render chromatographic separations
obsolete in nontargeted analysis.
Ultimate Throughput–Duty
Cycles of Seconds Per Sample
The cycle time of the sample
transfer to the MS limits the throughput of FI-MS-based metabolomics.
For example, the fastest commercially available SPE system offers
a sample cycle time of 10 seconds, limited by the required SPE elution
volumes.
200
When used without SPE, the
rate limiting step becomes the autosampler, enabling a duty cycle
of 2.5 seconds per sample,
220
a setting
which was proposed for drug discovery and high-throughput MS targeted
assays.
Duty cycles of seconds per sample are also realized
in alternative ambient MS approaches. However, despite significant
progress, large scale metabolomics studies have not yet been put into
practice. Excellent duty cycles in the second-regime were, for example,
obtained by immediate drop on demand technology combined with open
port sampling interfaces (I-DOT-OPSI-MS).
221
Recent studies on single cell metabolomics demonstrate the power
of high throughput MS. Another emerging high-throughput technique enables nanoliter-scale infusion MS at sampling rates of up to 6 Hz using robotic plate handling.
222
Acoustic
droplet ejection (ADE) uses acoustic pulses to generate nanoliter-droplets
directly from a microtiter plate in a contactless manner with high
speed, precision, and accuracy. The potential areas of future applications
are evident and range from high-throughput drug screening assays,
plate based synthetic chemistry, and large-scale biotechnological
studies addressing enzyme kinetics. Interfacing ADE with MS involved
(1) acoustic mist ionization (AMI) coupled to MS
222
or (2) acoustic ejection MS (AEMS) using an open port interface
(OPI) with electrospray ionization (ESI).
223,224
While the first approach integrated droplet generation and ionization,
the latter configuration used ADE only for sample delivery for subsequent
ionization by ESI. This way, matrix effects and adverse effects caused
by contamination of MS transfer capillaries were reduced. Excellent
analytical figures of merit were obtained upon injection of 25,000 samples (standards), revealing an RSD of 8.5% for peak intensity and a full width at half-maximum of 177 ms; peak widths were on the order of 200 ms.
224
Miniaturization–Nanoflow
Direct Infusion
Miniaturization of direct analysis toward
nanoflow proved to be particularly attractive because of the inherent
features of nanoESI. Ionization at this flow regime is characterized
by increased ionization efficiency. At the same time, differences
in ionization efficiency for different molecules are significantly
reduced as compared to ESI at higher flow rates.
225
Shotgun lipidomics accomplished by chip-based nanoESI orbitrap MS has become an essential tool of the trade for both lipid identification
and quantification.
35,226
A 50 min analysis time consuming
only 10 μL of sample solution is theoretically possible.
227
In practice, a 5–15 min run time ensures
analysis at both polarities while applying data dependent acquisition
(DDA) or data independent acquisition (DIA) approaches. Today, MS2
methods based on DIA (covering the whole mass range in 1 Da steps
228
) prevail over DDA (which follows the intensity order
226
). Dedicated software solutions allow for noise
filtering accelerating data processing.
229
Typically, several hundred lipids are identified on a species level
covering the abundant lipid classes. Several strategies enable increased
coverage, by e.g. including derivatization.
230
Pitfalls regarding lipid identification are summarized and curated
by the LSI.
29
Quantification is achieved
by ISTDs. The lipid head group determines the ionization efficiency
to a large extent, allowing us to minimize the number of calibrators
to one or a few per class. Response factor corrections were introduced
for the quantification of neutral lipids.
34
Quantification on the molecular species level is complicated because, at the required MS2 level, different fatty acyl chain moieties show different responses (deviating by up to 60%), jeopardizing accuracy without correction.
231
Chromatography—Key
Steps Toward Coverage and Throughput
Miniaturization of Liquid
Chromatography
In MS-based metabolomics, microscale and nanoscale
separations have been developed with the aim of advancing small scale
sample analysis, increasing sensitivity and thus coverage of low abundant
analytes, and finally reducing costs by overall reduced reagent consumption.
Miniaturized separation used with tailored low-diameter ESI-emitters
offers unrivaled absolute detection limits (fmol on column). Combinations
with large volume injection and online enrichment allow the analysis
of very low analyte concentrations and very efficient sample use.
However, the successful application of online-enrichment-nano-RP-LC
faces limitations: Numerous primary analytes show poor retention on
RP-LC, and sample volumes may be extremely limited, as in single cell
analysis. In such cases, the full sensitivity potential of nano-RP-LC-MS
platforms may be exploited by analyte derivatization, increasing RP-retention
and ionization efficiency. A comprehensive summary of theory, common
approaches, and over 20 of the most recent applications of nano-LC-MS
in metabolomics and lipidomics investigation can be found elsewhere.
232
Single cell analysis is an emerging application
of small-scale metabolomics by nano-LC-MS. Recently, Nakatani et al.
reported a method for derivatization-free targeted quantification
of hydrophilic metabolites in single HeLa cells. Living single cells
were sampled from culture using an in-house developed nano-pipette
device, and the sampling capillary was directly connected to a sample
loop line. The optimized nano-LC-MS/MS method based on a self-packed
RP-LC column (pentafluorophenylpropyl Discovery HSF5, 0.1 × 180
mm, 3 μm) and multiple reaction monitoring yielded an average
sensitivity increase of 26-fold compared to a conventional flow setup
(2.1 × 150 mm) employing the same column chemistry. Eighteen relatively
abundant hydrophilic metabolites (16 amino acids and 2 nucleic acid
related metabolites) were detected and quantified in 22 single HeLa
cells. Clustering in different groups was observed.
233
Another emerging nano-LC-MS application is the in-depth,
high-coverage analysis of the lipidome. With a recently published
110 min nano-LC-MS method, linear dynamic range and sensitivity could
be substantially increased by 1–2 and 2–3 orders of
magnitude, respectively, when compared to conventional high-performance
LC (HPLC) (150 × 2.1 mm, 2.7 μm). The proposed workflow
displayed excellent analytical figures of merit after careful optimization
of sample reconstitution. Lipidome coverage was evaluated for the
phospholipidome of S. cerevisiae and achieved increased lipid identification (436
phospholipids)
compared to conventional-flow HPLC and a shotgun approach. Low abundant
lipid species and isomers could be detected even when they were coeluting.
234
When combined with a new data evaluation pipeline,
almost 900 lipid species in 26 lipid classes in S. cerevisiae were identified. The
identification
rate was increased by a factor of 4 compared to previous whole yeast
lipidome shotgun studies.
89
The high potential of this workflow for in-depth lipidome analysis is highlighted by the detection of less common lipid classes like monomethyl-PE (MMPE)
and dimethyl-PE (DMPE) and lipids with incorporated odd-chain and
diunsaturated fatty acids.
For a long time, the development
and wide adoption of microscale separations in metabolomics suffered
from the fact that many stationary phase chemistries were not commercialized
for the required column dimensions (1.5–0.5 mm inner diameter
for micro-LC; 0.5–0.15 mm I.D. for capillary LC
235
). Micro-LC separations are more common, since
ionization performance of ESI sources is compromised at the flow regime
of capillary-LC (0.01–0.001 mL min–1). The
sensitivity gain of micro-LC is moderate as compared to microbore-LC
(3.2–1.5 mm I.D.). In a recently published study, the optimized
microflow-LC-MS/MS improved sensitivity in a compound-dependent manner
by 6- to 49-fold when compared to conventional microbore-LC-MS/MS.
236
In metabolomics, the sensitivity gain
provided by micro-LC has been exploited to design rapid separations
for metabolic phenotyping. Throughput has been optimized at the expense
of chromatographic performance, providing fit for purpose platforms
with enhanced but not maximized sensitivity upon miniaturization.
237−241
A systematic comparison to conventional HPLC methods in terms of
LOD and LLOQ was beyond the scope of these studies. Short micro-ultraperformance
(UPLC) type of separation utilizing RP materials with sub-2-μm
particles (100% wettable, 1.0 × 50 mm, 1.7 μm) and separation
times of 2.5 min was successfully applied in large scale phenotyping
studies
237
and recently explored in combination
with ion mobility.
238,239
A rapid micro-HILIC method utilizing
sub-2-μm particles (Acquity UPLC BEH-Amide, 1.0 × 50 mm,
1.7 μm) addressed the analysis of polar metabolites in rat urine
in less than 3.5 min. Comparison to conventional HILIC-MS demonstrated
4-fold reduction of analysis time, 75% reduction in solvent consumption,
and 18-fold reduction of sample consumption, while providing sufficient
retention of polar metabolites (e.g. hexoses, methylhistidine, kynurenic
acid, creatinine) and excellent run-to-run reproducibility (RT RSDs between 0.31 and 6.3% over 134 sample injections).
240
Rapid lipidomic profiling of plasma by micro-LC-IMS-MS
proved to be fit for the purpose for clustering plasma lipotypes as
assessed in breast cancer patients and healthy controls indicating
the suitability of micro-LC–IMS–MS as a rapid platform
for large scale lipidomics screening.
241
The combination of online SPE enrichment with micro-LC-MS was validated
for targeted analysis of 13 steroid hormones from human plasma. After
careful optimization, large volume injections obtained excellent LOQs
in the sub-ng/mL range at high throughput (below 3 min per sample).
Validation according to FDA guidelines showed the suitability for
high-throughput analysis in a clinical routine laboratory.
242
Cebo et al. introduced a validated approach
based on offline mixed-mode SPE enrichment and micro-UHPLC-ESI-triple
quadrupole (QqQ)-MS/MS for the quantification of 42 oxylipins in plasma
and platelets at reasonable throughput (13 min per sample).
243
Limits of detection were between 2 and 250
fmol on column, offering comparable LODs to well established conventional
LC approaches, but at significantly reduced solvent consumption.
Multidimensional Chromatography
Comprehensive two-dimensional
chromatography is undoubtedly a powerful strategy for the separation
of complex mixtures.
244,245
In order to maximize the separation
space and thus the peak capacity, orthogonality and compatibility
of the two dimensions is essential. Each chromatographic peak of the
first dimension requires sampling several times, efficient transfer,
and rapid separation by a second dimension, in order to maintain the
chromatographic resolution of the first dimension. The time spans
between successive separations in the second dimension should be minimized
requiring short separation and equilibration times. This is well established
in GC X GC but still a technical challenge in 2D-liquid-chromatography.
The wide suite of successful GC X GC applications in metabolomics
has been summarized elsewhere.
1,245
In LC X LC, method
development is still regarded as a bottleneck. The experimental design
regarding the two separation dimensions is not straightforward, as
separation conditions influence and restrict each other.
244
Solvent incompatibility with regard to mismatch
of elution strength and immiscibility requires dilution of the sample
upon transfer to the second dimension. Typical 2D-LC-MS designs involve
HILIC and RP-LC. Long microLC/microbore LC columns (flow rate 10–50
μL/min) are employed as the first dimension, followed by short,
wide-bore columns (flow rate in the mL/min regime) in the second dimension.
In the last years many new instrumental 2D-LC designs have been developed
for metabolomics/lipidomics applications facilitating flexibility
and universality.
246
Elaborate constructions
allow versatile modulation (i.e. sample collection and transfer) including
active modulation with dilution conditions optimized over the separation
time. Despite significant progress, the theoretical peak capacity
in LC X LC-MS is hardly reached in practice, due to incomplete usage
of separation space, still suboptimal cutting, and peak deterioration
upon remaining solvent incompatibility.
Adoption of comprehensive
LC X LC-MS in metabolomics and lipidomics has been limited mainly
because comprehensive 2D-LC-MS metabolomics approaches developed up
to date have largely suffered from incomplete usage of separation
space in HILIC and RP-LC combinations and from severe sensitivity
loss.
247
The latter diminishes actual coverage
in nontargeted screenings. Solvent evaporation interfaces as featured
in SFC X RP-LC lipidomics might overcome this challenge. Recent reports
on 2D-SFC-RP-MS, separating 370 lipids from 10 lipid classes of human
plasma within 38 min, are promising.
248
2D-LC-MS relying on heart-cutting strategies proved to be powerful
in selected applications
246
including the
separation of secondary metabolites in plants and emerging targeted
chiral metabolomics.
249,250
Dual/Parallel Chromatography
Several column switching approaches have been introduced as elegant
solutions for increased throughput and coverage within one analytical
run. One successful configuration (Figure 7
A) integrated serial orthogonal chromatography
in order to transfer the poorly retained metabolites of the first
dimension onto a second orthogonal column and enable two parallel
separations subsequently. This configuration offered a valuable alternative
to heart-cut chromatography.
246
A simple
six-port valve was installed between the two chromatographic columns, enabling the transfer of metabolites eluting from the first column onto the second column. Two independent separations were carried out by
the switching of the valve. This setting was successfully employed
for high coverage metabolome
251
and lipidome
analysis.
252
For metabolome analysis, reversed-phase
and porous graphitized carbon LC were combined. The method was validated
by targeted absolute quantification of 80 primary metabolites in P. pastoris. Excellent
RT stability (average
0.4%) even in the presence of a biological matrix was obtained. An
interplatform comparison with GC- and LC-tandem-MS analyses showed
the power of the method even with respect to sugar phosphate isomer
quantification.
251
The same separation
concept combined HILIC and RP-LC for high coverage lipidome analysis.
253
The void volume of the HILIC separation containing
non-polar lipids was transferred to the RP column which enabled the
on-line combination of HILIC with RP without any dilution in the second
dimension. Rapid consecutive separation for polar lipids and class
specific separation for nonpolar lipids was accomplished within one
analytical run of only 15 min (including re-equilibration time, using
stationary phases with sub-2-μm particles and UPLC).
Figure 7
Practical setup
solutions for sequential and parallel LC. (A) In valve position A,
the void volume of the first column is transferred to the second column.
Afterward, the valve is switched to position B and the sample is analyzed on both columns in parallel.
251,252
(B) In valve position
A, the first extract is injected on the first column and analyzed.
Meanwhile, the second column is equilibrated and the mobile phase
is flushed into waste. After separation on the first column, the valve
is switched to position B and the second extract is injected on the
second column and analyzed while the first column is equilibrated.
253
(C) In valve position A, the sample is loaded
and divided into two sample loops equally. In valve position B, both
parts of the sample are injected onto two orthogonal columns and analyzed.
254
Figure 7
summarizes options for successful column
switching technologies. Dual chromatography extends the separation
space by fully automated consecutive or parallel execution of orthogonal
chromatographic separations. Different configurations enabled orthogonal
dual HILIC/RP-LC separation by parallel injection of two extracts
from one biological specimen (Figure 7
B) or of one sample extract (Figure 7
C). While the latter configuration was proposed
for simultaneous analysis of nonpolar and polar metabolites,
254
the parallel injection of different sample
extracts facilitated the development of merged metabolomics/lipidomics.
253
The HRMS workflow integrated biphasic extractions,
parallel injection, separation, and MS analysis providing the full
scope of targeted and nontargeted metabolomics and lipidomics within
one analytical run. Wang et al. proposed a dual chromatography approach
for simultaneous lipidomics and metabolomics analysis implementing
parallel HILIC and RP separation in a heart cutting 2D-LC configuration
where parallel analysis was preceded by prefractionation on a first
separation dimension.
255
As a drawback,
this method precluded the integration of biphasic extractions, since
only one sample could be analyzed. Recently, sample preparation and
reconstitution were reoptimized in order to provide high coverage
within one measurement solution.
256
MS Platforms and Data Acquisition Strategies—Improving Coverage,
Selectivity, and Reliability
Despite significant progress,
cutting edge low-resolution tandem-MS outperforms latest generation
HRMS for quantitative analysis, both in terms of sensitivity and linear
dynamic range. Large scale metabolomics studies are often performed
on triple quadrupole-MS based platforms profiting from increased robustness
and high quantitative capabilities.
257
Today,
the implementation of multiple reaction monitoring (MRM) approaches
is supported by significant computational resources. A library containing
MRM transitions for more than 15,500 molecules is publicly available.
258
Both experimentally assessed and in-silico
generated MRM transitions are included. Dedicated software tools enable
optimization of MRM transitions including collision energies using
mass spectral libraries, such as METLIN and HMDB.
259
Large-scale metabolomics as enabled by QqQ-MS was successfully
applied in wide-targeted assays providing absolute quantification
of a high coverage metabolite panel.
1,97
Recently,
hybrid MS approaches emerged which offer attractive ways bridging
the concepts of targeted and nontargeted analysis.
20
These workflows successfully exploit the power of QqQ-MS for accurate relative quantification, without omitting a discovery step.
The “discovery” is realized through optimizing the MRM
transitions based on a sample matrix representative of the large-scale
study. This optimization/discovery step can be performed using low
mass resolution MS only or integrating high mass resolution. In MS/MS
analysis by HRMS, sensitivity and selectivity (and thus the coverage)
are significantly influenced by the type of mass spectrometer, but
also by the selected data acquisition strategy. DIA and DDA acquisition
modes have their specific applications in both targeted quantification
and nontargeted compound annotation.
260−262
For both DIA and DDA,
new tools for online/on-the-fly and offline scan-level control, fragmentation,
and acquisition optimization are available to support automated mass
spectrometer parameter choice.
262−264
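On the practical side, assembling an MRM method from such libraries is largely a selection and ranking task. A minimal sketch over a hypothetical transition table (compound names, m/z values, collision energies, and intensities are illustrative only):

# Hypothetical MRM library rows: (compound, precursor m/z, product m/z,
# collision energy in eV, expected relative intensity)
MRM_LIBRARY = [
    ("alanine", 90.05, 44.05, 10, 1.00),
    ("alanine", 90.05, 72.04, 15, 0.35),
    ("serine", 106.05, 60.04, 12, 1.00),
]

def select_transitions(compounds, n_per_compound=2):
    """Pick the most intense transitions (quantifier plus qualifier)."""
    selected = {}
    for name in compounds:
        rows = [r for r in MRM_LIBRARY if r[0] == name]
        rows.sort(key=lambda r: r[4], reverse=True)
        selected[name] = rows[:n_per_compound]
    return selected

print(select_transitions(["alanine"]))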
Toward MS-Based Multi-omics
Emerging multi-omics analysis led to significant efforts for methods
integrating multiple omics layers for one sample. Significant progress
relates to the experimental design of multi-omics measurement and
data evaluation strategies. Multi-omics applications profit from global
phenotype metabolomics data acquired at reasonable throughput. Sophisticated
sample preparation protocols together with high coverage multi-platforms
characterize the tools of the trade for multi-omics analysis. Cutting
edge network analysis enables us to integrate MS-based datasets with
genome, transcriptome, proteome, and metabolome information derived
from orthogonal platforms.
Multi-omics Sample Preparation Strategies
Multi-omics sample preparation approaches have to deal with the
challenge that preferred collection methods, storage techniques, required
quantity, and choice of biological samples are not directly transferable
from one omics field to the other, especially when quantification
rather than profiling has to be performed.
265
The metabolomics part of multi-omics studies is especially challenging
as degradation, oxidation, or conversion of metabolites (including
lipids) might occur during sample preparation. Moreover, the procedures
have to be tailored for the two sub-omes.
9
In the multi-omics setting, discovery studies call for minimal pretreatment
in order to prevent the potential loss of metabolites.
266
Multi-omics sample preparation strategies based
on a single sample enable the true combination of multi-molecular
information without influences of different sample aliquots. One-phase
extractions coined as sample preparation for multi-omics technologies
(SPOT) were successfully applied for the parallel analysis of the
proteome and metabolome.
267
Other multi-omics
strategies involve filter tips such as cellulose based filter tips,
which enable detergent-free single-pot metabolomics and proteomics
by capturing the protein fraction, collecting the metabolite flow-through
and the peptide fraction after trypsin digestion.
268
For lipid sub-ome integration, extraction strategies involve
different solvent mixtures with higher organic content, e.g. chloroform/MeOH
269
or MTBE/MeOH.
199
Two-phase extractions such as chloroform/MeOH/water
270
or MTBE/MeOH/water
226,253,271,272
proved to be particularly successful for global high coverage analysis
as metabolites and lipids can be analyzed from a single sample while the protein fraction is separated at the same time. In terms of sample
handling and automatization potential, MTBE/MeOH/water is preferable
as the protein pellet is found at the vial bottom after induction of phase separation
and is not present in the intermediate phase between the polar and
nonpolar phase. Such phase separation strategies reduce the sample
amount needed and pave the way for direct-infusion and LC-MS based
multi-omics (metabolomics, lipidomics, and proteomics) from a single
sample.
226,273
Additional lipidome coverage can be achieved
using approaches such as (1) 3-phase extraction separating neutral
lipids in the upper phase and glycerophospholipids in the middle organic
phase
274
or (2) the combination of two-phase
liquid extraction with MTBE combined with SPE.
271
In order to understand metabolite action on the subcellular
level, experimental resolution of subcellular metabolism is needed.
Recently, non-aqueous fractionation was successfully applied to resolve
subcellular plant metabolism and the corresponding proteome.
275
In this comprehensive approach, organic solvent
mixtures and ultracentrifugation were applied to analyze proteins and primary metabolites in a four-compartment plant model comprising chloroplasts,
cytosol, vacuole, and mitochondria.
275
However, additional steps such as multiple liquid phases or subcellular fractionation increase the sample preparation effort significantly, so that scientists have to decide in each case how much additional time they are willing to invest to increase metabolite coverage.
Multiplatform Analysis Strategies for Metabolomics and Lipidomics
High coverage analysis in MS-based metabolomics is only achieved
through multiplatform solutions involving orthogonal chromatographic
separations integrating both HRMS and MS/MS methods. Despite significant
progress in HRMS and its quantification capabilities, to date tandem MS is the method of choice when aiming at the analysis of low abundant
metabolites (e.g. bile acids or fatty acyl-CoA esters). Hydrophilic
primary metabolites—the biological definition of metabolites
involved in growth, development, and reproduction—are among
the most evolutionarily conserved biomolecules. Multiple isomers and
isomeric in-source fragments intrinsically challenge MS analysis.
276
Examples are hexose phosphates, pentose phosphates,
3-phosphoglycerate/2-phosphoglycerate, citrate/isocitrate, homoserine/threonine,
leucine/isoleucine, adenosine monophosphate/deoxyguanosine monophosphate,
adenosine triphosphate/deoxyguanosine triphosphate, and alanine/sarcosine.
As a consequence, common multiplatform workflows include chromatographic
separations providing selectivity for primary metabolites, a prerequisite
for accuracy regarding both identification and quantification, respectively.
In LC, ion-pairing and HILIC are the methods of choice separating
water-soluble central metabolites. A currently accepted protocol of
wide targeted analysis relies on RP ion-pairing MS/MS analysis. The
method covers 215 metabolites including amino acids, citric acid cycle
intermediates, and other carboxylic acids, nucleobases, nucleosides,
phosphosugars, and fatty acids.
277
Sugar
phosphate isomers are quantified based on distinct MS/MS fragmentation
patterns
278
(as no baseline separation
is provided). There are several drawbacks associated with the use
of ion-pairing reagents, such as MS system contamination, ion suppression
effects limiting overall sensitivity, together with the fact that
metabolites ionizing only in positive mode such as e.g. carnitines
and S-adenosylmethionine cannot be measured. Generally, the use of
ion-pairing reagents implies the establishment of a dedicated MS system,
often precluding the combination with HRMS. For HILIC separations,
two stationary phases, i.e. the BEH amide phase and the polymeric zwitterionic
phase,
279
prevail, using both acidic and
neutral/basic eluents. HILIC separations are versatile, but as a drawback
there is no single experimental setting covering all relevant primary
metabolites.
280
When optimized properly, the separation selectivity for phosphorylated carbohydrates is comparable to that of ion-pairing chromatography. Still, GC remains the unrivaled separation
method when addressing intermediates of glycolysis and pentose phosphate
pathways. Wide coverage of the primary metabolome is established upon
two-step derivatization procedures (ethoximation/methoximation followed
by trimethylsilylation). Routine applications involve robotic just-in-time derivatization. However, GC is not suitable for measuring the
energy status of a given sample, as important nucleotides and cofactors
are not covered. For this purpose, both ion-pairing
281
chromatography or HILIC can be applied after careful consideration
of sample preparation.
280
To date, despite
excellent separation power for polar and ionic metabolites, the application
of capillary electrophoresis (CE) in metabolomics is limited to expert
laboratories. The most recent CE developments were comprehensively
summarized elsewhere.
1,198
Examples of multiplatform combinations in practice integrate the analysis of different extracts (tailored
preparation for sub-ome analysis) followed by nontargeted assays (RP-LC-HRMS
for lipidomics, GC-HRMS for primary metabolites,
280
complementary HILIC-HRMS for metabolites not amenable to
GC-MS, and targeted tandem mass spectrometric assays for low abundant
metabolites such as bile acids,
282
steroids,
and oxylipins).
283
Alternatively, the number
of MS platforms can be reduced by replacing the combination of GC-HRMS/LC-HRMS
by two complementary LC-HRMS methods (either two HILIC methods with
acidic and basic eluents/positive and negative ionization or the combination
of acidic RP-LC-HRMS and basic HILIC-HRMS).
In lipidomics, the
majority of lipid classes can be covered by state of the art profiling
approaches such as direct infusion MS or RP-MS. Multiplatform combinations
in lipidomics often involve shotgun lipidomics for bulk lipid analysis
in combination with LC-MS strategies enlarging the lipidome in terms
of lower abundant lipids as recently shown for the platelet lipidome.
284
An excellent review by Rustam and Reid summarizes
the analytical challenges and advances in lipidomics, including common MS-based approaches, chromatographic solutions, and possible combinations.
285
The major challenge remains measuring high-abundant
membrane lipids besides very low-abundant signal molecules across
a huge polarity range (log P values from 5 to 35). Lipid mediators present at low concentrations are an important subclass, often analyzed by RP-LC to cover different eicosanoid isomers.
286
Sphingolipid analysis is of high interest as these lipids are involved
in signaling and protein sorting and are often suppressed by other
membrane lipids.
287
Extraction and analysis
of glycosphingolipid subclasses such as gangliosides or sulfatides
is challenging, and potential strategies are summarized in a recent
workflow by Barrientos et al.
288
If specific
lipid sub-omes (e.g. sterols or prenols) are of interest, LIPID MAPS
provides methods and protocols for LC-MS and GC-MS based analysis
as starting points including a summary of available analytical standards
(see resources section).
178
Merging Metabolomics
and Lipidomics
Monitoring the metabolic phenotype should
always consider lipids due to their critical function in health and
disease. Lipids make up 79% (90,678 lipids in 114,126 metabolites)
of all listed metabolites in the HMDB 4.0 (accessed October 2020)
88
highlighting the need to cover them in the analysis
workflows. Especially when it comes to biomarker research the metabolome
including the lipidome should be monitored to follow disease relevant
changes as shown in recent studies on cancer prediction
289,290
or cardiovascular disease.
291
Cajka and
Fiehn provide an excellent overview on the challenges and opportunities
of merged metabolomics and lipidomics workflows.
9
Here, we want to emphasize that global metabolite and lipid
profiling in one analytical run is possible, as shown by two-phase MTBE
extraction and fully automated parallel HILIC chromatography for metabolites
and RP chromatography for lipids.
253
The
instrumental setup was realized by an HRMS, a dual-injection autosampler, and two-positional six-port valves, enabling simultaneous lipid and metabolite analysis in one analytical run of 32 min (Figure 7
B). Untargeted screening of human plasma
samples resulted in >100 annotated metabolites (organic acids, amino acids, nucleotides, acylcarnitines) and >380 lipids (phospholipids, sphingolipids, cholesteryl esters, di- and triglycerides). Stable-isotope labeled
metabolites and lipids from yeast extracts further enabled us to merge targeted and nontargeted identification; such approaches are generally possible whenever labeled biomass is available.
metabolomics, and lipidomics can be performed from a single sample
which provides the starting-point for interesting multi-omics studies
based on network analysis, e.g. protein-metabolite interactions in
mesenchymal stem cell adipogenesis.
283
Network Analysis and Visualization of Multi-omics Workflows
Multi-omics derived data sets including different sub-omes rely on
appropriate data integration originating from several layers of information.
The ultimate aim is to understand “the flow of information”
underlying a certain phenotype. Here, we will consider relevant solutions (a selection of tools can be found in Table 2) that emerged to facilitate downstream analysis of metabolomics data and to generate or validate an underlying biological hypothesis. Visualization tools play a crucial
role for biological interpretation of metabolomics data, and well-covered
overviews can be found in several recent reviews.
292−294
Such tools are required at the end of a metabolomics workflow pipeline and assume successful tackling of the earlier steps in the pipeline,
292,295
including adept study design, the biological experiment, sample preparation, identification, quantification, QC assessment, batch-effect adjustment, etc. The aspects and critical points
of a metabolomics experiment from study design and sample preparation
to data analysis and evaluation of various tools have been discussed
in a comprehensive review.
293
Table 2
Selected Tools for Data Analysis and Visualization, Metabolic Networking, and Databases
MetaboAnalyst (Chong et al., 2019; ref 328): one-in-all metabolomics data analysis tool collection.
MetExplore (Chazalviel et al., 2018; Cottret et al., 2018; refs 303,304): visualization of metabolic networks and pathways; facilitates the analysis of omics data in a biochemical context and pathway enrichment.
KEGG (Kanehisa et al., 2017; ref 329): “encyclopedia of genes and genomes”; several model organisms; KEGG orthology for genes and proteins.
Reactome (Bohler et al., 2016; Fabregat et al., 2018; refs 309,330): knowledge base of biomolecular pathways: free, open-source, open-data, curated and peer-reviewed.
Cyc databases (Caspi et al., 2020; ref 311): the “largest curated collection of metabolic pathways”; many different model organisms.
Virtual Metabolic Human database (Noronha et al., 2017; Noronha et al., 2019; refs 310,314): human and gut microbiome metabolism, 255 diseases, and also microbial genes and microbes.
WikiPathways (Slenter et al., 2018; ref 312): browsable, editable database curated by the research community.
Chemical Similarity Enrichment Analysis (ChemRICH) (Barupal and Fiehn, 2017; ref 306): alternative to biochemical pathway mapping for metabolomic datasets; based on structural similarity rather than biochemistry directly; the enrichment test is based on the Kolmogorov–Smirnov test (not the hypergeometric or Fisher exact test).
Metabox (Wanichthanarak et al., 2017; ref 308): metabolomics data analysis and interpretation toolbox for integration of proteomics and transcriptomics data.
Metscape (Gao et al., 2010; Karnovsky et al., 2012; refs 322,323): Cytoscape plugin; metabolomics correlation networks and KEGG-based metabolic networks integrating gene expression and metabolomics.
PathBank (Wishart et al., 2020; ref 313): comprehensive, interactive database for metabolic pathways in 10 different model organisms.
OmicsNet (Zhou and Xia, 2018; ref 318): multi-omics data integration; biological networks (genes, proteins, microRNAs, transcription factors, metabolites).
GEM-Vis (Buchweitz et al., 2020; ref 324): visualization of time-course metabolomic data within the context of metabolic network maps.
FEMTO (Nägele et al., 2016; ref 302): integration of metabolomic time-series analysis and network information.
Among the commonly applied strategies
are uni- and multivariate statistical methods. Multivariate methods involve several unsupervised methods, like principal component analysis (PCA) or hierarchical clustering (HC), as well as supervised methods: partial least squares discriminant analysis (PLS-DA), orthogonal projections to latent structures discriminant analysis (oPLS-DA), (linear) discriminant analysis ((L)DA), (canonical) correspondence analysis ((C)CA), random forests (RF), support vector machines (SVM), neural networks (NN), and feature selection strategies (recursive feature elimination, genetic algorithms, and sparse models such as Lasso, Elastic Net, and sparse PLS). Such methods are capable of capturing and pinpointing the unique metabolic fingerprints related to the underlying phenotype.
119
While this is no concern with unsupervised
techniques, a particular issue with supervised methods is the risk
of overfitting to the labeled data.
296
However,
accompanying cross-validation can help avoid this issue.
297
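A minimal scikit-learn sketch illustrates the point on purely random data (dimensions arbitrary; by construction there is no real class signal, so any apparent accuracy above chance indicates a leaky validation scheme):

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Random "metabolite intensity" matrix: 40 samples x 500 features,
# with arbitrary binary class labels (no real signal by construction).
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 500))
y = rng.integers(0, 2, size=40)

model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
print(model.fit(X, y).score(X, y))                # near-perfect apparent accuracy
print(cross_val_score(model, X, y, cv=5).mean())  # collapses to ~chance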
A powerful approach beyond the realm
of statistics and machine learning algorithms is pathway analysis,
which takes advantage of established biological knowledge. In
the simplest variation, metabolites of interest derived from a metabolomics
experiment can be mapped on the pathways defined in a particular library.
In the case of over-representation analysis (ORA), corresponding p-values
can be obtained based on the metabolites list for the affected pathways.
Extending the metabolite list with quantitative information (fold-change,
intensity, and absolute amounts) can be further exploited by metabolite
set enrichment analysis (MSEA). Even beyond this, considering the position of affected metabolites within pathways informs about the perturbation of the pathway and is a useful additional metric next to the enrichment, as it is capable of identifying subtle but consistent changes: affected metabolites are weighted by centrality measures, so that more central metabolites contribute more to the perturbation score. Such a strategy constitutes the core of combined enrichment and topology analysis.
298,299
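At its core, ORA reduces to a hypergeometric test. A minimal sketch (all counts hypothetical):

from scipy.stats import hypergeom

def ora_p_value(n_background, n_pathway, n_selected, n_hits):
    """P(observing >= n_hits pathway members among the selected
    metabolites) under random draws from the measured background."""
    return hypergeom.sf(n_hits - 1, n_background, n_pathway, n_selected)

# Example: 500 measured metabolites, pathway of 20, 30 significant, 6 hits
print(ora_p_value(500, 20, 30, 6))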
Although pathway analysis is a powerful tool and there is great
merit in identifying relevant pathways corresponding to a phenotype,
it suffers from several limitations. First, definitions of pathways
differ to varying degrees across databases.
300
Second, highly linked metabolites with a high number of
possible biochemical reactions are also constituents of multiple pathways
and pathways might overlap. Hence, it is challenging to explain changes
over several pathways.
301
In such a scenario,
time-series analysis might support the identification of regulatory
hubs within a metabolic network.
302
A great way to summarize results and enable biological interpretation
is the use of network-based approaches. Similar to pathway analysis,
metabolic networking relies on reference databases for biochemical
and signaling pathway information but constructs a single network
where each metabolite is linked by all possible biochemical reactions.
With the help of several strategies and options to extract a subnetwork
capturing all relevant metabolites from the input metabolite list,
they can represent a global and concise picture of metabolism.
301
MetExplore
303,304
allows metabolic
network construction, exploration, and combination with omics data
analysis. It can access several databases for multiple model organisms
and allows collaboration in curation and annotation of metabolic networks.
The interpretation of results is aided by metabolite set enrichment
analysis (MSEA) and extraction of relevant subnetworks. Correlation-network
construction does not require biochemical knowledge but is based on
quantitative information. It can establish correlations and metabolites
can be grouped based on the magnitude and sign of correlations, while
the network visualization strategy lends itself well to identifying the
corresponding clusters and relationships among them.
305
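A minimal sketch of such correlation-network construction with numpy and networkx (the correlation threshold is chosen for illustration):

import numpy as np
import networkx as nx

def correlation_network(data, names, threshold=0.8):
    """Nodes are metabolites; edges connect pairs whose absolute
    Pearson correlation across samples exceeds the threshold.
    data: samples x metabolites intensity matrix."""
    corr = np.corrcoef(data, rowvar=False)
    graph = nx.Graph()
    graph.add_nodes_from(names)
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) >= threshold:
                graph.add_edge(names[i], names[j], weight=corr[i, j])
    return graph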
Furthermore, correlations can also be determined based
on chemical similarity in a metabolite list (ChemRICH)
306
or spectral similarity (MS2LDA).
164
The full potential and perspectives of network analysis
in metabolomics data analysis, and of systems biology approaches for
biological interpretation, have been discussed in recent reviews, including
the various possibilities and respective tools.
294,305
A plethora
of bioinformatics and data analysis tools has been developed in the R
ecosystem, including many for the metabolomics community. These tools
and their evolution have been extensively reviewed.
120
However,
the barrier to entry can be substantially higher for users with no
programming skills or experience with command line tools. One response
by R developers is to include a graphical user interface (GUI) within
the package, which allows users to work within the comfort of their
browser. Several web-based tools have emerged in recent years that
function as metabolomics data analysis toolboxes, allow the
visualization of metabolomics results via different modules, and offer
multiple solutions from the aforementioned options (MetaboAnalyst,
307
MetaBox,
308
MetExplore
304
). Such tools lower the barrier to entry with
aesthetic, user-friendly GUIs and example datasets. As a prime example,
MetaboAnalyst and its equivalent R-based package MetaboAnalystR
123
offer multiple functionalities in several modules,
including metabolite identification, exploratory data analysis,
pathway enrichment analysis, combined MSEA and topology analysis,
and multi-omics integration, to name just a few.
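As a toy illustration of the in-package browser GUI pattern mentioned above: all names below are hypothetical, and actual tools wrap far richer interfaces, commonly built on frameworks such as shiny.
```r
library(shiny)

# Hypothetical wrapper an R package could export to launch a browser GUI
run_metabo_gui <- function() {
  ui <- fluidPage(
    titlePanel("Toy metabolomics GUI"),
    fileInput("csv", "Upload feature table (CSV, samples x metabolites)"),
    plotOutput("pca")
  )
  server <- function(input, output) {
    output$pca <- renderPlot({
      req(input$csv)                        # wait until a file is uploaded
      X <- read.csv(input$csv$datapath, row.names = 1)
      scores <- prcomp(X, scale. = TRUE)$x  # simple exploratory PCA
      plot(scores[, 1:2], xlab = "PC1", ylab = "PC2")
    })
  }
  shinyApp(ui, server)
}
```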
Several pathway
databases exist (KEGG, Reactome,
309
Recon,
310
Cyc,
311
WikiPathways,
312
PathBank,
313
etc., Table 2
) with different
foci, numbers of model organisms covered, and thus different target
audiences, features, and applications. They have been reviewed extensively.
293
Most of them provide basic functionality
to map metabolites from a list to their pathways, visualization, and
some form of quantitative analysis (ORA, MSEA). As a unique feature, Virtual
Metabolic Human
310,314
integrates the largest database
of human and gut microbiome metabolism and presents a virtual human
model with many possible pathological conditions.
The final
integration of data from multiple omics-type experiments (like genomics,
transcriptomics, and proteomics) complementing metabolomics studies
315
depends on the ability to combine multilayer
information. MetaboAnalyst, Reactome, Recon, PaintOmics 3,
316
the R-package mixOmics,
317
and OmicsNet
318,319
contain several modules
for multi-omics data integration. A comprehensive review by Wörheide
et al.
320
discusses the various ways
to perform data integration in multi-omics workflows. MetScape
321−323
is a Cytoscape plugin to facilitate the visualization of correlation
networks and metabolic networks based on metabolomics data. These
networks can also integrate transcriptomics data to inspect gene-metabolite
connections, and subnetworks can be extracted. The new visualization
technique GEM-Vis
324
facilitates the visualization
and exploration of time-course metabolomics experiments as metabolic
network maps.
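As a minimal sketch of such multilayer integration, the following uses the mixOmics package mentioned above together with its bundled nutrimouse data set (gene expression plus lipid concentrations from the same animals); the component numbers and keepX/keepY values are arbitrary choices for illustration only.
```r
library(mixOmics)

data(nutrimouse)          # bundled example: two omics blocks, same mice
X <- nutrimouse$gene      # transcriptomics block
Y <- nutrimouse$lipid     # lipid (metabolomics-like) block

# Sparse PLS extracts pairs of components that link correlated gene and
# lipid signatures across the two omics layers
res <- spls(X, Y, ncomp = 2, keepX = c(10, 10), keepY = c(5, 5))

plotVar(res)              # correlation circle of the selected variables
```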
Although the field of metabolomics downstream
data analysis and visualization has clearly gained momentum with an
increasing number of novel tools in recent years,
120,293,325
there are many software examples
that are no longer available through the uniform resource locator
(URL) originally referenced. This is by no means specific to metabolomics,
but rather to bioinformatics software in general.
326
In addition, some tools—though still available in
online repositories—are not compatible with current R versions
or require specific dependencies. Virtual environments, virtual
machines, or containers are technical solutions to the problem of
long-term software availability, as sketched at the end of this section.
Here it has to be mentioned that the funding of scientific research
is often not able to cover the full life cycle of software development,
as functional tools require maintenance even after their publication.
327
Hence,
sustainability of software solutions is of utmost importance as the
challenge of growing data complexity increases our dependency on data
interpretation pipelines.
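As one pragmatic route in the R ecosystem, the sketch below assumes the renv package; system-level containers such as Docker address the same problem one layer lower.
```r
# Record and restore exact package versions for an analysis project
install.packages("renv")

renv::init()      # create a project-local library and an renv.lock file
# ... develop and run the analysis pipeline ...
renv::snapshot()  # pin the package versions currently in use

# Years later, or inside a fresh container / virtual machine:
renv::restore()   # reinstall exactly the recorded versions
```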
Conclusions
Years
of successful analytical development have led to informative, tailored methods.
An optimal metabolomics workflow should cover the lipid dimension
and has to find the right balance of coverage, throughput, and accuracy.
State of the art workflows consist of complementary multiplatform
modules which allow nontargeted discoveries and targeted absolute
quantification. Only recently have the measurement and data evaluation
strategies of the two sub-ome-specific disciplines, metabolomics and
lipidomics, begun to converge. Both high-resolution and low-resolution
tandem MS are integral
parts of multiplatform approaches. To date, coverage of low-abundance
metabolites (pM/low nM concentrations) is ensured by quadrupole-based
tandem MS. While there has been a paradigmatic shift toward using HRMS
for targeted absolute quantification, thereby enabling us to merge
targeted and nontargeted approaches, typical limits of detection of
HRMS workflows remain in the (low) nM range. The final measurement
strategy depends on sample type and size, sampling frequency, the
envisaged depth of the metabolomics/lipidomic profile, the different
experimental conditions addressed, and finally, the type of information
expected as the outcome. Accurate quantification and identification are
the prerequisites for correct biological interpretation, a bold argument
that remains valid even in the context of powerful, data-rich multi-omics
analysis and network integration. While the gold standard for
validating the quantitative aspect of MS-based metabolomics will remain
stringent analytical validation using standards and reference materials,
the corroboration of the qualitative realm in metabolomics is currently
being revolutionized by bioinformatics tools. For example, in silico
approaches are more and more accepted as an alternative to spectral
library searches. However, one should not forget that the development
and validation of these tools is inherently linked to the availability
of excellent community-based resources. Providing standards and reference
materials, and setting up and curating open-source data sets and experimental
spectral libraries, was and still is of paramount importance for the
progress of the field.