Benchmarking microbiome transformations favors experimental quantitative approaches to address compositionality and sampling depth biases


Abstract

While metagenomic sequencing has become the tool of choice for studying host-associated microbial communities, downstream analyses and clinical interpretation of microbiome data remain challenging due to the sparsity and compositionality of sequence matrices. Here, we evaluate both computational and experimental approaches proposed to mitigate the impact of these outstanding issues. Generating fecal metagenomes drawn from simulated microbial communities, we benchmark the performance of thirteen commonly used analytical approaches in terms of diversity estimation, identification of taxon-taxon associations, and assessment of taxon-metadata correlations under the challenge of varying microbial ecosystem loads. We find that quantitative approaches, including experimental procedures that incorporate microbial load variation into downstream analyses, perform significantly better than computational strategies designed to mitigate data compositionality and sparsity: they not only improve the identification of true positive associations but also reduce false positive detection. When analyzing simulated scenarios of low-microbial-load dysbiosis, as observed in inflammatory pathologies, quantitative methods that correct for sampling depth show higher precision than uncorrected scaling. Overall, our findings advocate for a wider adoption of experimental quantitative approaches in microbiome research, yet also suggest preferred transformations for specific cases where determining the microbial load of samples is not feasible.
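To make the compositionality problem concrete, the Python sketch below contrasts relative abundances, uncorrected scaling by microbial load, and a quantitative profile. The quantitative function loosely follows the idea of quantitative microbiome profiling: rarefy each sample to the same ratio of sequencing depth to microbial load, then scale the rarefied counts by that load. All function names, toy counts and load values (arbitrary units standing in for, e.g., flow-cytometry cell counts) are hypothetical, and this is an illustration under those assumptions rather than the benchmarked pipeline from the article.

```python
import random


def relative_abundance(counts):
    """Total-sum scaling: convert read counts into proportions summing to 1."""
    total = sum(counts)
    return [c / total for c in counts]


def scaled_abundance(counts, load):
    """Uncorrected scaling: multiply proportions by the measured microbial load,
    keeping each sample's own sequencing depth."""
    return [p * load for p in relative_abundance(counts)]


def rarefy(counts, depth, rng):
    """Subsample `depth` reads without replacement from one sample."""
    pool = [taxon for taxon, c in enumerate(counts) for _ in range(c)]
    out = [0] * len(counts)
    for taxon in rng.sample(pool, depth):
        out[taxon] += 1
    return out


def quantitative_profile(samples, loads, seed=0):
    """Hypothetical QMP-style sketch: rarefy every sample so that its sequencing
    depth divided by its microbial load equals the smallest such ratio in the
    dataset, then scale the rarefied counts up to the measured load."""
    rng = random.Random(seed)
    ratio = min(sum(c) / load for c, load in zip(samples, loads))
    profiles = []
    for counts, load in zip(samples, loads):
        # Target depth is proportional to load; never exceed the available reads.
        depth = min(sum(counts), int(round(ratio * load)))
        rare = rarefy(counts, depth, rng)
        profiles.append([c * load / depth for c in rare])
    return profiles


# Toy data (hypothetical): taxon A is constant in absolute terms (100 units),
# while taxon B blooms from 100 to 300 units; both samples get 1000 reads.
samples = [[500, 500], [250, 750]]
loads = [200, 400]

print([relative_abundance(s) for s in samples])  # A's proportion halves
print(quantitative_profile(samples, loads))      # A stays close to 100 units
```

In this noise-free two-taxon toy, uncorrected scaling also recovers taxon A. The design choice behind rarefying to a common depth-to-load ratio is that, with uncorrected scaling, low-load samples sequenced to the same number of reads are effectively sampled more deeply per cell, which can bias the detection of low-abundance taxa; equalizing that ratio is one way to remove this sampling-depth effect.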

Summary

Here, the authors use simulated quantitative gut microbial communities to benchmark the performance of 13 common data transformations in determining diversity as well as microbe-microbe and microbe-metadata associations. They find that quantitative approaches incorporating microbial load variation outperform computational strategies in downstream analyses, and they urge the widespread adoption of quantitative approaches while recommending specific computational transformations for cases where determining the microbial load of samples is not feasible.
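When microbial loads cannot be measured, analyses fall back on computational transformations. As one example of a compositionally aware option, chosen here purely for illustration and not as the article's recommendation, the centered log-ratio (CLR) transform can be sketched in a few lines of Python. The pseudocount used to handle zeros and the toy counts are assumptions of this sketch.

```python
import math


def clr(counts, pseudocount=1.0):
    """Centered log-ratio transform of one sample.

    Each count (plus a pseudocount, needed because log(0) is undefined for
    sparse data) is log-transformed and centered on the sample's mean log value.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [x - mean_log for x in logs]


# CLR is invariant to the sample total, so doubling sequencing depth
# leaves the transformed values unchanged...
print(clr([10, 20, 40], pseudocount=0))
print(clr([20, 40, 80], pseudocount=0))

# ...but values remain relative to the rest of the sample. Hypothetical
# counts: taxon A is constant in absolute terms while taxon B triples,
# yet A's CLR value drops from 0 to about -0.55.
print(clr([500, 500], pseudocount=0))
print(clr([250, 750], pseudocount=0))
```

Because the output depends only on ratios within a sample, a change in one taxon shifts the CLR values of all others; such transformations remove sequencing-depth effects but cannot by themselves recover absolute changes in abundance.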


Author and article information

Contributors

Jeroen Raes:

ORCID: http://orcid.org/0000-0002-1337-041X

jeroen.raes@kuleuven.be

Journal

Journal ID (nlm-ta): Nat Commun

Journal ID (iso-abbrev): Nat Commun

Title: Nature Communications

Publisher: Nature Publishing Group UK (London)

ISSN (Electronic): 2041-1723

Publication date (Electronic): 11 June 2021

Publication date PMC-release: 11 June 2021

Publication date Collection: 2021

Volume: 12

Electronic Location Identifier: 3562

Affiliations

[1] GRID grid.5596.f, ISNI 0000 0001 0668 7884, Laboratory of Molecular Bacteriology, Department of Microbiology and Immunology, Rega Institute, KU Leuven, Leuven, Belgium

[2] GRID grid.11486.3a, ISNI 0000 0001 0478 8040, Center for Microbiology, VIB, Leuven, Belgium

[3] GRID grid.438114.b, ISNI 0000 0004 0550 9586, Max Planck Research Group Neural Systems Analysis, Center of Advanced European Studies and Research (caesar), Bonn, Germany

Author information

Verónica Lloréns-Rico http://orcid.org/0000-0002-0860-5990

Sara Vieira-Silva http://orcid.org/0000-0002-4616-7602

Pedro J. Gonçalves http://orcid.org/0000-0002-6987-4836

Gwen Falony http://orcid.org/0000-0003-2450-0782

Jeroen Raes http://orcid.org/0000-0002-1337-041X

Article

Publisher ID: 23821

DOI: 10.1038/s41467-021-23821-6

PMC ID: 8196019

PubMed ID: 34117246

SO-VID: 5867e7fc-287c-4a3e-b264-fcf35a31e56c

License:

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

History

Date received: 12 October 2020

Date accepted: 17 May 2021

Funding

Funded by: Fonds Wetenschappelijk Onderzoek (Research Foundation Flanders), FundRef https://doi.org/10.13039/501100003130

Award ID: 12V9418N

Award Recipient: Verónica Lloréns-Rico

Custom metadata

ScienceOpen disciplines: Uncategorized

Keywords: data processing, standards, microbiome
