
      Multivariable association discovery in population-scale meta-omics studies


          Abstract

          It is challenging to associate features such as human health outcomes, diet, environmental conditions, or other metadata to microbial community measurements, due in part to their quantitative properties. Microbiome multi-omics are typically noisy, sparse (zero-inflated), high-dimensional, extremely non-normal, and often in the form of count or compositional measurements. Here we introduce an optimized combination of novel and established methodology to assess multivariable association of microbial community features with complex metadata in population-scale observational studies. Our approach, MaAsLin 2 (Microbiome Multivariable Associations with Linear Models), uses generalized linear and mixed models to accommodate a wide variety of modern epidemiological studies, including cross-sectional and longitudinal designs, as well as a variety of data types (e.g., counts and relative abundances) with or without covariates and repeated measurements. To construct this method, we conducted a large-scale evaluation of a broad range of scenarios under which straightforward identification of meta-omics associations can be challenging. These simulation studies reveal that MaAsLin 2’s linear model preserves statistical power in the presence of repeated measures and multiple covariates, while accounting for the nuances of meta-omics features and controlling false discovery. We also applied MaAsLin 2 to a microbial multi-omics dataset from the Integrative Human Microbiome (HMP2) project which, in addition to reproducing established results, revealed a unique, integrated landscape of inflammatory bowel diseases (IBD) across multiple time points and omics profiles.
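The abstract states that MaAsLin 2 controls false discovery while testing many microbial features against metadata. As a hedged illustration only, the sketch below shows the Benjamini-Hochberg adjustment, one widely used false discovery rate procedure. The text above does not say which correction MaAsLin 2 applies, so this is an assumption, and the function name `benjamini_hochberg` and the example p-values are hypothetical.

```python
from typing import List


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values (q-values) in input order.

    Illustrative sketch only; not taken from the MaAsLin 2 source code.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the p-values, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that each adjusted
    # value is capped by the adjusted values of all larger p-values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = p_values[idx] * m / rank
        running_min = min(running_min, candidate)
        q_values[idx] = running_min
    return q_values


# Hypothetical per-feature p-values, one per microbial feature tested.
example_p = [0.01, 0.04, 0.03, 0.20]
example_q = benjamini_hochberg(example_p)
significant = [i for i, q in enumerate(example_q) if q < 0.1]
```

Features whose adjusted value falls below a chosen threshold (0.1 here, an arbitrary choice for the example) would be reported as associations; the adjustment keeps the original feature order so results can be joined back to feature names.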

          Author summary

          Recently, several statistical methods have been proposed to identify phenotypic or environmental associations with features (e.g., taxa, genes, pathways, chemicals, etc.) from molecular profiles of microbial communities. Particularly for human microbiome epidemiology, however, most of these are primarily focused on univariable associations that analyze only one or a few environmental covariates. This is a critical gap to address, given the growing commonality of population-scale microbiome research and the complexity of associated study designs, including dietary, pharmaceutical, clinical, and environmental covariates, often with samples from multiple time points or tissues. Surprisingly, there have been no systematic evaluations of statistical analysis methods appropriate for such studies, nor consensus on appropriate methods for scalable microbiome epidemiology. To this end, we developed and validated a statistical model (MaAsLin 2) that provides both the first unified method and the first large-scale, comprehensive benchmarking of multivariable associations in population-scale microbial community studies. We hope that the MaAsLin 2 implementation, validated through extensive simulations and an application to HMP2 IBD multi-omics, will be helpful for researchers in future analysis of both human-associated and environmental microbial communities.


                Author and article information

                Journal
                PLoS Computational Biology (PLoS Comput Biol)
                Publisher: Public Library of Science (San Francisco, CA, USA)
                ISSN: 1553-734X, 1553-7358
                Published: 16 November 2021
                Volume 17, Issue 11, e1009442
                Affiliations
                [1 ] Biostatistics Department, Harvard T. H. Chan School of Public Health, Boston, Massachusetts, United States of America
                [2 ] The Broad Institute, Cambridge, Massachusetts, United States of America
                [3 ] Computational Biology Institute, Department of Biostatistics and Bioinformatics, Milken Institute School of Public Health, George Washington University, Washington DC, United States of America
                [4 ] Clinical and Translational Epidemiology Unit, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts, United States of America
                [5 ] Division of Gastroenterology, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts, United States of America
                [6 ] Epidemiology Branch, Division of Intramural Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland, United States of America
                [7 ] Department of Epidemiology and Biostatistics, CUNY School of Public Health, New York City, New York, United States of America
                [8 ] Department of Biostatistics, Product Development, Genentech, Inc., South San Francisco, California, United States of America
                [9 ] Center for Bioinformatics and Computational Biology, University of Maryland, College Park, Maryland, United States of America
                Fudan University, CHINA
                Author notes

                I have read the journal’s policy and the authors of this manuscript have the following competing interests: CH is on the Scientific Advisory Board for Seres Therapeutics and Empress Therapeutics. The remaining authors have declared that no competing interests exist. Author Yiren Lu was unable to confirm their authorship contributions. On their behalf, the corresponding author has reported their contributions to the best of their knowledge.

                ‡ Unavailable

                Author information
                https://orcid.org/0000-0003-4956-2429
                https://orcid.org/0000-0002-9710-0248
                https://orcid.org/0000-0002-2199-4310
                https://orcid.org/0000-0003-2313-6448
                https://orcid.org/0000-0002-2768-2975
                https://orcid.org/0000-0002-5436-4219
                https://orcid.org/0000-0002-6592-6272
                https://orcid.org/0000-0002-5300-1184
                https://orcid.org/0000-0002-7385-8994
                https://orcid.org/0000-0002-9437-9722
                https://orcid.org/0000-0002-8024-5600
                https://orcid.org/0000-0002-4134-7612
                https://orcid.org/0000-0003-2725-0694
                https://orcid.org/0000-0001-8221-7139
                https://orcid.org/0000-0002-8798-7068
                https://orcid.org/0000-0002-1110-0096
                Article
                PCOMPBIOL-D-21-01441
                10.1371/journal.pcbi.1009442
                8714082
                34784344
                a0a90013-0cb2-493c-acaa-9bdbd9bc8081

                This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.

                History
                : 5 August 2021
                : 9 September 2021
                Page count
                Figures: 5, Tables: 0, Pages: 27
                Funding
                Funded by: US National Science Foundation, Division of Environmental Biology; Award ID: DEB 2028280
                Funded by: National Institute of Allergy and Infectious Diseases (funder ID http://dx.doi.org/10.13039/100000060); Award ID: U19AI110820
                Funded by: National Human Genome Research Institute (funder ID http://dx.doi.org/10.13039/100000051); Award ID: R01HG005220
                Funded by: National Institute of Diabetes and Digestive and Kidney Diseases (funder ID http://dx.doi.org/10.13039/100000062); Award IDs: R24DK110499, U54DK102557
                This work was funded in part by US National Science Foundation grant DEB-2028280 (AR), US National Institutes of Health grants U19AI110820 (CH, to Owen White), R01HG005220 (CH, to Rafael Irizarry), and R24DK110499 and U54DK102557 (CH). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
                Categories
                Research Article
                Computer and Information Sciences
                Data Management
                Metadata
                Biology and Life Sciences
                Microbiology
                Medical Microbiology
                Microbiome
                Biology and Life Sciences
                Genetics
                Genomics
                Microbial Genomics
                Microbiome
                Biology and Life Sciences
                Microbiology
                Microbial Genomics
                Microbiome
                Research and Analysis Methods
                Simulation and Modeling
                Biology and Life Sciences
                Genetics
                Genomics
                Metagenomics
                Engineering and Technology
                Measurement
                Physical Sciences
                Mathematics
                Statistics
                Statistical Models
                Medicine and Health Sciences
                Pharmacology
                Drugs
                Antimicrobials
                Antibiotics
                Biology and Life Sciences
                Microbiology
                Microbial Control
                Antimicrobials
                Antibiotics
                Medicine and Health Sciences
                Gastroenterology and Hepatology
                Inflammatory Bowel Disease
                Custom metadata
                vor-update-to-uncorrected-proof
                2021-12-28
                The implementation of MaAsLin 2 is publicly available with source code, documentation, tutorial data, and as an R/Bioconductor package at http://huttenhower.sph.harvard.edu/maaslin2. The software packages used in this work are free and open source, including bioBakery methods available via http://huttenhower.sph.harvard.edu/biobakery as source code, cloud-compatible images, and installable packages. Analysis scripts using these packages to generate figures and results from this manuscript (and associated usage notes) are available from https://github.com/biobakery/maaslin2_benchmark. The iHMP dataset is publicly available at the IBDMDB website ( https://ibdmdb.org) and the HMP DACC web portal ( https://www.hmpdacc.org/ihmp/). The processed HMP2 datasets analysed in this manuscript are also available as Supporting Information.

                Quantitative & Systems biology