Learning accurate representations of microbe-metabolite interactions

Morton, James T.; Aksenov, Alexander A.; Nothias, Louis Felix; Foulds, James R.; Quinn, Robert A; Badri, Michelle H.; Swenson, Tami L; Van Goethem, Marc W.; Northen, Trent R.; Vázquez-Baeza, Yoshiki; Wang, Mingxun; Bokulich, Nicholas A.; Watters, Aaron; Song, Se Jin; Bonneau, Richard A.; Dorrestein, Pieter C.; Knight, Rob

doi:10.1038/s41592-019-0616-3

research-article

Author(s): James T. Morton ¹,², Alexander A. Aksenov ³,⁴, Louis Felix Nothias ³,⁴, James R. Foulds ⁵, Robert A. Quinn ⁶, Michelle H. Badri ⁷, Tami L. Swenson ⁸, Marc W. Van Goethem ⁸, Trent R. Northen ⁸,⁹, Yoshiki Vázquez-Baeza ¹⁰,¹¹, Mingxun Wang ³,⁴, Nicholas A. Bokulich ¹²,¹³, Aaron Watters ¹⁴, Se Jin Song ¹,¹¹, Richard Bonneau ⁷,¹⁴,¹⁵,¹⁶, Pieter C. Dorrestein ³,⁴, Rob Knight ¹,²,¹⁷,¹¹

Publication date (Electronic): 04 November 2019

Journal: Nature methods

Abstract

Integrating multi-omics datasets is critical for microbiome research, but multiple statistical challenges can confound traditional correlation techniques. We solve this problem by using neural networks to estimate the conditional probability that each molecule is present given the presence of each specific microbe. Using known environmental (desert biological soil crust wetting) and clinical (cystic fibrosis lung) examples, we show that the method recovers microbe-metabolite relationships, and we demonstrate how it can discover relationships between microbially produced metabolites and inflammatory bowel disease.
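To make the idea of a conditional probability of a molecule given a microbe concrete, the sketch below uses one common parameterization: a softmax over inner products of embedding vectors. This is an illustrative assumption, not the authors' implementation; the abstract does not specify the network architecture, and every name and number here (`conditional_probs`, `microbe_A`, the 2-dimensional vectors, the zero biases) is hypothetical.

```python
import math


def softmax(scores):
    """Convert real-valued scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def conditional_probs(microbe_vec, metabolite_vecs, metabolite_bias):
    """Return P(metabolite j | microbe) for every metabolite j.

    Each score is the dot product of the microbe embedding with one
    metabolite embedding, plus that metabolite's bias term.
    """
    scores = [
        sum(u * v for u, v in zip(microbe_vec, vec)) + b
        for vec, b in zip(metabolite_vecs, metabolite_bias)
    ]
    return softmax(scores)


# Hypothetical embeddings: made-up numbers, not fitted to any data.
microbe_vectors = {"microbe_A": [1.0, 0.0], "microbe_B": [0.0, 1.0]}
metabolite_names = ["metabolite_X", "metabolite_Y", "metabolite_Z"]
metabolite_vectors = [[2.0, 0.0], [0.0, 2.0], [0.5, 0.5]]
metabolite_bias = [0.0, 0.0, 0.0]

probs_A = conditional_probs(
    microbe_vectors["microbe_A"], metabolite_vectors, metabolite_bias
)
probs_B = conditional_probs(
    microbe_vectors["microbe_B"], metabolite_vectors, metabolite_bias
)

for name, p in zip(metabolite_names, probs_A):
    print(f"P({name} | microbe_A) = {p:.3f}")
```

In a fitted model the embeddings would be learned from paired microbe and metabolite abundance tables rather than set by hand. Because the probabilities for one microbe sum to 1 across metabolites, the output is a distribution over molecules rather than a set of pairwise correlation coefficients.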

Most cited references 40

mixOmics: An R package for ‘omics feature selection and multiple data integration

Florian Rohart, Benoît Gautier, Amrit Singh … (2017)

The advent of high throughput technologies has led to a wealth of publicly available ‘omics data coming from different sources, such as transcriptomics, proteomics, metabolomics. Combining such large-scale biological data sets can lead to the discovery of important biological insights, provided that relevant information can be extracted in a holistic manner. Current statistical approaches have been focusing on identifying small subsets of molecules (a ‘molecular signature’) to explain or predict biological conditions, but mainly for a single type of ‘omics. In addition, commonly used methods are univariate and consider each biological feature independently. We introduce mixOmics, an R package dedicated to the multivariate analysis of biological data sets with a specific focus on data exploration, dimension reduction and visualisation. By adopting a systems biology approach, the toolkit provides a wide range of methods that statistically integrate several data sets at once to probe relationships between heterogeneous ‘omics data sets. Our recent methods extend Projection to Latent Structure (PLS) models for discriminant analysis, for data integration across multiple ‘omics data or across independent studies, and for the identification of molecular signatures. We illustrate our latest mixOmics integrative frameworks for the multivariate analyses of ‘omics data available from the package.

Cited 1166 times

EMPeror: a tool for visualizing high-throughput microbial community data

Yoshiki Vázquez-Baeza, Meg Pirrung, Antonio González González … (2013)

Background: As microbial ecologists take advantage of high-throughput sequencing technologies to describe microbial communities across ever-increasing numbers of samples, new analysis tools are required to relate the distribution of microbes among larger numbers of communities, and to use increasingly rich and standards-compliant metadata to understand the biological factors driving these relationships. In particular, the Earth Microbiome Project drives these needs by profiling the genomic content of tens of thousands of samples across multiple environment types.

Findings: Features of EMPeror include: ability to visualize gradients and categorical data, visualize different principal coordinates axes, present the data in the form of parallel coordinates, show taxa as well as environmental samples, dynamically adjust the size and transparency of the spheres representing the communities on a per-category basis, dynamically scale the axes according to the fraction of variance each explains, show, hide or recolor points according to arbitrary metadata including that compliant with the MIxS family of standards developed by the Genomic Standards Consortium, display jackknifed-resampled data to assess statistical confidence in clustering, perform coordinate comparisons (useful for procrustes analysis plots), and greatly reduce loading times and overall memory footprint compared with existing approaches. Additionally, ease of sharing, given EMPeror’s small output file size, enables agile collaboration by allowing users to embed these visualizations via emails or web pages without the need for extra plugins.

Conclusions: Here we present EMPeror, an open source and web browser enabled tool with a versatile command line interface that allows researchers to perform rapid exploratory investigations of 3D visualizations of microbial community data, such as the widely used principal coordinates plots. EMPeror includes a rich set of controllers to modify features as a function of the metadata. By being specifically tailored to the requirements of microbial ecologists, EMPeror thus increases the speed with which insight can be gained from large microbiome datasets.

Cited 584 times

A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis.

Daniela M. Witten, Robert Tibshirani, Trevor J. Hastie (2009)

We present a penalized matrix decomposition (PMD), a new framework for computing a rank-$K$ approximation for a matrix. We approximate the matrix $X$ as $\hat{X} = \sum_{k=1}^{K} d_k u_k v_k^T$, where $d_k$, $u_k$, and $v_k$ minimize the squared Frobenius norm of $X - \hat{X}$, subject to penalties on $u_k$ and $v_k$. This results in a regularized version of the singular value decomposition. Of particular interest is the use of $L_1$-penalties on $u_k$ and $v_k$, which yields a decomposition of $X$ using sparse vectors. We show that when the PMD is applied using an $L_1$-penalty on $v_k$ but not on $u_k$, a method for sparse principal components results. In fact, this yields an efficient algorithm for the "SCoTLASS" proposal (Jolliffe and others 2003) for obtaining sparse principal components. This method is demonstrated on a publicly available gene expression data set. We also establish connections between the SCoTLASS method for sparse principal component analysis and the method of Zou and others (2006). In addition, we show that when the PMD is applied to a cross-products matrix, it results in a method for penalized canonical correlation analysis (CCA). We apply this penalized CCA method to simulated data and to a genomic data set consisting of gene expression and DNA copy number measurements on the same set of samples.

Cited 461 times

Author and article information

Journal

Journal ID (nlm-journal-id): 101215604

Journal ID (pubmed-jr-id): 32338

Journal ID (nlm-ta): Nat Methods

Journal ID (iso-abbrev): Nat. Methods

Title: Nature methods

ISSN (Print): 1548-7091

ISSN (Electronic): 1548-7105

Publication date Nihms-submitted: 24 September 2019

Publication date (Electronic): 04 November 2019

Publication date (Print): December 2019

Publication date PMC-release: 04 May 2020

Volume: 16

Issue: 12

Pages: 1306-1314

Affiliations

[1] Department of Pediatrics, University of California, San Diego, La Jolla, CA, USA

[2] Department of Computer Science & Engineering, University of California, San Diego, La Jolla, CA, USA

[3] Collaborative Mass Spectrometry Innovation Center, University of California San Diego, La Jolla, CA, USA

[4] Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA

[5] Department of Information Systems, University of Maryland Baltimore County, Baltimore, MD, USA

[6] Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI, USA

[7] Department of Biology, New York University, New York, NY 10012, USA

[8] Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, 1 Cyclotron Rd, Berkeley, CA 94720, USA

[9] DOE Joint Genome Institute, 2800 Mitchell Dr., Walnut Creek, CA 94598, USA

[10] Jacobs School of Engineering, University of California San Diego, La Jolla, CA, USA

[11] Center for Microbiome Innovation, University of California, San Diego, La Jolla, CA, USA

[12] The Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, AZ, USA

[13] Department of Biological Sciences, Northern Arizona University, Flagstaff, AZ, USA

[14] Flatiron Institute, Simons Foundation, New York, NY 10010, USA

[15] Computer Science Department, Courant Institute, New York, NY 10012, USA

[16] Center For Data Science, NYU, New York, NY 10008, USA

[17] Department of Bioengineering, University of California, San Diego, La Jolla, CA, USA

Author notes

Author contributions

J.T.M. wrote the mmvec algorithm, conducted the benchmarks and ran all of the analyses. A.A.A. and L.F.N. preprocessed and annotated the metabolomics data. A.A.A. provided insights into the high-fat diet study. J.R.F. provided insights behind word2vec and topic modeling. M.H.B. benchmarked SPIEC-EASI. R.A.Q. provided insights behind the cystic fibrosis study and simulations. Y.V.B. provided insights behind the interpretation of the IBD analysis. M.W. developed the GNPS workflow for mmvec. N.A.B. developed the heatmap visualizations. A.W. developed the network visualizations. T.L.S., M.W.V.G. and T.N. provided insights behind the biocrust soils experiment. All authors were involved with writing the manuscript.

Article

Manuscript ID: NIHMS1540415

DOI: 10.1038/s41592-019-0616-3

PMC ID: 6884698

PubMed ID: 31686038

SO-VID: 45f54fa2-1f3e-450d-b4d9-84fd776b2bbe

License:

Users may view, print, copy, and download text and data-mine the content in such documents, for the purposes of academic research, subject always to the full Conditions of use: http://www.nature.com/authors/editorial_policies/license.html#terms
