
      Learning accurate representations of microbe-metabolite interactions


          Abstract

          Integrating multi-omics datasets is critical for microbiome research, but multiple statistical challenges can confound traditional correlation techniques. We solve this problem by using neural networks to estimate the conditional probability that each molecule is present given the presence of each specific microbe. Using known environmental (desert biological soil crust wetting) and clinical (cystic fibrosis lung) examples, we show that the method recovers microbe-metabolite relationships, and we demonstrate how it can discover relationships between microbially produced metabolites and inflammatory bowel disease.
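
The abstract does not spell out the network architecture, so the following is only a minimal sketch of one way such conditional probabilities could be parameterized. It assumes that each microbe and each metabolite has a low-dimensional vector, and that P(metabolite | microbe) is a softmax over their dot products, in the spirit of word2vec-style models. The function names, the bias term, and the toy numbers are hypothetical illustrations. They are not taken from the article or from the mmvec code.

```python
import math


def softmax(scores):
    """Convert real-valued scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def conditional_probs(microbe_vec, metabolite_vecs, metabolite_bias):
    """Return P(metabolite j | microbe) for every metabolite j.

    Assumption: the score of metabolite j is the dot product of the
    microbe vector with that metabolite's vector, plus a bias.
    """
    scores = [
        sum(u * v for u, v in zip(microbe_vec, vec)) + b
        for vec, b in zip(metabolite_vecs, metabolite_bias)
    ]
    return softmax(scores)


# Hypothetical toy values: one microbe, three metabolites, 2-D vectors.
microbe_vec = [1.0, -0.5]
metabolite_vecs = [[2.0, 0.0], [0.0, 2.0], [-1.0, 1.0]]
metabolite_bias = [0.0, 0.0, 0.0]

probs = conditional_probs(microbe_vec, metabolite_vecs, metabolite_bias)
for j, p in enumerate(probs):
    print(f"P(metabolite {j} | microbe) = {p:.3f}")
```

In a trained model the vectors would be fitted to paired microbe and metabolite abundance tables. Here they are fixed by hand only to show how a row of conditional probabilities is produced and why each row sums to one.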


                Author and article information

                Journal
                101215604
                32338
                Nature Methods (Nat. Methods)
                1548-7091
                1548-7105
                24 September 2019
                04 November 2019
                December 2019
                04 May 2020
                Volume 16, issue 12, pages 1306-1314
                Affiliations
                [1] Department of Pediatrics, University of California, San Diego, La Jolla, CA, USA
                [2] Department of Computer Science & Engineering, University of California, San Diego, La Jolla, CA, USA
                [3] Collaborative Mass Spectrometry Innovation Center, University of California San Diego, La Jolla, CA, USA
                [4] Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA
                [5] Department of Information Systems, University of Maryland Baltimore County, Baltimore, MD, USA
                [6] Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI, USA
                [7] Department of Biology, New York University, New York, 10012 NY, USA
                [8] Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, 1 Cyclotron Rd, Berkeley, CA, 94720, USA
                [9] DOE Joint Genome Institute, 2800 Mitchell Dr., Walnut Creek, CA, 94598, USA
                [10] Jacobs School of Engineering, University of California San Diego, La Jolla, CA, USA
                [11] Center for Microbiome Innovation, University of California, San Diego, La Jolla, CA, USA
                [12] The Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, AZ, USA
                [13] Department of Biological Sciences, Northern Arizona University, Flagstaff, AZ, USA
                [14] Flatiron Institute, Simons Foundation, New York, 10010 NY, USA
                [15] Computer Science Department, Courant Institute, New York, 10012 NY, USA
                [16] Center For Data Science, NYU, New York, NY 10008, USA
                [17] Department of Bioengineering, University of California, San Diego, La Jolla, CA, USA

                Author contributions

                J.T.M. wrote the mmvec algorithm, conducted the benchmarks and ran all of the analyses. A.A.A. and L.F.N. preprocessed and annotated the metabolomics data. A.A.A. provided insights into the high-fat diet study. J.R.F. provided insights behind word2vec and topic modeling. M.H.B. benchmarked SPIEC-EASI. R.A.Q. provided insights behind the cystic fibrosis study and simulations. Y.V.B. provided insights behind the interpretation of the IBD analysis. M.W. developed the GNPS workflow for mmvec. N.A.B. developed the heatmap visualizations. A.W. developed the network visualizations. T.L.S., M.W.V.G. and T.N. provided insights behind the biocrust soils experiment. All authors were involved with writing the manuscript.

                Article
                NIHMS ID: NIHMS1540415
                DOI: 10.1038/s41592-019-0616-3
                PMCID: PMC6884698
                PMID: 31686038
                45f54fa2-1f3e-450d-b4d9-84fd776b2bbe

                Users may view, print, copy, and download text and data-mine the content in such documents, for the purposes of academic research, subject always to the full Conditions of use: http://www.nature.com/authors/editorial_policies/license.html#terms

                Categories
                Article

                Life sciences
