      mb-PHENIX: diffusion and supervised uniform manifold approximation for denoising microbiota data

      brief-report

          Abstract

          Motivation

          Microbiota data are affected by technical noise and the curse of dimensionality, both of which undermine the reliability of scientific findings. Furthermore, abundance matrices exhibit a zero-inflated distribution due to biological and technical influences. Consequently, there is a growing demand for advanced algorithms that can effectively recover missing taxa while preserving the structure of the data.

          Results

          We present mb-PHENIX, an open-source algorithm developed in Python that recovers taxa abundances from noisy and sparse microbiota data. Our method infers the missing information in the count matrix (in 16S microbiota and shotgun studies) by applying imputation via diffusion, with a supervised Uniform Manifold Approximation and Projection (sUMAP) space as initialization. This hybrid machine learning approach denoises microbiota data, revealing differentially abundant microbes among study groups where traditional abundance analysis fails.
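
The idea of diffusion-based imputation over a low-dimensional embedding can be illustrated with a minimal sketch. This is not the mb-PHENIX implementation: the function name `diffusion_impute` and the parameters `knn` and `t` are hypothetical, the kernel construction is a simplified MAGIC-style choice, and the embedding is simply passed in as an array. In practice it could come from a supervised UMAP fit, for example `umap.UMAP().fit_transform(X, y=labels)` from the umap-learn package, which is assumed here rather than used.

```python
import numpy as np


def diffusion_impute(counts, embedding, knn=5, t=3):
    """Hypothetical sketch of diffusion imputation on an embedding.

    counts:    (n_samples, n_taxa) abundance matrix with zeros/dropouts
    embedding: (n_samples, d) low-dimensional coordinates (assumed to come
               from a supervised UMAP or any other embedding)
    knn:       neighbour rank used for the adaptive kernel bandwidth
    t:         number of diffusion steps
    """
    counts = np.asarray(counts, dtype=float)
    embedding = np.asarray(embedding, dtype=float)

    # Pairwise Euclidean distances between samples in the embedding space.
    diff = embedding[:, None, :] - embedding[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # Adaptive bandwidth: distance to the knn-th nearest neighbour
    # (column 0 of the sorted distances is the sample itself).
    sigma = np.maximum(np.sort(dist, axis=1)[:, knn], 1e-12)

    # Gaussian affinities, symmetrized, then row-normalized to a Markov matrix.
    affinity = np.exp(-((dist / sigma[:, None]) ** 2))
    affinity = (affinity + affinity.T) / 2.0
    markov = affinity / affinity.sum(axis=1, keepdims=True)

    # Diffuse for t steps and average the counts over each neighbourhood.
    return np.linalg.matrix_power(markov, t) @ counts


# Toy data: two well-separated groups of 10 samples and 3 taxa.
rng = np.random.default_rng(0)
embedding = np.vstack([
    rng.normal(0.0, 0.1, size=(10, 2)),
    rng.normal(5.0, 0.1, size=(10, 2)),
])
counts = np.zeros((20, 3))
counts[:10, 0] = 50.0   # taxon 0 is present only in group 1
counts[10:, 1] = 30.0   # taxon 1 is present only in group 2
counts[0, 0] = 0.0      # simulated dropout in one group-1 sample

imputed = diffusion_impute(counts, embedding)
```

In this toy setting the dropout in the first sample is filled in from its neighbours in the same group, while the taxon that belongs to the other group stays near zero because the two groups are far apart in the embedding. Because each row of the Markov matrix sums to one, every imputed value is a weighted average of observed values for that taxon.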

          Availability and implementation

          The mb-PHENIX algorithm is available at https://github.com/resendislab/mb-PHENIX. An easy-to-use implementation is available on Google Colab (see GitHub).

          Most cited references (8)

          UMAP: Uniform Manifold Approximation and Projection

            Recovering Gene Interactions from Single-Cell Data Using Data Diffusion

            Single-cell RNA-sequencing technologies suffer from many sources of technical noise, including under-sampling of mRNA molecules, often termed ‘dropout’, which can severely obscure important gene-gene relationships. To address this, we developed MAGIC (Markov Affinity-based Graph Imputation of Cells), a method that shares information across similar cells, via data diffusion, to denoise the cell count matrix and fill in missing transcripts. We validate MAGIC on several biological systems and find it effective at recovering gene-gene relationships and additional structures. MAGIC reveals a phenotypic continuum, with the majority of cells residing in intermediate states that display stem-like signatures and uncovers known and previously uncharacterized regulatory interactions, demonstrating that our approach can successfully uncover regulatory relations without perturbations.

            One Sentence Summary: Graph diffusion-based imputation method recovers missing transcripts in scRNA-seq data, yielding insight into the epithelial-to-mesenchymal transition.

            Abstract highlights:
            1. MAGIC restores noisy and sparse single-cell data using diffusion geometry.
            2. Corrected data is amenable to myriad downstream analyses.
            3. MAGIC enables archetypal analysis and inference of gene interactions.
            4. Transcription factor targets can be predicted without perturbation after MAGIC.

            In brief: A new algorithm overcomes limitations of data loss in single cell sequencing experiments.
              Uniform Manifold Approximation and Projection (UMAP) Reveals Composite Patterns and Resolves Visualization Artifacts in Microbiome Data

              Microbiome data are sparse and high dimensional, so effective visualization of these data requires dimensionality reduction. To date, the most commonly used method for dimensionality reduction in the microbiome is calculation of between-sample microbial differences (beta diversity), followed by principal-coordinate analysis (PCoA). Uniform Manifold Approximation and Projection (UMAP) is an alternative method that can reduce the dimensionality of beta diversity distance matrices. Here, we demonstrate the benefits and limitations of using UMAP for dimensionality reduction on microbiome data. Using real data, we demonstrate that UMAP can improve the representation of clusters, especially when the clusters are composed of multiple subgroups. Additionally, we show that UMAP provides improved correlation of biological variation along a gradient with a reduced number of coordinates of the resulting embedding. Finally, we provide parameter recommendations that emphasize the preservation of global geometry. We therefore conclude that UMAP should be routinely used as a complementary visualization method for microbiome beta diversity studies. IMPORTANCE UMAP provides an additional method to visualize microbiome data. The method is extensible to any beta diversity metric used with PCoA, and our results demonstrate that UMAP can indeed improve visualization quality and correspondence with biological and technical variables of interest. The software to perform this analysis is available under an open-source license and can be obtained at https://github.com/knightlab-analyses/umap-microbiome-benchmarking ; additionally, we have provided a QIIME 2 plugin for UMAP at https://github.com/biocore/q2-umap .

                Author and article information

                Contributors
                Roles: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Software, Supervision, Validation, Visualization, Writing - original draft, Writing - review & editing
                Roles: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Software, Supervision, Validation, Visualization, Writing - original draft, Writing - review & editing
                Roles: Conceptualization, Data curation, Investigation, Writing - original draft, Writing - review & editing
                Roles: Investigation, Writing - original draft, Writing - review & editing
                Roles: Investigation, Writing - original draft, Writing - review & editing
                Roles: Investigation, Writing - original draft, Writing - review & editing
                Roles: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Software, Supervision, Validation, Visualization, Writing - original draft, Writing - review & editing
                Role: Associate Editor
                Journal: Bioinformatics
                Publisher: Oxford University Press
                ISSN (print): 1367-4803
                ISSN (electronic): 1367-4811
                December 2023
                28 November 2023
                28 November 2023
                Volume: 39
                Issue: 12
                Article ID: btad706
                Affiliations
                Human Systems Biology Laboratory, Instituto Nacional de Medicina Genómica (INMEGEN), Mexico City, 14610, Mexico
                Programa de Doctorado en Ciencias Biomédicas, Universidad Nacional Autónoma de México (UNAM), Mexico City, 04510, Mexico
                Programa de Doctorado en Ciencias Médicas, Odontológicas y de la Salud, Universidad Nacional Autónoma de México (UNAM), Mexico City, 04510, Mexico
                Programa de Maestría en Ciencias Bioquímicas, Universidad Nacional Autónoma de México (UNAM), Mexico City, 04510, Mexico
                Coordinación de la Investigación Científica—Red de Apoyo a la Investigación—Centro de Ciencias de la Complejidad, Universidad Nacional Autónoma de México (UNAM) , Mexico City, 04510, Mexico
                Author notes
                Corresponding author. Human Systems Biology Laboratory, Instituto Nacional de Medicina Genómica (INMEGEN), Periferico Sur 4809, Arenal Tepepan, Mexico City, 14610, Mexico. E-mail: oresendis@inmegen.gob.mx
                Author information
                https://orcid.org/0000-0001-5220-541X
                Article
                Article ID: btad706
                DOI: 10.1093/bioinformatics/btad706
                PMCID: PMC10699834
                PMID: 38015858
                © The Author(s) 2023. Published by Oxford University Press.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com.

                History
                : 16 November 2022
                : 16 October 2023
                : 27 October 2023
                : 27 November 2023
                : 06 December 2023
                Page count
                Pages: 4
                Funding
                Funded by: CONAHCyT
                Award ID: FORDECYT-PRONACES/425859/2020
                Categories
                Applications Note
                Systems Biology
                AcademicSubjects/SCI01060

                Bioinformatics & Computational biology
