
      Fast clustering and cell-type annotation of scATAC data using pre-trained embeddings

      research-article


          Abstract

          Data from the single-cell assay for transposase-accessible chromatin using sequencing (scATAC-seq) are now widely available. A major computational challenge is the high dimensionality and inherent sparsity of these data, which is typically addressed by producing lower-dimensional representations of single cells for downstream clustering tasks. Current approaches produce such cell embeddings directly through a one-step learning process. Here, we propose an alternative approach: building embedding models pre-trained on reference data. We argue that this yields a more flexible analysis workflow that also has computational performance advantages through transfer learning. We implemented our approach in scEmbed, an unsupervised machine-learning framework that learns low-dimensional embeddings of genomic regulatory regions to represent and analyze scATAC-seq data. scEmbed performs well at clustering and has the key advantage of learning patterns of region co-occurrence that can be transferred to other, unseen datasets. Moreover, models pre-trained on reference data can be used to build fast and accurate cell-type annotation systems without the need for other data modalities. scEmbed is implemented in Python and is open source; it is available at https://github.com/databio/geniml. Pre-trained models from this work are available on Hugging Face for public use: https://huggingface.co/databio.
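          The workflow above learns region embeddings once on reference data and then reuses them: new cells are projected into the same space and labeled from annotated reference cells. The Python sketch below illustrates that idea under stated assumptions; it is not the scEmbed or geniml API. It assumes (1) that a cell's embedding is the average of the pre-trained vectors of its accessible regions, in the spirit of word2vec-style models that treat regions as words and cells as documents, (2) that regions absent from the pre-trained vocabulary are skipped, and (3) that annotation is a cosine-similarity k-nearest-neighbor vote. The function names `embed_cell` and `knn_annotate`, the region names, the two-dimensional vectors and the labels are all hypothetical toy values.

```python
"""Hypothetical sketch (not the scEmbed/geniml API): reuse pre-trained
region embeddings to embed new cells and annotate them by k-NN vote."""
import math
from collections import Counter


def embed_cell(accessible_regions, region_vectors):
    """Average the pre-trained vectors of a cell's accessible regions.

    Assumption: regions missing from the pre-trained vocabulary are skipped.
    Returns None if none of the cell's regions are in the vocabulary.
    """
    known = [region_vectors[r] for r in accessible_regions if r in region_vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_annotate(query, reference, k=3):
    """Label `query` by majority vote among its k most similar reference cells.

    `reference` is a list of (embedding, label) pairs.
    """
    ranked = sorted(reference, key=lambda item: cosine(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy "pre-trained" region vectors (illustrative values only).
region_vectors = {
    "chr1:100-600": [1.0, 0.0],
    "chr1:900-1400": [0.9, 0.1],
    "chr2:50-550": [0.0, 1.0],
    "chr2:700-1200": [0.1, 0.9],
}

# Annotated reference cells: (accessible regions, cell-type label).
reference_cells = [
    (["chr1:100-600", "chr1:900-1400"], "T cell"),
    (["chr1:100-600"], "T cell"),
    (["chr2:50-550", "chr2:700-1200"], "B cell"),
    (["chr2:50-550"], "B cell"),
]
reference = [(embed_cell(regions, region_vectors), label)
             for regions, label in reference_cells]

# A new, unlabeled cell; "chrX:1-500" is not in the vocabulary and is ignored.
query_cell = ["chr1:900-1400", "chrX:1-500"]
query_embedding = embed_cell(query_cell, region_vectors)
print(query_embedding)                                # [0.9, 0.1]
print(knn_annotate(query_embedding, reference, k=3))  # T cell
```

          With real data the vectors would come from a trained model and have many more dimensions. The point of the sketch is only that embedding and labeling a new cell requires lookups and averaging rather than retraining, which illustrates why reusing pre-trained embeddings can be computationally cheap.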


                Author and article information

                Journal: NAR Genomics and Bioinformatics (NAR Genom Bioinform; nargab)
                Publisher: Oxford University Press
                ISSN: 2631-9268
                Issue date: September 2024
                Published: 05 July 2024
                Volume: 6, Issue: 3, Article: lqae073
                Affiliations
                Center for Public Health Genomics, School of Medicine, University of Virginia, Charlottesville, VA 22908, USA
                Department of Biomedical Engineering, School of Medicine, University of Virginia, Charlottesville, VA 22904, USA
                Department of Biochemistry and Molecular Genetics, School of Medicine, University of Virginia, Charlottesville, VA 22908, USA
                Child Health Research Center, School of Medicine, University of Virginia, Charlottesville, VA 22908, USA
                Department of Computer Science, School of Engineering, University of Virginia, Charlottesville, VA 22908, USA
                School of Data Science, University of Virginia, Charlottesville, VA 22904, USA
                Department of Systems and Information Engineering, University of Virginia, Charlottesville, VA 22908, USA
                Department of Public Health Sciences, School of Medicine, University of Virginia, Charlottesville, VA 22908, USA
                Author notes
                To whom correspondence should be addressed. Email: nsheffield@virginia.edu
                Author information
                https://orcid.org/0000-0002-7354-7213
                https://orcid.org/0000-0002-2688-0988
                https://orcid.org/0000-0002-1287-4931
                https://orcid.org/0009-0009-2762-9048
                https://orcid.org/0000-0003-4444-2751
                https://orcid.org/0000-0002-9140-2632
                https://orcid.org/0000-0001-9723-3246
                https://orcid.org/0000-0001-5643-4068
                Article: lqae073
                DOI: 10.1093/nargab/lqae073
                11224678
                © The Author(s) 2024. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                27 October 2023
                29 April 2024
                20 June 2024
                Page count
                Pages: 8
                Funding
                Funded by: National Institute of General Medical Sciences (DOI 10.13039/100000057), Award ID: R35-GM128636
                Funded by: National Human Genome Research Institute (DOI 10.13039/100000051), Award ID: R01-HG012558
                Categories
                AcademicSubjects/SCI00030
                AcademicSubjects/SCI00980
                AcademicSubjects/SCI01060
                AcademicSubjects/SCI01140
                AcademicSubjects/SCI01180
                Standard Article
                Editor's Choice
