      Is Open Access

      Mapping single-cell data to reference atlases by transfer learning

      research-article


          Abstract

          Large single-cell atlases are now routinely generated to serve as references for analysis of smaller-scale studies. Yet learning from reference data is complicated by batch effects between datasets, limited availability of computational resources and sharing restrictions on raw data. Here we introduce a deep learning strategy for mapping query datasets on top of a reference called single-cell architectural surgery (scArches). scArches uses transfer learning and parameter optimization to enable efficient, decentralized, iterative reference building and contextualization of new datasets with existing references without sharing raw data. Using examples from mouse brain, pancreas, immune and whole-organism atlases, we show that scArches preserves biological state information while removing batch effects, despite using four orders of magnitude fewer parameters than de novo integration. scArches generalizes to multimodal reference mapping, allowing imputation of missing modalities. Finally, scArches retains coronavirus disease 2019 (COVID-19) disease variation when mapping to a healthy reference, enabling the discovery of disease-specific cell states. scArches will facilitate collaborative projects by enabling iterative construction, updating, sharing and efficient use of reference atlases.
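The abstract does not spell out how a query is mapped while training so few parameters, so the following is only a toy sketch of one way such a scheme could work, not the authors' implementation. It assumes a single linear layer with shared gene weights (kept frozen) plus one offset row per batch. "Surgery" here means appending a new row for the query batch and fitting only that row. All names (`W_gene`, `W_batch`, `add_query_batch`, `fit_query_row`), the synthetic data and the squared-error objective are hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
n_genes, n_latent, n_ref_batches = 50, 8, 3

# Stand-ins for weights learned on the reference; they are never updated below.
W_gene = rng.normal(scale=0.1, size=(n_genes, n_latent))
W_batch = rng.normal(scale=0.1, size=(n_ref_batches, n_latent))


def encode(x, batch_onehot, W_gene, W_batch):
    """Linear conditional encoder: shared gene weights plus a per-batch offset."""
    return x @ W_gene + batch_onehot @ W_batch


def add_query_batch(W_batch):
    """'Surgery' step: append one zero-initialised row for a new query batch."""
    return np.vstack([W_batch, np.zeros((1, W_batch.shape[1]))])


def fit_query_row(x_query, target, W_gene, W_batch, lr=0.1, steps=200):
    """Gradient descent on the last (query) row only; other weights stay fixed."""
    W = W_batch.copy()
    onehot = np.zeros((x_query.shape[0], W.shape[0]))
    onehot[:, -1] = 1.0  # every query cell belongs to the new batch
    for _ in range(steps):
        residual = encode(x_query, onehot, W_gene, W) - target
        # Gradient of the per-cell squared error, averaged over cells,
        # with respect to the query row.
        W[-1] -= lr * 2.0 * residual.mean(axis=0)
    return W


# Synthetic query: the same kind of cells as the reference, shifted by a batch effect.
x_clean = rng.normal(size=(100, n_genes))
gene_shift = rng.normal(size=n_genes)
x_query = x_clean + gene_shift
target = x_clean @ W_gene  # latent positions without the batch effect

W_batch_new = fit_query_row(x_query, target, W_gene, add_query_batch(W_batch))

n_trainable = n_latent
n_total = W_gene.size + W_batch_new.size
print(f"trainable parameters: {n_trainable} of {n_total}")
```

In this toy setting the fitted row cancels the batch shift in latent space while the reference rows stay bit-for-bit identical, which is why only the small set of new weights, rather than raw data or a fully retrained model, would need to be exchanged.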

          Abstract

          Single-cell data are readily integrated with cell atlases using scArches.


                Author and article information

                Contributors
                fabian.theis@helmholtz-muenchen.de
                Journal
                Nature Biotechnology (Nat Biotechnol)
                Publisher: Nature Publishing Group US (New York)
                ISSN: 1087-0156 (print), 1546-1696 (electronic)
                Published: 30 August 2021
                Issue: 2022; 40(1): 121-130
                Affiliations
                [1] GRID grid.4567.0, ISNI 0000 0004 0483 2525, Helmholtz Center Munich - German Research Center for Environmental Health, Institute of Computational Biology, Neuherberg, Germany
                [2] GRID grid.6936.a, ISNI 0000 0001 2322 2966, School of Life Sciences Weihenstephan, Technical University of Munich, Munich, Germany
                [3] GRID grid.6936.a, ISNI 0000 0001 2322 2966, Department of Computer Science, Technical University of Munich, Munich, Germany
                [4] GRID grid.47840.3f, ISNI 0000 0001 2181 7878, Center for Computational Biology, University of California, Berkeley, Berkeley, CA, USA
                [5] GRID grid.47840.3f, ISNI 0000 0001 2181 7878, Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, Berkeley, CA, USA
                [6] GRID grid.499295.a, ISNI 0000 0004 9234 0175, Chan Zuckerberg Biohub, San Francisco, CA, USA
                [7] GRID grid.461656.6, ISNI 0000 0004 0489 3491, Ragon Institute of MGH, MIT and Harvard, Cambridge, MA, USA
                [8] GRID grid.5949.1, ISNI 0000 0001 2172 9288, Institute of Medical Informatics, University of Münster, Münster, Germany
                [9] GRID grid.6936.a, ISNI 0000 0001 2322 2966, Department of Mathematics, Technical University of Munich, Munich, Germany
                [10] GRID grid.16753.36, ISNI 0000 0001 2299 3507, Division of Pulmonary and Critical Care Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
                Author information
                http://orcid.org/0000-0001-6858-7985
                http://orcid.org/0000-0003-0465-0126
                http://orcid.org/0000-0001-7464-7921
                http://orcid.org/0000-0002-6189-3792
                http://orcid.org/0000-0002-7790-8936
                http://orcid.org/0000-0001-9537-0845
                http://orcid.org/0000-0001-9004-1225
                http://orcid.org/0000-0003-2879-3789
                http://orcid.org/0000-0002-2419-1943
                Article
                Publisher ID: 1001
                DOI: 10.1038/s41587-021-01001-7
                PMC ID: 8763644
                PubMed ID: 34462589
                SO-VID: 5e4d3727-e2d3-45ea-b8d1-a9e715daf641
                © The Author(s) 2021

                Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

                History
                Received: 30 July 2020
                Accepted: 28 June 2021
                Funding
                Funded by: Deutsche Forschungsgemeinschaft (German Research Foundation), FundRef https://doi.org/10.13039/501100001659
                Award ID: ZT-I-0007
                Categories
                Analysis
                Custom metadata
                © The Author(s), under exclusive licence to Springer Nature America, Inc. 2022

                Biotechnology
                machine learning, data integration
