      Open Access

      Transfer learning compensates limited data, batch effects and technological heterogeneity in single-cell sequencing

      research-article
      NAR Genomics and Bioinformatics
      Oxford University Press

          Abstract

          Tremendous advances in next-generation sequencing technology over the past decade have enabled the accumulation of large amounts of omics data across many research areas. However, small sample sizes (especially in clinical research on rare diseases), technological heterogeneity and batch effects limit the applicability of traditional statistical and machine learning analyses. Here, we present a meta-transfer learning approach that transfers knowledge from big data and reduces the search space in datasets with small sample sizes. Few-shot learning algorithms integrate meta-learning to overcome data scarcity and data heterogeneity by transferring molecular pattern recognition models from datasets of unrelated domains. We explore few-shot learning models using large-scale public datasets, TCGA (The Cancer Genome Atlas) and GTEx (Genotype-Tissue Expression), and demonstrate their potential as pre-training datasets for other molecular pattern recognition tasks. Our results show that meta-transfer learning is very effective for datasets with a limited sample size. Furthermore, we show that our approach can transfer knowledge across technological heterogeneity, for example from bulk-cell to single-cell data. By leveraging existing bulk-cell sequencing data, our approach can overcome study size constraints, batch effects and technical limitations in the analysis of single-cell data.
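          The abstract does not specify the few-shot architecture, so the following is only a minimal Python sketch of the general idea. It assumes a prototypical-network-style classifier, which is one common few-shot method and not necessarily the one used in this article: an N-way K-shot episode is drawn from labelled expression profiles, each class is summarised by the mean of its embedded support profiles, and each query profile is assigned to the nearest prototype. All names (`sample_episode`, `embed`, `prototypes`, `classify`) and the toy three-class data are hypothetical; `embed` is an identity placeholder standing in for a network that would be meta-trained on large bulk datasets such as TCGA or GTEx.

```python
import math
import random
from collections import defaultdict


def sample_episode(data, n_way, k_shot, n_query, rng):
    """Draw one N-way K-shot episode from a dict {label: [profile, ...]}."""
    classes = rng.sample(sorted(data), n_way)
    support, query = [], []
    for cls in classes:
        # Disjoint support and query samples for this class.
        picks = rng.sample(data[cls], k_shot + n_query)
        support += [(x, cls) for x in picks[:k_shot]]
        query += [(x, cls) for x in picks[k_shot:]]
    return support, query


def embed(x):
    # Hypothetical placeholder: identity map. In meta-transfer learning this
    # would be a network pre-trained on large source data (e.g. bulk RNA-seq).
    return list(x)


def prototypes(support):
    """Mean embedded profile (prototype) for each class in the support set."""
    totals, counts = {}, defaultdict(int)
    for x, cls in support:
        z = embed(x)
        total = totals.setdefault(cls, [0.0] * len(z))
        for i, v in enumerate(z):
            total[i] += v
        counts[cls] += 1
    return {cls: [s / counts[cls] for s in total] for cls, total in totals.items()}


def classify(x, protos):
    """Assign x to the class whose prototype is nearest in Euclidean distance."""
    z = embed(x)
    return min(protos, key=lambda cls: math.dist(z, protos[cls]))


# Toy usage: three well-separated hypothetical "cell types" in 3 dimensions.
rng = random.Random(0)
centres = {"type_A": (0.0, 0.0, 0.0), "type_B": (5.0, 5.0, 5.0), "type_C": (0.0, 5.0, 0.0)}
toy = {c: [[m + rng.gauss(0, 0.1) for m in mu] for _ in range(20)]
       for c, mu in centres.items()}
support, query = sample_episode(toy, n_way=3, k_shot=5, n_query=5, rng=rng)
protos = prototypes(support)
accuracy = sum(classify(x, protos) == y for x, y in query) / len(query)
print(f"query accuracy: {accuracy:.2f}")
```

          In a meta-transfer setting, many such episodes sampled from the large source data would be used to train `embed`; at transfer time, only the few labelled target samples (for example, single-cell profiles) would be needed to form the prototypes, which is why this family of methods is attractive when sample sizes are small.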


                Author and article information

                Journal
                NAR Genomics and Bioinformatics (NAR Genom Bioinform)
                Oxford University Press
                ISSN: 2631-9268
                Volume 3, Issue 4, December 2021, lqab104
                Published online 12 November 2021
                Affiliations
                Data Science in Biomedicine, Faculty of Mathematics and Computer Science, Philipps-University of Marburg, Marburg 35039, Germany
                Author notes
                To whom correspondence should be addressed. Tel: +49 6421 2821579; Email: dominik.heider@uni-marburg.de
                Correspondence may also be addressed to Anne-Christin Hauschild. Email: hauschild@uni-marburg.de
                Author information
                https://orcid.org/0000-0002-3108-8311
                Article
                DOI: 10.1093/nargab/lqab104
                8598306
                © The Author(s) 2021. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com

                History
                31 July 2021
                07 October 2021
                18 October 2021
                Page count
                Pages: 9
                Funding
                Funded by: EU Framework Programme for Research and Innovation H2020, DOI 10.13039/100010661;
                Award ID: 826078
                Categories
                AcademicSubjects/SCI00030
                AcademicSubjects/SCI00980
                AcademicSubjects/SCI01060
                AcademicSubjects/SCI01140
                AcademicSubjects/SCI01180
                Standard Article
