0
views
0
recommends
+1 Recommend
0 collections
    0
    shares
      • Record: found
      • Abstract: found
      • Article: found
      Is Open Access

      A statistical learning method for simultaneous copy number estimation and subclone clustering with single-cell sequencing data

      research-article

      Read this article at

      Bookmark
          There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

          Abstract

          The availability of single-cell sequencing (SCS) enables us to assess intra-tumor heterogeneity and identify cellular subclones without the confounding effect of mixed cells. Copy number aberrations (CNAs) have been commonly used to identify subclones in SCS data using various clustering methods, as cells comprising a subpopulation are found to share a genetic profile. However, currently available methods may generate spurious results (e.g., falsely identified variants) in the procedure of CNA detection, thereby diminishing the accuracy of subclone identification within a large, complex cell population. In this study, we developed a subclone clustering method based on a fused lasso model, referred to as FLCNA, which can simultaneously detect CNAs in single-cell DNA sequencing (scDNA-seq) data. Spike-in simulations were conducted to evaluate the clustering and CNA detection performance of FLCNA, benchmarking it against existing copy number estimation methods (SCOPE, HMMcopy) in combination with commonly used clustering methods. Application of FLCNA to a scDNA-seq data set of breast cancer revealed different genomic variation patterns in neoadjuvant chemotherapy-treated samples and pretreated samples. We show that FLCNA is a practical and powerful method for subclone identification and CNA detection with scDNA-seq data.

          Related collections

          Most cited references58

          • Record: found
          • Abstract: found
          • Article: not found

          Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles

          Although genomewide RNA expression analysis has become a routine tool in biomedical research, extracting biological insight from such information remains a major challenge. Here, we describe a powerful analytical method called Gene Set Enrichment Analysis (GSEA) for interpreting gene expression data. The method derives its power by focusing on gene sets, that is, groups of genes that share common biological function, chromosomal location, or regulation. We demonstrate how GSEA yields insights into several cancer-related data sets, including leukemia and lung cancer. Notably, where single-gene analysis finds little similarity between two independent studies of patient survival in lung cancer, GSEA reveals many biological pathways in common. The GSEA method is embodied in a freely available software package, together with an initial database of 1,325 biologically defined gene sets.
            Bookmark
            • Record: found
            • Abstract: found
            • Article: not found

            Cytoscape: a software environment for integrated models of biomolecular interaction networks.

            Cytoscape is an open source software project for integrating biomolecular interaction networks with high-throughput expression data and other molecular states into a unified conceptual framework. Although applicable to any system of molecular components and interactions, Cytoscape is most powerful when used in conjunction with large databases of protein-protein, protein-DNA, and genetic interactions that are increasingly available for humans and model organisms. Cytoscape's software Core provides basic functionality to layout and query the network; to visually integrate the network with expression profiles, phenotypes, and other molecular states; and to link the network to databases of functional annotations. The Core is extensible through a straightforward plug-in architecture, allowing rapid development of additional computational analyses and features. Several case studies of Cytoscape plug-ins are surveyed, including a search for interaction pathways correlating with changes in gene expression, a study of protein complexes involved in cellular recovery to DNA damage, inference of a combined physical/functional interaction network for Halobacterium, and an interface to detailed stochastic/kinetic gene regulatory models.
              Bookmark
              • Record: found
              • Abstract: found
              • Article: found
              Is Open Access

              KEGG as a reference resource for gene and protein annotation

              KEGG (http://www.kegg.jp/ or http://www.genome.jp/kegg/) is an integrated database resource for biological interpretation of genome sequences and other high-throughput data. Molecular functions of genes and proteins are associated with ortholog groups and stored in the KEGG Orthology (KO) database. The KEGG pathway maps, BRITE hierarchies and KEGG modules are developed as networks of KO nodes, representing high-level functions of the cell and the organism. Currently, more than 4000 complete genomes are annotated with KOs in the KEGG GENES database, which can be used as a reference data set for KO assignment and subsequent reconstruction of KEGG pathways and other molecular networks. As an annotation resource, the following improvements have been made. First, each KO record is re-examined and associated with protein sequence data used in experiments of functional characterization. Second, the GENES database now includes viruses, plasmids, and the addendum category for functionally characterized proteins that are not represented in complete genomes. Third, new automatic annotation servers, BlastKOALA and GhostKOALA, are made available utilizing the non-redundant pangenome data set generated from the GENES database. As a resource for translational bioinformatics, various data sets are created for antimicrobial resistance and drug interaction networks.
                Bookmark

                Author and article information

                Journal
                Genome Res
                Genome Res
                genome
                GENOME
                Genome Research
                Cold Spring Harbor Laboratory Press
                1088-9051
                1549-5469
                January 2024
                : 34
                : 1
                : 85-93
                Affiliations
                [1 ]Department of Epidemiology and Biostatistics, Arnold School of Public Health, University of South Carolina, Columbia, South Carolina 29208, USA;
                [2 ]Department of Environmental Health Science, Arnold School of Public Health, University of South Carolina, Columbia, South Carolina 29208, USA;
                [3 ]Department of Quantitative Sciences, Baylor College of Medicine, Houston, Texas 77030, USA;
                [4 ]Department of Biostatistics, College of Public Health and Health Professions and College of Medicine, University of Florida, Gainesville, Florida 32603, USA
                Author notes
                Corresponding author: feifeixiao@ 123456ufl.edu
                Author information
                http://orcid.org/0000-0003-3678-2879
                http://orcid.org/0000-0002-8540-7023
                http://orcid.org/0000-0002-1597-4719
                Article
                9509184
                10.1101/gr.278098.123
                10903939
                38290978
                3e61c550-bc05-496a-afa5-4ba5cf6cf9ca
                © 2024 Qin et al.; Published by Cold Spring Harbor Laboratory Press

                This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see https://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.

                History
                : 15 May 2023
                : 8 January 2024
                Page count
                Pages: 9
                Funding
                Funded by: National Institutes of Health , doi 10.13039/100000002;
                Award ID: R21 HG010925
                Categories
                Methods

                Comments

                Comment on this article