
      XOmiVAE: an interpretable deep learning model for cancer classification using high-dimensional omics data

      Research article


          Abstract

          The lack of explainability is one of the most prominent disadvantages of deep learning applications in omics. This ‘black box’ problem can undermine the credibility and limit the practical implementation of biomedical deep learning models. Here we present XOmiVAE, a variational autoencoder (VAE)-based interpretable deep learning model for cancer classification using high-dimensional omics data. XOmiVAE can reveal the contribution of each gene and each latent dimension to each classification prediction, as well as the correlation between each gene and each latent dimension. We also demonstrate that XOmiVAE can explain not only the supervised classification results but also the unsupervised clustering results of the deep learning network. To the best of our knowledge, XOmiVAE is one of the first activation-level-based interpretable deep learning models to explain novel clusters generated by a VAE. The explanations generated by XOmiVAE were validated against both downstream-task performance and existing biomedical knowledge. In our experiments, XOmiVAE's explanations of deep learning-based cancer classification and clustering aligned with current domain knowledge, including biological annotations and the academic literature, which shows great potential for novel biomedical knowledge discovery from deep learning models.
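          The abstract describes three kinds of explanation: how much each gene contributes to a prediction, how much each latent dimension contributes, and how each gene relates to each latent dimension. The abstract does not specify the attribution method XOmiVAE uses, so the sketch below is not the authors' method. It swaps in integrated gradients, a different and well-known attribution technique, and applies it to a hypothetical toy model: a one-layer tanh encoder that produces latent means, followed by a linear classification head, with random weights standing in for a trained network. There is no decoder or KL term, so this is not a VAE. It also assumes that "correlation" means Pearson correlation across samples. All function names (`encode`, `gene_attributions`, `latent_attributions`, `gene_latent_correlation`) are illustrative only.

```python
import numpy as np

# Hypothetical toy model (NOT the XOmiVAE architecture): a one-layer encoder
# maps G gene-expression values to K latent means, and a linear head maps the
# latent means to C class logits. Random weights stand in for trained ones.
rng = np.random.default_rng(0)
G, K, C = 6, 3, 2
W1, b1 = rng.normal(size=(K, G)), rng.normal(size=K)
W2, b2 = rng.normal(size=(C, K)), rng.normal(size=C)


def encode(x):
    """Latent mean for one sample: tanh(W1 x + b1)."""
    return np.tanh(W1 @ x + b1)


def logits(x):
    """Class logits computed from the latent means."""
    return W2 @ encode(x) + b2


def grad_logit(x, c):
    """Analytic gradient of logit c with respect to the gene vector x."""
    h = encode(x)
    return W1.T @ ((1.0 - h**2) * W2[c])


def gene_attributions(x, c, baseline, steps=200):
    """Integrated gradients (midpoint Riemann sum) along baseline -> x."""
    alphas = (np.arange(steps) + 0.5) / steps
    grads = np.stack([grad_logit(baseline + a * (x - baseline), c) for a in alphas])
    return (x - baseline) * grads.mean(axis=0)


def latent_attributions(x, c, baseline):
    """Exact per-dimension split, valid because the head is linear in h."""
    return W2[c] * (encode(x) - encode(baseline))


def gene_latent_correlation(X):
    """Pearson r between each gene (column of X) and each latent mean."""
    H = np.stack([encode(x) for x in X])
    return np.corrcoef(X.T, H.T)[:G, G:]  # shape (G, K)


X = rng.normal(size=(50, G))        # 50 synthetic samples, not real omics data
x, baseline = X[0], np.zeros(G)     # explain sample 0 against an all-zero baseline
c = int(np.argmax(logits(x)))       # predicted class for sample 0
gene_attr = gene_attributions(x, c, baseline)
latent_attr = latent_attributions(x, c, baseline)
corr = gene_latent_correlation(X)
```

          Because the toy head is linear in the latent means, the per-dimension scores sum exactly to the change in the predicted logit relative to the baseline. The gene-level integrated-gradient scores sum to the same change only up to the Riemann-sum error, which shrinks as `steps` grows.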


                Author and article information

                Journal: Briefings in Bioinformatics (Brief Bioinform), Oxford University Press
                ISSN: 1467-5463 (print); 1477-4054 (online)
                Issue date: November 2021
                Published online: 17 August 2021
                Volume 22, Issue 6, article bbab315
                Affiliations
                Data Science Institute, Imperial College London, SW7 2AZ London, UK
                Department of Health Informatics, University College London, WC1E 6BT London, UK
                Department of Computer Science, Hong Kong Baptist University, Hong Kong, China
                Author notes
                Corresponding author: Xiaoyu Zhang, Data Science Institute, Imperial College London, SW7 2AZ London, UK. Tel: +44 0207 594 8630; Fax: +44 0207 594 8630; E-mail: x.zhang18@imperial.ac.uk

                Eloise Withnell and Xiaoyu Zhang contributed equally to this paper.

                Article ID: bbab315
                DOI: 10.1093/bib/bbab315
                PMCID: PMC8575033
                PMID: 34402865
                © The Author(s) 2021. Published by Oxford University Press.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

                History: received 26 May 2021; revised 4 July 2021; accepted 20 July 2021
                Page count: 11
                Funding: European Union’s Horizon 2020 Research and Innovation Programme (award ID 764281)
                Categories: Problem Solving Protocol; AcademicSubjects/SCI01060; Bioinformatics & Computational biology
                Keywords: explainable artificial intelligence, deep learning, cancer classification, omics data, gene expression
