
      XOmiVAE: an interpretable deep learning model for cancer classification using high-dimensional omics data


          Abstract

          The lack of explainability is one of the most prominent disadvantages of deep learning applications in omics. This ‘black box’ problem can undermine the credibility and limit the practical implementation of biomedical deep learning models. Here we present XOmiVAE, a variational autoencoder (VAE)-based interpretable deep learning model for cancer classification using high-dimensional omics data. XOmiVAE is capable of revealing the contribution of each gene and latent dimension to each classification prediction, as well as the correlation between each gene and each latent dimension. We also demonstrate that XOmiVAE can explain not only the supervised classification but also the unsupervised clustering results of the deep learning network. To the best of our knowledge, XOmiVAE is one of the first activation level-based interpretable deep learning models to explain novel clusters generated by a VAE. The explanations generated by XOmiVAE were validated by both the performance of downstream tasks and biomedical domain knowledge. In our experiments, XOmiVAE explanations of deep learning-based cancer classification and clustering aligned with current domain knowledge, including biological annotation and the academic literature, which shows great potential for novel biomedical knowledge discovery from deep learning models.
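
The abstract describes a VAE whose latent space both reconstructs the omics input and feeds a classification head, with per-gene and per-dimension contributions extracted afterwards. The minimal PyTorch sketch below shows that general architecture; the layer sizes, names and the simple gradient-based saliency used for attribution are illustrative assumptions, not the authors' implementation (XOmiVAE's own attribution procedure is described in the article itself).

# A minimal sketch of a VAE-style omics classifier with per-gene attribution.
# All sizes and the attribution method are illustrative assumptions.
import torch
import torch.nn as nn

class OmicsVAEClassifier(nn.Module):
    def __init__(self, n_genes=1000, latent_dim=32, n_classes=10):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_genes, 256), nn.ReLU())
        self.fc_mu = nn.Linear(256, latent_dim)       # latent mean
        self.fc_logvar = nn.Linear(256, latent_dim)   # latent log-variance
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                     nn.Linear(256, n_genes))
        self.classifier = nn.Linear(latent_dim, n_classes)  # tumour-type head on the latent space

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterisation trick
        return self.decoder(z), mu, logvar, self.classifier(mu)

def gene_saliency(model, x, target_class):
    """Gradient of the target-class logit w.r.t. each input gene: a simple
    gradient-based stand-in for the article's attribution approach."""
    x = x.clone().requires_grad_(True)
    _, _, _, logits = model(x)
    logits[:, target_class].sum().backward()
    return x.grad  # shape (batch, n_genes); larger magnitude = larger contribution

model = OmicsVAEClassifier()
expr = torch.rand(4, 1000)                     # toy gene-expression batch
contrib = gene_saliency(model, expr, target_class=3)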

          Related collections

          Most cited references (50)


          Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2

          In comparative high-throughput sequencing assays, a fundamental task is the analysis of count data, such as read counts per gene in RNA-seq, for evidence of systematic changes across experimental conditions. Small replicate numbers, discreteness, large dynamic range and the presence of outliers require a suitable statistical approach. We present DESeq2, a method for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates. This enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression. The DESeq2 package is available at http://www.bioconductor.org/packages/release/bioc/html/DESeq2.html (doi:10.1186/s13059-014-0550-8).
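
As a rough intuition for the shrinkage idea mentioned above (not the DESeq2 algorithm itself, which fits negative binomial generalized linear models with empirically estimated priors), the toy NumPy sketch below pulls noisy per-gene log2 fold-change estimates toward zero in proportion to their uncertainty; all numbers are simulated.

# Toy illustration of fold-change shrinkage: noisy per-gene estimates are pulled
# toward a zero-centred prior, more strongly when the estimate is uncertain.
# This is a conceptual sketch only, NOT the DESeq2 procedure.
import numpy as np

rng = np.random.default_rng(0)
true_lfc = rng.normal(0.0, 0.5, size=200)         # true log2 fold changes
se = rng.uniform(0.1, 2.0, size=200)              # per-gene standard errors
observed_lfc = true_lfc + rng.normal(0.0, se)     # noisy per-gene estimates

prior_var = 0.5 ** 2                              # assumed zero-centred normal prior
shrunk_lfc = observed_lfc * prior_var / (prior_var + se ** 2)

# Genes with large standard errors are shrunk strongly toward 0, so ranking by
# shrunk_lfc emphasises effect strength rather than mere statistical presence.
print(np.abs(shrunk_lfc).max() <= np.abs(observed_lfc).max())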

            Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles

            Although genomewide RNA expression analysis has become a routine tool in biomedical research, extracting biological insight from such information remains a major challenge. Here, we describe a powerful analytical method called Gene Set Enrichment Analysis (GSEA) for interpreting gene expression data. The method derives its power by focusing on gene sets, that is, groups of genes that share common biological function, chromosomal location, or regulation. We demonstrate how GSEA yields insights into several cancer-related data sets, including leukemia and lung cancer. Notably, where single-gene analysis finds little similarity between two independent studies of patient survival in lung cancer, GSEA reveals many biological pathways in common. The GSEA method is embodied in a freely available software package, together with an initial database of 1,325 biologically defined gene sets.
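
The core of the method is a running-sum statistic over a ranked gene list. The simplified Python sketch below computes an unweighted enrichment score of that kind; the correlation-based weighting, permutation-based significance testing and normalisation used by the published GSEA procedure are omitted.

# Simplified GSEA-style enrichment score: walk down a ranked gene list, step up
# when a gene belongs to the set, step down otherwise, and report the maximum
# deviation of the running sum from zero.
def enrichment_score(ranked_genes, gene_set):
    gene_set = set(gene_set)
    n = len(ranked_genes)
    n_hits = sum(g in gene_set for g in ranked_genes)
    if n_hits == 0 or n_hits == n:
        return 0.0
    hit_step = 1.0 / n_hits          # increment for genes in the set
    miss_step = 1.0 / (n - n_hits)   # decrement for genes outside the set
    running, best = 0.0, 0.0
    for g in ranked_genes:
        running += hit_step if g in gene_set else -miss_step
        if abs(running) > abs(best):
            best = running
    return best

# Example: genes ranked by correlation with a phenotype, most correlated first.
ranking = ["TP53", "EGFR", "MYC", "GAPDH", "ACTB", "BRCA1", "KRAS", "ALB"]
print(enrichment_score(ranking, {"TP53", "MYC", "KRAS"}))  # positive: set enriched near the top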

              Deep learning.

              Deep learning allows computational models that are composed of multiple processing layers to learn representations of data with multiple levels of abstraction. These methods have dramatically improved the state-of-the-art in speech recognition, visual object recognition, object detection and many other domains such as drug discovery and genomics. Deep learning discovers intricate structure in large data sets by using the backpropagation algorithm to indicate how a machine should change its internal parameters that are used to compute the representation in each layer from the representation in the previous layer. Deep convolutional nets have brought about breakthroughs in processing images, video, speech and audio, whereas recurrent nets have shone light on sequential data such as text and speech.
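
As a concrete, minimal illustration of the mechanism described here (multiple processing layers whose internal parameters are adjusted by backpropagation), the PyTorch sketch below trains a small multi-layer network on random data; the task, sizes and optimiser settings are arbitrary.

# Minimal multi-layer model trained with backpropagation on a toy task.
import torch
import torch.nn as nn

model = nn.Sequential(            # stacked learned representations of the input
    nn.Linear(20, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 3),             # class scores
)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(128, 20)                  # toy inputs
y = torch.randint(0, 3, (128,))           # toy labels

for step in range(100):
    loss = loss_fn(model(x), y)
    opt.zero_grad()
    loss.backward()                       # backpropagation: gradient of the loss w.r.t. every layer's parameters
    opt.step()                            # each layer updates its parameters using those gradients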

                Author and article information

                Journal
                Briefings in Bioinformatics
                Oxford University Press (OUP)
                ISSN (print): 1467-5463
                ISSN (electronic): 1477-4054
                November 2021
                November 05 2021
                August 17 2021
                Volume: 22
                Issue: 6
                Affiliations
                [1] Data Science Institute, Imperial College London, SW7 2AZ London, UK
                [2] Department of Health Informatics, University College London, WC1E 6BT London, UK
                [3] Department of Computer Science, Hong Kong Baptist University, Hong Kong, China
                Article
                10.1093/bib/bbab315
                a5707857-0d09-4c2f-b2d7-70ff27aac811
                © 2021

                https://creativecommons.org/licenses/by/4.0/
