Joint analysis of expression levels and histological images identifies genes associated with tissue morphology

Abstract

Histopathological images are used to characterize complex phenotypes such as tumor stage. Our goal is to associate features of stained tissue images with high-dimensional genomic markers. We use convolutional autoencoders and sparse canonical correlation analysis (CCA) on paired histological images and bulk gene expression to identify subsets of genes whose expression levels in a tissue sample correlate with subsets of morphological features from the corresponding sample image. We apply our approach, ImageCCA, to two TCGA data sets, and find gene sets associated with the structure of the extracellular matrix and cell wall infrastructure, implicating uncharacterized genes in extracellular processes. We find sets of genes associated with specific cell types, including neuronal cells and cells of the immune system. We apply ImageCCA to the GTEx v6 data, and find image features that capture population variation in thyroid and in colon tissues associated with genetic variants (image morphology QTLs, or imQTLs), suggesting that genetic variation regulates population variation in tissue morphological traits.

Abstract

Image features from histological slides can be used as informative endophenotypes in association studies for tissue-localized pathologies. Here, the authors develop ImageCCA, a framework for joint analysis of paired gene expression and histology data derived from automatically extracted image features.
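The sparse CCA step named in the abstract can be illustrated with a small sketch. This is not the authors' ImageCCA implementation. It is a minimal rank-one sparse CCA in the style of a penalized matrix decomposition, assuming that image features (for example, autoencoder embeddings) and gene expression have already been arranged as two matrices with one row per sample. The function names, the `frac_u` and `frac_v` sparsity parameters, and the rule that sets each threshold to a fraction of the largest absolute entry are assumptions made for illustration; a full implementation would typically tune an L1 constraint instead.

```python
import numpy as np


def soft_threshold(a, lam):
    """Shrink each entry of `a` toward zero by `lam`; small entries become exactly zero."""
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)


def sparse_cca_rank1(X, Y, frac_u=0.5, frac_v=0.5, n_iter=100, tol=1e-8):
    """Return sparse weight vectors (u, v) and the correlation of X @ u with Y @ v.

    X: samples x image features, Y: samples x genes (hypothetical layout).
    frac_u and frac_v must lie in [0, 1); larger values give sparser weights.
    Columns are assumed to be non-constant so they can be standardized.
    """
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    Y = (Y - Y.mean(axis=0)) / Y.std(axis=0)
    C = X.T @ Y  # cross-covariance between the two views, up to a constant

    # Start from the leading right singular vector of the cross-covariance.
    _, _, vt = np.linalg.svd(C, full_matrices=False)
    v = vt[0]
    u = np.zeros(C.shape[0])

    for _ in range(n_iter):
        a = C @ v
        u_new = soft_threshold(a, frac_u * np.abs(a).max())
        u_new /= np.linalg.norm(u_new)

        b = C.T @ u_new
        v_new = soft_threshold(b, frac_v * np.abs(b).max())
        v_new /= np.linalg.norm(v_new)

        change = np.linalg.norm(u_new - u) + np.linalg.norm(v_new - v)
        u, v = u_new, v_new
        if change < tol:
            break

    corr = np.corrcoef(X @ u, Y @ v)[0, 1]
    return u, v, corr


# Synthetic example: one shared latent factor drives the first 5 image
# features and the first 4 genes; every other column is pure noise.
rng = np.random.default_rng(0)
n_samples = 200
latent = rng.normal(size=n_samples)
image_features = rng.normal(size=(n_samples, 50))
expression = rng.normal(size=(n_samples, 30))
image_features[:, :5] += 2.0 * latent[:, None]
expression[:, :4] += 2.0 * latent[:, None]

u, v, corr = sparse_cca_rank1(image_features, expression)
print("selected image features:", np.flatnonzero(u))
print("selected genes:", np.flatnonzero(v))
print("canonical correlation:", round(float(corr), 3))
```

The nonzero entries of `u` and `v` play the role of the paired subsets of morphological features and genes that the abstract describes. Further components could be obtained by deflating the cross-covariance matrix and repeating the procedure, which this sketch omits.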

Author and article information

Contributors

Barbara E. Engelhardt:

ORCID: http://orcid.org/0000-0002-6139-7334

bee@princeton.edu

Journal

Journal ID (nlm-ta): Nat Commun

Journal ID (iso-abbrev): Nat Commun

Title: Nature Communications

Publisher: Nature Publishing Group UK (London)

ISSN (Electronic): 2041-1723

Publication date (Electronic): 11 March 2021

Publication date PMC-release: 11 March 2021

Publication date Collection: 2021

Volume: 12

Electronic Location Identifier: 1609

Affiliations

[1] GRID grid.16750.35, ISNI 0000 0001 2097 5006, Department of Computer Science, Princeton University, Princeton, NJ, USA

[2] GRID grid.16750.35, ISNI 0000 0001 2097 5006, Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA

[3] GRID grid.16750.35, ISNI 0000 0001 2097 5006, Center for Statistics and Machine Learning, Princeton University, Princeton, NJ, USA

Author information

Daniel Munro http://orcid.org/0000-0002-0724-218X

Barbara E. Engelhardt http://orcid.org/0000-0002-6139-7334

Article

Publisher ID: 21727

DOI: 10.1038/s41467-021-21727-x

PMC ID: 7952575

PubMed ID: 33707455

SO-VID: 1863a584-e8f3-4da9-8546-56fb41f9b85a

License:

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

History

Date received: 21 August 2017

Date accepted: 5 February 2021

Funding

Funded by: FundRef https://doi.org/10.13039/100000050, U.S. Department of Health & Human Services | NIH | National Heart, Lung, and Blood Institute (NHLBI)

Award ID: HL133218

Award Recipient: Barbara E. Engelhardt

Funded by: FundRef https://doi.org/10.13039/100000879, Alfred P. Sloan Foundation

Funded by: FundRef https://doi.org/10.13039/100000001, National Science Foundation (NSF)

Award ID: AWD1005627

Award Recipient: Barbara E. Engelhardt

Custom metadata

ScienceOpen disciplines: Uncategorized

Keywords: image processing,machine learning,transcriptomics

