
      Joint analysis of expression levels and histological images identifies genes associated with tissue morphology

      research-article


          Abstract

          Histopathological images are used to characterize complex phenotypes such as tumor stage. Our goal is to associate features of stained tissue images with high-dimensional genomic markers. We use convolutional autoencoders and sparse canonical correlation analysis (CCA) on paired histological images and bulk gene expression to identify subsets of genes whose expression levels in a tissue sample correlate with subsets of morphological features from the corresponding sample image. We apply our approach, ImageCCA, to two TCGA data sets, and find gene sets associated with the structure of the extracellular matrix and cell wall infrastructure, implicating uncharacterized genes in extracellular processes. We find sets of genes associated with specific cell types, including neuronal cells and cells of the immune system. We apply ImageCCA to the GTEx v6 data, and find image features that capture population variation in thyroid and in colon tissues associated with genetic variants (image morphology QTLs, or imQTLs), suggesting that genetic variation regulates population variation in tissue morphological traits.
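The sparse CCA step described in the abstract can be illustrated with a minimal sketch. This is not the authors' ImageCCA implementation: it is a simplified rank-1 sparse CCA in the style of a penalized matrix decomposition, in which soft-thresholding at a fixed fraction of the largest loading stands in for a proper L1 constraint. The function name `sparse_cca_rank1`, the parameters `frac_x` and `frac_y`, and the synthetic data are all assumptions made for illustration. Here `X` plays the role of image features (for example, autoencoder embeddings) and `Y` the role of gene expression.

```python
import numpy as np


def soft_threshold(a, lam):
    """Shrink each entry of `a` toward zero by `lam` (entries below `lam` become 0)."""
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)


def _unit(a):
    """Scale `a` to unit L2 norm, leaving an all-zero vector unchanged."""
    norm = np.linalg.norm(a)
    return a / norm if norm > 0 else a


def sparse_cca_rank1(X, Y, frac_x=0.5, frac_y=0.5, n_iter=100, tol=1e-8):
    """Hypothetical helper: one pair of sparse canonical vectors (u, v).

    Assumes no column of X or Y is constant. The threshold is a fixed
    fraction of the largest absolute loading, a simplification of the
    L1-constrained formulation used in the sparse CCA literature.
    """
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    Y = (Y - Y.mean(axis=0)) / Y.std(axis=0)
    C = X.T @ Y  # cross-covariance (up to scaling), shape (p, q)

    # Initialise v with the leading right singular vector of C.
    _, _, Vt = np.linalg.svd(C, full_matrices=False)
    v = Vt[0]
    u = np.zeros(C.shape[0])

    for _ in range(n_iter):
        a = C @ v
        u_new = _unit(soft_threshold(a, frac_x * np.abs(a).max()))
        b = C.T @ u_new
        v_new = _unit(soft_threshold(b, frac_y * np.abs(b).max()))
        converged = np.linalg.norm(u_new - u) + np.linalg.norm(v_new - v) < tol
        u, v = u_new, v_new
        if converged:
            break
    return u, v


# Synthetic example: the first 5 columns of X and of Y share one latent factor.
rng = np.random.default_rng(0)
n = 200
z = rng.normal(size=n)
X = rng.normal(size=(n, 50))
X[:, :5] += 2 * z[:, None]
Y = rng.normal(size=(n, 30))
Y[:, :5] += 2 * z[:, None]

u, v = sparse_cca_rank1(X, Y)
print("selected image features:", np.flatnonzero(u))
print("selected genes:", np.flatnonzero(v))
```

The nonzero entries of `u` and `v` are the selected subsets: image features and genes whose weighted combinations co-vary across samples.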

          Abstract

          Image features from histological slides can be used as informative endophenotypes in association studies for tissue-localized pathologies. Here, the authors develop ImageCCA, a framework for joint analysis of paired gene expression and histology data derived from automatically extracted image features.
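To make the idea of using an image feature as a trait in an association study concrete, here is a minimal sketch of a single variant-versus-feature test. It is an assumption-laden illustration, not the pipeline used in the article: it uses a plain permutation test on the Pearson correlation between genotype dosage (0, 1, or 2 copies of an allele) and one image feature, with no covariates or multiple-testing correction. The function name `permutation_assoc` and the simulated data are hypothetical.

```python
import numpy as np


def permutation_assoc(genotype, feature, n_perm=2000, seed=0):
    """Hypothetical helper: correlation between dosage and an image feature,
    with a two-sided permutation p-value."""
    rng = np.random.default_rng(seed)
    g = np.asarray(genotype, dtype=float)
    f = np.asarray(feature, dtype=float)
    g = g - g.mean()
    f = f - f.mean()
    denom = np.linalg.norm(g) * np.linalg.norm(f)
    if denom == 0:
        raise ValueError("genotype and feature must both vary across samples")

    r_obs = float(g @ f / denom)
    # Null distribution: shuffle the feature to break any link with genotype.
    r_perm = np.array([g @ rng.permutation(f) for _ in range(n_perm)]) / denom
    # Add-one correction keeps the p-value strictly positive.
    p_value = (1 + np.sum(np.abs(r_perm) >= abs(r_obs))) / (n_perm + 1)
    return r_obs, float(p_value)


# Simulated example: the feature increases with each copy of the allele.
rng = np.random.default_rng(1)
dosage = rng.integers(0, 3, size=300)
image_feature = 0.5 * dosage + rng.normal(size=300)

r, p = permutation_assoc(dosage, image_feature)
print(f"r = {r:.3f}, permutation p = {p:.4f}")
```

In a real analysis this test would be repeated across many variants and features, so a correction for multiple testing would be needed.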


                Author and article information

                Contributors
                bee@princeton.edu
                Journal
                Nat Commun
                Nature Communications
                Nature Publishing Group UK (London)
                2041-1723
                11 March 2021
                2021
                Volume: 12
                Article number: 1609
                Affiliations
                [1] Department of Computer Science, Princeton University, Princeton, NJ, USA (GRID grid.16750.35; ISNI 0000 0001 2097 5006)
                [2] Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA (GRID grid.16750.35; ISNI 0000 0001 2097 5006)
                [3] Center for Statistics and Machine Learning, Princeton University, Princeton, NJ, USA (GRID grid.16750.35; ISNI 0000 0001 2097 5006)
                Author information
                http://orcid.org/0000-0002-0724-218X
                http://orcid.org/0000-0002-6139-7334
                Article
                21727
                10.1038/s41467-021-21727-x
                7952575
                33707455
                1863a584-e8f3-4da9-8546-56fb41f9b85a
                © The Author(s) 2021

                Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

                History
                Received: 21 August 2017
                Accepted: 5 February 2021
                Funding
                Funded by: FundRef https://doi.org/10.13039/100000050, U.S. Department of Health & Human Services | NIH | National Heart, Lung, and Blood Institute (NHLBI);
                Award ID: HL133218
                Funded by: FundRef https://doi.org/10.13039/100000879, Alfred P. Sloan Foundation;
                Funded by: FundRef https://doi.org/10.13039/100000001, National Science Foundation (NSF);
                Award ID: AWD1005627
                Categories
                Article
                Uncategorized
                image processing, machine learning, transcriptomics
