      Discovering epistatic feature interactions from neural network models of regulatory DNA sequences

      research-article

          Abstract

          Motivation

          Transcription factors bind regulatory DNA sequences in a combinatorial manner to modulate gene expression. Deep neural networks (DNNs) can learn the cis-regulatory grammars encoded in regulatory DNA sequences associated with transcription factor binding and chromatin accessibility. Several feature attribution methods have been developed for estimating the predictive importance of individual features (nucleotides or motifs) in any input DNA sequence to its associated output prediction from a DNN model. However, these methods do not reveal higher-order feature interactions encoded by the models.
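As a rough illustration of what a per-nucleotide importance score looks like, the sketch below uses in silico mutagenesis, one common attribution approach, on a toy scoring function. The function `toy_model`, the motif strings, and the example sequence are hypothetical stand-ins rather than anything taken from the article, and a real analysis would query a trained DNN instead.

```python
BASES = "ACGT"


def toy_model(seq):
    """Hypothetical stand-in for a trained model: each motif present adds 1."""
    score = 0.0
    if "GATA" in seq:       # illustrative GATA-like motif
        score += 1.0
    if "CAGCTG" in seq:     # illustrative E-box-like motif
        score += 1.0
    return score


def ism_importance(model, seq):
    """Importance of each position: mean drop in model output over all
    single-base substitutions at that position."""
    ref = model(seq)
    scores = []
    for i, base in enumerate(seq):
        drops = [
            ref - model(seq[:i] + alt + seq[i + 1:])
            for alt in BASES
            if alt != base
        ]
        scores.append(sum(drops) / len(drops))
    return scores


seq = "TTGATATTTTCAGCTGTT"
importance = ism_importance(toy_model, seq)
for position, (base, value) in enumerate(zip(seq, importance)):
    print(position, base, value)
```

Scores of this kind rank positions one at a time. On their own they cannot say whether the importance of one motif depends on the presence of another, which is the limitation the abstract points to.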

          Results

          We present a new method called Deep Feature Interaction Maps (DFIM) to efficiently estimate interactions between all pairs of features in any input DNA sequence. DFIM accurately identifies ground truth motif interactions embedded in simulated regulatory DNA sequences. DFIM identifies synergistic interactions between GATA1 and TAL1 motifs from in vivo TF binding models. DFIM reveals epistatic interactions involving nucleotides flanking the core motif of the Cbf1 TF in yeast from in vitro TF binding models. We also apply DFIM to regulatory sequence models of in vivo chromatin accessibility to reveal interactions between regulatory genetic variants and proximal motifs of target TFs as validated by TF binding quantitative trait loci. Our approach makes significant strides in improving the interpretability of deep learning models for genomics.
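To make the idea of a pairwise feature interaction concrete, here is a minimal sketch based on the classic double-perturbation definition of epistasis: a pair interacts when the effect of removing both features differs from the sum of the effects of removing each alone. This is a generic technique chosen for illustration and is not the DFIM scoring procedure, which is implemented in the linked repository. The names `toy_model`, `mask`, and `pairwise_interactions`, the motif strings, and the feature coordinates are all assumptions made for this example.

```python
from itertools import combinations


def toy_model(seq):
    """Hypothetical scorer: each motif adds 1, and co-occurrence of both
    motifs adds a synergy bonus of 2."""
    has_gata = "GATA" in seq
    has_ebox = "CAGCTG" in seq
    return 1.0 * has_gata + 1.0 * has_ebox + 2.0 * (has_gata and has_ebox)


def mask(seq, spans):
    """Replace each half-open (start, end) span with 'N' to remove a feature."""
    chars = list(seq)
    for start, end in spans:
        chars[start:end] = "N" * (end - start)
    return "".join(chars)


def pairwise_interactions(model, seq, features):
    """Double-perturbation epistasis for every pair of features:
    f(intact) - f(without a) - f(without b) + f(without a and b).
    A value of zero means the two features contribute additively."""
    ref = model(seq)
    single = {name: model(mask(seq, [span])) for name, span in features.items()}
    scores = {}
    for a, b in combinations(features, 2):
        both = model(mask(seq, [features[a], features[b]]))
        scores[(a, b)] = ref - single[a] - single[b] + both
    return scores


seq = "TTGATATTTTCAGCTGTTACGT"
features = {"GATA": (2, 6), "E-box": (10, 16), "flank": (18, 22)}
interactions = pairwise_interactions(toy_model, seq, features)
for pair, score in interactions.items():
    print(pair, score)
```

In this toy setting only the pair of motifs receives a nonzero score, reflecting the synergy bonus built into `toy_model`. Note that this brute-force version needs one model evaluation per feature and one per pair, so its cost grows quadratically with the number of features.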

          Availability and implementation

          Code is available at: https://github.com/kundajelab/dfim.

          Supplementary information

          Supplementary data are available at Bioinformatics online.

          Most cited references

          Pooled ChIP-Seq Links Variation in Transcription Factor Binding to Complex Disease Risk.

          Cis-regulatory elements such as transcription factor (TF) binding sites can be identified genome-wide, but it remains far more challenging to pinpoint genetic variants affecting TF binding. Here, we introduce a pooling-based approach to mapping quantitative trait loci (QTLs) for molecular-level traits. Applying this to five TFs and a histone modification, we mapped thousands of cis-acting QTLs, with over 25-fold lower cost compared to standard QTL mapping. We found that single genetic variants frequently affect binding of multiple TFs, and CTCF can recruit all five TFs to its binding sites. These QTLs often affect local chromatin and transcription but can also influence long-range chromosomal contacts, demonstrating a role for natural genetic variation in chromosomal architecture. Thousands of these QTLs have been implicated in genome-wide association studies, providing candidate molecular mechanisms for many disease risk loci and suggesting that TF binding variation may underlie a large fraction of human phenotypic variation.

            Comprehensive, high-resolution binding energy landscapes reveal context dependencies of transcription factor binding

            Significance: Transcription factors (TFs) are key proteins that bind DNA targets to coordinate gene expression in cells. Understanding how TFs recognize their DNA targets is essential for predicting how variations in regulatory sequence disrupt transcription to cause disease. Here, we develop a high-throughput assay and analysis pipeline capable of measuring binding energies for over one million sequences with high resolution and apply it toward understanding how nucleotides flanking DNA targets affect binding energies for two model yeast TFs. Through systematic comparisons between models trained on these data, we establish that considering dinucleotide (DN) interactions is sufficient to accurately predict binding and further show that sites used by TFs in vivo are both energetically and mutationally distant from the highest affinity sequence.

              Exploiting genetic variation to uncover rules of transcription factor binding and chromatin accessibility

              Single-nucleotide variants that underlie phenotypic variation can affect chromatin occupancy of transcription factors (TFs). To delineate determinants of in vivo TF binding and chromatin accessibility, we introduce an approach that compares ChIP-seq and DNase-seq data sets from genetically divergent murine erythroid cell lines. The impact of discriminatory single-nucleotide variants on TF ChIP signal enables definition at single base resolution of in vivo binding characteristics of nuclear factors GATA1, TAL1, and CTCF. We further develop a facile complementary approach to more deeply test the requirements of critical nucleotide positions for TF binding by combining CRISPR-Cas9-mediated mutagenesis with ChIP and targeted deep sequencing. Finally, we extend our analytical pipeline to identify nearby contextual DNA elements that modulate chromatin binding by these three TFs, and to define sequences that impact kb-scale chromatin accessibility. Combined, our approaches reveal insights into the genetic basis of TF occupancy and their interplay with chromatin features.

                Author and article information

                Journal
                Bioinformatics
                Oxford University Press
                1367-4803
                1367-4811
                01 September 2018
                08 September 2018
                08 September 2018
                Volume 34, Issue 17, Pages i629-i637
                Affiliations
                [1] Biomedical Informatics Training Program, Stanford University, Stanford, CA
                [2] Genetics, Stanford University, Stanford, CA
                [3] Bioengineering, Stanford University, Stanford, CA
                [4] Chan Zuckerberg Biohub, San Francisco, CA, USA
                [5] Chem-H Institute, Stanford University, Stanford, CA, USA
                [6] Computer Science, Stanford University, Stanford, CA, USA
                Author notes
                To whom correspondence should be addressed. E-mail: akundaje@stanford.edu
                Article
                Article ID: bty575
                DOI: 10.1093/bioinformatics/bty575
                PMCID: PMC6129272
                PMID: 30423062
                © The Author(s) 2018. Published by Oxford University Press.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

                Pages: 9
                Funding
                Funded by: BioX Stanford Interdisciplinary Graduate Fellowship
                Funded by: SIGF
                Funded by: National Science Foundation Graduate Research Fellowship
                Funded by: National Institutes of Health
                Award ID: 1DP2GM123485
                Award ID: 1U01HG009431
                Award ID: 1R01HG00967401
                Categories
                ECCB 2018: European Conference on Computational Biology Proceedings
                Genes

                Bioinformatics & Computational biology
