      Efficient inference for sparse latent variable models of transcriptional regulation

      research-article

          Abstract

          Motivation

          Regulation of gene expression in prokaryotes relies on complex co-regulatory mechanisms involving large numbers of transcriptional regulatory proteins and their target genes. Uncovering these genome-scale interactions constitutes a major bottleneck in systems biology. Sparse latent factor models, which treat the activity of transcription factors (TFs) as unobserved, provide a biologically interpretable modelling framework that integrates gene expression and genome-wide binding data, but they also pose a hard computational inference problem. Existing probabilistic inference methods for such models rely on subjective filtering and suffer from scalability issues, and are therefore not well suited to realistic genome-scale applications.

          Results

          We present a fast Bayesian sparse factor model that takes as input gene expression data and binding-site data, either from ChIP-seq experiments or from motif predictions, and outputs active TF-gene links as well as latent TF activities. Our method employs an efficient variational Bayes scheme for model inference, enabling its application to large datasets, which was not feasible with existing MCMC-based inference methods for such models. We validate our method on synthetic data against a similar model from the literature that employs MCMC for inference, and obtain comparable results in a small fraction of the computational time. We also apply our method to large-scale data from Mycobacterium tuberculosis, comprising ChIP-seq data on 113 TFs and matched gene expression data for 3863 putative target genes. We evaluate our predictions using an independent transcriptomics experiment involving over-expression of TFs.
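
To make the modelling idea concrete, the following is a minimal sketch of how data could be generated under a generic sparse latent factor model of this kind. It is an illustration under stated assumptions, not the authors' formulation: the function name, the parameters (`binding_prob`, `active_prob`, `noise_sd`) and the Gaussian choices are all hypothetical. The sketch covers only the generative side (binding evidence restricts which TF-gene links may be active, and expression is a sparse weighted sum of latent TF activities); the variational Bayes inference described in the abstract is not shown.

```python
import random


def simulate_sparse_factor_model(n_genes, n_tfs, n_samples,
                                 binding_prob=0.2, active_prob=0.5,
                                 noise_sd=0.1, seed=0):
    """Draw one dataset from a generic (hypothetical) sparse factor model."""
    rng = random.Random(seed)

    # Binary binding mask: 1 where a TF is assumed to bind near a gene
    # (a stand-in for ChIP-seq peaks or motif predictions).
    binding = [[1 if rng.random() < binding_prob else 0 for _ in range(n_tfs)]
               for _ in range(n_genes)]

    # A TF-gene link can be active only where binding evidence exists.
    active = [[b if rng.random() < active_prob else 0 for b in row]
              for row in binding]

    # Regulatory strengths, zeroed wherever the link is inactive.
    weights = [[z * rng.gauss(0.0, 1.0) for z in row] for row in active]

    # Latent TF activities: one value per TF and per sample.
    activity = [[rng.gauss(0.0, 1.0) for _ in range(n_samples)]
                for _ in range(n_tfs)]

    # Expression = sparse weights times activities, plus Gaussian noise.
    expression = [[sum(weights[g][k] * activity[k][n] for k in range(n_tfs))
                   + rng.gauss(0.0, noise_sd)
                   for n in range(n_samples)]
                  for g in range(n_genes)]

    return binding, active, weights, activity, expression


if __name__ == "__main__":
    binding, active, weights, activity, expression = \
        simulate_sparse_factor_model(n_genes=6, n_tfs=3, n_samples=4)
    print("candidate links:", sum(map(sum, binding)))
    print("active links:   ", sum(map(sum, active)))
```

In such a setting, inference would run in the opposite direction: given `expression` and `binding`, estimate which links are active and what the latent activities are.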

          Availability and implementation

          An easy-to-use Jupyter notebook demo of our method with data is available at https://github.com/zhenwendai/SITAR.

          Supplementary information

          Supplementary data are available at Bioinformatics online.

          Most cited references (18)

          RegulonDB version 9.0: high-level integration of gene regulation, coexpression, motif clustering and beyond

          RegulonDB (http://regulondb.ccg.unam.mx) is one of the most useful and important resources on bacterial gene regulation, as it integrates the scattered scientific knowledge of the best-characterized organism, Escherichia coli K-12, in a database that organizes large amounts of data. Its electronic format enables researchers to compare their results with the legacy of previous knowledge and supports bioinformatics tools and model building. Here, we summarize our progress with RegulonDB since our last Nucleic Acids Research publication describing RegulonDB, in 2013. In addition to maintaining curation up-to-date, we report a collection of 232 interactions with small RNAs affecting 192 genes, and the complete repertoire of 189 Elementary Genetic Sensory-Response units (GENSOR units), integrating the signal, regulatory interactions, and metabolic pathways they govern. These additions represent major progress to a higher level of understanding of regulated processes. We have updated the computationally predicted transcription factors, which total 304 (184 with experimental evidence and 120 from computational predictions); we updated our position-weight matrices and have included tools for clustering them in evolutionary families. We describe our semiautomatic strategy to accelerate curation, including datasets from high-throughput experiments, a novel coexpression distance to search for ‘neighborhood’ genes to known operons and regulons, and computational developments.

            Rv3133c/dosR is a transcription factor that mediates the hypoxic response of Mycobacterium tuberculosis.

            Unlike many pathogens that are overtly harmful to their hosts, Mycobacterium tuberculosis can persist for years within humans in a clinically latent state. Latency is often linked to hypoxic conditions within the host. Among M. tuberculosis genes induced by hypoxia is a putative transcription factor, Rv3133c/DosR. We performed targeted disruption of this locus followed by transcriptome analysis of wild-type and mutant bacilli. Nearly all the genes powerfully regulated by hypoxia require Rv3133c/DosR for their induction. Computer analysis identified a consensus motif, a variant of which is located upstream of nearly all M. tuberculosis genes rapidly induced by hypoxia. Further, Rv3133c/DosR binds to the two copies of this motif upstream of the hypoxic response gene alpha-crystallin. Mutations within the binding sites abolish both Rv3133c/DosR binding as well as hypoxic induction of a downstream reporter gene. Also, mutation experiments with Rv3133c/DosR confirmed sequence-based predictions that the C-terminus is responsible for DNA binding and that the aspartate at position 54 is essential for function. Together, these results demonstrate that Rv3133c/DosR is a transcription factor of the two-component response regulator class, and that it is the primary mediator of a hypoxic signal within M. tuberculosis.

              The Mycobacterium tuberculosis regulatory network and hypoxia.

              We have taken the first steps towards a complete reconstruction of the Mycobacterium tuberculosis regulatory network based on ChIP-Seq and combined this reconstruction with system-wide profiling of messenger RNAs, proteins, metabolites and lipids during hypoxia and re-aeration. Adaptations to hypoxia are thought to have a prominent role in M. tuberculosis pathogenesis. Using ChIP-Seq combined with expression data from the induction of the same factors, we have reconstructed a draft regulatory network based on 50 transcription factors. This network model revealed a direct interconnection between the hypoxic response, lipid catabolism, lipid anabolism and the production of cell wall lipids. As a validation of this model, in response to oxygen availability we observe substantial alterations in lipid content and changes in gene expression and metabolites in corresponding metabolic pathways. The regulatory network reveals transcription factors underlying these changes, allows us to computationally predict expression changes, and indicates that Rv0081 is a regulatory hub.

                Author and article information

                Journal
                Bioinformatics
                Oxford University Press
                ISSN: 1367-4803, 1367-4811
                01 December 2017
                26 August 2017
                26 August 2017
                Volume: 33
                Issue: 23
                Pages: 3776-3783
                Affiliations
                [1] Department of Computer Science, University of Sheffield, Sheffield, UK
                [2] Amazon Research, Cambridge, UK
                [3] Division of Informatics, Imaging & Data Sciences, Faculty of Biology, Medicine, and Health Sciences, University of Manchester, Manchester, UK
                Author notes
                Associate Editor: Jonathan Wren

                [*] To whom correspondence should be addressed.

                The authors wish it to be known that, in their opinion, Zhenwen Dai and Mudassar Iqbal should be regarded as Joint First Authors.

                Author information
                http://orcid.org/0000-0002-5006-4331
                Article
                btx508
                DOI: 10.1093/bioinformatics/btx508
                PMCID: PMC5860323
                PMID: 28961802
                © The Author 2017. Published by Oxford University Press.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                Received: 27 March 2017
                Revised: 24 July 2017
                Accepted: 25 August 2017
                Page count
                Pages: 8
                Funding
                Funded by: National Institute of Allergy and Infectious Diseases 10.13039/100000060
                Funded by: Medical Research Council 10.13039/501100000265
                Funded by: MRC 10.13039/501100000265
                Award ID: MR/M012174/1
                Categories
                Original Papers
                Systems Biology

                Bioinformatics & Computational biology
