      Genome-wide identification of in vivo protein–DNA binding sites from ChIP-Seq data

      research-article

          Abstract

          ChIP-Seq, which combines chromatin immunoprecipitation (ChIP) with ultra high-throughput massively parallel sequencing, is increasingly being used for mapping protein–DNA interactions in vivo on a genome scale. Typically, short sequence reads from ChIP-Seq are mapped to a reference genome for further analysis. Although genomic regions enriched with mapped reads could be inferred as approximate binding regions, short read lengths (∼25–50 nt) pose challenges for determining the exact binding sites within these regions. Here, we present SISSRs (Site Identification from Short Sequence Reads), a novel algorithm for precise identification of binding sites from short reads generated from ChIP-Seq experiments. The sensitivity and specificity of SISSRs are demonstrated by applying it to ChIP-Seq data for three widely studied and well-characterized human transcription factors: CTCF (CCCTC-binding factor), NRSF (neuron-restrictive silencer factor) and STAT1 (signal transducer and activator of transcription protein 1). We identified 26 814, 5813 and 73 956 binding sites for CTCF, NRSF and STAT1 proteins, respectively, which are 32%, 299% and 78% more than previously inferred for the respective proteins. Motif analysis revealed that an overwhelming majority of the identified binding sites contained the previously established consensus binding sequence for the respective proteins, thus attesting to SISSRs’ accuracy. SISSRs’ sensitivity and precision facilitated further analyses of ChIP-Seq data revealing interesting insights, which we believe will serve as guidance for designing ChIP-Seq experiments to map in vivo protein–DNA interactions. We also show that tag densities at the binding sites are a good indicator of protein–DNA binding affinity, which could be used to distinguish and characterize strong and weak binding sites. Using tag density as an indicator of DNA-binding affinity, we have identified core residues within the NRSF and CTCF binding sites that are critical for stronger DNA binding.
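
          The abstract does not spell out how SISSRs works, so the following is only a simplified sketch of one general idea used in strand-aware ChIP-Seq site finding, not the published implementation. It assumes that sense-strand reads pile up upstream of a bound site and antisense-strand reads downstream, so a candidate site lies where the net tag count (sense minus antisense) switches from positive to negative. The function names, the window size and the radius are hypothetical choices made for illustration.

```python
def net_tag_counts(sense_starts, antisense_starts, window, genome_length):
    """Net tag count per window: sense-strand tags minus antisense-strand tags."""
    n_windows = (genome_length + window - 1) // window
    net = [0] * n_windows
    for pos in sense_starts:
        net[pos // window] += 1
    for pos in antisense_starts:
        net[pos // window] -= 1
    return net


def candidate_sites(net, window):
    """Coordinates where the net count switches from positive to negative.

    Each site is reported as the midpoint between the end of the last
    positive window and the start of the first negative window after it.
    """
    sites = []
    last_positive = None
    for i, value in enumerate(net):
        if value > 0:
            last_positive = i
        elif value < 0 and last_positive is not None:
            left = (last_positive + 1) * window
            right = i * window
            sites.append((left + right) // 2)
            last_positive = None  # wait for the next positive run
    return sites


def tag_density(site, tag_starts, radius):
    """Number of tags whose 5' end lies within `radius` bases of `site`."""
    return sum(1 for pos in tag_starts if abs(pos - site) <= radius)


# Toy data (made up for illustration): 5' ends of mapped reads on each strand.
sense = [100, 110, 120]
antisense = [180, 190, 200]

net = net_tag_counts(sense, antisense, window=20, genome_length=300)
sites = candidate_sites(net, window=20)
print(sites)  # [160]
print(tag_density(sites[0], sense + antisense, radius=60))  # 6
```

          In this toy setting the tag count around a site plays the role of the tag density that the abstract relates to binding affinity; a real analysis would also need background modelling and significance thresholds, which are omitted here.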

          Most cited references (20)


          Analysis of the vertebrate insulator protein CTCF-binding sites in the human genome.

          Insulator elements affect gene expression by preventing the spread of heterochromatin and restricting transcriptional enhancers from activation of unrelated promoters. In vertebrates, insulator's function requires association with the CCCTC-binding factor (CTCF), a protein that recognizes long and diverse nucleotide sequences. While insulators are critical in gene regulation, only a few have been reported. Here, we describe 13,804 CTCF-binding sites in potential insulators of the human genome, discovered experimentally in primary human fibroblasts. Most of these sequences are located far from the transcriptional start sites, with their distribution strongly correlated with genes. The majority of them fit to a consensus motif highly conserved and suitable for predicting possible insulators driven by CTCF in other vertebrate genomes. In addition, CTCF localization is largely invariant across different cell types. Our results provide a resource for investigating insulator function and possible other general and evolutionarily conserved activities of CTCF sites.

            Combining evidence using p-values: application to sequence homology searches.

            To illustrate an intuitive and statistically valid method for combining independent sources of evidence that yields a p-value for the complete evidence, and to apply it to the problem of detecting simultaneous matches to multiple patterns in sequence homology searches. In sequence analysis, two or more (approximately) independent measures of the membership of a sequence (or sequence region) in some class are often available. We would like to estimate the likelihood of the sequence being a member of the class in view of all the available evidence. An example is estimating the significance of the observed match of a macromolecular sequence (DNA or protein) to a set of patterns (motifs) that characterize a biological sequence family. An intuitive way to do this is to express each piece of evidence as a p-value, and then use the product of these p-values as the measure of membership in the family. We derive a formula and algorithm (QFAST) for calculating the statistical distribution of the product of n independent p-values. We demonstrate that sorting sequences by this p-value effectively combines the information present in multiple motifs, leading to highly accurate and sensitive sequence homology searches.

              The protein CTCF is required for the enhancer blocking activity of vertebrate insulators.

              An insulator is a DNA sequence that can act as a barrier to the influences of neighboring cis-acting elements, preventing gene activation, for example, when located between an enhancer and a promoter. We have identified a 42 bp fragment of the chicken beta-globin insulator that is both necessary and sufficient for enhancer blocking activity in human cells. We show that this sequence is the binding site for CTCF, a previously identified eleven-zinc finger DNA-binding protein that is highly conserved in vertebrates. CTCF sites are present in all of the vertebrate enhancer-blocking elements we have examined. We suggest that directional enhancer blocking by CTCF is a conserved component of gene regulation in vertebrates.

                Author and article information

                Journal
                Nucleic Acids Research (Nucleic Acids Res; nar)
                Oxford University Press
                ISSN: 0305-1048, 1362-4962
                September 2008
                6 August 2008
                Volume: 36
                Issue: 16
                Pages: 5221-5231
                Affiliations
                Laboratory of Molecular Immunology, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD 20894, USA
                Author notes
                *To whom correspondence should be addressed. Tel: +1 301 496 2098; Fax: +1 301 480 0961; Email: zhaok@nhlbi.nih.gov
                Article
                Publisher ID: gkn488
                DOI: 10.1093/nar/gkn488
                PMC ID: 2532738
                PubMed ID: 18684996
                2e3749c3-1dc7-4aa0-950f-6cadfcf12c72
                © 2008 The Author(s)

                This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                : 15 May 2008
                : 3 July 2008
                : 16 July 2008
                Categories
                Computational Biology

                Genetics
