Genome-wide identification of in vivo protein–DNA binding sites from ChIP-Seq data

Abstract

ChIP-Seq, which combines chromatin immunoprecipitation (ChIP) with ultra high-throughput massively parallel sequencing, is increasingly being used for mapping protein–DNA interactions in vivo on a genome scale. Typically, short sequence reads from ChIP-Seq are mapped to a reference genome for further analysis. Although genomic regions enriched with mapped reads could be inferred as approximate binding regions, short read lengths (∼25–50 nt) pose challenges for determining the exact binding sites within these regions. Here, we present SISSRs (Site Identification from Short Sequence Reads), a novel algorithm for precise identification of binding sites from short reads generated from ChIP-Seq experiments. The sensitivity and specificity of SISSRs are demonstrated by applying it to ChIP-Seq data for three widely studied and well-characterized human transcription factors: CTCF (CCCTC-binding factor), NRSF (neuron-restrictive silencer factor) and STAT1 (signal transducer and activator of transcription protein 1). We identified 26 814, 5813 and 73 956 binding sites for CTCF, NRSF and STAT1 proteins, respectively, which is 32, 299 and 78% more than previously inferred for the respective proteins. Motif analysis revealed that an overwhelming majority of the identified binding sites contained the previously established consensus binding sequence for the respective proteins, thus attesting to SISSRs’ accuracy. SISSRs’ sensitivity and precision facilitated further analyses of ChIP-Seq data revealing interesting insights, which we believe will serve as guidance for designing ChIP-Seq experiments to map in vivo protein–DNA interactions. We also show that tag densities at the binding sites are a good indicator of protein–DNA binding affinity, which could be used to distinguish and characterize strong and weak binding sites. Using tag density as an indicator of DNA-binding affinity, we have identified core residues within the NRSF and CTCF binding sites that are critical for stronger DNA binding.
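The abstract does not spell out how binding sites are located within an enriched region, so the following is only an illustrative sketch and not the SISSRs implementation. It assumes the general idea that forward-strand reads tend to accumulate upstream of a bound site and reverse-strand reads downstream, so a candidate site can be placed where the net strand count flips from positive to negative. The function names, the window size, the radius and the tag format (5' position, strand) are all hypothetical choices made for this example.

```python
from collections import Counter


def net_tag_counts(tags, window=20):
    """Map each window start to (forward tags - reverse tags) in that window.

    `tags` is an iterable of (position, strand) pairs, strand being "+" or "-".
    The window size is an arbitrary illustrative value.
    """
    net = Counter()
    for pos, strand in tags:
        start = (pos // window) * window
        net[start] += 1 if strand == "+" else -1
    return dict(net)


def candidate_sites(tags, window=20):
    """Return positions where the net count flips from positive to negative.

    Only windows that contain at least one tag are compared, in genomic order.
    The reported position is the midpoint between the end of the positive
    window and the start of the negative window.
    """
    net = net_tag_counts(tags, window)
    starts = sorted(net)
    sites = []
    for left, right in zip(starts, starts[1:]):
        if net[left] > 0 and net[right] < 0:
            sites.append((left + window + right) // 2)
    return sites


def tag_density(tags, site, radius=100):
    """Count tags within `radius` bases of a candidate site (hypothetical
    stand-in for the tag density discussed in the abstract)."""
    return sum(1 for pos, _ in tags if abs(pos - site) <= radius)


if __name__ == "__main__":
    # Toy data: three forward tags upstream, three reverse tags downstream.
    toy = [(100, "+"), (110, "+"), (120, "+"),
           (180, "-"), (190, "-"), (200, "-")]
    for site in candidate_sites(toy):
        print(site, tag_density(toy, site))
```

A real analysis would also need a background model and significance thresholds, which this sketch deliberately leaves out.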

Most cited references 20

Analysis of the vertebrate insulator protein CTCF-binding sites in the human genome.

Tae Hoon Kim, Ziedulla K Abdullaev, Andrew D. M. Smith … (2007)

Insulator elements affect gene expression by preventing the spread of heterochromatin and restricting transcriptional enhancers from activation of unrelated promoters. In vertebrates, insulator function requires association with the CCCTC-binding factor (CTCF), a protein that recognizes long and diverse nucleotide sequences. While insulators are critical in gene regulation, only a few have been reported. Here, we describe 13,804 CTCF-binding sites in potential insulators of the human genome, discovered experimentally in primary human fibroblasts. Most of these sequences are located far from the transcriptional start sites, with their distribution strongly correlated with genes. The majority of them fit a consensus motif that is highly conserved and suitable for predicting possible insulators driven by CTCF in other vertebrate genomes. In addition, CTCF localization is largely invariant across different cell types. Our results provide a resource for investigating insulator function and possible other general and evolutionarily conserved activities of CTCF sites.

Combining evidence using p-values: application to sequence homology searches.

Sean T. Bailey, M Gribskov, Swneke D Bailey (1997)

To illustrate an intuitive and statistically valid method for combining independent sources of evidence that yields a p-value for the complete evidence, and to apply it to the problem of detecting simultaneous matches to multiple patterns in sequence homology searches. In sequence analysis, two or more (approximately) independent measures of the membership of a sequence (or sequence region) in some class are often available. We would like to estimate the likelihood of the sequence being a member of the class in view of all the available evidence. An example is estimating the significance of the observed match of a macromolecular sequence (DNA or protein) to a set of patterns (motifs) that characterize a biological sequence family. An intuitive way to do this is to express each piece of evidence as a p-value, and then use the product of these p-values as the measure of membership in the family. We derive a formula and algorithm (QFAST) for calculating the statistical distribution of the product of n independent p-values. We demonstrate that sorting sequences by this p-value effectively combines the information present in multiple motifs, leading to highly accurate and sensitive sequence homology searches.
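As a minimal sketch of the idea in this abstract, the code below evaluates the standard closed form for the distribution of a product of n independent p-values, each assumed uniform on (0, 1] under the null hypothesis: if p is the observed product, then P(product ≤ p) = p · Σ_{i=0}^{n-1} (−ln p)^i / i!. This is the textbook result, not the authors' QFAST code, and the function name `product_pvalue` is hypothetical.

```python
import math


def product_pvalue(pvalues):
    """Combined p-value for the product of independent p-values.

    Assumes each input is an independent p-value that is uniform on (0, 1]
    under the null hypothesis. Returns P(product <= observed product).
    """
    n = len(pvalues)
    if n == 0:
        raise ValueError("at least one p-value is required")
    if any(p < 0.0 or p > 1.0 for p in pvalues):
        raise ValueError("p-values must lie in [0, 1]")

    prod = math.prod(pvalues)
    if prod == 0.0:
        return 0.0

    # Accumulate sum_{i=0}^{n-1} (-ln prod)^i / i! term by term.
    log_term = -math.log(prod)
    term = 1.0
    total = 1.0
    for i in range(1, n):
        term *= log_term / i
        total += term
    return prod * total


if __name__ == "__main__":
    # Two motif matches with p = 0.1 each: the raw product 0.01 overstates
    # the evidence, and the combined p-value is larger than the product.
    print(product_pvalue([0.1, 0.1]))
```

With a single p-value the function returns that value unchanged, which is a quick sanity check on the formula.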

The protein CTCF is required for the enhancer blocking activity of vertebrate insulators.

Adam Bell, Adam West, Gary Felsenfeld (1999)

An insulator is a DNA sequence that can act as a barrier to the influences of neighboring cis-acting elements, preventing gene activation, for example, when located between an enhancer and a promoter. We have identified a 42 bp fragment of the chicken beta-globin insulator that is both necessary and sufficient for enhancer blocking activity in human cells. We show that this sequence is the binding site for CTCF, a previously identified eleven-zinc finger DNA-binding protein that is highly conserved in vertebrates. CTCF sites are present in all of the vertebrate enhancer-blocking elements we have examined. We suggest that directional enhancer blocking by CTCF is a conserved component of gene regulation in vertebrates.

Author and article information

Journal

Journal ID (nlm-ta): Nucleic Acids Res

Journal ID (iso-abbrev): Nucleic Acids Res

Journal ID (publisher-id): nar

Journal ID (hwp): nar

Title: Nucleic Acids Research

Publisher: Oxford University Press

ISSN (Print): 0305-1048

ISSN (Electronic): 1362-4962

Publication date (Print): September 2008

Publication date (Electronic): 6 August 2008

Publication date PMC-release: 6 August 2008

Volume: 36

Issue: 16

Pages: 5221-5231

Affiliations

Laboratory of Molecular Immunology, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD 20894, USA

Author notes

*To whom correspondence should be addressed. Tel: +1 301 496 2098; Fax: +1 301 480 0961; Email: zhaok@nhlbi.nih.gov

Article

Publisher ID: gkn488

DOI: 10.1093/nar/gkn488

PMC ID: 2532738

PubMed ID: 18684996

SO-VID: 2e3749c3-1dc7-4aa0-950f-6cadfcf12c72

License:

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

History

Date received: 15 May 2008

Date revision received: 3 July 2008

Date accepted: 16 July 2008
