Getting Started in Tiling Microarray Analysis

Abstract

Introduction

The availability of sequenced eukaryotic genomes and commercial oligonucleotide tiling microarrays has enabled many genomics applications. Unlike expression microarrays, tiling microarrays have probes that cover the entire genome, or contigs of the genome, in an unbiased fashion. Currently, three commercial sources provide tiling microarrays that differ in probe length, probe spacing, and array design. Affymetrix tiles 6 million 25-mer probes per array, which offers the lowest price per probe and the highest resolution (the chromosomal distance between neighboring probe centers). Its arrays use one-color assays, so individual samples are hybridized to different arrays. NimbleGen can tile 385,000 50- to 75-mer probes, and Agilent can tile 244,000 60-mer probes per array. The latter two platforms use longer oligonucleotide probes and two-color assays, in which treatment and control samples are differentially labeled and put on the same array for competitive hybridization, and they have slightly better sensitivity. They are also flexible for custom array design, especially Agilent's multiplex arrays, which allow multiple samples to hybridize on different subareas of the same array. These tiling arrays offer diverse genomic applications, each with its own data analysis challenges.

ChIP-Chip

The most popular application for the tiling array platform is ChIP-chip, which maps the genome-wide binding locations of transcription factors and other DNA-binding proteins. In a ChIP-chip experiment, chromatin is crosslinked and fragmented to approximately 500 bp. An antibody to the protein of interest is used to precipitate the protein together with its interacting DNA (chromatin immunoprecipitation, or “ChIP”). The coprecipitated DNA is detected on a DNA microarray (the “chip”) and mapped back to the genome [1,2].
In complex genomes, DNA-binding proteins often have thousands of binding sites throughout the genome, so genome tiling microarrays from Affymetrix [3], NimbleGen [4], and Agilent [5] can be used for unbiased binding site mapping. For ChIP-chip on Affymetrix tiling microarrays, MAT (model-based analysis of tiling arrays) [6] is a very effective peak-finding algorithm. MAT standardizes each probe's behavior according to its 25-mer sequence and genome copy number, and can work even without replicate ChIP or control samples. Occasionally Affymetrix genome tiling microarrays have blob-like image defects, which are visible when the array image is converted to a data .cel file. Users who encounter array images with blob defects are advised to use a “microarray blob remover” [7] to detect and remove affected probes before running MAT. For NimbleGen tiling microarrays, TAMAL [8] is the best algorithm for locating binding sites, while MA2C [9] and TileScope [10] offer alternatives that are more user-friendly and flexible. For Agilent tiling arrays, the joint binding deconvolution [11] algorithm can detect ChIP-chip peaks and, in addition, resolve peak positions more finely than the array's tiling resolution.

After the ChIP-chip peaks are detected, biologists often want to find the sequence-specific binding motifs of their proteins of interest. MEME [12] and Gibbs Motif Sampler [13] are the most popular tools for de novo motif discovery. As an alternative, biologists could use the cis-regulatory element annotation system [14] to annotate large-scale ChIP-chip data in human and mouse; its functions include retrieving ChIP-chip sequences, mapping nearby genes, plotting sequence conservation figures, and finding enriched known transcription factor motifs. For a more generalized genomics annotation pipeline, Galaxy (http://main.g2.bx.psu.edu/) offers more customized and interactive features to analyze additional sequenced genomes.
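To make the peak-finding step above more concrete, here is a minimal sliding-window sketch in Python. It is not MAT, TAMAL, MA2C, or any of the published algorithms cited above: it assumes probe values have already been normalized (for example, as log ratios of ChIP over control), and it simply averages the probes within a fixed distance of each probe center and reports runs of probes whose window score reaches a cutoff. The function names, the `half_window` parameter, and the `cutoff` value are all hypothetical choices made for illustration.

```python
from statistics import mean


def window_scores(positions, values, half_window=300):
    """Average the probe values within +/- half_window bp of each probe center.

    positions must be sorted probe-center coordinates on one chromosome;
    values are the matching (already normalized) probe signals.
    """
    scores = []
    lo = 0  # index of the first probe inside the current window
    hi = 0  # index one past the last probe inside the current window
    n = len(positions)
    for center in positions:
        while positions[lo] < center - half_window:
            lo += 1
        while hi < n and positions[hi] <= center + half_window:
            hi += 1
        scores.append(mean(values[lo:hi]))
    return scores


def call_peaks(positions, scores, cutoff):
    """Report (start, end) probe-center coordinates for each run of
    consecutive probes whose window score is at least cutoff."""
    peaks = []
    start = None
    end = None
    for pos, score in zip(positions, scores):
        if score >= cutoff:
            if start is None:
                start = pos
            end = pos
        elif start is not None:
            peaks.append((start, end))
            start = None
    if start is not None:
        peaks.append((start, end))
    return peaks


# Toy data (illustrative only): 20 probes spaced 100 bp apart,
# with five enriched probes centered at 800-1200 bp.
toy_positions = list(range(0, 2000, 100))
toy_values = [0.0] * 8 + [3.0] * 5 + [0.0] * 7
toy_scores = window_scores(toy_positions, toy_values, half_window=100)
toy_peaks = call_peaks(toy_positions, toy_scores, cutoff=2.0)
```

In practice the window size would be chosen to match the expected DNA fragment length (approximately 500 bp for the ChIP-chip protocol described above), and the cutoff would come from an estimate of the background score distribution rather than a fixed number.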
MeDIP-Chip and DNase-Chip

DNA methylation status often controls gene transcription status, and genome-wide DNA methylation sites can be mapped using methyl-DNA immunoprecipitation followed by microarray (MeDIP-chip). MeDIP-chip is similar to ChIP-chip in protocol, except that an antibody against 5-methyl-cytosine is used to directly precipitate methylated DNA [15,16]. Peak identification and annotation of MeDIP-chip experiments can be conducted with methods similar to those used for ChIP-chip. The methylation level measured by MeDIP-chip should be calibrated by the CG content of the region, since a poorly methylated CG-rich region might still present more methyl-Cs for precipitation than a fully methylated CG-poor region.

DNase-hypersensitive regions in the genome are often open chromatin harboring transcriptionally active or regulatory regions, and they can be located using DNase-chip. Relying on the assumption that open chromatin is cleaved more often by DNase over a short distance, this experiment involves digesting chromatin with DNase I, isolating DNA fragments created by two DNase cleavages less than 1,200 bp apart, and hybridizing the DNA to tiling microarrays [17]. The resulting tiling array data can be analyzed with a regular ChIP-chip peak-finding algorithm, although the window size needs to be adjusted to the DNA fragment length distribution, which depends on the level of DNase digestion.

Nucleosome Localization

A nucleosome, which consists of ~146 bp of DNA wrapped around eight histone proteins, forms the fundamental structural unit of eukaryotic chromatin. Since nucleosomes limit DNA accessibility to regulatory factors, it is important to map positioned nucleosomes and nucleosome-free regions in the genome. Nucleosome mapping experiments involve digesting the chromatin with micrococcal nuclease to remove the linker DNA between neighboring nucleosomes, and isolating the remaining nucleosomal DNA to be labeled and hybridized to a tiling microarray.
The controls for such experiments are often naked genomic DNA (without chromatin structure) cleaved with hydroxyl radicals or micrococcal nuclease to the same size distribution. Unlike in ChIP-chip, the occupancy difference between positioned nucleosomes and linker regions is often less than 10-fold, and positioned nucleosomes occupy only about 100–200 bp of DNA. This requires the tiling microarray to have both high sensitivity and high resolution. Long oligonucleotide microarrays tiled at 5–20 bp resolution are often custom-made to cover selected genomic regions (e.g., promoters or a few megabases on a chromosome) for this application.

In a nucleosome mapping study conducted on yeast chromosome III [18], a hidden Markov model was applied. The model defines a stretch of probes with low signals as linkers, six to eight probes that span approximately 146 bp with high signals as well-positioned nucleosomes, and more than eight probes with intermediately high signals as delocalized nucleosomes. A Viterbi algorithm is used to infer the most likely partition of probes along the chromosome into the different nucleosomal states. In a similar study conducted on human promoters [19], wavelet transformation was first used to remove noise from the probe signal, which eliminated the high frequency and low coefficient signals. Laplacian Gaussian edge detection was then applied to the smoothed probe signal curve to detect peaks and troughs (zero first derivatives) with a reasonable width as positioned nucleosomes and linker regions, respectively.

ArrayCGH and Copy Number Variation

In an array-based comparative genome hybridization (arrayCGH) experiment, DNA samples from normal and diseased individuals are differentially hybridized to microarrays to identify copy number variations in the genome that are potential biomarkers or causal genes of disease [20]. Early microarrays used in arrayCGH studies had long (e.g., BAC clones) and/or sparse probes to cover the genome.
Recently, tiling microarrays have been used to improve the sensitivity and resolution of copy number variation detection [21]. One method proposes a structural change model that uses dynamic programming to segment the genome into a number of regions with different copy numbers; within each region the probe signals (and thus the genome copy number) are similar [22]. However, selecting the number of regions could be difficult for big genomes with complex copy number variations. A hidden Markov model is also a plausible approach to infer the hidden copy number from the observed probe values. One complication that all arrayCGH applications must contend with is that sample impurities (e.g., patient DNA degradation or heterogeneous tumor DNA) sometimes give rise to noisy signals or non-integer copy numbers.

Transcriptome Mapping

Hybridizing RNA samples to tiling microarrays is gaining popularity for detecting novel transcripts in sequenced genomes. Early studies often called positive probes based on a probe signal cutoff [23], then defined stretches of genomic regions with a significant number of positive probes as transfrags (transcribed fragments). One study on yeast tiling arrays with 4-bp resolution adopted a structural change model similar to that used in arrayCGH [24]. In a more recent study profiling multiple Drosophila embryogenesis stages on genome tiling microarrays, a Kruskal-Wallis test (a nonparametric analog of one-way ANOVA) was used to detect stretches of probes showing differential expression among conditions [25]. In addition, the study checked neighboring transfrags with correlated expression in different conditions to find novel 5′, 3′, or internal exons of previously annotated genes. With more transcriptome conditions profiled at better tiling resolution, more advanced algorithms can be developed to refine transfrag borders and detect differential expression, alternative splicing, and antisense transcripts.
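The structural change model mentioned above for arrayCGH [22] and for the yeast transcriptome study [24] segments the probe signal along the chromosome into regions of similar level. The following Python sketch illustrates the general idea under simplifying assumptions and is not the published method: it treats the signal as piecewise constant, takes the number of segments `k` as given (which, as noted above, can be hard to choose), and uses dynamic programming to find the segment boundaries that minimize the total within-segment squared error. The function name `segment_signal` and the toy data are hypothetical.

```python
def segment_signal(signal, k):
    """Split signal into k contiguous segments minimizing the total
    within-segment sum of squared deviations from the segment mean.

    Returns a list of (start, end) index pairs with end exclusive.
    Runs in O(k * n^2) time, so it suits short regions only.
    """
    n = len(signal)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(signal)")

    # Prefix sums give the squared error of any segment in O(1).
    prefix = [0.0]
    prefix_sq = [0.0]
    for x in signal:
        prefix.append(prefix[-1] + x)
        prefix_sq.append(prefix_sq[-1] + x * x)

    def cost(i, j):
        """Squared error of signal[i:j] around its own mean."""
        total = prefix[j] - prefix[i]
        return (prefix_sq[j] - prefix_sq[i]) - total * total / (j - i)

    inf = float("inf")
    # best[s][j]: minimal cost of covering signal[:j] with s segments.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    # back[s][j]: start index of the last segment in that optimum.
    back = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for s in range(1, k + 1):
        for j in range(s, n + 1):
            for i in range(s - 1, j):
                candidate = best[s - 1][i] + cost(i, j)
                if candidate < best[s][j]:
                    best[s][j] = candidate
                    back[s][j] = i

    # Walk the back-pointers from the end to recover the boundaries.
    segments = []
    j = n
    for s in range(k, 0, -1):
        i = back[s][j]
        segments.append((i, j))
        j = i
    segments.reverse()
    return segments


# Toy log-ratio profile (illustrative only): a gained region of three
# probes flanked by probes at the normal level.
toy_log_ratios = [0.0] * 4 + [1.0] * 3 + [0.0] * 3
toy_segments = segment_signal(toy_log_ratios, 3)
```

For the toy profile the three segments recovered are the two flanking regions and the gained region between them. Real data are noisy, so the segment means would then need to be compared against a background level to decide which segments represent copy number changes or transcribed regions.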
Prospective

All commercial tiling microarray companies strive to put more probes on the array at reduced cost. This trend seems to follow the Moore's Law observed in the semiconductor industry, which holds that chips double their density at half the cost every 18 months. A few years from now, tiling microarrays might cover the whole mammalian genome at single-base resolution and cost only a few thousand dollars. Tiling arrays will then have much wider applications, and researchers might use the same arrays for different experiments and informatically select a subset of the probes for analysis. At the same time, high-throughput sequencing technologies such as 454, Illumina Solexa, and ABI SOLiD are making fast progress as well. If enough coverage can be achieved at a cost similar to that of tiling microarrays, they might give more sensitive and unbiased results. These technologies each entail different challenges and opportunities for computational biologists to develop efficient analysis algorithms. The competition between the different technology companies will inevitably benefit researchers regardless of the winner. Therefore, we look forward to a very exciting decade of genomics advances ahead.

Author and article information

Contributors

: Role: Editor

Journal

Journal ID (nlm-ta): PLoS Comput Biol

Journal ID (publisher-id): pcbi

Title: PLoS Computational Biology

Publisher: Public Library of Science (San Francisco, USA)

ISSN (Print): 1553-734X

ISSN (Electronic): 1553-7358

Publication date (Print): October 2007

Publication date (Electronic): 26 October 2007

Volume: 3

Issue: 10

Electronic Location Identifier: e183

Affiliations

Princeton University, United States of America

Author notes

X. Shirley Liu is with the Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Harvard School of Public Health, Boston, Massachusetts, United States of America. E-mail: xsliu@jimmy.harvard.edu

Article

Publisher ID: 07-PLCB-MI-0241R1 Serial Item and Contribution ID: plcb-03-10-01

DOI: 10.1371/journal.pcbi.0030183

PMC ID: 2041964

PubMed ID: 17967045

SO-VID: 91fd61b0-c6db-47a1-98b0-76daecb5f45f

Copyright: © 2007 X. Shirley Liu. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

History

Page count

Pages: 3

Custom metadata

Citation: Liu XS (2007) Getting started in tiling microarray analysis. PLoS Comput Biol 3(10): e183. doi:10.1371/journal.pcbi.0030183
