      A beta-mixture quantile normalization method for correcting probe design bias in Illumina Infinium 450 k DNA methylation data

      research-article


          Abstract

          Motivation: The Illumina Infinium 450 k DNA Methylation Beadchip is a prime candidate technology for Epigenome-Wide Association Studies (EWAS). However, a difficulty associated with these beadarrays is that probes come in two different designs, characterized by widely different DNA methylation distributions and dynamic range, which may bias downstream analyses. A key statistical issue is therefore how best to adjust for the two different probe designs.

          Results: Here we propose a novel model-based intra-array normalization strategy for 450 k data, called BMIQ (Beta MIxture Quantile dilation), to adjust the beta-values of type2 design probes into a statistical distribution characteristic of type1 probes. The strategy involves application of a three-state beta-mixture model to assign probes to methylation states, subsequent transformation of probabilities into quantiles and finally a methylation-dependent dilation transformation to preserve the monotonicity and continuity of the data. We validate our method on cell-line data, fresh frozen and paraffin-embedded tumour tissue samples and demonstrate that BMIQ compares favourably with two competing methods. Specifically, we show that BMIQ improves the robustness of the normalization procedure, reduces the technical variation and bias of type2 probe values and successfully eliminates the type1 enrichment bias caused by the lower dynamic range of type2 probes. BMIQ will be useful as a preprocessing step for any study using the Illumina Infinium 450 k platform.
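          The probability-to-quantile step described above can be illustrated with a minimal sketch. This is not the BMIQ implementation: it assumes the beta parameters of one methylation state have already been fitted for both probe designs, and it omits the mixture fitting, the state assignment and the dilation transformation. The function names and the parameter values (Beta(2, 12) for type2, Beta(1.5, 20) for type1) are hypothetical and chosen only for illustration; the simple numerical integration assumes both shape parameters are greater than 1.

```python
import math


def beta_pdf(x, a, b):
    """Density of Beta(a, b) at x, for 0 < x < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))


def beta_cdf(x, a, b, steps=2000):
    """Beta(a, b) CDF by midpoint-rule integration (adequate for a sketch)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    h = x / steps
    total = sum(beta_pdf((i + 0.5) * h, a, b) for i in range(steps))
    return min(1.0, h * total)


def beta_ppf(p, a, b, tol=1e-6):
    """Inverse CDF of Beta(a, b), found by bisection on [0, 1]."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if beta_cdf(mid, a, b) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def quantile_map(beta2, type2_params, type1_params):
    """Map a type2 beta-value onto the type1 distribution of the same state.

    The value is first converted to a probability under the (assumed) type2
    beta distribution, then to the matching quantile of the type1 distribution.
    """
    p = beta_cdf(beta2, *type2_params)
    return beta_ppf(p, *type1_params)


# Hypothetical fitted parameters for a single (unmethylated) state.
TYPE2_U = (2.0, 12.0)
TYPE1_U = (1.5, 20.0)

if __name__ == "__main__":
    for value in (0.05, 0.10, 0.20, 0.40):
        print(value, round(quantile_map(value, TYPE2_U, TYPE1_U), 4))
```

          Because both the CDF and its inverse are increasing, the mapping preserves the rank order of probe values within a state, which is the property the quantile step relies on.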

          Availability: BMIQ is freely available from http://code.google.com/p/bmiq/.

          Contact: a.teschendorff@ucl.ac.uk

          Supplementary information: Supplementary data are available at Bioinformatics online.


                Author and article information

                Journal
                Bioinformatics
                Oxford University Press
                1367-4803
                1367-4811
                15 January 2013
                21 November 2012
                Volume: 29
                Issue: 2
                Pages: 189-196
                Affiliations
                1Statistical Genomics Group, UCL Cancer Institute, University College London, London WC1E 6BT, UK, 2Department of Medicine, Unit of Computational Medicine, Centre for Molecular Medicine, Karolinska Institute, Solna 171 76, Stockholm, Sweden and 3Medical Genomics Group, UCL Cancer Institute, University College London, London WC1E 6BT, UK
                Author notes
                *To whom correspondence should be addressed

                Associate Editor: Olga Troyanskaya

                Article
                Publisher ID: bts680
                DOI: 10.1093/bioinformatics/bts680
                PMC ID: 3546795
                PubMed ID: 23175756
                © The Author 2012. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( http://creativecommons.org/licenses/by/3.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                Received: 27 June 2012
                Revised: 9 October 2012
                Accepted: 16 November 2012
                Page count
                Pages: 8
                Categories
                Original Papers
                Gene Expression

                Bioinformatics & Computational biology
