
      A data-driven approach to preprocessing Illumina 450K methylation array data


          Abstract

          Background

          As the most stable and experimentally accessible epigenetic mark, DNA methylation is of great interest to the research community. The landscape of DNA methylation across tissues, through development and in disease pathogenesis is not yet well characterized, so there is a need for rapid and cost-effective methods for assessing genome-wide levels of DNA methylation. The Illumina Infinium HumanMethylation450 (450K) BeadChip is a very useful addition to the available methods for DNA methylation analysis, but its complex design, incorporating two different assay methods, requires careful consideration. Accordingly, several normalization schemes have been published. We have taken advantage of known DNA methylation patterns associated with genomic imprinting and X-chromosome inactivation (XCI), in addition to the performance of SNP genotyping assays present on the array, to derive three independent metrics, which we use to test alternative schemes of correction and normalization. These metrics also have potential utility as quality scores for datasets.

          Results

          The standard index of DNA methylation at any specific CpG site is β = M/(M + U + 100), where M and U are methylated and unmethylated signal intensities, respectively. Betas (βs) calculated from raw signal intensities (the default GenomeStudio behavior) perform well, but using 11 methylomic datasets we demonstrate that quantile normalization methods produce marked improvement, even in highly consistent data, by all three metrics. The commonly used procedure of normalizing betas is inferior to the separate normalization of M and U, and it is also advantageous to normalize Type I and Type II assays separately. More elaborate manipulation of quantiles proves to be counterproductive.
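The idea of normalizing M and U separately, and Type I and Type II probes separately, before computing β can be illustrated with a minimal sketch. This is not the wateRmelon implementation: the function names (`beta`, `quantile_normalize`, `normalize_separately`) and the data layout (one list of intensities per sample) are assumptions made for illustration, and the quantile normalization shown is the plain textbook version with ties broken by probe order.

```python
from typing import List, Sequence

OFFSET = 100.0  # constant from the abstract's formula: beta = M / (M + U + 100)


def beta(m: float, u: float, offset: float = OFFSET) -> float:
    """Methylation index for one CpG site from methylated/unmethylated intensities."""
    return m / (m + u + offset)


def quantile_normalize(columns: Sequence[Sequence[float]]) -> List[List[float]]:
    """Plain quantile normalization across samples.

    columns[j][i] is the intensity of probe i in sample j; all samples must
    have the same number of probes. Ties are broken by probe order.
    """
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean of the k-th smallest value across samples.
    reference = [sum(sc[k] for sc in sorted_cols) / len(columns) for k in range(n)]
    result = []
    for col in columns:
        order = sorted(range(n), key=lambda i: col[i])  # probe indices by rank
        normalized = [0.0] * n
        for rank, i in enumerate(order):
            normalized[i] = reference[rank]
        result.append(normalized)
    return result


def normalize_separately(
    m_cols: Sequence[Sequence[float]],
    u_cols: Sequence[Sequence[float]],
    is_type_one: Sequence[bool],
) -> List[List[float]]:
    """Quantile-normalize M and U separately, within each probe type, then compute betas."""
    n = len(is_type_one)
    groups = [
        [i for i in range(n) if is_type_one[i]],      # Type I probes
        [i for i in range(n) if not is_type_one[i]],  # Type II probes
    ]
    m_out = [list(col) for col in m_cols]
    u_out = [list(col) for col in u_cols]
    for idx in groups:
        if not idx:
            continue
        for src, dst in ((m_cols, m_out), (u_cols, u_out)):
            normed = quantile_normalize([[col[i] for i in idx] for col in src])
            for j, ncol in enumerate(normed):
                for k, i in enumerate(idx):
                    dst[j][i] = ncol[k]
    return [[beta(m, u) for m, u in zip(mc, uc)] for mc, uc in zip(m_out, u_out)]
```

Calling `normalize_separately(m_cols, u_cols, is_type_one)` returns one list of β values per sample. Normalizing the β values directly would instead pass the raw betas through `quantile_normalize`, which is the procedure the abstract reports as inferior.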

          Conclusions

          Careful selection of preprocessing steps can minimize variance and thus improve statistical power, especially for the detection of the small absolute DNA methylation changes likely associated with complex disease phenotypes. For the convenience of the research community we have created a user-friendly R software package called wateRmelon, downloadable from Bioconductor and compatible with the existing methylumi, minfi and IMA packages, that allows others to apply the same normalization methods and data quality tests to 450K data.


                Author and article information

                Journal
                BMC Genomics
                BioMed Central
                1471-2164
                1 May 2013
                14: 293
                Affiliations
                [1] Social, Genetic and Developmental Psychiatry, Institute of Psychiatry, King's College London, De Crespigny Park, London, UK
                [2] University of Exeter Medical School, Exeter, UK
                Article
                1471-2164-14-293
                DOI: 10.1186/1471-2164-14-293
                PMCID: PMC3769145
                PMID: 23631413
                3cab8200-d63b-4c6d-a628-ee831887e59c
                Copyright ©2013 Pidsley et al.; licensee BioMed Central Ltd.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                : 25 March 2013
                : 4 April 2013
                Categories
                Methodology Article

                Genetics
