      Is Open Access

      Commentary: The seven plagues of epigenetic epidemiology

      article-commentary
      Bastiaan T Heijmans 1, 2, *, Jonathan Mill 3
      International Journal of Epidemiology
      Oxford University Press


          Abstract

          Epigenetics is being increasingly combined with epidemiology to add mechanistic understanding to associations observed between environmental, genetic and stochastic factors and human disease phenotypes. Currently, epigenetic epidemiological studies primarily focus on exploring if and where the epigenome (i.e. the overall epigenetic state of a cell) is influenced by specific environmental exposures like prenatal nutrition [1], sun exposure [2] and smoking [3]. In this issue of the IJE, Nada Borghol et al. [4] report an association between childhood socio-economic status (SES) and differential DNA methylation in adulthood. Low SES may integrate diverse and heterogeneous environmental influences, and knowing which epigenetic changes are associated with low SES may provide clues about the biological processes underlying its health consequences. The authors stress that their study is preliminary. This statement is, in fact, to a greater or lesser extent applicable to the entire first wave of studies currently being published that likewise aim to discover associations between epigenetic variation measured on a genome-wide scale and environmental exposures or disease phenotypes. When executing such epigenome-wide association studies (EWASs) [5], every epigenetic epidemiologist is struggling with the same biological, technical and methodological issues. It is important to take these into consideration when designing a study and interpreting the results. Let us consider seven of those issues, taking the current study on SES as a starting point.

We do not really know where to look, or what to look for

Most epigenetic epidemiological studies focus on DNA methylation for various practical and biological reasons, neglecting other layers of the epigenome, like histone modifications, that are also likely to be important in influencing disease phenotypes. Our basic understanding of the methylome (i.e.
the whole of DNA methylation marks on the genome) is in its infancy, and we are still learning about the specific localization of the features that, when differentially methylated, regulate gene expression and are thus relevant for epigenetic epidemiologists to study. The current study, like many others, evaluated promoter regions, in this case defined as 1000 bp upstream to 250 bp downstream of transcription start sites. Although these features are often enriched for DNA methylation marks influencing the expression of genes, recent work suggests that other regions of the methylome outside of promoters, including inter-genic CpG island shores [6] and intra-genic CpG islands [7], may ultimately be more important for regulating phenotypic variation.

For any differentially methylated region identified in EWASs it will be important to demonstrate functionality. Promoter methylation in the current study was integrated with public gene expression data and, as expected, highly expressed genes were more commonly flanked by less methylated promoters and vice versa. A limitation is that this observation is for groups of promoters, whereas information is needed about this relationship for individual promoters. Mining the reference epigenomes and transcriptomes that are being generated for different cell types under the umbrella of initiatives such as the National Institutes of Health (NIH) Epigenomics Roadmap [8] and the International Human Epigenome Consortium [9] may contribute to such information. Additional in vitro experiments will be required to evaluate the transcriptional effects of differential DNA methylation at a specific locus independent of its genomic context [10].

We have to rely on imperfect technology

The good news is that recent advances in genomic technology mean that genome-scale studies of DNA methylation across multiple samples are now feasible.
In practice, however, one has to compromise between coverage and precision in epidemiological studies, which likely incorporate a large number of samples. A large (and growing) number of methods exist for assessing DNA methylation both genome wide and at specific CpG sites [11], and one problem relates to our inability to compare results across studies that have used different platforms. On the one hand there are methods such as that used in the current study in which the methylated portion of the genome is captured using antibodies against methylated DNA and subsequently quantified using microarrays or next-generation sequencing. These approaches can provide coverage across most of the genome and may be optimally suited to discriminate low from high methylation, but have lower reliability for smaller differences and are biased by factors such as CG density [12, 13]. On the other hand, there are methods based on the bisulphite conversion of DNA combined with next-generation sequencing that provide higher accuracy and single nucleotide resolution. Although whole-genome bisulphite sequencing is currently unfeasible to use across large epidemiological cohorts, the method can be adapted to target a reduced representation of the genome (approximately 3 million out of approximately 28 million CG dinucleotides in the human genome) [12, 13].

The recently launched Illumina 450 k Methylation Beadchip may offer a balance between coverage and precision, which will be attractive for epidemiological EWASs executed during the next few years [5]. It interrogates DNA methylation at over 480 000 CG dinucleotides, is high-throughput and relatively affordable. The precision of this platform appears to compare well with some of the other platforms [12, 13], but these results should be interpreted with caution.
Although correlation coefficients reported across the various platform comparisons are high, they are mainly driven by the fact that the large majority of the genome is either unmethylated or fully methylated, and substantial discrepancies between platforms may exist for intermediate-level methylation [12, 14]. Therefore, the technological validation of findings using an independent method remains important. This will be feasible for a small number of ‘top hits’, like the three procadherin promoters assessed in the current study. However, validating the outcomes of the complex pathway analyses performed to implicate either entire biological processes (such as extra- and intra-cellular signalling in the current study) or genomic features with a specific function in gene regulation [e.g. promoters, enhancers, inter/intragenic CG island (shores), etc.] is more demanding and has not yet been realized. Validating the results of such gene-set testing methods will entail the re-assessment of DNA methylation across large sets of loci.

We may be limited by available sample sizes that are optimal for epigenetic epidemiology

The current study investigated only 40 individuals. Investigators will be able to secure budgets for larger studies as empirical data increasingly highlight the value of epigenetic epidemiology, and high-throughput, economical laboratory approaches become more widely adopted. Nevertheless, it is unlikely that the simple brute-force approach that has been used relatively successfully in genome-wide association studies (GWASs) is valid for EWASs. In genetics, many of the epidemiological principles about designing studies with respect to selection biases, confounding, batch effects and appropriateness of controls could largely be replaced by the simple rule ‘bigger-is-better’. This is not true for epigenetic epidemiology, because the epigenome is not a static entity like the genome, which necessitates the use of more conventional epidemiological approaches.
[15] Further complicating matters is the fact that, for the most powerful study designs in epigenetic epidemiology (including studies of discordant monozygotic twins [16], particularly when longitudinally sampled [17], early exposure studies with long-term follow-up [1] and studies of specific cell types [18]), the number of eligible individuals for whom relevant biological materials were stored in existing epidemiological cohorts is often limited, and it will be difficult to scale up analyses to include the thousands of samples that may be required for establishing robust associations with disease phenotypes. Moving forward, it will be important to establish cause and effect in epigenetic epidemiology; disease-associated differentially methylated regions may arise prior to illness and contribute to the disease phenotype, or could be a secondary effect of the disease process or of the medications used in treatment [19]. Furthermore, maximum information will be obtained from epidemiological studies that are able to integrate epigenomic information with genomic, transcriptomic and proteomic data obtained from the same samples.

Whatever we do, it may never be enough to fully account for epigenetic differences between tissues and cells

In many respects, large comprehensively phenotyped and longitudinally sampled epidemiological studies, like the 1958 British birth cohort used in the current study, are an ideal resource for epigenetic epidemiology. In nearly all of these studies, however, whole blood is the only biological material that has been archived. Blood is a heterogeneous tissue and any DNA methylation difference between groups could be confounded by differences in the cellular composition of whole blood samples, for example, resulting from the immune response to sub-clinical infection.
The good news is that fewer DNA methylation differences than might be expected exist between leucocyte types, and controlling for cellular heterogeneity may be possible in biobanks with a simple blood cell count [20]. Whether the latter is sufficient (and under which circumstances it is not), however, remains to be established. Epigenomic studies of separate cell types such as those being undertaken by the NIH Epigenomic Roadmap Initiative and the European Union Blueprint consortium are currently generating reference epigenomes of haematopoietic cells that will be of great utility in this regard [8].

When moving beyond associations with environmental exposures to epigenetic associations with phenotypes, a key question for epigenetic epidemiology concerns the extent to which easily accessible peripheral tissues (such as blood) can be used to ask questions about inter-individual phenotypic variation manifest in inaccessible tissues such as the brain, visceral fat and other internal organs and tissues. Cross-tissue comparisons of the methylome within the same individual are currently underway to establish the relationship between epigenetic patterns in blood and those in other tissues. Although these analyses are crucial, the results may not be generally applicable; higher inter-tissue concordance may be present for DNA methylation changes induced early in development (and potentially propagated soma-wide) than for changes occurring during ageing, which are more likely to remain tissue specific [19, 21]. Efforts to obtain biopsies (subcutaneous fat, muscle, etc.) and post-mortem material in subsets of longitudinal biobanks will greatly increase their value for epigenetic studies, despite the problems associated with cellular heterogeneity that also hold for such samples.

We may be trying to detect inherently small effect sizes using these sub-optimal methods and sample cohorts

The main findings in the current study concerned DNA methylation differences at three procadherin promoters.
[4] The extent of the difference at these promoters was similar to that commonly observed in other recent studies, namely ~5% [5], and was most apparent for a single, nominally statistically significant CG dinucleotide in each region. The biological implications of such small alterations in DNA methylation in terms of gene expression and function are unknown. Although DNA methylation is recognized as one of the most stable epigenetic marks, it is still relatively dynamic and this has important implications for epigenetic epidemiology. The randomness of maintaining and mitotically transmitting DNA methylation patterns may potentially dilute the putative epigenetic signatures of an adverse exposure early in life (e.g. to low SES in childhood) observed decades later. Of note, recent studies indicate that DNA methylation patterns in leucocytes undergo considerable changes during the first years of life [22]. Thus, on top of the previously discussed question of whether DNA methylation at a specific locus actually influences transcriptional activity, researchers should also aim to establish whether the small DNA methylation differences often observed between groups (expressed as absolute difference, relative difference or relative to the variation in the population) translate into differences in gene expression in the relevant tissue. It will be of particular interest to see whether the effects of such modest differences, while perhaps of little consequence individually, may shift transcription of a biological process or functional network when they co-occur with other changes to the methylome [23].

Little is known about the actual scale and extent of between-individual variation in DNA methylation across the genome. In this regard, public genome-scale resources need to be created that document inter-individual differences in DNA methylation and gene expression, in addition to the reference epigenomes that are currently being generated.
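
The three ways of expressing a between-group methylation difference mentioned in this section (absolute, relative, and relative to population variation) can be made concrete with a minimal Python sketch. Everything in it is an assumption made for illustration: the function name `methylation_effect_sizes` is hypothetical, the methylation fractions are invented rather than taken from the current study, and the pooled within-group standard deviation is only one of several reasonable ways to scale a difference by population variation.

```python
from statistics import mean, variance


def methylation_effect_sizes(reference, exposed):
    """Express a between-group methylation difference in three ways.

    Both arguments are lists of methylation fractions (0 to 1) measured
    at a single CG dinucleotide, one value per individual.
    """
    mean_ref, mean_exp = mean(reference), mean(exposed)

    # Absolute difference, in methylation fraction (0.05 = 5 percentage points).
    absolute = mean_exp - mean_ref

    # Relative difference, as a fraction of the reference group mean.
    relative = absolute / mean_ref

    # Difference relative to population variation: divide by the pooled
    # within-group standard deviation (a Cohen's d style measure).
    n_ref, n_exp = len(reference), len(exposed)
    pooled_var = (
        (n_ref - 1) * variance(reference) + (n_exp - 1) * variance(exposed)
    ) / (n_ref + n_exp - 2)
    standardized = absolute / pooled_var ** 0.5

    return {"absolute": absolute, "relative": relative, "standardized": standardized}


# Hypothetical methylation fractions, invented for illustration only.
high_ses = [0.40, 0.42, 0.38, 0.41, 0.39]
low_ses = [0.45, 0.47, 0.43, 0.46, 0.44]

effects = methylation_effect_sizes(high_ses, low_ses)
for name, value in effects.items():
    print(f"{name}: {value:.3f}")
```

With these invented values, a difference of five percentage points corresponds to a 12.5% relative difference and to roughly three pooled standard deviations, because the made-up groups vary very little. In a population with wider inter-individual variation, the same absolute difference would give a much smaller standardized value, which is one reason the choice of scale matters when judging whether an effect is small.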
We lack a framework for the analysis of genome-wide epigenetic data

The results of GWASs are relatively easy to judge. Quality-control steps are well-defined and reported, individually testing every genetic variant [i.e. single nucleotide polymorphism (SNP)] is straightforward, and levels of genome-wide statistical significance are clear. For EWASs, the analytical methodology is very much under construction. For example, in the current study it was not possible to attain genome-wide levels of significance, which is acceptable for an exploratory study, but makes it difficult to fully interpret the reported differences. Because of the vast range of methods currently being used to assess DNA methylation, meta-analyses across different studies are difficult. The adoption of a common technology platform, such as the new Illumina 450 k Methylation Beadchip, across multiple studies would provide an excellent opportunity to converge on widely accepted guidelines for the analysis and integration of EWAS data. Apart from pre-processing procedures (quality control, normalization, handling different probe types, accounting for genetic variation, etc.), elements of these guidelines should deal with the analysis of individual CG dinucleotides vs groups of (correlated) adjacent CGs, the use of genome annotations in the analysis (histone states, promoter types, CG content, etc.), and levels of epigenome-wide significance for various analyses. An important aspect will be the exploration of the previously mentioned gene-set testing methods in the context of DNA methylation since they will be vital to obtain meaningful interpretations of genome-wide data in terms of underlying biological processes or genomic functions [e.g. promoters, enhancers, inter/intragenic CG island (shores), etc.].
For example, commonly used enrichment methods assume independence within a gene set and, apart from consistency in biological signal in a gene set, statistical significance may reflect consistency in other characteristics such as GC content, coverage or other sequence features [24]. Alternative implementations of gene-set testing methods include global testing approaches [25].

Finally, it will be important to adopt an integrative paradigm based on the combination of genetic and epigenetic epidemiological data [26]. Of particular relevance in this respect is evidence for the widespread occurrence of allele-specific DNA methylation (ASM) across the genome. Recent studies have shown that there are considerable inter-individual differences in ASM, which are frequently associated with genetic variation but can also be mediated by genomic imprinting (i.e. the parent-of-origin dependent silencing of expression by epigenetic mechanisms), environmental influences and apparently stochastic factors in the cell [27, 28]. ASM can mask the effect of risk alleles by silencing their expression, and also provides a potential mechanism underlying gene–environment interactions [26]. Furthermore, ASM may contribute towards the apparent ‘missing heritability’ of many complex diseases and the low penetrance often reported for SNPs identified by GWASs [29].

We have to manage high expectations

There is considerable interest in epigenetic research in the popular press. The current study is a vivid illustration: even though the authors deem it preliminary, it was widely covered by the media [30]. Epigenetics should avoid some of the hype that surrounded the early days of genetic epidemiology. After the draft human genome sequence was announced in 2001, it was widely perceived that we would soon understand the causes of most common diseases and how to treat them. This expectation was not realistic, but was not always renounced by geneticists.
Currently, many scientists outside the field are disappointed by results of human genetics, and in particular GWASs, despite their overall considerable success. Genetic epidemiology has proven to be harder than expected despite the favourable starting point of thousands of Mendelian diseases and the high heritabilities associated with most traits to be explained. Very much like genetics, epigenetics will not be able to deliver the miracles it is sometimes claimed it will.

In conclusion, epigenetic epidemiology is early in its development and susceptible to new ideas and approaches. Only a few years ago empirical papers were greatly outnumbered by reviews. Now, reference epigenomes are produced at great pace (see http://epigenomeatlas.org) [8, 9]. Moreover, furthered by pilot studies like the one from Nada Borghol et al. [4], the outline of the infrastructure required for EWASs is emerging. Crucial elements include optimal study designs, benchmarking technology and data analysis approaches that are statistically and biologically sound. An additional key aspect to the successful design and interpretation of epigenetic epidemiological studies will be the creation of public genome-scale resources focusing on inter-individual variation incorporating epigenomic, DNA sequence and transcriptomic data. Education, hard work and a certain degree of luck will get us there, which is not very different from the remedy against low SES.

Funding

NGI/NWO (#93518027, to B.T.H.); NGI/NWO-funded Netherlands Consortium for Healthy Ageing (NCHA) (#05060810, to B.T.H.); NIH grant (AG036039, to J.M.).

          Most cited references (21)

          Principles and challenges of genomewide DNA methylation analysis.

          Methylation of cytosine bases in DNA provides a layer of epigenetic control in many eukaryotes that has important implications for normal biology and disease. Therefore, profiling DNA methylation across the genome is vital to understanding the influence of epigenetics. There has been a revolution in DNA methylation analysis technology over the past decade: analyses that previously were restricted to specific loci can now be performed on a genome-scale and entire methylomes can be characterized at single-base-pair resolution. However, there is such a diversity of DNA methylation profiling techniques that it can be challenging to select one. This Review discusses the different approaches and their relative merits and introduces considerations for data analysis.

            Epigenome-wide association studies for common human diseases.

            Despite the success of genome-wide association studies (GWASs) in identifying loci associated with common diseases, a substantial proportion of the causality remains unexplained. Recent advances in genomic technologies have placed us in a position to initiate large-scale studies of human disease-associated epigenetic variation, specifically variation in DNA methylation. Such epigenome-wide association studies (EWASs) present novel opportunities but also create new challenges that are not encountered in GWASs. We discuss EWAS design, cohort and sample selections, statistical significance and power, confounding factors and follow-up studies. We also discuss how integration of EWASs with GWASs can help to dissect complex GWAS haplotypes for functional analysis.

              Validation of a DNA methylation microarray for 450,000 CpG sites in the human genome.

              DNA methylation is the most studied epigenetic mark and CpG methylation is central to many biological processes and human diseases. Since cancer has highlighted the contribution to disease of aberrant DNA methylation patterns, such as the presence of promoter CpG island hypermethylation-associated silencing of tumor suppressor genes and global DNA hypomethylation defects, their importance will surely become apparent in other pathologies. However, advances in obtaining comprehensive DNA methylomes are hampered by the high cost and time-consuming aspects of the single nucleotide methods currently available for whole genome DNA methylation analyses. Following the success of the standard CpG methylation microarrays for 1,505 CpG sites and 27,000 CpG sites, we have validated in vivo the newly developed 450,000 (450K) cytosine microarray (Illumina). The 450K microarray includes CpG and CNG sites, CpG islands/shores/shelves/open sea, non-coding RNA (microRNAs and long non-coding RNAs) and sites surrounding the transcription start sites (-200 bp to -1,500 bp, 5'-UTRs and exons 1) for coding genes, but also for the corresponding gene bodies and 3'-UTRs, in addition to intergenic regions derived from GWAS studies. Herein, we demonstrate that the 450K DNA methylation array can consistently and significantly detect CpG methylation changes in the HCT-116 colorectal cancer cell line in comparison with normal colon mucosa or HCT-116 cells with defective DNA methyltransferases (DKO). The provided validation highlights the potential use of the 450K DNA methylation microarray as a useful tool for ongoing and newly designed epigenome projects.

                Author and article information

                Journal
                Int J Epidemiol
                0300-5771
                1464-3685
                February 2012
                23 January 2012
                Volume: 41
                Issue: 1
                Pages: 74-78
                Affiliations
                1Molecular Epidemiology, Leiden University Medical Center, Leiden, The Netherlands, 2Netherlands Consortium for Healthy Ageing, Leiden, The Netherlands and 3Institute of Psychiatry, King's College London, London, UK
                Author notes
                *Corresponding author. Molecular Epidemiology, Leiden University Medical Center, Postal Zone S-5-P, PO Box 9600, 2300 RC, Leiden, The Netherlands. E-mail: bas.heijmans@lumc.nl
                Article
                dyr225
                10.1093/ije/dyr225
                3304528
                22269254
                Published by Oxford University Press on behalf of the International Epidemiological Association © The Author 2012; all rights reserved.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License ( http://creativecommons.org/licenses/by-nc/3.0), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                : 1 December 2011
                Page count
                Pages: 5
                Categories
                Epigenetic Epidemiology

                Public health
