
      Gene, Environment and Methylation (GEM): a tool suite to efficiently navigate large scale epigenome wide association studies and integrate genotype and interaction between genotype and environment

      research-article


          Abstract

          Background

          The interplay among genetic, environmental and epigenetic variation is not fully understood. Advances in high-throughput genotyping methods, high-density DNA methylation detection and well-characterized sample collections enable epigenome-wide association studies (EWAS) at the genomic and population levels. The field has extended to interrogate the influence of the interaction between environmental and genetic factors (GxE) on epigenetic variation. In addition, the detection of methylation quantitative trait loci (methQTLs) and their association with health status has enhanced our knowledge of epigenetic mechanisms in disease trajectory. However, the analysis of this type of data brings computational challenges, and there are few practical solutions that enable large-scale studies in standard computational environments.

          Results

          GEM is a highly efficient R tool suite for performing epigenome-wide association studies (EWAS). GEM provides three major functions, GEM_Emodel, GEM_Gmodel and GEM_GxEmodel, to study the interplay of Gene, Environment and Methylation (GEM). Within GEM, the pre-existing “Matrix eQTL” package is utilized and extended to study methylation quantitative trait loci (methQTL) and the interaction of genotype and environment (GxE) in determining DNA methylation variation, using matrix-based iterative correlation and memory-efficient data analysis. Benchmarking on a publicly available dataset demonstrated that GEM can facilitate reliable genome-wide methQTL and GxE analysis on a standard laptop computer within minutes.
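
The idea behind matrix-based correlation can be illustrated with a minimal sketch. This is not code from GEM or Matrix eQTL (both are R packages); it is a standard-library Python illustration of the general technique, and the function names `standardize`, `correlation_matrix` and `t_statistic` are hypothetical. It assumes the simplest case of one SNP against one CpG with no covariates: once every row is centred and scaled to unit length, the Pearson correlation of every SNP-CpG pair is an entry of a single matrix product, and each correlation converts directly to the t statistic of the corresponding simple linear regression.

```python
import math


def standardize(row):
    """Centre a row and scale it to unit length (hypothetical helper).

    After this step, the dot product of two rows equals their Pearson r.
    """
    mean = sum(row) / len(row)
    centered = [x - mean for x in row]
    norm = math.sqrt(sum(c * c for c in centered))
    if norm == 0:
        raise ValueError("constant row has undefined correlation")
    return [c / norm for c in centered]


def correlation_matrix(snps, cpgs):
    """Return all SNP-by-CpG Pearson correlations as one matrix product.

    snps: list of genotype rows (one row per SNP, one column per sample).
    cpgs: list of methylation rows (one row per CpG, same sample order).
    """
    s = [standardize(r) for r in snps]
    m = [standardize(r) for r in cpgs]
    # Entry [i][j] is the dot product of SNP row i and CpG row j.
    return [[sum(a * b for a, b in zip(sr, mr)) for mr in m] for sr in s]


def t_statistic(r, n):
    """Convert a correlation r over n samples to a regression t statistic."""
    residual = 1 - r * r
    if residual <= 0:
        # Perfect (anti)correlation: the statistic is unbounded.
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / residual)


if __name__ == "__main__":
    genotypes = [[1, 2, 3, 4]]                     # one toy SNP, four samples
    methylation = [[1, 3, 2, 4], [4, 3, 2, 1]]     # two toy CpGs
    for row in correlation_matrix(genotypes, methylation):
        print([(round(r, 3), t_statistic(r, 4)) for r in row])
```

The appeal of this formulation is that the expensive step is a single matrix multiplication, which optimized linear algebra routines handle well and which can be applied to slices of the data so that the full genotype and methylation matrices never need to be held in memory at once.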

          Conclusions

          The GEM package facilitates efficient EWAS in large cohorts. It is written in R and can be freely downloaded from Bioconductor at https://www.bioconductor.org/packages/GEM/.


                Author and article information

                Contributors
                ASCKKWOH@ntu.edu.sg
                Journal
                BMC Bioinformatics
                BioMed Central (London)
                ISSN: 1471-2105
                Published: 2 August 2016
                Volume: 17
                Article number: 299
                Affiliations
                [1] Singapore Institute for Clinical Sciences (SICS), Agency for Science Technology and Research (A*STAR), Singapore, 117609 Singapore
                [2] School of Computer Science and Engineering, Nanyang Technological University (NTU), Singapore, 639798 Singapore
                [3] Yong Loo Lin School of Medicine, National University of Singapore (NUS), Singapore, 119228 Singapore
                Article
                DOI: 10.1186/s12859-016-1161-z
                PMCID: PMC4970299
                PMID: 27480116
                © The Author(s). 2016

                Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

                History
                Received: 19 March 2016
                Accepted: 21 July 2016
                Funding
                Funded by: M4020242 ARC6/15-KWOH CHEE KEONG
                Categories
                Software
                Custom metadata

                Bioinformatics & Computational biology
                matrix operation, EWAS, methQTL, GxE
