
      Significant variation in the performance of DNA methylation predictors across data preprocessing and normalization strategies


          Abstract

          Background

          DNA methylation (DNAm)-based predictors hold great promise to serve as clinical tools for health interventions and disease management. While these algorithms often have high prediction accuracy, the consistency of their performance remains to be determined. We therefore conduct a systematic evaluation across 101 different DNAm data preprocessing and normalization strategies and assess how each analytical strategy affects the consistency of 41 DNAm-based predictors.

          Results

          Our analyses are conducted in a large EPIC DNAm array dataset from the Jackson Heart Study (N = 2053) that includes 146 pairs of technical replicate samples. By estimating the average absolute agreement between replicate pairs, we show that 32 out of 41 predictors (78%) demonstrate excellent consistency when appropriate data processing and normalization steps are implemented. Across all pairs of predictors, we find a moderate correlation in performance across analytical strategies (mean rho = 0.40, SD = 0.27), highlighting significant heterogeneity in performance across algorithms. Furthermore, successful or unsuccessful removal of technical variation significantly impacts downstream phenotypic association analyses, such as all-cause mortality risk associations.
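One common way to quantify absolute agreement between replicate pairs is the intraclass correlation coefficient in its two-way random-effects, absolute-agreement, single-measurement form, often written ICC(2,1). The abstract does not state which ICC form the authors computed, so the sketch below is an assumption for illustration, and the function name and toy scores are hypothetical rather than taken from the study.

```python
from statistics import mean


def icc_absolute_agreement(pairs):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    `pairs` is a list of rows, one row per sample, each row holding the
    k replicate scores for that sample (k = 2 for technical replicate pairs).
    """
    n = len(pairs)          # number of samples
    k = len(pairs[0])       # number of replicates per sample
    grand = mean(v for row in pairs for v in row)
    row_means = [mean(row) for row in pairs]
    col_means = [mean(row[j] for row in pairs) for j in range(k)]

    # Two-way ANOVA sums of squares without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in pairs for v in row)
    ss_error = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)                 # between-sample mean square
    msc = ss_cols / (k - 1)                 # between-replicate mean square
    mse = ss_error / ((n - 1) * (k - 1))    # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Toy predictor scores (hypothetical values, not study data).
identical = [[50.0, 50.0], [60.0, 60.0], [70.0, 70.0]]
shifted = [[50.0, 55.0], [60.0, 65.0], [70.0, 75.0]]  # replicate 2 is always 5 higher

icc_identical = icc_absolute_agreement(identical)
icc_shifted = icc_absolute_agreement(shifted)
print(icc_identical, icc_shifted)
```

In the second toy dataset the replicates are perfectly correlated but differ by a constant offset. An absolute-agreement ICC penalizes that offset, which is why it is a stricter measure of technical consistency than a plain correlation.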

          Conclusions

          We show that DNAm-based algorithms are sensitive to technical variation. The right choice of data processing strategy is important to achieve reproducible estimates and improve prediction accuracy in downstream phenotypic association analyses. For each of the 41 DNAm predictors, we report its degree of consistency and provide the best performing analytical strategy as a guideline for the research community. As DNAm-based predictors become increasingly widely used, our work helps improve their performance and standardize their implementation.
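Reporting a degree of consistency and a best performing strategy per predictor can be sketched as a lookup over a table of agreement scores. Everything in the example is hypothetical: the predictor and strategy names, the ICC values, and the use of the Koo and Li (2016) reliability cut-offs (below 0.5 poor, 0.5 to 0.75 moderate, 0.75 to 0.9 good, above 0.9 excellent), which the abstract does not confirm as the authors' criterion.

```python
# Hypothetical agreement scores: predictor -> {strategy name: ICC}.
# These numbers are illustrative only and are not results from the study.
icc_table = {
    "predictor_A": {"strategy_1": 0.82, "strategy_2": 0.95, "strategy_3": 0.91},
    "predictor_B": {"strategy_1": 0.71, "strategy_2": 0.64, "strategy_3": 0.88},
}


def label_consistency(icc):
    """Map an ICC to a reliability label (assumed Koo and Li 2016 cut-offs)."""
    if icc < 0.5:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.9:
        return "good"
    return "excellent"


def best_strategy(per_strategy):
    """Return (strategy name, ICC) for the strategy with the highest ICC."""
    name = max(per_strategy, key=per_strategy.get)
    return name, per_strategy[name]


summary = {}
for predictor, per_strategy in icc_table.items():
    name, icc = best_strategy(per_strategy)
    summary[predictor] = (name, icc, label_consistency(icc))

for predictor, (name, icc, label) in summary.items():
    print(f"{predictor}: best = {name}, ICC = {icc:.2f}, consistency = {label}")
```

Choosing the strategy by maximum agreement per predictor, rather than one strategy for all predictors, reflects the abstract's point that performance is heterogeneous across algorithms.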

          Supplementary Information

          The online version contains supplementary material available at 10.1186/s13059-022-02793-w.


                Author and article information

                Contributors
                anilori.contact@gmail.com
                ophoff@ucla.edu
                Journal
                Genome Biol
                Genome Biology
                BioMed Central (London)
                1474-7596
                1474-760X
                24 October 2022
                2022; 23: 225
                Affiliations
                [1] GRID grid.19006.3e, ISNI 0000 0000 9632 6718, Center for Neurobehavioral Genetics, Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles, 695 Charles E. Young Drive South, Los Angeles, CA 90095-176, USA
                [2] GRID grid.19006.3e, ISNI 0000 0000 9632 6718, Department of Human Genetics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, USA
                [3] GRID grid.19006.3e, ISNI 0000 0000 9632 6718, Department of Biostatistics, Fielding School of Public Health, University of California Los Angeles, Los Angeles, CA, USA
                [4] GRID grid.5645.2, ISNI 0000 0004 0459 992X, Department of Psychiatry, Erasmus University Medical Center, Rotterdam, The Netherlands
                Author information
                http://orcid.org/0000-0003-0579-0905
                Article
                2793
                10.1186/s13059-022-02793-w
                9590227
                36280888
                96e0be88-abc2-4469-a8be-1f0090b0a8c1
                © The Author(s) 2022

                Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

                History
                : 8 October 2021
                : 11 October 2022
                Funding
                Funded by: FundRef http://dx.doi.org/10.13039/100000009, Foundation for the National Institutes of Health;
                Award ID: 1U01AG060908-01
                Categories
                Research

                Genetics
                DNA methylation, Infinium MethylationEPIC array, DNAm predictors, consistency, replicability, biomarkers, Jackson Heart Study
