
      Will Big Data Close the Missing Heritability Gap?


          Abstract

          Despite the important discoveries reported by genome-wide association (GWA) studies, for most traits and diseases the prediction R-squared (R-sq.) achieved with genetic scores remains considerably lower than the trait heritability. Modern biobanks will soon deliver unprecedentedly large biomedical data sets: Will the advent of big data close the gap between the trait heritability and the proportion of variance that can be explained by a genomic predictor? We addressed this question using Bayesian methods and a data analysis approach that produces a surface response relating prediction R-sq. with sample size and model complexity (e.g., number of SNPs). We applied the methodology to data from the interim release of the UK Biobank. Focusing on human height as a model trait and using 80,000 records for model training, we achieved a prediction R-sq. in testing (n = 22,221) of 0.24 (95% C.I.: 0.23-0.25). Our estimates show that prediction R-sq. increases with sample size, reaching an estimated plateau at values that ranged from 0.1 to 0.37 for models using 500 and 50,000 (GWA-selected) SNPs, respectively. Soon much larger data sets will become available. Using the estimated surface response, we forecast that larger sample sizes will lead to further improvements in prediction R-sq. We conclude that big data will lead to a substantial reduction of the gap between trait heritability and the proportion of interindividual differences that can be explained with a genomic predictor. However, even with the power of big data, for complex traits we anticipate that the gap between prediction R-sq. and trait heritability will not be fully closed.
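The plateau behavior described in the abstract can be illustrated with a widely used deterministic approximation, in which the squared accuracy of a genomic predictor is N·h² / (N·h² + M) for N training records, M effective independent loci, and a heritability h² captured by the SNP set. The sketch below is only an illustration of that approximation, not the authors' Bayesian response-surface method; the function name and the values h² = 0.5 and M = 50,000 are hypothetical choices, not estimates from the article.

```python
def expected_prediction_r2(n_train, h2_snp, m_eff):
    """Approximate expected prediction R-squared for a phenotype.

    Uses the deterministic approximation
        accuracy^2 = N * h2 / (N * h2 + M)
    and scales it by h2, because a predictor of the genetic value can
    explain at most a fraction h2 of the phenotypic variance.

    n_train : number of training records (N)
    h2_snp  : heritability captured by the SNP set (assumed, hypothetical)
    m_eff   : effective number of independent loci (assumed, hypothetical)
    """
    if n_train < 0 or m_eff <= 0 or not 0.0 <= h2_snp <= 1.0:
        raise ValueError("need n_train >= 0, m_eff > 0 and 0 <= h2_snp <= 1")
    accuracy_sq = (n_train * h2_snp) / (n_train * h2_snp + m_eff)
    return h2_snp * accuracy_sq


if __name__ == "__main__":
    # Hypothetical inputs chosen only to show the shape of the curve.
    for n in (10_000, 80_000, 500_000, 5_000_000):
        r2 = expected_prediction_r2(n, h2_snp=0.5, m_eff=50_000)
        print(f"N = {n:>9,d}  expected R-sq. = {r2:.3f}")
```

Under this approximation the expected R-sq. rises with N but approaches the assumed h² only asymptotically, which is consistent with the abstract's conclusion that larger samples narrow the gap without fully closing it.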


                Author and article information

                Journal: Genetics
                Publisher: Genetics Society of America
                ISSN: 1943-2631, 0016-6731
                Published: November 2017
                Volume: 207
                Issue: 3
                Affiliations
                [1] Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, Michigan 48824.
                [2] Institute for Quantitative Health Science and Engineering, Michigan State University, East Lansing, Michigan 48824.
                [3] Department of Physics and Astronomy, Michigan State University, East Lansing, Michigan 48824.
                [4] Vice President for Research and Graduate Studies, Michigan State University, East Lansing, Michigan 48824.
                [5] Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, Michigan 48824; gustavoc@msu.edu.
                [6] Department of Statistics and Probability, Michigan State University, East Lansing, Michigan 48824.
                Article
                Publisher ID: genetics.117.300271
                DOI: 10.1534/genetics.117.300271
                PMCID: PMC5676235
                PMID: 28893854
                Copyright © 2017 by the Genetics Society of America.

                Keywords: BGLR, Bayesian, GenPred, Genomic Selection, Shared Data Resources, UK Biobank, big data, genomic prediction, prediction of complex traits, whole-genome regressions
