      Generalization and Dilution of Association Results from European GWAS in Populations of Non-European Ancestry: The PAGE Study


          Abstract

          A multi-ethnic study demonstrates that differential genetic risk associations across ethnicities, which compromise the extrapolation of genetic disease risk models from European populations to other ethnicities, arise more from genetic structure than from environmental or global genetic background.

          Abstract

          The vast majority of genome-wide association study (GWAS) findings reported to date are from populations with European Ancestry (EA), and it is not yet clear how broadly the genetic associations described will generalize to populations of diverse ancestry. The Population Architecture Using Genomics and Epidemiology (PAGE) study is a consortium of multi-ancestry, population-based studies formed with the objective of refining our understanding of the genetic architecture of common traits emerging from GWAS. In the present analysis of five common diseases and traits, including body mass index, type 2 diabetes, and lipid levels, we compare direction and magnitude of effects for GWAS-identified variants in multiple non-EA populations against EA findings. We demonstrate that, in all populations analyzed, a significant majority of GWAS-identified variants have allelic associations in the same direction as in EA, with none showing a statistically significant effect in the opposite direction, after adjustment for multiple testing. However, 25% of tagSNPs identified in EA GWAS have significantly different effect sizes in at least one non-EA population, and these differential effects were most frequent in African Americans where all differential effects were diluted toward the null. We demonstrate that differential LD between tagSNPs and functional variants within populations contributes significantly to dilute effect sizes in this population. Although most variants identified from GWAS in EA populations generalize to all non-EA populations assessed, genetic models derived from GWAS findings in EA may generate spurious results in non-EA populations due to differential effect sizes. Regardless of the origin of the differential effects, caution should be exercised in applying any genetic risk prediction model based on tagSNPs outside of the ancestry group in which it was derived. 
Models based directly on functional variation may generalize more robustly, but the identification of functional variants remains challenging.
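As a rough illustration of the comparison described in the abstract (same direction of effect, but possibly a different magnitude), the sketch below applies a generic two-sided z-test to two independent effect estimates. This is a standard textbook test and is not claimed to be the PAGE study's actual procedure; the function names and the example numbers are hypothetical.

```python
import math


def same_direction(beta_ref, beta_other):
    """True when both effect estimates have the same sign."""
    return beta_ref * beta_other > 0


def effect_difference_test(beta_ref, se_ref, beta_other, se_other):
    """Two-sided z-test for a difference between two independent estimates.

    Assumes the estimates come from non-overlapping samples, so the
    variance of their difference is the sum of the squared standard errors.
    Returns (z, p_value).
    """
    z = (beta_other - beta_ref) / math.sqrt(se_ref ** 2 + se_other ** 2)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value


# Hypothetical per-allele effects for one tagSNP (illustrative values only).
z, p = effect_difference_test(beta_ref=0.10, se_ref=0.01,
                              beta_other=0.04, se_other=0.02)
print(same_direction(0.10, 0.04), round(z, 3), p)
```

When many tagSNPs and populations are tested, the resulting p-values would still need an adjustment for multiple testing, as the abstract notes.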

          Author Summary

          The number of known associations between human diseases and common genetic variants has grown dramatically in the past decade, most being identified in large-scale genetic studies of people of Western European origin. But because the frequencies of genetic variants can differ substantially between continental populations, it's important to assess how well these associations can be extended to populations with different continental ancestry. Are the correlations between genetic variants, disease endpoints, and risk factors consistent enough for genetic risk models to be reliably applied across different ancestries? Here we describe a systematic analysis of disease outcome and risk-factor–associated variants (tagSNPs) identified in European populations, in which we test whether the effect size of a tagSNP is consistent across six populations with significant non-European ancestry. We demonstrate that although nearly all such tagSNPs have effects in the same direction across all ancestries (i.e., variants associated with higher risk in Europeans will also be associated with higher risk in other populations), roughly a quarter of the variants tested have significantly different magnitude of effect (usually lower) in at least one non-European population. We therefore advise caution in the use of tagSNP-based genetic disease risk models in populations that have a different genetic ancestry from the population in which original associations were first made. We then show that this differential strength of association can be attributed to population-dependent variations in the correlation between tagSNPs and the variant that actually determines risk—the so-called functional variant. Risk models based on functional variants are therefore likely to be more robust than tagSNP-based models.
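The dilution argument can be made concrete with a small sketch. It assumes a simple additive model with a single functional variant, Hardy-Weinberg genotype frequencies, and a signed LD correlation r between the tagSNP and the functional variant; under those assumptions the expected per-allele effect observed at the tagSNP is the functional effect scaled by r and by the ratio of genotype standard deviations. The function name and all numbers are hypothetical, not values from the study.

```python
import math


def expected_tag_effect(beta_functional, r, p_tag, p_functional):
    """Expected marginal per-allele effect at a tagSNP.

    Assumes an additive trait model with one functional variant and
    Hardy-Weinberg proportions, so each genotype variance is 2p(1-p).
    The regression slope on the tagSNP is then
    beta_functional * r * sd(functional) / sd(tag).
    """
    var_tag = 2 * p_tag * (1 - p_tag)
    var_functional = 2 * p_functional * (1 - p_functional)
    return beta_functional * r * math.sqrt(var_functional / var_tag)


# Same functional effect and allele frequencies, different LD (illustrative).
strong_ld = expected_tag_effect(0.2, r=0.9, p_tag=0.3, p_functional=0.3)
weak_ld = expected_tag_effect(0.2, r=0.4, p_tag=0.3, p_functional=0.3)
print(strong_ld, weak_ld)
```

With identical allele frequencies the tagSNP effect is simply r times the functional effect, so a population in which the tagSNP is only weakly correlated with the functional variant shows an effect in the same direction but closer to the null.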


                Author and article information

                Contributors
                Role: Academic Editor
                Journal
                PLoS Biology (PLoS Biol)
                Public Library of Science (San Francisco, USA)
                ISSN: 1544-9173, 1545-7885
                Published: 17 September 2013
                Volume: 11
                Issue: 9
                Article: e1001661
                Affiliations
                [1] Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America
                [2] Department of Genetics, Rutgers University, Piscataway, New Jersey, United States of America
                [3] Department of Epidemiology and Carolina Center for Genome Sciences, University of North Carolina, Chapel Hill, North Carolina, United States of America
                [4] Department of Preventive Medicine, Keck School of Medicine, University of Southern California/Norris Comprehensive Cancer Center, Los Angeles, California, United States of America
                [5] Center for Child Health, Behavior, and Development, Seattle Children's Research Institute, Seattle, Washington, United States of America
                [6] Department of Statistics & Biostatistics, Rutgers University, Piscataway, New Jersey, United States of America
                [7] Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania, United States of America
                [8] Translational Genomics Research Institute, Phoenix, Arizona, United States of America
                [9] Department of Biology & Environmental Science at Heidelberg University, Tiffin, Ohio, United States of America
                [10] Department of Molecular Physiology and Biophysics, Center for Human Genetics Research, Vanderbilt University, Nashville, Tennessee, United States of America
                [11] Department of Family Medicine, Brown University, Pawtucket, Rhode Island, United States of America
                [12] Division of Biostatistics & Epidemiology, Department of Preventive Medicine, College of Medicine, The University of Tennessee Health Science Center, Memphis, Tennessee, United States of America
                [13] Epidemiology Program, University of Hawaii Cancer Center, Honolulu, Hawaii, United States of America
                [14] Division of Genomic Medicine, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland, United States of America
                Georgia Institute of Technology, United States of America
                Author notes

                Membership of the PAGE study is provided in the Acknowledgments.

                The authors have declared that no competing interests exist. The contents of this paper are solely the responsibility of the authors and do not necessarily represent the official views of the NIH, the Centers for Disease Control, the Indian Health Service, or any other funding agency.

                The author(s) have made the following declarations about their contributions: Conceived and designed the experiments: CSC KEN CLK DCC CAH FRS MDR UP LAH. Performed the experiments: AY KLS LD MDF DJD. Analyzed the data: NF SB CC LD MDF FRS. Contributed reagents/materials/analysis tools: FT PAGE Consortium. Wrote the paper: CSC CLK. Editorial feedback and revisions: KEN CAH MDF SB FRS UP NF MDR DJD CBE FT TCM GH LLM DCC LAH.

                Article
                PBIOLOGY-D-13-00491
                10.1371/journal.pbio.1001661
                3775722
                24068893
                a1a92c08-dbdc-4c7f-9fe6-e44c747104b8
                Copyright © 2013

                This is an open-access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.

                History
                Received: 8 February 2013
                Accepted: 8 August 2013
                Page count
                Pages: 11
                Funding
                The Population Architecture Using Genomics and Epidemiology (PAGE) program is funded by the National Human Genome Research Institute (NHGRI). The data and materials included in this report result from a collaboration between the following studies: The “Epidemiologic Architecture for Genes Linked to Environment (EAGLE)” is funded through the NHGRI PAGE program (U01HG004798-01). Genotyping services for select NHANES III SNPs presented here were also provided by the Johns Hopkins University under federal contract number (N01-HV-48195) from NHLBI. The study participants derive from the National Health and Nutrition Examination Surveys (NHANES), and these studies are supported by the Centers for Disease Control and Prevention. The findings and conclusions in this report are those of the authors and do not necessarily represent the views of the Centers for Disease Control and Prevention. The Multiethnic Cohort study (MEC) characterization of epidemiological architecture is funded through the NHGRI PAGE program (U01HG004802). The MEC study is funded through the National Cancer Institute (R37CA54281, R01 CA63, P01CA33619, U01CA136792, and U01CA98758). Funding support for the “Epidemiology of putative genetic variants: The Women's Health Initiative” study is provided through the NHGRI PAGE program (U01HG004790). The WHI program is funded by the National Heart, Lung, and Blood Institute; NIH; and US Department of Health and Human Services through contracts N01WH22110, 24152, 32100-2, 32105-6, 32108-9, 32111-13, 32115, 32118-32119, 32122, 42107-26, 42129-32, and 44221. Funding support for the Genetic Epidemiology of Causal Variants Across the Life Course (CALiCo) program was provided through the NHGRI PAGE program (U01HG004803). 
The following studies contributed to this manuscript and are funded by the following agencies: The Atherosclerosis Risk in Communities (ARIC) Study is carried out as a collaborative study supported by National Heart, Lung, and Blood Institute contracts N01-HC-55015, N01-HC-55016, N01-HC-55018, N01-HC-55019, N01-HC-55020, N01-HC-55021, N01-HC-55022. The Coronary Artery Risk Development in Young Adults (CARDIA) study is supported by the following National Institutes of Health, National Heart, Lung and Blood Institute contracts: N01-HC-95095; N01-HC-48047; N01-HC-48048; N01-HC-48049; N01-HC-48050; N01-HC-45134; N01-HC-05187; and N01-HC-45205. The Cardiovascular Health Study (CHS) is supported by NHLBI contracts HHSN268201200036C, N01-HC-85239, N01-HC-85079 through N01-HC-85086; N01-HC-35129, N01 HC-15103, N01 HC-55222, N01-HC-75150, N01-HC-45133 and NHLBI grant HL080295, with additional contribution from NINDS. Additional support was provided through AG-023629, AG-15928, AG-20098, and AG-027058 from the NIA. See also https://chs-nhlbi.org/. CHS GWAS DNA handling and genotyping was supported in part by National Center for Research Resources grant M01-RR00425 to the Cedars-Sinai General Clinical Research Center Genotyping core and National Institute of Diabetes and Digestive and Kidney Diseases grant DK063491 to the Southern California Diabetes Endocrinology Research Center. The Strong Heart Study (SHS) is supported by NHLBI grants U01 HL65520, U01 HL41642, U01 HL41652, U01 HL41654, and U01 HL65521. The opinions expressed in this paper are those of the author(s) and do not necessarily reflect the views of the Indian Health Service. Assistance with phenotype harmonization, SNP selection and annotation, data cleaning, data management, integration and dissemination, and general study coordination was provided by the PAGE Coordinating Center (U01HG004801-01). The National Institutes of Mental Health also contributes to the support for the Coordinating Center. 
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
                Categories
                Research Article

                Life sciences
