Abstract
Background
Heritability estimates have revealed an important contribution of SNP variants for
most common traits; however, SNP analysis by single-trait genome-wide association
studies (GWAS) has failed to uncover their impact. In this study, we applied a multitrait
GWAS approach to discover additional factors underlying the missing heritability of human
anthropometric variation.
Methods
We analysed 205 traits, including diseases identified at baseline, in the GCAT cohort
(Genomes for Life: Cohort Study of the Genomes of Catalonia) (n=4988), a Mediterranean
adult population-based cohort study from the south of Europe. We estimated SNP heritability
and performed single-trait GWAS for all traits using 15 million SNP variants. Then,
we applied a multitrait approach to study genome-wide association with anthropometric
measures in a two-stage meta-analysis with the UK Biobank cohort (n=336 107).
Results
Heritability estimates (eg, skin colour, alcohol consumption, smoking habit, body
mass index, educational level or height) revealed an important contribution of SNP
variants, ranging from 18% to 77%. Single-trait analysis identified 1785 SNPs at the
genome-wide significance threshold. Of these, several previously reported single-trait
hits were confirmed in our sample, with LINC01432 (p=1.9×10⁻⁹) variants associated with
male baldness, LDLR variants with hyperlipidaemia (ICD-9:272) (p=9.4×10⁻¹⁰) and variants
in IRF4 (p=2.8×10⁻⁵⁷), SLC45A2 (p=2.2×10⁻¹³⁰), HERC2 (p=2.8×10⁻¹⁷⁶), OCA2 (p=2.4×10⁻¹²¹)
and MC1R (p=7.7×10⁻²²) associated with hair, eye and skin colour, freckling, tanning
capacity and sun burning sensitivity and the Fitzpatrick phototype score, all highly
correlated cross-phenotypes. Multitrait meta-analysis of anthropometric variation
validated 27 loci in a two-stage meta-analysis with a large British ancestry cohort,
six of which are newly reported here (p value threshold <5×10⁻⁹) at ZRANB2-AS2, PIK3R1,
EPHA7, MAD1L1, CACUL1 and MAP3K9.
Conclusion
Considering multiple related phenotypes improves detection of associated genomic signals.
These results indicate the potential value of data-driven multivariate phenotyping
for genetic studies in large population-based cohorts to contribute to knowledge of
complex traits.
Introduction
The human immune-mediated diseases are the result of aberrant immune responses. These immune responses may lead to chronic inflammation and tissue destruction, often targeting a specific organ site. The outcome of this process is immune-mediated inflammatory and autoimmune disease, affecting approximately 5% of the population [1]. Extensive clinical and epidemiologic observations have shown that immune-mediated inflammatory and autoimmune diseases can occur either in the same individual or in closely related family members. This clustering of multiple diseases appears more frequently than expected if disease processes were independent. As each of the immune-mediated inflammatory and autoimmune diseases has strong genetic influences on disease risk [2]–[7], the observed clustering of multiple diseases could be due to an overlap in the causal genes and pathways [8], [9]. The patterns of clustering of diseases across the population are complex [10]: each disease has a prevalence between 0.01% and 3%, so direct assessment of co-aggregation within individuals or families does not result in the very large samples required for genetic or epidemiological investigation. Thus it is unsurprising that, to date, these observations have yet to be translated into determinants of the shared molecular etiologies of disease.
Recent GWA studies in immune-mediated and autoimmune diseases have identified 140 regions of the genome with statistically significant and robust evidence of presence of disease susceptibility loci. A subset of these loci have been shown to modulate risk of multiple diseases [3], [6], [11]–[14]. In addition, there is evidence that loci predisposing to one disease can have effects on risk of a second disease [15], although the risk allele for one disease may not be the same as for the second [16]. Together, these observations support the hypothesis of a common genetic basis of immune-mediated and autoimmune diseases [17].
There is now the ability to estimate both the number of loci contributing to risk of multiple diseases and the spectrum of diseases that each locus influences. In addition, grouping variants by the diseases they influence should provide insight into the specific biological processes underlying co-morbidity and disease risk. In this report, we systematically investigate the genetic commonality in immune-mediated inflammatory and autoimmune diseases by examining the contributions of associated genomic risk regions in seven diseases: celiac disease (CeD), Crohn's disease (CD), multiple sclerosis (MS), psoriasis (Ps), rheumatoid arthritis (RA), systemic lupus erythematosus (SLE) and type 1 diabetes (T1D). We find that nearly half of loci identified in GWA studies of an individual disease influence risk to at least two diseases, arguing for a genetic basis to co-morbidity. We also find several variants with opposing risk profiles in different diseases. Supporting the idea that common patterns of association implicate shared biological processes, we further demonstrate that loci clustered by the pattern of diseases they affect harbor genes encoding interacting proteins at a much higher rate than by chance. These results suggest that multi-phenotype mapping will identify the molecular mechanisms underlying co-morbid immune-mediated inflammatory and autoimmune diseases.
Results
We first test our hypothesis of common genetic determinants by examining evidence of association of genetic variants in known immune-mediated and autoimmune disease susceptibility loci to multiple disease phenotypes. We collated a list of 140 single nucleotide polymorphisms (SNPs) representing reported associations to at least one immune-mediated disease at genome-wide significance levels. Where data for the reported SNP itself were not available in our GWA studies (Table 1), we chose a proxy in high linkage disequilibrium to the reported marker (r² > 0.9 in HapMap/CEU).
We excluded SNPs in the human Major Histocompatibility Complex (MHC) from this analysis, as its role in many of these diseases is well-established and the classically associated alleles in the HLA region are not well captured by SNPs [18]. We were able to acquire data for either the reported SNP or a good proxy in 107 of 140 cases, and assembled genotype test summaries for these from previously described GWA studies representing over 26,000 disease cases (Table 1).
Table 1. Participating studies (doi:10.1371/journal.pgen.1002254.t001).
Disease                        Cases   Controls   Reference
Celiac disease                 3796    8154       22
Crohn's disease                3230    4829       1
Multiple sclerosis             2624    7220       4
Psoriasis                      1359    1400       5
Rheumatoid arthritis           5539    20169      6
Systemic Lupus Erythematosus   1963    4329       23
Type 1 diabetes                7514    9045       24
Data were collated for seven phenotypes from meta-analyses incorporating all known genome-wide association studies. SLE is the exception as no comprehensive meta-analysis has yet been published; data were instead obtained from a recent meta-analysis including some, but not all, known genome-wide association studies. Note that controls overlap in some cases due to the use of common shared sample genotypes.
We have developed a cross-phenotype meta-analysis (CPMA) statistic to assess association across multiple phenotypes. The CPMA statistic determines evidence for the hypothesis that each independent SNP has multiple phenotypic associations. Support for this hypothesis would be shown by deviations from expected uniformity of the distribution of association p-values, indicative of multiple associations. The likelihood of the observed rate of exponential decay of −log10(p) is calculated and compared to the null expectation (the decay rate should be unity) as a likelihood ratio test (see Materials and Methods for details).
This CPMA statistic has one degree of freedom, as it measures a deviation in p-value behavior instead of testing all possible combinations of diseases for association to each SNP. A total of 47 of the 107 SNPs tested have evidence of association to multiple diseases (SNP-wise PCPMA 0.9) to represent the region.
Cross-phenotype meta-analysis
Our CPMA analysis relies on the expected distribution of p-values for each SNP across diseases. Under the null hypothesis of no additional associations beyond those already known, we expect association p-values to be uniformly distributed and hence −ln(p) to be exponentially decaying with a decay rate λ = 1. We calculate the likelihood of the observed and expected values of λ and express these as a likelihood ratio test:
CPMA = −2 × ln( Pr[Data | λ = 1] / Pr[Data | λ = λ̂] )
where λ̂ is the decay rate estimated from the observed p-values. This statistic therefore measures the likelihood of the null hypothesis given the data; we can reject the null hypothesis if sufficient evidence to the contrary is present. We note that, because we only estimate a single parameter, our test is asymptotically distributed as χ² with one degree of freedom. This gives us more statistical power than relying on strategies combining association statistics, which would consume multiple degrees of freedom.
SNP–SNP distance calculation and clustering
To compare the patterns of association for multi-phenotype SNPs we first calculate SNP–SNP distances and then use hierarchical clustering on that distance matrix to assess relative relationships between SNP association patterns. Calculating distances based directly on p-values or the underlying association statistics is problematic, as each contributing study has slightly different sample sizes and therefore different statistical power to detect associations. Thus, distance functions based on numeric data, which incorporate magnitude differences between observations, would be biased if studies have systematically different data. Normalization procedures can account for such systematic differences but may fail to remove all bias.
To reduce the impact such systematic irregularities might have on our comparison, we bin associations into informal “levels of evidence” categories. We define four classes (1
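The CPMA likelihood-ratio test described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the p-values for one SNP across diseases are independent, models −ln(p) as exponential with rate λ, estimates λ by maximum likelihood, and refers the statistic to a χ² distribution with one degree of freedom. The function name `cpma` and the example p-values are hypothetical.

```python
import math


def cpma(p_values):
    """Sketch of a CPMA-style likelihood-ratio test for a single SNP.

    Assumption: p_values are independent association p-values for the
    same SNP in different diseases, each in (0, 1].
    Returns (lambda_hat, statistic, p_cpma).
    """
    if len(p_values) < 2 or any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("need at least two p-values, each in (0, 1]")

    # Under the null, p is uniform, so x = -ln(p) is exponential with rate 1.
    x = [-math.log(p) for p in p_values]
    n, s = len(x), sum(x)
    if s == 0.0:
        raise ValueError("all p-values equal 1; decay rate is undefined")

    # Maximum-likelihood estimate of the exponential rate: 1 / mean(x).
    lam_hat = n / s

    # Exponential log-likelihood: n * ln(lambda) - lambda * sum(x).
    loglik_null = -s                          # lambda = 1
    loglik_alt = n * math.log(lam_hat) - n    # lambda = lam_hat

    # Likelihood-ratio statistic; clamp tiny negative rounding errors.
    stat = max(0.0, -2.0 * (loglik_null - loglik_alt))

    # Upper tail of chi-square with 1 degree of freedom: erfc(sqrt(x / 2)).
    p_cpma = math.erfc(math.sqrt(stat / 2.0))
    return lam_hat, stat, p_cpma


if __name__ == "__main__":
    # Hypothetical p-values for one SNP in seven diseases.
    example = [1e-8, 1e-6, 1e-4, 0.5, 0.5, 0.5, 0.5]
    print(cpma(example))
```

Note that the statistic in this sketch grows whenever the estimated rate departs from 1 in either direction; an estimate below 1 corresponds to an excess of small p-values, so it is worth inspecting the returned rate alongside the test p-value.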
Genome-wide association studies (GWASs) have identified many genetic variants underlying complex traits. Many detected genetic loci harbor variants that associate with multiple, even distinct, traits. Most current analysis approaches focus on single traits, even though the final results from multiple traits are evaluated together. Such approaches miss the opportunity to systematically integrate the phenome-wide data available for genetic association analysis. In this study, we propose a general approach that can integrate association evidence from summary statistics of multiple traits, whether correlated, independent, continuous, or binary, which might come from the same or different studies. We allow for heterogeneity of effects across traits. Population structure and cryptic relatedness can also be controlled. Our simulations suggest that the proposed method has improved statistical power over single-trait analysis in most of the cases we studied. We applied our method to the Continental Origins and Genetic Epidemiology Network (COGENT) African ancestry samples for three blood pressure traits and identified four loci (CHIC2, HOXA-EVX1, IGFBP1/IGFBP3, and CDH17; p < 5.0 × 10⁻⁸) associated with hypertension-related traits that were missed by a single-trait analysis in the original report. Six additional loci with suggestive association evidence (p < 5.0 × 10⁻⁷) were also observed, including CACNA1D and WNT3. Our study strongly suggests that analyzing multiple phenotypes can improve statistical power and that such analysis can be executed with the summary statistics from GWASs. Our method also provides a way to study a cross phenotype (CP) association by using summary statistics from GWASs of multiple phenotypes.
Genome-wide association studies (GWAS) are a standard approach for studying the genetics of natural variation. A major concern in GWAS is the need to account for the complicated dependence-structure of the data both between loci as well as between individuals. Mixed models have emerged as a general and flexible approach for correcting for population structure in GWAS. Here we extend this linear mixed model approach to carry out GWAS of correlated phenotypes, deriving a fully parameterized multi-trait mixed model (MTMM) that considers both the within-trait and between-trait variance components simultaneously for multiple traits. We apply this to human cohort data for correlated blood lipid traits from the Northern Finland Birth Cohort 1966, and demonstrate greatly increased power to detect pleiotropic loci that affect more than one blood lipid trait. We also apply this to an Arabidopsis dataset for flowering measurements in two different locations, identifying loci whose effect depends on the environment.
[1] GenomesForLife-GCAT Lab Group, Program of Predictive and Personalized Medicine of Cancer (PMPPC), Germans Trias i Pujol Research Institute (IGTP), Crta. de Can Ruti, Badalona, Catalunya, Spain
[2] Unit of Biomarkers and Susceptibility, Cancer Prevention and Control Program, Catalan Institute of Oncology (ICO), IDIBELL and CIBERESP, Barcelona, Spain
[3] High Content Genomics and Bioinformatics Unit, Program of Predictive and Personalized Medicine of Cancer (PMPPC), Germans Trias i Pujol Research Institute (IGTP), Badalona, Catalunya, Spain
[4] Life Sciences - Computational Genomics, Barcelona Supercomputing Center (BSC-CNS), Joint BSC-CRG-IRB Research Program in Computational Biology, Barcelona, Spain
[5] Programs in Metabolism and Medical & Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts, US
[6] Diabetes Unit and Center for Human Genetic Research, Massachusetts General Hospital, Boston, Massachusetts, US
[7] Blood Division, Banc de Sang i Teixits, Barcelona, Spain
[8] Cancer Genetics and Epigenetics Group, Program of Predictive and Personalized Medicine of Cancer (PMPPC), Germans Trias i Pujol Research Institute (IGTP), Badalona, Catalunya, Spain
[9] ICREA, Catalan Institution for Research and Advanced Studies, Barcelona, Catalunya, Spain
[10] Department of Clinical Sciences, Faculty of Medicine, University of Barcelona, Barcelona, Spain
Author notes
Correspondence to Dr Rafael de Cid, GCAT lab Group, Program of Predictive and Personalized Medicine
of Cancer (PMPPC), Germans Trias i Pujol Research Institute (IGTP), Crta. de Can Ruti,
Badalona 08916, Spain; rdecid@igtp.cat
This is an open access article distributed in accordance with the Creative Commons
Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute,
remix, adapt, build upon this work non-commercially, and license their derivative
works on different terms, provided the original work is properly cited, appropriate
credit is given, any changes made indicated, and the use is non-commercial. See:
http://creativecommons.org/licenses/by-nc/4.0/.
History
Date received: 24 April 2018
Date revision received: 19 July 2018
Date accepted: 21 July 2018
Funding
Funded by: FundRef http://dx.doi.org/10.13039/501100004587, Instituto de Salud Carlos III;
Funded by: FundRef http://dx.doi.org/10.13039/501100003329, Ministerio de Economía y Competitividad;
Funded by: Agència de Gestió d’Ajuts Universitaris i de Recerca;