Introduction
Systemic lupus erythematosus (SLE) is a relapsing-remitting complex trait that most commonly affects women of child-bearing age, with a female-to-male ratio of 9∶1. The disease prevalence varies with ethnicity, being higher in non-European populations (approximately 1∶500 in populations with African ancestry and 1∶2500 in Northern Europeans) [1]. The condition is characterised by the production of a diverse range of auto-antibodies against serological, intra-cellular, nucleic acid and cell surface antigens [2]. The wide-ranging clinical phenotypes include skin rash, neuropsychiatric and musculoskeletal symptoms and lupus nephritis, which may be partially mediated by the extensive deposition of immune complexes. Today, thanks to improved treatments, the 10-year survival rate after diagnosis has increased to 90%, with lower survival rates being related to disease severity or complications from treatment [3]. Increased understanding of the underlying genetic basis for lupus is of key importance in improving the prognosis for lupus patients. Until recently, the genetic basis of lupus remained largely undetermined, with only about 8% of the genetic contribution known [4]. However, within the last three years, tremendous progress has been made in defining novel loci, through three moderate-sized genome-wide association studies in European American cohorts and a replication study in a US-Swedish cohort [5]–[7]. The loci previously identified for SLE include genes involved in the innate immune response (e.g. IRF5), T and B cell signalling (e.g. STAT4, TNFSF4 and BLK), autophagy/apoptosis (e.g. ATG5), ubiquitinylation (UBE2L3, TNFAIP3, TNIP1) and phagocytosis (ITGAM, FCGR3A and FCGR3B). All of these pathways are of potential importance in lupus pathogenesis [8]–[10].
To date, a total of 1729 independent SLE cases have been subjected to genome-wide association genotyping using three genotyping platforms: Illumina 317 K BeadChip [5], Illumina 550 K BeadChip [6] and Affymetrix 500 K array [7]. There is currently no published meta-analysis of these datasets. The aim of the current work was to perform a replication study in our UK SLE cohort on loci that showed some evidence for association in previous studies, in order to extend the list of confirmed susceptibility genes for lupus.
Results
To identify additional susceptibility loci for SLE, we first identified the independent genetic variants that showed moderate risk (5×10−8 < P < 5×10−3) in a combined US-Swedish dataset comprising 3273 SLE cases and 12188 controls [4]. We then genotyped 27 independent SNPs in a replication cohort of 905 UK SLE cases and 5551 UK control samples (Table 1), which included both British 1958 Birth Cohort samples and additional controls from the WTCCC2 project.
10.1371/journal.pgen.1002341.t001
Table 1 SLE Case-Control Study Cohorts used in the study.
Study            Origin of samples         SLE cases                  Control samples
                                           UK (a)  US (c)  SWE (d)    UK (B58BCC) (a)  UK (WTCCC2) (b)  US (c)  SWE (d)
UK               UK                        870     -       -          68               5483             -       -
US GWAS          Gateva et al (2009) [4]   -       1310    -          -                -                7859    -
US replication   Gateva et al (2009) [4]   -       1129    -          -                -                2991    -
SWE replication  Gateva et al (2009) [4]   -       -       834        -                -                -       1338
Total                                      870     2439    834        5551 (B58BCC and WTCCC2 combined)  10850   1338
Each of the seven groups of samples included in this manuscript was independent of the others. In the UK population, direct genotyping was carried out on 905 UK cases (a) and samples from the British 1958 Birth Cohort (a) (B58BCC). A total of 905 UK SLE cases were typed; 35 of these cases were removed from the analysis following QC. Genotypes from the WTCCC2 (b) were used as out-of-study controls. The published data used for the meta-analysis described in this current manuscript were derived from US and Swedish samples.
The US cohort (c) consisted of samples included in a GWAS and additional non-GWAS samples used only for the replication study, as described by Gateva et al. (2009) [4]. Full details of the Swedish (SWE) (d) replication samples are also described in Gateva et al. (2009) [4].
Of the 27 genotyped SNPs, 10 variants that had not been genotyped by the WTCCC2 project were imputed using IMPUTE2 [11]. This imputation was performed using CEPH HapMap samples as the phased reference sequence, with the boundaries of the surrounding haplotype blocks used to demarcate the imputation interval. The subsequent association analysis excluded two of these ten imputed SNPs because they had less than 95% certainty for the imputation (Table S2). In the US/SWE dataset, imputation of selected SNPs not genotyped previously [4] was performed using IMPUTE1, with HapMap Phase II CEU sample haplotypes as the reference. Subsequent association analysis was performed using SNPTEST, with genomic control factor (lambda-GC) values of 1.05 (US dataset) and 1.10 (SWE dataset) after correction for population stratification. In the UK replication sample, allelic association analysis using PLINK for the 23 SNPs passing QC (Tables S2 and S3) demonstrated moderate association (P≤0.05) for twelve variants, with a lambda-GC of 1.01 following ancestry correction (see Table 2 and Table 3). Under the null hypothesis, only about 1 of the 23 loci (23 × 0.05 ≈ 1.15) would be expected to have P≤0.05. The observed enrichment of associated SLE genes in the UK dataset suggested that many of these loci were likely to be true-positive associations.
10.1371/journal.pgen.1002341.t002
Table 2 Novel SNPs showing genome-wide significance (P = 5×10−8) in SLE following meta-analysis of UK, US, and Swedish cohorts.
MARKER      Locus  Risk    UK population (870 cases, 5551 controls)  US/SWE (a) (3273 cases, 12188 controls)  Combined analysis
                   allele  Freq risk allele (b)  OR    P value       OR    P value                            Fisher's test P value
rs10911363  NCF2   T       0.27                  1.23  3.02×10−4     1.19  9.50×10−8                          2.87×10−11
rs2366293   IKZF1  G       0.14                  1.20  8.77×10−3     1.23  2.66×10−7 (c)                      2.33×10−9
rs2280381   IRF8   A       0.62                  1.11  0.0491        1.16  2.53×10−7 (c)                      1.24×10−8
rs1990760   IFIH1  T       0.61                  1.11  0.0487        1.17  3.34×10−7                          1.63×10−8
rs280519    TYK2   A       0.47                  1.20  5.24×10−4     1.16  7.40×10−5 (d)                      3.88×10−8
(a) For sample numbers see reference [4] and Table S1 (US GWAS: 1310 cases and 7859 controls; US replication cohort: 1129 cases and 2991 controls; Swedish replication cohort: 834 cases and 1338 controls).
(b) The risk allele frequency was calculated in control individuals.
(c) Unpublished data.
(d) The combined P value was calculated from imputed genotypes in the US GWAS dataset and direct genotyping in the US and SWE replication datasets.
10.1371/journal.pgen.1002341.t003
Table 3 Additional SNPs showing association with SLE in the UK, US, and Swedish cohorts.
MARKER      Locus    Risk    UK population (870 cases, 5551 controls)  US/SWE (a) (3273 cases, 12188 controls)  Combined analysis
                     allele  Freq risk allele (b)  OR    P value       OR    P value                            Fisher's test P value
SNPs showing borderline genome-wide significance (5×10−8 1×10−7)
rs1861525   CYCS     G       0.05                  1.08  0.555         1.27  1.90×10−6                          1.05×10−6
rs11951576  POLS     C       0.69                  1.01  0.907         1.14  4.60×10−6                          4.17×10−6
rs641153    CFB      C       0.92                  1.24  0.0356        1.30  1.40×10−4                          4.98×10−6
rs6438700   CASR     C       0.82                  1.01  0.947         1.18  5.50×10−6                          5.20×10−6
rs3212227   IL12B    A       0.81                  1.15  0.0369        1.13  1.70×10−4                          6.27×10−6
rs3184504   SH2B3    T       0.49                  1.14  0.0156        1.11  5.57×10−4                          8.69×10−6
rs12708716  CLEC16A  A       0.65                  1.10  0.0996        1.16  1.60×10−4                          1.59×10−5
rs10516487  BANK1    C       0.68                  1.12  0.0500        1.11  8.30×10−4                          4.15×10−5
rs10156091  ICA1     T       0.11                  1.04  0.604         1.16  6.50×10−4                          3.93×10−4
rs2022013   NMNAT2   A       0.58                  1.05  0.326         1.09  0.0015                             4.89×10−4
(a) For sample numbers see reference [4] and Table S1 (US GWAS: 1310 cases and 7859 controls; US replication cohort: 1129 cases and 2991 controls; Swedish replication cohort: 834 cases and 1338 controls).
(b) The risk allele frequency was calculated in control individuals.
We confirmed the similarity of odds ratios (Het P value) and the direction of effect between the UK and US-SWE datasets (Table S4) and then performed a meta-analysis using Fisher's combined P-value (see Materials and Methods). This meta-analysis revealed five novel associated loci with P 0.05). These new SLE loci are discussed in more detail below, with additional information in Text S1. Three of the SNPs tested were for loci that had shown genome-wide levels of significance in other SLE GWAS studies (Table S5).
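The meta-analysis step above combines a UK P value with a US/SWE P value for each SNP. As a minimal sketch, assuming the textbook form of Fisher's method (the statistic X = −2 Σ ln p_i referred to a chi-square distribution with 2k degrees of freedom), the calculation could look like the following. The function name `fisher_combined_p` is hypothetical, and the sketch is not claimed to reproduce the authors' exact computation, so its output need not equal the combined values listed in Tables 2 and 3.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent P values with the textbook Fisher's method.

    X = -2 * sum(ln p_i) follows a chi-square distribution with 2k degrees
    of freedom under the null. For even degrees of freedom the upper tail
    has a closed form, so only the standard library is needed.
    """
    if not p_values:
        raise ValueError("at least one P value is required")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("P values must lie in (0, 1]")
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # this is X / 2
    # Chi-square upper tail with 2k df at X:
    # exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!
    tail = sum(half_stat ** i / math.factorial(i) for i in range(k))
    return math.exp(-half_stat) * tail


# UK and US/SWE P values for rs10911363 (NCF2), taken from Table 2.
p_uk, p_us_swe = 3.02e-4, 9.50e-8
combined = fisher_combined_p([p_uk, p_us_swe])
print(f"Fisher combined P = {combined:.3g}")
```

The closed-form tail avoids a dependency on a statistics package; with a library such as SciPy available, the same quantity could instead be obtained from a chi-square survival function.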
In the UK cohort we found further support for the association at JAZF1 (rs849142 P UK = 0.0243, ORUK = 1.13) and identified a third associated variant in the first intron of TNIP1 (rs6889239 P UK = 9.06×10−6, ORUK = 1.30), which is in strong LD (r2 = 0.895) with the variant previously reported in Europeans [4] and in perfect LD with a third SNP (rs10036748), first reported in a Chinese GWAS [12]. All three variants in TNIP1 are located within a 661 bp region of intron 1. We did not replicate the previous association with IL10 (rs3024505, P UK = 0.209, ORUK = 1.09) (Table S5). These analyses increased the evidence of association for a number of additional loci that had shown borderline significance in the original US/SWE GWAS (Table 3), including CFB, C12ORF30, SH2B3, and IL12B. Genotyping of additional samples will be required to determine whether the association signals shown in Table 3 represent confirmed genetic loci for SLE.
Discussion
The work presented here confirms five new susceptibility loci for SLE at the level of genome-wide significance (P 48% (2) to detect an association in our cohort.
Quality control of genotyping
Markers were excluded from the analysis if they showed a genotyping success rate of less than 95% or had a Hardy-Weinberg P value of less than 0.001 in the B58BCC control samples. A total of 21 cases were removed from the final analysis due to low percentage genotyping ( 0.1). The full list of genotyped variants and the results of the QC analysis are shown in Table S3.
Correction for ancestry
A total of 35887 markers, distributed across each autosome, were selected for ancestry correction in the UK case-control cohort; these markers had all been typed as part of the HapMap project and on the WTCCC2 samples. The 35887 SNPs were chosen from a set of Illumina 317 K markers pruned for LD (r2 95% genotyping in each dataset.
Following EIGENSTRAT analysis, a graph was plotted of PC1 against PC2 for all the cases and controls in the UK study cohort (Figure S1). Individuals were retained for association analysis only if the values for their first two principal components fell within 6 SD of the mean for the CEPH HapMap samples. The genomic inflation factor (lambda-GC) for each population was calculated using PLINK.
Statistical analysis
All sample genotype and phenotype data were managed, and analysis files were generated, with BC/SNPmax and BC/CLIN software (Biocomputing Platforms Ltd, Finland). The imputation interval for each imputed variant was defined as the bounds of the haplotype blocks, calculated using the Gabriel algorithm in Haploview (for details of the intervals see Table S2). For SNPs that were not genotyped as part of the WTCCC2 project, we performed imputation using a method described by Marchini et al [11] to generate the missing genotypes for case-control association analysis. Each un-typed variant from our list of tested SNPs was imputed in the WTCCC2 samples, using HapMap as the phased reference sequence. The LD pattern around each un-typed variant was examined using the CEPH cohort from HapMap. The boundaries of the haplotype blocks were determined using the default settings for the Gabriel et al algorithm in Haploview. For each imputed variant, these haplotype boundaries were used to define the boundaries of the imputation interval (Table S2). Only SNPs with greater than 95% certainty in imputation, assessed using the quality score from the IMPUTE2 output file, were used for subsequent analysis. Allelic association testing, using UK SLE cases with either genotyped control samples or imputed genotypes, was carried out using PLINK (http://pngu.mgh.harvard.edu/~purcell/plink/). Prior to performing the meta-analysis, the heterogeneity of odds ratios was tested using METAL and the Cochran-Mantel-Haenszel test (Table S4). SNPs with P value value>0.012807).
The 870 SLE cases and 5551 control individuals retained for association analysis are located within the ellipse on the graph. (EPS)
Figure S2 Patterns of LD around the five SLE susceptibility genes reaching genome-wide levels of significance. Patterns of linkage disequilibrium in CEU individuals taken from HapMap, using positions from data release 27 phase II+III Feb09, NCBI assembly dbSNP 126: (A) 200 kb region around rs10911363 (Chr 1:181,716,380-181,916,379); (B) 155.8 kb region around rs2366293 (Chr 7:50,120,474-50,276,274); (C) 100 kb region around the gap between IRF8 and rs2280381 (Chr 16:84,500,150-84,600,149); (D) 250 kb region around rs1990760 (Chr 2:162,700,000-162,949,999); (E) 200 kb region around rs280519 (Chr 19:10,233,933-10,433,933). (PDF)
Figure S3 Variants showing a trend for changes in expression in EBV-transformed lymphoblastoid cell lines. Regression analysis, as described in the Materials and Methods, was performed on publicly available genotype data from EBV-transformed B cells that were part of the HapMap collection and on expression data from the same individuals taken from the GEO database. Four populations were used: CEPH, YRI and CHB/JPT (ASN) [2]. The GEO dataset was GSE12526 and the expression probes were: A) IKZF1 (205039_s_at), B) IFIH1 (219209_at), C) TYK2 (205546_s_at). For each graph, the mean expression for the risk (R) allele and that for the non-risk (r) allele were plotted for each population. The alleles are listed on each bar and, for each SNP, the total number of individuals for whom there were both genotype and expression data is quoted for the three populations analysed. D) Heritability estimates for each locus were taken from the mRNA by SNP browser (http://www.sph.umich.edu/csg/liang/asthma/). (EPS)
Table S1 Composition of the study cohorts used.
Each of the seven groups of samples included in this manuscript was independent of the others. In the UK population, direct genotyping was carried out on UK cases (a) and samples from the British 1958 Birth Cohort (a) (B58BCC). Genotypes from the WTCCC2 (b) were used as out-of-study controls. The published data used in the meta-analysis described in this current manuscript were derived from US and Swedish samples. The US cohort (c) consisted of samples included in a GWAS and additional non-GWAS samples used only for the replication study, as described by Gateva et al. (2009) [4]. Full details of the Swedish (SWE) (b) replication samples are also described in Gateva et al (2009) [4]. (DOC)
Table S2 Quality control of genotype data and imputation boundaries for WTCCC2 control samples. The position of each variant (column “Pos”) is given using NCBI Build 36. The number of WTCCC2 control samples is given for each variant in the column marked “WTCCC2 samples.” (DOC)
Table S3 Power calculations. (a) Novel associations in this study (5×10−8). The OR, as a measure of effect size, was taken from the case-control association study. The power was calculated according to Purcell et al 2003 (http://bioinformatics.oxfordjournals.org/content/19/1/149.full.pdfhtml), using a disease prevalence of 0.0002. The risk allele frequency was calculated in both cases and controls. GRR (AB) = (ABcase/AAcase)/(ABcontrol/AAcontrol) and GRR (AA) = (BBcase/AAcase)/(BBcontrol/AAcontrol). (DOC)
Table S4 Results of weighted meta-analysis using METAL and calculation of combined OR. The total number of individuals included in the meta-analysis was 870 UK SLE cases and 5,551 UK control samples, and 3,273 SLE cases and 12,188 controls taken from the US/SWE out-of-study cohort [1]. The risk allele frequency quoted is that from the UK cases.
The column marked Het P-value represents the test for heterogeneity of odds ratios between the UK and published datasets, and the column marked ORcomb represents the OR in the combined dataset, calculated using METAL. The column marked Direction of Effect demonstrates that the effect for each quoted allele is the same for the UK and US/SWE datasets. (DOC)
Table S5 Association Analysis in UK and US-Swedish Populations for Markers Previously Showing Genome-Wide Significance (P<5×10−8). (a) For sample numbers see reference [1] and Table S1 (US GWAS: 1,310 cases and 7,859 controls; US replication cohort: 1,129 cases and 2,991 controls; Swedish replication cohort: 834 cases and 1,338 controls). (b) The risk allele frequency was calculated in control individuals. (c) Unpublished data. (DOC)
Text S1 This file contains supplementary genomic and functional details about the five SLE susceptibility genes reaching genome-wide levels of significance. (DOC)
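The genotype relative risk (GRR) definitions quoted in the Table S3 legend are ratios of genotype counts in cases and controls, taken relative to the AA genotype. A minimal sketch of that arithmetic follows; the function name `genotype_relative_risks` and the genotype counts are hypothetical and chosen only for illustration, and they are not data from this study.

```python
def genotype_relative_risks(case_counts, control_counts):
    """Return (heterozygote GRR, homozygote GRR) from genotype counts.

    Both arguments are dicts with keys 'AA', 'AB' and 'BB'. Following the
    Table S3 legend, each GRR is the ratio of a genotype to AA in cases,
    divided by the same ratio in controls.
    """
    for counts in (case_counts, control_counts):
        if any(counts[g] <= 0 for g in ("AA", "AB", "BB")):
            raise ValueError("all genotype counts must be positive")
    grr_het = (case_counts["AB"] / case_counts["AA"]) / (
        control_counts["AB"] / control_counts["AA"]
    )
    grr_hom = (case_counts["BB"] / case_counts["AA"]) / (
        control_counts["BB"] / control_counts["AA"]
    )
    return grr_het, grr_hom


# Hypothetical genotype counts, for illustration only.
cases = {"AA": 100, "AB": 120, "BB": 40}
controls = {"AA": 200, "AB": 200, "BB": 50}
grr_het, grr_hom = genotype_relative_risks(cases, controls)
print(f"GRR (heterozygote) = {grr_het:.2f}, GRR (homozygote) = {grr_hom:.2f}")
```

Values of this kind, together with the risk allele frequency and disease prevalence, are the inputs that a power calculator of the sort cited in the legend would typically require.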