      Clonal Characteristics of T-Cell Receptor Repertoires in Violent and Non-violent Patients With Schizophrenia

          Abstract

          Background: Activated or impaired T-cell function in inflammatory and degenerative processes can contribute to the risk and progression of schizophrenia. This study used immune repertoire sequencing to investigate the presence of the T-cell receptor beta variable chain (TRBV) in blood mononuclear cells from violent and non-violent schizophrenic patients.

          Methods: Ten violent and 10 non-violent schizophrenic patients and 8 matched healthy controls were enrolled. The Brief Psychiatric Rating Scale (BPRS) was used to evaluate patients' psychiatric symptoms. The level of aggression was assessed using the Modified Overt Aggression Scale (MOAS). The complementarity-determining region 3 (CDR3) of TRBV was detected using multiplex-PCR and high-throughput sequencing.

          Results: TCR repertoire diversity did not differ significantly among the three groups on either the Shannon–Wiener or the inverse Simpson diversity index. Principal component analysis (PCA) of TRBV composition and abundance showed that principal components 1 and 2 explained 28.88% and 13.24% of the total variation, respectively. Schizophrenic patients (violent and non-violent) had a significantly different V gene distribution from healthy controls. In particular, TRBV2 occurred at a significantly higher frequency in the violent schizophrenia group than in the non-violent schizophrenia and healthy control groups, and TRBV7-2 occurred at a significantly higher frequency in the non-violent schizophrenia group than in the violent schizophrenia and healthy control groups.
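          The two diversity indices named in the Results can be illustrated with a minimal sketch. It assumes the common textbook definitions (natural-log Shannon entropy and the reciprocal of the sum of squared clonotype proportions); the abstract does not state which variants the authors used, and the clonotype counts below are hypothetical.

```python
import math


def shannon_wiener(counts):
    """Shannon-Wiener index H = -sum(p_i * ln p_i) over clonotype proportions."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def inverse_simpson(counts):
    """Inverse Simpson index 1 / sum(p_i ** 2) over clonotype proportions."""
    total = sum(counts)
    return 1.0 / sum((c / total) ** 2 for c in counts if c > 0)


# Hypothetical CDR3 clonotype read counts (not data from the study).
even_repertoire = [25, 25, 25, 25]    # four equally abundant clonotypes
skewed_repertoire = [97, 1, 1, 1]     # one dominant (expanded) clone

for name, counts in [("even", even_repertoire), ("skewed", skewed_repertoire)]:
    print(name, shannon_wiener(counts), inverse_simpson(counts))
```

          Both indices are highest when clonotypes are evenly distributed and fall when a few clones dominate, which is why they are used to compare repertoire diversity between groups.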

          Conclusions: The results suggest that violent and non-violent schizophrenic patients carry abnormal T-cell receptor repertoires, and these data provide a useful clue to explore the etiology of violent behavior in schizophrenia.

          Most cited references (36)


          Schizophrenia and Violence: Systematic Review and Meta-Analysis

          Introduction In the 1980s, expert opinion suggested that there was no increased risk for violence in individuals with schizophrenia and other psychoses [1]. However, with the publication of large population-based studies over the last two decades, it is now thought that there is a modest association between violence and schizophrenia and other psychoses [2]. This view is not shared by many mental health clinicians [3] or public advocacy groups. For example, a recent joint public education campaign by three leading UK mental health charities contends that the view that people with mental health problems are violent is a myth [4], and the National Alliance on Mental Illness in the US asserts that acts of violence by the mentally ill are “exceptional” [5]. In factsheets, the Schizophrenia and Related Disorders Alliance of America states that people with schizophrenia are no more likely to be violent than their neighbours [6], and SANE Australia state that people with mental illness who receive treatment are no more violent than others [7]. The issue remains topical because it is thought to have contributed to policy and legal developments for psychiatric patients [8] and the striking increase in the number of secure hospital patients in many Western countries (alongside sex offender legislation) [9]. It also contributes to the stigma associated with mental illness [10], which is considered to be the most significant obstacle to the development of mental health services [11]. Although there have been a number of studies examining the relationship between the psychoses and violent outcomes, there are wide variations in risk ratios reported with estimates ranging from 7-fold increases in violent offending in schizophrenia compared with general population controls [12],[13] to no association in a highly influential prospective investigation [14]. 
Previous reviews of the literature have not been quantitative or have not systematically explored the grey literature [9]–[12]. In addition, they have included selected samples, such as investigations solely of homicide offenders (who are more likely to have psychoses than other offenders) [15], and have not explored potential sources of heterogeneity. We report a systematic review of investigations examining the risk of schizophrenia and other psychoses for violent outcomes including homicide. We explored the reasons for variations between the primary studies using metaregression. We aimed to test whether risk estimates differed by gender, diagnosis (schizophrenia versus other psychoses), outcome measure (criminal convictions versus self-report or informant based information), country location (US or Nordic countries versus the rest of the world), study design (case-control versus longitudinal), and study period. In addition, we have conducted a systematic review of studies examining the risk of schizophrenia in homicide offenders. Methods Computerised Medline, Embase, and Psycinfo searches were performed from January 1970 to February 2009 using the terms viol*, crim*, homicide, schiz*, severe mental illness, major mental disorder, psychos*, and psychot*. References were hand searched for other references, including to grey literature, and non-English language publications were translated. In order to supplement the search of grey literature, US National Criminal Abstracts was searched as well as an extensive bibliography on crime and mental disorder prepared for the Public Health Agency of Canada [16]. We contacted authors of published studies for additional information as required. MOOSE guidelines (Meta-analyses of Observational Studies in Epidemiology, http://www.consort-statement.org/index.aspx?o=1031) were followed. 
Our inclusion criteria included case-control studies (including cross-sectional surveys) and cohort studies, which allowed an estimation of the risk of violence in patients with schizophrenia and/or other psychoses compared with a general population comparison group. Reports were excluded if: (i) Data were presented solely on all convictions not broken down for violence [17]. (ii) There was no general population comparison data [18]–[20]. Studies that used other psychiatric diagnoses as the comparator group were also excluded [21]. (iii) Data were superseded by subsequent work and inclusion would involve duplication of data [13],[22]–[25]. In one of these studies [24], data were used for the subgroup analysis on whether outcomes were different by diagnosis of cases (schizophrenia versus nonschizophrenic psychoses). In another, data for women were used from the older publication because it was not included in the updated work [25]. (iv) The cases included diagnoses of nonpsychotic illnesses such as personality disorder [14] and major depression [26]. However, we included one study where the proportion of psychoses was 95% [27]. We conducted a separate analysis of homicide only studies. For this analysis, studies were excluded if information on controls was taken from a different country and another time period [28],[29] or no data on controls were provided [30],[31]. For one of the included studies [32], state population data were specifically gathered from a government agency [33], and for another [24], data on homicides were specifically extracted for the purposes of this review. 
Data Extraction A standardised form was used to extract data, which included information on the study design, geographical location of study, last year of follow-up for violence (“study period”), diagnoses of cases, definition of violence, method of ascertainment of violence, sample size, mean age, adjustment for socio-demographic factors, and, in the cases, numbers with comorbid substance abuse. For those studies with comorbid substance abuse data, we also extracted data on primary and secondary diagnoses of substance abuse in the population controls (and in two comparisons [24],[34], these were extracted from data based on separate publications [25],[35]). Where possible, the control group was a population of individuals without any mental disorders. If data were available for both schizophrenia and nonschizophrenic psychoses, the former was used for the primary analyses. For the purposes of analysis, study design was explored as a dichotomous variable (case-control versus longitudinal) where nested case-control study were included as case-control designs, and also all three designs were compared (case-control versus nested case-control versus longitudinal). Longitudinal designs referred to studies where violence was assessed after diagnosis had been established. Study location was analyzed in two ways: Nordic countries versus the rest of the world, and the US versus the rest of the world. The analysis was done in this way because many of the studies were conducted in three Nordic countries (Sweden, Denmark, Finland) because of the availability for research of national registers for health and crime, and the possibility that the gun ownership laws and higher base rates of violence in the US lead to different risk estimates than other countries [36]. Sample size was analyzed as a continuous variable (for the metaregression) and by numbers of cases in three groups (0–99, 100–1,000, and >1,001 cases) for subgroup analysis. 
Outcome measures were analyzed as a dichotomous variable: register-based versus self-report and/or informant interview. Study period was assessed by comparing reports where the last year of follow-up was before 1990 with those where it was on or after 1990. Gender was included in the metaregression analysis as a trichotomous (male, mixed, and female studies separately) and a dichotomous (male and mixed studies combined versus female) variable. Suitability for inclusion was assessed and data extraction conducted independently by two researchers (SF and GG), and any differences were resolved through discussion with the other authors.

Data Analyses

Meta-analyses of risk of violent outcomes were carried out generating pooled odds ratios (ORs) with 95% confidence intervals (CIs). Heterogeneity among studies was estimated using Cochran's Q (reported with a χ² value and p-value) and the I² statistic, the latter describing the percentage of variation across studies that is due to heterogeneity rather than chance [37],[38], with 95% CIs [38]. I², unlike Q, does not inherently depend upon the number of studies considered; values of 25%, 50%, and 75% were taken to indicate low, moderate, and high levels of heterogeneity, respectively. We explored the risk associated with substance abuse comorbidity separately by presenting estimates of risk ratios of schizophrenia and related psychoses with comorbidity, and without comorbidity. As others have noted, adjustment by substance abuse is not appropriate as it exists on the causal pathway between schizophrenia (exposure) and outcome (violence) [39],[40]. We calculated adjusted ORs by socio-demographic factors when stratum-specific estimates were given using the Mantel-Haenszel method [41]. We calculated population attributable risk fractions for the studies that reported on number of crimes in the samples investigated. 
We opted for individual counts of crime rather than number of convicted individuals for this analysis as it has been demonstrated that the number of crimes per conviction is significantly higher in individuals with severe mental illness than other offenders [24]. Hence, using crimes more accurately captures the population impact of violent criminality. For this analysis, the base rate r was defined as the number of separate violent crimes committed per 1,000 in the general population, and r₀ was defined as the number of violent crimes per 1,000 individuals who had not been patients with schizophrenia. We then calculated the population-attributable risk as the difference r − r₀ and the population-attributable risk fraction as the population-attributable risk divided by r. These data were not synthesized because of their heterogeneity.

Potential sources of heterogeneity were investigated further by metaregression analysis, subgrouping studies according to their inclusion criteria, and methodological factors. All subgroup analyses involved nonoverlapping data and used random-effects models. For metaregression analyses, male, female, and mixed-gender studies were included. All factors were entered individually and in combination to test for possible associations. Analyses were done in the STATA statistical software package, version 10 (StataCorp, 2008), using the metan (for random and fixed-effects meta-analysis), metareg (for metaregression), and metabias (for publication bias analysis) commands.

Results

Twenty individual studies were identified (for details of the studies see Table S1). The total number of schizophrenia and other psychoses cases in the included studies was 18,423. Of these cases, 1,832 (9.9%) were violent. These cases were compared with 1,714,904 individuals in the general population, of whom 27,185 (1.6%) were violent. 
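The population-attributable risk fraction defined in the Methods above reduces to two lines of arithmetic. The sketch below is illustrative only: the function names and the example rates are hypothetical and are not taken from any of the included studies.

```python
def population_attributable_risk(r, r0):
    """PAR = r - r0, where r is the violent crime rate per 1,000 in the whole
    population and r0 is the rate per 1,000 among those never diagnosed."""
    return r - r0


def population_attributable_risk_fraction(r, r0):
    """PAF = (r - r0) / r, the share of violent crime attributable to the exposure."""
    if r <= 0:
        raise ValueError("base rate r must be positive")
    return population_attributable_risk(r, r0) / r


# Hypothetical rates: 10.0 crimes per 1,000 overall, 9.5 per 1,000 in people
# without schizophrenia (assumed values for illustration).
par = population_attributable_risk(10.0, 9.5)
paf = population_attributable_risk_fraction(10.0, 9.5)
print(f"PAR = {par:.2f} per 1,000; PAF = {paf:.1%}")
```

A fraction of 5% in this made-up example would mean that removing the excess risk associated with the disorder would lower the overall rate of violent crime by one twentieth.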
Publications were from 11 countries: five from the US (874 cases, 4.7% of total number of cases) [42]–[46]; two from England and Wales (n = 66, 0.4%) [27],[47]; two from Denmark (n = 1,873, 10.2%) [48],[49]; three from Sweden (n = 9,024, 49.0%) [50]–[52]; two from Finland (n = 90, 0.5%) reported in three publications [12],[53],[54]; one from Australia (n = 2,861, 15.5%) [55]; Germany (n = 1,662, 9.0%) [56]; Austria (n = 1,325, 7.2%) [57]; Switzerland (n = 508, 2.8%) reported in three publications [25],[34],[58]; New Zealand (n = 39, 0.2%) [59]; and Israel (n = 101, 0.5%) reported in two publications [60],[61]. Violence was ascertained from register-based sources in 13 studies, by self-report and informants in five others, and in two investigations by both methods [43],[59]. Male Studies In the men, 13 studies were identified with 9,379 individuals with schizophrenia and other psychoses (Figure 1) [27],[45],[47]–[55],[58],[61]. The random-effects pooled crude OR comparing the risk of violence in cases with general population controls was 4.0 (95% CI 3.0–5.3) with substantial heterogeneity (I 2 = 88%, 95% CI 78–91). When using fixed-effects models, the overall crude OR for men was 2.9 (95% CI 2.7–3.1). When adjusting for socio-economic factors, possible in four of these studies [12],[49],[52],[60], the random-effects OR was 3.8 (2.6–5.0), and fixed-effects OR was 2.0 (1.8–2.1) with high heterogeneity (I 2 = 84% [74%–90%]). 10.1371/journal.pmed.1000120.g001 Figure 1 Risk estimates for violence in schizophrenia and other psychoses by gender. Note: Mixed refers to studies where both genders have been included. These estimates are for ORs that are mostly not adjusted for socio-economic factors although Monahan and Wallace have matched cases and controls by neighbourhood of residence and Modestin for occupational level and marital status (see Table S1). 
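The gap between the fixed-effects and random-effects pooled ORs reported above, and the I² statistic that accompanies them, can be reproduced in outline with a short sketch. It assumes inverse-variance weighting on the log-OR scale and the DerSimonian-Laird estimate of between-study variance, a common choice; the source says only that Stata's metan was used and does not name the estimator. The three study estimates are hypothetical.

```python
import math


def pool_log_odds_ratios(log_ors, variances):
    """Pool study log-ORs with fixed-effect and DerSimonian-Laird random-effects weights."""
    w = [1.0 / v for v in variances]                       # inverse-variance weights
    fixed = sum(wi * y for wi, y in zip(w, log_ors)) / sum(w)

    # Cochran's Q: weighted squared deviations from the fixed-effect estimate.
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, log_ors))
    df = len(log_ors) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0  # I^2 in percent

    # DerSimonian-Laird between-study variance (truncated at zero).
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    w_re = [1.0 / (v + tau2) for v in variances]           # flatter weights
    random = sum(wi * y for wi, y in zip(w_re, log_ors)) / sum(w_re)

    return {
        "fixed_or": math.exp(fixed),
        "random_or": math.exp(random),
        "q": q,
        "i2_percent": i2,
        "tau2": tau2,
    }


# Hypothetical studies: a large study with a small OR, two smaller ones with larger ORs.
example = pool_log_odds_ratios(
    [math.log(2.0), math.log(3.0), math.log(8.0)],
    [0.01, 0.04, 0.25],
)
print(example)
```

Because the between-study variance is added to every study's own variance, the random-effects weights are more nearly equal, so small studies with large ORs pull the pooled estimate upward. This matches the pattern in the male studies, where the random-effects OR exceeds the fixed-effects OR.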
Female Studies

Six studies provided risk estimates in female samples in 5,002 individuals with schizophrenia and other psychoses (Figure 1) [47],[49],[51],[52],[55],[61]. The random-effects pooled crude OR was 7.9 (95% CI 4.0–15.4), and the fixed-effects crude OR was 6.6 (5.6–8.0). These estimates were associated with high heterogeneity (I² = 86% [73%–93%]). Three additional studies that included 256 women with schizophrenia made no material difference to the risk estimates (random-effects pooled OR = 7.7; 4.2–14.1) [12],[25],[27]. These studies were excluded from sensitivity analyses as the base rate of violence was zero in the cases [27],[53] or the controls [25], and thus led to unstable risk estimates.

Mixed Gender Studies

Seven studies reported risk of violence in mixed samples (n = 3,786, 20.6% of all cases) [42]–[44],[46],[57],[59],[62], which reported an increased risk of violence compared with general population controls. The random-effects pooled OR was 5.0 (3.4–7.4), and the fixed-effects OR was 4.0 (3.4–4.7) with an I² of 80% (59%–90%).

Substance Abuse Comorbidity

Eleven studies involving 2,891 cases reported on risk of violence with and without substance abuse (Figures 2 and 3) [34],[42]–[45],[47],[49],[52],[53],[55],[60]. In six of these studies [42]–[44],[47],[52],[55], these were mixed gender samples. Risk of violence was raised in individuals of any gender with psychosis and comorbidity (random-effects OR = 8.9; 5.4–14.7; I² = 93%; 89%–95%) compared with general population controls. Violence risk was lower in persons with psychosis without comorbidity (OR = 2.1; 1.7–2.7; I² = 59%; 19%–79%) in comparison with general population controls. When this analysis was confined to the five studies that reported in men [34],[45],[49],[53],[55], the OR without comorbidity was 2.8 (2.3–3.5) (I² = 0%; 0%–66%) compared with an OR with comorbidity of 12.2 (9.5–15.8) (I² = 13%; 0%–57%). One study reported risk estimates in women with schizophrenia [49]. 
The OR without comorbidity was 19.9 (10.7–36.8), and with comorbidity, it was 74.8 (35.8–156.1). Substance abuse was highly significant on metaregression (β = −1.35, standard error [SE] = 0.26, t = −5.24).

[Table 1: only the final row and the footnotes survive; column meanings are inferred from the text. Sample size >1,000 cases: 5 studies; 11,062 cases; OR 3.7 (2.7–5.2); I² 93% (87–96).]
a Number of cases differs in this analysis because data were included from Fazel [24].
b Number of studies and cases differs because data from Arseneault [59] contribute to both cells.
c Number of studies is >20 because three studies contributed to more than one cell as the number of cases of the male and female samples differs for Brennan [49], Fazel [52], and Wallace [55].

There was no difference in risk estimates depending on type of outcome measure (criminal convictions and arrest data versus self- and informant-report; Figure 5). In the male-only studies, the OR was 4.1 (3.0–5.5) for the register-based outcomes, whereas it was 3.0 (1.6–5.8) for the investigations where self-report and informants were used to determine outcome. These were both associated with substantial heterogeneity (I² values of 88% and 70%, respectively). There was only one study where risk estimates on both outcomes were reported [59].

Figure 5 (doi:10.1371/journal.pmed.1000120.g005). Risk estimates for violence in schizophrenia and other psychoses by outcome measure. Note: Self-report also includes informant-based sources.

There was no evidence of any difference in risk estimates by region when comparing studies conducted in Nordic countries with those from other countries (Table 1), or when the studies based in the US were compared with the rest of the world. In the male-only studies, the Nordic ones reported an OR of 4.4 (3.5–5.4) compared with the rest of the world where the OR was 3.8 (2.6–5.5). There was no significant difference in risk estimates for the other study characteristics: study period and study size. 
Nonsignificant differences by study type were found: longitudinal studies reported lower risk estimates (OR = 3.8, 2.6–5.5) but this was based on only four samples (Figure 6). Furthermore, there was some evidence of publication bias using Egger's test (t = 2.17, p = 0.04) but not with a funnel plot analysis (z = −0.31, p = 0.76). This finding was replicated when we combined the results for gender and used the publication as the unit of measurement (Egger's test, t = 2.75, p = 0.013; funnel plot, z = −0.39, p = 0.70). 10.1371/journal.pmed.1000120.g006 Figure 6 Risk estimates for violence in schizophrenia and other psychoses by study design. In the metaregression analysis with all the studies included, none of these study characteristics apart from substance abuse was statistically significant (individually or in a model where all factors were entered into simultaneously). Study type as a dichotomous variable (longitudinal versus case-control) was associated with some heterogeneity on metaregression when all factors were included in the model (β = −1.12, t = −1.57, p = 0.12). When the analysis was restricted to the male and mixed gender studies, the association almost reached statistical significance (β = −1.64, t = −2.44, p = 0.051). Substance Abuse and Violent Crime In men, there were five studies where the risk of violence was reported both in individuals with schizophrenia and other psychoses who have comorbid substance abuse, and in individuals with substance abuse alone [12],[25],[34],[45],[47],[53],[61]. In comparing these risk estimates, there was no apparent difference (Figure 7). We also compared all psychoses studies (irrespective of comorbidity) with those that reported risk of violence in individuals with a diagnosis of substance use disorders (Figure 8). Substance use disorders were associated with higher risk estimates, although the finding was nonsignificant using a random-effects model. 
Using fixed-effects, the OR in individuals with psychosis was 3.3 (3.0–3.5) compared with 5.5 (5.4–5.6) in substance abuse. 10.1371/journal.pmed.1000120.g007 Figure 7 Risk estimates for violence in men with schizophrenia comorbid with substance abuse compared with risk in men with substance abuse (without psychosis) reported in the same study. 10.1371/journal.pmed.1000120.g008 Figure 8 Risk estimates for violence in schizophrenia and other psychoses compared with risk in individuals with substance abuse. Note: Psychoses studies include individuals with psychotic disorders of both genders with and without substance abuse comorbidity. Substance abuse studies involve risk estimates of violence in individuals of both genders with a diagnosis of substance abuse. Population Attributable Risk Fractions We identified six studies where population attributable risk fractions could be extracted because information on the number of crimes was presented in addition to the number of convicted persons. These were no more than 10%, and were: 3.2% [55], 3.5% [50], 5.2% [24], 8.2% [54], 8.4% [59], and 9.9% [49]. Homicide as Outcome We identified five studies that reported on the risk of homicide in individuals with psychosis compared with the general population (Figure 9) [23],[24],[32],[57],[58]. There were 261 homicides committed by individuals with schizophrenia and other psychoses compared with 2,999 in the comparison group. The risk of homicide in individuals with schizophrenia was 0.3% compared with 0.02% in the general population. The random-effects pooled OR was 19.5 (14.7–25.8), with significant heterogeneity (I 2 = 60%; 0%–85%). Within these studies, we compared these estimates with the two studies that reported on risk of homicide in persons diagnosed with substance abuse. The risk of homicide in individuals with substance abuse was 0.3%, with a random-effects pooled OR of 10.9 (3.4–34.9). 
10.1371/journal.pmed.1000120.g009 Figure 9 Risk estimates for homicide in individuals with schizophrenia and in individuals with substance abuse. Discussion This systematic review of the risk of violence in schizophrenia and other psychoses identified 20 studies including 18,423 individuals with these disorders. There were four main findings. The first was that the risk of violent outcomes was increased in individuals with schizophrenia and other psychoses. The risk estimates, reported as ORs, were all above one indicating an increased risk of violence in those with schizophrenia and other psychoses compared with the general population controls, although the risk estimates varied between one and seven in men, and between four and 29 in women. A second finding was that comorbidity with substance use disorders substantially increased this risk, with increased ORs between three and 25. Although there was considerable variation in this estimate between studies, the pooled estimate was around four times higher compared with individuals without comorbidity. Third, we found no significant differences in risk estimates for a number of study design characteristics for which there has been uncertainty. These included: whether the diagnosis was schizophrenia versus other psychoses, if the outcome measure was register-based arrests and convictions versus self-report, and if the study location was the US or Nordic countries compared with other countries. Finally, the increased risk of violence in schizophrenia and the psychoses comorbid with substance abuse was not different than the risk of violence in individuals with diagnoses of substance use disorders. In other words, schizophrenia and other psychoses did not appear to add any additional risk to that conferred by the substance abuse alone. 
We found higher risk estimates in the female-only and mixed gender studies compared with the general population, although these estimates were not significantly higher than male-only estimates using random-effects models. The higher risk estimates in women may be a consequence of the lower prevalence of drug and alcohol use in the general female population compared with the general male population, and so violence associated with other causes, including schizophrenia, would be overrepresented in the women [24]. Although other work has demonstrated a closing of the gender gap in rates of violence from patients discharged from psychiatric hospitals [63], this present systematic review has shown that risk of violence by gender is reversed compared with general population prevalence rates of violence. In addition, we found only five studies that compared risk of homicide in individuals with schizophrenia compared with the general population. Although the heterogeneity was large, the risk estimates were considerably higher than those for all violent outcomes. Although the risk of any individual with schizophrenia committing homicide was very small at 0.3% and similar in magnitude to the risk in individuals with substance abuse (which was also 0.3%), it does indicate a particularly strong association of psychosis and homicide. It may also reflect the better quality of these studies, including better ascertainment of cases. Apart from homicide, risk estimates do not appear to be elevated with the increasing severity of violent offence in individuals with psychosis [24],[52]. There were several potentially important negative findings. In particular, Nordic-based or US-based investigations did not provide different risk estimates than the rest of the world. This finding would argue against the suggestion that the association between mental illness and violent crime is modified by variations in population base rates of violence [36] or the availability of handguns. 
Lastly, there was no difference in risk estimates produced by studies conducted before and after 1990. Although deinstitutionalization would have occurred at different dates in the included studies, this finding may support the conclusions of two related investigations in the Australian state of Victoria that demonstrated that violent convictions have not increased in recent decades compared to these offences in the general population [13],[55]. Further research is needed to examine this issue. There are a number of limitations to this review. First, caution is warranted in the overall estimates provided in this review as there was significant heterogeneity. The lack of any explanation for this heterogeneity, apart from substance abuse comorbidity and possibly study design, suggests that methodological variations that we were not able to test may have been important. An alternative approach would be individual participant meta-analysis as it would provide some consistency across the potentially mediating characteristics. One notable finding was that, in all but three of the included studies, violence was assessed irrespective of the timing of the diagnosis of schizophrenia (i.e., violence before and after the diagnosis), which would overestimate the effects of the illness. There were three studies that used longitudinal designs (where violence was only included after diagnosis was established) [42],[43],[52], which provided lower risk estimates. Second, the overall pooled estimates will have overestimated the association because of inadequate adjustment for confounding and the use of a random-effects meta-analysis. A consequence of the latter was that risk estimates were less conservative than using a fixed-effects model, as the smaller studies were weighted more equally in the random-effects meta-analysis [64]. For example, in the men, the pooled OR was 2.9 in the fixed-effects model compared with a random-effects estimate of 4.0. 
The fixed-effects odds ratio was further reduced to 2.0 when adjustment for socio-demographic factors was included, which was possible in only four out of 13 male studies. However, the use of random-effects estimates in the subgroup analysis led to more conservative findings because of larger CIs. Another limitation was that there were no studies outside the US, Northern Europe, Israel, Australia, and New Zealand, potentially limiting the generalizability of the findings. However, we found no difference by study region (such as the US or Nordic countries compared with other countries), which would suggest that the findings are applicable to Western countries. Nevertheless, the lack of any studies in low-income countries is notable.

A number of recommendations for future research arise from this review. Residual and inadequate confounding is likely to have affected the estimates produced by the primary studies because of inadequate measurement of exposures and confounders. For example, some of the studies adjusted for socio-economic status by using the profession of the father [53], while another used neighbourhood controls [55]. More precise and reliable measures of confounders need to be included in future studies. One promising approach is to compare individuals with schizophrenia with unaffected siblings, and there is a recent study that found that the adjusted OR of violent crime for individuals with schizophrenia compared with their unaffected siblings was 1.8 (95% CI 1.3–1.8). When compared with general population controls matched for year of birth and gender, the adjusted OR was 2.0 (1.8–2.2) [52]. In addition, how substance abuse mediates violent offending needs further study. Whether future work needs to rely on resource-intensive ways of gathering outcome data such as self-report measures or interviewing informants is questioned by this review, although prevalence rates will be higher when such approaches are used. 
In addition, health services research could further examine the role of different service configurations in reducing violence outcomes in these patients. In particular, the role of continuity of care should be investigated. Research has demonstrated no reduction in the prevalence of violence when intensive case management has been used compared with standard care [65], but alternative models of service delivery need study. Finally, perhaps the most important research implication is the need for better quality and larger randomized controlled trials for the treatment of substance abuse comorbidity in schizophrenia [66]. A number of implications arise from this review. First, the findings highlight the importance of risk assessment and management for patients with substance abuse comorbidity. In those without substance abuse comorbidity, the risk of violent crime was modestly elevated with ORs ranging from 1 to 5. However, better adjustment for potentially relevant confounders and problems of misclassification (i.e., many of these patients may have undiagnosed and unreported substance abuse) would possibly reduce the observed risk. This effect has been demonstrated in a recent Swedish study where the adjusted OR was minimally raised (at 1.2) in individuals with schizophrenia and no comorbid substance abuse compared with general population controls [52]. The relationship between comorbid substance abuse and violence in schizophrenia may be mediated by personality features and/or social problems, and is unlikely to be a simple additive effect [67]. In support, one study demonstrated that rates of substance abuse have increased markedly in individuals with schizophrenia over 25 y, but rates of violence modestly. The authors concluded that a subgroup of people with schizophrenia at risk of violence have increasingly abused substances [55]. 
The relationship with medication adherence may also mediate the association with violent outcomes, particularly if it precedes substance abuse on the causal pathway to violence. The data on medication adherence has reported associations with violence in naturalistic studies [68], but a recent analysis of the Clinical Antipsychotic Trials in Intervention Effectiveness (CATIE) trial data for violent outcomes found no overall association with violence [69]. Further research is necessary to clarify the relationship between substance abuse, medication adherence, and violence. A second implication relates to attempts to redress the stigmatization of patients with schizophrenia and other psychoses that could be reconsidered in light of the findings of the risk of violence in substance use disorders [11]. Our findings suggest that individuals with substance use disorders may be more dangerous than individuals with schizophrenia and other psychoses, and that the psychoses comorbid with substance abuse may confer no additional risk over and above the risk associated with the substance abuse. As substance use disorders are three to four times more common than the psychoses [70],[71], public health strategies to reduce violence in society could focus on the prevention and treatment of substance abuse at individual, community, and societal levels [35],[72],[73]. In summary, there is a robust body of evidence that demonstrates an association between the psychoses and violence. This association is increased by substance abuse comorbidity and may be stronger in women. However, the increased risk associated with this comorbidity is of a similar magnitude to that in individuals with substance abuse alone. This finding would suggest that violence reduction strategies could consider focusing on the primary and secondary prevention of substance abuse rather than solely target individuals with severe mental illness. 
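The contrast between fixed-effects and random-effects estimates noted among the limitations can be illustrated with a minimal sketch. It pools log odds ratios by inverse-variance weighting and, for the random-effects case, uses the DerSimonian-Laird estimate of between-study variance. The review does not state which estimator it used, so that choice is an assumption, and the four odds ratios and variances below are hypothetical values, not data from the review.

```python
import math

def pool_fixed(log_ors, variances):
    """Inverse-variance fixed-effects pooled log OR and its standard error."""
    weights = [1.0 / v for v in variances]
    est = sum(w * y for w, y in zip(weights, log_ors)) / sum(weights)
    return est, math.sqrt(1.0 / sum(weights))

def pool_random_dl(log_ors, variances):
    """Random-effects pooling with the DerSimonian-Laird tau^2 estimate."""
    weights = [1.0 / v for v in variances]
    fixed_est, _ = pool_fixed(log_ors, variances)
    # Cochran's Q measures heterogeneity around the fixed-effects estimate.
    q = sum(w * (y - fixed_est) ** 2 for w, y in zip(weights, log_ors))
    k = len(log_ors)
    c = sum(weights) - sum(w * w for w in weights) / sum(weights)
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    # Between-study variance is added to every study's variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    est = sum(w * y for w, y in zip(w_star, log_ors)) / sum(w_star)
    return est, math.sqrt(1.0 / sum(w_star)), tau2

def ci95(est, se):
    """95% confidence interval on the odds-ratio scale."""
    return math.exp(est - 1.96 * se), math.exp(est + 1.96 * se)

# Hypothetical study-level odds ratios and log-OR variances.
odds_ratios = [1.5, 2.5, 4.0, 6.0]
variances = [0.04, 0.05, 0.06, 0.08]
log_ors = [math.log(o) for o in odds_ratios]

fe_est, fe_se = pool_fixed(log_ors, variances)
re_est, re_se, tau2 = pool_random_dl(log_ors, variances)
fe_low, fe_high = ci95(fe_est, fe_se)
re_low, re_high = ci95(re_est, re_se)
print(f"fixed:  OR {math.exp(fe_est):.2f} ({fe_low:.2f}-{fe_high:.2f})")
print(f"random: OR {math.exp(re_est):.2f} ({re_low:.2f}-{re_high:.2f})")
```

When the studies are heterogeneous, the estimated between-study variance is positive, every study weight shrinks, and the random-effects interval is wider than the fixed-effects one, which is the sense in which such estimates are more conservative.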
Supporting Information Table S1: Details of studies estimating risk of violence in individuals with schizophrenia and other psychoses. (0.08 MB DOC)

            Principal component analysis for clustering gene expression data.

            There is a great need to develop analytical methodology to analyze and to exploit the information contained in gene expression data. Because of the large number of genes and the complexity of biological networks, clustering is a useful exploratory technique for analysis of gene expression data. Other classical techniques, such as principal component analysis (PCA), have also been applied to analyze gene expression data. Using different data analysis techniques and different clustering algorithms to analyze the same data set can lead to very different conclusions. Our goal is to study the effectiveness of principal components (PCs) in capturing cluster structure. Specifically, using both real and synthetic gene expression data sets, we compared the quality of clusters obtained from the original data to the quality of clusters obtained after projecting onto subsets of the principal component axes. Our empirical study showed that clustering with the PCs instead of the original variables does not necessarily improve, and often degrades, cluster quality. In particular, the first few PCs (which contain most of the variation in the data) do not necessarily capture most of the cluster structure. We also showed that clustering with PCs has different impact on different algorithms and different similarity metrics. Overall, we would not recommend PCA before clustering except in special circumstances.

              IMGT/HighV QUEST paradigm for T cell receptor IMGT clonotype diversity and next generation repertoire immunoprofiling

The T cell receptor (TR)1 is critical for peptide/major histocompatibility (pMH) recognition. The TR repertoire is vast, with direct estimates of 2.5 × 10⁷ unique αβ TR per individual2 and significantly higher numbers by theoretical calculations3. The αβ TR is a membrane-bound, clonotypic, heterodimeric protein comprising one alpha chain (TRA) and one beta chain (TRB)1. Each chain comprises a variable (V) domain and a constant (C) region that includes a C domain and connecting, transmembrane and cytoplasmic regions. The V-ALPHA domain results from the rearrangement between a TRAV gene and a joining J (TRAJ) gene, whereas the V-BETA domain results from the rearrangement of a TRBV gene, a diversity D (TRBD) gene and a joining J (TRBJ) gene. The C region of the TR chains is encoded by the TRAC and TRBC (TRBC1 and TRBC2) genes, respectively1. Each V domain comprises three highly flexible complementarity-determining regions (CDR) at the antigen-binding face of the receptor1. When docking its cognate pMH ligand, the CDR1 and CDR2 facilitate binding of the receptor to the MH helices, while CDR3 principally engages the peptide within the MH groove4 5. The specificity of the TR predominantly depends on the CDR3 created by the V-(D)-J rearrangement. It remains a significant challenge to understand the diversity and specificity of T cells, particularly during natural infection. The approach by tetramer staining or antigen-induced cytokine release6 7 8 9 10 11 is limited by our knowledge of mapped epitopes. Classical DNA cloning and Sanger sequencing techniques are laborious and generally limit data to a few hundred, or in rare cases a few thousand, TR sequences per investigation6 7 8 9. The complexity and depth of the human TR repertoire was recently explored in several studies using next generation sequencing (NGS)12 13 14 15 16 17 18 19 20. 
In these studies, Illumina sequencing was primarily used, with a major advantage of generating very deep data, but a disadvantage that the read length was short and the data either required assembly12 13 or focused exclusively on the CDR3 (refs 14, 15). 454 sequencing was also used previously16 17 19 20 but only in combination with multiplex PCR. Earlier studies also explored various bioinformatic tools but different algorithms added potential layers of discrepancy. ImMunoGeneTics (IMGT)/HighV-QUEST21 22 (http://www.imgt.org) is the authentic high throughput version of the IMGT/V-QUEST tool23 24 25 (acknowledged as the international reference for immunoglobulin and TR sequence analysis, CSH Protocols, WHO/IUIS). IMGT/HighV-QUEST uses the same algorithm as IMGT/V-QUEST and achieves the same degree of resolution and high quality results. We now introduce a high throughput methodology for a standardized comparative TR analysis on the basis of IMGT-ONTOLOGY concepts. This approach consists of 5′ rapid amplification of cDNA ends (RACE)12 26 to avoid amplification bias associated with multiplex PCR, 454 sequencing to bypass the limitations of short-read assembly and IMGT/HighV-QUEST analysis21 22 to ensure the highest quality in sequence interpretation of full-length rearranged human TR V-BETA transcripts. We illustrate the usefulness and application of this methodology by the analysis of the human TR repertoire response towards a model immune challenge, the H1N1 vaccine. IMGT/HighV-QUEST was recently released21, and this report represents its official initial reference for application in high throughput TR repertoire and IMGT clonotype analysis. Results Workflow results A model immune challenge was provided by vaccinating a healthy volunteer with a H1N1 influenza vaccine. Three T cell subpopulations (‘CD4−’, ‘CD4+’ and ‘Treg’27 28 29 30) were isolated by flow cytometry, at four time points (baseline and days 3, 8 and 26 post vaccination) (Fig. 1a). 
Twelve amplicon libraries of the corresponding TR transcripts were prepared using anchored 5′RACE PCR12 and a TRBC gene-specific reverse primer1 31 (Supplementary Table S1). Sequencing was performed using 454 technology, which is appropriate for >400 nt long sequences. During the platform-specific data processing, 160,944 reads passed the 454 pipeline filter (the pass rate was 46.73%), but 7,405 of these were later discarded owing to missing or incomplete barcodes. Therefore, we obtained 153,539 ‘final 454-output’ reads for the 12 samples, of which 72% exceeded 300 nt. These reads were directly analysed by IMGT/HighV-QUEST, without the need for computational read assembly. As IMGT/HighV-QUEST currently accepts up to 150,000 sequences per job, the final 454-output 5′ reads (79,564 ‘MIDA_all’) and 3′ reads (73,975 ‘MIDB_all’) were submitted separately (Supplementary Fig. S1). Online statistical analyses (IMGT/HighV-QUEST currently accepts up to 450,000 results of analysed sequences per statistical run) were performed on the pooled results of the two jobs ‘MIDA_all’ and ‘MIDB_all’, and on the combined 5′+3′ reads of each of the 12 samples (designated as sets, for example, MID1 (Supplementary Table S1)). An additional level of expertise was specifically developed during this study to define and characterize individual IMGT clonotypes unambiguously from NGS data (clonal diversity) and determine the precise number of sequences assigned to each clonotype (clonal expression). This approach is based on IMGT-ONTOLOGY32 33 and more specifically on the concepts of classification (gene and allele nomenclature)34, description (standardized labels)35 and numerotation (IMGT unique numbering)36 37 38. IMGT/HighV-QUEST summary The IMGT/HighV-QUEST ‘Summary’ of the statistical analysis (Fig. 
2) made on ‘MIDA+MIDB’ (pooled results of the two jobs ‘MIDA_all’ and ‘MIDB_all’) shows that, of the 153,539 submitted sequences, 63,371 were categorized as ‘1 copy’ and 867 were categorized as ‘More than 1’. These sequences were filtered-in for statistical analysis (64,238 sequences, 41.84% of the submitted sequences), whereas sequences not answering the required criteria (e.g., ‘No results’, ‘Unknown functionality’) were filtered out21 (Supplementary Fig. S1). The ‘1 copy’ category (63,371, 98.65% of the filtered-in sequences) comprises the sequences to be analysed in detail (this category avoids repeating the same analysis on strictly identical sequences, which are stored instead in ‘More than 1’) (Supplementary Fig. S1). NGS ‘1 copy’ is not synonymous with ‘clonotype’: indeed, several ‘1 copy’ sequences may correspond to a single clonotype if the sequences only differ in their length and/or due to sequencing errors. One of the aims of this work was therefore to define, identify and characterize the distinct clonotypes from this ‘1 copy’ category, and thus to be able to evaluate the true clonal diversity. IMGT/HighV-QUEST is a generic tool, and the ‘More than 1’ category is designed for expression studies, in experiments with well-controlled parameters. Indeed, for each ‘1 copy’ sequence, the tool provides the number of ‘More than 1’ sequences (867 sequences, Fig. 2; Supplementary Fig. S1). A second aim of this work was therefore to assign, to each distinct clonotype, all the relevant ‘1 copy’ sequences, as well as the number of their corresponding ‘More than 1’, and thus to be able to provide the framework to evaluate the clonal expression. IMGT/HighV-QUEST detailed statistical analysis is performed on the ‘1 copy’ sequences. These sequences have an average length of 431 nt (Fig. 2) but the length of the V domain (V-D-J-REGION) within each sequence may vary. 
With the longer V-D-J-REGION, IMGT/HighV-QUEST identifies a single allele, unambiguously, whereas with the shorter V-D-J-REGION, the tool proposes several solutions. The ‘1 copy’ therefore comprises two categories: ‘single allele’ (one allele for V and J) and ‘several alleles (or genes)’ (several alleles for V and/or J) (Supplementary Fig. S1). In this study, the ‘single allele’ (for V and J) comprised 58,958 sequences (91.78% of the filtered-in sequences, average length 440 nt), with a V-D-J-REGION average length of 335 nt, whereas the ‘several alleles (or genes)’ (for V and/or J) comprised 4,413 sequences (6.87% of the filtered-in sequences, average length 309 nt) with a V-D-J-REGION average length of ~250 nt. Most of the ‘single allele’ sequences contained a complete V domain, with many even containing the leader region (L-REGION) or part of it (checked with the individual sequence files). We consider the sequences of the ‘single allele’ category to be superior to those of the ‘several alleles (or genes)’ category in terms of biological interpretations. IMGT/HighV-QUEST analysis is performed by default with the option of accepting insertions and/or deletions (indels) that looks for indels in the V-REGION and corrects these before characterizing the sequences22 25. More than 38% (38.41%) of the filtered-in sequences (24,674 sequences out of 64,238, Fig. 2) were detected by IMGT/HighV-QUEST as having indels. Most, if not all, of these indels correspond to sequencing errors and therefore the corresponding sequences corrected by IMGT/HighV-QUEST could be included in the final results, as they had no other anomaly. The IMGT/HighV-QUEST option of accepting insertions and/or deletions is therefore particularly appropriate for the 454 sequencing of TR. Analysis without that option would have led to these sequences being assigned to one of the filtered-out categories or to an erroneous sequence characterization. 
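The read and sequence counts reported in the workflow and summary paragraphs above can be cross-checked with a few lines of arithmetic. The sketch below only restates numbers quoted in the text; the variable names are illustrative and are not IMGT/HighV-QUEST field names.

```python
# Counts quoted in the text (names are illustrative only).
reads_passing_filter = 160_944
reads_bad_barcode = 7_405
submitted = reads_passing_filter - reads_bad_barcode  # 'final 454-output' reads

reads_5prime = 79_564  # job 'MIDA_all'
reads_3prime = 73_975  # job 'MIDB_all'

one_copy = 63_371
more_than_one = 867
filtered_in = one_copy + more_than_one

single_allele = 58_958
several_alleles = 4_413
with_indels = 24_674

def pct(part, whole):
    """Percentage rounded to two decimals, matching the reported precision."""
    return round(100 * part / whole, 2)

report = {
    "submitted": submitted,
    "filtered_in": filtered_in,
    "filtered_in_pct_of_submitted": pct(filtered_in, submitted),
    "one_copy_pct": pct(one_copy, filtered_in),
    "single_allele_pct": pct(single_allele, filtered_in),
    "several_alleles_pct": pct(several_alleles, filtered_in),
    "with_indels_pct": pct(with_indels, filtered_in),
}
for name, value in report.items():
    print(f"{name}: {value}")
```

Each percentage is taken over the 64,238 filtered-in sequences except the first, which is taken over the 153,539 submitted sequences.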
TRB genotype and haplotype identification The TRBV, TRBD and TRBJ gene and allele usage was obtained using the statistical analysis of IMGT/HighV-QUEST available online21. This analysis is performed automatically on the ‘1 copy’ ‘single allele’ (for V and J) category (Supplementary Fig. S1). A total of 55 TRBV genes (47 functional F, 1F/open reading frame (ORF), 3 ORF, 4 pseudogenes) were identified in the rearrangements. This includes the TRBV6-3 gene as discussed below. The presence of rearranged transcripts for four in-frame pseudogenes TRBV1, TRBV3-2, TRBV12-1 and TRBV21-1 was rather unexpected (Fig. 3a), but consistent with the selection of the IMGT/V-QUEST directory ‘F+ORF+in-frame P’ for the analysis (Fig. 2). These pseudogenes were found with in-frame or out-of-frame junction rearrangements. Although a limited number of ‘1 copy’ ‘single allele’ for two of these pseudogenes was observed (three for TRBV1 and two for TRBV12-1; Fig. 3a), the fact that they were found in different sets with different junctions underlines the quality of the data and confirms that no TRBV gene was overlooked in the 5′RACE amplification step. The TRBV3-2 and TRBV21-1 in-frame pseudogenes were found in 105 sequences (56 for allele TRBV3-2*01 and 49 for allele TRBV3-2*03) and 549 sequences, respectively (Fig. 3a). In contrast, two other in-frame pseudogenes (TRBV12-2 and TRBV26) and two ORF (TRBV7-1 and TRBV17) were not found (Fig. 3a). For the first time, the TRBV genotype and haplotypes of an individual could be identified unambiguously from the gene and allele usage (Supplementary Table S2). The apparent ‘absence’ of the functional TRBV6-3 gene was expected as its allele TRBV6-3*01 has an identical sequence to TRBV6-2*01 (the IMGT/HighV-QUEST ‘1 copy’ results therefore include TRBV6-3*01 under ‘TRBV6-2*01’). The TRBV6-3 gene was taken into account in the genotype identification (Supplementary Table S2), although it cannot be displayed in the histogram (Fig. 3a). 
No other similar case was detected. The individual is homozygous for most functional TRBV genes, except for 2, for which he is heterozygous, namely TRBV20-1 (alleles *01 and *02) and TRBV7-3 (allele *01 functional and allele *02 ORF). The TRBV genes for which the individual is homozygous have the allele *01, except for three genes, which have the allele *02 (TRBV5-5*02, TRBV15*02 and TRBV30*02). The frequently used V genes are distributed along the TR locus at uneven intervals (Fig. 3a). The 2 TRBD genes (Fig. 3b) and all 13 functional TRBJ genes (Fig. 3c) were detected in this analysis. As for the TRBV genes, the TRBD and TRBJ genotype and haplotypes were identified on the basis of the alleles identified by IMGT/HighV-QUEST. The individual is heterozygous for TRBD2 (TRBD2*01/TRBD2*02) and TRBJ1-6 (TRBJ1-6*01/TRBJ1-6*02) and homozygous for TRBD1 and the other TRBJ genes (all *01). Thus, the histograms and tables of the IMGT/HighV-QUEST statistical analysis of the TRBV, TRBD and TRBJ gene and allele usage, performed on the ‘1 copy’ ‘single allele’ (for V and J) category, provide an accurate genotype landscape of this individual. Moreover, for the first time, and based on the unambiguous TRBV, TRBD and TRBJ allele determination in V-D-J rearrangements, haplotypes could be described for NGS data. Thus, we demonstrated the respective linkage of the TRBV20-1*01 and TRBJ1-6*01 (and also TRBD2*02) on one chromosome, and of the TRBV20-1*02 and TRBJ1-6*02 (and also TRBD2*01) on the other. No such linkage could be obtained for the TRBV7-3 alleles because no rearrangement was found to TRBJ1-6. These results on the V, D and J genes and alleles provide important clues for the interpretation of sequences of the ‘1 copy’ ‘several alleles’ (for V and/or J) category. They also represent a crucial step towards the definition and characterization of IMGT clonotypes for an accurate description of repertoire immunoprofiles, as described below. 
IMGT clonotype definition and characterization In the literature, clonotypes are defined differently, depending on the experiment design (functional specificity) or available data. Thus, a clonotype may denote either a complete receptor (e.g., TR alpha-beta), or only one of the two chains of the receptor (e.g., TRA or TRB), or one domain (e.g., V-BETA), or the CDR3 sequence of a domain. Moreover the sequence can be at the amino acid (AA) or nucleotide level, and this is rarely specified. Therefore, our priority was to define clonotypes and their properties, which could be identified and characterized by IMGT/HighV-QUEST, unambiguously. In IMGT, the clonotype, designated as ‘IMGT clonotype (AA)’, is defined by a unique V-(D)-J rearrangement (with IMGT gene and allele names determined by IMGT/HighV-QUEST at the nucleotide level21 22 23 24 25) and a unique CDR3-IMGT AA (in-frame) junction sequence39 40 41. To identify ‘IMGT clonotypes (AA)’ in a given IMGT/HighV-QUEST data set, the ‘1 copy’ are filtered to select for sequences with in-frame junction, conserved anchors 104 and 118 ‘C, F’ (‘C’ is 2nd-CYS 104, and ‘F’ is the J-PHE 118 of V-BETA)36 37 38 and for V and J functional or ORF, and ‘single allele’ (for V and J; Supplementary Fig. S1). By definition, an ‘IMGT clonotype (AA)’ is ‘unique’ for a given data set (Fig. 4a). Consequently, each ‘IMGT clonotype (AA)’, in a given data set, has a unique set identifier (column ‘Exp. ID’) and, importantly, has a unique representative sequence (link in column ‘Sequence ID’) selected by IMGT/HighV-QUEST among the ‘1 copy’ ‘single allele’ (for V and J), based on the highest per cent of identity of the V-REGION (‘V %’) compared with that of the closest germline, and/or on the sequence length (thus, the most complete V-REGION). Thus in Fig. 4a, the ‘IMGT clonotype (AA)’ #17081, with an Exp. 
ID ‘13915-MIDAB_all’, has a unique rearrangement ‘TRBV20-1*02F – TRBD1*01F – TRBJ1-1*01F’, with a CDR3-IMGT length (AA) of ‘12 AA’ and a CDR3-IMGT sequence (AA) ‘SAPAEGGNTEAF’, and conserved anchors 104 and 118 ‘C, F’ (recall of the filter). The IMGT clonotype (AA) representative sequence has a V-REGION, which is 100% identical to that of TRBV20-1*02 and a length of 479 nt. Clonal diversity and clonal expression In this study, 22,234 unique IMGT clonotypes (AA) were identified and a representative sequence was assigned to each (Supplementary Fig. S1). The ‘1 copy’ ‘single allele’ sequences not selected as representative (25,153 sequences) were each then assigned to a characterized IMGT clonotype (AA). These sequences differ from the representative sequence by a different (usually shorter) length, and/or by sequencing errors in the V-REGION (lower ‘V %’ of identity) or in the J-REGION, and/or by nucleotide differences in the CDR3-IMGT. These sequences with nucleotide differences in the CDR3-IMGT are identified as ‘IMGT clonotypes (nt)’. The nucleotide differences may be due to sequencing errors or, if this can be proven experimentally, molecular convergence. A given ‘IMGT clonotype (AA)’ may have one or several ‘IMGT clonotypes (nt)’. Thus in Fig. 4b, the ‘IMGT clonotype (AA)’ #17379 has two ‘IMGT clonotypes (nt)’, as shown by the number (‘2’) of different CDR3-IMGT sequences (nt) (‘Nb diff CDR3-IMGT (nt)’). The ‘1 copy’ ‘several alleles (or genes)’ sequences are also assigned to an ‘IMGT clonotype (AA)’, provided that they have the same CDR3-IMGT (AA) and the same V and J alleles of the representative ‘IMGT clonotype (AA)’ among those proposed by IMGT/HighV-QUEST (Fig. 4b). In our study, 2,052 ‘several alleles (or genes)’ sequences could be assigned to an ‘IMGT clonotype (AA)’ (Supplementary Fig. S1). The nb of sequences of ‘More than 1’ for each ‘1 copy’ assigned to an IMGT clonotype (AA) is finally included (795 sequences). 
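The clonotype definition and the stepwise assignment described above can be sketched in code. This is a minimal illustration under stated assumptions, not the IMGT/HighV-QUEST implementation: the `Seq` record and its field names are hypothetical, the grouping key (V, D and J alleles plus CDR3 amino acid sequence) is one reading of "unique V-(D)-J rearrangement and unique CDR3-IMGT AA sequence", and the later step that attaches 'several alleles (or genes)' sequences to an existing clonotype is omitted. Only the first record reproduces values quoted in the text; the others are invented for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Seq:
    """Hypothetical per-sequence record; field names are illustrative."""
    seq_id: str
    v_allele: str
    d_allele: Optional[str]
    j_allele: str
    cdr3_aa: str
    in_frame: bool
    anchors: tuple           # residues at positions 104 and 118
    functional_or_orf: bool  # V and J both functional or ORF
    single_allele: bool      # one allele proposed for V and for J
    v_identity: float        # 'V %' identity to the closest germline
    length: int              # sequence length in nt
    extra_copies: int = 0    # identical reads stored under 'More than 1'

def is_eligible(s):
    """Filter described in the text for sequences that can define a clonotype."""
    return (s.in_frame and s.anchors == ("C", "F")
            and s.functional_or_orf and s.single_allele)

def build_clonotypes(seqs):
    """Map each clonotype key to (representative sequence id, expression)."""
    groups = defaultdict(list)
    for s in seqs:
        if is_eligible(s):
            groups[(s.v_allele, s.d_allele, s.j_allele, s.cdr3_aa)].append(s)
    clonotypes = {}
    for key, members in groups.items():
        # Representative: highest V identity, then the longest sequence.
        rep = max(members, key=lambda s: (s.v_identity, s.length))
        # Expression: every member plus its identical 'More than 1' copies.
        expression = sum(1 + s.extra_copies for s in members)
        clonotypes[key] = (rep.seq_id, expression)
    return clonotypes

seqs = [
    Seq("a", "TRBV20-1*02", "TRBD1*01", "TRBJ1-1*01", "SAPAEGGNTEAF",
        True, ("C", "F"), True, True, 100.0, 479),
    Seq("b", "TRBV20-1*02", "TRBD1*01", "TRBJ1-1*01", "SAPAEGGNTEAF",
        True, ("C", "F"), True, True, 99.1, 430, extra_copies=2),
    Seq("c", "TRBV7-3*01", None, "TRBJ2-1*01", "ASSLGQGNEQF",
        True, ("C", "F"), True, True, 100.0, 450),
    Seq("d", "TRBV7-3*01", None, "TRBJ2-1*01", "ASSLGQGNEQF",
        False, ("C", "F"), True, True, 100.0, 450),  # out of frame: excluded
]
clonotypes = build_clonotypes(seqs)
diversity = len(clonotypes)                                # clonal diversity
expression = sum(n for _, n in clonotypes.values())       # clonal expression
print(diversity, expression)
```

In this toy set, diversity counts distinct clonotypes while expression counts every sequence assigned to them, which is the distinction the text draws between the two measures.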
Thus, by proceeding stepwise to assign sequences, the high quality and specific characterization of the ‘IMGT clonotype (AA)’ remain unaltered. For the first time, for NGS antigen receptor data analysis, our standardized approach allows a clear distinction and accurate evaluation between clonal diversity (nb of ‘IMGT clonotypes (AA)’) and clonal expression (nb of sequences assigned, unambiguously, to a given ‘IMGT clonotype (AA)’). In our study, the 22,234 ‘IMGT clonotype (AA)’ (clonal diversity) corresponded to 50,234 sequences (clonal expression), which represented 78.2% of the filtered-in sequences (Supplementary Fig. S1). These assignments are clearly described and visualized in detail, so the user can check clonotypes, individually. Indeed, the sequences of each ‘1 copy’ assigned to a given ‘IMGT clonotype (AA)’ are available in ‘Sequences file’ (Fig. 4a,b). The user can easily perform an analysis of these sequences online with IMGT/V-QUEST (up to 50 sequences, selecting ‘Synthesis view display’ and the option ‘Search for insertions and deletions’) and/or with IMGT/JunctionAnalysis (up to 5,000 junction sequences), which provide a visual representation familiar to the IMGT users. Homo sapiens TRB normalized reference immunoprofiles The comparison of clonal diversity and expression results between studies and experiments requires standards and as these do not exist for NGS, we established Homo sapiens TRB normalized reference immunoprofiles. For clonal diversity, immunoprofiles were obtained by normalizing, to a total of 10,000 clonotypes, the nb of IMGT clonotypes (AA) per TRB (V, D and J) gene (in pink), from the values of 22,231 IMGT clonotypes (AA) (having excluded three abnormal clonotypes, each one represented by a unique sequence) (Fig. 5). 
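The normalization behind these reference immunoprofiles amounts to rescaling per-gene counts so that they sum to 10,000. The sketch below shows that rescaling under the assumption that it is a simple proportional scaling; the per-gene counts are hypothetical, and only the totals (22,231 clonotypes, target of 10,000) come from the text.

```python
def normalize_counts(counts, total=10_000):
    """Rescale per-gene counts proportionally so that they sum to `total`."""
    grand_total = sum(counts.values())
    if grand_total == 0:
        raise ValueError("cannot normalize an empty profile")
    return {gene: total * n / grand_total for gene, n in counts.items()}

# Hypothetical clonotype counts per TRBV gene (not values from the study).
clonotypes_per_gene = {"TRBV20-1": 1200, "TRBV7-3": 300, "TRBV5-5": 500}
profile = normalize_counts(clonotypes_per_gene)
for gene, value in profile.items():
    print(f"{gene}: {value:.1f}")

# With the study total, every raw count is multiplied by the same factor.
scale_factor = 10_000 / 22_231
print(f"scale factor for 22,231 clonotypes: {scale_factor:.4f}")
```

Because every gene is multiplied by the same factor, relative usage between genes is unchanged, which is what makes profiles from data sets of different sizes comparable.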
For clonal expression, immunoprofiles were obtained by normalizing to a total of 10,000 sequences, the nb of sequences assigned to IMGT clonotypes (AA) per TRBV (in green), TRBD (in red) and TRBJ (in yellow) gene, from the values of the 50,231 assigned sequences per gene (Fig. 6). Normalized values for clonal diversity and expression are reported for TRBV (Supplementary Table S3), TRBD (Supplementary Table S4) and TRBJ (Supplementary Table S5). These TRB normalized reference immunoprofiles will be used to identify variations of interest between the 12 sets (in preparation), despite the overall similarity of the results obtained for the individual sets (an observation that led us to build the normalized reference from the results of the pooled sets). Similarly, the nb of IMGT clonotypes (AA) per CDR3-IMGT length (Fig. 7a) and the nb of sequences assigned to the IMGT clonotypes (AA) per CDR3-IMGT length (Fig. 7b) were normalized for 10,000 clonotypes (from 22,231 clonotypes) and for 10,000 sequences (from 50,231 sequences), respectively (Supplementary Table S6). This normalized distribution of clonotypes and sequences per CDR3-IMGT length will be used for comparison between the different sets (in preparation) or for results comparison with other studies performed with the same IMGT/HighV-QUEST standards. IMGT clonotypes (AA) in different T cell subpopulations Analysing an immune response implies the ability to identify the emergence of new IMGT clonotypes (AA) and track memory clonotypes within T cell subpopulations. Whereas the overall immunoprofile was similar between the 12 sets as indicated above, this contrasted with the high diversity of the ‘IMGT clonotypes (AA)’ sequences. Of the total of 22,231 IMGT clonotypes (AA) (50,231 sequences), 21,164 (40,898 seq) were unique to a set, with the following T cell subpopulation distribution: 6,234 clonotypes (12,854 seq) unique to CD4− sets, 9,492 (16,074 seq) to CD4+ sets and 5,438 (11,970 seq) to Treg sets. 
In contrast, 1,067 IMGT clonotypes (AA) were common to 2–7 sets (9,237 seq). Among these, 825 (6,525 seq) were common only to sets of the same T cell subpopulation, whereas 242 (2,712 seq) were common to sets between different T cell subpopulations, underlining the importance of studying clonotypes at the sequence level. The low number of common clonotypes between different T cell subpopulations at any given time point confirmed that the flow cytometry separation was effective. Only two IMGT clonotypes (AA) were found in seven sets and were the only common clonotypes pre-vaccination in the three T cell subpopulations. Common ‘IMGT clonotypes (AA)’ were identified post-vaccination, either within a given T cell subpopulation between different time points d3, d8 and d26 (28 clonotypes (272 seq), of which 8 (95 seq) were in CD4− sets, 4 (36 seq) in CD4+ sets and 16 (141 seq) in Treg sets), or between two T cell subpopulations (9 clonotypes (63 seq)), but no common clonotypes could be identified between all three T cell subpopulations at any time point post-vaccination. The clonotypes emerging after vaccination required more extensive molecular characterization and analysis. This however is associated with biological analysis, and beyond the scope of this study. Therefore, we focused on the IMGT clonotypes (AA), common at the four time points within a given T cell subpopulation. Thus, 82 IMGT clonotypes (AA), namely 29, 11 and 42 in the CD4−, CD4+ and Treg sets, respectively, were identified and followed individually. These IMGT clonotypes (AA) used different TRBV genes and alleles (Fig. 8). Whereas the TRBV gene and allele distribution differed between the three T cell subpopulations, the pattern was strikingly similar within a subpopulation at different time points. 
This supports the reproducibility of the IMGT/HighV-QUEST determination of the IMGT clonotypes (AA), between experiments, and importantly, means that if variability was observed (for example, in the case of CD4+ in Fig. 8), this warrants exploration for either experimental bias or biological significance. The individual clonal expression of the 82 common IMGT clonotypes (AA) within a given T cell subpopulation could also be followed using the IMGT/HighV-QUEST statistical analysis results, based on the nb of sequences assigned to each at the four time points and normalized for 10,000 sequences, in the CD4− sets (Supplementary Fig. S2a), CD4+ sets (Supplementary Fig. S2b) and Treg sets (Supplementary Fig. S2c). Discussion Although NGS exhibits great potential for the analysis of the immune repertoire, NGS data per se are still heavily biased owing to experimental and methodological flaws from the sample preparation, to TR transcript amplification, or to the sequencing and interpretation of the results. In this study, we used a combination of 5′RACE, 454 and IMGT/HighV-QUEST for standardized analysis of complete V domains, for genotype/haplotype analysis, characterization of IMGT clonotypes (AA), clonal diversity and clonal expression, and generation of immune profiles in normal repertoires and during disease. The 5′RACE12 26 is reliable for TR repertoire analysis as shown by the overall consistency of the clonotypic and expression histograms of 12 different sets (corresponding to three T cell subpopulations at four time points) and confirmed by the detection of rearranged transcripts of in-frame pseudogenes (which may be used as internal controls). Whereas the 5′RACE PCR introduced few errors, probably due to the use of high-fidelity polymerases and low cycle numbers, recent studies established that the majority of errors in TR deep sequencing occur during the solid-phase steps42. 
Interestingly, IMGT/HighV-QUEST analysis detects and corrects insertions and/or deletions, which represent current sequencing errors found with 454 due to homopolymer hybridization. The IMGT/HighV-QUEST functionality ‘Search for insertions and deletions’ is provided by default owing to the high number of indels observed in NGS data. This functionality is identical to that created in IMGT/V-QUEST22 25 online, as an option for analysis of sequences from leukaemic cells in which indels are frequent23. Sequencing errors in the CDR3-IMGT are not corrected by IMGT/HighV-QUEST, however our characterization of ‘IMGT clonotypes (nt)’ highlights sequences with CDR3-IMGT nt differences for each IMGT clonotype (AA). With free public online access, IMGT/HighV-QUEST allows our approach to be readily adaptable to other studies. IMGT/HighV-QUEST analyses directly the fully rearranged IG and TR V-J and V-D-J sequences, without the need of computational assembly. IMGT/HighV-QUEST is a generic tool that allows analysis of IG and TR of different species, including identification of new allele IG and TR polymorphisms and analysis of IG somatic hypermutations. Therefore, IMGT/HighV-QUEST requires NGS methodology, which provides sufficiently long and reliable sequences encompassing directly the V domain. The current average read length of 454 sequencing is ~400 nt (431 nt for the ‘1 copy’ in this study). A major feature of our work was to define and characterize ‘IMGT clonotype (AA)’ to determine their nb (clonal diversity) and to identify the nb of sequences assigned to each ‘IMGT clonotype (AA)’ (clonal expression). This requires several steps in the IMGT/HighV-QUEST statistical analysis. First, IMGT clonotypes (AA) are identified among the ‘1 copy’ with in-frame junctions, conserved anchors 104 and 118 (‘C, F’ for 2nd-CYS and V-BETA J-PHE, respectively), V and J functional or ORF, ‘single allele’ (for V and J). 
Their characterization includes the identification of the rearranged TRBV and TRBJ gene and allele at the nucleotide level by IMGT/HighV-QUEST, and that of a unique CDR3-IMGT (AA) sequence. As a given clonotype may be identified in sequences that differ in length and/or contain sequencing errors, a representative sequence (highest percentage identity of the V-REGION and longest sequence) and an identifier are assigned to each IMGT clonotype (AA) identified in a given data set. Second, the nb of sequences for an IMGT clonotype (AA) (clonal expression) is obtained by aggregating to the representative sequence the nb of sequences that are not selected as representative. The ‘Sequences file’ of the IMGT clonotypes (AA) allows a comparison of all the sequences assigned to a given clonotype (AA). We demonstrate that common IMGT clonotypes (AA) can be followed at different time points between T cell subpopulations, revealing the feasibility of a standardized approach for analysis of specific clones in the immune response. As a large number of antigens are implicated in any infection, it is impossible to identify and simultaneously investigate all antigen-specific T cells mobilized against a complex pathogen. IMGT/HighV-QUEST is capable of quantitatively analysing almost half a million (450,000) results of sequences, simultaneously. With more than 530 columns of results per sequence (Supplementary Table S7), the nb of data analysed is >2 × 10⁸, and represents a genuine advance in standardized and high-quality TR repertoire analysis. It is becoming increasingly apparent that the nature of the T cell repertoire deployed during an immune response can directly affect disease outcomes7 9 43 44 45 46. 
As such, new tools and a standardized methodology (as presented in this case study) capable of dissecting the TR repertoire in a rapid, detailed and comprehensive fashion will be helpful in uncovering new immunopathological associations and in accelerating knowledge of basic TR repertoire biology. Presently, TR repertoire investigation is limited by two polarizing challenges. At one end, high-throughput sequencing alone cannot correlate a clonotype with its functional parameters. At the other end, Sanger sequencing of sorted cells has low throughput and the method depends on prior knowledge of the antigen and/or the antigen-specific cells, thus often missing many antigen-specific populations. Combining high-throughput TR immunoprofiling using IMGT/HighV-QUEST analysis with cell identity-oriented approaches will bring genuine advances in TR repertoire studies in health and disease.

Methods

Ethics statement

The study was approved by the Alfred Hospital Research Ethics Committee and the Victorian Department of Human Services Human Research Ethics Committee. Written informed consent was obtained from the volunteer.

Cells and RNA

A 45-year-old healthy male Caucasoid volunteer (HLA-A*0201/*3002, B*1501/*1801, C*0303/*0501, DRB1*0301/*0401, DQB1*0302/*0201) was vaccinated with H1N1 vaccine (Panvax H1N1 Vaccine, CSL), and blood samples were collected before vaccination and on days 3, 8 and 26 post-vaccination. PBMC at each time point were depleted of CD14+ and CD19+ cells using MACS (Miltenyi Biotec), stained for CD4, CD3, CD25 and CD127 surface expression (fluorochrome-conjugated monoclonal antibodies from BD Biosciences) and then sorted into three T cell subpopulations: regulatory T cells (‘Treg’, with a phenotype CD3+CD25+CD127−/lo)27 and conventional T cells CD3+CD4+ (‘CD4+’) and CD3+CD4− (‘CD4−’) using FACSAria (BD Biosciences) (Fig. 1a).
Treg cells27 28 29 30, which represent a minor subpopulation (~5%) within circulating T cells (or ~2% of PBMC), were included in the analysis to evaluate whether the technique works for both abundant T cell subpopulations (e.g., CD4+) and small subpopulations. RNA was immediately extracted from sorted cells using the RNeasy minikit (Qiagen). In one experiment, DNA was extracted from CD14+ and CD19+ cells, and was subsequently used in high-resolution HLA class I and II typing29.

Amplicon library construction

The concentration of RNA was determined using a NanoDrop ND-8000 spectrophotometer and ~200 ng RNA was used for each library. TRB transcripts were amplified using 5′RACE PCR12 26 because this strategy provides an unbiased amplification of full, rearranged V-D-J sequences. We chose to amplify mRNA over rearranged genomic DNA to obtain sufficiently long sequences with complete V domains, by avoiding the intervening sequence between J and C. A total of 12 libraries were constructed, corresponding to the 12 blood samples (three T cell subpopulations × four time points), using established protocols12 13 with minor modifications. In brief, a 5′RACE PCR was conducted using the SMARTer RACE cDNA Amplification Kit (Clontech Laboratories) according to the manufacturer’s instructions. The extension time for the first-strand cDNA synthesis was 90 min at 42 °C followed by 15-min inactivation at 70 °C. The first-round PCR was achieved using Phusion Hot-Start DNA Polymerase (Finnzymes), a template-switching oligonucleotide (TSO), a universal primer mix (supplied in the above SMARTer RACE cDNA amplification kit), along with the TRBC gene-specific reverse primer, 5′-TTCTGATGGCTCAAACAC-3′ (codon positions 11-6, IMGT unique numbering), which aligns to both TRBC1 and TRBC2 genes1 31 (IMGT Repertoire, http://www.imgt.org). The cycling conditions were: 30 s denaturation at 98 °C, 26 cycles of 10 s at 98 °C, 10 s at 55 °C and 20 s at 72 °C, plus a final extension for 5 min at 72 °C.
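As a quick sanity check on the TRBC reverse primer quoted above, its reverse complement gives the sense-strand sequence it anneals to. The short Python sketch below is illustrative only: the helper name `reverse_complement` is ours, the split into codons assumes the primer starts and ends on codon boundaries (as the quoted codon positions 11-6 suggest), and nothing here queries IMGT or verifies the match to TRBC1 and TRBC2.

```python
# Translation table mapping each DNA base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


primer = "TTCTGATGGCTCAAACAC"        # reverse primer as given in the text (5' to 3')
sense = reverse_complement(primer)   # sense-strand sequence covered by the primer

# Split into triplets, assuming the primer is codon-aligned (an assumption).
codons = [sense[i:i + 3] for i in range(0, len(sense), 3)]

print(sense)   # GTGTTTGAGCCATCAGAA
print(codons)  # ['GTG', 'TTT', 'GAG', 'CCA', 'TCA', 'GAA']
```

The 18-nt primer yields six triplets, which agrees with the six codon positions (11 down to 6) stated for it.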
The reaction products were purified using QIAquick columns (Qiagen). The purified DNA fragment was loaded on a 1.5% low melting temperature agarose gel, and a band corresponding to a 500- to 650-bp product was excised and purified using the QIAquick Gel Extraction Kit (Qiagen). A second-round PCR was performed on a fraction of the first-round reaction. This step incorporated Roche forward and reverse linker primers to enable the sequencing and the Multiplex Identifier (MID) or barcodes (MID1–MID8, MID10–MID12 and MID14) to distinguish the different cell fractions and time points (454 Sequencing Technical bulletin TCB N°013-2009, August 2009). The product of the second-round PCR was purified as described above, and quantified using PicoGreen reagent (Invitrogen). Finally, an equal amount (100 ng) of cDNA from each of the 12 libraries was pooled to obtain the final amplicon library, which represents the complete collection of TRBV transcripts sampled from this donor (Fig. 1b).

Sequencing and initial data processing

Sequencing was performed on a ¼ PicoTiterPlate by the Australian Genome Research Facility using the 454 Genome Sequencer FLX (GSFLX) Titanium (Roche). Initial data processing was performed using the manufacturer's software, which included the removal of low quality and erroneous sequences as determined by the standard filters of the Roche amplicon signal-processing pipeline. Sequences were assigned to samples based on incorporated barcodes, and read orientation was determined by the presence or absence of the sequence corresponding to the TSO used in the SMARTer RACE. Sequence segments corresponding to the adapters, barcodes and TSO were removed during this process. Quality control was conducted by spiking the amplicon library using classical cloning and sequencing methods7 11.

Repertoire analysis using IMGT/HighV-QUEST

The ‘final 454-output’ reads were submitted online to IMGT/HighV-QUEST21 22.
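The sample assignment described under ‘Sequencing and initial data processing’ (barcode lookup, orientation from the TSO, trimming of both) can be pictured with a small Python sketch. In the study this step was done by the manufacturer's software, so everything below is a simplified stand-in under loud assumptions: the barcode and TSO strings are invented placeholders (not real Roche MID or SMARTer sequences), exact prefix matching is assumed, and we assume that a read carrying the TSO is a 5′ read.

```python
def demultiplex(read, barcodes, tso):
    """Assign a read to a sample and orientation, trimming barcode and TSO.

    Returns (sample, orientation, trimmed_sequence), or None if no barcode
    matches. Assumes exact prefix matches and that TSO presence marks a 5' read.
    """
    for sample, barcode in barcodes.items():
        if read.startswith(barcode):
            body = read[len(barcode):]          # strip the barcode
            if body.startswith(tso):
                return sample, "5prime", body[len(tso):]  # strip the TSO as well
            return sample, "3prime", body
    return None


# Placeholder sequences, invented for illustration only.
barcodes = {"sampleA": "ACGTAC", "sampleB": "TGCATG"}
tso = "GGGTTT"

five_prime = demultiplex("ACGTACGGGTTTCCCC", barcodes, tso)
three_prime = demultiplex("TGCATGCCCCAAAA", barcodes, tso)
unassigned = demultiplex("NNNNNN", barcodes, tso)
```

A production pipeline would also need to tolerate sequencing errors in the barcode and TSO, which this sketch deliberately ignores.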
The full capacity of IMGT/HighV-QUEST includes analysis of V-J and V-D-J rearranged sequences (up to 150,000 per job) and statistical analysis (on results of up to 450,000 sequences) (http://www.imgt.org, version July 2012). The IMGT/HighV-QUEST21 submission page allows users to submit a file containing up to 150,000 sequences and to select options (equivalent to those of IMGT/V-QUEST22 23 24 25) for the results display. The results are provided in a downloadable main folder with 11 files21 (Supplementary Table S7) in CSV format (results equivalent to those of the Excel file from IMGT/V-QUEST online22 23 24 25), and one folder with the individual files (up to 150,000) of all the sequence results21. For each analysed sequence, the results in those individual files are identical to those that could be obtained from IMGT/V-QUEST online (in display option ‘Text’ of 'Detailed view'22 23 24 25). Text and CSV formats facilitate statistical studies for further interpretation and information extraction. Before IMGT/HighV-QUEST analysis, the users can evaluate the quality of their sequences by checking the results obtained with IMGT/V-QUEST on a few sequences. In a second online step, the users can submit the results of one or several jobs (up to 450,000 results) for statistical analysis. The IMGT/HighV-QUEST ‘Summary’ table of the statistical analysis provides information in Results categories that are either filtered in (‘1 copy’, ‘More than 1’) or filtered out (‘Warnings’, ‘Unknown functionality’, ‘No results’)21. The number of sequences in the different categories provides the users with an immediate indication of data reliability. Before the final results, statistical analyses were also performed on ‘MIDA_all’ and ‘MIDB_all’ separately, and on the 5′ reads and 3′ reads separately of each of the 12 samples for the purpose of data evaluation. 
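Because the text states a limit of 150,000 sequences per job and 450,000 results per statistical analysis, a larger read set has to be split before submission. The Python sketch below shows one way to do that locally; it is an assumption-laden illustration, not part of IMGT/HighV-QUEST: the function names are ours, the input is assumed to be plain FASTA, and no upload or web call is attempted.

```python
from itertools import islice

MAX_SEQUENCES_PER_JOB = 150_000   # per-job limit stated in the text
MAX_RESULTS_PER_STATS = 450_000   # statistical-analysis limit stated in the text


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_into_jobs(records, max_per_job=MAX_SEQUENCES_PER_JOB):
    """Yield lists of at most max_per_job records, preserving input order."""
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, max_per_job))
        if not batch:
            return
        yield batch


# Small demonstration with a tiny limit so the behaviour is easy to see.
records = list(read_fasta([">a", "ACGT", "AC", "", ">b", "GG"]))
jobs = list(split_into_jobs(range(7), max_per_job=3))
```

With the stated limits, three full jobs (3 × 150,000) exactly fill one statistical analysis of 450,000 results.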
The 5′ and 3′ reads were pooled to overcome the limitation of 454 sequencing, which does not provide genuine ‘bi-directional’ sequences. Indeed, the 5′ reads and 3′ reads are generated independently in separate wells, and the comparison of the IMGT/HighV-QUEST statistical analysis performed on the 5′ or 3′ reads, separately or pooled, confirmed the necessity of pooling to avoid losing information.

Genotype and haplotypes identification

The genotype and haplotypes were deduced from the IMGT/HighV-QUEST statistical analysis performed on all pooled sets (‘MIDA+MIDB’) on the results category ‘1 copy’ ‘single allele’ (for V and J).

Author contributions

S.L., M.A.F., J.J.M., M.-P.L. and E.J.G. designed the project; S.L., D.F. and J.J.M. carried out the molecular biology work; M.-P.L., E.A., V.G. and P.D. carried out bioinformatics work; P.U.C. and J.P.S. designed and carried out clinical procedures; A.T.P. and V.D.A.C. helped with data analysis; M.-P.L., S.L., V.G., J.J.M. and E.J.G. wrote the paper, with help from B.L., J.-P.S., S.R.B., M.P. and P.U.C.

Additional information

Accession code: Sequencing data has been deposited in the NCBI Sequence Read Archive under accession code SRX326382.

How to cite this article: Li, S. et al. IMGT/HighV QUEST paradigm for T cell receptor IMGT clonotype diversity and next generation repertoire immunoprofiling. Nat. Commun. 4:2333 doi: 10.1038/ncomms3333 (2013).

Supplementary Material

Supplementary Information: Supplementary Figures S1-S2 and Supplementary Tables S1-S7

                Author and article information

                Contributors
                Journal
                Front Psychiatry
                Front. Psychiatry
                Frontiers in Psychiatry
                Frontiers Media S.A.
                1664-0640
                31 August 2018
                2018
                Volume: 9
                Article: 403
                Affiliations
                [1] 1Department of Psychiatry and Mental Health Institute of the Second Xiangya Hospital, Central South University , Changsha, China
                [2] 2National Clinical Research Center on Mental Disorders and National Technology Institute on Mental Disorders, Hunan Key Laboratory of Psychiatry and Mental Health , Changsha, China
                [3] 3Department of Health Management Center, Third Xiangya Hospital, Central South University , Changsha, China
                [4] 4Department of Surgery, Chinese University of Hong Kong, Prince of Wales Hospital , Shatin, China
                [5] 5Department of Psychiatry, State Key Laboratory for Cognitive and Brain Sciences, HKU-SIRI, University of Hong Kong , Hong Kong, China
                Author notes

                Edited by: Xiang Yang Zhang, University of Texas Health Science Center at Houston, United States

                Reviewed by: Hongsheng Gui, Henry Ford Health System, United States; Qiang WANG, West China Hospital of Sichuan University, China

                *Correspondence: Jiansong Zhou zhoujs2003@csu.edu.cn

                This article was submitted to Behavioral and Psychiatric Genetics, a section of the journal Frontiers in Psychiatry

                Article
                10.3389/fpsyt.2018.00403
                6127418
                716f03d2-be5c-4df8-a599-c0d849a32907
                Copyright © 2018 Li, Zhou, Cao, Liu, Li, Li and Wang.

                This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

                History
                Received: 19 April 2018
                Accepted: 09 August 2018
                Page count
                Figures: 4, Tables: 2, Equations: 0, References: 53, Pages: 8, Words: 5029
                Funding
                Funded by: National Natural Science Foundation of China 10.13039/501100001809
                Award ID: 81371500
                Award ID: 81571316
                Award ID: 81571341
                Categories
                Psychiatry
                Original Research

                Clinical Psychology & Psychiatry
                Keywords: schizophrenia, violence, T-cell receptor, immune repertoire sequencing, complementarity-determining region
