
      Pharmacogenetics and the immunogenicity of protein therapeutics



          Most cited references (18)


          A second generation human haplotype map of over 3.1 million SNPs

          We describe the Phase II HapMap, which characterizes over 3.1 million human single nucleotide polymorphisms (SNPs) genotyped in 270 individuals from four geographically diverse populations and includes 25-35% of common SNP variation in the populations surveyed. The map is estimated to capture untyped common variation with an average maximum r² of between 0.9 and 0.96 depending on population. We demonstrate that the current generation of commercial genome-wide genotyping products captures common Phase II SNPs with an average maximum r² of up to 0.8 in African and up to 0.95 in non-African populations, and that potential gains in power in association studies can be obtained through imputation. These data also reveal novel aspects of the structure of linkage disequilibrium. We show that 10-30% of pairs of individuals within a population share at least one region of extended genetic identity arising from recent ancestry and that up to 1% of all common variants are untaggable, primarily because they lie within recombination hotspots. We show that recombination rates vary systematically around genes and between genes of different function. Finally, we demonstrate increased differentiation at non-synonymous, compared to synonymous, SNPs, resulting from systematic differences in the strength or efficacy of natural selection between populations.
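The abstract above reports coverage as an "average maximum r²" without defining it. The following is a minimal sketch of the standard pairwise linkage disequilibrium statistic r² between two biallelic SNPs, and of the maximum r² between an untyped SNP and a set of typed (tag) SNPs. The function names and the 0/1 allele coding on phased haplotypes are assumptions made for illustration; they are not taken from the HapMap paper.

```python
def ld_r2(snp_a, snp_b):
    """Return r^2 between two biallelic SNPs.

    Each argument lists one allele (coded 0 or 1) per phased haplotype,
    in the same haplotype order.
    """
    if len(snp_a) != len(snp_b) or not snp_a:
        raise ValueError("need two equally long, non-empty allele lists")
    n = len(snp_a)
    p_a = sum(snp_a) / n                      # frequency of allele 1 at SNP A
    p_b = sum(snp_b) / n                      # frequency of allele 1 at SNP B
    p_ab = sum(1 for a, b in zip(snp_a, snp_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                      # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic in the sample")
    return d * d / denom


def max_r2(untyped_snp, tag_snps):
    """Best r^2 between one untyped SNP and any of the typed tag SNPs."""
    return max(ld_r2(untyped_snp, tag) for tag in tag_snps)


if __name__ == "__main__":
    untyped = [0, 0, 1, 1]
    tags = [[0, 1, 0, 1], [0, 0, 1, 1]]
    print(max_r2(untyped, tags))
```

Averaging `max_r2` over all untyped common SNPs would give a figure of the kind quoted in the abstract, under the assumption that phased haplotypes are available.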

            A Systematic Assessment of MHC Class II Peptide Binding Predictions and Evaluation of a Consensus Approach

            Introduction

            The activation of CD4+ helper T cells is essential for the development of adaptive immunity against pathogens [1]–[4]. A critical step in CD4+ T cell activation is the recognition of epitopes presented by MHC class II molecules [5]. MHC class II molecules are heterodimers expressed on the surface of professional antigen presenting cells that bind peptide fragments derived from protein antigens [6]. X-ray crystallographic studies demonstrated that the MHC class II epitope binding site consists of a groove and several pockets provided by a β-sheet and two α-helices [7],[8]. Unlike class I, the class II binding groove is open at both ends. As a result, peptides binding to class II molecules tend to be of variable length, but typically between 13 and 25 residues. A hallmark of the MHC class II binding peptide groove is that there are four major pockets. These pockets accommodate side-chains of residues 1, 4, 6, and 9 of a 9-mer core region of the binding peptide. This core region interaction largely determines binding affinity and specificity [9]. In addition, peptide residues immediately flanking the core region have been indicated to make contact with the MHC molecule outside of the binding groove, and to contribute to MHC-peptide interaction [10]. MHC class II molecules are highly polymorphic, and this polymorphism largely corresponds with differences along the peptide binding groove. However, the binding motifs derived for MHC class II molecules are highly degenerate, and many promiscuous peptides have been identified that can bind multiple MHC class II molecules [11]. Promiscuous peptides are a prime target for vaccine and immunotherapy and computational tools have been developed to facilitate systematic scanning for promiscuous peptides [12]. Computational prediction of MHC class II epitopes is of important theoretical and practical value, as experimental identification is costly and time consuming [13],[14].
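The 9-mer core and its four pocket positions described above can be made concrete with a short sketch. It enumerates every candidate 9-mer core of a longer peptide and extracts the residues at core positions 1, 4, 6, and 9. The function names and the example peptide are hypothetical and chosen only for illustration; the sketch does not predict which core actually binds.

```python
CORE_LENGTH = 9
POCKET_POSITIONS = (1, 4, 6, 9)  # 1-based positions within the 9-mer core


def candidate_cores(peptide):
    """Return every 9-mer window of the peptide, in order of start position."""
    if len(peptide) < CORE_LENGTH:
        raise ValueError("peptide is shorter than the 9-mer core")
    return [peptide[i:i + CORE_LENGTH]
            for i in range(len(peptide) - CORE_LENGTH + 1)]


def pocket_residues(core):
    """Return the residues that would occupy the four major pockets."""
    if len(core) != CORE_LENGTH:
        raise ValueError("a core must be exactly 9 residues long")
    return "".join(core[pos - 1] for pos in POCKET_POSITIONS)


if __name__ == "__main__":
    peptide = "ACDEFGHIKLMNPQR"  # arbitrary 15-mer, not a real epitope
    for core in candidate_cores(peptide):
        print(core, pocket_residues(core))
```

A 15-mer yields seven candidate cores, which is why class II prediction has to locate the core as well as score it.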
The basis of a successful computational prediction is a sufficiently large set of high quality training data. There are several databases hosting MHC epitope related data such as SYFPEITHI [15], MHCBN [16], AntiJen [17], FIMM [18], HLA Ligand [19] and our own project, the Immune Epitope Database (IEDB) [20],[21]. Information from those databases is, for the most part, extracted from the literature. These databases typically combine data from different sources and different experimental approaches, which can complicate the generation of consistent training and evaluation datasets. The establishment of numerous MHC class II epitope databases has facilitated the development of a large number of algorithms aimed at predicting peptide binding to MHC molecules. Early works focused on finding peptide patterns and deriving motifs for MHC molecules [22]–[24]. With the accumulation of epitope data, more sophisticated algorithms were developed. Several methods have derived scoring matrices that evaluate the contribution to binding of different residues in a peptide based on quantitative binding data (ARB [25], SMM-align [26]). Others base similar scoring matrices on multiple peptide alignments (RANKPEP [27],[28]) or domain expert knowledge (SYFPEITHI method [15]). By combining the similarities of key residues forming the pockets of the binding groove with quantitative matrices derived from experiments, the TEPITOPE [29] algorithm can predict binding to MHC alleles for which no binding affinities were determined. Other machine learning algorithms that have been applied include hidden Markov models [30], evolutionary algorithms [31] and linear programming [32]. The MHC class II binding prediction problem has also been modeled with a distance function in a recently developed method, PepDist [33].
In addition to the previously listed models that are directly interpretable, “black box” approaches, such as support vector machines [34] and artificial neural networks [35]–[37], have also been applied to MHC class II binding prediction with success. Despite the large number of available prediction methods, computational prediction of MHC class II epitopes remains a challenging problem. It has been suggested that the prediction performance of class II algorithms is systematically inferior to that of MHC class I epitope prediction methods [25]. To assess the current state of MHC class II binding predictions, we have here sought to establish a systematic and quantitative benchmark similar to our previous effort for MHC class I molecules [38]. We present a large dataset of unpublished MHC class II-peptide binding affinities that were experimentally determined under uniform conditions. We then evaluated a set of nine publicly available MHC class II prediction methods using this dataset and systematically compared their performance. Finally, we analyzed the ability of current methods to identify the binding cores of peptides and to predict T-cell responses from peptide sequences.

Results

Overview of MHC Class II Epitope Affinity Dataset and MHC Class II Binding Prediction Methods

We assembled a dataset of peptide binding affinities for various MHC class II molecules experimentally measured in our group (see Materials and Methods for details). Table 1 gives an overview of the dataset, encompassing a total of 10,017 experimentally determined peptide MHC II binding affinities. These data span a total of 16 human and mouse MHC class II types. The number of unique MHC-peptide affinities measured per type varies greatly, from 3,882 for HLA-DRB1*0101 to only 39 for H-2-IEd.
Compared to datasets publicly available on the IEDB and other MHC class II epitope databases, our new dataset significantly expands the number of measured peptide-MHC class II interactions for a large number of MHC class II molecules. For example, the number of peptides with known IC50 values for HLA-DRB1*0101 was more than tripled with the addition of our new dataset.

Table 1. Overview of the MHC-peptide binding affinity dataset (doi:10.1371/journal.pcbi.1000048.t001).

Organism  MHC class II type  Number of MHC-peptide affinities
                             New     Known (a)
Human     HLA-DRB1*0101      3882    1390
Human     HLA-DRB1*0301      502     817
Human     HLA-DRB1*0401      512     675
Human     HLA-DRB1*0404      449     233
Human     HLA-DRB1*0405      457     175
Human     HLA-DRB1*0701      505     424
Human     HLA-DRB1*0802      245     213
Human     HLA-DRB1*0901      412     174
Human     HLA-DRB1*1101      520     522
Human     HLA-DRB1*1302      289     242
Human     HLA-DRB1*1501      520     491
Human     HLA-DRB3*0101      420     104
Human     HLA-DRB4*0101      245     203
Human     HLA-DRB5*0101      520     383
Mouse     H-2-IAb            500     225
Mouse     H-2-IEd            39      231

(a) Number of records in IEDB as of 12-04-2006.

The MHC class II binding prediction tools evaluated in this study are listed in Table 2. We included as many prediction methods as possible provided that they (1) could perform predictions for MHC class II types in our dataset; (2) were publicly available; and (3) did not specifically disallow the use of automated prediction retrieval scripts. A total of nine methods matched these criteria. A more detailed description of the tested methods is provided in the Materials and Methods section.

Table 2. Overview of nine MHC class II peptide prediction methods tested with the new dataset (doi:10.1371/journal.pcbi.1000048.t002).

Category                 Method     MHC class II types (a)  Training dataset  Algorithm
Matrix based             ARB        16 (16)                 IEDB              Average relative binding (ARB) matrix
Matrix based             PROPRED    51 (11)                 TEPITOPE          Pocket profile
Matrix based             SVMHC      51 (11)                 TEPITOPE          Pocket profile
Matrix based             SYFPEITHI  6 (6)                   SYFPEITHI         Position specific scoring matrices
Matrix based             RANKPEP    46 (16)                 MHCPEP            Position specific scoring matrices
Matrix based             SMM-align  17 (16)                 IEDB, SYFPEITHI   Stabilized matrix
Machine learning based   SVRMHC     6 (5)                   AntiJen           Support vector machine regression
Machine learning based   MHC2PRED   21 (15)                 MHCBN, JenPep     Support vector machine
Multivariate regression  MHCPRED    10 (6)                  JenPep            Quantitative structure activity relationship (QSAR) regression

(a) Number of MHC class II types covered by a prediction method. The number in parentheses is the number of MHC class II types also in our dataset.

Performance Evaluation of Publicly Available Prediction Tools

The binding predictions for peptides in our affinity dataset were extracted from the MHC class II binding prediction tools with custom scripts (see Materials and Methods for details). From the experimental data, peptides were classified into binders (IC50 [...] 95% by reversed-phase HPLC, and the purity assessed by amino acid sequence and/or composition analysis.

Experimental Procedures to Measure MHC Class II Peptide Affinity

Quantitative assays to measure the binding affinities of peptides to purified soluble class II molecules are based on the inhibition of binding of a radiolabeled standard peptide. Binding assays were performed essentially as described previously [13],[53]. Briefly, 0.1–1 nM radiolabeled peptide was coincubated for 2 days at room temperature with 1 µM to 1 nM purified MHC in the presence of a cocktail of protease inhibitors. Following the two-day incubation, the amount of MHC-bound labeled peptide was determined by capturing MHC/peptide complexes on LB3.1 antibody coated Lumitrac 600 microplates (Greiner Bio-one, Longwood, FL), and measuring bound cpm using the TopCount microscintillation counter (Packard Instrument Co., Meriden, CT).
Individual peptides were typically tested in 3 or more independent experiments for their capacity to inhibit the binding of the radiolabeled peptide. The concentration of peptide yielding 50% inhibition of the binding of the radiolabeled peptide was calculated. Under the conditions used, in which [label]<[MHC] and IC50≥[MHC], the measured IC50 values are reasonable approximations of the true Kd values. The binding affinities are expressed in terms of IC50 and are capped at 50,000 nM, reflecting the experimental sensitivity threshold.

Dataset of Binding Affinities Used in the Study

The assembled MHC class II peptide binding affinities are listed in Table 1. The peptide binding affinities for various MHC class II molecules were generated in the context of various projects currently ongoing in our laboratory. Because they have been recently generated, to the best of our knowledge, none of the binding affinities in this dataset has been previously published. This assessment was confirmed by comparing our dataset to publicly available records contained in the IEDB (Table 1) or elsewhere. There are a total of 10,017 measured affinities in our dataset, spanning fourteen human and two mouse MHC class II types (Table 1). Peptides for 114 proteins from 30 organisms were synthesized and tested. While peptide sizes ranged from 9 to 37 amino acids, the vast majority of the measured affinities are for 15-mers (9,632 out of 10,017). The present dataset is currently in the process of being deposited in the IEDB.

PDB Structures of MHC Class II and Epitope Complexes

Structures of MHC class II were retrieved from the Protein Data Bank with a keyword search (using keyword “MHC class II”). The retrieved structures were then examined to select complexes that have epitopes with at least 9 amino acids. In addition, the structures were examined to identify entries with identical MHC and binding peptide sequences.
For duplicated structures of the same MHC and epitope, we retained the structure with the highest resolution. The final dataset contains 29 non-redundant structures.

MHC Class II Binding Prediction Tools Evaluated in This Study

The nine MHC class II binding prediction tools evaluated in this study are listed in Table 2. Five of the prediction methods are based on various scoring matrices. The method developed at IEDB utilizes the Average Relative Binding (ARB) matrix [25]. PROPRED [54] and SVMHC [55] are web servers based on TEPITOPE's pocket profile [29]. Both SYFPEITHI [15] and RANKPEP [28] use position specific matrices. Another matrix based approach, SMM-align [26], utilizes the stabilized matrix method (SMM [44]), but introduces a novel step to identify peptide binding cores, which makes it applicable to MHC class II predictions. Two of the methods, SVRMHC [56] and MHC2PRED (http://www.imtech.res.in/raghava/mhc2pred/index.html), apply support vector machine or support vector regression to predict epitopes. Finally, MHCPRED is a quantitative structure activity relationship (QSAR) regression based method [57]. Three of the nine methods, ARB, MHC2PRED and SMM-align, give predictions in terms of the quantitative affinity of a peptide for an MHC class II molecule. The predictions of the other six methods are given as a score which is not directly translatable into an affinity of peptide-MHC binding. In terms of the number of MHC class II types covered, the two TEPITOPE based methods (PROPRED and SVMHC) have the broadest coverage with 51 types, 11 of which also appear in our dataset. The next most comprehensive method is RANKPEP, which covers 46 types, 16 of which overlap with our dataset. ARB, MHC2PRED and SMM-align make predictions for about 20 MHC class II types and the majority of the types (15 to 16) also appear in our dataset.
The three remaining methods (MHCPRED, SVRMHC and SYFPEITHI) have less coverage, as they only predict peptide binding for 5 to 6 MHC class II types in our dataset. Table 2 also lists the dataset used by each method to train its predictive models. Training on larger sets of data would be expected to yield better performance when tested on independent new data. In this context, the IEDB has HLA-DRB1*0101 binding information for 1390 peptides, AntiJen for 730, and MHCBN for 588. By contrast, SYFPEITHI lists only 42 entries for HLA-DRB1*0101. Thus the ARB and SMM-align methods, which use data from the IEDB, had access to the largest training set compared to other methods, while the SYFPEITHI method had access to the smallest dataset.

MHC Class II Epitope Prediction with External Tools

We identified eight publicly available MHC class II prediction tools through literature search and the IMGT link list at http://imgt.cines.fr/textes/IMGTbloc-notes/. For each tool, we mapped the MHC types for which predictions could be made to the four-digit HLA nomenclature (e.g., HLA-DRB1*0101). If this mapping could not be done exactly, we left that type/tool combination out of the evaluation. For example, HLA-DR4 could refer to HLA-DRB1*0401, DRB1*0402, etc., which do have distinct binding specificities. For the ARB evaluation, the 10-fold cross validation results stored at IEDB were used to estimate performance, since ARB was trained on datasets overlapping with the one used in this study. For the other seven tools in the evaluation, we wrote Python script wrappers to automate prediction retrieval. For the SYFPEITHI prediction, we padded each testing peptide with three glycine residues at both ends before we submitted it for prediction. This was recommended by the creators of the SYFPEITHI method to ensure that all potential binders are presented to the prediction algorithm. For all other methods, the original testing peptides were submitted directly for prediction.
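The glycine padding applied before SYFPEITHI submission can be sketched as follows. This is a minimal illustration only: the function name is hypothetical, and the actual request format used by the authors' wrapper scripts is not described in the text.

```python
def pad_with_glycine(peptide, n_flank=3):
    """Return the peptide with n_flank glycine (G) residues added at each end."""
    if n_flank < 0:
        raise ValueError("n_flank must be non-negative")
    flank = "G" * n_flank
    return flank + peptide + flank


if __name__ == "__main__":
    peptide = "ACDEFGHIKLMNPQR"  # arbitrary 15-mer, not a real epitope
    print(pad_with_glycine(peptide))
```

Padding a 15-mer this way produces a 21-mer, so every original residue can appear at any position of some 9-residue window, including windows that extend past the original termini.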
Peptide sequences were sent to the web servers one at a time and predictions were extracted from the server's response. To assign a single prediction for peptides longer than nine amino acids in the context of tools predicting the affinity of 9-mer core binding regions, we took the highest affinity prediction of all possible 9-mers within the longer peptide as the prediction result.

Consensus Approach to Predict MHC Class II Binding Peptides

For each MHC class II molecule whose binding can be predicted by three or more algorithms, we employed the following approach to generate a consensus prediction. First, we selected the top three methods that give the best performance. For each method, the tested peptides are ranked by their scores with higher ranks for better binders. For each tested peptide, the three ranks from different methods are then taken and the median of the three is calculated. This median rank is taken as the consensus score.

Performance Measure of External Tools

Receiver operating characteristic (ROC) curves [58] were used to measure the performance of MHC class II binding prediction tools. For binding assays, the peptides were classified into binders (experimental IC50<1000 nM) and nonbinders (experimental IC50≥1000 nM), which was determined as a practical cutoff in a previous study [59]. For CD4+ T cell activation assays, the peptides were classified into T-cell epitopes (experimental SFC count≥100) or non-epitopes (experimental SFC count<100). For a given prediction method and a given cutoff for the predicted scores, the rate of true positive and false positive predictions can be calculated. An ROC curve is generated by varying the cutoff from the highest to the lowest predicted scores, and plotting the true positive rate against the false positive rate at each cutoff. The area under the ROC curve (AUC) is a measure of prediction algorithm performance, where 0.5 is random prediction and 1.0 is perfect prediction.
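The three steps described above (best 9-mer score per peptide, median-rank consensus, and AUC against the 1000 nM binder cutoff) can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' code: all function names are hypothetical, scores are assumed to be oriented so that higher means stronger predicted binding, rank 1 is assumed to denote the best predicted binder, and the AUC is computed through its pairwise (Mann-Whitney) form rather than with the ROCR package the study used.

```python
from statistics import median

CORE_LENGTH = 9
BINDER_CUTOFF_NM = 1000.0  # binders: experimental IC50 < 1000 nM


def best_core_score(peptide, score_core):
    """Best score over all 9-mer windows of the peptide.

    `score_core` is a hypothetical callable (9-mer -> score, higher means
    stronger predicted binding) standing in for a core-based tool.
    """
    if len(peptide) < CORE_LENGTH:
        raise ValueError("peptide is shorter than the 9-mer core")
    return max(score_core(peptide[i:i + CORE_LENGTH])
               for i in range(len(peptide) - CORE_LENGTH + 1))


def ranks_best_first(scores):
    """Rank 1 is the highest score; tied scores share their average rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and scores[order[end + 1]] == scores[order[start]]):
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def consensus_ranks(scores_by_method):
    """Median rank per peptide across methods (lower is a better binder)."""
    per_method = [ranks_best_first(scores) for scores in scores_by_method]
    return [median(peptide_ranks) for peptide_ranks in zip(*per_method)]


def roc_auc(is_binder, scores):
    """AUC as the fraction of binder/nonbinder pairs ordered correctly."""
    pos = [s for s, b in zip(scores, is_binder) if b]
    neg = [s for s, b in zip(scores, is_binder) if not b]
    if not pos or not neg:
        raise ValueError("need at least one binder and one nonbinder")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    ic50_nm = [50.0, 400.0, 2000.0, 30000.0]       # made-up measurements
    is_binder = [v < BINDER_CUTOFF_NM for v in ic50_nm]
    methods = [                                     # made-up prediction scores
        [0.9, 0.4, 0.6, 0.1],
        [0.8, 0.7, 0.2, 0.3],
        [0.5, 0.9, 0.4, 0.1],
    ]
    consensus = consensus_ranks(methods)
    print(roc_auc(is_binder, methods[0]))               # 0.75
    print(roc_auc(is_binder, [-r for r in consensus]))  # 1.0
```

Because a low median rank means a strong predicted binder, the consensus ranks are negated before being passed to `roc_auc`. Tools that output predicted IC50 values would need the same sign flip, since a lower IC50 means stronger binding.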
ROC curves were plotted and AUC values were calculated with the ROCR [60] package for R [61].

LCMV Epitope Identification

C57BL/6 (H-2b) mice were purchased from The Jackson Laboratory (Bar Harbor, ME), and infected intraperitoneally with 2×10⁵ PFU of LCMV Armstrong. Spleens were harvested eight days post infection, and IFN-γ ELISPOT assays were performed as previously described [62] using CD4+ T cells isolated with anti-CD4+ magnetic beads (Miltenyi Biotech Inc., Auburn, CA). Experimental values were expressed as the mean net spots per million CD4+ cells ±SD for each peptide pool or individual peptide. For the initial screening of the 83 pools, responses against each pool were considered positive if a) the number of spot forming cells (SFCs)/10⁶ CD4+ T cells exceeded the absolute value of the mean negative control wells (effectors plus APCs without peptide) by two-fold, b) the value exceeded 200 SFCs/10⁶ CD4+ cells and c) these conditions were met in at least two replicate independent experiments. Positive pools were deconvoluted into their eight individual components and tested again, to determine which individual peptides were responsible for the pooled IFN-γ response. Responses against individual peptides were considered positive if they exceeded the threshold of the mean negative control wells (effectors plus APCs without peptide) by at least 2 standard deviations and exceeded a threshold of 200 SFCs/10⁶ CD4+ cells.

Supporting Information

Dataset S1. AUC values for the tested MHC class II binding prediction methods using different cutoffs. The cutoffs for binders were varied from 50 nM to 5000 nM. (0.03 MB XLS)
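The ELISPOT positivity criteria given in the LCMV Epitope Identification section can be written out as two small predicates. This is a sketch under assumptions: the function names are hypothetical, all values are taken to be SFCs per 10⁶ CD4+ cells, the sample standard deviation is used because the text does not say which estimator was applied, and "exceeded by two-fold" is read as greater than twice the absolute control mean.

```python
from statistics import mean, stdev

MIN_SFC = 200.0  # absolute threshold, SFCs per 10^6 CD4+ cells


def pool_is_positive(experiments, fold=2.0, min_experiments=2):
    """Screen one peptide pool.

    `experiments` holds one (pool_sfc, negative_control_mean) pair per
    independent experiment; the pool passes if both conditions hold in at
    least `min_experiments` of them.
    """
    hits = sum(1 for pool_sfc, control_mean in experiments
               if pool_sfc > fold * abs(control_mean) and pool_sfc > MIN_SFC)
    return hits >= min_experiments


def peptide_is_positive(peptide_sfc, negative_control_wells, n_sd=2.0):
    """Test one individual peptide against the negative control wells."""
    background = mean(negative_control_wells)
    spread = stdev(negative_control_wells)  # needs at least two wells
    return peptide_sfc >= background + n_sd * spread and peptide_sfc > MIN_SFC


if __name__ == "__main__":
    controls = [10.0, 20.0, 30.0]  # made-up negative control wells
    print(peptide_is_positive(250.0, controls))
    print(pool_is_positive([(450.0, 20.0), (90.0, 25.0), (300.0, 15.0)]))
```

Both conditions are needed: the fold or standard deviation test guards against high background, while the absolute 200 SFC threshold guards against calling weak responses positive when the background is near zero.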

              Quality by design for biopharmaceuticals.


                Author and article information

                Journal: Nature Biotechnology (Nat Biotechnol)
                Publisher: Springer Science and Business Media LLC
                ISSN: 1087-0156, 1546-1696
                Published: October 13 2011
                Volume: 29
                Issue: 10
                Pages: 870-873
                Type: Article
                DOI: 10.1038/nbt.2002
                PMID: 21997623
                © 2011
