
      Feature selection using a one dimensional naïve Bayes’ classifier increases the accuracy of support vector machine classification of CDR3 repertoires

      research-article


          Abstract

          Motivation: Somatic DNA recombination, the hallmark of vertebrate adaptive immunity, has the potential to generate a vast diversity of antigen receptor sequences. How this diversity captures antigen specificity remains incompletely understood. In this study we use high throughput sequencing to compare the global changes in T cell receptor β chain complementarity determining region 3 (CDR3β) sequences following immunization with ovalbumin administered with complete Freund’s adjuvant (CFA) or CFA alone.

          Results: The CDR3β sequences were deconstructed into short stretches of overlapping contiguous amino acids. The motifs were ranked according to a one-dimensional Bayesian classifier score comparing their frequency in the repertoires of the two immunization classes. The top ranking motifs were selected and used to create feature vectors which were used to train a support vector machine. The support vector machine achieved high classification scores in a leave-one-out validation test reaching >90% in some cases.
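The first stage described above (overlapping motif extraction followed by per-motif ranking) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the exact scoring formula, so the score used here (the fraction of samples that a single-feature Gaussian naive Bayes classifier with equal priors assigns to the correct class) is an assumption, as are the function names `motif_frequencies`, `nb_score` and `rank_motifs`.

```python
import math
from collections import Counter


def motif_frequencies(cdr3_list, k=3):
    """Relative frequency of each overlapping k-mer in one repertoire."""
    counts = Counter()
    for seq in cdr3_list:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    total = sum(counts.values())
    return {m: c / total for m, c in counts.items()} if total else {}


def gaussian_logpdf(x, mean, var):
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def nb_score(values_a, values_b, eps=1e-9):
    """Assumed score: training accuracy of a 1-D Gaussian naive Bayes
    classifier (equal priors) built on a single motif's frequencies."""
    def fit(values):
        mean = sum(values) / len(values)
        var = sum((x - mean) ** 2 for x in values) / len(values) + eps
        return mean, var

    (ma, va), (mb, vb) = fit(values_a), fit(values_b)
    correct = sum(gaussian_logpdf(x, ma, va) > gaussian_logpdf(x, mb, vb)
                  for x in values_a)
    correct += sum(gaussian_logpdf(x, mb, vb) > gaussian_logpdf(x, ma, va)
                   for x in values_b)
    return correct / (len(values_a) + len(values_b))


def rank_motifs(reps_a, reps_b, k=3, top_n=10):
    """Return the top_n motifs, best score first (ties broken alphabetically)."""
    freqs_a = [motif_frequencies(r, k) for r in reps_a]
    freqs_b = [motif_frequencies(r, k) for r in reps_b]
    motifs = set().union(*freqs_a, *freqs_b)
    scored = {
        m: nb_score([f.get(m, 0.0) for f in freqs_a],
                    [f.get(m, 0.0) for f in freqs_b])
        for m in motifs
    }
    return sorted(scored, key=lambda m: (-scored[m], m))[:top_n]
```

Here `reps_a` and `reps_b` stand for the two immunization classes, each a list of repertoires, and each repertoire a list of CDR3β amino acid strings.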

          Summary: The study describes a novel two-stage classification strategy combining a one-dimensional Bayesian classifier with a support vector machine. Using this approach we demonstrate that the frequency of a small number of linear motifs three amino acids in length can accurately identify a CD4 T cell response to ovalbumin against a background response to the complex mixture of antigens which characterize Complete Freund’s Adjuvant.
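The second stage (feature vectors built from the selected motifs, a support vector machine, and leave-one-out validation) might look roughly like the sketch below. The availability statement lists the R package e1071, which suggests the SVM was run in R; the code here substitutes a small linear soft-margin SVM trained by subgradient descent on the hinge loss, written in plain Python so that it is self-contained. The kernel, hyperparameters (`lam`, `eta`, `epochs`) and function names are all assumptions made for illustration.

```python
def feature_vector(freqs, motifs):
    """Frequencies of the selected motifs in one repertoire (0.0 if absent)."""
    return [freqs.get(m, 0.0) for m in motifs]


def train_linear_svm(X, y, lam=0.01, eta=0.1, epochs=200):
    """Subgradient descent on the L2-regularised hinge loss.
    Labels must be +1 or -1; the bias term is not regularised."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            w = [(1 - eta * lam) * wj for wj in w]  # regularisation shrinkage
            if margin < 1:  # sample violates the margin: hinge-loss step
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
                b += eta * yi
    return w, b


def predict(model, x):
    w, b = model
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


def leave_one_out_accuracy(X, y):
    """Train on all samples but one, test on the held-out sample, repeat."""
    hits = 0
    for i in range(len(X)):
        model = train_linear_svm(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        hits += predict(model, X[i]) == y[i]
    return hits / len(X)
```

Raw motif frequencies are small numbers, so in practice the features would probably need scaling before training. To keep the held-out sample fully unseen, motif ranking would also normally be repeated inside each fold rather than performed once on all samples.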

          Availability and implementation: The sequence data is available at www.ncbi.nlm.nih.gov/sra/?term=SRP075893. The Decombinator package is available at github.com/innate2adaptive/Decombinator. The R package e1071 is available at the CRAN repository https://cran.r-project.org/web/packages/e1071/index.html.

          Contact: b.chain@ucl.ac.uk

          Supplementary information: Supplementary data are available at Bioinformatics online.


                Author and article information

                Journal
                Bioinformatics
                Oxford University Press
                1367-4803
                1367-4811
                01 April 2017
                05 January 2017
                Volume: 33
                Issue: 7
                Pages: 951-955
                Affiliations
                [1] Division of Infection and Immunity
                [2] Department of Computer Science,
                [3] Complex, UCL, London, UK
                [4] Department of Immunology, Weizmann Institute, Rehovot, Israel
                Author notes
                [*] To whom correspondence should be addressed.

                Associate Editor: Inanc Birol

                Article
                Publisher ID: btw771
                DOI: 10.1093/bioinformatics/btw771
                PMCID: PMC5860388
                PMID: 28073756
                © The Author 2017. Published by Oxford University Press.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                Received: 15 September 2016
                Revised: 07 November 2016
                Accepted: 07 December 2016
                Page count
                Pages: 5
                Funding
                Funded by: a studentship from Microsoft Research; studentships from the UK MRC and the EPSRC; supported by the National Institute for Health Research UCL Hospitals Biomedical Research Centre
                Funded by: Minerva Foundation with funding from the Federal German Ministry for Education and Research, the I-CORE Program of the Planning and Budgeting Committee and the Israel Science Foundation
                Categories
                Discovery Note
                Systems Biology

                Bioinformatics & Computational biology
