      BepiPred‐3.0: Improved B‐cell epitope prediction using protein language models

      research-article

          Abstract

          B‐cell epitope prediction tools are of great medical and commercial interest due to their practical applications in vaccine development and disease diagnostics. Protein language models (LMs), trained on unprecedentedly large datasets of protein sequences and structures, provide a powerful numeric representation that can be exploited to accurately predict local and global protein structural features from amino acid sequences alone. In this paper, we present BepiPred‐3.0, a sequence‐based epitope prediction tool that, by exploiting LM embeddings, greatly improves prediction accuracy for both linear and conformational epitopes on several independent test sets. Careful selection of additional input variables and of the epitope residue annotation strategy improved performance further, thus achieving unprecedented predictive power. Our tool can predict epitopes across hundreds of sequences in minutes. It is freely available as a web server and a standalone package at https://services.healthtech.dtu.dk/service.php?BepiPred-3.0 with a user‐friendly interface to navigate the results.

                Author and article information

                Contributors
                cliffordjoakim@gmail.com
                Journal
                Protein Sci
                10.1002/(ISSN)1469-896X
                PRO
                Protein Science : A Publication of the Protein Society
                John Wiley & Sons, Inc. (Hoboken, USA)
                0961-8368
                1469-896X
                December 2022
                Volume: 31
                Issue: 12 ( doiID: 10.1002/pro.v31.12 )
                Article: e4497
                Affiliations
                [ 1 ] Department of Health Technology, Technical University of Denmark, Kongens Lyngby, Denmark
                [ 2 ] La Jolla Institute for Immunology, La Jolla, California, USA
                Author notes
                [*] Correspondence

                Joakim Nøddeskov Clifford, Department of Health Technology, Technical University of Denmark, Kongens Lyngby 2800, Denmark.

                Email: cliffordjoakim@gmail.com

                Author information
                https://orcid.org/0000-0002-8126-9209
                Article
                PRO4497
                10.1002/pro.4497
                9679979
                36366745
                0fb9c5be-2a45-4276-9c5c-7a339eeb1238
                © 2022 The Authors. Protein Science published by Wiley Periodicals LLC on behalf of The Protein Society.

                This is an open access article under the terms of the http://creativecommons.org/licenses/by-nc/4.0/ License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited and is not used for commercial purposes.

                History
                : 31 October 2022
                : 28 July 2022
                : 01 November 2022
                Page count
                Figures: 3, Tables: 6, Pages: 11, Words: 6726
                Categories
                Tools for Protein Science

                Biochemistry
                bepipred‐3.0, bepipred, b‐cell epitope prediction, protein language model, machine learning, deep learning, immunology, b‐cell epitopes, bioinformatics, immunoinformatics