12
views
0
recommends
+1 Recommend
0 collections
    0
    shares
      • Record: found
      • Abstract: found
      • Article: found
      Is Open Access

      BepiPred-3.0: Improved B-cell epitope prediction using protein language models.

      Read this article at

          There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

          Abstract

          B-cell epitope prediction tools are of great medical and commercial interest due to their practical applications in vaccine development and disease diagnostics. The introduction of protein language models (LMs), trained on unprecedented large datasets of protein sequences and structures, tap into a powerful numeric representation that can be exploited to accurately predict local and global protein structural features from amino acid sequences only. In this paper, we present BepiPred-3.0, a sequence-based epitope prediction tool that, by exploiting LM embeddings, greatly improves the prediction accuracy for both linear and conformational epitope prediction on several independent test sets. Furthermore, by carefully selecting additional input variables and epitope residue annotation strategy, performance was further improved, thus achieving unprecedented predictive power. Our tool can predict epitopes across hundreds of sequences in minutes. It is freely available as a web server and a standalone package at https://services.healthtech.dtu.dk/service.php?BepiPred-3.0 with a user-friendly interface to navigate the results.

          Related collections

          Most cited references33

          • Record: found
          • Abstract: found
          • Article: found
          Is Open Access

          Highly accurate protein structure prediction with AlphaFold

          Proteins are essential to life, and understanding their structure can facilitate a mechanistic understanding of their function. Through an enormous experimental effort 1 – 4 , the structures of around 100,000 unique proteins have been determined 5 , but this represents a small fraction of the billions of known protein sequences 6 , 7 . Structural coverage is bottlenecked by the months to years of painstaking effort required to determine a single protein structure. Accurate computational approaches are needed to address this gap and to enable large-scale structural bioinformatics. Predicting the three-dimensional structure that a protein will adopt based solely on its amino acid sequence—the structure prediction component of the ‘protein folding problem’ 8 —has been an important open research problem for more than 50 years 9 . Despite recent progress 10 – 14 , existing methods fall far short of atomic accuracy, especially when no homologous structure is available. Here we provide the first computational method that can regularly predict protein structures with atomic accuracy even in cases in which no similar structure is known. We validated an entirely redesigned version of our neural network-based model, AlphaFold, in the challenging 14th Critical Assessment of protein Structure Prediction (CASP14) 15 , demonstrating accuracy competitive with experimental structures in a majority of cases and greatly outperforming other methods. Underpinning the latest version of AlphaFold is a novel machine learning approach that incorporates physical and biological knowledge about protein structure, leveraging multi-sequence alignments, into the design of the deep learning algorithm. AlphaFold predicts protein structures with an accuracy competitive with experimental structures in the majority of cases using a novel deep learning architecture.
            Bookmark
            • Record: found
            • Abstract: found
            • Article: not found

            MUSCLE: multiple sequence alignment with high accuracy and high throughput.

            We describe MUSCLE, a new computer program for creating multiple alignments of protein sequences. Elements of the algorithm include fast distance estimation using kmer counting, progressive alignment using a new profile function we call the log-expectation score, and refinement using tree-dependent restricted partitioning. The speed and accuracy of MUSCLE are compared with T-Coffee, MAFFT and CLUSTALW on four test sets of reference alignments: BAliBASE, SABmark, SMART and a new benchmark, PREFAB. MUSCLE achieves the highest, or joint highest, rank in accuracy on each of these sets. Without refinement, MUSCLE achieves average accuracy statistically indistinguishable from T-Coffee and MAFFT, and is the fastest of the tested methods for large numbers of sequences, aligning 5000 sequences of average length 350 in 7 min on a current desktop computer. The MUSCLE program, source code and PREFAB test data are freely available at http://www.drive5. com/muscle.
              Bookmark
              • Record: found
              • Abstract: found
              • Article: not found

              The Protein Data Bank.

              The Protein Data Bank (PDB; http://www.rcsb.org/pdb/ ) is the single worldwide archive of structural data of biological macromolecules. This paper describes the goals of the PDB, the systems in place for data deposition and access, how to obtain further information, and near-term plans for the future development of the resource.
                Bookmark

                Author and article information

                Journal
                Protein Sci
                Protein science : a publication of the Protein Society
                Wiley
                1469-896X
                0961-8368
                Dec 2022
                : 31
                : 12
                Affiliations
                [1 ] Department of Health Technology, Technical University of Denmark, Kongens Lyngby, Denmark.
                [2 ] La Jolla Institute for Immunology, La Jolla, California, USA.
                Article
                10.1002/pro.4497
                9679979
                36366745
                0fb9c5be-2a45-4276-9c5c-7a339eeb1238
                History

                B-cell epitope prediction,BepiPred,BepiPred-3.0,immunology,B-cell epitopes,deep learning,immunoinformatics,machine learning,protein language model,bioinformatics

                Comments

                Comment on this article