      From sequence to function through structure: Deep learning for protein design

      review-article
      Computational and Structural Biotechnology Journal
      Research Network of Computational and Structural Biotechnology
      Abbreviations: ADMM, Alternating Direction Method of Multipliers; CNN, Convolutional Neural Network; DL, Deep Learning; FNN, Fully-Connected Neural Network; GAN, Generative Adversarial Network; GCN, Graph Convolutional Network; GNN, Graph Neural Network; GO, Gene Ontology; GVP, Geometric Vector Perceptron; LSTM, Long Short-Term Memory; MLP, Multilayer Perceptron; MSA, Multiple Sequence Alignment; NLP, Natural Language Processing; NSR, Natural Sequence Recovery; pLM, protein Language Model; VAE, Variational Autoencoder
      Keywords: Protein design; Protein prediction; Drug discovery; Deep learning; Protein language models

          Abstract

          The process of designing biomolecules, in particular proteins, is witnessing a rapid change in available tooling and approaches, moving from design through physicochemical force fields to producing plausible, complex sequences fast via end-to-end differentiable statistical models. To achieve conditional and controllable protein design, researchers at the interface of artificial intelligence and biology leverage advances in natural language processing (NLP) and computer vision techniques, coupled with advances in computing hardware, to learn patterns from growing biological databases, curated annotations thereof, or both. Once learned, these patterns can be used to provide novel insights into mechanistic biology and the design of biomolecules. However, navigating and understanding the practical applications of the many recent protein design tools is complex. To facilitate this, we 1) document advances in deep learning (DL)-assisted protein design from the last three years, 2) present a practical pipeline that goes from de novo-generated sequences to their predicted properties and web-powered visualization within minutes, and 3) leverage it to suggest a generated protein sequence that might be used to engineer a biosynthetic gene cluster to produce a molecular glue-like compound. Lastly, we discuss challenges and highlight opportunities for the protein design field.

                Author and article information

                Journal
                Comput Struct Biotechnol J
                ISSN: 2001-0370
                19 November 2022
                2023; 21: 238-250
                Affiliations
                [a] Institute of Informatics and Applications, University of Girona, Girona, Spain
                [b] Department of Biochemistry, University of Bayreuth, Bayreuth, Germany
                [c] Department of Informatics, Bioinformatics & Computational Biology, Technische Universität München, 85748 Garching, Germany
                [d] VantAI, 151 W 42nd Street, New York, NY 10036, United States
                [e] NVIDIA DE GmbH, Einsteinstraße 172, 81677 München, Germany
                Author notes
                [*] Corresponding authors at: Institute of Informatics and Applications, University of Girona, Girona, Spain (N. Ferruz); Department of Informatics, Bioinformatics & Computational Biology, Technische Universität München, 85748 Garching, Germany (C. Dallago). noelia.ferruz@udg.edu; christian.dallago@tum.de
                Article
                PII: S2001-0370(22)00508-6
                DOI: 10.1016/j.csbj.2022.11.014
                PMCID: PMC9755234
                PMID: 36544476
                f19ce987-5bbf-439a-8f74-b1488696bf79
                © 2022 The Authors

                This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

                History
                31 August 2022
                5 November 2022
                5 November 2022
                Categories
                Mini Review
