      Is Open Access

      Machine Learning-Guided Protein Engineering

      review-article


          Abstract

          Recent progress in engineering highly promising biocatalysts has increasingly involved machine learning methods. These methods leverage existing experimental and simulation data to aid in the discovery and annotation of promising enzymes, as well as in suggesting beneficial mutations for improving known targets. The field of machine learning for protein engineering is gathering steam, driven by recent success stories and notable progress in other areas. It already encompasses ambitious tasks such as understanding and predicting protein structure and function, catalytic efficiency, enantioselectivity, protein dynamics, stability, solubility, aggregation, and more. Nonetheless, the field is still evolving, with many challenges to overcome and questions to address. In this Perspective, we provide an overview of ongoing trends in this domain, highlight recent case studies, and examine the current limitations of machine learning-based methods. We emphasize the crucial importance of thorough experimental validation of emerging models before their use for rational protein design. We present our opinions on the fundamental problems and outline the potential directions for future research.


                Author and article information

                Journal
                ACS Catal
                ACS Catalysis
                American Chemical Society
                2155-5435
                13 October 2023
                03 November 2023
                Volume: 13
                Issue: 21
                Pages: 13863-13895
                Affiliations
                [1] Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
                [2] International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
                [3] Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
                [4] Faculty of Electrical Engineering, Czech Technical University in Prague, Technicka 2, 166 27 Prague 6, Czech Republic
                [5] Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Flemingovo nám. 2, 160 00 Prague 6, Czech Republic
                Author information
                https://orcid.org/0000-0002-9979-4159
                https://orcid.org/0009-0007-4783-6584
                https://orcid.org/0000-0002-7848-8216
                https://orcid.org/0000-0002-6940-3006
                https://orcid.org/0000-0003-3659-4819
                Article
                DOI: 10.1021/acscatal.3c02743
                PMCID: PMC10629210
                PMID: 37942269
                © 2023 The Authors. Published by American Chemical Society

                Permits the broadest form of re-use including for commercial purposes, provided that author attribution and integrity are maintained (https://creativecommons.org/licenses/by/4.0/).

                History
                : 15 June 2023
                : 20 September 2023
                Funding
                Funded by: Horizon 2020 Framework Programme, doi 10.13039/100010661;
                Award ID: 857560
                Funded by: European Regional Development Fund, doi 10.13039/501100008530;
                Award ID: CZ.02.1.01/0.0/0.0/15_003/0000468
                Funded by: Grantová Agentura České Republiky, doi 10.13039/501100001824;
                Award ID: 21-11563M
                Funded by: Ministerstvo Školství, Mládeže a Tělovýchovy, doi 10.13039/501100001823;
                Award ID: RI LM2023069
                Funded by: Ministerstvo Školství, Mládeže a Tělovýchovy, doi 10.13039/501100001823;
                Award ID: LX22NPO5102
                Funded by: Ministerstvo Školství, Mládeže a Tělovýchovy, doi 10.13039/501100001823;
                Award ID: LM2023055
                Funded by: Ministerstvo Školství, Mládeže a Tělovýchovy, doi 10.13039/501100001823;
                Award ID: CZ.02.1.01/0.0/0.0/17_043/0009632
                Funded by: European Cooperation in Science and Technology, doi 10.13039/501100000921;
                Award ID: CA21162
                Funded by: HORIZON EUROPE Marie Sklodowska-Curie Actions, doi 10.13039/100018694;
                Award ID: 891397
                Funded by: Technology Agency of the Czech Republic, doi 10.13039/100014809;
                Award ID: TN02000122/001N
                Funded by: Technology Agency of the Czech Republic, doi 10.13039/100014809;
                Award ID: TN02000122
                Categories
                Perspective

                activity,artificial intelligence,biocatalysis,deep learning,protein design
