      Predicting changes in protein thermodynamic stability upon point mutation with deep 3D convolutional neural networks

      research-article


          Abstract

          Predicting mutation-induced changes in protein thermodynamic stability (ΔΔG) is of great interest in protein engineering, variant interpretation, and protein biophysics. We introduce ThermoNet, a deep, 3D-convolutional neural network (3D-CNN) designed for structure-based prediction of ΔΔGs upon point mutation. To leverage the image-processing power inherent in CNNs, we treat protein structures as if they were multi-channel 3D images. In particular, the inputs to ThermoNet are uniformly constructed as multi-channel voxel grids based on biophysical properties derived from raw atom coordinates. We train and evaluate ThermoNet with a curated data set that accounts for protein homology and is balanced with direct and reverse mutations; this provides a framework for addressing biases that have likely influenced many previous ΔΔG prediction methods. ThermoNet demonstrates performance comparable to the best available methods on the widely used Ssym test set. In addition, ThermoNet accurately predicts the effects of both stabilizing and destabilizing mutations, while most other methods exhibit a strong bias towards predicting destabilization. We further show that homology between Ssym and widely used training sets like S2648 and VariBench has likely led to overestimated performance in previous studies. Finally, we demonstrate the practical utility of ThermoNet in predicting the ΔΔGs for two clinically relevant proteins, p53 and myoglobin, and for pathogenic and benign missense variants from ClinVar. Overall, our results suggest that 3D-CNNs can model the complex, non-linear interactions perturbed by mutations, directly from biophysical properties of atoms.
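
To make the voxel-grid idea concrete, the following is a minimal sketch, not ThermoNet's actual featurization. It assumes each atom has already been assigned to one property channel, and it spreads each atom onto nearby voxels with a Gaussian. The function name `voxelize`, the 16-voxel grid, the 1 Å spacing, and the Gaussian width are illustrative assumptions, not values taken from the abstract.

```python
import numpy as np


def voxelize(coords, channels, center, n_channels,
             grid_size=16, resolution=1.0, sigma=1.0):
    """Map atoms onto a (n_channels, G, G, G) density grid centred on `center`.

    coords   : (N, 3) atom coordinates in angstroms
    channels : (N,) integer channel index per atom (hypothetical assignment)
    center   : (3,) grid centre, e.g. a point at the mutation site
    """
    coords = np.asarray(coords, dtype=float) - np.asarray(center, dtype=float)
    channels = np.asarray(channels, dtype=int)

    # Voxel-centre coordinates along one axis, relative to the grid centre.
    half = grid_size * resolution / 2.0
    axis = (np.arange(grid_size) + 0.5) * resolution - half
    gx, gy, gz = np.meshgrid(axis, axis, axis, indexing="ij")

    grid = np.zeros((n_channels, grid_size, grid_size, grid_size))
    for (x, y, z), ch in zip(coords, channels):
        # Squared distance from this atom to every voxel centre.
        d2 = (gx - x) ** 2 + (gy - y) ** 2 + (gz - z) ** 2
        # Gaussian smearing is an assumed choice of density function.
        grid[ch] += np.exp(-d2 / (2.0 * sigma ** 2))
    return grid
```

The resulting array has the layout a 3D convolution expects (channels first, then three spatial axes), which is what allows a structure to be treated like a multi-channel 3D image.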

          Author summary

          The thermodynamic stability of a protein, usually represented as the Gibbs free energy for the biophysical process of protein folding (ΔG), is a fundamental thermodynamic quantity. Predicting mutation-induced changes in protein thermodynamic stability (ΔΔG) is of great interest in protein engineering, variant interpretation, and protein biophysics. However, predicting ΔΔGs in an accurate and unbiased manner has been a long-standing challenge in the field of computational biology. In this work, we introduce ThermoNet, a deep, 3D-convolutional neural network (3D-CNN) designed for structure-based ΔΔG prediction. To leverage the image-processing power inherent in CNNs, we treat protein structures as if they were multi-channel 3D images. ThermoNet demonstrates performance comparable to the best available methods. In addition, ThermoNet accurately predicts the effects of both stabilizing and destabilizing mutations, while most other methods exhibit a strong bias towards predicting destabilization. We also demonstrate that the presence of homologous proteins in commonly used training and testing sets for ΔΔG prediction methods has likely influenced previous performance estimates. Finally, we highlight the practical utility of ThermoNet by applying it to predict the ΔΔGs for two clinically relevant proteins, p53 and myoglobin, and for pathogenic and benign missense variants from ClinVar.
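
The balance between direct and reverse mutations rests on the fact that free energy is a state function: if mutating A to B changes stability by ΔΔG, mutating B back to A changes it by −ΔΔG. The sketch below illustrates that bookkeeping under stated assumptions. The `Mutation` record, `balance`, and `antisymmetry_bias` are hypothetical names, not part of the ThermoNet code, and a real pipeline would also need the mutant structure as input for each reverse mutation.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Mutation:
    wild_type: str   # one-letter code of the original residue
    position: int    # residue number
    mutant: str      # one-letter code of the substituted residue
    ddg: float       # measured stability change (sign convention is up to the data set)


def reverse(m: Mutation) -> Mutation:
    """Reverse mutation: swap the residues and flip the sign of ddG."""
    return Mutation(m.mutant, m.position, m.wild_type, -m.ddg)


def balance(direct: list[Mutation]) -> list[Mutation]:
    """Pair every direct mutation with its reverse, so the labels average to zero."""
    return [x for m in direct for x in (m, reverse(m))]


def antisymmetry_bias(pred_direct: list[float], pred_reverse: list[float]) -> float:
    """Half the mean of (direct + reverse) predictions; zero for an unbiased predictor."""
    return mean(d + r for d, r in zip(pred_direct, pred_reverse)) / 2.0
```

A predictor that systematically favours destabilization would assign destabilizing values to both members of a pair, so its paired predictions would not cancel and the bias value would move away from zero.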


                Author and article information

                Contributors
                Roles: Conceptualization, Data curation, Formal analysis, Funding acquisition, Methodology, Software, Visualization, Writing – original draft, Writing – review & editing
                Roles: Data curation, Formal analysis, Writing – review & editing
                Roles: Funding acquisition, Methodology, Supervision, Visualization, Writing – review & editing
                Roles: Conceptualization, Funding acquisition, Methodology, Supervision, Visualization, Writing – review & editing
                Role: Editor
                Journal
                PLoS Computational Biology (PLoS Comput Biol)
                Public Library of Science (San Francisco, CA, USA)
                ISSN: 1553-734X, 1553-7358
                Published: 30 November 2020
                Volume: 16, Issue: 11, Article: e1008291
                Affiliations
                [1 ] Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut, United States of America
                [2 ] Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, United States of America
                [3 ] Department of Biological Sciences and Vanderbilt Genetics Institute, Vanderbilt University, Nashville, Tennessee, United States of America
                [4 ] Department of Computer Science, Yale University, New Haven, Connecticut, United States of America
                Universita degli Studi di Torino, ITALY
                Author notes

                The authors have declared that no competing interests exist.

                Author information
                https://orcid.org/0000-0002-6873-5279
                https://orcid.org/0000-0001-9743-1795
                https://orcid.org/0000-0002-9746-3719
                Article
                Manuscript number: PCOMPBIOL-D-20-00330
                DOI: 10.1371/journal.pcbi.1008291
                PMCID: 7728386
                PMID: 33253214
                03eb7154-5057-43db-bd8c-e837253c996f
                © 2020 Li et al

                This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

                History
                Received: 2 March 2020
                Accepted: 26 August 2020
                Page count
                Figures: 5, Tables: 2, Pages: 24
                Funding
                MBG and YTY were supported by a National Science Foundation award (NSF DBI1660648). JAC was supported by National Institutes of Health awards (R35 GM127087 and R01 GM126249). BL was supported by an American Heart Association Postdoctoral Fellowship (20POST35220002). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
                Categories
                Research Article
                Biology and Life Sciences > Molecular Biology > Macromolecular Structure Analysis > Protein Structure
                Biology and Life Sciences > Biochemistry > Proteins > Protein Structure
                Biology and Life Sciences > Genetics > Mutation > Reverse Mutation
                Biology and Life Sciences > Genetics > Mutation > Point Mutation
                Biology and Life Sciences > Molecular Biology > Macromolecular Structure Analysis > Protein Structure > Protein Structure Prediction
                Biology and Life Sciences > Biochemistry > Proteins > Protein Structure > Protein Structure Prediction
                Biology and Life Sciences > Biophysics
                Physical Sciences > Physics > Biophysics
                Physical Sciences > Physics > Thermodynamics
                Biology and Life Sciences > Molecular Biology > Macromolecular Structure Analysis > Protein Structure > Protein Structure Determination
                Biology and Life Sciences > Biochemistry > Proteins > Protein Structure > Protein Structure Determination
                Biology and Life Sciences > Molecular Biology > Macromolecular Structure Analysis > Protein Structure > Protein Folding
                Biology and Life Sciences > Biochemistry > Proteins > Protein Structure > Protein Folding
                Custom metadata
                vor-update-to-uncorrected-proof: 2020-12-10
                All relevant data are within the manuscript, its Supporting information files and on https://github.com/gersteinlab/ThermoNet.

                Quantitative & Systems biology
