
      Predicting the Sequence-Dependent Backbone Dynamics of Intrinsically Disordered Proteins

      research-article


          Abstract

          Dynamics is a crucial link between sequence and function for intrinsically disordered proteins (IDPs). NMR spin relaxation is a powerful technique for characterizing the sequence-dependent backbone dynamics of IDPs. Of particular interest is the 15N transverse relaxation rate (R2), which reports on slower dynamics (tens of nanoseconds up to 1 μs and beyond). NMR and molecular dynamics (MD) simulations have shown that local interactions and secondary structure formation slow down backbone dynamics and raise R2. Elevated R2 values have been suggested as indicators of propensities for membrane association, liquid-liquid phase separation, and other functional processes. Here we present SeqDYN, a sequence-based method for predicting the R2 values of IDPs. The R2 value of a residue is expressed as the product of contributing factors from all residues, which attenuate with increasing sequence distance from the central residue. The mathematical model has 21 parameters: the correlation length (the sequence distance at which the attenuation reaches 50%) and the amplitudes of the contributing factors of the 20 amino acid types. Training on a set of 45 IDPs yields a correlation length of 5.6 residues and identifies aromatic and long branched aliphatic amino acids and Arg as R2 promoters, and Gly and short polar amino acids as R2 suppressors. The prediction accuracy of SeqDYN is competitive with that of recent MD simulations using IDP-specific force fields. For a structured protein, the SeqDYN prediction represents R2 in the unfolded state. SeqDYN is available as a web server at https://zhougroup-uic.github.io/SeqDYNidp/ for rapid R2 prediction.
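          The abstract describes the R2 of each residue as a product of per-residue contributing factors that fade with sequence distance, governed by one correlation length and 20 amino-acid amplitudes. The following is a minimal Python sketch of that idea under stated assumptions: the Lorentzian-like attenuation 1/(1 + (s/b)^2) (which equals 0.5 at s = b), the per-residue factor 1 + (q - 1)·f(s), the `r2_scale` prefactor, and the names `attenuation` and `predict_r2` are all hypothetical choices, not the published SeqDYN functional form or code. Only the default correlation length of 5.6 residues is taken from the abstract; no fitted amplitudes are reproduced, so callers must supply their own.

```python
"""Hypothetical sketch of a SeqDYN-style R2 predictor (not the authors' code)."""
from math import prod

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def attenuation(s: float, correlation_length: float) -> float:
    """Assumed decay with sequence distance s: 1 at s = 0, 0.5 at s = correlation_length."""
    return 1.0 / (1.0 + (s / correlation_length) ** 2)


def predict_r2(sequence: str, amplitudes: dict, correlation_length: float = 5.6,
               r2_scale: float = 1.0) -> list:
    """Return one R2 estimate per residue as a product over all residues.

    Each residue j contributes 1 + (q_j - 1) * f(|j - center|), where q_j is the
    (user-supplied) amplitude of its amino-acid type. q > 1 acts as a promoter,
    q < 1 as a suppressor, and the effect fades toward 1 with sequence distance.
    The functional form and r2_scale are assumptions for illustration only.
    """
    r2 = []
    for center in range(len(sequence)):
        factors = (
            1.0 + (amplitudes[aa] - 1.0) * attenuation(abs(j - center), correlation_length)
            for j, aa in enumerate(sequence)
        )
        r2.append(r2_scale * prod(factors))
    return r2


if __name__ == "__main__":
    # Placeholder amplitudes: neutral everywhere, then one hypothetical promoter (W)
    # and one hypothetical suppressor (G). These are NOT fitted SeqDYN values.
    amps = {aa: 1.0 for aa in AMINO_ACIDS}
    amps["W"] = 2.0
    amps["G"] = 0.5
    for i, value in enumerate(predict_r2("GSAAWAAKLG", amps)):
        print(i, round(value, 3))
```

With every amplitude set to 1 the prediction is flat; raising one residue type's amplitude above 1 (a promoter) or lowering it below 1 (a suppressor) raises or lowers R2 around each occurrence of that residue, with the deviation halved at the correlation length under the assumed attenuation.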


                Author and article information

                Journal: bioRxiv (Cold Spring Harbor Laboratory)
                Date: 03 February 2023
                bioRxiv ID: 2023.02.02.526886
                Affiliations
                [1] Department of Chemistry, University of Illinois at Chicago, Chicago, IL 60607, USA
                [2] Department of Physics, University of Illinois at Chicago, Chicago, IL 60607, USA
                Author notes
                [*] Corresponding author. hzhou43@uic.edu
                DOI: 10.1101/2023.02.02.526886
                PMCID: PMC9915584
                PMID: 36778236

                This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which allows reusers to copy and distribute the material in any medium or format in unadapted form only, for noncommercial purposes only, and only so long as attribution is given to the creator.

                Keywords: intrinsically disordered proteins, NMR spin relaxation, transverse relaxation rate, backbone dynamics, R2 prediction
