      A subfamily classification to choreograph the diverse activities within glycoside hydrolase family 31

          Abstract

          The Carbohydrate-Active Enzyme classification groups enzymes that break down, assemble, or decorate glycans into protein families based on sequence similarity. The glycoside hydrolases (GH) are arranged into over 170 enzyme families, some of which are very large and exhibit distinct activities and specificities towards diverse substrates. Family GH31 is a large family that contains more than 20,000 sequences with a wide taxonomic diversity. Fewer than 1% of GH31 members are biochemically characterized, yet these exhibit many different activities, including those of glycosidases, lyases, and transglycosidases. This diversity of activities limits our ability to predict the activities and roles of GH31 family members in their host organisms and our ability to exploit these enzymes for practical purposes. Here, we established a subfamily classification using sequence similarity networks, which was further validated by a structural analysis. Although sequence similarity networks separate proteins by sequence alone, we obtained good segregation of activities among the subfamilies. Our subclassification consists of 20 subfamilies, 16 of which contain at least one characterized member and 11 of which are monofunctional based on the available data. We also report the biochemical characterization of a member of the large subfamily 2 (GH31_2), which previously lacked any characterized members: RaGH31 from Rhodoferax aquaticus is an α-glucosidase with activity on a range of disaccharides including sucrose, trehalose, maltose, and nigerose. Our subclassification provides improved predictive power for the vast majority of uncharacterized proteins in family GH31 and highlights the sequence space that remains to be functionally explored.
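
          The general idea behind a sequence similarity network can be illustrated with a minimal sketch. It is not the authors' pipeline: the sequence names, pairwise scores, and threshold below are hypothetical, and the function name `cluster_by_similarity` is invented for illustration. Sequences are nodes, an edge joins two sequences whose pairwise score meets a chosen threshold, and each connected component is treated as a candidate group.

```python
from collections import defaultdict


def cluster_by_similarity(ids, scores, threshold):
    """Return connected components of a similarity network.

    ids: sequence identifiers (nodes).
    scores: dict mapping (id_a, id_b) pairs to a similarity score.
    threshold: minimum score for an edge to be drawn (an assumed cutoff).
    """
    # Union-find structure: every node starts as its own component.
    parent = {i: i for i in ids}

    def find(x):
        # Follow parent links to the root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the components of every pair scoring at or above the threshold.
    for (a, b), score in scores.items():
        if score >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    # Collect the members of each component.
    groups = defaultdict(set)
    for i in ids:
        groups[find(i)].add(i)

    # Largest components first, ties broken alphabetically.
    return sorted((sorted(g) for g in groups.values()),
                  key=lambda g: (-len(g), g))


# Hypothetical toy data, not taken from the article.
ids = ["seqA", "seqB", "seqC", "seqD", "seqE"]
scores = {
    ("seqA", "seqB"): 120.0,
    ("seqB", "seqC"): 95.0,
    ("seqC", "seqD"): 30.0,
    ("seqD", "seqE"): 110.0,
}
clusters = cluster_by_similarity(ids, scores, threshold=80.0)
print(clusters)
```

          Raising the threshold splits the network into more, smaller groups, and lowering it merges them. The choice of cutoff is therefore a judgment call in any analysis of this kind.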

                Author and article information

                Journal
                Journal of Biological Chemistry
                Elsevier BV
                00219258
                April 2023
                Volume 299, Issue 4, Article 103038
                Article
                10.1016/j.jbc.2023.103038
                10074150
                36806678
                © 2023

                https://www.elsevier.com/tdm/userlicense/1.0/

                http://creativecommons.org/licenses/by/4.0/
