      SulfAtlas, the sulfatase database: state of the art and new developments

      research-article


          Abstract

          SulfAtlas (https://sulfatlas.sb-roscoff.fr/) is a knowledge-based resource dedicated to a sequence-based classification of sulfatases. Currently four sulfatase families exist (S1–S4), and the largest family (S1, formylglycine-dependent sulfatases) is divided into subfamilies by a phylogenetic approach, each subfamily corresponding either to a single characterized specificity (or a few specificities in some cases) or to unknown substrates. Sequences are linked to their biochemical and structural information through expert scrutiny of the available literature. Database browsing was initially made possible through both a keyword search engine and a specific sequence similarity (BLAST) server. In this article, we briefly summarize the experimental progress in the sulfatase field over the last 6 years. To improve and speed up the (sub)family assignment of sulfatases in (meta)genomic data, we have developed a new, freely accessible search engine using a hidden Markov model (HMM) for each (sub)family. This new tool (SulfAtlas HMM) is also a key part of the internal pipeline used to regularly update the database. The SulfAtlas resource has indeed grown significantly since its creation in 2016, from 4550 sequences to 162 430 sequences in August 2022.
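
          As a rough illustration of the per-(sub)family HMM assignment described in the abstract, the sketch below keeps, for each query sequence, the (sub)family whose profile gives the highest bit score at or above a cutoff. It is not the SulfAtlas HMM implementation: the hit tuples, the `assign_subfamilies` function, the labels and the score cutoff are all hypothetical, and in practice the hits would come from a profile search tool run against one HMM per (sub)family.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical hit record: (sequence_id, subfamily_label, bit_score).
Hit = Tuple[str, str, float]


def assign_subfamilies(
    hits: Iterable[Hit], min_score: float = 25.0
) -> Dict[str, Tuple[str, float]]:
    """Keep, for each sequence, the best-scoring (sub)family at or above min_score.

    The default cutoff is an arbitrary placeholder, not a published threshold.
    """
    best: Dict[str, Tuple[str, float]] = {}
    for seq_id, subfamily, score in hits:
        if score < min_score:
            continue  # too weak to support an assignment
        current = best.get(seq_id)
        if current is None or score > current[1]:
            best[seq_id] = (subfamily, score)
    return best


# Made-up hits, for illustration only.
example_hits = [
    ("seqA", "S1_7", 310.2),
    ("seqA", "S1_19", 145.0),
    ("seqB", "S2", 12.4),
    ("seqB", "S3", 88.9),
]
assignments = assign_subfamilies(example_hits)
```

          Sequences with no hit above the cutoff are simply absent from the result, which leaves them unassigned rather than forcing them into a (sub)family.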

                Author and article information

                Contributors
                Journal
                Nucleic Acids Research (Nucleic Acids Res)
                Oxford University Press
                ISSN: 0305-1048 (print), 1362-4962 (electronic)
                06 January 2023
                01 November 2022
                Volume: 51
                Issue: D1
                Pages: D647-D653
                Affiliations
                LABGeM, Génomique Métabolique, CEA, Genoscope, Institut François Jacob, CNRS, Université d'Évry, Université Paris-Saclay, 91057 Evry, Ile-de-France, France
                Sorbonne Université, CNRS, Laboratory of Integrative Biology of Marine Models (LBI2M), Station Biologique de Roscoff (SBR), 29680 Roscoff, Bretagne, France
                Sorbonne Université, CNRS, FR2424, ABiMS, Station Biologique de Roscoff, 29680 Roscoff, Bretagne, France
                Author notes
                To whom correspondence should be addressed. Tel: +33 298 29 23 30; Fax: +33 298 29 23 24; Email: gurvan@sb-roscoff.fr
                Correspondence may also be addressed to Tristan Barbeyron. Tel: +33 298 29 23 30; Fax: +33 298 29 23 24; Email: tristan.barbeyron@sb-roscoff.fr

                The authors wish it to be known that, in their opinion, the first two authors should be regarded as Joint First Authors.

                Author information
                https://orcid.org/0000-0001-6311-9752
                https://orcid.org/0000-0002-3009-6205
                Article
                gkac977
                DOI: 10.1093/nar/gkac977
                PMCID: PMC9825549
                PMID: 36318251
                © The Author(s) 2022. Published by Oxford University Press on behalf of Nucleic Acids Research.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com

                History
                Accepted: 17 October 2022
                Revised: 14 October 2022
                Received: 02 September 2022
                Page count
                Pages: 7
                Funding
                Funded by: Agence Nationale de la Recherche;
                Award ID: ANR-14-CE19-0020
                Funded by: ANR, DOI 10.13039/501100001665;
                Award ID: ANR-10-BTBR-04
                Funded by: Institut Français de Bioinformatique, DOI 10.13039/100016842;
                Award ID: ANR-11-INBS-0013
                Categories
                AcademicSubjects/SCI00010
                Database Issue

                Genetics
