
      cblaster: a remote search tool for rapid identification and visualization of homologous gene clusters

      research-article


          Abstract

          Motivation

          Genes involved in coordinated biological pathways, including metabolism, drug resistance and virulence, are often colocalized as gene clusters. Identifying homologous gene clusters aids in the study of their function and evolution; however, existing tools are limited to searching local sequence databases. Tools for remotely searching public databases are necessary to keep pace with the rapid growth of online genomic data.

          Results

          Here, we present cblaster, a Python-based tool to rapidly detect collocated genes in local and remote databases. cblaster is easy to use, offering both a command-line interface and a user-friendly graphical user interface. It generates outputs that enable intuitive visualization of large datasets and can be readily incorporated into larger bioinformatic pipelines. cblaster is a significant addition to the comparative genomics toolbox.
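          The general idea of detecting collocated genes can be illustrated with a small sketch. It assumes homology hits have already been retrieved and carry genomic coordinates; hits on the same scaffold are sorted and grouped into blocks whenever consecutive hits lie within a maximum intergenic gap, and a block is kept only if it contains hits to enough distinct query sequences. This is a hypothetical illustration of the technique, not cblaster's implementation or API: the `Hit` class, the `find_clusters` function and its `max_gap` and `min_unique` parameters are names invented here.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Hit:
    """Hypothetical record of one homology hit with its genomic location."""
    query: str     # query protein that produced the hit
    subject: str   # subject gene that was hit
    scaffold: str  # contig/scaffold the subject gene lies on
    start: int     # gene start coordinate (bp)
    end: int       # gene end coordinate (bp)


def find_clusters(hits, max_gap=20_000, min_unique=2):
    """Hypothetical sketch: group hits into blocks of collocated genes.

    A new block starts when the next hit begins more than `max_gap` bp after
    the furthest end seen so far in the current block. Blocks with hits to
    fewer than `min_unique` distinct queries are discarded.
    """
    # Sort by scaffold, then position, so groupby sees each scaffold once.
    ordered = sorted(hits, key=lambda h: (h.scaffold, h.start))
    blocks = []
    for _, group in groupby(ordered, key=lambda h: h.scaffold):
        current, current_end = [], None
        for hit in group:
            if current and hit.start - current_end > max_gap:
                blocks.append(current)
                current, current_end = [], None
            current.append(hit)
            # Track the furthest end so overlapping genes are handled.
            current_end = hit.end if current_end is None else max(current_end, hit.end)
        if current:
            blocks.append(current)
    # Keep only blocks that contain hits to enough distinct queries.
    return [b for b in blocks if len({h.query for h in b}) >= min_unique]


# Usage with made-up example hits.
hits = [
    Hit("qA", "g1", "scaf1", 1_000, 2_000),
    Hit("qB", "g2", "scaf1", 3_500, 4_800),
    Hit("qC", "g3", "scaf1", 90_000, 91_000),
    Hit("qA", "g4", "scaf2", 500, 1_500),
]
for block in find_clusters(hits, max_gap=5_000, min_unique=2):
    print(block[0].scaffold, [h.subject for h in block])  # scaf1 ['g1', 'g2']
```

          Tracking the maximum end coordinate rather than only the last hit's end keeps a long gene from being split away from genes nested or overlapping within it; the gap and distinct-query thresholds are the two knobs that trade sensitivity against spurious clusters.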

          Availability and implementation

          cblaster source code and documentation are freely available from GitHub under the MIT license (github.com/gamcil/cblaster).

          Supplementary information

          Supplementary data are available at Bioinformatics Advances online.


                Author and article information

                Contributors
                Role: Associate Editor
                Journal
                Bioinformatics Advances (Bioinform Adv)
                Oxford University Press
                ISSN: 2635-0041
                05 August 2021
                Volume 1, Issue 1, vbab016
                Affiliations
                [1] School of Molecular Sciences, The University of Western Australia, Crawley, WA 6009, Australia
                [2] Bioinformatics Group, Wageningen University, Wageningen 6708PB, The Netherlands
                Author information
                https://orcid.org/0000-0001-7798-427X
                https://orcid.org/0000-0002-2191-2821
                https://orcid.org/0000-0001-7719-7524
                Article: vbab016
                DOI: 10.1093/bioadv/vbab016
                9710679
                36700093
                © The Author(s) 2021. Published by Oxford University Press.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                07 July 2021
                28 July 2021
                29 July 2021
                03 August 2021
                23 August 2021
                Page count
                Pages: 10
                Funding
                Funded by: Australian Government Research Training Program PhD scholarship
                Funded by: Australian Research Council Future Fellowship (Award ID: FT160100233)
                Funded by: Cooperative Research Centres Projects scheme (Award ID: CRCPFIVE000119)
                Categories
                Original Paper
                AcademicSubjects/SCI01060
