cblaster: a remote search tool for rapid identification and visualization of homologous gene clusters


Abstract

Motivation

Genes involved in coordinated biological pathways, including metabolism, drug resistance and virulence, are often colocalized as gene clusters. Identifying homologous gene clusters aids in the study of their function and evolution; however, existing tools are limited to searching local sequence databases. Tools for remotely searching public databases are necessary to keep pace with the rapid growth of online genomic data.

Results

Here, we present cblaster, a Python-based tool to rapidly detect colocated genes in local and remote databases. cblaster is easy to use, offering both a command-line interface and a user-friendly graphical user interface. It generates outputs that enable intuitive visualizations of large datasets and can be readily incorporated into larger bioinformatic pipelines. cblaster is a significant update to the comparative genomics toolbox.
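At its core, detecting colocated genes means taking homology hits that already carry genomic coordinates and grouping the ones that sit close together on the same sequence. The Python sketch below shows that general idea under stated assumptions. It is not cblaster's implementation or API. The `Hit` class, the `find_clusters` function, its `max_gap` and `min_unique` parameters, and their default values are all hypothetical and were chosen only for illustration. The sketch also assumes the hits were obtained beforehand, for example from a BLAST search.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Hit:
    """Hypothetical record for one homology hit mapped onto a genome."""
    query: str     # query protein that produced the hit
    scaffold: str  # genomic sequence (contig/scaffold) carrying the subject gene
    start: int     # subject gene start coordinate (bp)
    end: int       # subject gene end coordinate (bp)


def find_clusters(hits: list[Hit], max_gap: int = 20_000,
                  min_unique: int = 3) -> list[list[Hit]]:
    """Hypothetical helper: group hits into runs of colocated genes.

    A run is extended while the next gene starts within `max_gap` bp of the
    furthest end seen so far in the run. A run is reported as a candidate
    cluster only if it contains hits to at least `min_unique` distinct queries.
    Both thresholds are illustrative assumptions, not cblaster defaults.
    """
    clusters: list[list[Hit]] = []

    def flush(run: list[Hit]) -> None:
        # Keep the run only if enough distinct query genes are represented.
        if len({h.query for h in run}) >= min_unique:
            clusters.append(run)

    # Sort so that hits on the same scaffold are adjacent and in genomic order.
    ordered = sorted(hits, key=lambda h: (h.scaffold, h.start))
    for _, group in groupby(ordered, key=lambda h: h.scaffold):
        run: list[Hit] = []
        run_end = 0
        for hit in group:
            # Gap too large: close the current run and start a new one.
            if run and hit.start - run_end > max_gap:
                flush(run)
                run = []
            run.append(hit)
            run_end = hit.end if len(run) == 1 else max(run_end, hit.end)
        if run:
            flush(run)
    return clusters


if __name__ == "__main__":
    hits = [
        Hit("A", "scaf1", 1_000, 2_000),
        Hit("B", "scaf1", 3_000, 4_000),
        Hit("C", "scaf1", 5_000, 6_000),
        Hit("A", "scaf1", 90_000, 91_000),  # too far from the others
        Hit("A", "scaf2", 100, 900),        # isolated hit on another scaffold
    ]
    for cluster in find_clusters(hits):
        print(cluster[0].scaffold, [h.query for h in cluster])
    # scaf1 ['A', 'B', 'C']
```

The approach sorts the hits once per scaffold and then makes a single linear pass. Grouping therefore costs O(n log n) in the number of hits. Measuring the gap from the furthest end seen so far, rather than from the last gene alone, means overlapping or nested genes do not split a run by accident.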

Availability and implementation

cblaster source code and documentation are freely available from GitHub under the MIT license (github.com/gamcil/cblaster).

Supplementary information

Supplementary data are available at Bioinformatics Advances online.


Author and article information

Contributors

Aida Ouangraoua: Role: Associate Editor

Journal

Journal ID (nlm-ta): Bioinform Adv

Journal ID (iso-abbrev): Bioinform Adv

Journal ID (publisher-id): bioadv

Title: Bioinformatics Advances

Publisher: Oxford University Press

ISSN (Electronic): 2635-0041

Publication date Collection: 2021

Publication date (Electronic): 05 August 2021

Publication date PMC-release: 05 August 2021

Volume: 1

Issue: 1

Electronic Location Identifier: vbab016

Affiliations

[1 ] School of Molecular Sciences, The University of Western Australia , Crawley, WA 6009, Australia

[2 ] Bioinformatics Group, Wageningen University , Wageningen 6708PB, The Netherlands

Author notes

To whom correspondence should be addressed: cameron.gilchrist@research.uwa.edu.au, yitheng.chooi@uwa.edu.au or marnix.medema@wur.nl

Author information

Cameron L M Gilchrist https://orcid.org/0000-0001-7798-427X

Marnix H Medema https://orcid.org/0000-0002-2191-2821

Yit-Heng Chooi https://orcid.org/0000-0001-7719-7524

Article

Publisher ID: vbab016

DOI: 10.1093/bioadv/vbab016

PMC ID: 9710679

PubMed ID: 36700093

SO-VID: 0fc085d7-5e3a-4325-8699-3a08c60d319b

License:

This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

History

Date received : 07 July 2021

Date revision received : 28 July 2021

Date: 29 July 2021

Date accepted : 03 August 2021

Date: 23 August 2021

Page count

Pages: 10

Funding

Funded by: Australian Government Research Training Program PhD scholarship;

Funded by: Australian Research Council Future Fellowship;

Award ID: FT160100233

Funded by: Cooperative Research Centres Projects scheme;

Award ID: CRCPFIVE000119
