      Bakta: rapid and standardized annotation of bacterial genomes via alignment-free sequence identification

      research-article


          Abstract

          Command-line annotation software tools have continuously gained popularity compared to centralized online services due to the worldwide increase of sequenced bacterial genomes. However, the results of existing command-line software pipelines depend heavily on taxon-specific databases or sufficiently well-annotated reference genomes. Here, we introduce Bakta, a new command-line software tool for the robust, taxon-independent, thorough, yet fast annotation of bacterial genomes. Bakta conducts a comprehensive annotation workflow, including the detection of small proteins, that takes replicon metadata into account. The annotation of coding sequences is accelerated via an alignment-free sequence identification approach that, in addition, facilitates the precise assignment of public database cross-references. Annotation results are exported as GFF3 and International Nucleotide Sequence Database Collaboration (INSDC)-compliant flat files, as well as comprehensive JSON files, facilitating automated downstream analysis. We compared Bakta to other rapid contemporary command-line annotation software tools in both targeted and taxonomically broad benchmarks including isolates and metagenome-assembled genomes. We demonstrated that Bakta outperforms other tools in terms of functional annotations, the assignment of functional categories and database cross-references, whilst providing comparable wall-clock runtimes. Bakta is implemented in Python 3 and runs on macOS and Linux systems. It is freely available under a GPLv3 license at https://github.com/oschwengers/bakta. An accompanying web version is available at https://bakta.computational.bio.
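          The abstract does not spell out how the alignment-free identification works, so the following is only a minimal sketch of one common way to identify sequences without alignment: exact lookup by sequence digest. It assumes MD5 digests, an in-memory dictionary as the lookup table, and a length check as a guard against digest collisions. The function names, example sequences and cross-reference identifiers are all hypothetical and are not taken from Bakta's code base.

```python
import hashlib
from typing import Dict, Iterable, Optional, Tuple

Annotation = Dict[str, str]


def normalize(protein: str) -> str:
    """Upper-case an amino-acid sequence and drop a trailing stop symbol."""
    return protein.strip().upper().rstrip("*")


def sequence_digest(protein: str) -> str:
    """Hex digest of the normalized sequence (MD5 is an assumption here)."""
    return hashlib.md5(normalize(protein).encode("ascii")).hexdigest()


def build_index(
    records: Iterable[Tuple[str, Annotation]]
) -> Dict[str, Tuple[int, Annotation]]:
    """Map digest -> (sequence length, annotation) for all known proteins."""
    index = {}
    for sequence, annotation in records:
        index[sequence_digest(sequence)] = (len(normalize(sequence)), annotation)
    return index


def identify(
    protein: str, index: Dict[str, Tuple[int, Annotation]]
) -> Optional[Annotation]:
    """Return the stored annotation for an exact sequence match, else None."""
    hit = index.get(sequence_digest(protein))
    if hit is None:
        return None
    length, annotation = hit
    # The length comparison is a cheap extra check against digest collisions.
    return annotation if length == len(normalize(protein)) else None


# Hypothetical reference entries; identifiers are placeholders, not real records.
KNOWN_PROTEINS = [
    ("MKTAYIAKQR", {"product": "example protein A", "xref": "EXAMPLE:0001"}),
    ("MSLNDELVKK", {"product": "example protein B", "xref": "EXAMPLE:0002"}),
]
INDEX = build_index(KNOWN_PROTEINS)
```

          Because the lookup is a dictionary access rather than a pairwise alignment, the cost per query does not grow with the size of the reference set, and an exact hit carries its database cross-reference with it. The trade-off is that a single amino-acid difference produces a miss, so a sketch like this would need a fallback (for example an alignment-based search) for sequences without an exact match.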


                Author and article information

                Journal
                Microbial Genomics (Microb Genom; mgen)
                Publisher: Microbiology Society
                ISSN: 2057-5858
                Published: 5 November 2021
                Volume: 7
                Issue: 11
                Article number: 000685
                Affiliations
                [1] Bioinformatics and Systems Biology, Justus Liebig University Giessen, Giessen 35392, Germany
                Author notes
                *Correspondence: Oliver Schwengers, oliver.schwengers@cb.jlug.de
                Author information
                https://orcid.org/0000-0003-4216-2721
                https://orcid.org/0000-0002-9973-0374
                https://orcid.org/0000-0001-5130-546X
                https://orcid.org/0000-0002-9747-7096
                https://orcid.org/0000-0001-6455-3622
                https://orcid.org/0000-0002-7086-2568
                Article
                Article number: 000685
                DOI: 10.1099/mgen.0.000685
                PMCID: PMC8743544
                PMID: 34739369
                3911ff70-06e5-453f-995d-8386b8b9114c
                © 2021 The Authors

                This is an open-access article distributed under the terms of the Creative Commons Attribution License.

                History
                Received: 01 July 2021
                Accepted: 08 September 2021
                Funding
                Funded by: BMBF
                Award ID: 031A533
                Award Recipient: Alexander Goesmann
                Funded by: BMBF
                Award ID: 031L0209A
                Award Recipient: Alexander Goesmann
                Categories
                Research Articles
                Functional Genomics and Microbe–Niche Interactions

                Keywords: bacteria, genome annotation, metagenome-assembled genomes, plasmids, whole-genome sequencing
