
      Scalable and versatile container-based pipelines for de novo genome assembly and bacterial annotation.


          Abstract

          Background: Advancements in DNA sequencing technology have transformed the field of bacterial genomics, allowing for faster and more cost-effective chromosome-level assemblies than were possible a decade ago. However, transforming raw reads into a complete genome model is a significant computational challenge due to the varying quality and quantity of data obtained from different sequencing instruments, as well as the intrinsic characteristics of the genome and the analyses desired. To address this issue, we have developed a set of container-based pipelines using Nextflow, offering both common workflows for inexperienced users and high levels of customization for experienced ones. Their processing strategies adapt to the sequencing data type, and their modularity enables the incorporation of new components to address the community’s evolving needs.

          Methods: These pipelines consist of three parts: quality control, de novo genome assembly, and bacterial genome annotation. In particular, the genome annotation pipeline provides a comprehensive overview of the genome, including standard gene prediction and functional inference, as well as predictions relevant to clinical applications such as virulence and resistance gene annotation, secondary metabolite detection, prophage and plasmid prediction, and more.
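A minimal Python sketch of the modular structure described in the Methods, under stated assumptions: the three stages (quality control, assembly, annotation) run in sequence, the assembly strategy is chosen from the kinds of reads supplied, and annotation is a list of pluggable modules. The `Sample` class, the stage functions, and the strategy labels are hypothetical illustrations; they are not the pipelines' actual Nextflow code, and no specific tools are implied.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Hypothetical annotation modules, named after the analyses listed in the abstract.
ANNOTATION_MODULES = [
    "gene prediction",
    "functional inference",
    "virulence genes",
    "resistance genes",
    "secondary metabolites",
    "prophages",
    "plasmids",
]


@dataclass
class Sample:
    """Hypothetical container for one sample's inputs and a log of stages run."""
    name: str
    short_reads: Optional[List[str]] = None  # e.g. paired-end FASTQ paths
    long_reads: Optional[str] = None         # e.g. a long-read FASTQ path
    log: List[str] = field(default_factory=list)


def data_type(sample: Sample) -> str:
    """Classify the input so later stages can adapt their strategy."""
    if sample.short_reads and sample.long_reads:
        return "hybrid"
    if sample.long_reads:
        return "long"
    if sample.short_reads:
        return "short"
    raise ValueError(f"{sample.name}: no reads supplied")


def quality_control(sample: Sample) -> Sample:
    # Placeholder: a real stage would trim and filter reads here.
    sample.log.append(f"qc:{data_type(sample)}")
    return sample


def assemble(sample: Sample) -> Sample:
    # Placeholder: pick an assembly strategy according to the data type.
    strategy = {
        "short": "short-read assembly",
        "long": "long-read assembly",
        "hybrid": "hybrid assembly",
    }[data_type(sample)]
    sample.log.append(f"assembly:{strategy}")
    return sample


def annotate(sample: Sample) -> Sample:
    # Placeholder: run every registered annotation module in turn.
    for module in ANNOTATION_MODULES:
        sample.log.append(f"annotation:{module}")
    return sample


# Stages run in order; a new component is added by appending to this list.
PIPELINE: List[Callable[[Sample], Sample]] = [quality_control, assemble, annotate]


def run(sample: Sample, stages: List[Callable[[Sample], Sample]] = PIPELINE) -> Sample:
    for stage in stages:
        sample = stage(sample)
    return sample


if __name__ == "__main__":
    result = run(Sample("sample1", short_reads=["R1.fastq", "R2.fastq"], long_reads="ont.fastq"))
    print(result.log[:2])  # ['qc:hybrid', 'assembly:hybrid assembly']
```

In this sketch, extending the workflow means appending a function to `PIPELINE` or a name to `ANNOTATION_MODULES`, which loosely mirrors the modularity the authors describe; the real pipelines express stages as Nextflow processes running in containers.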

          Results: The annotation results are presented in reports, genome browsers, and a web-based application that enables users to explore and interact with the genome annotation results.

          Conclusions: Overall, our user-friendly pipelines offer a seamless integration of computational tools to facilitate routine bacterial genomics research. The effectiveness of these pipelines is illustrated by examining the sequencing data of a clinical sample of Klebsiella pneumoniae.


                Author and article information

                Contributors
                Roles: Conceptualization; Software; Visualization; Writing – Original Draft Preparation
                Roles: Resources; Writing – Review & Editing
                Roles: Conceptualization; Funding Acquisition; Project Administration; Supervision; Writing – Review & Editing
                Journal
                F1000Research (F1000Res)
                F1000 Research Limited (London, UK)
                ISSN: 2046-1402
                Published: 25 September 2023
                Volume 12: 1205
                Affiliations
                [1] Programa de Pós-graduação em Biologia Molecular, Universidade de Brasília, Brasília, DF, 70910-900, Brazil
                [2] Departamento de Biologia Celular, Universidade de Brasília, Brasília, DF, 70910-900, Brazil
                [3] Programa de Pós-graduação em Biologia Microbiana, Universidade de Brasília, Brasília, DF, 70910-900, Brazil
                [1] One Codex, San Francisco, California, USA
                [1] Department of Biomedical Sciences, Stellenbosch University, Stellenbosch, Western Cape, South Africa
                [1] Inland Norway University of Applied Sciences, Hamar, Norway
                Author notes

                No competing interests were disclosed.

                Competing interests: No competing interests were disclosed.

                Competing interests: No competing interests were disclosed.

                Competing interests: No competing interests were disclosed.

                Author information
                https://orcid.org/0000-0002-6855-3379
                https://orcid.org/0000-0002-1100-976X
                Article
                DOI: 10.12688/f1000research.139488.1
                PMCID: PMC10646344
                PMID: 37970066
                ScienceOpen ID: e093c560-4313-43ef-9fee-66c01c2b1635
                Copyright: © 2023 Almeida FMd et al.

                This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

                History: 16 August 2023
                Funding
                Funded by: Grant by Fundação de Amparo à Pesquisa do Distrito Federal (FAP-DF)
                Award ID: 806/2019
                Funded by: Scholarship by Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
                This work was funded in part by a scholarship by Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) to FMA and by the grant number 806/2019 from Fundação de Amparo à Pesquisa do Distrito Federal (FAP-DF) to GPJ.
                The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
                Categories
                Software Tool Article
                Articles

                Keywords: bacterial genomics, pipelines, Nextflow, antibiotic resistance, public health, virulence
