
      Performance of neural network basecalling tools for Oxford Nanopore sequencing

      research-article
      Genome Biology
      BioMed Central
      Oxford Nanopore, Basecalling, Long-read sequencing


          Abstract

          Background

          Basecalling, the computational process of translating raw electrical signal to nucleotide sequence, is of critical importance to the sequencing platforms produced by Oxford Nanopore Technologies (ONT). Here, we examine the performance of different basecalling tools, looking at accuracy at the level of bases within individual reads and at majority-rule consensus basecalls in an assembly. We also investigate some additional aspects of basecalling: training using a taxon-specific dataset, using a larger neural network model and improving consensus basecalls in an assembly by additional signal-level analysis with Nanopolish.
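          The two accuracy levels named above, per-read accuracy and majority-rule consensus accuracy, can be illustrated with a toy sketch. This is not the authors' pipeline: it assumes the reads are already aligned to equal length with '-' marking gaps, and the function names `majority_consensus` and `identity` and the example sequences are hypothetical. Real evaluations would compute identity from a proper alignment rather than position-by-position comparison.

```python
from collections import Counter


def majority_consensus(aligned_reads):
    """Majority-rule consensus over equal-length, pre-aligned reads.

    Assumption: '-' marks a gap. Columns where a gap wins the vote are
    dropped from the consensus. Ties go to the symbol seen first in the
    column, which is how Counter.most_common orders equal counts.
    """
    if not aligned_reads:
        raise ValueError("need at least one read")
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("reads must be aligned to the same length")
    consensus = []
    for column in zip(*aligned_reads):
        base, _count = Counter(column).most_common(1)[0]
        if base != "-":
            consensus.append(base)
    return "".join(consensus)


def identity(seq, ref):
    """Fraction of positions where seq matches ref (equal lengths only)."""
    if len(seq) != len(ref) or not ref:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq, ref))
    return matches / len(ref)


# Hypothetical toy data: three of the four reads carry one error each.
reference = "ACGTACGT"
reads = ["ACGTACGT", "ACGAACGT", "ACGTACCT", "AGGTACGT"]

read_identities = [identity(read, reference) for read in reads]
consensus = majority_consensus(reads)
consensus_identity = identity(consensus, reference)

print("read identities:", read_identities)
print("consensus:", consensus, "identity:", consensus_identity)
```

          In this toy case each error occurs in only one read at a given position, so the vote removes it. Errors that recur at the same position in most reads, such as those the abstract attributes to methylation motifs, would survive a majority vote, which is why consensus accuracy is reported separately from read accuracy.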

          Results

          Training basecallers on taxon-specific data results in a significant boost in consensus accuracy, mostly due to the reduction of errors in methylation motifs. A larger neural network is able to improve both read and consensus accuracy, but at a cost to speed. Improving consensus sequences (‘polishing’) with Nanopolish somewhat negates the accuracy differences in basecallers, but pre-polish accuracy does have an effect on post-polish accuracy.

          Conclusions

          Basecalling accuracy has seen significant improvements over the last 2 years. The current version of ONT’s Guppy basecaller performs well overall, with good accuracy and fast performance. If higher accuracy is required, users should consider producing a custom model using a larger neural network and/or training data from the same species.

          Electronic supplementary material

          The online version of this article (10.1186/s13059-019-1727-y) contains supplementary material, which is available to authorized users.


                Author and article information

                Contributors
                rrwick@gmail.com
                louise.judd@monash.edu
                kathryn.holt@monash.edu
                Journal
                Genome Biology (Genome Biol)
                BioMed Central (London)
                ISSN: 1474-7596, 1474-760X
                Published: 24 June 2019
                Volume 20, Article 129
                Affiliations
                [1] ISNI 0000 0004 1936 7857, GRID grid.1002.3, Department of Infectious Diseases, Central Clinical School, Monash University, Melbourne, 3004 Australia
                [2] ISNI 0000 0004 0425 469X, GRID grid.8991.9, London School of Hygiene & Tropical Medicine, London, WC1E 7HT UK
                Author information
                http://orcid.org/0000-0001-8349-0778
                Article
                Publisher ID: 1727
                DOI: 10.1186/s13059-019-1727-y
                PMCID: PMC6591954
                PMID: 31234903
                b545d979-a28a-4f68-b8a6-53c7ba3c62ef
                © The Author(s) 2019

                Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

                History
                Received: 5 March 2019
                Accepted: 30 May 2019
                Funding
                Funded by: FundRef http://dx.doi.org/10.13039/100000865, Bill and Melinda Gates Foundation;
                Award ID: OPP1175797
                Funded by: FundRef http://dx.doi.org/10.13039/100008717, Sylvia and Charles Viertel Charitable Foundation;
                Funded by: Australian Government Research Training Program Scholarship
                Categories
                Research

                Genetics
                Oxford Nanopore, basecalling, long-read sequencing
