      Is Open Access

      Sequencing DNA with nanopores: Troubles and biases

      research-article
      PLoS ONE
      Public Library of Science


          Abstract

          Oxford Nanopore Technologies’ (ONT) long-read sequencers offer access to longer DNA fragments than previous sequencer generations, at the cost of a higher error rate. While many papers have studied read correction methods, few have addressed the detailed characterization of observed errors, a task complicated by frequent changes in chemistry and software in ONT technology. The MinION sequencer is now more stable, and this paper proposes an up-to-date view of its error landscape, using the most mature flowcell and basecaller. We studied Nanopore sequencing error biases on both bacterial and human DNA reads. We found that, although Nanopore sequencing is not expected to suffer from GC bias, GC content is a crucial parameter with respect to errors. In particular, low-GC reads have fewer errors than high-GC reads (about 6% and 8% respectively). The error profile for homopolymeric regions or regions with short repeats, the source of about half of all sequencing errors, also depends on the GC rate and mainly shows deletions, although there are some reads with long insertions. Another interesting finding is that the quality measure, although over-estimated, offers valuable information to predict the error rate as well as the abundance of reads. We supplemented this study with an analysis of a rapeseed RNA read set and found a higher level of errors, with a higher proportion of deletions, in these data. Finally, we have implemented an open source pipeline for long-term monitoring of the error profile, which enables users to easily compute the various analyses presented in this work, including for future developments of the sequencing device. Overall, we hope this work will provide a basis for the design of better error-correction methods.
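          The two sequence features the abstract ties to error rates, GC content and homopolymeric regions, can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the example read, and the minimum run length of 3 are assumptions chosen only for illustration.

```python
from itertools import groupby


def gc_fraction(seq: str) -> float:
    """Fraction of G or C bases among the A/C/G/T bases of a sequence."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / counted


def homopolymer_runs(seq: str, min_len: int = 3):
    """List (start, base, length) for each run of a single repeated base.

    min_len is an illustrative threshold (assumption), not a value
    taken from the paper.
    """
    runs = []
    pos = 0
    for base, group in groupby(seq.upper()):
        length = sum(1 for _ in group)
        if length >= min_len:
            runs.append((pos, base, length))
        pos += length
    return runs


# Hypothetical read used only to exercise the two helpers.
read = "ATGCCCCGTAAAAATTGCGC"
print(gc_fraction(read))       # 0.5
print(homopolymer_runs(read))  # [(3, 'C', 4), (9, 'A', 5)]
```

          Binning reads by `gc_fraction` and comparing per-bin error rates against a reference alignment would be one way to look for the kind of GC dependence reported above; the alignment step itself is outside the scope of this sketch.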


                Author and article information

                Contributors
                Role: Conceptualization, Formal analysis, Investigation, Methodology, Software, Validation, Visualization, Writing – original draft, Writing – review & editing
                Role: Conceptualization, Investigation, Methodology, Project administration, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing
                Role: Editor
                Journal
                PLoS ONE
                Public Library of Science (San Francisco, CA, USA)
                ISSN: 1932-6203
                1 October 2021; 16(10): e0257521
                Affiliations
                [001] Inria, CNRS, IRISA, Univ Rennes, Rennes, France
                Institute of Parasitology and Biomedicine, SPAIN
                Author notes

                Competing Interests: The authors have declared that no competing interests exist.

                Author information
                https://orcid.org/0000-0002-7528-308X
                Article
                Manuscript: PONE-D-21-06109
                DOI: 10.1371/journal.pone.0257521
                PMCID: PMC8486125
                PMID: 34597327
                08ff9a54-3200-43d9-bbe7-50b2ef45ec90
                © 2021 Delahaye, Nicolas

                This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

                History
                Received: 23 February 2021
                Accepted: 6 September 2021
                Page count
                Figures: 14, Tables: 4, Pages: 29
                Funding
                The author(s) received no specific funding for this work.
                Categories
                Research Article
                Research and Analysis Methods > Database and Informatics Methods > Bioinformatics > Sequence Analysis > Sequence Alignment
                Physical Sciences > Materials Science > Material Properties > Nanopores
                Biology and Life Sciences > Microbiology > Bacteriology > Bacterial Genetics > Bacterial Genomics
                Biology and Life Sciences > Genetics > Microbial Genetics > Bacterial Genetics > Bacterial Genomics
                Biology and Life Sciences > Genetics > Genomics > Microbial Genomics > Bacterial Genomics
                Biology and Life Sciences > Microbiology > Microbial Genomics > Bacterial Genomics
                Physical Sciences > Chemistry > Chemical Reactions > Methylation
                Biology and Life Sciences > Genetics > Genomics
                Physical Sciences > Chemistry > Polymer Chemistry > Macromolecules > Polymers
                Physical Sciences > Materials Science > Materials > Polymers
                Physical Sciences > Chemistry > Polymer Chemistry > Polymers
                Research and Analysis Methods > Computational Techniques > Split-Decomposition Method > Multiple Alignment Calculation
                Biology and Life Sciences > Molecular Biology > Molecular Biology Techniques > Sequencing Techniques > RNA Sequencing
                Research and Analysis Methods > Molecular Biology Techniques > Sequencing Techniques > RNA Sequencing
                Custom metadata
                All relevant data are within the manuscript and its Supporting information files.
