      A Systematic Evaluation of High-Throughput Sequencing Approaches to Identify Low-Frequency Single Nucleotide Variants in Viral Populations

      research-article


          Abstract

          High-throughput sequencing technologies, such as those provided by Illumina, are an efficient way to understand sequence variation within viral populations. However, challenges exist in distinguishing process-introduced error from biological variance, which significantly impacts our ability to identify sub-consensus single-nucleotide variants (SNVs). Here we have taken a systematic approach to evaluate laboratory and bioinformatic pipelines to accurately identify low-frequency SNVs in viral populations. Artificial DNA and RNA “populations” were created by introducing known SNVs at predetermined frequencies into template nucleic acid before being sequenced on an Illumina MiSeq platform. These were used to assess the effects of the abundance and type of starting input material, technical replicates, read length and quality, short-read aligner, and percentage frequency thresholds on the ability to accurately call variants. Analyses revealed that the abundance and type of input nucleic acid had the greatest impact on the accuracy of SNV calling, as measured by a micro-averaged Matthews correlation coefficient score, with DNA and high RNA inputs (10^7 copies) allowing for variants to be called at a 0.2% frequency. Reduced input RNA (10^5 copies) required more technical replicates to maintain accuracy, while low RNA inputs (10^3 copies) suffered from consensus-level errors. Base errors at specific motifs were also identified in all technical replicates; these can be excluded to further increase SNV calling accuracy. These findings indicate that samples with low RNA inputs should be excluded from SNV calling and reinforce the importance of optimising the technical and bioinformatics steps in pipelines that are used to accurately identify sequence variants.
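The accuracy measure named in the abstract can be illustrated with a small sketch. The abstract does not describe how the authors computed their score, so everything below is an assumption: the function names (`call_variants`, `confusion_counts`, `micro_averaged_mcc`), the toy frequencies, and the choice to micro-average by pooling true/false positive and negative counts across replicates before applying the standard Matthews correlation coefficient formula. Only the 0.2% threshold is taken from the abstract.

```python
import math
from typing import Dict, Iterable, Set, Tuple

# (true positives, false positives, true negatives, false negatives)
Counts = Tuple[int, int, int, int]


def call_variants(freqs: Dict[int, float], threshold: float) -> Set[int]:
    """Return the positions whose observed variant frequency meets the threshold."""
    return {pos for pos, freq in freqs.items() if freq >= threshold}


def confusion_counts(called: Set[int], truth: Set[int], n_positions: int) -> Counts:
    """Compare called positions with the known (spiked-in) SNV positions."""
    tp = len(called & truth)
    fp = len(called - truth)
    fn = len(truth - called)
    tn = n_positions - tp - fp - fn
    return tp, fp, tn, fn


def micro_averaged_mcc(per_replicate: Iterable[Counts]) -> float:
    """Pool counts over all replicates, then compute a single MCC."""
    tp = fp = tn = fn = 0
    for a, b, c, d in per_replicate:
        tp, fp, tn, fn = tp + a, fp + b, tn + c, fn + d
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when the coefficient is undefined.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Hypothetical toy data: a 100-position template with known SNVs at 10 and 50.
truth = {10, 50}
replicates = [
    {10: 0.005, 50: 0.003, 70: 0.0025},  # position 70 is a false positive
    {10: 0.004, 50: 0.001},              # position 50 falls below the threshold
]
threshold = 0.002  # the 0.2% frequency mentioned in the abstract

counts = [
    confusion_counts(call_variants(freqs, threshold), truth, 100)
    for freqs in replicates
]
score = micro_averaged_mcc(counts)
print(round(score, 3))  # 0.745
```

Pooling the counts before computing the coefficient weights every position equally across replicates; averaging one MCC per replicate (a macro average) is a different measure and would generally give a different value.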


                Author and article information

                Journal
                Viruses
                Publisher: MDPI
                ISSN: 1999-4915
                Published: 20 October 2020
                Volume 12, Issue 10, Article 1187
                Affiliations
                [1] The Pirbright Institute, Woking, Surrey GU24 0NF, UK; dking1@dstl.gov.uk (D.J.K.); graham.freimanis@pirbright.ac.uk (G.F.); Lidia.Lasecka-Dykes@pirbright.ac.uk (L.L.-D.); amin.asfor@pirbright.ac.uk (A.A.); ryan.waters@pirbright.ac.uk (R.W.); donald.king@pirbright.ac.uk (D.P.K.)
                [2] Department of Microbial and Cellular Sciences, Faculty of Health and Medical Sciences, School of Biosciences and Medicine, University of Surrey, Guildford GU2 7XH, UK
                [3] Department of Pathology and Infectious Diseases, Faculty of Health and Medical Sciences, School of Veterinary Medicine, University of Surrey, Guildford GU2 7XH, UK
                [4] Biomathematics and Statistics Scotland, Edinburgh, Midlothian EH9 3FD, UK; pribeca@bioss.ac.uk
                Author notes
                [*] Correspondence: e.laing@surrey.ac.uk
                Author information
                https://orcid.org/0000-0003-0093-6549
                https://orcid.org/0000-0001-5599-3933
                https://orcid.org/0000-0002-6959-2708
                Article
                Publisher ID: viruses-12-01187
                DOI: 10.3390/v12101187
                PMCID: PMC7594041
                PMID: 33092085
                © 2020 by the authors.

                Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

                History
                Received: 28 August 2020
                Accepted: 12 October 2020
                Categories
                Article

                Microbiology & Virology
                Keywords: high-throughput sequencing, viral populations, sub-consensus variants, sequencing error
