
      Using QC-Blind for quality control and contamination screening of bacteria DNA sequencing data without reference genome

      Preprint
      bioRxiv

          ABSTRACT

          Quality control in next-generation sequencing (NGS) has become increasingly important as the technique has become widely used. Tools have been developed to filter out possible contaminants from the sequencing data of species with known reference genomes. Unfortunately, these tools require reference genomes for all the species involved, including the contaminants. This rules out many real-life samples for which the complete genome of the target species is unknown and which are contaminated with unknown microbial species.

          In this work we propose QC-Blind, a novel quality control pipeline that removes contaminants without using any reference genome. The pipeline requires only limited information, taken from the marker genes of the target species. It consists of four stages: unsupervised read assembly, contig binning, read clustering, and marker gene assignment.
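
          To make the last stages concrete, the sketch below shows one way reads could be kept or discarded once contigs have been binned and reads clustered. It is a hypothetical illustration, not QC-Blind's implementation: the function name `select_target_reads`, the input dictionaries, and the majority-vote rule that assigns each read cluster to a bin are all assumptions made for this example.

```python
from collections import Counter


def select_target_reads(read_clusters, read_to_contig, contig_to_bin, target_marker_contigs):
    """Hypothetical final filtering step (not QC-Blind's actual code).

    read_clusters: dict cluster_id -> list of read IDs
    read_to_contig: dict read_id -> contig_id (unassembled reads are absent)
    contig_to_bin: dict contig_id -> bin_id
    target_marker_contigs: set of contig IDs carrying a target-species marker gene
    """
    # Assumption: a bin counts as "target" if any of its contigs carries a target marker gene.
    target_bins = {contig_to_bin[c] for c in target_marker_contigs if c in contig_to_bin}

    kept = set()
    for reads in read_clusters.values():
        # Majority vote: which bin do the assembled reads in this cluster point to?
        votes = Counter(
            contig_to_bin[read_to_contig[r]]
            for r in reads
            if r in read_to_contig and read_to_contig[r] in contig_to_bin
        )
        if not votes:
            continue  # no assembly evidence for this cluster; drop it conservatively
        majority_bin, _ = votes.most_common(1)[0]
        if majority_bin in target_bins:
            # Keep the whole cluster, including reads that never assembled into a contig.
            kept.update(reads)
    return kept


if __name__ == "__main__":
    clusters = {"c1": ["r1", "r2", "r3"], "c2": ["r4", "r5"]}
    read_to_contig = {"r1": "ctg1", "r2": "ctg1", "r4": "ctg2", "r5": "ctg2"}
    contig_to_bin = {"ctg1": "binA", "ctg2": "binB"}
    markers = {"ctg1"}  # ctg1 carries a target marker gene in this toy example
    print(sorted(select_target_reads(clusters, read_to_contig, contig_to_bin, markers)))
```

          In this toy input, read r3 did not assemble but is still kept because its cluster votes for the marker-bearing bin; this is one plausible reason a pipeline might cluster reads in addition to binning contigs.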

          When evaluated on in silico, ab initio and in vivo datasets, QC-Blind removed unknown contaminants with high specificity and accuracy while preserving most of the genomic information of the target bacterial species. QC-Blind could therefore be useful in situations where limited information is available about both the target and the contaminating species.
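
          The evaluation claims can be read at the level of individual reads. The sketch below computes sensitivity, specificity and accuracy for a filter that keeps some reads and discards the rest, assuming that target reads are the positive class and that ground-truth labels are available (as they would be for in silico data). The paper may define these metrics differently, for example in terms of genome coverage; `read_level_metrics` is a hypothetical helper.

```python
def read_level_metrics(truth, kept):
    """Hypothetical read-level evaluation of a contaminant filter.

    truth: dict read_id -> True if the read truly comes from the target species
    kept: set of read IDs retained by the filter
    Convention assumed here: a kept target read is a true positive.
    """
    tp = sum(1 for r, is_target in truth.items() if is_target and r in kept)
    fn = sum(1 for r, is_target in truth.items() if is_target and r not in kept)
    tn = sum(1 for r, is_target in truth.items() if not is_target and r not in kept)
    fp = sum(1 for r, is_target in truth.items() if not is_target and r in kept)
    total = tp + fn + tn + fp
    nan = float("nan")
    return {
        # Fraction of target reads preserved (a proxy for retained genomic information).
        "sensitivity": tp / (tp + fn) if tp + fn else nan,
        # Fraction of contaminant reads correctly removed.
        "specificity": tn / (tn + fp) if tn + fp else nan,
        # Fraction of all reads classified correctly.
        "accuracy": (tp + tn) / total if total else nan,
    }


if __name__ == "__main__":
    truth = {"r1": True, "r2": True, "r3": True, "r4": False, "r5": False}
    print(read_level_metrics(truth, kept={"r1", "r2", "r4"}))
```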

          IMPORTANCE

          At present, many sequencing projects are still performed on potentially contaminated samples, which calls their accuracy into question. However, current reference-based quality control methods are limited because they require the genome of either the target species or the contaminants. In this work we propose QC-Blind, a novel quality control pipeline that removes contaminants without using any reference genome. When evaluated on in silico, ab initio and in vivo datasets, QC-Blind removed unknown contaminants with high specificity and accuracy while preserving most of the genomic information of the target bacterial species. QC-Blind is therefore suitable for real-life samples where limited information is available about both the target and the contaminating species.

          Author and article information

          Journal: bioRxiv
          October 09 2018
          DOI: 10.1101/438655
          © 2018

          Quantitative & Systems Biology, Biophysics