
      dupRadar: a Bioconductor package for the assessment of PCR artifacts in RNA-Seq data

      research-article


          Abstract

          Background

          PCR clonal artefacts originating from NGS library preparation can affect both genomic and RNA-Seq applications when protocols are pushed to their limits. In RNA-Seq, however, the artifactual reads are not easy to tell apart from normal read duplication due to natural over-sequencing of highly expressed genes. Especially when working with little input material or single cells, assessing the fraction of duplicate reads is an important quality control step for NGS data sets. Up to now, there have only been tools that calculate global duplication rates without taking into account the effect of gene expression levels, which leaves them of limited use for RNA-Seq data.

          Results

          Here we present the tool dupRadar, which provides an easy means to distinguish the fraction of reads originating in natural duplication due to high expression from the fraction induced by artefacts. dupRadar assesses the fraction of duplicate reads per gene dependent on the expression level. Apart from the Bioconductor package dupRadar we provide shell scripts for easy integration into processing pipelines.
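          The idea of relating duplication to expression can be illustrated with a minimal Python sketch. It is not the dupRadar API (dupRadar is an R/Bioconductor package); the `GeneCounts` layout, the function names, and both thresholds are hypothetical and chosen only for illustration. It assumes per-gene read counts are already available both with and without duplicate-marked reads.

```python
from dataclasses import dataclass


@dataclass
class GeneCounts:
    """Per-gene read counts (hypothetical input layout, not dupRadar's)."""
    gene_id: str
    length_bp: int    # gene model length in base pairs
    all_reads: int    # reads assigned to the gene, duplicates included
    dedup_reads: int  # reads left after dropping duplicate-marked reads


def duplication_rate(g: GeneCounts) -> float:
    """Fraction of the gene's reads that are marked as duplicates."""
    if g.all_reads == 0:
        return 0.0
    return 1.0 - g.dedup_reads / g.all_reads


def reads_per_kilobase(g: GeneCounts) -> float:
    """Length-normalised expression proxy: reads per kilobase of gene model."""
    return g.all_reads / (g.length_bp / 1000.0)


def flag_suspicious(genes, low_expr_rpk=100.0, max_dup_rate=0.5):
    """Return IDs of lowly expressed genes with a high duplication rate.

    High duplication is expected for highly expressed genes, so only genes
    below `low_expr_rpk` are checked. Both thresholds are illustrative
    assumptions, not values taken from the article.
    """
    return [
        g.gene_id
        for g in genes
        if reads_per_kilobase(g) < low_expr_rpk
        and duplication_rate(g) > max_dup_rate
    ]


genes = [
    GeneCounts("geneA", 2000, 40, 38),       # low expression, few duplicates
    GeneCounts("geneB", 1000, 50000, 5000),  # high expression, many duplicates
    GeneCounts("geneC", 4000, 80, 20),       # low expression, many duplicates
]
suspicious = flag_suspicious(genes)
```

          In this toy input, only the lowly expressed gene with many duplicates is flagged, while the highly expressed gene is not, because its duplication is plausibly explained by over-sequencing.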

          Conclusions

          The Bioconductor package dupRadar offers straightforward methods to assess RNA-Seq data sets for quality issues with PCR duplicates. It is aimed at simple integration into standard analysis pipelines as a default QC metric that is especially useful for low-input and single-cell RNA-Seq data sets.

          Electronic supplementary material

          The online version of this article (doi:10.1186/s12859-016-1276-2) contains supplementary material, which is available to authorized users.


                Author and article information

                Contributors
                sergisayolspuig@imb-mainz.de
                holger.klein@boehringer-ingelheim.com
                Journal
                BMC Bioinformatics
                Publisher: BioMed Central (London)
                ISSN: 1471-2105
                Published: 21 October 2016
                Volume: 17
                Article number: 428
                Affiliations
                [1] Bioinformatics Core Facility, Institute of Molecular Biology, Ackermannweg 4, 55128 Mainz, Germany
                [2] Technische Hochschule Bingen, Berlinstraße 109, 55411 Bingen am Rhein, Germany
                [3] Target Discovery Research, Boehringer Ingelheim Pharma GmbH & Co KG, Birkendorferstraße 67, 88397 Biberach an der Riß, Germany
                Article
                Publisher ID: 1276
                DOI: 10.1186/s12859-016-1276-2
                PMCID: PMC5073875
                PMID: 27769170
                4b5094c1-f58d-4747-a6d2-56795e2f517c
                © The Author(s). 2016

                Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

                History
                Received: 16 January 2016
                Accepted: 21 September 2016
                Funding
                Funded by: Boehringer Ingelheim (FundRef http://dx.doi.org/10.13039/100008349)
                Categories
                Software

                Bioinformatics & Computational biology
                Keywords: RNA-Seq, PCR artefacts, duplication rate, single cell RNA-Seq, Bioconductor, quality control tool
