
      Elimination of PCR duplicates in RNA-seq and small RNA-seq using unique molecular identifiers

      research-article


          Abstract

          Background

          RNA-seq and small RNA-seq are powerful, quantitative tools to study gene regulation and function. Common high-throughput sequencing methods rely on polymerase chain reaction (PCR) to expand the starting material, but not every molecule amplifies equally, causing some to be overrepresented. Unique molecular identifiers (UMIs) can be used to distinguish undesirable PCR duplicates, which derive from a single molecule, from identical but biologically meaningful reads that derive from different molecules.

          Results

          We have incorporated UMIs into RNA-seq and small RNA-seq protocols and developed tools to analyze the resulting data. Our UMIs contain stretches of random nucleotides whose lengths sufficiently capture diverse molecule species in both RNA-seq and small RNA-seq libraries generated from mouse testis. Our approach yields high-quality data while allowing unique tagging of all molecules in high-depth libraries.
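To illustrate why the length of the random stretch matters, the sketch below estimates how many distinct UMIs are expected when a given number of otherwise identical molecules each receive a random UMI. It is a back-of-the-envelope model, not the authors' analysis: it assumes every UMI is drawn uniformly and independently, and the function names and the lengths used in the example are hypothetical.

```python
def umi_space(length: int) -> int:
    """Number of distinct UMIs for a random stretch of `length` nucleotides."""
    return 4 ** length


def expected_distinct_umis(molecules: int, length: int) -> float:
    """Expected number of distinct UMIs observed when `molecules` identical
    molecules each draw one UMI uniformly at random (an assumption; real
    UMI pools may show nucleotide bias)."""
    space = umi_space(length)
    # Each UMI is missed by all molecules with probability (1 - 1/space)^molecules.
    return space * (1.0 - (1.0 - 1.0 / space) ** molecules)


if __name__ == "__main__":
    # Hypothetical example: 1000 identical molecules, 5-nt versus 10-nt UMIs.
    for length in (5, 10):
        distinct = expected_distinct_umis(1000, length)
        print(f"{length}-nt UMI: {umi_space(length)} possible, "
              f"about {distinct:.1f} distinct among 1000 molecules")
```

Under this model, the closer the expected number of distinct UMIs is to the number of molecules, the fewer distinct molecules end up sharing a tag by chance.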

          Conclusions

          Using simulated and real datasets, we demonstrate that our methods increase the reproducibility of RNA-seq and small RNA-seq data. Notably, we find that the amount of starting material and sequencing depth, but not the number of PCR cycles, determine PCR duplicate frequency. Finally, we show that computational removal of PCR duplicates based only on their mapping coordinates introduces substantial bias into data analysis.
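The difference between coordinate-only and UMI-aware duplicate removal can be sketched on toy data. The example below is illustrative only and is not the authors' tool: the `Read` record, its fields, and the reads themselves are made up, and the sketch treats UMIs as error-free, which real pipelines cannot assume.

```python
from collections import namedtuple

# Hypothetical minimal read record: mapping position plus the attached UMI.
Read = namedtuple("Read", ["chrom", "start", "strand", "umi"])


def count_by_coordinate(reads):
    """Count molecules by collapsing reads that share mapping coordinates."""
    return len({(r.chrom, r.start, r.strand) for r in reads})


def count_by_coordinate_and_umi(reads):
    """Count molecules by collapsing reads only if coordinates AND UMI match."""
    return len({(r.chrom, r.start, r.strand, r.umi) for r in reads})


reads = [
    Read("chr1", 100, "+", "ACGTA"),
    Read("chr1", 100, "+", "ACGTA"),  # same position, same UMI: PCR duplicate
    Read("chr1", 100, "+", "TTGCA"),  # same position, different UMI: distinct molecule
    Read("chr1", 250, "-", "ACGTA"),  # different position
]

if __name__ == "__main__":
    print("coordinate only:", count_by_coordinate(reads))
    print("coordinate + UMI:", count_by_coordinate_and_umi(reads))
```

In this toy case the coordinate-only count drops a genuine second molecule at position 100, which is the kind of undercounting the abstract attributes to coordinate-based removal.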

          Electronic supplementary material

          The online version of this article (10.1186/s12864-018-4933-1) contains supplementary material, which is available to authorized users.


                Author and article information

                Contributors
                yfu@bu.edu
                Pei-Hsuan.Wu@umassmed.edu
                Timothy.Beane@umassmed.edu
                phillip.zamore@umassmed.edu
                zhiping.weng@umassmed.edu
                Journal
                BMC Genomics
                BioMed Central (London)
                ISSN: 1471-2164
                Published: 13 July 2018
                Volume: 19
                Article number: 531
                Affiliations
                [1] ISNI 0000 0004 1936 7558, GRID grid.189504.1, Bioinformatics Program, Boston University, 44 Cummington Mall, Boston, MA 02215, USA
                [2] ISNI 0000 0001 0742 0364, GRID grid.168645.8, Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, 368 Plantation Street, Worcester, MA 01605, USA
                [3] ISNI 0000 0001 0742 0364, GRID grid.168645.8, RNA Therapeutics Institute and Howard Hughes Medical Institute, University of Massachusetts Medical School, 368 Plantation Street, Worcester, MA 01605, USA
                [4] ISNI 0000 0001 0742 0364, GRID grid.168645.8, Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, 368 Plantation Street, Worcester, MA 01605, USA
                Author information
                http://orcid.org/0000-0002-4505-9618
                Article
                4933
                DOI: 10.1186/s12864-018-4933-1
                PMC ID: 6044086
                PubMed ID: 30001700
                b0077ea4-7ff2-414b-9d55-1066552c60bd
                © The Author(s) 2018

                Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

                History
                Received: 9 March 2018
                Accepted: 8 July 2018
                Funding
                Funded by: Howard Hughes Medical Institute (FundRef http://dx.doi.org/10.13039/100000011)
                Funded by: National Institute of General Medical Sciences (FundRef http://dx.doi.org/10.13039/100000057)
                Award ID: R37GM062862
                Funded by: Eunice Kennedy Shriver National Institute of Child Health and Human Development (FundRef http://dx.doi.org/10.13039/100009633)
                Award ID: HD078253
                Categories
                Methodology Article

                Genetics
                rna-seq,small rna-seq,unique molecular identifier,umi,pcr duplicates,pcr cycle,starting material,sequencing depth,transcriptome,ribognome
