      Alevin efficiently estimates accurate gene abundances from dscRNA-seq data

      research-article

          Abstract

          We introduce alevin, a fast end-to-end pipeline to process droplet-based single-cell RNA sequencing (dscRNA-seq) data, performing cell barcode detection, read mapping, unique molecular identifier (UMI) deduplication, gene count estimation, and cell barcode whitelisting. Alevin’s approach to UMI deduplication considers transcript-level constraints on the molecules from which UMIs may have arisen and accounts for both gene-unique reads and reads that multimap between genes. This addresses the inherent bias in existing tools that discard gene-ambiguous reads, and it improves the accuracy of gene abundance estimates. Alevin is considerably faster than existing gene quantification approaches, typically by a factor of eight, while also using less memory.
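          To make the bias mentioned in the abstract concrete, the following is a minimal sketch of the naive baseline it contrasts against: counting distinct UMIs per (cell barcode, gene) pair and discarding every read that maps to more than one gene. This is not alevin's algorithm, which instead resolves such reads using transcript-level constraints. The function name `naive_umi_counts` and the input layout (tuples of cell barcode, UMI, and candidate gene IDs) are assumptions made for illustration only.

```python
from collections import defaultdict


def naive_umi_counts(reads):
    """Count distinct UMIs per (cell barcode, gene), dropping multi-gene reads.

    `reads` is an iterable of (cell_barcode, umi, genes) tuples, where `genes`
    is a tuple of the gene IDs the read is compatible with (hypothetical layout).
    Returns (counts, discarded): a dict of UMI counts and the number of
    gene-ambiguous reads that were thrown away.
    """
    umis_seen = defaultdict(set)
    discarded = 0
    for cell_barcode, umi, genes in reads:
        if len(genes) == 1:
            # Gene-unique read: identical UMIs collapse to one molecule.
            umis_seen[(cell_barcode, genes[0])].add(umi)
        else:
            # Gene-ambiguous read: the naive baseline simply discards it,
            # which is the source of bias the abstract refers to.
            discarded += 1
    counts = {key: len(umis) for key, umis in umis_seen.items()}
    return counts, discarded


example_reads = [
    ("AAAC", "TTG", ("g1",)),
    ("AAAC", "TTG", ("g1",)),       # PCR duplicate of the read above
    ("AAAC", "CCA", ("g1",)),
    ("AAAC", "GGT", ("g1", "g2")),  # multimaps between two genes
    ("AAAC", "ACG", ("g2",)),
]
counts, discarded = naive_umi_counts(example_reads)
```

          In this toy input, one of the five reads is dropped because it is compatible with both genes, so whichever gene it truly came from is undercounted. Genes that share sequence with others lose such reads systematically under this baseline.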

          Electronic supplementary material

          The online version of this article (10.1186/s13059-019-1670-y) contains supplementary material, which is available to authorized users.

          Most cited references (39)

          STAR: ultrafast universal RNA-seq aligner.

          Accurate alignment of high-throughput RNA-seq data is a challenging and yet unsolved problem because of the non-contiguous transcript structure, relatively short read lengths and constantly increasing throughput of the sequencing technologies. Currently available RNA-seq aligners suffer from high mapping error rates, low mapping speed, read length limitation and mapping biases. To align our large (>80 billion reads) ENCODE Transcriptome RNA-seq dataset, we developed the Spliced Transcripts Alignment to a Reference (STAR) software based on a previously undescribed RNA-seq alignment algorithm that uses sequential maximum mappable seed search in uncompressed suffix arrays followed by seed clustering and stitching procedure. STAR outperforms other aligners by a factor of >50 in mapping speed, aligning to the human genome 550 million 2 × 76 bp paired-end reads per hour on a modest 12-core server, while at the same time improving alignment sensitivity and precision. In addition to unbiased de novo detection of canonical junctions, STAR can discover non-canonical splices and chimeric (fusion) transcripts, and is also capable of mapping full-length RNA sequences. Using Roche 454 sequencing of reverse transcription polymerase chain reaction amplicons, we experimentally validated 1960 novel intergenic splice junctions with an 80-90% success rate, corroborating the high precision of the STAR mapping strategy. STAR is implemented as a standalone C++ code. STAR is free open source software distributed under GPLv3 license and can be downloaded from http://code.google.com/p/rna-star/.
            Fast gapped-read alignment with Bowtie 2.

            As the rate of sequencing increases, greater throughput is demanded from read aligners. The full-text minute index is often used to make alignment very fast and memory-efficient, but the approach is ill-suited to finding longer, gapped alignments. Bowtie 2 combines the strengths of the full-text minute index with the flexibility and speed of hardware-accelerated dynamic programming algorithms to achieve a combination of high speed, sensitivity and accuracy.
              featureCounts: an efficient general purpose program for assigning sequence reads to genomic features.

              Next-generation sequencing technologies generate millions of short sequence reads, which are usually aligned to a reference genome. In many applications, the key information required for downstream analysis is the number of reads mapping to each genomic feature, for example to each exon or each gene. The process of counting reads is called read summarization. Read summarization is required for a great variety of genomic analyses but has so far received relatively little attention in the literature. We present featureCounts, a read summarization program suitable for counting reads generated from either RNA or genomic DNA sequencing experiments. featureCounts implements highly efficient chromosome hashing and feature blocking techniques. It is considerably faster than existing methods (by an order of magnitude for gene-level summarization) and requires far less computer memory. It works with either single or paired-end reads and provides a wide range of options appropriate for different sequencing applications. featureCounts is available under GNU General Public License as part of the Subread (http://subread.sourceforge.net) or Rsubread (http://www.bioconductor.org) software packages.
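          Read summarization as described above can be sketched in a few lines. The example below is a simplified illustration, not featureCounts itself: it groups features by chromosome in a dictionary (a loose stand-in for chromosome hashing) and does not attempt feature blocking. The function name `summarize_reads`, the tuple layouts, and the half-open coordinate convention are all assumptions made for this sketch.

```python
from collections import defaultdict


def summarize_reads(reads, features):
    """Count reads per feature using half-open [start, end) intervals.

    `reads` is a list of (chrom, start, end); `features` is a list of
    (name, chrom, start, end). A read is assigned only when it overlaps
    exactly one feature; all other reads are tallied as unassigned.
    """
    # Group features by chromosome so each read is compared only against
    # features on its own chromosome.
    by_chrom = defaultdict(list)
    for name, chrom, start, end in features:
        by_chrom[chrom].append((start, end, name))

    counts = {name: 0 for name, _, _, _ in features}
    unassigned = 0
    for chrom, r_start, r_end in reads:
        hits = {
            name
            for f_start, f_end, name in by_chrom.get(chrom, [])
            if r_start < f_end and f_start < r_end  # interval overlap test
        }
        if len(hits) == 1:
            counts[hits.pop()] += 1
        else:
            # No overlapping feature, or ambiguous between several features.
            unassigned += 1
    return counts, unassigned


features = [
    ("geneA", "chr1", 100, 200),
    ("geneB", "chr1", 180, 300),
    ("geneC", "chr2", 0, 50),
]
reads = [
    ("chr1", 110, 150),  # overlaps geneA only
    ("chr1", 190, 195),  # overlaps geneA and geneB, so it is ambiguous
    ("chr1", 250, 280),  # overlaps geneB only
    ("chr2", 10, 20),    # overlaps geneC only
    ("chr3", 0, 10),     # chromosome without features
]
feature_counts, n_unassigned = summarize_reads(reads, features)
```

          The linear scan over a chromosome's features keeps the sketch short. A production tool would use an indexed structure to avoid comparing every read against every feature.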
                Author and article information

                Contributors
                asrivastava@cs.stonybrook.edu
                lmalik@cs.stonybrook.edu
                tss38@cam.ac.uk
                i.sudbery@sheffield.ac.uk
                rob.patro@cs.stonybrook.edu
                Journal
                Genome Biology (Genome Biol)
                Publisher: BioMed Central (London)
                ISSN: 1474-7596, 1474-760X
                Published: 27 March 2019
                Volume: 20
                Article number: 65
                Affiliations
                [1] Department of Computer Science, Stony Brook University, Stony Brook, USA (ISNI 0000 0001 2216 9681; GRID grid.36425.36)
                [2] Cambridge Centre for Proteomics, Department of Biochemistry, University of Cambridge, Cambridge, CB2 1GA, UK (ISNI 0000 0001 2188 5934; GRID grid.5335.0)
                [3] Sheffield Institute for Nucleic Acids, Department of Molecular Biology and Biotechnology, The University of Sheffield, Sheffield, S10 2TN, UK (ISNI 0000 0004 1936 9262; GRID grid.11835.3e)
                Author information
                http://orcid.org/0000-0001-8463-1675
                Article
                1670
                10.1186/s13059-019-1670-y
                6437997
                30917859
                62a1248e-7774-4ef8-9332-7d8dc61c04bf
                © The Author(s) 2019

                Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

                History
                Received: 28 November 2018
                Accepted: 5 March 2019
                Funding
                Funded by: National Science Foundation (US)
                Award ID: BIO-1564917
                Funded by: National Science Foundation (US)
                Award ID: CCF-1750472
                Funded by: National Science Foundation (US)
                Award ID: CNS-1763680
                Funded by: National Institutes of Health (US)
                Award ID: R01HG009937
                Funded by: Silicon Valley Community Foundation (US)
                Award ID: 2018-182752
                Categories
                Method
                Genetics
                single-cell RNA-seq, UMI deduplication, quantification, cellular barcode
