Alevin efficiently estimates accurate gene abundances from dscRNA-seq data

Abstract

We introduce alevin, a fast end-to-end pipeline for processing droplet-based single-cell RNA sequencing (dscRNA-seq) data. It performs cell barcode detection, read mapping, unique molecular identifier (UMI) deduplication, gene count estimation, and cell barcode whitelisting. Alevin's approach to UMI deduplication considers transcript-level constraints on the molecules from which UMIs may have arisen, and accounts for both gene-unique reads and reads that multimap between genes. This addresses the inherent bias of existing tools, which discard gene-ambiguous reads, and improves the accuracy of gene abundance estimates. Alevin is considerably faster than existing gene quantification approaches, typically by a factor of eight, while also using less memory.
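The abstract contrasts discarding gene-ambiguous reads with accounting for them. The Python sketch below illustrates only that contrast, on hypothetical molecules that are assumed to be already deduplicated; the record layout, function names, barcodes, and UMI labels are invented for illustration. The baseline counts a molecule only when it is compatible with a single gene. The second function keeps ambiguous molecules and apportions them with a generic expectation-maximization (EM) allocation, a common way to share multimapping evidence. It is a stand-in to show the effect, not alevin's implementation: it omits barcode detection and correction, mapping, and the transcript-level UMI constraints the abstract describes.

```python
from collections import defaultdict

# Hypothetical toy representation (not alevin's data model): one record per
# deduplicated molecule, as (cell barcode, UMI, frozenset of compatible genes).
Molecule = tuple[str, str, frozenset]


def unique_only_counts(molecules: list[Molecule]) -> dict[str, dict[str, float]]:
    """Baseline: count a molecule only if it is compatible with exactly one gene.

    Gene-ambiguous molecules are dropped, the behaviour the abstract
    attributes to existing tools.
    """
    counts = defaultdict(lambda: defaultdict(float))
    for cell, _umi, genes in molecules:
        if len(genes) == 1:
            (gene,) = genes
            counts[cell][gene] += 1.0
    return {cell: dict(per_gene) for cell, per_gene in counts.items()}


def em_counts(molecules: list[Molecule], max_iter: int = 200,
              tol: float = 1e-10) -> dict[str, dict[str, float]]:
    """Keep ambiguous molecules and split them across genes with a simple EM, per cell."""
    by_cell = defaultdict(list)
    for cell, _umi, genes in molecules:
        by_cell[cell].append(genes)

    result: dict[str, dict[str, float]] = {}
    for cell, gene_sets in by_cell.items():
        genes = sorted(set().union(*gene_sets))
        n = len(gene_sets)
        # Start from uniform relative abundances over the genes seen in this cell.
        abundance = {g: 1.0 / len(genes) for g in genes}
        for _ in range(max_iter):
            expected = dict.fromkeys(genes, 0.0)
            # E-step: each molecule contributes one unit, split in proportion
            # to the current abundances of the genes it is compatible with.
            for gs in gene_sets:
                total = sum(abundance[g] for g in gs)
                for g in gs:
                    expected[g] += abundance[g] / total
            # M-step: new relative abundances are the normalised expected counts.
            updated = {g: expected[g] / n for g in genes}
            change = max(abs(updated[g] - abundance[g]) for g in genes)
            abundance = updated
            if change < tol:
                break
        # Report expected molecule counts; they sum to the number of molecules.
        result[cell] = {g: abundance[g] * n for g in genes}
    return result


if __name__ == "__main__":
    toy = [
        ("AAACCTGA", "UMI1", frozenset({"GeneA"})),
        ("AAACCTGA", "UMI2", frozenset({"GeneA"})),
        ("AAACCTGA", "UMI3", frozenset({"GeneA"})),
        ("AAACCTGA", "UMI4", frozenset({"GeneA", "GeneB"})),
        ("AAACCTGA", "UMI5", frozenset({"GeneB"})),
    ]
    print(unique_only_counts(toy))
    print(em_counts(toy))
```

On the toy input, the baseline reports 3 molecules for GeneA and 1 for GeneB, silently losing UMI4; the EM version assigns all 5 molecules, about 3.75 to GeneA and 1.25 to GeneB, because GeneA's larger share of unique evidence pulls most of the shared molecule toward it.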

Electronic supplementary material

The online version of this article (10.1186/s13059-019-1670-y) contains supplementary material, which is available to authorized users.

Author and article information

Contributors

Avi Srivastava: asrivastava@cs.stonybrook.edu

Laraib Malik: lmalik@cs.stonybrook.edu

Tom Smith: tss38@cam.ac.uk

Ian Sudbery: i.sudbery@sheffield.ac.uk

Rob Patro: rob.patro@cs.stonybrook.edu (ORCID: http://orcid.org/0000-0001-8463-1675)

Journal

Journal ID (nlm-ta): Genome Biol

Journal ID (iso-abbrev): Genome Biol

Title: Genome Biology

Publisher: BioMed Central (London)

ISSN (Print): 1474-7596

ISSN (Electronic): 1474-760X

Publication date (Electronic): 27 March 2019

Publication date PMC-release: 27 March 2019

Publication date Collection: 2019

Volume: 20

Electronic Location Identifier: 65

Affiliations

[1] Department of Computer Science, Stony Brook University, Stony Brook, USA (ISNI 0000 0001 2216 9681; GRID grid.36425.36)

[2] Cambridge Centre for Proteomics, Department of Biochemistry, University of Cambridge, Cambridge, CB2 1GA, UK (ISNI 0000 0001 2188 5934; GRID grid.5335.0)

[3] Sheffield Institute for Nucleic Acids, Department of Molecular Biology and Biotechnology, The University of Sheffield, Sheffield, S10 2TN, UK (ISNI 0000 0004 1936 9262; GRID grid.11835.3e)

Article

Publisher ID: 1670

DOI: 10.1186/s13059-019-1670-y

PMC ID: 6437997

PubMed ID: 30917859

SO-VID: 62a1248e-7774-4ef8-9332-7d8dc61c04bf

License:

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

History

Date received: 28 November 2018

Date accepted: 5 March 2019

Funding

Funded by: National Science Foundation (US)

Award ID: BIO-1564917

Funded by: National Science Foundation (US)

Award ID: CCF-1750472

Funded by: National Science Foundation (US)

Award ID: CNS-1763680

Funded by: National Institutes of Health (US)

Award ID: R01HG009937

Funded by: Silicon Valley Community Foundation (US)

Award ID: 2018-182752

Custom metadata

ScienceOpen disciplines: Genetics

Keywords: single-cell RNA-seq, UMI deduplication, quantification, cellular barcode
