Alevin-fry unlocks rapid, accurate, and memory-frugal quantification of single-cell RNA-seq data

Abstract

The rapid growth of high-throughput single-cell and single-nucleus RNA-sequencing (sc/snRNA-seq) technologies has produced a wealth of data over the past few years. The size, volume, and distinctive characteristics of these data necessitate the development of new computational methods to accurately and efficiently quantify sc/snRNA-seq data into count matrices that constitute the input to downstream analyses. We introduce the alevin-fry framework for quantifying sc/snRNA-seq data. In addition to being faster and more memory frugal than other accurate quantification approaches, alevin-fry ameliorates the memory scalability and false-positive expression issues that are exhibited by other lightweight tools. We demonstrate how alevin-fry can be effectively used to quantify sc/snRNA-seq data, and also how the spliced and unspliced molecule quantification required as input for RNA velocity analyses can be seamlessly extracted from the same preprocessed data used to generate regular gene expression count matrices.
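To make the idea of a count matrix concrete, the following toy sketch collapses mapped read records into per-cell, per-gene molecule counts. It keeps spliced and unspliced counts as separate layers built from the same records, then sums them into one matrix. This is only an illustration under stated assumptions, not alevin-fry's algorithm or API: the record layout `(barcode, umi, gene, status)`, the `"S"`/`"U"` labels, the function names, and the choice to sum both layers for the total are all hypothetical.

```python
from collections import defaultdict


def count_layers(records):
    """Collapse (barcode, umi, gene, status) records into two count layers.

    Hypothetical record layout: status is "S" (spliced) or "U" (unspliced).
    Returns (spliced, unspliced), each mapping barcode -> gene -> count.
    """
    # Naive UMI deduplication: identical records count as one molecule.
    unique = set(records)
    spliced = defaultdict(lambda: defaultdict(int))
    unspliced = defaultdict(lambda: defaultdict(int))
    for barcode, umi, gene, status in unique:
        layer = spliced if status == "S" else unspliced
        layer[barcode][gene] += 1
    return spliced, unspliced


def total_counts(spliced, unspliced):
    """Sum both layers into one matrix (an assumed, simplified choice)."""
    total = defaultdict(lambda: defaultdict(int))
    for layer in (spliced, unspliced):
        for barcode, genes in layer.items():
            for gene, n in genes.items():
                total[barcode][gene] += n
    return total


# Made-up example records; the second one duplicates the first.
records = [
    ("AAAC", "U1", "g1", "S"),
    ("AAAC", "U1", "g1", "S"),
    ("AAAC", "U2", "g1", "U"),
    ("TTTG", "U3", "g2", "S"),
]
spliced, unspliced = count_layers(records)
total = total_counts(spliced, unspliced)
```

The point of the sketch is that both the velocity-style layers and the summed matrix come from one pass over the same records. Real tools additionally handle barcode correction, UMI sequencing errors, and multi-mapping reads, which this sketch ignores.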

Author and article information

Journal

Journal ID (nlm-journal-id): 101215604

Journal ID (pubmed-jr-id): 32338

Journal ID (nlm-ta): Nat Methods

Journal ID (iso-abbrev): Nat Methods

Title: Nature methods

ISSN (Print): 1548-7091

ISSN (Electronic): 1548-7105

Publication date Nihms-submitted: 4 March 2022

Publication date (Print): March 2022

Publication date (Electronic): 11 March 2022

Publication date PMC-release: 11 September 2022

Volume: 19

Issue: 3

Pages: 316-322

Affiliations

[1] Department of Cell Biology and Molecular Genetics and Center for Bioinformatics and Computational Biology, University of Maryland

[2] Department of Computer Science and Center for Bioinformatics and Computational Biology, University of Maryland

[3] Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts

[4] Friedrich Miescher Institute for Biomedical Research, Basel, Switzerland

[5] SIB Swiss Institute of Bioinformatics, Basel, Switzerland

[6] New York Genome Center, New York City, New York

Author notes

Author Contributions Statement

All authors conceptualized the method. D.H., A.S., R.P., M.Z. and H.S. implemented the software. M.Z. and R.P. benchmarked the tools. D.H., R.P. and C.S. analyzed the results. All authors wrote and approved the manuscript.

[*] Corresponding author: rob@cs.umd.edu

Article

Manuscript ID: NIHMS1775941

DOI: 10.1038/s41592-022-01408-3

PMC ID: 8933848

PubMed ID: 35277707

License:

Users may view, print, copy, and download text and data-mine the content in such documents, for the purposes of academic research, subject always to the full Conditions of use: https://www.springernature.com/gp/open-research/policies/accepted-manuscript-terms