scPipe: A flexible R/Bioconductor preprocessing pipeline for single-cell RNA-sequencing data

Abstract

Single-cell RNA sequencing (scRNA-seq) technology allows researchers to profile the transcriptomes of thousands of cells simultaneously. Protocols that incorporate both designed and random barcodes have greatly increased the throughput of scRNA-seq, but give rise to a more complex data structure. There is a need for new tools that can handle the various barcoding strategies used by different protocols, exploit this information for quality assessment at the sample level, and provide effective visualization of these results in preparation for higher-level analyses. To this end, we developed scPipe, an R/Bioconductor package that integrates barcode demultiplexing, read alignment, UMI-aware gene-level quantification and quality control of raw sequencing data generated by multiple protocols, including CEL-seq, MARS-seq, Chromium 10X, Drop-seq and Smart-seq. scPipe produces a count matrix that is essential for downstream analysis, along with an HTML report that summarises data quality. These results can be used as input for downstream analyses including normalization, visualization and statistical testing. scPipe performs this processing in a few simple R commands, promoting reproducible analysis of single-cell data that is compatible with the emerging suite of open-source scRNA-seq analysis tools available in R/Bioconductor and beyond. The scPipe R package is available for download from https://www.bioconductor.org/packages/scPipe.
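The idea behind UMI-aware gene-level quantification can be illustrated with a minimal sketch. It is written in Python for illustration only and is not scPipe's API or implementation: the function name umi_count_matrix and the toy reads are hypothetical, reads are assumed to be already demultiplexed and assigned to genes, and only exact-match UMI deduplication is shown, with no UMI error correction.

```python
from collections import defaultdict


def umi_count_matrix(reads):
    """Count unique UMIs per gene and cell.

    `reads` is an iterable of (cell_barcode, gene_id, umi) tuples.
    Reads sharing the same cell barcode, gene and UMI are treated as
    PCR duplicates of one molecule and are counted once.
    Hypothetical helper for illustration; not part of scPipe.
    """
    seen = set()
    counts = defaultdict(lambda: defaultdict(int))
    for cell, gene, umi in reads:
        key = (cell, gene, umi)
        if key in seen:
            continue  # duplicate of a molecule that was already counted
        seen.add(key)
        counts[gene][cell] += 1
    # Convert nested defaultdicts to plain dicts: gene -> {cell -> count}
    return {gene: dict(cells) for gene, cells in counts.items()}


# Toy input: five reads, one of which is a PCR duplicate.
reads = [
    ("CELL_A", "GeneX", "AACG"),
    ("CELL_A", "GeneX", "AACG"),  # same cell, gene and UMI: duplicate
    ("CELL_A", "GeneX", "TTGC"),
    ("CELL_B", "GeneX", "AACG"),  # same UMI but a different cell: kept
    ("CELL_B", "GeneY", "GGTA"),
]
matrix = umi_count_matrix(reads)
print(matrix)
```

In this toy example the duplicated read in CELL_A is collapsed, so GeneX receives a count of 2 in CELL_A rather than 3. The same UMI observed in a different cell is still counted, because the cell barcode is part of the deduplication key.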

Author summary

Biotechnologies that allow researchers to measure gene activity in individual cells are growing in popularity. This has resulted in an avalanche of custom analysis methods designed to deal with the complex data that arise from this technology. Although hundreds of analysis methods are available, relatively few deal with raw data processing in a holistic way. Our scPipe software has been developed to fill this gap. scPipe is the first fully integrated R package that deals with the raw sequencing reads from single-cell gene expression studies, processing them to the point where biologically interesting downstream analyses can take place. By following community-developed standards, scPipe is compatible with many other software packages for single-cell analysis available from the open-source Bioconductor project, facilitating a complete beginning-to-end analysis of single-cell gene expression data. This allows various biological questions to be answered, ranging from the identification of novel cell types to the discovery of new marker genes. scPipe promotes reproducibility and makes it easier for researchers to share results and code.

Most cited references 12

featureCounts: An efficient general-purpose program for assigning sequence reads to genomic features

Yang Liao, Gordon K. Smyth, Wei Shi (2013)

Next-generation sequencing technologies generate millions of short sequence reads, which are usually aligned to a reference genome. In many applications, the key information required for downstream analysis is the number of reads mapping to each genomic feature, for example to each exon or each gene. The process of counting reads is called read summarization. Read summarization is required for a great variety of genomic analyses but has so far received relatively little attention in the literature. We present featureCounts, a read summarization program suitable for counting reads generated from either RNA or genomic DNA sequencing experiments. featureCounts implements highly efficient chromosome hashing and feature blocking techniques. It is considerably faster than existing methods (by an order of magnitude for gene-level summarization) and requires far less computer memory. It works with either single or paired-end reads and provides a wide range of options appropriate for different sequencing applications. featureCounts is available under GNU General Public License as part of the Subread (http://subread.sourceforge.net) or Rsubread (http://www.bioconductor.org) software packages.

A step-by-step workflow for low-level analysis of single-cell RNA-seq data with Bioconductor

Aaron Lun, Davis J. McCarthy, John Marioni … (2016)

Single-cell RNA sequencing (scRNA-seq) is widely used to profile the transcriptome of individual cells. This provides biological resolution that cannot be matched by bulk RNA sequencing, at the cost of increased technical noise and data complexity. The differences between scRNA-seq and bulk RNA-seq data mean that the analysis of the former cannot be performed by recycling bioinformatics pipelines for the latter. Rather, dedicated single-cell methods are required at various steps to exploit the cellular resolution while accounting for technical noise. This article describes a computational workflow for low-level analyses of scRNA-seq data, based primarily on software packages from the open-source Bioconductor project. It covers basic steps including quality control, data exploration and normalization, as well as more complex procedures such as cell cycle phase assignment, identification of highly variable and correlated genes, clustering into subpopulations and marker gene detection. Analyses were demonstrated on gene-level count data from several publicly available datasets involving haematopoietic stem cells, brain-derived cells, T-helper cells and mouse embryonic stem cells. This will provide a range of usage scenarios from which readers can construct their own analysis pipelines.

A general and flexible method for signal extraction from single-cell RNA-seq data

Davide Risso, Fanny Perraudeau, Svetlana Gribkova … (2018)

Single-cell RNA-sequencing (scRNA-seq) is a powerful high-throughput technique that enables researchers to measure genome-wide transcription levels at the resolution of single cells. Because of the low amount of RNA present in a single cell, some genes may fail to be detected even though they are expressed; these genes are usually referred to as dropouts. Here, we present a general and flexible zero-inflated negative binomial model (ZINB-WaVE), which leads to low-dimensional representations of the data that account for zero inflation (dropouts), over-dispersion, and the count nature of the data. We demonstrate, with simulated and real data, that the model and its associated estimation procedure are able to give a more stable and accurate low-dimensional representation of the data than principal component analysis (PCA) and zero-inflated factor analysis (ZIFA), without the need for a preliminary normalization step.

Author and article information

Contributors

Luyi Tian (ORCID: http://orcid.org/0000-0003-3420-3685): Conceptualization, Data curation, Investigation, Resources, Software, Writing – original draft

Shian Su: Data curation, Software

Xueyi Dong: Software, Visualization

Daniela Amann-Zalcenstein: Data curation, Resources

Christine Biben: Data curation, Resources

Azadeh Seidi: Data curation, Resources

Douglas J. Hilton: Data curation, Resources

Shalin H. Naik: Conceptualization, Data curation, Funding acquisition, Methodology, Resources, Supervision

Matthew E. Ritchie (ORCID: http://orcid.org/0000-0002-7383-0609): Conceptualization, Funding acquisition, Methodology, Project administration, Resources, Supervision, Writing – original draft, Writing – review & editing

Mihaela Pertea: Editor

Journal

Journal ID (nlm-ta): PLoS Comput Biol

Journal ID (iso-abbrev): PLoS Comput. Biol

Journal ID (publisher-id): plos

Journal ID (pmc): ploscomp

Title: PLoS Computational Biology

Publisher: Public Library of Science (San Francisco, CA, USA)

ISSN (Print): 1553-734X

ISSN (Electronic): 1553-7358

Publication date Collection: August 2018

Publication date (Electronic): 10 August 2018

Volume: 14

Issue: 8

Electronic Location Identifier: e1006361

Affiliations

[1] Molecular Medicine Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, Australia

[2] Department of Medical Biology, The University of Melbourne, Parkville, Australia

[3] College of Life Science, Zhejiang University, Hangzhou, Zhejiang Province, P.R. China

[4] Australian Genome Research Facility, Parkville, Australia

[5] School of Mathematics and Statistics, The University of Melbourne, Parkville, Australia

Johns Hopkins University, UNITED STATES

Author notes

The authors have declared that no competing interests exist.

* E-mail: tian.l@wehi.edu.au (LT); mritchie@wehi.edu.au (MER)

Article

Publisher ID: PCOMPBIOL-D-18-00398

DOI: 10.1371/journal.pcbi.1006361

PMC ID: 6105007

PubMed ID: 30096152

SO-VID: 2da215bd-921a-4f1e-85a6-d2903e2b6993

License:

This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

History

Date received: 11 March 2018

Date accepted: 12 July 2018

Page count

Figures: 6, Tables: 1, Pages: 15

Funding

Funded by: National Health and Medical Research Council (funder ID: http://dx.doi.org/10.13039/501100000925)

Award ID: GNT1143163, GNT1104924

Award recipient: Matthew E. Ritchie (ORCID: http://orcid.org/0000-0002-7383-0609)

Funded by: National Health and Medical Research Council (funder ID: http://dx.doi.org/10.13039/501100000925)

Award ID: GNT1124812, GNT1062820

Award recipient: Shalin H. Naik

Funded by: University of Melbourne (funder ID: http://dx.doi.org/10.13039/501100001782)

Award ID: Melbourne Research Scholarship

Award recipient: Luyi Tian (ORCID: http://orcid.org/0000-0003-3420-3685)

Funded by: Chan Zuckerberg Initiative DAF, an advised fund of Silicon Valley Community Foundation

Award ID: 2018-182819

Award recipient: Matthew E. Ritchie (ORCID: http://orcid.org/0000-0002-7383-0609)

This work was supported by the National Health and Medical Research Council (NHMRC) Project Grants (GNT1143163 to MER, GNT1124812 to SHN and MER, GNT1062820 to SHN), Fellowship GNT1104924 to MER, the Chan Zuckerberg Initiative DAF, an advised fund of Silicon Valley Community Foundation (grant number 2018-182819 to MER), a Melbourne Research Scholarship to LT, Genomics Innovation Hub, Victorian State Government Operational Infrastructure Support and Australian Government NHMRC IRIISS. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Custom metadata

PLOS Publication Stage: vor-update-to-uncorrected-proof

Publication Update: 2018-08-22

Data Availability: Datasets are available under GEO accession numbers GSE109999 and GSE111108 or from the Human Cell Atlas Preview Datasets webpage (https://preview.data.humancellatlas.org/).
