From reads to genes to pathways: differential expression analysis of RNA-Seq experiments using Rsubread and the edgeR quasi-likelihood pipeline

Abstract

In recent years, RNA sequencing (RNA-seq) has become a widely used technology for profiling gene expression. One of the most common aims of RNA-seq profiling is to identify genes or molecular pathways that are differentially expressed (DE) between two or more biological conditions. This article demonstrates a computational workflow for the detection of DE genes and pathways from RNA-seq data by providing a complete analysis of an RNA-seq experiment profiling epithelial cell subsets in the mouse mammary gland. The workflow uses R software packages from the open-source Bioconductor project and covers all steps of the analysis pipeline, including alignment of read sequences, data exploration, differential expression analysis, visualization and pathway analysis. Read alignment and count quantification are conducted using the Rsubread package, and the statistical analyses are performed using the edgeR package. The differential expression analysis uses the quasi-likelihood functionality of edgeR.

Most cited references 13

featureCounts: An efficient general-purpose program for assigning sequence reads to genomic features

(2013)

Next-generation sequencing technologies generate millions of short sequence reads, which are usually aligned to a reference genome. In many applications, the key information required for downstream analysis is the number of reads mapping to each genomic feature, for example to each exon or each gene. The process of counting reads is called read summarization. Read summarization is required for a great variety of genomic analyses but has so far received relatively little attention in the literature. We present featureCounts, a read summarization program suitable for counting reads generated from either RNA or genomic DNA sequencing experiments. featureCounts implements highly efficient chromosome hashing and feature blocking techniques. It is considerably faster than existing methods (by an order of magnitude for gene-level summarization) and requires far less computer memory. It works with either single or paired-end reads and provides a wide range of options appropriate for different sequencing applications. featureCounts is available under GNU General Public License as part of the Subread (http://subread.sourceforge.net) or Rsubread (http://www.bioconductor.org) software packages.
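To make the idea of read summarization concrete, the following is a minimal sketch in Python, not the featureCounts implementation. It uses a naive scan over features rather than the chromosome hashing and feature blocking techniques described in the abstract. The function name `count_reads`, the tuple layouts, and the toy coordinates are all hypothetical, and the rule of leaving reads that overlap more than one gene unassigned is a simplifying assumption made for this illustration.

```python
from collections import defaultdict


def count_reads(features, reads):
    """Count reads per gene using a naive interval-overlap scan.

    features: iterable of (gene_id, chrom, start, end), inclusive coordinates
    reads:    iterable of (chrom, start, end), inclusive coordinates
    Returns (counts, unassigned), where counts maps gene_id -> read count.
    """
    # Group features by chromosome so each read is only compared
    # against features on its own chromosome.
    by_chrom = defaultdict(list)
    for gene_id, chrom, start, end in features:
        by_chrom[chrom].append((start, end, gene_id))

    counts = {gene_id: 0 for gene_id, _, _, _ in features}
    unassigned = 0

    for chrom, r_start, r_end in reads:
        # Two inclusive intervals overlap when each starts before the other ends.
        hits = {
            gene_id
            for f_start, f_end, gene_id in by_chrom.get(chrom, [])
            if r_start <= f_end and f_start <= r_end
        }
        if len(hits) == 1:
            counts[hits.pop()] += 1
        else:
            # No overlapping gene, or ambiguous overlap with several genes.
            unassigned += 1

    return counts, unassigned


# Toy annotation and reads (hypothetical values for illustration only).
features = [
    ("geneA", "chr1", 100, 200),
    ("geneB", "chr1", 180, 300),
    ("geneC", "chr2", 50, 150),
]
reads = [
    ("chr1", 90, 110),   # overlaps geneA only
    ("chr1", 190, 210),  # overlaps geneA and geneB, so left unassigned
    ("chr1", 250, 280),  # overlaps geneB only
    ("chr2", 140, 160),  # overlaps geneC only
    ("chr3", 1, 50),     # no annotated gene on this chromosome
]

counts, unassigned = count_reads(features, reads)
print(counts, unassigned)
```

The resulting gene-by-sample count table, built by repeating this for each library, is the kind of input that downstream differential expression analysis operates on.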

Small-sample estimation of negative binomial dispersion, with applications to SAGE data.

Mark Robinson, Gordon K. Smyth (2008)

We derive a quantile-adjusted conditional maximum likelihood estimator for the dispersion parameter of the negative binomial distribution and compare its performance, in terms of bias, to various other methods. Our estimation scheme outperforms all other methods in very small samples, typical of those from serial analysis of gene expression studies, the motivating data for this study. The impact of dispersion estimation on hypothesis testing is studied. We derive an "exact" test that outperforms the standard approximate asymptotic tests.

Moderated statistical tests for assessing differences in tag abundance.

Mark Robinson, Gordon K. Smyth (2007)

Digital gene expression (DGE) technologies measure gene expression by counting sequence tags. They are sensitive technologies for measuring gene expression on a genomic scale, without the need for prior knowledge of the genome sequence. As the cost of sequencing DNA decreases, the number of DGE datasets is expected to grow dramatically. Various tests of differential expression have been proposed for replicated DGE data using binomial, Poisson, negative binomial or pseudo-likelihood (PL) models for the counts, but none of these are usable when the number of replicates is very small. We develop tests using the negative binomial distribution to model overdispersion relative to the Poisson, and use conditional weighted likelihood to moderate the level of overdispersion across genes. Not only is our strategy applicable even with the smallest number of libraries, but it also proves to be more powerful than previous strategies when more libraries are available. The methodology is equally applicable to other counting technologies, such as proteomic spectral counts. An R package can be accessed from http://bioinf.wehi.edu.au/resources/
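As a brief illustration of what "overdispersion relative to the Poisson" means, the standard mean-variance relationship of the negative binomial distribution can be written as follows. The symbols $Y_g$, $\mu_g$ and $\phi_g$ (count, mean and dispersion for gene $g$) are notation assumed here for illustration and are not taken from the abstract.

```latex
\operatorname{E}(Y_g) = \mu_g,
\qquad
\operatorname{Var}(Y_g) = \mu_g + \phi_g \, \mu_g^{2},
\qquad \phi_g \ge 0 .
```

When $\phi_g = 0$ the variance equals the mean, which is the Poisson case. Any $\phi_g > 0$ adds a quadratic term, so the variance exceeds what a Poisson model would allow. Moderating the dispersion across genes, as the abstract describes, amounts to sharing information between genes when estimating $\phi_g$.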

Author and article information

Journal

Journal ID (nlm-ta): F1000Res

Journal ID (iso-abbrev): F1000Res

Journal ID (pmc): F1000Research

Title: F1000Research

Publisher: F1000Research (London, UK)

ISSN (Electronic): 2046-1402

Publication date (Electronic): 20 June 2016

Publication date Collection: 2016

Volume: 5

Electronic Location Identifier: 1438

Affiliations

[1] The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, 3052, Australia

[2] Department of Medical Biology, The University of Melbourne, Victoria, 3010, Australia

[3] Cancer Research UK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Cambridge, UK

[4] Department of Mathematics and Statistics, The University of Melbourne, Victoria, 3010, Australia

[1] Department of Bioinformatics and Computational Biology, Genentech Inc., San Francisco, CA, USA

[1] Institute of Mathematical Sciences, Faculty of Science, University of Malaya, Kuala Lumpur, Malaysia

[1] Max Planck Institute of Immunobiology and Epigenetics, Freiburg, Germany

[1] Division of Computational Biology, College of Life Sciences, University of Dundee, Dundee, UK

[1] Mathematical Sciences Institute, Australian National University, Canberra, ACT, Australia

[2] Research School of Biology, Australian National University, Canberra, ACT, Australia

Author notes

[a] smyth@wehi.edu.au

All authors developed and tested the code workflow. All authors wrote the article.

Competing interests: No competing interests were disclosed.

Article

DOI: 10.12688/f1000research.8987.1

PMC ID: 4934518

PubMed ID: 27508061

SO-VID: 1b6b4672-9c6d-4505-88b5-2960ae052440

License:

This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

History

Date accepted: 14 June 2016

Funding

Funded by: National Health and Medical Research Council

Award ID: 1058892

Award ID: 1054618

This work was supported by the National Health and Medical Research Council (Fellowship 1058892 and Program 1054618 to G.K.S., and Independent Research Institutes Infrastructure Support to the Walter and Eliza Hall Institute) and by a Victorian State Government Operational Infrastructure Support Grant.
