
      From reads to genes to pathways: differential expression analysis of RNA-Seq experiments using Rsubread and the edgeR quasi-likelihood pipeline

      research-article


          Abstract

          In recent years, RNA sequencing (RNA-seq) has become a widely used technology for profiling gene expression. One of the most common aims of RNA-seq profiling is to identify genes or molecular pathways that are differentially expressed (DE) between two or more biological conditions. This article demonstrates a computational workflow for the detection of DE genes and pathways from RNA-seq data by providing a complete analysis of an RNA-seq experiment profiling epithelial cell subsets in the mouse mammary gland. The workflow uses R software packages from the open-source Bioconductor project and covers all steps of the analysis pipeline, including alignment of read sequences, data exploration, differential expression analysis, visualization and pathway analysis. Read alignment and count quantification are conducted using the Rsubread package, and the statistical analyses are performed using the edgeR package. The differential expression analysis uses the quasi-likelihood functionality of edgeR.
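To give some intuition for the count-quantification step and the library-size scaling that precedes differential expression testing, the following Python sketch counts reads per gene from aligned read intervals and converts the counts to counts per million (CPM). It is a toy illustration under stated assumptions, not the Rsubread or edgeR API: the gene coordinates and reads are hypothetical half-open intervals on a single chromosome, strand, exon structure and paired-end information are ignored, and reads overlapping more than one gene are left unassigned (roughly in the spirit of featureCounts, which by default does not count reads overlapping multiple features). In the workflow itself these steps are carried out by Rsubread's featureCounts and by edgeR, whose CPM calculation additionally takes normalization factors into account.

```python
"""Toy sketch of read summarization and CPM normalization.

Assumptions (hypothetical, for illustration only): genes and reads are
half-open intervals [start, end) on a single chromosome; strand, exons
and paired-end information are ignored. This is not the Rsubread or
edgeR implementation.
"""

# Hypothetical annotation: gene id -> (start, end), half-open.
GENES = {
    "GeneA": (100, 500),
    "GeneB": (450, 900),
    "GeneC": (1000, 1500),
}


def overlaps(a_start, a_end, b_start, b_end):
    """True if [a_start, a_end) and [b_start, b_end) share at least one base."""
    return a_start < b_end and b_start < a_end


def count_reads(reads, genes):
    """Count reads per gene; reads hitting zero or several genes stay unassigned."""
    counts = {gene: 0 for gene in genes}
    for r_start, r_end in reads:
        hits = [g for g, (g_start, g_end) in genes.items()
                if overlaps(r_start, r_end, g_start, g_end)]
        if len(hits) == 1:  # ambiguous (multi-gene) and intergenic reads are skipped
            counts[hits[0]] += 1
    return counts


def cpm(counts):
    """Counts per million, using the total of assigned reads as the library size."""
    lib_size = sum(counts.values())
    if lib_size == 0:
        raise ValueError("library has no assigned reads")
    return {g: c / lib_size * 1e6 for g, c in counts.items()}


if __name__ == "__main__":
    # Hypothetical aligned reads for one sample.
    reads = [(120, 220), (300, 400), (460, 560), (950, 990), (1100, 1200)]
    counts = count_reads(reads, GENES)
    print(counts)
    print(cpm(counts))
```

Scaling by library size is what makes counts from samples sequenced to different depths comparable before any statistical modelling; the negative binomial and quasi-likelihood modelling performed by edgeR is not attempted in this sketch.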


                Author and article information

                Journal
                F1000Research (London, UK)
                ISSN: 2046-1402
                Published: 20 June 2016
                Citation: F1000Research 2016; 5: 1438
                Affiliations
                [1] The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, 3052, Australia
                [2] Department of Medical Biology, The University of Melbourne, Victoria, 3010, Australia
                [3] Cancer Research UK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Cambridge, UK
                [4] Department of Mathematics and Statistics, The University of Melbourne, Victoria, 3010, Australia
                [1] Department of Bioinformatics and Computational Biology, Genentech Inc., San Francisco, CA, USA
                [1] Institute of Mathematical Sciences, Faculty of Science, University of Malaya, Kuala Lumpur, Malaysia
                [1] Max Planck Institute of Immunobiology and Epigenetics, Freiburg, Germany
                [1] Division of Computational Biology, College of Life Sciences, University of Dundee, Dundee, UK
                [1] Mathematical Sciences Institute, Australian National University, Canberra, ACT, Australia
                [2] Research School of Biology, Australian National University, Canberra, ACT, Australia
                Author notes

                All authors developed and tested the code workflow. All authors wrote the article.

                Competing interests: No competing interests were disclosed.

                Article
                DOI: 10.12688/f1000research.8987.1
                PMCID: 4934518
                PMID: 27508061
                Copyright: © 2016 Chen Y et al.

                This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

                History: 14 June 2016
                Funding
                Funded by: National Health and Medical Research Council
                Award ID: 1058892
                Award ID: 1054618
                This work was supported by the National Health and Medical Research Council (Fellowship 1058892 and Program 1054618 to G.K.S.; Independent Research Institutes Infrastructure Support to the Walter and Eliza Hall Institute) and by a Victorian State Government Operational Infrastructure Support Grant.
                Categories
                Software Tool Article
                Articles
                Genomics
                Statistical Methodologies & Health Informatics
                Theory & Simulation

                Keywords: RNA sequencing, molecular pathways, gene expression, R software
