
      Exaggerated false positives by popular differential expression methods when analyzing human population samples

      brief-report


          Abstract

          When identifying differentially expressed genes between two conditions using human population RNA-seq samples, we found by permutation analysis that two popular bioinformatics methods, DESeq2 and edgeR, have unexpectedly high false discovery rates (FDRs). Expanding the analysis to limma-voom, NOISeq, dearseq, and the Wilcoxon rank-sum test, we found that every method except the Wilcoxon rank-sum test often fails to control the FDR. In particular, the actual FDRs of DESeq2 and edgeR sometimes exceed 20% when the target FDR is 5%. Based on these results, we recommend the Wilcoxon rank-sum test for population-level RNA-seq studies with large sample sizes.
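The permutation check described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes a genes-by-samples table of normalized expression values, uses a normal-approximation Wilcoxon rank-sum test (tie-corrected, no continuity correction), and applies the Benjamini-Hochberg adjustment. The function names and the simulated data are hypothetical. The idea is that shuffling the condition labels breaks any real association, so genes still called significant at the target FDR are treated as false discoveries.

```python
import random
from statistics import NormalDist


def rank_sum_pvalue(x, y):
    """Two-sided Wilcoxon rank-sum p-value (normal approximation, tie-corrected)."""
    pooled = sorted((v, g) for g, vals in enumerate((x, y)) for v in vals)
    n1, n2 = len(x), len(y)
    n = n1 + n2
    rank_sum_x, tie_term, i = 0.0, 0.0, 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1  # pooled[i:j] is one block of tied values
        mid_rank = (i + 1 + j) / 2.0  # average of ranks i+1 .. j
        rank_sum_x += mid_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        t = j - i
        tie_term += t**3 - t
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:
        return 1.0  # all values tied: no evidence of a difference
    z = (u - n1 * n2 / 2.0) / var_u**0.5
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def discoveries_after_permutation(expr, labels, target_fdr=0.05, n_perm=20, seed=0):
    """Count genes called significant after each random shuffle of the labels.

    expr: list of genes, each a list of per-sample values.
    labels: list of 0/1 condition labels, one per sample.
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(n_perm):
        shuffled = labels[:]
        rng.shuffle(shuffled)
        pvals = []
        for gene in expr:
            x = [v for v, g in zip(gene, shuffled) if g == 0]
            y = [v for v, g in zip(gene, shuffled) if g == 1]
            pvals.append(rank_sum_pvalue(x, y))
        adj = benjamini_hochberg(pvals)
        counts.append(sum(q <= target_fdr for q in adj))
    return counts


if __name__ == "__main__":
    # Simulated null data (hypothetical): 200 genes, 30 samples per condition.
    sim = random.Random(1)
    labels = [0] * 30 + [1] * 30
    expr = [[sim.lognormvariate(2.0, 1.0) for _ in labels] for _ in range(200)]
    print(discoveries_after_permutation(expr, labels, n_perm=5))
```

On real data, the same loop could wrap any differential expression method in place of the rank-sum test. Consistently large counts on permuted labels would indicate that the method does not control the FDR at the stated target.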

          Supplementary Information

          The online version contains supplementary material available at 10.1186/s13059-022-02648-4.


                Author and article information

                Contributors
                wei.li@uci.edu
                lijy03@g.ucla.edu
                Journal: Genome Biology (Genome Biol)
                Publisher: BioMed Central (London)
                ISSN: 1474-7596, 1474-760X
                Published: 15 March 2022
                Volume 23, article 79
                Affiliations
                [1] Division of Computational Biomedicine, Department of Biological Chemistry, School of Medicine, University of California, Irvine, Irvine, CA 92697, USA (GRID grid.266093.8; ISNI 0000 0001 0668 7243)
                [2] Department of Statistics, University of California, Los Angeles, CA 90095, USA (GRID grid.19006.3e; ISNI 0000 0000 9632 6718)
                [3] Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, TX 77030, USA (GRID grid.39382.33; ISNI 0000 0001 2160 926X)
                [4] Interdepartmental Program in Bioinformatics, University of California, Los Angeles, CA 90095, USA (GRID grid.19006.3e; ISNI 0000 0000 9632 6718)
                [5] Department of Human Genetics, University of California, Los Angeles, CA 90095, USA (GRID grid.19006.3e; ISNI 0000 0000 9632 6718)
                [6] Department of Computational Medicine, University of California, Los Angeles, CA 90095, USA (GRID grid.19006.3e; ISNI 0000 0000 9632 6718)
                [7] Department of Biostatistics, University of California, Los Angeles, CA 90095, USA (GRID grid.19006.3e; ISNI 0000 0000 9632 6718)
                Author information
                http://orcid.org/0000-0002-9288-5648
                Article
                2648
                10.1186/s13059-022-02648-4
                8922736
                35292087
                5568e494-bf93-4b28-9245-73a72b8f484b
                © The Author(s) 2022

                Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

                History
                : 22 September 2021
                : 7 March 2022
                Funding
                Funded by: FundRef http://dx.doi.org/10.13039/100000054, National Cancer Institute
                Award ID: R01CA193466
                Award ID: R01CA228140
                Funded by: FundRef http://dx.doi.org/10.13039/100000057, National Institute of General Medical Sciences
                Award ID: R01GM120507
                Funded by: National Institute of General Medical Sciences
                Award ID: R35GM140888
                Funded by: FundRef http://dx.doi.org/10.13039/100000153, Division of Biological Infrastructure
                Award ID: 1846216
                Funded by: FundRef http://dx.doi.org/10.13039/100000121, Division of Mathematical Sciences
                Award ID: 2113754
                Funded by: FundRef http://dx.doi.org/10.13039/100004331, Johnson and Johnson
                Award ID: WiSTEM2D Award
                Funded by: FundRef http://dx.doi.org/10.13039/100000879, Alfred P. Sloan Foundation
                Award ID: Sloan Research Fellowship
                Funded by: FundRef http://dx.doi.org/10.13039/100000888, W. M. Keck Foundation
                Award ID: UCLA David Geffen School of Medicine W.M. Keck Foundation Junior Faculty Award
                Categories
                Short Report

                Genetics