TPM, FPKM, or Normalized Counts? A Comparative Study of Quantification Measures for the Analysis of RNA-seq Data from the NCI Patient-Derived Models Repository


Abstract

Background

Correctly decoding phenotypic information from RNA-sequencing (RNA-seq) data requires careful selection of the RNA-seq quantification measure, which is critical for inter-sample comparisons and for downstream analyses such as differential gene expression between two or more conditions. Several quantification measures have been proposed and remain in use; however, no consensus has been reached on the best gene expression quantification method for RNA-seq data analysis.
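For readers less familiar with the three measures compared here, the sketch below shows how each can be derived from a genes × samples matrix of fragment counts. It is illustrative only: the abstract does not describe the pipeline, and RSEM (listed in the keywords) reports TPM and FPKM itself using effective transcript lengths. The normalized counts here use DESeq2-style median-of-ratios size factors, one of the two normalizations named in the keywords (the other is TMM); which one the study used is an assumption. The function names `fpkm`, `tpm`, and `median_of_ratios` are hypothetical, and the FPKM denominator assumes every mapped fragment is assigned to one of the listed genes.

```python
import numpy as np


def fpkm(counts, lengths_bp):
    """FPKM: fragments per kilobase of transcript per million fragments mapped.

    counts: genes x samples array of fragment counts.
    lengths_bp: per-gene (effective) length in base pairs (assumed input).
    Assumes every mapped fragment is counted in `counts`.
    """
    counts = np.asarray(counts, dtype=float)
    length_kb = np.asarray(lengths_bp, dtype=float)[:, None] / 1e3
    millions_mapped = counts.sum(axis=0, keepdims=True) / 1e6
    # Divide by sequencing depth (millions of fragments) and by gene length (kb).
    return counts / length_kb / millions_mapped


def tpm(counts, lengths_bp):
    """TPM: length-normalize first, then scale each sample to sum to 1e6."""
    counts = np.asarray(counts, dtype=float)
    rate = counts / np.asarray(lengths_bp, dtype=float)[:, None]
    return rate / rate.sum(axis=0, keepdims=True) * 1e6


def median_of_ratios(counts):
    """DESeq2-style size factors; returns (normalized counts, size factors).

    Only genes with a nonzero count in every sample enter the
    geometric-mean reference, mirroring DESeq2's default estimator.
    """
    counts = np.asarray(counts, dtype=float)
    expressed = np.all(counts > 0, axis=1)
    log_counts = np.log(counts[expressed])
    log_ref = log_counts.mean(axis=1, keepdims=True)  # log geometric mean per gene
    size_factors = np.exp(np.median(log_counts - log_ref, axis=0))
    return counts / size_factors, size_factors


# Toy example: sample 2 is sample 1 sequenced twice as deeply.
counts = np.array([[100, 200], [300, 600], [50, 100]])
lengths = np.array([1000, 3000, 500])
print(tpm(counts, lengths))
print(fpkm(counts, lengths))
print(median_of_ratios(counts)[0])
```

The measures differ mainly in the order of operations: FPKM divides by depth and then by length, so its per-sample totals vary; TPM divides by length first and then rescales every sample to the same total; median-of-ratios ignores length and estimates one robust scale factor per sample from genes expressed in all samples.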

Methods

In the present study, we used replicate samples from each of 20 patient-derived xenograft (PDX) models spanning 15 tumor types, for a total of 61 human tumor xenograft samples available through the NCI Patient-Derived Models Repository (PDMR). We compared reproducibility across replicate samples for TPM (transcripts per million), FPKM (fragments per kilobase of transcript per million fragments mapped), and normalized counts using the coefficient of variation, the intraclass correlation coefficient, and cluster analysis.
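The two reproducibility statistics named above can be computed per gene as in the minimal sketch below, which rests on stated assumptions. The abstract specifies neither the ICC form nor any data transformation, so `one_way_icc` implements a one-way random-effects ICC(1) with PDX models as groups (using the standard n0 adjustment for unequal replicate counts), and `replicate_cv` uses the sample standard deviation divided by the mean. Both function names are hypothetical, and each model is assumed to have at least two replicates.

```python
import numpy as np


def replicate_cv(values, model_ids):
    """Per-gene coefficient of variation (sample SD / mean) within each model.

    values: genes x samples array for one quantification measure.
    model_ids: model label for each sample (column).
    Returns {model label: array of per-gene CVs}; genes with mean 0 get NaN.
    """
    values = np.asarray(values, dtype=float)
    model_ids = np.asarray(model_ids)
    cvs = {}
    for model in np.unique(model_ids).tolist():
        reps = values[:, model_ids == model]
        mean = reps.mean(axis=1)
        sd = reps.std(axis=1, ddof=1)
        with np.errstate(divide="ignore", invalid="ignore"):
            cvs[model] = np.where(mean > 0, sd / mean, np.nan)
    return cvs


def one_way_icc(y, groups):
    """One-way random-effects ICC(1) for one gene, with PDX models as groups.

    y: the gene's values across all samples; groups: model label per sample.
    Unequal replicate counts are handled with the n0 adjustment.
    """
    y = np.asarray(y, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups).tolist()
    n, g = len(y), len(labels)
    grand_mean = y.mean()
    sizes = np.array([np.sum(groups == lab) for lab in labels])
    means = np.array([y[groups == lab].mean() for lab in labels])
    ss_between = np.sum(sizes * (means - grand_mean) ** 2)
    ss_within = sum(np.sum((y[groups == lab] - m) ** 2) for lab, m in zip(labels, means))
    ms_between = ss_between / (g - 1)
    ms_within = ss_within / (n - g)
    n0 = (n - np.sum(sizes ** 2) / n) / (g - 1)  # equals k when every model has k replicates
    return (ms_between - ms_within) / (ms_between + (n0 - 1) * ms_within)
```

A median CV for one measure could then be taken as `np.nanmedian(np.concatenate(list(replicate_cv(values, models).values())))`, and an ICC distribution obtained by applying `one_way_icc` to each gene (row) of the matrix.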

Results

Our results revealed that hierarchical clustering of normalized count data tended to group replicate samples from the same PDX model together more accurately than clustering of TPM or FPKM data. Furthermore, compared with TPM and FPKM data, normalized count data had the lowest median coefficient of variation (CV) and the highest intraclass correlation coefficient (ICC) values, both across replicate samples from the same model and for the same gene across all PDX models.
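One way to score whether clustering groups replicate samples from the same PDX model together is sketched below with SciPy. The transformation, linkage method, distance, and scoring rule are all assumptions, since the abstract does not state them: samples are clustered on log2(x + 1) values with average linkage and 1 - Pearson correlation distance, the tree is cut into as many clusters as there are models, and a model counts as recovered only when its replicates form one cluster containing no other model's samples. `replicate_recovery` is a hypothetical helper name.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist


def replicate_recovery(values, model_ids, method="average"):
    """Fraction of models whose replicates form their own cluster.

    values: genes x samples matrix for one quantification measure.
    model_ids: model label for each sample (column).
    """
    data = np.log2(np.asarray(values, dtype=float) + 1).T  # samples x genes
    model_ids = np.asarray(model_ids)
    models = np.unique(model_ids)
    dist = pdist(data, metric="correlation")  # 1 - Pearson r between samples
    tree = linkage(dist, method=method)
    # Cut the tree into at most one flat cluster per model.
    clusters = fcluster(tree, t=len(models), criterion="maxclust")
    recovered = 0
    for model in models:
        member_clusters = np.unique(clusters[model_ids == model])
        if len(member_clusters) == 1:
            in_cluster = model_ids[clusters == member_clusters[0]]
            recovered += int(np.all(in_cluster == model))
    return recovered / len(models)
```

Running the same function on the TPM, FPKM, and normalized-count matrices of one sample set gives directly comparable recovery fractions; a stricter or looser criterion (for example, an adjusted Rand index) could be substituted without changing the clustering step.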

Conclusion

We provide compelling evidence for a preferred quantification measure for downstream analyses of PDX RNA-seq data. To our knowledge, this is the first comparative study of RNA-seq quantification measures conducted on PDX models, which are known to be inherently more variable than cell line models. Our findings are consistent with those reported by others for human tumors and cell lines and further support the thesis that normalized counts are the best choice for analyzing RNA-seq data across samples.

Supplementary Information

The online version contains supplementary material available at 10.1186/s12967-021-02936-w.


Author and article information

Contributors

Lisa M. McShane: mcshanel@ctep.nci.nih.gov

Journal

Journal ID (nlm-ta): J Transl Med

Journal ID (iso-abbrev): J Transl Med

Title: Journal of Translational Medicine

Publisher: BioMed Central (London)

ISSN (Electronic): 1479-5876

Publication date (Electronic): 22 June 2021

Publication date PMC-release: 22 June 2021

Publication date Collection: 2021

Volume: 19

Electronic Location Identifier: 269

Affiliations

[1] GRID grid.48336.3a, ISNI 0000 0004 1936 8075, Biometric Research Program, Division of Cancer Treatment and Diagnosis, National Cancer Institute, Rockville, MD, USA

[2] GRID grid.418021.e, ISNI 0000 0004 0535 8394, Leidos Biomedical Research, Inc., Frederick National Laboratory for Cancer Research, Frederick, MD, USA

[3] GRID grid.48336.3a, ISNI 0000 0004 1936 8075, Division of Cancer Treatment and Diagnosis, National Cancer Institute, Bethesda, MD, USA

Author information

Yingdong Zhao http://orcid.org/0000-0002-8514-0293

Article

Publisher ID: 2936

DOI: 10.1186/s12967-021-02936-w

PMC ID: 8220791

PubMed ID: 34158060

SO-VID: 844fe061-e11d-48cf-8766-179499fefaf7

License:

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

History

Date received: 26 April 2021

Date accepted: 10 June 2021

Funding

Funded by: FundRef http://dx.doi.org/10.13039/100000054, National Cancer Institute

Award ID: HHSN261200800001E

Award Recipient: P. Mickey Williams

Funded by: National Institutes of Health (NIH)

Open Access:

Open Access funding provided by the National Institutes of Health (NIH).

Custom metadata

ScienceOpen disciplines: Medicine

Keywords: rna sequencing,quantification measures,normalization,tpm,fpkm,count,rsem,patient derived xenograft models,deseq2,tmm
