RNA-SSNV: A Reliable Somatic Single Nucleotide Variant Identification Framework for Bulk RNA-Seq Data

There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

Abstract

The usage of expressed somatic mutations may have a unique advantage in identifying active cancer driver mutations. However, accurately calling mutations from RNA-seq data is difficult due to confounding factors such as RNA-editing, reverse transcription, and gap alignment. In the present study, we proposed a framework (named RNA-SSNV, https://github.com/pmglab/RNA-SSNV) to call somatic single nucleotide variants (SSNV) from tumor bulk RNA-seq data. Based on a comprehensive multi-filtering strategy and a machine-learning classification model trained with comprehensively curated features, RNA-SSNV achieved the best precision–recall rate (0.880–0.884) in a testing dataset and robustly retained 0.94 AUC for the precision–recall curve in three validation adult-based TCGA (The Cancer Genome Atlas) datasets. We further showed that the somatic mutations called by RNA-SSNV tended to have a higher functional impact and therapeutic power in known driver genes. Furthermore, VAF (variant allele fraction) analysis revealed that subclonal harboring expressed mutations had evolutional selection advantage and RNA had higher detection power to rescue DNA-omitted mutations. In sum, RNA-SSNV will be a useful approach to accurately call expressed somatic mutations for a more insightful analysis of cancer drive genes and carcinogenic mechanisms.

Related collections

Most cited references 91

Record: found
Abstract: found
Article: not found

Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries

Hyuna Sung, Jacques Ferlay, Rebecca Siegel … (2021)

This article provides an update on the global cancer burden using the GLOBOCAN 2020 estimates of cancer incidence and mortality produced by the International Agency for Research on Cancer. Worldwide, an estimated 19.3 million new cancer cases (18.1 million excluding nonmelanoma skin cancer) and almost 10.0 million cancer deaths (9.9 million excluding nonmelanoma skin cancer) occurred in 2020. Female breast cancer has surpassed lung cancer as the most commonly diagnosed cancer, with an estimated 2.3 million new cases (11.7%), followed by lung (11.4%), colorectal (10.0 %), prostate (7.3%), and stomach (5.6%) cancers. Lung cancer remained the leading cause of cancer death, with an estimated 1.8 million deaths (18%), followed by colorectal (9.4%), liver (8.3%), stomach (7.7%), and female breast (6.9%) cancers. Overall incidence was from 2-fold to 3-fold higher in transitioned versus transitioning countries for both sexes, whereas mortality varied <2-fold for men and little for women. Death rates for female breast and cervical cancers, however, were considerably higher in transitioning versus transitioned countries (15.0 vs 12.8 per 100,000 and 12.4 vs 5.2 per 100,000, respectively). The global cancer burden is expected to be 28.4 million cases in 2040, a 47% rise from 2020, with a larger increase in transitioning (64% to 95%) versus transitioned (32% to 56%) countries due to demographic changes, although this may be further exacerbated by increasing risk factors associated with globalization and a growing economy. Efforts to build a sustainable infrastructure for the dissemination of cancer prevention measures and provision of cancer care in transitioning countries is critical for global cancer control.

0 comments Cited 33462 times – based on 0 reviews      Review now

Bookmark

Record: found
Abstract: found
Article: found

Is Open Access

Fast and accurate short read alignment with Burrows–Wheeler transform

Heng Li, Richard Durbin (2009)

Motivation: The enormous amount of short reads generated by the new DNA sequencing technologies call for the development of fast and accurate read alignment programs. A first generation of hash table-based methods has been developed, including MAQ, which is accurate, feature rich and fast enough to align short reads from a single individual. However, MAQ does not support gapped alignment for single-end reads, which makes it unsuitable for alignment of longer reads where indels may occur frequently. The speed of MAQ is also a concern when the alignment is scaled up to the resequencing of hundreds of individuals. Results: We implemented Burrows-Wheeler Alignment tool (BWA), a new read alignment package that is based on backward search with Burrows–Wheeler Transform (BWT), to efficiently align short sequencing reads against a large reference sequence such as the human genome, allowing mismatches and gaps. BWA supports both base space reads, e.g. from Illumina sequencing machines, and color space reads from AB SOLiD machines. Evaluations on both simulated and real data suggest that BWA is ∼10–20× faster than MAQ, while achieving similar accuracy. In addition, BWA outputs alignment in the new standard SAM (Sequence Alignment/Map) format. Variant calling and other downstream analyses after the alignment can be achieved with the open source SAMtools software package. Availability: http://maq.sourceforge.net Contact: rd@sanger.ac.uk

0 comments Cited 11116 times – based on 0 reviews      Review now

Bookmark

Record: found
Abstract: found
Article: not found

The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data.

Aaron McKenna, Matthew Hanna, Eric R. Banks … (2010)

Next-generation DNA sequencing (NGS) projects, such as the 1000 Genomes Project, are already revolutionizing our understanding of genetic variation among individuals. However, the massive data sets generated by NGS--the 1000 Genome pilot alone includes nearly five terabases--make writing feature-rich, efficient, and robust analysis tools difficult for even computationally sophisticated individuals. Indeed, many professionals are limited in the scope and the ease with which they can answer scientific questions by the complexity of accessing and manipulating the data produced by these machines. Here, we discuss our Genome Analysis Toolkit (GATK), a structured programming framework designed to ease the development of efficient and robust analysis tools for next-generation DNA sequencers using the functional programming philosophy of MapReduce. The GATK provides a small but rich set of data access patterns that encompass the majority of analysis tool needs. Separating specific analysis calculations from common data management infrastructure enables us to optimize the GATK framework for correctness, stability, and CPU and memory efficiency and to enable distributed and shared memory parallelization. We highlight the capabilities of the GATK by describing the implementation and application of robust, scale-tolerant tools like coverage calculators and single nucleotide polymorphism (SNP) calling. We conclude that the GATK programming framework enables developers and analysts to quickly and easily write efficient and robust NGS tools, many of which have already been incorporated into large-scale sequencing projects like the 1000 Genomes Project and The Cancer Genome Atlas.

0 comments Cited 5955 times – based on 0 reviews      Review now

Bookmark

All references

Author and article information

Contributors

Qihan Long: URI : https://loop.frontiersin.org/people/1867000/overview

Yangyang Yuan: URI : https://loop.frontiersin.org/people/1867076/overview

Miaoxin Li: URI : https://loop.frontiersin.org/people/847693/overview

Journal

Journal ID (nlm-ta): Front Genet

Journal ID (iso-abbrev): Front Genet

Journal ID (publisher-id): Front. Genet.

Title: Frontiers in Genetics

Publisher: Frontiers Media S.A.

ISSN (Electronic): 1664-8021

Publication date (Electronic): 30 June 2022

Publication date Collection: 2022

Volume: 13

Electronic Location Identifier: 865313

Affiliations

[1] ¹ Zhongshan School of Medicine , Sun Yat-Sen University , Guangzhou, China

[2] ² Center for Precision Medicine , Sun Yat-Sen University , Guangzhou, China

[3] ³ Center for Disease Genome Research , Sun Yat-Sen University , Guangzhou, China

[4] ⁴ Guangdong Provincial Key Laboratory of Biomedical Imaging and Guangdong Provincial Engineering Research Center of Molecular Imaging , The Fifth Affiliated Hospital , Sun Yat-sen University , Zhuhai, China

[5] ⁵ Key Laboratory of Tropical Disease Control (SYSU) , Ministry of Education , Guangzhou, China

Author notes

Edited by: Joel Correa Da Rosa, Icahn School of Medicine at Mount Sinai, United States

Reviewed by: Rodrigo Gularte Mérida, Memorial Sloan Kettering Cancer Center, United States

Yiyang Wu, Vanderbilt University Medical Center, United States

*Correspondence: Miaoxin Li, limiaoxin@ 123456mail.sysu.edu.cn

This article was submitted to Cancer Genetics and Oncogenomics, a section of the journal Frontiers in Genetics

Article

Publisher ID: 865313

DOI: 10.3389/fgene.2022.865313

PMC ID: 9279659

PubMed ID: 35846154

SO-VID: 03d175e7-5386-4a9c-b0e5-295502cb0c38

License:

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

History

Date received : 29 January 2022

Date accepted : 17 May 2022

Funding

Funded by: National Natural Science Foundation of China , doi 10.13039/501100001809;

Award ID: 32170637 32100503

Funded by: Guangzhou Municipal Science and Technology Project , doi 10.13039/501100010256;

Award ID: 201803010116

RNA-SSNV: A Reliable Somatic Single Nucleotide Variant Identification Framework for Bulk RNA-Seq Data

Read this article at

Abstract

Related collections

RNA drug delivery

Most cited references 91

Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries

Fast and accurate short read alignment with Burrows–Wheeler transform

The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data.

Author and article information

Contributors

Journal

Affiliations

Author notes

Article

History

Funding

Categories

Comments

Comment on this article

Similar content 59

Cited by 1

Most referenced authors 2,951