Comprehensive Analysis of Alternative Splicing Across Tumors from 8,705 Patients

SUMMARY

Our comprehensive analysis of alternative splicing across 32 The Cancer Genome Atlas cancer types from 8,705 patients detects alternative splicing events and tumor variants by reanalyzing RNA and whole-exome sequencing data. Tumors have up to 30% more alternative splicing events than normal samples. Association analysis of somatic variants with alternative splicing events confirmed known trans associations with variants in SF3B1 and U2AF1 and identified additional trans-acting variants (e.g., TADA1, PPP2R1A). Many tumors have thousands of alternative splicing events not detectable in normal samples; on average, we identified ≈930 exon-exon junctions (“neojunctions”) in tumors not typically found in GTEx normals. From Clinical Proteomic Tumor Analysis Consortium data available for breast and ovarian tumor samples, we confirmed ≈1.7 neojunction- and ≈0.6 single nucleotide variant-derived peptides per tumor sample that are also predicted major histocompatibility complex-I binders (“putative neoantigens”).
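The neojunction idea in the summary (exon-exon junctions seen in a tumor but not typically found in normal samples) can be illustrated with a simple filter. This is a minimal sketch, not the authors' pipeline: the function name `find_neojunctions`, the `(chromosome, start, end)` junction tuples, and the `max_normal_fraction` cutoff are all assumptions chosen for illustration, and the summary does not state what threshold defines "not typically found".

```python
from collections import Counter


def find_neojunctions(tumor_junctions, normal_junctions_by_sample,
                      max_normal_fraction=0.01):
    """Return tumor junctions that are rare or absent in a normal panel.

    tumor_junctions: set of (chrom, start, end) tuples from one tumor.
    normal_junctions_by_sample: list of junction sets, one per normal sample.
    max_normal_fraction: hypothetical cutoff; a junction is kept only if it
        occurs in at most this fraction of normal samples.
    """
    n_normals = len(normal_junctions_by_sample)
    if n_normals == 0:
        # Without a normal panel nothing can be filtered out.
        return set(tumor_junctions)

    # Count in how many normal samples each junction is observed.
    normal_counts = Counter()
    for junctions in normal_junctions_by_sample:
        normal_counts.update(set(junctions))

    return {
        j for j in tumor_junctions
        if normal_counts[j] / n_normals <= max_normal_fraction
    }


# Toy data: three tumor junctions and a panel of four normal samples.
tumor = {("chr1", 100, 200), ("chr1", 300, 400), ("chr2", 50, 90)}
normals = [
    {("chr1", 100, 200)},
    {("chr1", 100, 200), ("chr2", 50, 90)},
    set(),
    set(),
]
neo = find_neojunctions(tumor, normals, max_normal_fraction=0.25)
```

In this toy panel the junction on chr1 at 100-200 appears in half of the normals and is filtered out, while the other two pass the 25% cutoff used here.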

In Brief

A pan-cancer analysis by Kahles et al. shows increased alternative splicing events in tumors versus normal tissue and identifies trans-acting variants associated with alternative splicing events. Tumors contain neojunction-derived peptides absent in normal samples, including predicted MHC-I binders that are putative neoantigens.

Most cited references (76)

The Sequence Alignment/Map format and SAMtools

Heng Li, Bob Handsaker, Alec Wysoker … (2009)

Summary: The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms. It is flexible in style, compact in size, efficient in random access and is the format in which alignments from the 1000 Genomes Project are released. SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments. Availability: http://samtools.sourceforge.net Contact: rd@sanger.ac.uk
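To make the link between SAM alignments and splicing analysis concrete, the sketch below reads one SAM alignment line and reports the reference intervals skipped by `N` operations in the CIGAR string, which spliced aligners use to mark introns. It relies only on the standard SAM column order (RNAME, POS, and CIGAR are columns 3, 4, and 6, with 1-based POS). The function name `junctions_from_sam_line` and the example read are hypothetical, and the code is illustrative rather than a substitute for SAMtools.

```python
import re

# CIGAR operations defined by the SAM format.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that advance the position on the reference sequence.
REF_CONSUMING = set("MDN=X")


def junctions_from_sam_line(line):
    """Return (rname, intron_start, intron_end) for each N op, 1-based inclusive."""
    fields = line.rstrip("\n").split("\t")
    rname, pos, cigar = fields[2], int(fields[3]), fields[5]
    if cigar == "*":
        # CIGAR unavailable (e.g. unmapped read): no junctions to report.
        return []

    junctions = []
    ref_pos = pos  # leftmost reference position covered by the alignment
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op == "N":
            # Skipped reference region: first and last skipped bases.
            junctions.append((rname, ref_pos, ref_pos + length - 1))
        if op in REF_CONSUMING:
            ref_pos += length
    return junctions


# Hypothetical read: 10 aligned bases, a 500-base skip, then 15 aligned bases.
example = "read1\t0\tchr1\t100\t60\t10M500N15M\t*\t0\t0\t*\t*"
found = junctions_from_sam_line(example)
```

For this example read, the first 10 bases cover positions 100-109, so the skipped interval runs from 110 to 609.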

STAR: ultrafast universal RNA-seq aligner.

Alexander Dobin, Carrie A. Davis, Felix Schlesinger … (2013)

Accurate alignment of high-throughput RNA-seq data is a challenging and yet unsolved problem because of the non-contiguous transcript structure, relatively short read lengths and constantly increasing throughput of the sequencing technologies. Currently available RNA-seq aligners suffer from high mapping error rates, low mapping speed, read length limitation and mapping biases. To align our large (>80 billion reads) ENCODE Transcriptome RNA-seq dataset, we developed the Spliced Transcripts Alignment to a Reference (STAR) software based on a previously undescribed RNA-seq alignment algorithm that uses sequential maximum mappable seed search in uncompressed suffix arrays followed by seed clustering and stitching procedure. STAR outperforms other aligners by a factor of >50 in mapping speed, aligning to the human genome 550 million 2 × 76 bp paired-end reads per hour on a modest 12-core server, while at the same time improving alignment sensitivity and precision. In addition to unbiased de novo detection of canonical junctions, STAR can discover non-canonical splices and chimeric (fusion) transcripts, and is also capable of mapping full-length RNA sequences. Using Roche 454 sequencing of reverse transcription polymerase chain reaction amplicons, we experimentally validated 1960 novel intergenic splice junctions with an 80-90% success rate, corroborating the high precision of the STAR mapping strategy. STAR is implemented as a standalone C++ code. STAR is free open source software distributed under GPLv3 license and can be downloaded from http://code.google.com/p/rna-star/.

The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data.

Aaron McKenna, Matthew Hanna, Eric R. Banks … (2010)

Next-generation DNA sequencing (NGS) projects, such as the 1000 Genomes Project, are already revolutionizing our understanding of genetic variation among individuals. However, the massive data sets generated by NGS--the 1000 Genome pilot alone includes nearly five terabases--make writing feature-rich, efficient, and robust analysis tools difficult for even computationally sophisticated individuals. Indeed, many professionals are limited in the scope and the ease with which they can answer scientific questions by the complexity of accessing and manipulating the data produced by these machines. Here, we discuss our Genome Analysis Toolkit (GATK), a structured programming framework designed to ease the development of efficient and robust analysis tools for next-generation DNA sequencers using the functional programming philosophy of MapReduce. The GATK provides a small but rich set of data access patterns that encompass the majority of analysis tool needs. Separating specific analysis calculations from common data management infrastructure enables us to optimize the GATK framework for correctness, stability, and CPU and memory efficiency and to enable distributed and shared memory parallelization. We highlight the capabilities of the GATK by describing the implementation and application of robust, scale-tolerant tools like coverage calculators and single nucleotide polymorphism (SNP) calling. We conclude that the GATK programming framework enables developers and analysts to quickly and easily write efficient and robust NGS tools, many of which have already been incorporated into large-scale sequencing projects like the 1000 Genomes Project and The Cancer Genome Atlas.

Author and article information

Journal

Journal ID (nlm-journal-id): 101130617

Journal ID (pubmed-jr-id): 29778

Journal ID (nlm-ta): Cancer Cell

Journal ID (iso-abbrev): Cancer Cell

Title: Cancer cell

ISSN (Print): 1535-6108

ISSN (Electronic): 1878-3686

Publication date Nihms-submitted: 5 January 2023

Publication date (Print): 13 August 2018

Publication date (Electronic): 02 August 2018

Publication date PMC-release: 17 January 2023

Volume: 34

Issue: 2

Pages: 211-224.e6

Affiliations

[1 ]ETH Zurich, Department of Computer Science, Zurich, Switzerland

[2 ]Memorial Sloan Kettering Cancer Center, Computational Biology Department, New York, USA

[3 ]ETH Zurich, NEXUS Personalized Health Technologies, Zurich, Switzerland

[4 ]European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge, UK

[5 ]University of Tübingen, Department of Computer Science, Tübingen, Germany

[6 ]Center for Bioinformatics, University of Tübingen, Tübingen, Germany

[7 ]Quantitative Biology Center, University of Tübingen, Tübingen, Germany

[8 ]Biomolecular Interactions, Max Planck Institute for Developmental Biology, Tübingen, Germany

[9 ]Institute for Translational Bioinformatics, University Medical Center, Tübingen, Germany

[10 ]Dana-Farber Cancer Institute, cBio Center, Department of Biostatistics and Computational Biology, Boston, MA, USA

[11 ]Harvard Medical School, CompBio Collaboratory, Department of Cell Biology, Boston, USA

[12 ]University Hospital Zurich, Biomedical Informatics Research, Zurich, Switzerland

[13 ]ETH Zurich, Department of Biology, Zurich, Switzerland

[14 ]SIB Swiss Institute of Bioinformatics, Zurich, Switzerland

[15 ]These authors contributed equally

[16 ]Lead Contact

Author notes

AUTHOR CONTRIBUTIONS

G.R., A.K., and K.-V.L. conceived the work and designed experimental setup and data analysis with input from N.C.T., O.K., C.S., and O.S. A.K. and K.-V.L. jointly designed and implemented the RNA-seq analysis pipeline, with the help of S.G.S. A.K. performed RNA-seq analyses, generation of splicing phenotypes, and quantitative alternative splicing analysis. K.-V.L. performed QTL analyses, statistical modeling, and differential analysis with input from O.S. M.H. and A.K. contributed the splicing graph-derived peptides. Peptide filtering was the result of discussions among G.R., N.C.T., A.K., M.H., and K.-V.L. N.C.T. contributed the MHC binding predictions and, with the help of O.K. and T.S., performed the MS confirmation analyses. A.K., K.-V.L., G.R., C.S., and N.C.T. jointly wrote the manuscript. All authors provided feedback on manuscript drafts.

[* ]Correspondence: gunnar.ratsch@ratschlab.org

Article

Manuscript ID: NIHMS1859548

DOI: 10.1016/j.ccell.2018.07.001

PMC ID: 9844097

PubMed ID: 30078747

SO-VID: 37cde48c-7a8d-4e6f-90a6-2f020947162d

License:

This is an open access article under the CC BY-NC-ND license ( http://creativecommons.org/licenses/by-nc-nd/4.0/).
