      Comprehensive Analysis of Alternative Splicing Across Tumors from 8,705 Patients

      research-article

          SUMMARY

          Our comprehensive analysis of alternative splicing across 32 The Cancer Genome Atlas cancer types from 8,705 patients detects alternative splicing events and tumor variants by reanalyzing RNA and whole-exome sequencing data. Tumors have up to 30% more alternative splicing events than normal samples. Association analysis of somatic variants with alternative splicing events confirmed known trans associations with variants in SF3B1 and U2AF1 and identified additional trans-acting variants (e.g., TADA1, PPP2R1A). Many tumors have thousands of alternative splicing events not detectable in normal samples; on average, we identified ≈930 exon-exon junctions (“neojunctions”) in tumors not typically found in GTEx normals. From Clinical Proteomic Tumor Analysis Consortium data available for breast and ovarian tumor samples, we confirmed ≈1.7 neojunction- and ≈0.6 single nucleotide variant-derived peptides per tumor sample that are also predicted major histocompatibility complex-I binders (“putative neoantigens”).

          In Brief

          A pan-cancer analysis by Kahles et al. shows increased alternative splicing events in tumors versus normal tissue and identifies trans-acting variants associated with alternative splicing events. Tumors contain neojunction-derived peptides absent in normal samples, including predicted MHC-I binders that are putative neoantigens.

                Author and article information

                Journal
                101130617
                29778
                Cancer Cell
                1535-6108
                1878-3686
                5 January 2023
                13 August 2018
                02 August 2018
                17 January 2023
                Volume: 34
                Issue: 2
                Pages: 211-224.e6
                Affiliations
                [1] ETH Zurich, Department of Computer Science, Zurich, Switzerland
                [2] Memorial Sloan Kettering Cancer Center, Computational Biology Department, New York, USA
                [3] ETH Zurich, NEXUS Personalized Health Technologies, Zurich, Switzerland
                [4] European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge, UK
                [5] University of Tübingen, Department of Computer Science, Tübingen, Germany
                [6] Center for Bioinformatics, University of Tübingen, Tübingen, Germany
                [7] Quantitative Biology Center, University of Tübingen, Tübingen, Germany
                [8] Biomolecular Interactions, Max Planck Institute for Developmental Biology, Tübingen, Germany
                [9] Institute for Translational Bioinformatics, University Medical Center, Tübingen, Germany
                [10] Dana-Farber Cancer Institute, cBio Center, Department of Biostatistics and Computational Biology, Boston, MA, USA
                [11] Harvard Medical School, CompBio Collaboratory, Department of Cell Biology, Boston, USA
                [12] University Hospital Zurich, Biomedical Informatics Research, Zurich, Switzerland
                [13] ETH Zurich, Department of Biology, Zurich, Switzerland
                [14] SIB Swiss Institute of Bioinformatics, Zurich, Switzerland
                [15] These authors contributed equally
                [16] Lead Contact
                Author notes

                AUTHOR CONTRIBUTIONS

                G.R., A.K., and K.-V.L. conceived the work and designed experimental setup and data analysis with input from N.C.T., O.K., C.S., and O.S. A.K. and K.-V.L. jointly designed and implemented the RNA-seq analysis pipeline, with the help of S.G.S. A.K. performed RNA-seq analyses, generation of splicing phenotypes, and quantitative alternative splicing analysis. K.-V.L. performed QTL analyses, statistical modeling, and differential analysis with input from O.S. M.H. and A.K. contributed the splicing graph-derived peptides. Peptide filtering was the result of discussions among G.R., N.C.T., A.K., M.H., and K.-V.L. N.C.T. contributed the MHC binding predictions and, with the help of O.K. and T.S., performed the MS confirmation analyses. A.K., K.-V.L., G.R., C.S., and N.C.T. jointly wrote the manuscript. All authors provided feedback on manuscript drafts.

                Article
                NIHMS1859548
                10.1016/j.ccell.2018.07.001
                9844097
                30078747
                37cde48c-7a8d-4e6f-90a6-2f020947162d

                This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

                Categories
                Article

                Oncology & Radiotherapy