Systematic evaluation of spliced alignment programs for RNA-seq data

Abstract

High-throughput RNA sequencing is an increasingly accessible method for studying gene structure and activity on a genome-wide scale. A critical step in RNA-seq data analysis is the alignment of partial transcript reads to a reference genome sequence. To assess the performance of current mapping software, we invited developers of RNA-seq aligners to process four large human and mouse RNA-seq data sets. In total, we compared 26 mapping protocols based on 11 programs and pipelines and found major performance differences between methods on numerous benchmarks, including alignment yield, basewise accuracy, mismatch and gap placement, exon junction discovery and suitability of alignments for transcript reconstruction. We observed concordant results on real and simulated RNA-seq data, confirming the relevance of the metrics employed. Future developments in RNA-seq alignment methods would benefit from improved placement of multimapped reads, balanced utilization of existing gene annotation and a reduced false discovery rate for splice junctions.

Author and article information

Journal

Journal ID (nlm-journal-id): 101215604

Journal ID (pubmed-jr-id): 32338

Journal ID (nlm-ta): Nat Methods

Journal ID (iso-abbrev): Nat. Methods

Title: Nature methods

ISSN (Print): 1548-7091

ISSN (Electronic): 1548-7105

Publication date Nihms-submitted: 14 April 2014

Publication date (Electronic): 03 November 2013

Publication date (Print): December 2013

Publication date PMC-release: 13 May 2014

Volume: 10

Issue: 12

Pages: 1185-1191

Affiliations

[1] European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK

[2] Department of Genetics, University of Pennsylvania, Philadelphia, Pennsylvania, USA

[3] Penn Center for Bioinformatics, University of Pennsylvania, Philadelphia, Pennsylvania, USA

[4] Computational Biology Center, Sloan-Kettering Institute, New York, New York, USA

[5] Friedrich Miescher Laboratory of the Max Planck Society, Tübingen, Germany

[7] Wellcome Trust Sanger Institute, Cambridge, UK

[8] Centre for Genomic Regulation, Barcelona, Spain

[9] Universitat Pompeu Fabra, Barcelona, Spain

[10] Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany

[11] Developmental Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany

[12] Wellcome Trust–Medical Research Council Cambridge Stem Cell Institute, University of Cambridge, Cambridge, UK

Author notes

Correspondence should be addressed to P.B. (bertone@ebi.ac.uk)

[13] Present address: Department of Biochemistry and Biophysics, Science for Life Laboratory, Stockholm University, Stockholm, Sweden.

[6] Full lists of members and affiliations appear at the end of the paper.

AUTHOR CONTRIBUTIONS: P.B., R.G., J.H., T.J.H. and N.G. conceived of and organized the study. G.R.G. and B.S. created the simulated RNA-seq data. Consortium members provided alignments for evaluation. P.G.E., T.S., B.S. and G.R.G. analyzed the data. P.G.E. and P.B. coordinated the analysis and wrote the paper with input from the aforementioned authors. A.K. and G.R. carried out preliminary analysis and metric development based on earlier RNA-seq and alignment data but did not evaluate the alignments described herein.

Article

Manuscript ID: EMS58004

DOI: 10.1038/nmeth.2722

PMC ID: 4018468

PubMed ID: 24185836

License:

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/3.0/.
