Transcriptome assembly from long-read RNA-seq alignments with StringTie2

Abstract

RNA sequencing using the latest single-molecule sequencing instruments produces reads that are thousands of nucleotides long. The ability to assemble these long reads can greatly improve the sensitivity of long-read analyses. Here we present StringTie2, a reference-guided transcriptome assembler that works with both short and long reads. StringTie2 includes new methods to handle the high error rate of long reads and offers the ability to work with full-length super-reads assembled from short reads, which further improves the quality of short-read assemblies. StringTie2 is more accurate and faster and uses less memory than all comparable short-read and long-read analysis tools.

Most cited references 22

Sailfish enables alignment-free isoform quantification from RNA-seq reads using lightweight algorithms

Rob Patro, Stephen Mount, Carl Kingsford (2014)

We introduce Sailfish, a computational method for quantifying the abundance of previously annotated RNA isoforms from RNA-seq data. Because Sailfish entirely avoids mapping reads, a time-consuming step in all current methods, it provides quantification estimates much faster than do existing approaches (typically 20 times faster) without loss of accuracy. By facilitating frequent reanalysis of data and reducing the need to optimize parameters, Sailfish exemplifies the potential of lightweight algorithms for efficiently processing sequencing reads.

Nanopore native RNA sequencing of a human poly(A) transcriptome

Rachael E. Workman, Alison Tang, Paul Tang … (2019)

High throughput cDNA sequencing technologies have advanced our understanding of transcriptome complexity and regulation. However, these methods lose information contained in biological RNA because the copied reads are often short and because modifications are not retained. We address these limitations using a native poly(A) RNA sequencing strategy developed by Oxford Nanopore Technologies (ONT). Our study generated 9.9 million aligned sequence reads for the human cell line GM12878, using thirty MinION flow cells at six institutions. These native RNA reads had a median length of 771 bases, and a maximum aligned length of over 21,000 bases. Mitochondrial poly(A) reads provided an internal measure of read length quality. We combined these long nanopore reads with higher accuracy short-reads and annotated GM12878 promoter regions, to identify 33,984 plausible RNA isoforms. We describe strategies for assessing 3′ poly(A) tail length, base modifications, and transcript haplotypes.

Improved data analysis for the MinION nanopore sequencer

Miten Jain, Ian T Fiddes, Karen H. Miga … (2015)

The Oxford Nanopore MinION sequences individual DNA molecules using an array of pores that read nucleotide identities based on ionic current steps. We evaluated and optimized MinION performance using M13 genomic dsDNA. Using expectation-maximization (EM) we obtained robust maximum likelihood (ML) estimates for read insertion, deletion and substitution error rates (4.9%, 7.8%, and 5.1% respectively). We found that 99% of high-quality ‘2D’ MinION reads mapped to reference at a mean identity of 85%. We present a MinION-tailored tool for single nucleotide variant (SNV) detection that uses ML parameter estimates and marginalization over many possible read alignments to achieve precision and recall of up to 99%. By pairing our high-confidence alignment strategy with long MinION reads, we resolved the copy number for a cancer/testis gene family (CT47) within an unresolved region of human chromosome Xq24.

Author and article information

Contributors

Sam Kovaka: skovaka1@jhu.edu

Aleksey V. Zimin: alekseyz@jhu.edu

Geo M. Pertea: geo.pertea@gmail.com

Roham Razaghi: roham.razzaghi@gmail.com

Steven L. Salzberg: salzberg@jhu.edu

Mihaela Pertea: mpertea@jhu.edu (ORCID: http://orcid.org/0000-0003-0762-8637)

Journal

Journal ID (nlm-ta): Genome Biol

Journal ID (iso-abbrev): Genome Biol

Title: Genome Biology

Publisher: BioMed Central (London)

ISSN (Print): 1474-7596

ISSN (Electronic): 1474-760X

Publication date (Electronic): 16 December 2019

Publication date PMC-release: 16 December 2019

Publication date Collection: 2019

Volume: 20

Electronic Location Identifier: 278

Affiliations

[1] ISNI 0000 0001 2171 9311, GRID grid.21107.35, Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218 USA

[2] ISNI 0000 0001 2171 9311, GRID grid.21107.35, Center for Computational Biology, Whiting School of Engineering, Johns Hopkins University, Baltimore, MD 21205 USA

[3] ISNI 0000 0001 2171 9311, GRID grid.21107.35, Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218 USA

[4] ISNI 0000 0001 2171 9311, GRID grid.21107.35, Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD 21205 USA

Article

Publisher ID: 1910

DOI: 10.1186/s13059-019-1910-1

PMC ID: 6912988

PubMed ID: 31842956

SO-VID: c8ae8bdb-b164-431d-85f2-d9e6e8806071

License:

Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

History

Date received: 10 September 2019

Date accepted: 2 December 2019

Funding

Funded by: FundRef http://dx.doi.org/10.13039/100000153, Division of Biological Infrastructure

Award ID: 1458178

Award ID: 1759518

Award Recipient: Mihaela Pertea

Funded by: FundRef http://dx.doi.org/10.13039/100000002, National Institutes of Health

Award ID: R01-HG006677

Funded by: FundRef http://dx.doi.org/10.13039/100000057, National Institute of General Medical Sciences

Award ID: R35GM13051

Custom metadata

ScienceOpen disciplines: Genetics

Keywords: transcriptome assembly,rna-seq,long-read sequencing,gene expression
