Transcription start site signal profiling improves transposable element RNA expression analysis at locus-level

Abstract

The transcriptional activity of transposable elements (TEs) has been implicated in numerous pathological processes, including neurodegenerative diseases such as amyotrophic lateral sclerosis and frontotemporal lobar degeneration. Analysing TE expression from short-read sequencing data is, however, challenging because of the multitude of similar sequences derived from single TE subfamilies and the exaptation of TEs within longer coding or non-coding RNAs. Specialised tools have been developed to quantify TE expression; they either rely on probabilistic redistribution of multimapper count fractions or allow multimappers to be discarded altogether. Until now, benchmarking of these tools has been largely limited to aggregated expression estimates over whole TE subfamilies. Here, we compared the performance of recently published tools (SQuIRE, TElocal, SalmonTE) with simplistic quantification strategies (featureCounts in unique, fraction and random modes) at the level of individual loci. Using simulated datasets, we examined the false discovery rate and the primary driver of false-positive hits in the optimal quantification strategy. Our findings suggest that the number of false discoveries exceeds the total number of correctly recovered active loci for all quantification strategies, including the best-performing tool, TElocal. As a remedy, filtering on a minimum read count or baseMean expression improves the F1 score and decreases the number of false positives. Finally, we demonstrate that additional profiling of transcription start site (TSS) mapping statistics (using a k-means clustering approach) significantly improves the performance of TElocal while reporting a reliable set of detected and differentially expressed TEs in human simulated RNA-seq data.
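The effect of a minimum read count filter on locus-level detection can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `detection_metrics`, the locus identifiers, the counts, and the thresholds are all hypothetical, and the ground truth stands in for the set of loci expressed in a simulated dataset.

```python
from typing import Dict, Set


def detection_metrics(
    counts: Dict[str, float],
    truly_expressed: Set[str],
    min_count: float = 1.0,
) -> Dict[str, float]:
    """Score locus-level TE detection against a simulated ground truth.

    A locus is called "detected" when its read count is >= min_count.
    (Hypothetical helper for illustration only.)
    """
    detected = {locus for locus, n in counts.items() if n >= min_count}
    tp = len(detected & truly_expressed)   # correctly recovered loci
    fp = len(detected - truly_expressed)   # false discoveries
    fn = len(truly_expressed - detected)   # expressed loci that were missed

    precision = tp / (tp + fp) if detected else 0.0
    recall = tp / (tp + fn) if truly_expressed else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    fdr = fp / (tp + fp) if detected else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1, "fdr": fdr}


# Toy counts (made up): three truly expressed loci, three loci that only
# received a few stray multimapper reads.
toy_counts = {"L1_a": 120, "L1_b": 3, "Alu_a": 45,
              "Alu_b": 2, "Alu_c": 1, "ERV_a": 4}
toy_truth = {"L1_a", "Alu_a", "ERV_a"}

unfiltered = detection_metrics(toy_counts, toy_truth, min_count=1)
filtered = detection_metrics(toy_counts, toy_truth, min_count=5)
print(unfiltered)
print(filtered)
```

In this toy case the filter removes all three false positives at the cost of one weakly expressed true locus, so the F1 score rises even though recall falls. The trade-off in real data depends on the chosen threshold and on the quantification tool.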

Most cited references 39

STAR: ultrafast universal RNA-seq aligner.

Alexander Dobin, Carrie A. Davis, Felix Schlesinger … (2013)

Accurate alignment of high-throughput RNA-seq data is a challenging and yet unsolved problem because of the non-contiguous transcript structure, relatively short read lengths and constantly increasing throughput of the sequencing technologies. Currently available RNA-seq aligners suffer from high mapping error rates, low mapping speed, read length limitation and mapping biases. To align our large (>80 billion reads) ENCODE Transcriptome RNA-seq dataset, we developed the Spliced Transcripts Alignment to a Reference (STAR) software based on a previously undescribed RNA-seq alignment algorithm that uses sequential maximum mappable seed search in uncompressed suffix arrays followed by seed clustering and stitching procedure. STAR outperforms other aligners by a factor of >50 in mapping speed, aligning to the human genome 550 million 2 × 76 bp paired-end reads per hour on a modest 12-core server, while at the same time improving alignment sensitivity and precision. In addition to unbiased de novo detection of canonical junctions, STAR can discover non-canonical splices and chimeric (fusion) transcripts, and is also capable of mapping full-length RNA sequences. Using Roche 454 sequencing of reverse transcription polymerase chain reaction amplicons, we experimentally validated 1960 novel intergenic splice junctions with an 80-90% success rate, corroborating the high precision of the STAR mapping strategy. STAR is implemented as a standalone C++ code. STAR is free open source software distributed under GPLv3 license and can be downloaded from http://code.google.com/p/rna-star/.

featureCounts: an efficient general purpose program for assigning sequence reads to genomic features.

Y. Liao, G K Smyth, W Shi (2014)

Next-generation sequencing technologies generate millions of short sequence reads, which are usually aligned to a reference genome. In many applications, the key information required for downstream analysis is the number of reads mapping to each genomic feature, for example to each exon or each gene. The process of counting reads is called read summarization. Read summarization is required for a great variety of genomic analyses but has so far received relatively little attention in the literature. We present featureCounts, a read summarization program suitable for counting reads generated from either RNA or genomic DNA sequencing experiments. featureCounts implements highly efficient chromosome hashing and feature blocking techniques. It is considerably faster than existing methods (by an order of magnitude for gene-level summarization) and requires far less computer memory. It works with either single or paired-end reads and provides a wide range of options appropriate for different sequencing applications. featureCounts is available under GNU General Public License as part of the Subread (http://subread.sourceforge.net) or Rsubread (http://www.bioconductor.org) software packages.

Is Open Access

deepTools2: a next generation web server for deep-sequencing data analysis

Fidel Ramírez, Devon P Ryan, Björn Grüning … (2016)

We present an update to our Galaxy-based web server for processing and visualizing deeply sequenced data. Its core tool set, deepTools, allows users to perform complete bioinformatic workflows ranging from quality controls and normalizations of aligned reads to integrative analyses, including clustering and visualization approaches. Since we first described our deepTools Galaxy server in 2014, we have implemented new solutions for many requests from the community and our users. Here, we introduce significant enhancements and new tools to further improve data visualization and interpretation. deepTools continue to be open to all users and freely available as a web service at deeptools.ie-freiburg.mpg.de. The new deepTools2 suite can be easily deployed within any Galaxy framework via the toolshed repository, and we also provide source code for command line usage under Linux and Mac OS X. A public and documented API for access to deepTools functionality is also available.

Author and article information

Contributors

Peter Heutink: URI : https://loop.frontiersin.org/people/452579/overview

Vikas Bansal: URI : https://loop.frontiersin.org/people/1104170/overview

Journal

Journal ID (nlm-ta): Front Genet

Journal ID (iso-abbrev): Front Genet

Journal ID (publisher-id): Front. Genet.

Title: Frontiers in Genetics

Publisher: Frontiers Media S.A.

ISSN (Electronic): 1664-8021

Publication date (Electronic): 21 October 2022

Publication date Collection: 2022

Volume: 13

Electronic Location Identifier: 1026847

Affiliations

German Center for Neurodegenerative Diseases (DZNE), Tübingen, Germany

Author notes

Edited by: Jared C. Roach, Institute for Systems Biology (ISB), United States

Reviewed by: Justin Blumenstiel, University of Kansas, United States

Marika Drouin, Université de Sherbrooke, Canada

*Correspondence: Vikas Bansal, vikas.bansal@dzne.de

This article was submitted to Human and Medical Genomics, a section of the journal Frontiers in Genetics

Article

Publisher ID: 1026847

DOI: 10.3389/fgene.2022.1026847

PMC ID: 9633680

PubMed ID: 36338986

SO-VID: b011de68-eb45-41a7-9bf2-c77bd7a2794e

License:

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

History

Date received : 24 August 2022

Date accepted : 11 October 2022
