3
views
0
recommends
+1 Recommend
0 collections
    0
    shares
      • Record: found
      • Abstract: found
      • Article: found
      Is Open Access

      Challenges in identifying mRNA transcript starts and ends from long-read sequencing data

      Preprint
      research-article
      1 , 1 , 1
      bioRxiv
      Cold Spring Harbor Laboratory

      Read this article at

      Bookmark
          There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

          Abstract

          Long-read sequencing (LRS) technologies have the potential to revolutionize scientific discoveries in RNA biology, especially by enabling the comprehensive identification and quantification of full length mRNA isoforms. However, inherently high error rates make the analysis of long-read sequencing data challenging. While these error rates have been characterized for sequence and splice site identification, it is still unclear how accurately LRS reads represent transcript start and end sites. Here, we systematically assess the variability and accuracy of mRNA terminal ends identified by LRS reads across multiple sequencing platforms. We find substantial inconsistencies in both the start and end coordinates of LRS reads spanning a gene, such that LRS reads often fail to accurately recapitulate annotated or empirically derived terminal ends of mRNA molecules. To address this challenge, we introduce an approach to condition reads based on empirically derived terminal ends and identified a subset of reads that are more likely to represent full-length transcripts. Our approach can improve transcriptome analyses by enhancing the fidelity of transcript terminal end identification, but may result in lower power to quantify genes or discover novel isoforms. Thus, it is necessary to be cautious when selecting sequencing approaches and/or interpreting data from long-read RNA sequencing.

          Related collections

          Most cited references42

          • Record: found
          • Abstract: found
          • Article: found
          Is Open Access

          Ensembl 2021

          Abstract The Ensembl project (https://www.ensembl.org) annotates genomes and disseminates genomic data for vertebrate species. We create detailed and comprehensive annotation of gene structures, regulatory elements and variants, and enable comparative genomics by inferring the evolutionary history of genes and genomes. Our integrated genomic data are made available in a variety of ways, including genome browsers, search interfaces, specialist tools such as the Ensembl Variant Effect Predictor, download files and programmatic interfaces. Here, we present recent Ensembl developments including two new website portals. Ensembl Rapid Release (http://rapid.ensembl.org) is designed to provide core tools and services for genomes as soon as possible and has been deployed to support large biodiversity sequencing projects. Our SARS-CoV-2 genome browser (https://covid-19.ensembl.org) integrates our own annotation with publicly available genomic data from numerous sources to facilitate the use of genomics in the international scientific response to the COVID-19 pandemic. We also report on other updates to our annotation resources, tools and services. All Ensembl data and software are freely available without restriction.
            Bookmark
            • Record: found
            • Abstract: found
            • Article: found
            Is Open Access

            Opportunities and challenges in long-read sequencing data analysis

            Long-read technologies are overcoming early limitations in accuracy and throughput, broadening their application domains in genomics. Dedicated analysis tools that take into account the characteristics of long-read data are thus required, but the fast pace of development of such tools can be overwhelming. To assist in the design and analysis of long-read sequencing projects, we review the current landscape of available tools and present an online interactive database, long-read-tools.org, to facilitate their browsing. We further focus on the principles of error correction, base modification detection, and long-read transcriptomics analysis and highlight the challenges that remain.
              Bookmark
              • Record: found
              • Abstract: found
              • Article: not found

              Nanopore native RNA sequencing of a human poly(A) transcriptome

              High throughput cDNA sequencing technologies have advanced our understanding of transcriptome complexity and regulation. However, these methods lose information contained in biological RNA because the copied reads are often short and because modifications are not retained. We address these limitations using a native poly(A) RNA sequencing strategy developed by Oxford Nanopore Technologies (ONT). Our study generated 9.9 million aligned sequence reads for the human cell line GM12878, using thirty MinION flow cells at six institutions. These native RNA reads had a median length of 771 bases, and a maximum aligned length of over 21,000 bases. Mitochondrial poly(A) reads provided an internal measure of read length quality. We combined these long nanopore reads with higher accuracy short-reads and annotated GM12878 promoter regions, to identify 33,984 plausible RNA isoforms. We describe strategies for assessing 3′ poly(A) tail length, base modifications, and transcript haplotypes.
                Bookmark

                Author and article information

                Contributors
                Role: ConceptualizationRole: MethodologyRole: Formal analysisRole: InvestigationRole: Writing - Original DraftRole: Visualization
                Role: ConceptualizationRole: InvestigationRole: Writing - Original Draft
                Role: ConceptualizationRole: InvestigationRole: Writing - Review & EditingRole: Supervision
                Journal
                bioRxiv
                BIORXIV
                bioRxiv
                Cold Spring Harbor Laboratory
                27 July 2023
                : 2023.07.26.550536
                Affiliations
                [1. ]RNA Therapeutics Institute, University of Massachusetts Chan Medical School, Worcester, MA
                Author notes
                [* ]corresponding author: athma.pai@ 123456umassmed.edu
                Article
                10.1101/2023.07.26.550536
                10402045
                37546743
                4c1e4907-9bd9-431a-b4b9-43682fe1170b

                This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which allows reusers to copy and distribute the material in any medium or format in unadapted form only, for noncommercial purposes only, and only so long as attribution is given to the creator.

                History
                Funding
                Funded by: National Institute of General Medical Sciences, ECR, RFD, AAP, National Institute of Allergy and Infectious Disease, RFD
                Award ID: R35GM133762, R21AI166281
                Categories
                Article

                Comments

                Comment on this article