45
views
0
recommends
+1 Recommend
0 collections
    0
    shares
      • Record: found
      • Abstract: found
      • Article: found
      Is Open Access

      Letting the data speak for themselves: a fully Bayesian approach to transcriptome assembly

      research-article
      Genome Biology
      BioMed Central

      Read this article at

      Bookmark
          There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

          Abstract

          A novel method for transcriptome assembly, Bayesembler, provides greater accuracy without sacrifice of computational speed, and particular advantages for alternative transcripts expressed at low levels.

          Related collections

          Most cited references9

          • Record: found
          • Abstract: found
          • Article: not found

          Assessment of transcript reconstruction methods for RNA-seq

          RNA sequencing (RNA-seq) is transforming genome biology, enabling comprehensive transcriptome profiling with unprecendented accuracy and detail. Due to technical limitations of current high-throughput sequencing platforms, transcript identity, structure and expression level must be inferred programmatically from partial sequence reads of fragmented gene products. We evaluated 24 protocol variants of 14 independent computational methods for exon identification, transcript reconstruction and expression level quantification from RNA-seq data. Our results show that most algorithms are able to identify discrete transcript components with high success rates, but that assembly of complete isoform structures poses a major challenge even when all constituent elements are identified. Expression level estimates also varied widely across methods, even when based on similar transcript models. Consequently, the complexity of higher eukaryotic genomes imposes severe limitations in transcript recall and splice product discrimination that are likely to remain limiting factors for the analysis of current-generation RNA-seq data.
            Bookmark
            • Record: found
            • Abstract: found
            • Article: not found

            IsoLasso: a LASSO regression approach to RNA-Seq based transcriptome assembly.

            The new second generation sequencing technology revolutionizes many biology-related research fields and poses various computational biology challenges. One of them is transcriptome assembly based on RNA-Seq data, which aims at reconstructing all full-length mRNA transcripts simultaneously from millions of short reads. In this article, we consider three objectives in transcriptome assembly: the maximization of prediction accuracy, minimization of interpretation, and maximization of completeness. The first objective, the maximization of prediction accuracy, requires that the estimated expression levels based on assembled transcripts should be as close as possible to the observed ones for every expressed region of the genome. The minimization of interpretation follows the parsimony principle to seek as few transcripts in the prediction as possible. The third objective, the maximization of completeness, requires that the maximum number of mapped reads (or ?expressed segments? in gene models) be explained by (i.e., contained in) the predicted transcripts in the solution. Based on the above three objectives, we present IsoLasso, a new RNA-Seq based transcriptome assembly tool. IsoLasso is based on the well-known LASSO algorithm, a multivariate regression method designated to seek a balance between the maximization of prediction accuracy and the minimization of interpretation. By including some additional constraints in the quadratic program involved in LASSO, IsoLasso is able to make the set of assembled transcripts as complete as possible. Experiments on simulated and real RNA-Seq datasets show that IsoLasso achieves, simultaneously, higher sensitivity and precision than the state-of-art transcript assembly tools.
              Bookmark
              • Record: found
              • Abstract: found
              • Article: found
              Is Open Access

              iReckon: Simultaneous isoform discovery and abundance estimation from RNA-seq data

              High-throughput RNA sequencing (RNA-seq) promises to revolutionize our understanding of genes and their role in human disease by characterizing the RNA content of tissues and cells. The realization of this promise, however, is conditional on the development of effective computational methods for the identification and quantification of transcripts from incomplete and noisy data. In this article, we introduce iReckon, a method for simultaneous determination of the isoforms and estimation of their abundances. Our probabilistic approach incorporates multiple biological and technical phenomena, including novel isoforms, intron retention, unspliced pre-mRNA, PCR amplification biases, and multimapped reads. iReckon utilizes regularized expectation-maximization to accurately estimate the abundances of known and novel isoforms. Our results on simulated and real data demonstrate a superior ability to discover novel isoforms with a significantly reduced number of false-positive predictions, and our abundance accuracy prediction outmatches that of other state-of-the-art tools. Furthermore, we have applied iReckon to two cancer transcriptome data sets, a triple-negative breast cancer patient sample and the MCF7 breast cancer cell line, and show that iReckon is able to reconstruct the complex splicing changes that were not previously identified. QT-PCR validations of the isoforms detected in the MCF7 cell line confirmed all of iReckon's predictions and also showed strong agreement ( r 2 = 0.94) with the predicted abundances.
                Bookmark

                Author and article information

                Contributors
                mschulz@mmci.uni-saarland.de
                Journal
                Genome Biol
                Genome Biology
                BioMed Central (London )
                1465-6906
                1465-6914
                31 October 2014
                31 October 2014
                2014
                : 15
                : 10
                : 498
                Affiliations
                [ ]Excellence Cluster for Multimodal Computing and Interaction, Saarland University, Saarbrücken, Saarland 66123 Germany
                [ ]Department for Computational Biology and Applied Computing, Max Planck Institute for Informatics, Saarbrücken, Saarland 66123 Germany
                Article
                498
                10.1186/s13059-014-0498-8
                4318165
                e3fdcf04-6db8-4cb4-907e-7635d72fbd7d
                © Schulz; licensee BioMed Central Ltd. 2014

                The licensee has exclusive rights to distribute this article, in any medium, for 12 months following its publication. After this time, the article is available under the terms of the Creative Commons Attribution License ( http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

                History
                Categories
                Research Highlight
                Custom metadata
                © The Author(s) 2014

                Genetics
                Genetics

                Comments

                Comment on this article