
      Parallel computation of genome-scale RNA secondary structure to detect structural constraints on human genome

      Research article, BMC Bioinformatics, BioMed Central

      Keywords: RNA secondary structure prediction, Parallel computation, PARS, Intron, Splicing


          Abstract

          Background

          RNA secondary structure around splice sites is known to assist normal splicing by promoting spliceosome recognition. However, analyzing the structural properties of entire intronic regions or pre-mRNA sequences has been difficult hitherto, owing to serious experimental and computational limitations, such as low read coverage and numerical problems.

          Results

          Our novel software, “ParasoR”, is designed to run on a computer cluster and enables the exact computation of various structural features of long RNA sequences under the constraint of a maximal base-pairing distance. ParasoR divides dynamic programming (DP) matrices into smaller pieces so that each piece can be computed by a separate computer node without losing the connectivity information between pieces. ParasoR directly computes the ratios of DP variables to avoid the loss of numerical precision caused by the cancellation of a large number of Boltzmann factors. The structural preferences of mRNAs computed by ParasoR show high concordance with those determined by high-throughput sequencing analyses.
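A minimal sketch of the ratio idea, under deliberately simplified assumptions: a toy one-dimensional recursion stands in for the full nearest-neighbor energy model. In this toy model, position i is either unpaired (weight 1) or closes a segment of length L, with MIN_LOOP <= L <= span, carrying a hypothetical Boltzmann weight w, so that Z(i) = Z(i-1) + w * sum_L Z(i-L). Rather than propagating Z(i), which overflows double precision for long sequences, the sketch propagates r(i) = Z(i)/Z(i-1) and accumulates log Z(n) = sum_i log r(i). Because the span is bounded, each piece of the sequence needs only the last span - 1 ratios from its predecessor. The names extend_ratios, direct_z, log_partition and MIN_LOOP, as well as the toy model itself, are illustrative assumptions and do not reproduce ParasoR's actual recursions or interface.

```python
import math
from collections import deque
from typing import List, Sequence

MIN_LOOP = 4  # hypothetical minimum segment length in the toy model (must be >= 2)


def extend_ratios(boundary: Sequence[float], n_new: int, w: float, span: int,
                  min_loop: int = MIN_LOOP) -> List[float]:
    """Return r(i) = Z(i)/Z(i-1) for the next n_new positions of the toy model.

    `boundary` holds the preceding ratios (only the last span - 1 are used);
    an empty boundary means the piece starts at position 1 with Z(0) = 1.
    """
    assert 2 <= min_loop <= span
    hist = deque(boundary, maxlen=span - 1)  # r(i-1), r(i-2), ... read from the right
    out: List[float] = []
    for _ in range(n_new):
        acc = 0.0   # sum over allowed L of Z(i-L)/Z(i-1), always a modest number
        prod = 1.0  # Z(i-1)/Z(i-L) for the current L
        for L in range(2, span + 1):
            if L - 1 > len(hist):
                break  # the segment would start before position 1, so Z(i-L) = 0
            prod *= hist[-(L - 1)]  # multiply by r(i-L+1)
            if L >= min_loop:
                acc += 1.0 / prod
        r = 1.0 + w * acc
        out.append(r)
        hist.append(r)  # deque(maxlen=...) drops the oldest ratio automatically
    return out


def direct_z(n: int, w: float, span: int, min_loop: int = MIN_LOOP) -> float:
    """Naive forward recursion on Z itself; overflows to inf for long inputs."""
    z = [1.0] + [0.0] * n  # z[0] = 1 is the empty prefix
    for i in range(1, n + 1):
        z[i] = z[i - 1] + w * sum(z[i - L] for L in range(min_loop, min(span, i) + 1))
    return z[n]


def log_partition(n: int, w: float, span: int, piece: int = 1000,
                  min_loop: int = MIN_LOOP) -> float:
    """log Z(n), computed piece by piece; pieces share only span - 1 ratios."""
    ratios: List[float] = []
    for start in range(0, n, piece):
        boundary = ratios[-(span - 1):]  # the only state handed from piece to piece
        ratios.extend(extend_ratios(boundary, min(piece, n - start), w, span, min_loop))
    return math.fsum(math.log(r) for r in ratios)


if __name__ == "__main__":
    # Short input: both routes agree.
    print(math.log(direct_z(200, 1.0, 30)), log_partition(200, 1.0, 30, piece=64))
    # Long input: direct Z overflows, while the ratio route stays finite.
    print(direct_z(20000, 1.0, 30), log_partition(20000, 1.0, 30))
```

In this sketch the pieces are processed one after another, and the hand-off consists of nothing more than span - 1 ratios. ParasoR, by contrast, distributes its pieces over cluster nodes and works with the complete energy model, so the code above is meant only to show why a bounded base-pairing distance together with ratio variables keeps every stored quantity at an ordinary floating-point magnitude.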

          Using ParasoR, we investigated the global structural preferences of transcribed regions in the human genome. A genome-wide folding simulation indicated that transcribed regions are significantly more structured than intergenic regions after removing repeat sequences and controlling for k-mer frequency bias. In particular, we observed a highly significant preference for base pairing over entire intronic regions compared with both their antisense sequences and intergenic regions. A comparison between pre-mRNAs and mRNAs showed that coding regions become more accessible after splicing, suggesting constraints related to translational efficiency. These changes are correlated with gene expression levels and GC content, and are enriched among genes associated with cytoskeleton and kinase functions.

          Conclusions

          We have shown that ParasoR is very useful for analyzing the structural properties of long RNA sequences, such as mRNAs, pre-mRNAs, and long non-coding RNAs, whose lengths can exceed a million bases in the human genome. Our analyses indicate that transcribed regions, including introns, are subject to various types of structural constraints that cannot be explained by simple sequence composition biases. ParasoR is freely available at https://github.com/carushi/ParasoR.

          Electronic supplementary material

          The online version of this article (doi:10.1186/s12859-016-1067-9) contains supplementary material, which is available to authorized users.


                Author and article information

                Contributors: kawaguchi-rs@cb.k.u-tokyo.ac.jp
                Affiliation: Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba, 277-8561 Japan
                Journal: BMC Bioinformatics, BioMed Central (London), ISSN 1471-2105
                Published: 6 May 2016; volume 17, article 203
                DOI: 10.1186/s12859-016-1067-9
                PubMed Central ID: 4858847
                PubMed ID: 27153986
                © Kawaguchi and Kiryu. 2016

                Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

                History: received 4 December 2015; accepted 29 April 2016
                Funding: JSPS KAKENHI (award IDs 22240031, 14J00402, 25870190, 25134701, 15H01465)
                Category: Methodology Article; Bioinformatics & Computational biology
