6
views
0
recommends
+1 Recommend
0 collections
    0
    shares
      • Record: found
      • Abstract: found
      • Article: found
      Is Open Access

      RNA secondary structure prediction using deep learning with thermodynamic integration

      research-article

      Read this article at

      Bookmark
          There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

          Abstract

          Accurate predictions of RNA secondary structures can help uncover the roles of functional non-coding RNAs. Although machine learning-based models have achieved high performance in terms of prediction accuracy, overfitting is a common risk for such highly parameterized models. Here we show that overfitting can be minimized when RNA folding scores learnt using a deep neural network are integrated together with Turner’s nearest-neighbor free energy parameters. Training the model with thermodynamic regularization ensures that folding scores and the calculated free energy are as close as possible. In computational experiments designed for newly discovered non-coding RNAs, our algorithm (MXfold2) achieves the most robust and accurate predictions of RNA secondary structures without sacrificing computational efficiency compared to several other algorithms. The results suggest that integrating thermodynamic information could help improve the robustness of deep learning-based predictions of RNA secondary structure.

          Abstract

          Accurately predicting the secondary structure of non-coding RNAs can help unravel their function. Here the authors propose a method integrating thermodynamic information and deep learning to improve the robustness of RNA secondary structure prediction compared to several existing algorithms.

          Related collections

          Most cited references40

          • Record: found
          • Abstract: not found
          • Conference Proceedings: not found

          Deep Residual Learning for Image Recognition

            Bookmark
            • Record: found
            • Abstract: found
            • Article: not found

            Long Short-Term Memory

            Learning to store information over extended time intervals by recurrent backpropagation takes a very long time, mostly because of insufficient, decaying error backflow. We briefly review Hochreiter's (1991) analysis of this problem, then address it by introducing a novel, efficient, gradient-based method called long short-term memory (LSTM). Truncating the gradient where this does not do harm, LSTM can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units. Multiplicative gate units learn to open and close access to the constant error flow. LSTM is local in space and time; its computational complexity per time step and weight is O(1). Our experiments with artificial data involve local, distributed, real-valued, and noisy pattern representations. In comparisons with real-time recurrent learning, back propagation through time, recurrent cascade correlation, Elman nets, and neural sequence chunking, LSTM leads to many more successful runs, and learns much faster. LSTM also solves complex, artificial long-time-lag tasks that have never been solved by previous recurrent network algorithms.
              Bookmark
              • Record: found
              • Abstract: found
              • Article: found
              Is Open Access

              CD-HIT: accelerated for clustering the next-generation sequencing data

              Summary: CD-HIT is a widely used program for clustering biological sequences to reduce sequence redundancy and improve the performance of other sequence analyses. In response to the rapid increase in the amount of sequencing data produced by the next-generation sequencing technologies, we have developed a new CD-HIT program accelerated with a novel parallelization strategy and some other techniques to allow efficient clustering of such datasets. Our tests demonstrated very good speedup derived from the parallelization for up to ∼24 cores and a quasi-linear speedup for up to ∼8 cores. The enhanced CD-HIT is capable of handling very large datasets in much shorter time than previous versions. Availability: http://cd-hit.org. Contact: liwz@sdsc.edu Supplementary information: Supplementary data are available at Bioinformatics online.
                Bookmark

                Author and article information

                Contributors
                satoken@bio.keio.ac.jp
                Journal
                Nat Commun
                Nat Commun
                Nature Communications
                Nature Publishing Group UK (London )
                2041-1723
                11 February 2021
                11 February 2021
                2021
                : 12
                : 941
                Affiliations
                GRID grid.26091.3c, ISNI 0000 0004 1936 9959, Department of Biosciences and Informatics, , Keio University, ; 3–14–1 Hiyoshi, Kohoku-ku, Yokohama, Japan
                Author information
                http://orcid.org/0000-0001-6744-7390
                Article
                21194
                10.1038/s41467-021-21194-4
                7878809
                33574226
                844484bd-c215-4094-bfaf-b04f05d0afa6
                © The Author(s) 2021

                Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

                History
                : 22 September 2020
                : 15 January 2021
                Funding
                Funded by: FundRef https://doi.org/10.13039/501100001691, MEXT | Japan Society for the Promotion of Science (JSPS);
                Award ID: 19H04210
                Award ID: 19K22897
                Award ID: 17H06410
                Award ID: 18J21767
                Award ID: 17H06410
                Award Recipient :
                Categories
                Article
                Custom metadata
                © The Author(s) 2021

                Uncategorized
                structure determination,rna,machine learning,non-coding rnas
                Uncategorized
                structure determination, rna, machine learning, non-coding rnas

                Comments

                Comment on this article