RNA secondary structure prediction using deep learning with thermodynamic integration

Abstract

Accurate predictions of RNA secondary structures can help uncover the roles of functional non-coding RNAs. Although machine learning-based models have achieved high performance in terms of prediction accuracy, overfitting is a common risk for such highly parameterized models. Here we show that overfitting can be minimized when RNA folding scores learnt using a deep neural network are integrated together with Turner’s nearest-neighbor free energy parameters. Training the model with thermodynamic regularization ensures that folding scores and the calculated free energy are as close as possible. In computational experiments designed for newly discovered non-coding RNAs, our algorithm (MXfold2) achieves the most robust and accurate predictions of RNA secondary structures without sacrificing computational efficiency compared to several other algorithms. The results suggest that integrating thermodynamic information could help improve the robustness of deep learning-based predictions of RNA secondary structure.
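The idea of thermodynamic regularization described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the squared-difference form of the penalty, and the `weight` value are assumptions chosen for illustration. The abstract states only that training keeps the folding scores and the calculated free energy as close as possible.

```python
def thermodynamic_penalty(folding_score: float, thermo_score: float,
                          weight: float = 0.1) -> float:
    """Penalize disagreement between a learned folding score and a
    thermodynamic score for the same structure.

    Assumption: a weighted squared difference. The source does not
    give the exact form of the regularizer.
    """
    diff = folding_score - thermo_score
    return weight * diff * diff


def regularized_loss(prediction_loss: float, folding_score: float,
                     thermo_score: float, weight: float = 0.1) -> float:
    """Add the thermodynamic penalty to a base prediction loss.

    `prediction_loss` stands in for whatever structure-prediction loss
    the model is trained with (hypothetical placeholder).
    """
    return prediction_loss + thermodynamic_penalty(
        folding_score, thermo_score, weight
    )


# The penalty vanishes when the two scores agree and grows
# quadratically as they diverge.
agreeing = regularized_loss(1.0, folding_score=-5.0, thermo_score=-5.0)
diverging = regularized_loss(1.0, folding_score=-4.0, thermo_score=-6.0,
                             weight=0.5)
```

In this sketch, a larger `weight` pulls the learned scores more strongly toward the thermodynamic values, which is one way to picture how such a term could limit overfitting to the training structures.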

Abstract

Accurately predicting the secondary structure of non-coding RNAs can help unravel their function. Here the authors propose a method integrating thermodynamic information and deep learning to improve the robustness of RNA secondary structure prediction compared to several existing algorithms.

Author and article information

Contributors

Kengo Sato:

ORCID: http://orcid.org/0000-0001-6744-7390

satoken@bio.keio.ac.jp

Journal

Journal ID (nlm-ta): Nat Commun

Journal ID (iso-abbrev): Nat Commun

Title: Nature Communications

Publisher: Nature Publishing Group UK (London)

ISSN (Electronic): 2041-1723

Publication date (Electronic): 11 February 2021

Publication date PMC-release: 11 February 2021

Publication date Collection: 2021

Volume: 12

Electronic Location Identifier: 941

Affiliations

GRID grid.26091.3c, ISNI 0000 0004 1936 9959, Department of Biosciences and Informatics, Keio University, 3–14–1 Hiyoshi, Kohoku-ku, Yokohama, Japan

Article

Publisher ID: 21194

DOI: 10.1038/s41467-021-21194-4

PMC ID: 7878809

PubMed ID: 33574226

SO-VID: 844484bd-c215-4094-bfaf-b04f05d0afa6

License:

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

History

Date received: 22 September 2020

Date accepted: 15 January 2021

Funding

Funded by: FundRef https://doi.org/10.13039/501100001691, MEXT | Japan Society for the Promotion of Science (JSPS)

Award ID: 19H04210

Award ID: 19K22897

Award ID: 17H06410

Award ID: 18J21767

Award Recipients: Kengo Sato, Manato Akiyama, Yasubumi Sakakibara

Custom metadata

ScienceOpen disciplines: Uncategorized

Keywords: structure determination, RNA, machine learning, non-coding RNAs
