Accurate prediction of protein–nucleic acid complexes using RoseTTAFoldNA

Abstract

Protein–RNA and protein–DNA complexes play critical roles in biology. Despite considerable recent advances in protein structure prediction, the prediction of the structures of protein–nucleic acid complexes without homology to known complexes is a largely unsolved problem. Here we extend the RoseTTAFold machine learning protein-structure-prediction approach to additionally predict nucleic acid and protein–nucleic acid complexes. We develop a single trained network, RoseTTAFoldNA, that rapidly produces three-dimensional structure models with confidence estimates for protein–DNA and protein–RNA complexes. Here we show that confident predictions have considerably higher accuracy than current state-of-the-art methods. RoseTTAFoldNA should be broadly useful for modeling the structure of naturally occurring protein–nucleic acid complexes, and for designing sequence-specific RNA and DNA-binding proteins.

RoseTTAFoldNA extends the RoseTTAFold2 platform to predict the structures of protein–DNA and protein–RNA complexes.

Most cited references 33

Highly accurate protein structure prediction with AlphaFold

John Jumper, Richard Evans, Alexander Pritzel … (2021)

Proteins are essential to life, and understanding their structure can facilitate a mechanistic understanding of their function. Through an enormous experimental effort [1–4], the structures of around 100,000 unique proteins have been determined [5], but this represents a small fraction of the billions of known protein sequences [6, 7]. Structural coverage is bottlenecked by the months to years of painstaking effort required to determine a single protein structure. Accurate computational approaches are needed to address this gap and to enable large-scale structural bioinformatics. Predicting the three-dimensional structure that a protein will adopt based solely on its amino acid sequence (the structure prediction component of the ‘protein folding problem’ [8]) has been an important open research problem for more than 50 years [9]. Despite recent progress [10–14], existing methods fall far short of atomic accuracy, especially when no homologous structure is available. Here we provide the first computational method that can regularly predict protein structures with atomic accuracy even in cases in which no similar structure is known. We validated an entirely redesigned version of our neural network-based model, AlphaFold, in the challenging 14th Critical Assessment of protein Structure Prediction (CASP14) [15], demonstrating accuracy competitive with experimental structures in a majority of cases and greatly outperforming other methods. Underpinning the latest version of AlphaFold is a novel machine learning approach that incorporates physical and biological knowledge about protein structure, leveraging multi-sequence alignments, into the design of the deep learning algorithm. AlphaFold predicts protein structures with an accuracy competitive with experimental structures in the majority of cases using a novel deep learning architecture.

Cited 10847 times

Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.

S Altschul (1997)

The BLAST programs are widely used tools for searching protein and DNA databases for sequence similarities. For protein comparisons, a variety of definitional, algorithmic and statistical refinements described here permits the execution time of the BLAST programs to be decreased substantially while enhancing their sensitivity to weak similarities. A new criterion for triggering the extension of word hits, combined with a new heuristic for generating gapped alignments, yields a gapped BLAST program that runs at approximately three times the speed of the original. In addition, a method is introduced for automatically combining statistically significant alignments produced by BLAST into a position-specific score matrix, and searching the database using this matrix. The resulting Position-Specific Iterated BLAST (PSI-BLAST) program runs at approximately the same speed per iteration as gapped BLAST, but in many cases is much more sensitive to weak but biologically relevant sequence similarities. PSI-BLAST is used to uncover several new and interesting members of the BRCT superfamily.

Cited 4428 times

Accurate prediction of protein structures and interactions using a 3-track neural network

Minkyung Baek, Frank DiMaio, Ivan V Anishchenko … (2022)

DeepMind presented remarkably accurate predictions at the recent CASP14 protein structure prediction assessment conference. We explored network architectures incorporating related ideas and obtained the best performance with a 3-track network in which information at the 1D sequence level, the 2D distance map level, and the 3D coordinate level is successively transformed and integrated. The 3-track network produces structure predictions with accuracies approaching those of DeepMind in CASP14, enables the rapid solution of challenging X-ray crystallography and cryo-EM structure modeling problems, and provides insights into the functions of proteins of currently unknown structure. The network also enables rapid generation of accurate protein-protein complex models from sequence information alone, short circuiting traditional approaches which require modeling of individual subunits followed by docking. We make the method available to the scientific community to speed biological research.

Cited 1551 times

Author and article information

Contributors

Frank DiMaio:

ORCID: http://orcid.org/0000-0002-7524-8938

dimaio@uw.edu

Journal

Journal ID (nlm-ta): Nat Methods

Journal ID (iso-abbrev): Nat Methods

Title: Nature Methods

Publisher: Nature Publishing Group US (New York)

ISSN (Print): 1548-7091

ISSN (Electronic): 1548-7105

Publication date (Electronic): 23 November 2023

Publication date PMC-release: 23 November 2023

Publication date (Print): 2024

Volume: 21

Issue: 1

Pages: 117-121

Affiliations

[1] School of Biological Sciences, Seoul National University (https://ror.org/04h9pn542), Seoul, Republic of Korea

[2] Department of Biochemistry, University of Washington (https://ror.org/00cvxb145), Seattle, WA, USA

[3] Institute for Protein Design, University of Washington (https://ror.org/00cvxb145), Seattle, WA, USA

[4] Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA, USA (GRID grid.47840.3f; ISNI 0000 0001 2181 7878)

[5] Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA (GRID grid.34477.33; ISNI 0000 0001 2298 6657)

Author information

Minkyung Baek http://orcid.org/0000-0003-3414-9404

Ryan McHugh http://orcid.org/0000-0003-0291-2196

Ivan Anishchenko http://orcid.org/0000-0003-3645-2044

David Baker http://orcid.org/0000-0001-7896-6217

Frank DiMaio http://orcid.org/0000-0002-7524-8938

Article

Publisher ID: 2086

DOI: 10.1038/s41592-023-02086-5

PMC ID: 10776382

PubMed ID: 37996753

SO-VID: 193a67c6-3643-40dd-9d56-f4c5151a2a9a

License:

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

History

Date received: 12 January 2023

Date accepted: 16 October 2023

Funding

Funded by: FundRef https://doi.org/10.13039/100000057, U.S. Department of Health & Human Services | NIH | National Institute of General Medical Sciences (NIGMS)

Award ID: GM123089

Award Recipient: Frank DiMaio

Custom metadata

ScienceOpen disciplines: Life sciences

Keywords: machine learning,dna-binding proteins,rna-binding proteins
