
      AlphaFold Protein Structure Database in 2024: providing structure coverage for over 214 million protein sequences

      research-article


          Abstract

          The AlphaFold Protein Structure Database (AlphaFold DB, https://alphafold.ebi.ac.uk) has significantly impacted structural biology by amassing over 214 million predicted protein structures, expanding from the initial 300k structures released in 2021. Enabled by the groundbreaking AlphaFold2 artificial intelligence (AI) system, the predictions archived in AlphaFold DB have been integrated into primary data resources such as PDB, UniProt, Ensembl, InterPro and MobiDB. Our manuscript details subsequent enhancements in data archiving, covering successive releases encompassing model organisms, global health proteomes, Swiss-Prot integration, and a host of curated protein datasets. We detail the data access mechanisms of AlphaFold DB, from direct file access via FTP to advanced queries using Google Cloud Public Datasets and the programmatic access endpoints of the database. We also discuss the improvements and services added since its initial release, including enhancements to the Predicted Aligned Error viewer, customisation options for the 3D viewer, and improvements in the search engine of AlphaFold DB.
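
The programmatic access mentioned in the abstract can be illustrated with a minimal Python sketch. Only the base URL comes from the abstract. The `/api/prediction/<accession>` path, the `AF-<accession>-F<fragment>-model_v<version>` file naming, and the default version number are assumptions made for illustration and should be checked against the current AlphaFold DB documentation. The helper names are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Base URL as given in the abstract.
BASE_URL = "https://alphafold.ebi.ac.uk"


def prediction_api_url(accession: str) -> str:
    """Build a metadata URL for one UniProt accession.

    Assumption: the endpoint layout is /api/prediction/<accession>.
    """
    return f"{BASE_URL}/api/prediction/{quote(accession)}"


def model_file_url(accession: str, fragment: int = 1,
                   version: int = 4, ext: str = "cif") -> str:
    """Build a direct download URL for a predicted model file.

    Assumption: files are named AF-<accession>-F<fragment>-model_v<version>.
    The default version is illustrative and may be out of date.
    """
    name = f"AF-{quote(accession)}-F{fragment}-model_v{version}.{ext}"
    return f"{BASE_URL}/files/{name}"


def fetch_prediction_metadata(accession: str, timeout: float = 30.0):
    """Download and decode the JSON metadata (requires network access)."""
    with urlopen(prediction_api_url(accession), timeout=timeout) as response:
        return json.load(response)
```

Calling `model_file_url("P69905")` only builds a string, so it can be checked offline. `fetch_prediction_metadata("P69905")` performs a real HTTP request and depends on the assumed endpoint being available.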

          Most cited references (26)

          Highly accurate protein structure prediction with AlphaFold

          Proteins are essential to life, and understanding their structure can facilitate a mechanistic understanding of their function. Through an enormous experimental effort [1–4], the structures of around 100,000 unique proteins have been determined [5], but this represents a small fraction of the billions of known protein sequences [6, 7]. Structural coverage is bottlenecked by the months to years of painstaking effort required to determine a single protein structure. Accurate computational approaches are needed to address this gap and to enable large-scale structural bioinformatics. Predicting the three-dimensional structure that a protein will adopt based solely on its amino acid sequence, the structure prediction component of the ‘protein folding problem’ [8], has been an important open research problem for more than 50 years [9]. Despite recent progress [10–14], existing methods fall far short of atomic accuracy, especially when no homologous structure is available. Here we provide the first computational method that can regularly predict protein structures with atomic accuracy even in cases in which no similar structure is known. We validated an entirely redesigned version of our neural network-based model, AlphaFold, in the challenging 14th Critical Assessment of protein Structure Prediction (CASP14) [15], demonstrating accuracy competitive with experimental structures in a majority of cases and greatly outperforming other methods. Underpinning the latest version of AlphaFold is a novel machine learning approach that incorporates physical and biological knowledge about protein structure, leveraging multi-sequence alignments, into the design of the deep learning algorithm. AlphaFold predicts protein structures with an accuracy competitive with experimental structures in the majority of cases using a novel deep learning architecture.

            BLAST+: architecture and applications

            Background: Sequence similarity searching is a very important bioinformatics task. While Basic Local Alignment Search Tool (BLAST) outperforms exact methods through its use of heuristics, the speed of the current BLAST software is suboptimal for very long queries or database sequences. There are also some shortcomings in the user-interface of the current command-line applications.
            Results: We describe features and improvements of rewritten BLAST software and introduce new command-line applications. Long query sequences are broken into chunks for processing, in some cases leading to dramatically shorter run times. For long database sequences, it is possible to retrieve only the relevant parts of the sequence, reducing CPU time and memory usage for searches of short queries against databases of contigs or chromosomes. The program can now retrieve masking information for database sequences from the BLAST databases. A new modular software library can now access subject sequence data from arbitrary data sources. We introduce several new features, including strategy files that allow a user to save and reuse their favorite set of options. The strategy files can be uploaded to and downloaded from the NCBI BLAST web site.
            Conclusion: The new BLAST command-line applications, compared to the current BLAST tools, demonstrate substantial speed improvements for long queries as well as chromosome length database sequences. We have also improved the user interface of the command-line applications.

              Accurate prediction of protein structures and interactions using a 3-track neural network

              DeepMind presented remarkably accurate predictions at the recent CASP14 protein structure prediction assessment conference. We explored network architectures incorporating related ideas and obtained the best performance with a 3-track network in which information at the 1D sequence level, the 2D distance map level, and the 3D coordinate level is successively transformed and integrated. The 3-track network produces structure predictions with accuracies approaching those of DeepMind in CASP14, enables the rapid solution of challenging X-ray crystallography and cryo-EM structure modeling problems, and provides insights into the functions of proteins of currently unknown structure. The network also enables rapid generation of accurate protein-protein complex models from sequence information alone, short circuiting traditional approaches which require modeling of individual subunits followed by docking. We make the method available to the scientific community to speed biological research.

                Author and article information

                Contributors
                Journal: Nucleic Acids Research (Nucleic Acids Res, nar)
                Publisher: Oxford University Press
                ISSN: 0305-1048, 1362-4962
                05 January 2024
                02 November 2023
                02 November 2023
                Volume: 52
                Issue: D1
                Pages: D368-D375
                Affiliations
                European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
                School of Biological Sciences, Seoul National University, Seoul, South Korea
                Google DeepMind, London, UK
                Author notes
                To whom correspondence should be addressed. Email: sameer@ebi.ac.uk
                Correspondence may also be addressed to Demis Hassabis. Email: dhcontact@deepmind.com
                Author information
                https://orcid.org/0000-0002-3687-0839
                https://orcid.org/0000-0001-8314-8497
                https://orcid.org/0000-0002-8439-5964
                Article
                gkad1011
                DOI: 10.1093/nar/gkad1011
                PMCID: PMC10767828
                PMID: 37933859
                b5e6e066-3033-49a0-8685-25001a9f3020
                © The Author(s) 2023. Published by Oxford University Press on behalf of Nucleic Acids Research.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                : 18 October 2023
                : 13 October 2023
                : 29 September 2023
                Page count
                Pages: 8
                Funding
                Funded by: Google DeepMind;
                Funded by: National Research Foundation of Korea, DOI 10.13039/501100003725;
                Award ID: 2019R1A6A1A10073437
                Award ID: 2020M3A9G7103933
                Award ID: 2021R1C1C102065
                Award ID: 2021M3A9I4021220
                Funded by: Samsung DS Research Fund;
                Funded by: Seoul National University, DOI 10.13039/501100002551;
                Funded by: National Research Foundation of Korea, DOI 10.13039/501100003725;
                Award ID: RS-2023-00250470
                Categories
                AcademicSubjects/SCI00010
                Database Issue

                Genetics