
      Current progress and open challenges for applying deep learning across the biosciences

      review-article


          Abstract

          Deep Learning (DL) has recently enabled unprecedented advances in one of the grand challenges in computational biology: the half-century-old problem of protein structure prediction. In this paper we discuss recent advances, limitations, and future perspectives of DL on five broad areas: protein structure prediction, protein function prediction, genome engineering, systems biology and data integration, and phylogenetic inference. We discuss each application area and cover the main bottlenecks of DL approaches, such as training data, problem scope, and the ability to leverage existing DL architectures in new contexts. To conclude, we provide a summary of the subject-specific and general challenges for DL across the biosciences.

          Abstract

          Deep learning has enabled advances in understanding biology. In this review, the authors outline the advances and limitations of deep learning in five broad areas, along with the future challenges for the biosciences.


                Author and article information

                Contributors
                treangen@rice.edu
                Journal
                Nat Commun
                Nature Communications
                Nature Publishing Group UK (London)
                ISSN: 2041-1723
                Published: 1 April 2022
                Volume: 13
                Article number: 1728
                Affiliations
                [1] Department of Computer Science, Rice University, Houston, TX, USA (GRID grid.21940.3e; ISNI 0000 0004 1936 8278)
                [2] Department of Electrical Engineering and Computer Sciences, University of California Berkeley, Berkeley, CA, USA (GRID grid.47840.3f; ISNI 0000 0001 2181 7878)
                [3] Department of Biology and Biochemistry, University of Houston, Houston, TX, USA (GRID grid.266436.3; ISNI 0000 0004 1569 9707)
                [4] Department of Electrical and Computer Engineering, Rice University, Houston, TX, USA (GRID grid.21940.3e; ISNI 0000 0004 1936 8278)
                [5] Department of Bioengineering, Rice University, Houston, TX, USA (GRID grid.21940.3e; ISNI 0000 0004 1936 8278)
                Author information
                http://orcid.org/0000-0002-0736-5075
                http://orcid.org/0000-0001-7947-6455
                http://orcid.org/0000-0002-9738-1916
                http://orcid.org/0000-0003-3288-6769
                http://orcid.org/0000-0003-2433-5553
                http://orcid.org/0000-0002-3201-9983
                http://orcid.org/0000-0002-3760-564X
                Article
                29268
                DOI: 10.1038/s41467-022-29268-7
                PMCID: PMC8976012
                PMID: 35365602
                22a731d1-e2e2-4e5e-a848-33eccfd9c80d
                © The Author(s) 2022

                Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

                History
                Received: 25 August 2021
                Accepted: 9 March 2022
                Funding
                Funded by: ODNI | Intelligence Advanced Research Projects Activity (IARPA), FundRef https://doi.org/10.13039/100011039
                Award ID: W911NF-17-2-0089
                Funded by: U.S. Department of Health & Human Services | NIH | National Institute of Allergy and Infectious Diseases (NIAID), FundRef https://doi.org/10.13039/100000060
                Award ID: P01AI152999-01
                Funded by: National Science Foundation (NSF), FundRef https://doi.org/10.13039/100000001
                Award ID: EF-2126387
                Funded by: U.S. Department of Health & Human Services | NIH | U.S. National Library of Medicine (NLM), FundRef https://doi.org/10.13039/100000092
                Award ID: T15LM007093
                Categories
                Review Article

                Uncategorized
                computer science, computational biology and bioinformatics, machine learning
