Incorporating Machine Learning into Established Bioinformatics Frameworks

Abstract

The exponential growth of biomedical data in recent years has spurred the application of numerous machine learning techniques to address emerging problems in biology and clinical research. By enabling automatic feature extraction, feature selection, and generation of predictive models, these methods can be used to study complex biological systems efficiently. Machine learning techniques are frequently integrated with bioinformatic methods, as well as curated databases and biological networks, to enhance training and validation, identify the best interpretable features, and enable feature and model investigation. Here, we review recently developed methods that incorporate machine learning within the same framework as techniques from molecular evolution, protein structure analysis, systems biology, and disease genomics. We outline the challenges posed for machine learning and, in particular, deep learning in biomedicine, and suggest unique opportunities for machine learning techniques integrated with established bioinformatics approaches to overcome some of these challenges.

Most cited references 223

Deep learning.

Yann LeCun, Yoshua Bengio, Geoffrey E Hinton (2015)

Deep learning allows computational models that are composed of multiple processing layers to learn representations of data with multiple levels of abstraction. These methods have dramatically improved the state-of-the-art in speech recognition, visual object recognition, object detection and many other domains such as drug discovery and genomics. Deep learning discovers intricate structure in large data sets by using the backpropagation algorithm to indicate how a machine should change its internal parameters that are used to compute the representation in each layer from the representation in the previous layer. Deep convolutional nets have brought about breakthroughs in processing images, video, speech and audio, whereas recurrent nets have shone light on sequential data such as text and speech.

Cited 8920 times

Is Open Access

mixOmics: An R package for ‘omics feature selection and multiple data integration

Florian Rohart, Benoît Gautier, Amrit Singh … (2017)

The advent of high throughput technologies has led to a wealth of publicly available ‘omics data coming from different sources, such as transcriptomics, proteomics, metabolomics. Combining such large-scale biological data sets can lead to the discovery of important biological insights, provided that relevant information can be extracted in a holistic manner. Current statistical approaches have been focusing on identifying small subsets of molecules (a ‘molecular signature’) to explain or predict biological conditions, but mainly for a single type of ‘omics. In addition, commonly used methods are univariate and consider each biological feature independently. We introduce mixOmics, an R package dedicated to the multivariate analysis of biological data sets with a specific focus on data exploration, dimension reduction and visualisation. By adopting a systems biology approach, the toolkit provides a wide range of methods that statistically integrate several data sets at once to probe relationships between heterogeneous ‘omics data sets. Our recent methods extend Projection to Latent Structure (PLS) models for discriminant analysis, for data integration across multiple ‘omics data or across independent studies, and for the identification of molecular signatures. We illustrate our latest mixOmics integrative frameworks for the multivariate analyses of ‘omics data available from the package.

Cited 1157 times

Improved protein structure prediction using potentials from deep learning

Andrew W. Senior, Richard Evans, John Jumper … (2020)

Protein structure prediction can be used to determine the three-dimensional shape of a protein from its amino acid sequence. This problem is of fundamental importance as the structure of a protein largely determines its function; however, protein structures can be difficult to determine experimentally. Considerable progress has recently been made by leveraging genetic information. It is possible to infer which amino acid residues are in contact by analysing covariation in homologous sequences, which aids in the prediction of protein structures. Here we show that we can train a neural network to make accurate predictions of the distances between pairs of residues, which convey more information about the structure than contact predictions. Using this information, we construct a potential of mean force that can accurately describe the shape of a protein. We find that the resulting potential can be optimized by a simple gradient descent algorithm to generate structures without complex sampling procedures. The resulting system, named AlphaFold, achieves high accuracy, even for sequences with fewer homologous sequences. In the recent Critical Assessment of Protein Structure Prediction (CASP13), a blind assessment of the state of the field, AlphaFold created high-accuracy structures (with template modelling (TM) scores of 0.7 or higher) for 24 out of 43 free modelling domains, whereas the next best method, which used sampling and contact information, achieved such accuracy for only 14 out of 43 domains. AlphaFold represents a considerable advance in protein-structure prediction. We expect this increased accuracy to enable insights into the function and malfunction of proteins, especially in cases for which no structures for homologous proteins have been experimentally determined.

Cited 934 times

Author and article information

Contributors

Jung Hun Oh: Role: Academic Editor

Journal

Journal ID (nlm-ta): Int J Mol Sci

Journal ID (iso-abbrev): Int J Mol Sci

Journal ID (publisher-id): ijms

Title: International Journal of Molecular Sciences

Publisher: MDPI

ISSN (Electronic): 1422-0067

Publication date (Electronic): 12 March 2021

Publication date Collection: March 2021

Volume: 22

Issue: 6

Electronic Location Identifier: 2903

Affiliations

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA; ayal.gussow@nih.gov

Author notes

[*] Correspondence: noam.auslander@nih.gov (N.A.); koonin@ncbi.nlm.nih.gov (E.V.K.)

[†] Co-first authors.

Author information

Noam Auslander https://orcid.org/0000-0002-1923-8735

Eugene V. Koonin https://orcid.org/0000-0003-3943-8299

Article

Publisher ID: ijms-22-02903

DOI: 10.3390/ijms22062903

PMC ID: 8000113

PubMed ID: 33809353

SO-VID: 4f8a0e7b-8bf3-4dfc-9f93-8089547b3f68

License: Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
