Evolving scenario of big data and Artificial Intelligence (AI) in drug discovery


Abstract

The accumulation of massive data across the many cheminformatics databases has made big data and artificial intelligence (AI) indispensable in drug design. This has driven the development of new algorithms and architectures to mine these databases and to meet the specific needs of drug discovery processes such as virtual drug screening and de novo molecule design in the big data era. The development of deep learning neural networks and their variants, together with the corresponding growth in chemical data, has brought about a paradigm shift in how information about chemical space is mined. This review summarizes the big data and AI techniques currently being used to meet the ever-increasing research demands of drug discovery pipelines.
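Similarity-based ranking is one of the simplest ways to mine a compound database during virtual drug screening. The sketch below is a minimal illustration under stated assumptions, not a method taken from this review: it assumes each molecule has already been encoded as a binary fingerprint (represented here as the set of its "on" bit indices) and ranks a toy library against a known active using the Tanimoto coefficient. The functions `tanimoto` and `screen`, the compound names, and all fingerprint values are hypothetical.

```python
"""Toy ligand-based virtual screening: rank a library by Tanimoto similarity.

Assumption: each molecule is already encoded as a binary fingerprint, given
here as the set of "on" bit indices. All fingerprints below are made-up toy
values, not real molecules.
"""


def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto (Jaccard) coefficient |A & B| / |A | B| of two bit sets."""
    union = len(a | b)
    # Two empty fingerprints share nothing meaningful, so score them 0.0.
    return len(a & b) / union if union else 0.0


def screen(query: set[int], library: dict[str, set[int]],
           top_k: int = 3) -> list[tuple[str, float]]:
    """Return the top_k library entries most similar to the query, best first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    query = {1, 4, 7, 9, 12}  # hypothetical fingerprint of a known active
    library = {                # hypothetical screening library
        "cmpd_A": {1, 4, 7, 9, 13},
        "cmpd_B": {2, 3, 5},
        "cmpd_C": {1, 4, 9, 12},
        "cmpd_D": {1, 4, 7, 9, 12, 20},
    }
    for name, score in screen(query, library):
        print(f"{name}\t{score:.2f}")
```

With these toy values the ranking is `cmpd_D` (0.83), `cmpd_C` (0.80), then `cmpd_A` (0.67); `cmpd_B` shares no bits with the query. Real pipelines would compute fingerprints with a cheminformatics toolkit and typically combine such filters with learned models.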



Author and article information

Contributors

Punit Kaur:

ORCID: http://orcid.org/0000-0002-7358-3716

punitkaur1@hotmail.com

Journal

Journal ID (nlm-ta): Mol Divers

Journal ID (iso-abbrev): Mol Divers

Title: Molecular Diversity

Publisher: Springer International Publishing (Cham)

ISSN (Print): 1381-1991

ISSN (Electronic): 1573-501X

Publication date (Electronic): 23 June 2021

Pages: 1-22

Affiliations

[1] Department of Biophysics, All India Institute of Medical Sciences, New Delhi 110029, India (GRID grid.413618.9; ISNI 0000 0004 1767 6103)

[2] Department of Biochemistry, Pt. Jawahar Lal Nehru Memorial Medical College, Raipur 492001, India (GRID grid.464647.3; ISNI 0000 0004 1770 0679)


Article

Publisher ID: 10256

DOI: 10.1007/s11030-021-10256-w

PMC ID: 8219515

PubMed ID: 34159484

SO-VID: b401fcd0-e59a-44d4-80f3-47e6e5a28aeb

License:

This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.

History

Date received : 31 March 2021

Date accepted : 14 June 2021
