      Is Open Access

      A review on machine learning approaches and trends in drug discovery

      review-article
      Computational and Structural Biotechnology Journal
      Research Network of Computational and Structural Biotechnology
      Keywords: Machine Learning, Drug Discovery, Cheminformatics, QSAR, Molecular Descriptors, Deep Learning
      Abbreviations: ML, Machine Learning; AI, Artificial Intelligence; SMILES, simplified molecular-input line-entry system; DNA, Deoxyribonucleic acid; RNA, Ribonucleic Acid; PCA, Principal Component Analysis; t-SNE, t-Distributed Stochastic Neighbor Embedding; FS, Feature Selection; CV, Cross Validation; QSAR, Quantitative structure–activity relationship; MD, Molecular Descriptors; FP, Fingerprints; ECFP, Extended Connectivity Fingerprints; MACCS, Molecular ACCess System; APFP, Atom Pairs 2d FingerPrint; CDK, Chemistry Development Kit; SVM, Support Vector Machines; ANN, Artificial Neural Networks; NB, Naive Bayes; FNN, Fully Connected Neural Networks; CNN, Convolutional Neural Networks; GNN, Graph Neural Networks; GCN, Graph Convolutional Networks; ADMET, Absorption, distribution, metabolism, elimination and toxicity; ADR, Adverse Drug Reaction; CPI, Compound-protein interaction; CNS, Central Nervous System; BBB, Blood–Brain barrier; KEGG, Kyoto Encyclopedia of Genes and Genomes; WHO, World Health Organization; AUC, Area under the Curve; GEO, Gene Expression Omnibus; FDA, Food and Drug Administration; MKL, Multiple Kernel Learning; OOB, Out of Bag; TCGA, The Cancer Genome Atlas; GO, Gene Ontology; MCC, Matthews correlation coefficient; RF, Random Forest; DL, Deep Learning


          Highlights

          • Machine Learning in drug discovery has greatly benefited the pharmaceutical industry.

          • Application of machine learning algorithms must entail a robust design in real clinical tasks.

          • Trending machine learning algorithms in drug design: NB, SVM, RF and ANN.

          Abstract

          Drug discovery aims at finding new compounds with specific chemical properties for the treatment of diseases. In recent years, this search has come to rely heavily on computer science, driven by the rapid growth of machine learning techniques and their democratization. Given the objectives set by the Precision Medicine initiative and the new challenges it has generated, robust, standard and reproducible computational methodologies are needed to achieve them. Predictive models based on Machine Learning have gained great importance in the step prior to preclinical studies, a stage that can drastically reduce the cost and duration of research in the discovery of new drugs. This review article focuses on how these methodologies have been used in recent years of research. Analyzing the state of the art in this field gives an idea of where cheminformatics will develop in the short term, the limitations it presents and the positive results it has achieved. The review focuses mainly on the methods used to model the molecular data, as well as the biological problems addressed and the Machine Learning algorithms used for drug discovery in recent years.
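
As a rough illustration of the kind of predictive model the abstract refers to, the sketch below fits a Bernoulli Naive Bayes classifier (NB is one of the algorithms named in the highlights) to binary bit vectors standing in for molecular fingerprints. It is not taken from the article: the function names, the 4-bit vectors and the active/inactive labels are all made up for illustration, and a real study would use a cheminformatics toolkit, far larger data and cross-validation.

```python
import math

def train_bernoulli_nb(X, y, alpha=1.0):
    """Fit a Bernoulli Naive Bayes model on binary feature vectors.

    X: list of equal-length lists of 0/1 bits (toy stand-ins for fingerprint bits).
    y: list of class labels (here 1 = active, 0 = inactive; hypothetical).
    alpha: Laplace smoothing constant.
    """
    n_bits = len(X[0])
    model = {}
    for label in sorted(set(y)):
        rows = [x for x, lab in zip(X, y) if lab == label]
        # Smoothed probability that each bit is set within this class.
        bit_prob = [
            (sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
            for j in range(n_bits)
        ]
        # Store the log class prior together with the per-bit probabilities.
        model[label] = (math.log(len(rows) / len(X)), bit_prob)
    return model

def predict(model, x):
    """Return the label with the highest log-posterior for bit vector x."""
    def log_posterior(label):
        log_prior, bit_prob = model[label]
        return log_prior + sum(
            math.log(p if bit else 1.0 - p) for bit, p in zip(x, bit_prob)
        )
    return max(model, key=log_posterior)

# Made-up training data: bits 0-1 tend to be set in "actives",
# bits 2-3 in "inactives".
X_train = [
    [1, 1, 0, 0], [1, 1, 0, 1], [1, 0, 0, 0],   # labelled active (1)
    [0, 0, 1, 1], [0, 0, 1, 0], [0, 1, 1, 1],   # labelled inactive (0)
]
y_train = [1, 1, 1, 0, 0, 0]

model = train_bernoulli_nb(X_train, y_train)
pred_a = predict(model, [1, 1, 0, 0])
pred_b = predict(model, [0, 0, 1, 1])
```

Working in log space avoids numerical underflow when the number of bits grows, and the Laplace smoothing keeps a bit that never appears in one class from forcing a zero probability.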


                Author and article information

                Journal
                Title: Computational and Structural Biotechnology Journal
                Abbreviated title: Comput Struct Biotechnol J
                Publisher: Research Network of Computational and Structural Biotechnology
                ISSN: 2001-0370
                Published: 12 August 2021
                Volume: 19
                Pages: 4538-4558
                Affiliations
                [a] Department of Computer Science and Information Technologies, Faculty of Computer Science, Universidade da Coruna, Campus Elviña s/n, A Coruña 15071, Spain
                [b] CITIC-Research Center of Information and Communication Technologies, Universidade da Coruna, A Coruña 15071, Spain
                [c] Department of Computer Science and Information Technologies, Faculty of Communication Science, Universidade da Coruna, Campus Elviña s/n, A Coruña 15071, Spain
                [d] Biomedical Informatics Group, Artificial Intelligence Department, Polytechnic University of Madrid, Calle de los Ciruelos, Boadilla del Monte, Madrid 28660, Spain
                [e] Grupo de Redes de Neuronas Artificiales y Sistemas Adaptativos. Imagen Médica y Diagnóstico Radiológico (RNASA-IMEDIR), Complexo Hospitalario Universitario de A Coruña (CHUAC), SERGAS, Universidade da Coruña, Instituto de Investigación Biomédica de A Coruña (INIBIC), A Coruña, Spain
                Author notes
                [*] Corresponding author at: Department of Computer Science and Information Technologies, Faculty of Computer Science, Universidade da Coruna, Campus Elviña s/n, A Coruña 15071, Spain. carlos.fernandez@udc.es
                Article
                Publisher ID (PII): S2001-0370(21)00342-1
                DOI: 10.1016/j.csbj.2021.08.011
                PMC ID: 8387781
                PubMed ID: 34471498
                ScienceOpen record ID: 41f8beb8-867f-4a77-9087-09f4e0368b2d
                © 2021 The Author(s)

                This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

                History
                Received: 1 March 2021
                Revised: 6 August 2021
                Accepted: 6 August 2021
                Categories
                Review

