New machine learning and physics-based scoring functions for drug discovery

Abstract

Scoring functions are essential for modern in silico drug discovery. However, accurately predicting binding affinity with scoring functions remains a challenging task, and their performance varies widely across target classes. Scoring functions built on precise physics-based descriptors that better represent the protein–ligand recognition process are therefore strongly needed. We developed a set of new empirical scoring functions, named DockTScore, that explicitly combine physics-based terms with machine learning. Target-specific scoring functions were developed for two important classes of drug targets: proteases and protein–protein interactions (PPIs), whose modulators represent an original class of molecules for drug discovery. Multiple linear regression (MLR), support vector machine (SVM) and random forest (RF) algorithms were employed to derive general and target-specific scoring functions involving optimized MMFF94S force-field terms, solvation and lipophilic interaction terms, and an improved term accounting for the contribution of ligand torsional entropy to binding. The DockTScore scoring functions proved competitive with the current best-evaluated scoring functions in terms of binding energy prediction and ranking on four DUD-E datasets, and are expected to be useful for in silico drug design for diverse proteins as well as for specific targets such as proteases and PPIs. Currently, the MLR version of DockTScore is available at www.dockthor.lncc.br.
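The abstract describes deriving empirical scoring functions by regressing binding affinities on physics-based descriptors. To illustrate the MLR variant only, here is a minimal sketch that fits a linear scoring function, predicted affinity = w0 + sum(w_i * x_i), by ordinary least squares. The descriptor names (E_vdW, E_elec, N_rot), the weights and the data are hypothetical and synthetic. They only loosely mirror the force-field, electrostatic and torsional-entropy categories named above and are not DockTScore's actual terms, coefficients, training set or fitting protocol.

```python
from typing import List, Sequence


def solve_linear(M: List[List[float]], b: List[float]) -> List[float]:
    """Solve M w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(M[i]) + [b[i]] for i in range(n)]  # augmented matrix [M | b]
    for col in range(n):
        # Swap in the row with the largest pivot to limit round-off error.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: descriptors are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Eliminate the entries below the pivot.
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper-triangular system.
    w = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(aug[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (aug[i][n] - s) / aug[i][i]
    return w


def fit_mlr(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Least-squares fit; returns [intercept, w_1, ..., w_k]."""
    A = [[1.0, *row] for row in X]  # leading column of ones fits the intercept
    p = len(A[0])
    # Normal equations: (A^T A) w = A^T y.
    ata = [[sum(r[i] * r[j] for r in A) for j in range(p)] for i in range(p)]
    aty = [sum(r[i] * t for r, t in zip(A, y)) for i in range(p)]
    return solve_linear(ata, aty)


def predict(w: Sequence[float], x: Sequence[float]) -> float:
    """Score one complex: intercept plus the weighted sum of its descriptors."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))


# Hypothetical descriptors per complex: [E_vdW, E_elec, N_rot] (illustrative only).
X = [
    [-30.0, -10.0, 2.0],
    [-45.0, -5.0, 5.0],
    [-20.0, -25.0, 1.0],
    [-60.0, -15.0, 8.0],
    [-35.0, -30.0, 3.0],
]
# Synthetic "experimental" affinities generated from known weights, so the fit
# should recover those weights up to floating-point round-off.
true_w = [-1.0, 0.2, 0.05, 0.3]
y = [predict(true_w, x) for x in X]

weights = fit_mlr(X, y)
print([round(v, 4) for v in weights])
```

Solving the normal equations is the simplest route, but it can be numerically fragile when descriptors are strongly correlated. In practice, a QR-based solver such as `numpy.linalg.lstsq` is the usual choice. The SVM and RF variants mentioned in the abstract would replace the linear model with a nonlinear regressor (for example scikit-learn's `SVR` or `RandomForestRegressor`) trained on the same descriptor matrix.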
