Explainable deep drug–target representations for binding affinity prediction

Abstract

Background

Several computational advances have been achieved in the drug discovery field, promoting the identification of novel drug–target interactions and new leads. However, most of these methodologies have overlooked the importance of explaining the decision-making process of deep learning architectures. In this study, we explore how reliably convolutional neural networks (CNNs) identify regions relevant for binding, specifically binding sites and motifs. We also assess the significance of the extracted deep representations by explaining the model’s decisions in terms of the input regions that contributed the most to the prediction. We use an end-to-end deep learning architecture to predict binding affinity, in which CNNs automatically identify and extract discriminating deep representations from 1D sequential and structural data.
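
To make the idea of extracting features from a 1D sequence and tracing them back to input regions concrete, the following is a minimal pure-Python sketch. It is not the authors' architecture or their explanation method, neither of which is specified in this abstract. It assumes a one-hot encoding, randomly initialised filters (untrained), and global max pooling, and it attributes each pooled feature to the sequence window that produced the maximum. The toy sequence, filter width, and function names are all hypothetical.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot(sequence, alphabet=AMINO_ACIDS):
    """Encode a string as a list of one-hot rows (unknown symbols become all zeros)."""
    index = {ch: i for i, ch in enumerate(alphabet)}
    rows = []
    for ch in sequence:
        row = [0.0] * len(alphabet)
        if ch in index:
            row[index[ch]] = 1.0
        rows.append(row)
    return rows


def conv1d_relu(x, kernel):
    """Valid 1D convolution of x (length x channels) with one kernel (width x channels), then ReLU."""
    width = len(kernel)
    out = []
    for start in range(len(x) - width + 1):
        total = 0.0
        for k in range(width):
            for c in range(len(kernel[k])):
                total += x[start + k][c] * kernel[k][c]
        out.append(max(total, 0.0))
    return out


def global_max_pool(activations):
    """Return (max value, position of the max) for one filter's activation map."""
    best = max(range(len(activations)), key=lambda i: activations[i])
    return activations[best], best


def extract_features(sequence, kernels):
    """One pooled feature per filter, plus the (start, end) window that produced it."""
    x = one_hot(sequence)
    features, windows = [], []
    for kernel in kernels:
        value, pos = global_max_pool(conv1d_relu(x, kernel))
        features.append(value)
        windows.append((pos, pos + len(kernel)))
    return features, windows


# Hypothetical setup: random (untrained) filters and a made-up toy sequence.
rng = random.Random(0)
WIDTH, N_FILTERS = 4, 3
kernels = [
    [[rng.uniform(-1, 1) for _ in AMINO_ACIDS] for _ in range(WIDTH)]
    for _ in range(N_FILTERS)
]
sequence = "MKTAYIAKQRQISFVKSHFSRQ"
features, windows = extract_features(sequence, kernels)
for value, (start, end) in zip(features, windows):
    print(f"feature={value:.3f} from residues {start}-{end - 1}: {sequence[start:end]}")
```

In a trained model the filters would be learned, and the windows recovered this way could be compared against annotated binding sites or motifs. Gradient-based attribution is a common alternative to this max-pool trace.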

Results

The results demonstrate the effectiveness of the deep representations extracted from CNNs in the prediction of drug–target interactions. CNNs were found to identify and extract features from regions relevant for the interaction: the weights associated with these regions were in the range of those with the highest positive influence on the CNNs' prediction. Compared to baseline approaches, the end-to-end deep learning model achieved the highest performance both in predicting binding affinity and in correctly distinguishing the rank order of interaction strengths.
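
The two evaluation criteria mentioned above, affinity prediction error and rank-order correctness, can be illustrated with a short sketch. The abstract does not name the metrics used, so the choice of mean squared error and the concordance index here is an assumption (both are common in binding affinity benchmarks), and the function names are hypothetical.

```python
def mean_squared_error(y_true, y_pred):
    """Average squared difference between measured and predicted affinities."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def concordance_index(y_true, y_pred):
    """Fraction of comparable pairs whose predicted order matches the true order.

    Pairs with equal true values are skipped; tied predictions count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(y_true)
    for i in range(n):
        for j in range(i + 1, n):
            if y_true[i] == y_true[j]:
                continue  # not comparable: no true ordering to recover
            comparable += 1
            true_diff = y_true[i] - y_true[j]
            pred_diff = y_pred[i] - y_pred[j]
            if pred_diff == 0:
                concordant += 0.5
            elif (true_diff > 0) == (pred_diff > 0):
                concordant += 1.0
    if comparable == 0:
        raise ValueError("no comparable pairs: all true values are equal")
    return concordant / comparable
```

A concordance index of 1.0 means every pair is ranked correctly, 0.5 corresponds to random ordering, and 0.0 means every pair is inverted.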

Conclusions

This research study validates the potential applicability of an end-to-end deep learning architecture in the context of drug discovery beyond the confined space of proteins and ligands with determined 3D structure. Furthermore, it shows the reliability of the deep representations extracted from the CNNs by providing explainability to the decision-making process.

Supplementary Information

The online version contains supplementary material available at 10.1186/s12859-022-04767-y.

Author and article information

Contributors

Nelson R. C. Monteiro: nelsonrcm@dei.uc.pt

Journal

Journal ID (nlm-ta): BMC Bioinformatics

Journal ID (iso-abbrev): BMC Bioinformatics

Title: BMC Bioinformatics

Publisher: BioMed Central (London)

ISSN (Electronic): 1471-2105

Publication date (Electronic): 17 June 2022

Publication date PMC-release: 17 June 2022

Publication date Collection: 2022

Volume: 23

Electronic Location Identifier: 237

Affiliations

[1] GRID grid.8051.c, ISNI 0000 0000 9511 4342, Univ Coimbra, Centre for Informatics and Systems of the University of Coimbra, Department of Informatics Engineering, Coimbra, Portugal

[2] GRID grid.8051.c, ISNI 0000 0000 9511 4342, BSIM Therapeutics, Instituto Pedro Nunes, Coimbra, Portugal

[3] GRID grid.7311.4, ISNI 0000 0001 2323 6065, IEETA, Department of Electronics, Telecommunications and Informatics, University of Aveiro, Aveiro, Portugal

Article

Publisher ID: 4767

DOI: 10.1186/s12859-022-04767-y

PMC ID: 9204982

SO-VID: 46d1ca2b-05d3-488a-9861-f265a875988b

License:

Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

History

Date received: 2 February 2022

Date accepted: 25 May 2022

Funding

Funded by: FundRef http://dx.doi.org/10.13039/501100001871, Fundação para a Ciência e a Tecnologia

Award ID: 2020.04741.BD

Award ID: CENTRO-01-0145-FEDER-029266

Award Recipient: Nelson R. C. Monteiro, Carlos J. V. Simões, Maryam Abbasi, José L. Oliveira, Joel P. Arrais

Custom metadata

ScienceOpen disciplines: Bioinformatics & Computational biology

Keywords: drug–target interaction, binding affinity, explainable deep learning, convolutional neural network
