INTRODUCTION
Classical structure-based drug design, which centers on a single receptor and its interaction with a ligand, has been increasingly criticized for presenting a simplistic view of cellular systems. By focusing exclusively on a single target protein, this approach fails to account for the complexity of biological networks in which multiple proteins interact within cellular pathways. 1 Consequently, the single-target model overlooks the broader context of disease mechanisms, leading to incomplete assessments of drug efficacy and safety. This narrow focus also ignores off-target effects, which can be detrimental or beneficial to therapeutic outcomes. 2 A more holistic understanding of cellular processes is essential in complex diseases such as cancer, neurological disorders, and autoimmune diseases, where entire protein networks and pathways are disrupted. There is an increasing demand for advanced methodologies that can model the interconnected nature of these systems, enabling the identification of multiple drug targets and the prediction of system-wide effects, and ultimately improving treatment efficacy and reducing side effects. 3
To overcome the limitations of traditional protein models, which oversimplify protein interactions, developing interaction networks is crucial. 4 These networks offer a more comprehensive and realistic view of the cell, enabling researchers to capture the intricate dependencies and regulatory mechanisms between proteins. By simulating these processes, scientists can gain deeper insights into how drugs affect both individual targets and entire protein networks, which is critical for understanding the broader implications of therapeutic interventions. 5 These simulations are particularly valuable for modeling diseases driven by complex protein interactions, such as cancer or neurodegenerative disorders, where multiple signaling pathways are disrupted. 6,7
Given the limitations of current target-based models and their inability to fully capture the complexity of cellular networks, drug repurposing offers a practical alternative. By utilizing drugs with known safety profiles, repurposing allows researchers to bypass some of the challenges posed by incomplete models and provides a faster, more efficient route to treatments for complex diseases. Drug repurposing has become an essential strategy in modern pharmacology because it is a more cost-effective pathway to drug development. This approach leverages existing data on a drug’s safety, pharmacokinetics, and pharmacodynamics, significantly reducing the time and resources needed to bring treatments to market. 8,9 Additionally, drug repurposing has proven particularly valuable in addressing urgent medical needs, such as rare diseases, cancer, and emerging infectious diseases, where time is critical. 10 The pre-existing regulatory approval of repurposed drugs also facilitates quicker clinical trials and regulatory processes, enhancing their appeal in research and industry settings. Well-known successes in repurposing, such as the use of sildenafil for pulmonary hypertension and thalidomide for multiple myeloma, illustrate the growing importance of this strategy in modern drug discovery. 11
Advanced modeling techniques, such as multistate Markov models, can further enhance the understanding of drug–protein interactions. These models address the limitations of traditional approaches by capturing the dynamic and stochastic nature of protein behavior. Unlike static models, which capture only a single state, these models simulate the transitions between various molecular states over time, making them ideal for capturing the dynamic behavior of biological systems. 12–14 By allowing for real-time mapping of protein conformational changes and drug effects, multistate Markov models offer a more accurate understanding of how drugs interact with their targets within the cellular environment. 15 This continuous updating of molecular characteristics reflects the transient and fluctuating nature of protein structures, which is essential for predicting drug efficacy in real-world biological systems. The flexibility of this approach enables the identification of intermediate states that are often missed in static models, providing valuable insights into drug–protein interactions that are critical for more precise and effective drug discovery. 16–18
To construct comprehensive protein interaction networks and improve the accuracy of drug–protein interaction models, we integrate two computational approaches: an electrotopological descriptor for molecular similarity and a tool for binding pocket comparison. The electrotopological descriptor calculates molecular similarity by analyzing structural and topological features, leveraging three-dimensional (3D) atomic arrangements to enable precise comparisons of molecular geometries, an advantage particularly valuable in dynamic environments like binding sites. 19,20 Meanwhile, the binding pocket comparison tool provides a robust analysis of binding site similarity through water-mapping-based molecular dynamics (MD), exploring the spatial and volumetric properties of binding pockets with a focus on hydrophilic and hydrophobic regions often overlooked by static methods. 21 The integration of these methods supports the construction of dynamic protein interaction networks that capture real-time molecular behavior. By combining molecular similarity assessments with binding site analysis, we improve the precision of drug repurposing efforts, providing deeper insights into protein–ligand interactions and facilitating the identification of new therapeutic opportunities. 22
METHODS
Molecular Similarity Calculations
The electrotopological descriptor is a novel method designed to compute molecular similarity by considering the atomistic topology and geometry of molecules. Unlike traditional 2D-based methods, this approach uses 3D representations of molecules, allowing for a more accurate assessment of molecular similarities, particularly in dynamic environments like protein binding sites. We encode both the atomistic topology and geometry of each atom to precisely measure intermolecular distances. The algorithm starts by computing the atomic properties and positioning the molecule at the center of a sphere. Each atom is then projected onto the sphere’s surface, creating a 2D shadow of the atom on the opposite side. The location of each shadow point is determined by converting the atom’s Cartesian coordinates (x, y, z) into spherical coordinates (see Figure 1). The algorithm calculates pairwise distances between the atoms of two molecules, adjusting these distances based on atomistic properties such as charge and atomic weight. This property-based adjustment is essential when molecules undergo conformational changes, especially in dynamic binding environments. By incorporating a normalization based on the hyperbolic tangent function, the electrotopological descriptor ensures consistent similarity assessments across different molecular topologies. 23

Figure 1. Representation of mapped atomic positions derived from 3D Cartesian coordinates. Each atom’s properties (xa, xb, xc, etc.) are mapped onto specific atomic points, encoding key atomic attributes. The angles (ϕa and ψa) represent the angular coordinates of the atoms within the molecular structure, capturing their spatial relationships. The atom a′ is a 2D mapping of the original atom a. 2D, two-dimensional; 3D, three-dimensional.
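The projection and normalization steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the published implementation: the function names, the nearest-neighbor matching of atoms, the additive property penalty, and the `scale` parameter are hypothetical choices. Only the centering of the molecule, the Cartesian-to-spherical conversion, the property-adjusted pairwise distances, and the hyperbolic tangent normalization are taken from the description.

```python
import numpy as np


def to_spherical(coords):
    """Convert (N, 3) Cartesian coordinates to (r, theta, phi) about the centroid."""
    xyz = np.asarray(coords, dtype=float)
    xyz = xyz - xyz.mean(axis=0)            # place the molecule at the sphere's center
    r = np.linalg.norm(xyz, axis=1)
    safe_r = np.where(r > 0, r, 1.0)        # avoid division by zero for an atom at the center
    theta = np.arccos(np.clip(xyz[:, 2] / safe_r, -1.0, 1.0))  # polar angle in [0, pi]
    phi = np.arctan2(xyz[:, 1], xyz[:, 0])                     # azimuth in [-pi, pi]
    return r, theta, phi


def project_to_unit_sphere(coords):
    """Map every atom to its 'shadow' point on the surface of a unit sphere."""
    _, theta, phi = to_spherical(coords)
    return np.column_stack([
        np.sin(theta) * np.cos(phi),
        np.sin(theta) * np.sin(phi),
        np.cos(theta),
    ])


def similarity(coords_a, props_a, coords_b, props_b, scale=1.0):
    """Assumed score: 1 - tanh(mean best-match cost), which lies in (0, 1]."""
    pa = project_to_unit_sphere(coords_a)
    pb = project_to_unit_sphere(coords_b)
    qa = np.asarray(props_a, dtype=float)
    qb = np.asarray(props_b, dtype=float)
    # Pairwise distances between projected atoms of the two molecules, shape (Na, Nb).
    geometric = np.linalg.norm(pa[:, None, :] - pb[None, :, :], axis=-1)
    # Hypothetical adjustment: penalize mismatched atomic properties (e.g. mass or charge).
    mismatch = np.abs(qa[:, None] - qb[None, :])
    cost = geometric + scale * mismatch
    # Symmetric average of each atom's cheapest match in the other molecule.
    score = 0.5 * (cost.min(axis=1).mean() + cost.min(axis=0).mean())
    return 1.0 - np.tanh(score)
```

Under these assumptions, comparing a molecule with itself gives a similarity of 1, and the score decreases toward 0 as geometry or atomic properties diverge. Because the coordinates are centered first, the score does not depend on where the molecule sits in space.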
Binding Site Similarity Analysis
The binding pocket comparison tool uses MD simulations to explore the spatial and volumetric properties of protein binding pockets. As shown in Figure 2, by introducing water molecules into the MD simulation, this tool maps the dynamic behavior of binding sites, 24 including hydrophilic regions often overlooked by static methods. The algorithm iteratively calculates the binding pocket’s volume and hydrophilic characteristics, providing a comprehensive analysis of the binding site’s geometry and flexibility.

Figure 2. The binding pocket comparison tool’s process for analyzing binding sites using molecular dynamics. Initial setup with water molecules in the binding pocket, followed by a 10-ns MD simulation (top left). The network of hydrogen bonds among water molecules within the pocket highlights hydrophilic interactions (center left). A refined water distribution map identifying key hydration zones (center right). Final binding pocket structure, showing hydrophilic and hydrophobic regions essential for evaluating binding affinity and drug interactions (far right).
The binding pocket comparison tool employs MD simulations to map receptors’ active sites comprehensively. By introducing water molecules into the system, the method fully exploits their ability to occupy cavities through cohesive hydrogen-bonding networks. These networks capture essential details about the geometry and topology of the binding site, providing a richer characterization than simple volumetric analyses.
Multiple water networks are generated during all-atom MD simulations of the hydrated receptor, corresponding to different snapshots along the simulation trajectory. A simulation length of 10 ns has been reported to be sufficient to map the binding site. Averaging the probability of water occupancy across these snapshots allows the tool to integrate geometric, topological, and dynamic information about the binding pocket. This approach characterizes the receptor’s binding site not as a static structure but as a dynamic entity, reflecting its flexibility and interaction potential.
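The averaging of water occupancy across snapshots can be illustrated with a simple grid-based sketch. The voxel grid, the function name `water_occupancy`, and its parameters (`origin`, `spacing`, `shape`) are assumptions made for illustration; the text does not specify how the tool discretizes the pocket.

```python
import numpy as np


def water_occupancy(snapshots, origin, spacing, shape):
    """Fraction of snapshots in which each grid voxel holds at least one water.

    snapshots: iterable of (N_i, 3) water-oxygen coordinates, one per MD frame.
    origin, spacing, shape: hypothetical grid definition covering the pocket.
    """
    snapshots = list(snapshots)
    if not snapshots:
        raise ValueError("at least one snapshot is required")
    origin = np.asarray(origin, dtype=float)
    dims = np.asarray(shape)
    counts = np.zeros(shape, dtype=float)
    for waters in snapshots:
        xyz = np.asarray(waters, dtype=float).reshape(-1, 3)
        idx = np.floor((xyz - origin) / spacing).astype(int)   # voxel index per water
        inside = np.all((idx >= 0) & (idx < dims), axis=1)      # drop waters outside the grid
        occupied = np.zeros(shape, dtype=bool)
        occupied[tuple(idx[inside].T)] = True                   # mark each voxel once per frame
        counts += occupied
    return counts / len(snapshots)
```

A voxel that contains a water molecule in every frame receives an occupancy of 1, while a voxel visited in half of the frames receives 0.5, so the resulting map reflects both the shape of the pocket and how persistently each region is hydrated.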
In addition, the tool introduces a novel volume- and topology-based similarity metric for comparing binding pockets across proteins. This innovation enables the identification of structurally similar binding sites that may accommodate existing drugs, offering new opportunities for drug repurposing and therapeutic innovation.
Hydrophobic interactions, often associated with regions favoring the binding of nonpolar molecules, are frequently misunderstood. These interactions are best described as van der Waals forces arising from temporary and induced dipoles. In aqueous environments, where water molecules exhibit a strong dipole moment, these interactions are enhanced due to significant dipole-induced effects.
The preference of nonpolar molecules for hydrophobic sites does not stem solely from direct interaction strength but is primarily driven by the entropic contribution of bulk water molecules. Upon binding, water molecules are displaced from the cavity, resulting in an entropic gain stabilizing the interaction. Mapping binding cavities with water molecules provides a detailed and exhaustive exploration of their geometry and enables the creation of cohesive water networks. These networks can be represented and analyzed as graphs, offering a robust framework to understand and compare binding site properties across different proteins.
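As a rough illustration of how a water network can be represented as a graph, the sketch below links water oxygens that lie within a distance cutoff and groups them into connected clusters. The 3.5 Å oxygen–oxygen cutoff is a commonly used geometric proxy for a hydrogen bond and is an assumption here, as are the function names; a fuller criterion would also check the donor–hydrogen–acceptor angle.

```python
import numpy as np


def water_network(oxygens, cutoff=3.5):
    """Adjacency list linking water oxygens closer than `cutoff` (assumed, in angstroms)."""
    xyz = np.asarray(oxygens, dtype=float).reshape(-1, 3)
    dist = np.linalg.norm(xyz[:, None, :] - xyz[None, :, :], axis=-1)
    n = len(xyz)
    return {i: [j for j in range(n) if j != i and dist[i, j] <= cutoff] for i in range(n)}


def connected_components(graph):
    """Group nodes into clusters of mutually reachable waters (depth-first search)."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(sorted(component))
    return components
```

Graph-level quantities such as the number and size of connected clusters or the degree of each node could then serve as simple topological features when comparing pockets.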
Integrating Heterogeneous Knowledge Graphs for Protein Interaction Networks
To capture the complex nature of protein interactions, we constructed a network by integrating data from a wide range of sources, including literature and molecular, genomic, structural, and ontological databases. The following databases were utilized in this framework: MeSH, 25 Drugs@FDA, 26 RxList, 27 DrugBank, 28 PubChem, 29 MedlinePlus, 30 PubMed, 31 UMLS, 32 NCBI, 33 UniProt, 34 GO Ontology, GO Gene, GO Annotation, 35 Reactome, 36 KEGG, 37 and RCSB PDB. 38 This comprehensive integration allows for the identification of deep correlations among proteins, their structures, and associated pathways. By combining heterogeneous knowledge graphs with tools for molecular similarity and binding pocket comparison, the network integrates various perspectives on protein function and interaction.
The workflow begins with a semantic literature analysis to create a disease-specific knowledge graph encompassing relationships among diseases, genes, drugs, and other biological entities. This foundation is enhanced with molecular similarity data and binding pocket comparison analyses, enabling the refinement of protein interaction networks. For example, edges in the graph are reinforced for proteins with structurally similar active sites, emphasizing their functional relevance, while the overall interaction weights are normalized to maintain network consistency.
This comprehensive approach leverages over 60 million publications from PubMed and data from molecular and genomic repositories, creating a unified framework to explore the intricate interplay of molecular and cellular components associated with a given disease.
We integrate molecular similarity data from the electrotopological descriptor and the binding pocket comparison tool alongside gene expression data to refine the protein interaction network. Gene expression levels are obtained from publicly available sources, such as TCGA, to establish baseline activity for genes encoding the proteins of interest. The electrotopological descriptor computes pairwise molecular similarities based on proteins’ 3D atomic topology and geometry, generating a similarity matrix. We filter out weaker interactions by applying a threshold to this matrix, focusing on the most significant relationships, which are normalized for consistency. The retained similarity data are used to strengthen interactions between proteins that share strong molecular interactions with common molecules, applying a boosting approach similar to the one used for proteins with structurally similar active sites. This iterative process of reinforcement and normalization yields a biologically relevant, pruned protein interaction network.
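One pass of the reinforcement, normalization, and pruning described above might look like the following sketch. The multiplicative boost, the max-based normalization, and every parameter value (`sim_threshold`, `boost`, `prune_below`) are assumptions chosen for illustration, since the text does not give the exact update rule; the gene expression weighting is omitted for brevity.

```python
import numpy as np


def refine_network(weights, similarity, sim_threshold=0.6, boost=0.5, prune_below=0.1):
    """One reinforcement/normalization/pruning pass over a non-negative weight matrix.

    weights:    (N, N) interaction weights between proteins.
    similarity: (N, N) pairwise similarity scores in [0, 1].
    """
    w = np.asarray(weights, dtype=float).copy()
    s = np.asarray(similarity, dtype=float)
    strong = s >= sim_threshold              # keep only significant similarities
    w[strong] *= 1.0 + boost * s[strong]     # reinforce edges between similar proteins
    np.fill_diagonal(w, 0.0)                 # ignore self-interactions
    peak = w.max()
    if peak > 0:
        w /= peak                            # rescale so the strongest edge has weight 1
    w[w < prune_below] = 0.0                 # prune weak interactions
    return w
```

Repeating the pass with the binding pocket similarity matrix in place of the molecular similarity matrix would apply the same boosting to proteins with similar active sites.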
RESULTS
As a proof of concept, we applied the integrated approach of electrotopological descriptors, a binding pocket comparison tool, and heterogeneous knowledge graph integration to a set of established drug–disease systems: hepatocellular carcinoma and diabetes mellitus, along with their associated drugs. This approach allowed us to dynamically map disease-associated protein networks and evaluate the impact of various drug interactions on cellular processes. The electrotopological descriptors enabled precise molecular similarity calculations, while the binding pocket comparison tool provided detailed insights into binding site dynamics.
In an initial case study, we focused on a known drug repurposing candidate for cancer treatment and used the electrotopological descriptors to identify a structurally similar compound with potential therapeutic effects. The descriptor was applied to a set of 153 clinically approved antineoplastic small molecules retrieved from the ChEMBL database 39 to perform similarity-based drug repurposing against tyrosine kinase, which plays a pivotal role in the development of hepatocellular carcinoma, renal cancer, and thyroid cancer. 40 The drug sorafenib has been reported to actively inhibit the tyrosine kinase receptor. 41 It suppresses tumor growth by inhibiting the RAF/MEK/ERK signaling pathway, which is crucial for cell proliferation and survival. Additionally, it reduces tumor angiogenesis by blocking the vascular endothelial growth factor receptor (VEGFR) and platelet-derived growth factor receptor (PDGFR), key receptors involved in the formation and stabilization of blood vessels. 42 To identify a drug similar to sorafenib among the retrieved molecules, we performed the electrotopological descriptor-based similarity assessment with sorafenib as the reference molecule, using atomistic formal charge and mass properties for the intermolecular distance calculation. The results indicated that regorafenib showed the highest similarity to sorafenib, with a similarity value of 0.97 (Figure 3). The literature also confirms that regorafenib is a tyrosine kinase inhibitor 45,46 and is very similar to sorafenib. 47

Figure 3. Scatter plot illustrating molecular similarity (similarity > 0.60) based on atomic formal charge and atomic mass for compounds compared to sorafenib in vacuum. Regorafenib exhibited the highest similarity to sorafenib. Notably, capmatinib and alpelisib demonstrated similarity scores exceeding 0.80. The literature evidence further indicates that both drugs are reported inhibitors of tyrosine kinase. 43,44
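The screening step behind this plot amounts to scoring every candidate against the reference molecule, keeping those above the 0.60 cutoff, and ranking them. A minimal sketch is shown below; the function name `rank_candidates` is hypothetical, and apart from the regorafenib value of 0.97 reported in the text, the scores in the usage example are placeholders rather than results.

```python
def rank_candidates(scores, threshold=0.60):
    """Return (name, score) pairs above `threshold`, highest similarity first."""
    hits = [(name, score) for name, score in scores.items() if score > threshold]
    return sorted(hits, key=lambda item: item[1], reverse=True)


# Usage: only the regorafenib value comes from the text; the rest are placeholders.
example_scores = {
    "regorafenib": 0.97,
    "placeholder_compound_A": 0.72,
    "placeholder_compound_B": 0.41,
}
ranked = rank_candidates(example_scores)
```

In practice, the dictionary of scores would be filled by running the similarity calculation for each of the retrieved molecules against sorafenib.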
The binding pocket comparison tool confirmed the compound’s ability to bind to the target protein’s binding site effectively, and the integration of heterogeneous knowledge graphs illustrated the complex interactions between the drug and various proteins within the network. Traditional drug discovery approaches often rely on identifying known target protein inhibitors and searching molecular databases for structurally similar compounds to evaluate their potency against the same target. However, this approach overlooks a significant portion of the chemical space, potentially missing promising molecules with therapeutic potential. The proposed protocol overcomes these limitations by mapping proteins with similar binding pockets and their associated known inhibitors. These inhibitors can then be leveraged to analyze their binding potential and activity against the target of interest. This strategy broadens the exploration of chemical libraries, enabling the identification of novel compounds that may have been overlooked using conventional methods, thereby enhancing the discovery of new therapeutic candidates. Compared to traditional static models such as CASTp, 48 the integration of protein dynamics-based network analysis provides a more comprehensive understanding of the drug’s effects within the complex network of cellular interactions.
Upon establishing this knowledge graph, a tool for binding pocket comparison is employed to analyze target proteins with structural or functional similarities, focusing on their binding sites to identify interaction patterns. This is complemented by electrotopological descriptors to calculate molecular similarities, facilitating the identification of compounds with analogous therapeutic potential based on their structural attributes. The final graph (Figure 4) integrates gene expression data and structural and functional similarities, offering a more accurate model of protein interactions within cellular systems. This refined network is instrumental in elucidating disease mechanisms, identifying drug targets, and predicting the broader effects of therapeutic interventions.

Figure 4. Network graph illustrating key molecular and genetic components associated with hepatocellular carcinoma and diabetes mellitus. The central pink node represents the disease, while the cyan circles indicate genes prominently involved in the disease mechanism. Dark purple circles represent drugs used in therapeutic interventions. The thickness of the edges indicates the strength of interactions, highlighting the complex interconnectivity between genes, molecules, and drugs relevant to hepatocellular carcinoma and diabetes mellitus treatment. The graph can be accessed at https://newroad.biovista.com/#!bv_gid=179e4d5784d7407bd321faa1542120fb
CONCLUSION
This workflow highlights the transformative potential of computational methodologies for drug repurposing through a framework that integrates molecular similarity, binding site comparison, and semantic analysis of the literature. By drawing on diverse sources and databases, including molecular structures, binding site information, and literature-based insights, the framework enables a unified, systematic, and data-driven exploration of drug–disease interactions, offering a significant advancement in computational tools for drug discovery. Beyond its immediate applications, this methodology underscores the evolving landscape of drug repurposing, where the integration of semantic analysis and artificial intelligence (AI) introduces both unprecedented opportunities and complex challenges. One of the most significant advancements is the ability to process and synthesize vast amounts of heterogeneous data, providing previously inaccessible insights. Additionally, this framework facilitates the validation of data accuracy and reliability by cross-referencing multiple sources and leveraging computational power for deeper analysis. However, it also raises challenges in ensuring the accuracy, relevance, and contextual interpretation of such data, particularly when they originate from unstructured sources like scientific literature.
The interaction between semantic search and AI further shifts the role of the researcher. Rather than being a data consumer, the researcher must act as an interpreter and integrator, navigating rapidly expanding possibilities. This requires a deep understanding of the underlying scientific principles and the ability to critically evaluate and validate computational predictions. The need for interdisciplinary expertise becomes apparent, as bridging computational tools with experimental validation is essential to translate predictions into actionable insights.
Integrating diverse datasets and sophisticated algorithms demands significant computational power, raising questions about scalability and accessibility. Optimizing workflows to ensure seamless integration, computational efficiency, and data validation is crucial for overcoming these challenges and enhancing scalability. Finally, this study demonstrates that while computational methods significantly improve the scope and depth of drug repurposing efforts, their full potential is yet to be realized. The ability to uncover novel therapeutic applications hinges on the platform’s robustness and the researcher’s ability to adapt to this new paradigm. The interplay between advanced computational tools and human expertise will ultimately define the success of such integrative approaches, paving the way for more innovative and effective strategies in drug discovery.