      Is Open Access

      DrugReAlign: a multisource prompt framework for drug repurposing based on large language models

      research-article


          Abstract

Drug repurposing is a promising approach in drug discovery owing to its efficiency and cost-effectiveness. Most current drug repurposing models rely on specific datasets for training, which limits their predictive accuracy and scope. The number of both market-approved and experimental drugs is vast, forming an extensive molecular space. Owing to limitations in parameter size and data volume, traditional drug-target interaction (DTI) prediction models struggle to generalize well within such a broad space. In contrast, large language models (LLMs), with their vast parameter counts and extensive training data, show clear advantages in drug repurposing tasks. We introduce DrugReAlign, a novel drug repurposing framework based on LLMs and multi-source prompt techniques, designed to exploit the potential of existing drugs efficiently. Leveraging LLMs, DrugReAlign acquires general knowledge about targets and drugs from extensive human knowledge bases, overcoming the data-availability limitations of traditional approaches. Furthermore, we collected target summaries and target-drug space interaction data from public databases as multi-source prompts, substantially improving LLM performance in drug repurposing. We validated the efficiency and reliability of the proposed framework through molecular docking and DTI datasets. Notably, our findings suggest a direct correlation between the accuracy of an LLM's target analysis and the quality of its prediction outcomes. These results indicate that the proposed framework could establish a new paradigm in drug repurposing.
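The multi-source prompt idea described in the abstract can be sketched as follows. This is an illustrative assumption of how a target summary and known target-drug interaction data might be assembled into a single prompt for an LLM; the function name, section headings, and example target are hypothetical and not taken from the DrugReAlign paper.

```python
def build_prompt(target_name, target_summary, known_interactions, candidate_drugs):
    """Assemble a multi-source prompt: a database-derived target summary plus
    known target-drug interactions, followed by a ranking question for the LLM.
    (Hypothetical layout; the paper's actual prompt format may differ.)"""
    interactions = "\n".join(
        f"- {drug}: {evidence}" for drug, evidence in known_interactions
    )
    candidates = "\n".join(f"- {d}" for d in candidate_drugs)
    return (
        f"Target: {target_name}\n\n"
        f"Target summary (from public databases):\n{target_summary}\n\n"
        f"Known drug-target interactions:\n{interactions}\n\n"
        f"Candidate approved drugs:\n{candidates}\n\n"
        "Based on the information above, which candidates are most likely to "
        "interact with this target? Answer with a ranked list and a brief rationale."
    )

# Hypothetical example target and drugs, for illustration only.
prompt = build_prompt(
    "3CLpro (SARS-CoV-2 main protease)",
    "Cysteine protease essential for viral polyprotein processing.",
    [("nirmatrelvir", "covalent inhibitor, approved")],
    ["lopinavir", "boceprevir", "simeprevir"],
)
print(prompt.splitlines()[0])  # → Target: 3CLpro (SARS-CoV-2 main protease)
```

The prompt text would then be sent to an LLM, and the returned ranking validated downstream (e.g., by molecular docking), as the abstract describes.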

          Supplementary Information

          The online version contains supplementary material available at 10.1186/s12915-024-02028-3.

          Related collections

Most cited references (27)


          AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading.

          AutoDock Vina, a new program for molecular docking and virtual screening, is presented. AutoDock Vina achieves an approximately two orders of magnitude speed-up compared with the molecular docking software previously developed in our lab (AutoDock 4), while also significantly improving the accuracy of the binding mode predictions, judging by our tests on the training set used in AutoDock 4 development. Further speed-up is achieved from parallelism, by using multithreading on multicore machines. AutoDock Vina automatically calculates the grid maps and clusters the results in a way transparent to the user. Copyright 2009 Wiley Periodicals, Inc.

            GROMACS: fast, flexible, and free.

This article describes the software suite GROMACS (Groningen MAchine for Chemical Simulation) that was developed at the University of Groningen, The Netherlands, in the early 1990s. The software, written in ANSI C, originates from a parallel hardware project, and is well suited for parallelization on processor clusters. By careful optimization of neighbor searching and of inner loop performance, GROMACS is a very fast program for molecular dynamics simulation. It does not have a force field of its own, but is compatible with GROMOS, OPLS, AMBER, and ENCAD force fields. In addition, it can handle polarizable shell models and flexible constraints. The program is versatile, as force routines can be added by the user, tabulated functions can be specified, and analyses can be easily customized. Nonequilibrium dynamics and free energy determinations are incorporated. Interfaces with popular quantum-chemical packages (MOPAC, GAMESS-UK, GAUSSIAN) are provided to perform mixed MM/QM simulations. The package includes about 100 utility and analysis programs. GROMACS is in the public domain and distributed (with source code and documentation) under the GNU General Public License. It is maintained by a group of developers from the Universities of Groningen, Uppsala, and Stockholm, and the Max Planck Institute for Polymer Research in Mainz. Its Web site is http://www.gromacs.org. (c) 2005 Wiley Periodicals, Inc.

              Language Models are Few-Shot Learners

Recent work has demonstrated substantial gains on many NLP tasks and benchmarks by pre-training on a large corpus of text followed by fine-tuning on a specific task. While typically task-agnostic in architecture, this method still requires task-specific fine-tuning datasets of thousands or tens of thousands of examples. By contrast, humans can generally perform a new language task from only a few examples or from simple instructions - something which current NLP systems still largely struggle to do. Here we show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches. Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model, and test its performance in the few-shot setting. For all tasks, GPT-3 is applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model. GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic. At the same time, we also identify some datasets where GPT-3's few-shot learning still struggles, as well as some datasets where GPT-3 faces methodological issues related to training on large web corpora. Finally, we find that GPT-3 can generate samples of news articles which human evaluators have difficulty distinguishing from articles written by humans. We discuss broader societal impacts of this finding and of GPT-3 in general.

                Author and article information

                Contributors
                zhuoninnin@163.com
                fxzheng@hkbu.edu.hk
                oriental-cds@163.com
Journal
BMC Biology (BMC Biol)
BioMed Central (London)
ISSN: 1741-7007
8 October 2024
Volume 22, Article 226
                Affiliations
[1] School of Data Science and Artificial Intelligence, Wenzhou University of Technology (https://ror.org/03dd7qj98), Wenzhou, 325027, China
[2] School of Chinese Medicine, Hong Kong Baptist University (https://ror.org/0145fw131), Hong Kong, 519087, China
[3] College of Computer Science and Electronic Engineering, Hunan University (https://ror.org/05htk5m33), Changsha, 410012, China
[4] Department of Computer Science, University of Tsukuba (https://ror.org/02956yf07), Tsukuba, 3058577, Japan
[5] Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China (https://ror.org/04qr3zq92), Chengdu, 611730, China
[6] Central South University, Hunan University, Changsha, 410083, China
Article
DOI: 10.1186/s12915-024-02028-3
PMCID: PMC11463036
PMID: 39379930
                © The Author(s) 2024

                Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

History
Received: 17 June 2024
Accepted: 1 October 2024
                Categories
                Research
                Custom metadata
                © BioMed Central Ltd., part of Springer Nature 2024

                Life sciences
drug repositioning, large language model, drug-target interactions, molecular docking
