1
views
0
recommends
+1 Recommend
0 collections
    0
    shares
      • Record: found
      • Abstract: found
      • Article: found
      Is Open Access

      De novo drug design as GPT language modeling: large chemistry models with supervised and reinforcement learning

      research-article

      Read this article at

          There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

          Abstract

          In recent years, generative machine learning algorithms have been successful in designing innovative drug-like molecules. SMILES is a sequence-like language used in most effective drug design models. Due to data’s sequential structure, models such as recurrent neural networks and transformers can design pharmacological compounds with optimized efficacy. Large language models have advanced recently, but their implications on drug design have not yet been explored. Although one study successfully pre-trained a large chemistry model (LCM), its application to specific tasks in drug discovery is unknown. In this study, the drug design task is modeled as a causal language modeling problem. Thus, the procedure of reward modeling, supervised fine-tuning, and proximal policy optimization was used to transfer the LCM to drug design, similar to Open AI’s ChatGPT and InstructGPT procedures. By combining the SMILES sequence with chemical descriptors, the novel efficacy evaluation model exceeded its performance compared to previous studies. After proximal policy optimization, the drug design model generated molecules with 99.2% having efficacy pIC 50 > 7 towards the amyloid precursor protein, with 100% of the generated molecules being valid and novel. This demonstrated the applicability of LCMs in drug discovery, with benefits including less data consumption while fine-tuning. The applicability of LCMs to drug discovery opens the door for larger studies involving reinforcement-learning with human feedback, where chemists provide feedback to LCMs and generate higher-quality molecules. LCMs’ ability to design similar molecules from datasets paves the way for more accessible, non-patented alternatives to drug molecules.

          Related collections

          Most cited references20

          • Record: found
          • Abstract: not found
          • Article: not found

          SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules

            Bookmark
            • Record: found
            • Abstract: found
            • Article: not found

            Innovation in the pharmaceutical industry: New estimates of R&D costs.

            The research and development costs of 106 randomly selected new drugs were obtained from a survey of 10 pharmaceutical firms. These data were used to estimate the average pre-tax cost of new drug and biologics development. The costs of compounds abandoned during testing were linked to the costs of compounds that obtained marketing approval. The estimated average out-of-pocket cost per approved new compound is $1395 million (2013 dollars). Capitalizing out-of-pocket costs to the point of marketing approval at a real discount rate of 10.5% yields a total pre-approval cost estimate of $2558 million (2013 dollars). When compared to the results of the previous study in this series, total capitalized costs were shown to have increased at an annual rate of 8.5% above general price inflation. Adding an estimate of post-approval R&D costs increases the cost estimate to $2870 million (2013 dollars).
              Bookmark
              • Record: found
              • Abstract: found
              • Article: found
              Is Open Access

              Deep reinforcement learning for de novo drug design

              We introduce an artificial intelligence approach to de novo design of molecules with desired physical or biological properties.
                Bookmark

                Author and article information

                Contributors
                yeeeyee004@gmail.com
                Journal
                J Comput Aided Mol Des
                J Comput Aided Mol Des
                Journal of Computer-Aided Molecular Design
                Springer International Publishing (Cham )
                0920-654X
                1573-4951
                22 April 2024
                22 April 2024
                2024
                : 38
                : 1
                : 20
                Affiliations
                Columbia Grammar & Preparatory School, New York, NY USA
                Article
                559
                10.1007/s10822-024-00559-z
                11035455
                38647700
                b7939932-55fc-4727-baed-d81b3fdbcb1e
                © The Author(s) 2024

                Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

                History
                : 8 December 2023
                : 22 March 2024
                Categories
                Article
                Custom metadata
                © Springer Nature Switzerland AG 2024

                Biomedical engineering
                large language model,gpt,large chemistry model,de novo drug design,reinforcement learning,efficacy optimization

                Comments

                Comment on this article