      Is Open Access

      Molecular set transformer: attending to the co-crystals in the Cambridge structural database


          Abstract

          Molecular set transformer is a deep learning architecture for scoring molecular pairs found in co-crystals, whilst tackling the class imbalance problem observed on datasets that include only successful synthetic attempts.


          In this paper we introduce molecular set transformer, a PyTorch-based deep learning architecture designed to solve the molecular pair scoring task whilst tackling the class imbalance problem observed in datasets extracted from databases that report only successful synthetic attempts. Our models are trained on all the existing molecular pairs that form co-crystals and are deposited in the Cambridge Structural Database (CSD). Given any new molecular combination, the primary objective of the tool is to select the most effective way to represent the pair and then assign a score coupled with an uncertainty estimate. Molecular set transformer is an attention-based framework that learns the important interactions in the various molecular combinations by attempting to reconstruct its input through minimization of its bidirectional loss. Several methods of representing the input were tested, both fixed and learnt; the Graph Neural Network (GNN) and Extended-Connectivity Fingerprint (ECFP4) molecular representations performed best, showing an overall accuracy higher than 75% on previously unseen data. The trustworthiness of the models is enhanced by adding uncertainty estimates, which aim to help chemists prioritize, at the early materials design stage, both the pairs with high scores and low uncertainty and the pairs with low scores and high uncertainty. Our results indicate that the method can achieve comparable or better performance on specific active pharmaceutical ingredients (APIs) for which the accuracy of other computational chemistry and machine learning tools is reported in the literature. To help visualize and gain further insight into all the co-crystals deposited in the CSD, we developed an interactive browser-based explorer (https://csd-cocrystals.herokuapp.com/). An online Graphical User Interface (GUI) has also been designed to enable wider use of our models for rapid in silico co-crystal screening, reporting the score and uncertainty of any user-supplied molecular pair (https://share.streamlit.io/katerinavr/streamlit/app.py).
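
The prioritization rule described in the abstract (high score with low uncertainty, or low score with high uncertainty) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not say how uncertainty is computed, so the sketch simply takes the standard deviation of repeated scores for the same pair; the function names `summarize` and `triage`, the cut-off values, and the example numbers are all hypothetical and are not taken from the authors' code or results.

```python
from statistics import mean, stdev


def summarize(scores):
    """Return (mean score, sample standard deviation) for repeated scores of one pair.

    Assumption: uncertainty is approximated by the spread of repeated predictions.
    """
    return mean(scores), stdev(scores)


def triage(score, uncertainty, score_cut=0.5, unc_cut=0.1):
    """Label a pair using the two priority groups named in the abstract.

    The cut-offs are arbitrary illustrative values, not values from the paper.
    """
    if score >= score_cut and uncertainty <= unc_cut:
        return "priority: high score, low uncertainty"
    if score < score_cut and uncertainty > unc_cut:
        return "priority: low score, high uncertainty"
    return "lower priority"


# Made-up repeated scores for three hypothetical pairs (not real model output).
predictions = {
    "pair_A": [0.91, 0.89, 0.90, 0.92],
    "pair_B": [0.10, 0.45, 0.30, 0.05],
    "pair_C": [0.20, 0.21, 0.19, 0.20],
}

labels = {}
for name, scores in predictions.items():
    score, uncertainty = summarize(scores)
    labels[name] = triage(score, uncertainty)
    print(f"{name}: score={score:.3f}, uncertainty={uncertainty:.3f} -> {labels[name]}")
```

In this toy setup, a pair that scores consistently high lands in the first group, a pair whose low score fluctuates strongly lands in the second, and a pair that scores consistently low is deprioritized.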


                Author and article information

                Journal
                Digital Discovery (DDIIAI)
                Royal Society of Chemistry (RSC)
                ISSN: 2635-098X
                Published: December 05 2022
                Volume 1, Issue 6, Pages 834-850
                Affiliations
                [1] Department of Chemistry and Materials Innovation Factory, University of Liverpool, 51 Oxford Street, Liverpool L7 3NY, UK
                [2] Leverhulme Research Centre for Functional Materials Design, University of Liverpool, Oxford Street, Liverpool L7 3NY, UK
                [3] Cambridge Crystallographic Data Centre, 12 Union Road, Cambridge CB2 1EZ, UK
                [4] Materials Innovation Factory and Computer Science Department, University of Liverpool, Liverpool L69 3BX, UK
                Article
                DOI: 10.1039/D2DD00068G
                © 2022
                License: http://creativecommons.org/licenses/by/3.0/
