      Open Access

      From Molecular Descriptors to Intrinsic Fish Toxicity of Chemicals: An Alternative Approach to Chemical Prioritization

      research-article


          Abstract

          The European and U.S. chemical agencies have listed approximately 800k chemicals for which knowledge of potential risks to human health and the environment is lacking. Filling these data gaps experimentally is impossible, so in silico approaches and predictions are essential. Many existing models are, however, limited by assumptions (e.g., linearity and continuity) and small training sets. In this study, we present a supervised direct classification model that connects molecular descriptors to toxicity. Categories can either be data-driven (using k-means clustering) or defined by regulation. This approach was tested on 907 experimentally determined 96 h LC50 values for acute fish toxicity. Our classification model explained ≈90% of the variance in our data for the training set and ≈80% for the test set. This strategy gave a 5-fold decrease in the frequency of incorrect categorization compared with a quantitative structure–activity relationship (QSAR) regression model. Our model was subsequently employed to predict the toxicity categories of ≈32k chemicals. A comparison between the model-based applicability domain (AD) and the training set-based AD suggested that the training set-based AD is a more adequate way to avoid extrapolation when using such models. The better performance of our direct classification model compared with QSAR methods makes this approach a viable tool for assessing the hazards and risks of chemicals.
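The abstract states that toxicity categories can either be derived from the data with k-means clustering or taken from regulation. The following is a minimal Python sketch of both options. It assumes that categories are built on log10-transformed 96 h LC50 values in mg/L. The function names, the deterministic quantile initialisation, and the GHS-style acute thresholds of 1, 10, and 100 mg/L are all illustrative assumptions, not the authors' published settings.

```python
import bisect
import math
from typing import List, Sequence, Tuple

# ASSUMPTION: GHS-style acute aquatic thresholds (mg/L): Acute 1 <= 1, Acute 2 <= 10,
# Acute 3 <= 100; anything above 100 mg/L falls into a final "not classified" bin.
ASSUMED_REGULATORY_THRESHOLDS_MG_L = (1.0, 10.0, 100.0)


def kmeans_1d(values: Sequence[float], k: int, iters: int = 100) -> Tuple[List[float], List[int]]:
    """Lloyd's k-means on scalar values; returns (centroids, labels) ordered by centroid."""
    n = len(values)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of values")
    xs = sorted(values)
    # Deterministic start: one centroid in the middle of each of k equal-count slices.
    centroids = [xs[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: each value joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: abs(v - centroids[j])) for v in values]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = sum(members) / len(members)
    # Relabel so that category 0 has the lowest centroid (the most toxic group).
    order = sorted(range(k), key=lambda j: centroids[j])
    rank = {old: new for new, old in enumerate(order)}
    return [centroids[j] for j in order], [rank[lab] for lab in labels]


def data_driven_categories(lc50_mg_l: Sequence[float], k: int) -> List[int]:
    """Hypothetical helper: cluster log10(LC50) into k categories, 0 = most toxic."""
    logs = [math.log10(x) for x in lc50_mg_l]
    _, labels = kmeans_1d(logs, k)
    return labels


def regulatory_category(
    lc50_mg_l: float, thresholds: Sequence[float] = ASSUMED_REGULATORY_THRESHOLDS_MG_L
) -> int:
    """Index of the first threshold the LC50 does not exceed (0 = most toxic)."""
    return bisect.bisect_left(thresholds, lc50_mg_l)


if __name__ == "__main__":
    lc50 = [0.1, 0.2, 0.15, 50.0, 60.0, 55.0]
    print(data_driven_categories(lc50, k=2))  # [0, 0, 0, 1, 1, 1]
    print([regulatory_category(x) for x in (0.5, 5.0, 100.0, 150.0)])  # [0, 1, 2, 3]
```

The sketch clusters log10(LC50) rather than raw LC50 because acute toxicity values span several orders of magnitude. On the raw scale, the few least toxic chemicals would dominate the squared distances.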

          Synopsis

          In this study, a machine learning-based alternative to conventional quantitative structure–activity relationship models is used to assign chemicals to toxicity categories by direct classification from molecular descriptors.
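Direct classification maps molecular descriptors straight to a toxicity category. It does not first regress LC50 and bin the prediction afterward. The sketch below uses a plain k-nearest-neighbour vote purely for illustration, because this record does not name the classifier the authors used. It also adds one possible training-set-based applicability domain (AD) check of the kind the abstract compares: a query is accepted only if its nearest training chemical is no farther away than the largest nearest-neighbour distance observed within the training set. Descriptor vectors are assumed to be numeric and already standardised, and all function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_category(train_x: List[Vector], train_y: List[int], query: Vector, k: int = 3) -> int:
    """Hypothetical direct classifier: majority vote of the k nearest training chemicals."""
    nearest = sorted(range(len(train_x)), key=lambda i: euclidean(train_x[i], query))[:k]
    votes = Counter(train_y[i] for i in nearest)
    top = max(votes.values())
    # ASSUMPTION: ties go to the lower-numbered (more toxic) category as a conservative choice.
    return min(label for label, count in votes.items() if count == top)


def training_set_ad_threshold(train_x: List[Vector]) -> float:
    """Largest leave-one-out nearest-neighbour distance inside the training set."""
    if len(train_x) < 2:
        raise ValueError("need at least two training chemicals")
    return max(
        min(euclidean(xi, xj) for j, xj in enumerate(train_x) if j != i)
        for i, xi in enumerate(train_x)
    )


def in_training_set_ad(train_x: List[Vector], query: Vector, threshold: float) -> bool:
    """True if the query lies no farther from the training data than the threshold."""
    return min(euclidean(x, query) for x in train_x) <= threshold


if __name__ == "__main__":
    X = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    y = [0, 0, 0, 2, 2, 2]
    t = training_set_ad_threshold(X)  # 1.0
    print(knn_category(X, y, (0.5, 0.5)))  # 0
    print(in_training_set_ad(X, (5, 5), t))  # False: flag as extrapolation
```

With this design, the AD depends only on where the training chemicals lie in descriptor space and not on the fitted model. That is the distinction between a training set-based AD and a model-based AD.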


                Author and article information

                Journal
                Environmental Science & Technology (Environ Sci Technol)
                American Chemical Society
                ISSN: 0013-936X (print), 1520-5851 (electronic)
                Published online: 08 December 2022
                Issue date: 21 November 2023
                Volume 57, Issue 46 (Data Science for Advancing Environmental Science, Engineering, and Technology), pp. 17950–17958
                Affiliations
                Van ’t Hoff Institute for Molecular Sciences (HIMS), University of Amsterdam (UvA), 1090 GD Amsterdam, The Netherlands
                UvA Data Science Center, University of Amsterdam, 1090 GD Amsterdam, The Netherlands
                Queensland Alliance for Environmental Health Sciences (QAEHS), The University of Queensland, Brisbane, QLD 4072, Australia
                Norwegian Institute for Water Research (NIVA), NO-0579 Oslo, Norway
                Institute for Biodiversity and Ecosystem Dynamics (IBED), University of Amsterdam, 1090 GD Amsterdam, The Netherlands
                Author information
                https://orcid.org/0000-0001-8270-6979
                https://orcid.org/0000-0001-9336-9656
                https://orcid.org/0000-0002-2155-100X
                https://orcid.org/0000-0003-0197-0116
                Article
                DOI: 10.1021/acs.est.2c07353
                PMCID: PMC10666547
                PMID: 36480454
                © 2022 The Authors. Published by American Chemical Society

                Permits the broadest form of re-use, including for commercial purposes, provided that author attribution and integrity are maintained (https://creativecommons.org/licenses/by/4.0/).

                History
                : 07 October 2022
                : 28 November 2022
                : 27 November 2022
                Funding
                Funded by: Queensland Health, doi 10.13039/100010230
                Funded by: University of Amsterdam Data Science Centre
                Funded by: National Health and Medical Research Council, doi 10.13039/501100000925; Award ID: EL1 2009209
                Funded by: Australian Research Council, doi 10.13039/501100000923; Award ID: DP190102476

                Subject: General environmental science
                Keywords: machine learning, LC50, QSAR, toxicity categorization, hazard assessment
