
      “MS-Ready” structures for non-targeted high-resolution mass spectrometry screening studies

      research-article


          Abstract

          Chemical database searching has become a fixture in many non-targeted identification workflows based on high-resolution mass spectrometry (HRMS). However, the form of a chemical structure observed in HRMS does not always match the form stored in a database (e.g., the neutral form versus a salt, or one component of a mixture rather than the mixture form used in a consumer product). Linking the form of a structure observed via HRMS to its related form(s) within a database enables all relevant variants of a structure, together with their associated metadata, to be returned in a single query. A Konstanz Information Miner (KNIME) workflow has been developed to produce the structural representations observed using HRMS ("MS-Ready structures") and to link them to those stored in a database. These MS-Ready structures, and the associated mappings to the full chemical representations, are surfaced via the US EPA's CompTox Chemistry Dashboard (https://comptox.epa.gov/dashboard/). This article describes the workflow for generating and linking ~700,000 MS-Ready structures (derived from ~760,000 original structures), as well as the download, search and export capabilities that support structure identification using HRMS. The importance of this form of structural representation for HRMS is demonstrated with several examples, including integration with the in silico fragmentation software application MetFrag. The structures, search, download and export functionality are all available through the CompTox Chemistry Dashboard, while the MetFrag implementation can be viewed at https://msbi.ipb-halle.de/MetFragBeta/.
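          To make the linking idea concrete, here is a minimal, hypothetical sketch in Python (standard library only). It is not the KNIME workflow described in the article. It covers only the two cases named above: splitting a mixture into its components and discarding a counterion such as HCl. It does this with plain string operations on SMILES. The counterion list, the function names `ms_ready_components`, `build_index` and `lookup`, and the record IDs are all illustrative assumptions. Charge neutralization, stereochemistry handling and SMILES canonicalization are deliberately omitted, so inputs must already be in a consistent canonical form for matching to work.

```python
from __future__ import annotations

from collections import defaultdict

# Hypothetical, deliberately tiny set of fragments treated as counterions or
# solvents (SMILES). A real workflow would rely on a curated list and a
# cheminformatics toolkit rather than exact string matching.
COUNTERIONS = {"Cl", "Br", "[Na+]", "[K+]", "[Cl-]", "O"}


def ms_ready_components(smiles: str) -> list[str]:
    """Split a dot-disconnected SMILES into components, dropping counterions.

    Toy illustration only: no neutralization, stereo removal or
    canonicalization is performed; duplicate components are kept once.
    """
    kept: list[str] = []
    for part in smiles.split("."):
        if part in COUNTERIONS or part in kept:
            continue
        kept.append(part)
    return kept


def build_index(records: dict[str, str]) -> dict[str, set[str]]:
    """Map each MS-Ready component to the IDs of every original record containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for record_id, smiles in records.items():
        for component in ms_ready_components(smiles):
            index[component].add(record_id)
    return dict(index)


def lookup(index: dict[str, set[str]], ms_ready_smiles: str) -> list[str]:
    """Return all original records linked to one observed (MS-Ready) form."""
    return sorted(index.get(ms_ready_smiles, set()))


if __name__ == "__main__":
    # Hypothetical records: a free base, its hydrochloride salt, a mixture
    # containing it, and a second, unrelated compound.
    records = {
        "rec1": "CCN(CC)CC",      # triethylamine
        "rec2": "CCN(CC)CC.Cl",   # triethylamine hydrochloride
        "rec3": "CCO.CCN(CC)CC",  # mixture with ethanol
        "rec4": "CCO",            # ethanol
    }
    index = build_index(records)
    print(lookup(index, "CCN(CC)CC"))  # ['rec1', 'rec2', 'rec3']
    print(lookup(index, "CCO"))        # ['rec3', 'rec4']
```

          The design choice that matters here is indexing each retained component separately. Because of this, a single query for the form observed by HRMS also returns the salt and mixture records that contain it. Several original records can collapse onto one MS-Ready key.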

          Electronic supplementary material

          The online version of this article (10.1186/s13321-018-0299-2) contains supplementary material, which is available to authorized users.


                Author and article information

                Contributors
                mceachran.andrew@epa.gov (tel. 1-919-541-3001)
                kamel.mansouri@nih.gov
                grulke.chris@epa.gov
                emma.schymanski@uni.lu
                christoph.ruttkies@ipb-halle.de
                williams.antony@epa.gov (tel. 1-919-541-1033)
                Journal
                Journal of Cheminformatics (J Cheminform)
                Springer International Publishing (Cham)
                ISSN: 1758-2946
                Published: 30 August 2018
                Volume: 10
                Article: 45
                Affiliations
                [1] Oak Ridge Institute for Science and Education (ORISE) Research Participation Program, U.S. Environmental Protection Agency, 109 T.W. Alexander Dr., Research Triangle Park, NC 27711 USA
                [2] National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency, Mail Drop D143-02, 109 T.W. Alexander Dr., Research Triangle Park, NC 27711 USA
                [3] Present Address: Integrated Laboratory Systems, Inc., 601 Keystone Dr., Morrisville, NC 27650 USA
                [4] Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, 6, avenue du Swing, 4367 Belvaux, Luxembourg
                [5] Department of Stress and Development Biology, Leibniz Institute of Plant Biochemistry (IPB), Weinberg 3, 06120 Halle (Saale), Germany
                Article
                DOI: 10.1186/s13321-018-0299-2
                PMCID: PMC6117229
                PMID: 30167882
                © The Author(s) 2018

                Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

                History
                Received: 16 May 2018
                Accepted: 21 August 2018
                Funding
                Funded by: EU FP7 SOLUTIONS (Award ID: 603437)
                Funded by: EU H2020 PhenoMeNal (Award ID: 654241)
                Categories
                Methodology

                Subject: Chemoinformatics
                Keywords: high-resolution mass spectrometry (HRMS), structure identification, structure curation, database searching
