      Detection of IUPAC and IUPAC-like chemical names

      research-article


          Abstract

          Motivation: Chemical compounds such as small signal molecules and other biologically active substances are an important entity class in life science publications and patents. Several representations and nomenclatures for chemicals exist, including SMILES, InChI, IUPAC names and trivial names. Only SMILES and InChI allow a direct structure search, but in biomedical texts trivial names and IUPAC-like names are used more frequently. While trivial names can be found with a dictionary-based approach and thereby mapped to their corresponding structures, it is not possible to enumerate all IUPAC names. In this work, we present a new machine learning approach based on conditional random fields (CRF) to find mentions of IUPAC and IUPAC-like names in scientific text, together with its evaluation and the conversion rate achieved with available name-to-structure tools.

          Results: We present an IUPAC name recognizer with an F1 measure of 85.6% on a MEDLINE corpus. The evaluation of different CRF orders and offset conjunction orders demonstrates the importance of these parameters. An evaluation of hand-selected patent sections containing large enumerations and terms with mixed nomenclature shows good performance on these cases (F1 measure 81.5%). The remaining recognition problems lie in detecting the correct borders of the typically long terms, especially when they occur in parentheses or enumerations. We demonstrate the scalability of our implementation by providing results from a full MEDLINE run.

          Availability: We plan to publish the corpora, the annotation guideline, and the conditional random field model as a UIMA component.

          Contact: roman.klinger@scai.fraunhofer.de
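
          The F1 measures quoted in the Results can be related to the border problem with a small sketch. The abstract does not state the exact evaluation protocol, so the code below assumes exact-match scoring on character spans, which is a common convention for entity recognition but is an assumption here. The function name `f1_exact_match` and the example spans are hypothetical and are not taken from the paper.

```python
from typing import Set, Tuple

# A mention is represented as (start, end) character offsets, end exclusive.
Span = Tuple[int, int]


def f1_exact_match(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1), counting a prediction as correct
    only if both of its borders match a gold mention exactly."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: three gold mentions, three predictions.
# The second prediction ends one character too early (wrong right border).
gold = {(0, 21), (30, 52), (60, 75)}
predicted = {(0, 21), (30, 51), (60, 75)}

precision, recall, f1 = f1_exact_match(gold, predicted)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

          Under this assumed scoring, a single wrong border costs twice: the truncated prediction is a false positive and the gold mention it missed is a false negative. This is one reason border detection on long names can weigh heavily on an F1 measure.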


                Author and article information

                Journal
                Title: Bioinformatics
                Publisher: Oxford University Press
                ISSN: 1367-4803, 1460-2059
                Published: 1 July 2008
                Volume: 24
                Issue: 13
                Pages: i268-i276
                Affiliations
                1Fraunhofer Institute Algorithms and Scientific Computing (SCAI), Department of Bioinformatics, Schloss Birlinghoven, 53574 Sankt Augustin and 2Bonn-Aachen International Center for Information Technology (B-IT), Department of Applied Life Science Informatics, Dahlmannstrasse 2, D-53113 Bonn, Germany
                Author notes
                *To whom correspondence should be addressed.
                Article
                btn181
                10.1093/bioinformatics/btn181
                2718657
                18586724
                ae3e33b9-07a7-44b9-9c48-cc96ef2b6f64
                © 2008 The Author(s)

                This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License ( http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

                Categories
                ISMB 2008 Conference Proceedings, 19–23 July 2008, Toronto
                Original Papers
                Text Mining

                Bioinformatics & Computational biology