17
views
0
recommends
+1 Recommend
0 collections
    0
    shares
      • Record: found
      • Abstract: found
      • Article: found
      Is Open Access

      CLAMP – a toolkit for efficiently building customized clinical natural language processing pipelines

      research-article

      Read this article at

      Bookmark
          There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

          Abstract

          Existing general clinical natural language processing (NLP) systems such as MetaMap and Clinical Text Analysis and Knowledge Extraction System have been successfully applied to information extraction from clinical text. However, end users often have to customize existing systems for their individual tasks, which can require substantial NLP skills. Here we present CLAMP (Clinical Language Annotation, Modeling, and Processing), a newly developed clinical NLP toolkit that provides not only state-of-the-art NLP components, but also a user-friendly graphic user interface that can help users quickly build customized NLP pipelines for their individual applications. Our evaluation shows that the CLAMP default pipeline achieved good performance on named entity recognition and concept encoding. We also demonstrate the efficiency of the CLAMP graphic user interface in building customized, high-performance NLP pipelines with 2 use cases, extracting smoking status and lab test values. CLAMP is publicly available for research use, and we believe it is a unique asset for the clinical NLP community.

          Related collections

          Most cited references30

          • Record: found
          • Abstract: found
          • Article: not found

          A simple algorithm for identifying negated findings and diseases in discharge summaries.

          Narrative reports in medical records contain a wealth of information that may augment structured data for managing patient information and predicting trends in diseases. Pertinent negatives are evident in text but are not usually indexed in structured databases. The objective of the study reported here was to test a simple algorithm for determining whether a finding or disease mentioned within narrative medical reports is present or absent. We developed a simple regular expression algorithm called NegEx that implements several phrases indicating negation, filters out sentences containing phrases that falsely appear to be negation phrases, and limits the scope of the negation phrases. We compared NegEx against a baseline algorithm that has a limited set of negation phrases and a simpler notion of scope. In a test of 1235 findings and diseases in 1000 sentences taken from discharge summaries indexed by physicians, NegEx had a specificity of 94.5% (versus 85.3% for the baseline), a positive predictive value of 84.5% (versus 68.4% for the baseline) while maintaining a reasonable sensitivity of 77.8% (versus 88.3% for the baseline). We conclude that with little implementation effort a simple regular expression algorithm for determining whether a finding or disease is absent can identify a large portion of the pertinent negatives from discharge summaries.
            Bookmark
            • Record: found
            • Abstract: found
            • Article: not found

            What can natural language processing do for clinical decision support?

            Computerized clinical decision support (CDS) aims to aid decision making of health care providers and the public by providing easily accessible health-related information at the point and time it is needed. natural language processing (NLP) is instrumental in using free-text information to drive CDS, representing clinical knowledge and CDS interventions in standardized formats, and leveraging clinical narrative. The early innovative NLP research of clinical narrative was followed by a period of stable research conducted at the major clinical centers and a shift of mainstream interest to biomedical NLP. This review primarily focuses on the recently renewed interest in development of fundamental NLP methods and advances in the NLP systems for CDS. The current solutions to challenges posed by distinct sublanguages, intended user groups, and support goals are discussed.
              Bookmark
              • Record: found
              • Abstract: found
              • Article: not found

              Extracting medication information from clinical text.

              The Third i2b2 Workshop on Natural Language Processing Challenges for Clinical Records focused on the identification of medications, their dosages, modes (routes) of administration, frequencies, durations, and reasons for administration in discharge summaries. This challenge is referred to as the medication challenge. For the medication challenge, i2b2 released detailed annotation guidelines along with a set of annotated discharge summaries. Twenty teams representing 23 organizations and nine countries participated in the medication challenge. The teams produced rule-based, machine learning, and hybrid systems targeted to the task. Although rule-based systems dominated the top 10, the best performing system was a hybrid. Of all medication-related fields, durations and reasons were the most difficult for all systems to detect. While medications themselves were identified with better than 0.75 F-measure by all of the top 10 systems, the best F-measure for durations and reasons were 0.525 and 0.459, respectively. State-of-the-art natural language processing systems go a long way toward extracting medication names, dosages, modes, and frequencies. However, they are limited in recognizing duration and reason fields and would benefit from future research.
                Bookmark

                Author and article information

                Journal
                J Am Med Inform Assoc
                J Am Med Inform Assoc
                jamia
                Journal of the American Medical Informatics Association : JAMIA
                Oxford University Press
                1067-5027
                1527-974X
                March 2018
                24 November 2017
                24 November 2017
                : 25
                : 3
                : 331-336
                Affiliations
                [1 ]School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA
                [2 ]Department of Pharmaceutical Care and Health System, University of Minnesota Twin Cities, Minneapolis, MN, USA
                [3 ]Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
                Author notes
                Corresponding Author: Hua Xu, 7000 Fannin, Suite 600, Houston, TX 77030, USA. E-mail: Hua.Xu@ 123456uth.tmc.edu . Phone:713.500.3924
                Article
                ocx132
                10.1093/jamia/ocx132
                7378877
                29186491
                a7a0220f-29f6-40ed-8033-c8e31bc8f667
                © The Author 2017. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com

                This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License ( http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial reuse, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com

                History
                : 4 May 2017
                : 28 September 2017
                : 19 October 2017
                Page count
                Pages: 6
                Funding
                Funded by: NIH, DOI 10.13039/100000002;
                Funded by: National Institute of General Medical Sciences, DOI 10.13039/100000057;
                Award ID: GM102282 and GM103859
                Funded by: National Cancer Institute, DOI 10.13039/100000054;
                Award ID: CA194215
                Funded by: Cancer Prevention and Research Institute of Texas, DOI 10.13039/100004917;
                Award ID: R1307
                Categories
                Brief Communication

                Bioinformatics & Computational biology
                natural language processing,machine learning,clinical text processing

                Comments

                Comment on this article