
      Using clinical Natural Language Processing for health outcomes research: Overview and actionable suggestions for future advances

      research-article


          Abstract

          The importance of incorporating Natural Language Processing (NLP) methods in clinical informatics research has been increasingly recognized in recent years, and their adoption has led to transformative advances.

          Typically, clinical NLP systems are developed and evaluated on word, sentence, or document level annotations that model specific attributes and features, such as document content (e.g., patient status, or report type), document section types (e.g., current medications, past medical history, or discharge summary), named entities and concepts (e.g., diagnoses, symptoms, or treatments) or semantic attributes (e.g., negation, severity, or temporality).

          From a clinical perspective, on the other hand, research studies are typically modelled and evaluated at the patient or population level, for example predicting how a patient group might respond to specific treatments, or monitoring patients over time. While some NLP tasks consider predictions at the individual or group user level, these tasks still constitute a minority. Owing to the discrepancy between the scientific objectives of each field, and because of differences in methodological evaluation priorities, there is no clear alignment between these evaluation approaches.

          Here we provide a broad summary and outline of the challenging issues involved in defining appropriate intrinsic and extrinsic evaluation methods for NLP research that is to be used for clinical outcomes research, and vice versa. A particular focus is placed on mental health research, an area still relatively understudied by the clinical NLP research community, but where NLP methods are of notable relevance. Recent advances in clinical NLP method development have been significant, but we propose that more emphasis needs to be placed on rigorous evaluation if the field is to advance further. To enable this, we provide actionable suggestions, including a minimal protocol that could be used when reporting clinical NLP method development and its evaluation.
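To make the intrinsic/extrinsic distinction above concrete, the following minimal sketch contrasts annotation-level scoring of an NLP component with a patient-level evaluation of the features it extracts. It is not taken from the article: the data, feature layout, and use of scikit-learn are assumptions for illustration only.

# Hypothetical sketch: intrinsic vs. extrinsic evaluation of a clinical NLP component.
# All data below are invented; replace with real annotations and patient outcomes.
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, roc_auc_score

# Intrinsic evaluation: mention-level agreement with gold-standard annotations.
gold_mention_labels = [1, 0, 1, 1, 0, 0, 1]       # gold labels for candidate symptom mentions
predicted_mention_labels = [1, 0, 1, 0, 0, 1, 1]  # NLP system output for the same mentions
intrinsic_f1 = f1_score(gold_mention_labels, predicted_mention_labels)

# Extrinsic evaluation: do the extracted features help a patient-level prediction?
# Each row is one patient; columns are counts of NLP-extracted symptom mentions.
patient_features = [[3, 0], [0, 2], [5, 1], [0, 0], [2, 3], [1, 0]]
patient_outcomes = [1, 0, 1, 0, 1, 0]             # e.g., 1 = hospital readmission

clf = LogisticRegression().fit(patient_features, patient_outcomes)
extrinsic_auc = roc_auc_score(patient_outcomes, clf.predict_proba(patient_features)[:, 1])

print(f"Intrinsic mention-level F1: {intrinsic_f1:.2f}")
print(f"Extrinsic patient-level AUC (in-sample, illustrative only): {extrinsic_auc:.2f}")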


          Most cited references (62)


          Deep Patient: An Unsupervised Representation to Predict the Future of Patients from the Electronic Health Records

          Secondary use of electronic health records (EHRs) promises to advance clinical research and better inform clinical decision making. Challenges in summarizing and representing patient data prevent widespread practice of predictive modeling using EHRs. Here we present a novel unsupervised deep feature learning method to derive a general-purpose patient representation from EHR data that facilitates clinical predictive modeling. In particular, a three-layer stack of denoising autoencoders was used to capture hierarchical regularities and dependencies in the aggregated EHRs of about 700,000 patients from the Mount Sinai data warehouse. The result is a representation we name “deep patient”. We evaluated this representation as broadly predictive of health states by assessing the probability that patients would develop various diseases. We performed evaluation using 76,214 test patients comprising 78 diseases from diverse clinical domains and temporal windows. Our results significantly outperformed those achieved using representations based on raw EHR data and alternative feature learning strategies. Prediction performance for severe diabetes, schizophrenia, and various cancers was among the strongest. These findings indicate that deep learning applied to EHRs can derive patient representations that offer improved clinical predictions, and could provide a machine learning framework for augmenting clinical decision systems.
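The abstract above describes greedy, layer-wise training of a three-layer stack of denoising autoencoders over aggregated EHR features. The sketch below shows that general technique in PyTorch; the dimensions, masking-noise level, and training settings are illustrative assumptions and do not reproduce the original "deep patient" implementation.

# Minimal sketch of a stacked denoising autoencoder for patient vectors.
# Sizes, noise level, and epochs are illustrative assumptions only.
import torch
import torch.nn as nn

class DenoisingAutoencoder(nn.Module):
    def __init__(self, n_in, n_hidden):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_in, n_hidden), nn.Sigmoid())
        self.decoder = nn.Sequential(nn.Linear(n_hidden, n_in), nn.Sigmoid())

    def forward(self, x):
        noisy = x * (torch.rand_like(x) > 0.05).float()  # randomly mask ~5% of inputs
        hidden = self.encoder(noisy)
        return self.decoder(hidden), hidden

def train_layer(dae, data, epochs=10, lr=1e-3):
    opt = torch.optim.Adam(dae.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        reconstruction, _ = dae(data)
        loss = loss_fn(reconstruction, data)   # reconstruct the clean (uncorrupted) input
        opt.zero_grad()
        loss.backward()
        opt.step()
    with torch.no_grad():
        return dae.encoder(data)               # encoded output feeds the next layer

# Toy feature matrix: 100 patients x 200 features (values in [0, 1) stand in
# for normalized EHR-derived features).
patients = torch.rand(100, 200)

# Three layers trained greedily, one on top of the other.
representation = patients
for n_in, n_hidden in [(200, 128), (128, 64), (64, 32)]:
    dae = DenoisingAutoencoder(n_in, n_hidden)
    representation = train_layer(dae, representation)

print(representation.shape)  # final patient representation: (100, 32)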

            Validation of electronic medical record-based phenotyping algorithms: results and lessons learned from the eMERGE network.

            Genetic studies require precise phenotype definitions, but electronic medical record (EMR) phenotype data are recorded inconsistently and in a variety of formats. This article presents lessons learned about validation of EMR-based phenotypes from the Electronic Medical Records and Genomics (eMERGE) studies. The eMERGE network created and validated 13 EMR-derived phenotype algorithms. Network sites are Group Health, Marshfield Clinic, Mayo Clinic, Northwestern University, and Vanderbilt University. By validating EMR-derived phenotypes we learned that: (1) multisite validation improves phenotype algorithm accuracy; (2) targets for validation should be carefully considered and defined; (3) specifying time frames for review of variables eases validation time and improves accuracy; (4) using repeated measures requires defining the relevant time period and specifying the most meaningful value to be studied; (5) patient movement in and out of the health plan (transience) can result in incomplete or fragmented data; (6) the review scope should be defined carefully; (7) particular care is required in combining EMR and research data; (8) medication data can be assessed using claims, medications dispensed, or medications prescribed; (9) algorithm development and validation work best as an iterative process; and (10) validation by content experts or structured chart review can provide accurate results. Despite the diverse structure of the five EMRs of the eMERGE sites, we developed, validated, and successfully deployed 13 electronic phenotype algorithms. Validation is a worthwhile process that not only measures phenotype performance but also strengthens phenotype algorithm definitions and enhances their inter-institutional sharing.
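As a toy illustration of two of these lessons (per-site validation, lesson 1, against structured chart review, lesson 10), the sketch below computes per-site positive predictive value for a phenotype algorithm. The sites, counts, and choice of metric are invented for illustration and are not drawn from the eMERGE studies.

# Hypothetical sketch: per-site PPV of a phenotype algorithm against chart review.
from collections import defaultdict

# (site, algorithm_flagged_case, chart_review_confirmed) for reviewed patients
reviewed = [
    ("site_A", True, True), ("site_A", True, False), ("site_A", True, True),
    ("site_B", True, True), ("site_B", True, True), ("site_B", True, False),
]

counts = defaultdict(lambda: {"confirmed": 0, "flagged": 0})
for site, flagged, confirmed in reviewed:
    if flagged:
        counts[site]["flagged"] += 1
        if confirmed:
            counts[site]["confirmed"] += 1

for site, c in sorted(counts.items()):
    ppv = c["confirmed"] / c["flagged"]
    print(f"{site}: PPV = {ppv:.2f} ({c['confirmed']}/{c['flagged']} confirmed by chart review)")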

              What can natural language processing do for clinical decision support?

              Computerized clinical decision support (CDS) aims to aid the decision making of health care providers and the public by providing easily accessible health-related information at the point and time it is needed. Natural language processing (NLP) is instrumental in using free-text information to drive CDS, representing clinical knowledge and CDS interventions in standardized formats, and leveraging clinical narrative. The early innovative NLP research on clinical narrative was followed by a period of stable research conducted at the major clinical centers and a shift of mainstream interest to biomedical NLP. This review primarily focuses on the recently renewed interest in the development of fundamental NLP methods and on advances in NLP systems for CDS. The current solutions to challenges posed by distinct sublanguages, intended user groups, and support goals are discussed.

                Author and article information

                Journal
                Journal of Biomedical Informatics (J Biomed Inform)
                NLM ID: 100970413
                ISSN: 1532-0464, 1532-0480
                1 December 2018; 24 October 2018; 28 January 2020
                Volume 88, pages 11-19
                Affiliations
                [a ]Institute of Psychiatry, Psychology & Neuroscience, King’s College London, UK
                [b ]School of Electrical Engineering and Computer Science, KTH, Stockholm, Sweden
                [c ]College of Engineering and Computer Science, The Australian National University, Data61/CSIRO, University of Canberra, Australia
                [d ]University of Turku, Finland
                [e ]Department of Computer Science, University of Warwick/Alan Turing Institute, UK
                [f ]Institute of Health Informatics, University College London, UK
                [g ]University College London NHS Foundation Trust, London, UK
                [h ]Melbourne School of Population and Global Health, The University of Melbourne, Australia
                [i ]Division of Psychiatry, University College London, UK
                [j ]Camden and Islington NHS Foundation Trust, London, UK
                [k ]South London and Maudsley NHS Foundation Trust, London, UK
                [l ]Department of Biomedical Informatics, University of Utah, United States
                Author notes
                [* ]Corresponding author at: King’s College London and KTH Stockholm, SLaM Biomedical Research Centre Nucleus, Maudsley Site, Ground Floor Mapother House, De Crespigny Park, Denmark Hill, London SE5 8AF, UK.
                Article
                Manuscript: EMS83088
                DOI: 10.1016/j.jbi.2018.10.005
                PMCID: PMC6986921
                PMID: 30368002

                This is an open access article under the CC BY license (http://creativecommons.org/licenses/BY/4.0/).

                Categories
                Article

                Keywords: natural language processing, information extraction, text analytics, evaluation, clinical informatics, mental health informatics, epidemiology, public health
