      Development of phenotype algorithms using electronic medical records and incorporating natural language processing

          Abstract

          Electronic medical records are emerging as a major source of data for clinical and translational research studies, although phenotypes of interest need to be accurately defined first. This article provides an overview of how to develop a phenotype algorithm from electronic medical records, incorporating modern informatics and biostatistics methods.
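
          The abstract describes the workflow only in general terms. The sketch below shows one common design, under stated assumptions; it is not the authors' published algorithm. Billing-code counts and NLP-extracted mentions from clinical notes are combined in a logistic regression fitted to a chart-reviewed training set. A score cut-off is then chosen so that the positive predictive value (PPV) reaches a target. The `Patient` fields, the toy cohort, the log(1 + count) transform, and the 0.9 PPV target are all hypothetical. A real study would measure PPV and sensitivity on a held-out validation set, not on the training data.

```python
import math
from dataclasses import dataclass


@dataclass
class Patient:
    """Hypothetical EMR-derived features for one patient (illustrative only)."""
    icd_count: int     # billing codes for the condition of interest
    nlp_mentions: int  # positive mentions found by an NLP pipeline in notes
    gold: int          # chart-review label: 1 = has phenotype, 0 = does not


def features(p: Patient) -> list[float]:
    # Intercept plus log(1 + count), which damps patients with many encounters.
    return [1.0, math.log1p(p.icd_count), math.log1p(p.nlp_mentions)]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow for large |z|).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(w: list[float], p: Patient) -> float:
    """Predicted probability that the patient has the phenotype."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, features(p))))


def fit_logistic(train: list[Patient], lr: float = 0.5, epochs: int = 2000) -> list[float]:
    """Unpenalized logistic regression fitted by batch gradient descent."""
    w = [0.0, 0.0, 0.0]
    n = len(train)
    for _ in range(epochs):
        grad = [0.0, 0.0, 0.0]
        for p in train:
            err = score(w, p) - p.gold  # gradient of log-loss w.r.t. the linear score
            for j, xj in enumerate(features(p)):
                grad[j] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def pick_threshold(w, reviewed, target_ppv=0.9):
    """Lowest score cut-off whose PPV on the reviewed set reaches target_ppv.

    A lower cut-off flags a superset of patients, so the lowest qualifying
    cut-off has the highest sensitivity among those meeting the PPV target.
    Returns None if no cut-off reaches the target.
    """
    for t in sorted({score(w, p) for p in reviewed}):
        flagged = [p for p in reviewed if score(w, p) >= t]
        if sum(p.gold for p in flagged) / len(flagged) >= target_ppv:
            return t
    return None


def evaluate(w, t, reviewed):
    """Return (PPV, sensitivity) of the rule score >= t against chart review."""
    flagged = [p for p in reviewed if score(w, p) >= t]
    tp = sum(p.gold for p in flagged)
    return tp / len(flagged), tp / sum(p.gold for p in reviewed)


# Hypothetical chart-reviewed cohort: Patient(icd_count, nlp_mentions, gold).
cohort = [
    Patient(0, 0, 0), Patient(1, 0, 0), Patient(2, 0, 0), Patient(0, 1, 0),
    Patient(1, 1, 0), Patient(6, 0, 0), Patient(2, 1, 0), Patient(3, 2, 0),
    Patient(1, 0, 1), Patient(3, 2, 1), Patient(5, 4, 1), Patient(8, 6, 1),
    Patient(2, 3, 1), Patient(4, 5, 1), Patient(1, 2, 1), Patient(7, 3, 1),
]
w = fit_logistic(cohort)
t = pick_threshold(w, cohort, target_ppv=0.9)
ppv, sensitivity = evaluate(w, t, cohort)
print(f"cut-off={t:.3f} PPV={ppv:.2f} sensitivity={sensitivity:.2f}")
```

          Scanning cut-offs from the lowest score upward favours PPV over sensitivity. The code keeps the most inclusive cut-off whose flagged patients still meet the PPV target.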

                Author and article information

                Contributors
                Role: assistant professor
                Role: professor
                Role: associate professor
                Role: associate professor
                Role: associate professor
                Role: assistant professor
                Role: senior analyst
                Role: assistant professor
                Role: assistant professor
                Role: professor
                Role: executive director
                Role: professor
                Journal: BMJ : British Medical Journal (BMJ Publishing Group Ltd.)
                ISSN: 0959-8138; eISSN: 1756-1833
                Published: 24 April 2015
                Volume 350, article h1885
                Affiliations
                [1] Division of Rheumatology, Immunology and Allergy, Brigham and Women’s Hospital, Boston, MA 02115, USA
                [2] Harvard Medical School, Boston
                [3] Department of Biostatistics, Harvard School of Public Health, Boston
                [4] Department of Pediatrics, Children’s Hospital of Boston, Boston
                [5] Department of Neurology, Massachusetts General Hospital, Boston
                [6] Department of Gastroenterology, Massachusetts General Hospital, MGH Crohn’s and Colitis Center, Boston
                [7] Partners Research Computing, Partners HealthCare System, Boston
                [8] Center for Systems Biology, Massachusetts General Hospital, Boston
                [9] Department of Neurology, Harvard Medical School, Boston
                [10] Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA
                Author notes
                Correspondence to: K P Liao, kliao@partners.org
                Article ID: liak023069; DOI: 10.1136/bmj.h1885; PMCID: PMC4707569; PMID: 25911572
                © BMJ Publishing Group Ltd 2015
                History: 2 February 2015
                Categories: Research Methods & Reporting

                Medicine
