      Automatic ICD-10 coding algorithm using an improved longest common subsequence based on semantic similarity

      PLoS ONE
      Public Library of Science


          Abstract

          ICD-10 (International Classification of Diseases, 10th revision) is a classification of diseases, symptoms, procedures, and injuries. In patients' medical records, diseases are often described in free text, such as terms, phrases and paraphrases, which differ significantly from the names used in the ICD-10 classification. This paper presents an improved approach, based on the Longest Common Subsequence (LCS) and semantic similarity, for automatically coding Chinese diagnoses by mapping the disease names given by clinicians to the disease names in ICD-10. The LCS is the longest string that is a subsequence of every member of a given set of strings. The improved LCS method proposed in this paper can increase the accuracy of Chinese disease mapping.
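          To make the LCS idea concrete, the following is a minimal Python sketch of the classic two-string LCS dynamic program, used to pick the closest candidate name for a free-text diagnosis. It is not the improved algorithm proposed in the paper: the semantic-similarity component is omitted, and the function names, the Dice-style normalisation, and the example strings are all illustrative assumptions.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (classic DP)."""
    prev = [0] * (len(b) + 1)  # LCS lengths for the previous row of the table
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_similarity(a: str, b: str) -> float:
    """Assumed normalisation (Dice-style): 2 * LCS / (len(a) + len(b))."""
    if not a or not b:
        return 0.0
    return 2 * lcs_length(a, b) / (len(a) + len(b))


def best_match(diagnosis: str, candidates: list[str]) -> str:
    """Return the candidate name with the highest LCS similarity."""
    return max(candidates, key=lambda name: lcs_similarity(diagnosis, name))


# Hypothetical example strings, not taken from the paper's data.
clinician_text = "慢性乙型肝炎"
icd_names = ["急性甲型肝炎", "慢性乙型病毒性肝炎"]
print(best_match(clinician_text, icd_names))
```

          Character-level matching suits Chinese text in this sketch because it needs no word segmentation; a purely string-based score like this cannot recognise synonyms, which is the gap a semantic-similarity step is meant to close.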

          Most cited references (21)


          Measuring diagnoses: ICD code accuracy.

          To examine potential sources of errors at each step of the described inpatient International Classification of Diseases (ICD) coding process. The use of disease codes from the ICD has expanded from classifying morbidity and mortality information for statistical purposes to diverse sets of applications in research, health care policy, and health care finance. By describing a brief history of ICD coding, detailing the process for assigning codes, identifying where errors can be introduced into the process, and reviewing methods for examining code accuracy, we help code users more systematically evaluate code accuracy for their particular applications. We summarize the inpatient ICD diagnostic coding process from patient admission to diagnostic code assignment. We examine potential sources of errors at each step and offer code users a tool for systematically evaluating code accuracy. Main error sources along the "patient trajectory" include amount and quality of information at admission, communication among patients and providers, the clinician's knowledge and experience with the illness, and the clinician's attention to detail. Main error sources along the "paper trail" include variance in the electronic and written records, coder training and experience, facility quality-control efforts, and unintentional and intentional coder errors, such as misspecification, unbundling, and upcoding. By clearly specifying the code assignment process and heightening their awareness of potential error sources, code users can better evaluate the applicability and limitations of codes for their particular situations. ICD codes can then be used in the most appropriate ways.

            A linear space algorithm for computing maximal common subsequences


              Automating the assignment of diagnosis codes to patient encounters using example-based and machine learning techniques.

              Human classification of diagnoses is a labor-intensive process that consumes significant resources. Most medical practices use specially trained medical coders to categorize diagnoses for billing and research purposes. We have developed an automated coding system designed to assign codes to clinical diagnoses. The system uses the notion of certainty to recommend subsequent processing. Codes with the highest certainty are generated by matching the diagnostic text to frequent examples in a database of 22 million manually coded entries. These code assignments are not subject to subsequent manual review. Codes at a lower certainty level are assigned by matching to previously infrequently coded examples. The least certain codes are generated by a naïve Bayes classifier. The latter two types of codes are subsequently manually reviewed. Standard information retrieval accuracy measurements of precision, recall and f-measure were used. Micro- and macro-averaged results were computed. Results: At least 48% of all EMR problem list entries at the Mayo Clinic can be automatically classified with macro-averaged 98.0% precision, 98.3% recall and an f-score of 98.2%. An additional 34% of the entries are classified with macro-averaged 90.1% precision, 95.6% recall and 93.1% f-score. The remaining 18% of the entries are classified with macro-averaged 58.5%. Over two thirds of all diagnoses are coded automatically with high accuracy. The system has been successfully implemented at the Mayo Clinic, which resulted in a reduction of staff engaged in manual coding from thirty-four coders to seven verifiers.
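              For readers unfamiliar with the measures reported above, the sketch below shows one common way to compute macro- and micro-averaged precision, recall and F-score for single-label code assignment. It is an illustration under stated assumptions, not the evaluation code of the cited study: the function names and the toy labels are hypothetical, and macro F is taken here as the mean of per-class F-scores, which is only one of several conventions.

```python
from collections import Counter


def prf(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Precision, recall and F1 from raw counts (0.0 when undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def macro_micro(gold: list[str], pred: list[str]):
    """Return (macro, micro) triples of (precision, recall, F1)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted code p was wrong
            fn[g] += 1  # gold code g was missed
    labels = sorted(set(gold) | set(pred))
    per_class = [prf(tp[c], fp[c], fn[c]) for c in labels]
    # Macro: average the per-class scores, so every code counts equally.
    macro = tuple(sum(m[i] for m in per_class) / len(labels) for i in range(3))
    # Micro: pool the counts first, so frequent codes dominate.
    micro = prf(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro


# Toy labels for illustration only.
gold = ["A", "A", "A", "B"]
pred = ["A", "A", "B", "B"]
macro, micro = macro_micro(gold, pred)
print("macro:", macro)
print("micro:", micro)
```

              Macro-averaging treats rare and frequent codes alike, which matters for code sets with a long tail of seldom-used entries.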

                Author and article information

                Contributors
                Role: Editor
                Journal
                PLoS ONE
                Public Library of Science (San Francisco, CA, USA)
                ISSN: 1932-6203
                Published: 17 March 2017
                Volume 12, Issue 3, e0173410
                Affiliations
                [1] Zhejiang University School of Medicine, Hangzhou, China
                [2] Hangzhou Vocational and Technical College, Hangzhou, China
                [3] College of Information Engineering of China Jiliang University, Hangzhou, China
                [4] Zhejiang University the First Affiliated Hospital, Hangzhou, China
                Huazhong University of Science and Technology, CHINA
                Author notes

                Competing Interests: The authors have declared that no competing interests exist.

                • Conceptualization: LJL.

                • Data curation: YZC.

                • Funding acquisition: HJL YZC.

                • Investigation: LJL.

                • Methodology: HJL YZC LJL.

                • Project administration: YZC.

                • Resources: YZC HJL LJL.

                • Supervision: LJL HJL.

                • Validation: HJL YZC LJL.

                • Writing – original draft: YZC.

                • Writing – review & editing: YZC HJL.

                Author information
                http://orcid.org/0000-0001-5917-3836
                Article
                Manuscript number: PONE-D-16-40232
                DOI: 10.1371/journal.pone.0173410
                PMCID: PMC5356997
                PMID: 28306739
                © 2017 Chen et al

                This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

                History
                Received: 12 October 2016
                Accepted: 19 February 2017
                Page count
                Figures: 10, Tables: 5, Pages: 17
                Funding
                This study is supported by National Natural Science Foundation of China (No. 61272315). It is also supported by the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD), Jiangsu Collaborative Innovation Center on Atmospheric Environment and Equipment Technology (CICAEET) and Hangzhou "131" middle-aged and young talent training plan.
                Categories
                Research Article
                Medicine and health sciences > Infectious diseases > Viral diseases > Hepatitis > Hepatitis A
                Medicine and health sciences > Gastroenterology and hepatology > Liver diseases > Infectious hepatitis > Hepatitis A
                Biology and Life Sciences > Neuroscience > Cognitive Science > Cognitive Psychology > Language
                Biology and Life Sciences > Psychology > Cognitive Psychology > Language
                Social Sciences > Psychology > Cognitive Psychology > Language
                Physical Sciences > Mathematics > Applied Mathematics > Algorithms
                Research and Analysis Methods > Simulation and Modeling > Algorithms
                Computer and Information Sciences > Data Visualization > Infographics > Charts
                Medicine and Health Sciences > Epidemiology > Disease Surveillance
                Medicine and Health Sciences
                Medicine and health sciences > Infectious diseases > Viral diseases > Hepatitis > Hepatitis B
                Medicine and health sciences > Gastroenterology and hepatology > Liver diseases > Infectious hepatitis > Hepatitis B
                People and Places > Geographical Locations > Asia > China
                Custom metadata
                All relevant data are within the paper and its Supporting Information files.
