
      Increasing the Density of Laboratory Measures for Machine Learning Applications

      research-article


          Abstract

          Background. The imputation of missing values is a key step in mining Electronic Health Records (EHR), as it can significantly affect the conclusions drawn from downstream analyses in translational medicine. Laboratory values in EHR are not missing at random, yet imputation techniques tend to disregard this distinction. Consequently, developing an adaptive imputation strategy designed specifically for EHR is an important step toward mitigating data imbalance and enhancing the predictive power of modeling tools for healthcare applications.

          Method. We analyzed the laboratory measures derived from Geisinger's EHR on patients in three distinct cohorts: patients tested for Clostridioides difficile (Cdiff) infection, patients with a diagnosis of inflammatory bowel disease (IBD), and patients with a diagnosis of hip or knee osteoarthritis (OA). We extracted Logical Observation Identifiers Names and Codes (LOINC) codes and excluded those with 75% or more missingness. The comorbidities, primary or secondary diagnoses, and active problem lists were also extracted. The adaptive imputation strategy was designed as a hybrid approach: the comorbidity patterns of patients were transformed into latent patterns and then clustered. Imputation was performed on a cluster of patients for each cohort independently to show the generalizability of the method. The results were compared with imputation applied to the complete dataset without incorporating the information from comorbidity patterns.

          Results. We analyzed a total of 67,445 patients (11,230 IBD patients, 10,000 OA patients, and 46,215 patients tested for C. difficile infection). We extracted 495 LOINC codes and 11,230 diagnosis codes for the IBD cohort, 8,160 diagnosis codes for the Cdiff cohort, and 2,042 diagnosis codes for the OA cohort based on the primary/secondary diagnosis and active problem list in the EHR. Overall, the greatest improvement from this strategy was observed when the laboratory measures had a higher level of missingness. The best root mean square error (RMSE) difference for each dataset was recorded as −35.5 for the Cdiff, −8.3 for the IBD, and −11.3 for the OA dataset.

          Conclusions. An adaptive imputation strategy designed specifically for EHR that uses complementary information from the clinical profile of the patient can be used to improve the imputation of missing laboratory values, especially when laboratory codes with high levels of missingness are included in the analysis.
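          The workflow summarized in the abstract (drop laboratory codes with 75% or more missingness, impute within patient clusters, and compare the RMSE against imputation on the complete dataset) can be illustrated with a minimal sketch. This is not the authors' implementation: column-mean imputation stands in for whatever imputation model the study used, the cluster labels are supplied by hand rather than derived from latent comorbidity patterns, the data are toy values, and all function and variable names are hypothetical.

```python
import numpy as np

def drop_sparse_columns(X, max_missing=0.75):
    """Keep columns whose fraction of missing (NaN) values is below max_missing."""
    missing_fraction = np.isnan(X).mean(axis=0)
    keep = missing_fraction < max_missing
    return X[:, keep], keep

def column_means(X):
    """Column means over observed entries; NaN where a column has none."""
    observed = ~np.isnan(X)
    counts = observed.sum(axis=0)
    sums = np.where(observed, X, 0.0).sum(axis=0)
    return np.where(counts > 0, sums / np.maximum(counts, 1), np.nan)

def mean_impute(X, groups=None):
    """Fill NaN with column means, computed per group when groups are given.

    Mean imputation is an illustrative stand-in, not the study's method.
    A group with no observed value in a column falls back to the global mean.
    """
    filled = X.copy()
    global_means = column_means(X)
    if groups is None:
        groups = np.zeros(len(X), dtype=int)
    for g in np.unique(groups):
        rows = np.where(groups == g)[0]
        block = X[rows]
        means = column_means(block)
        means = np.where(np.isnan(means), global_means, means)
        filled[rows] = np.where(np.isnan(block), means, block)
    return filled

def rmse(truth, estimate, mask):
    """Root mean square error over the entries selected by mask."""
    diff = truth[mask] - estimate[mask]
    return float(np.sqrt(np.mean(diff ** 2)))

# Toy data: six patients, three lab measures (values are made up).
X_true = np.array([
    [1.0, 10.0, 7.0],
    [2.0, 11.0, 7.0],
    [3.0, 12.0, 7.0],
    [11.0, 30.0, 7.0],
    [12.0, 31.0, 7.0],
    [13.0, 32.0, 7.0],
])
clusters = np.array([0, 0, 0, 1, 1, 1])  # assumed given, e.g. from clustering

# Hide two known values to score the imputation; make column 2 mostly missing.
hidden = np.zeros_like(X_true, dtype=bool)
hidden[0, 0] = True
hidden[4, 1] = True
X_obs = X_true.copy()
X_obs[hidden] = np.nan
X_obs[:5, 2] = np.nan  # 5 of 6 missing, so this column is excluded

X_kept, keep = drop_sparse_columns(X_obs)
truth_kept, hidden_kept = X_true[:, keep], hidden[:, keep]

rmse_global = rmse(truth_kept, mean_impute(X_kept), hidden_kept)
rmse_cluster = rmse(truth_kept, mean_impute(X_kept, clusters), hidden_kept)
print("RMSE difference (cluster - global):", rmse_cluster - rmse_global)
```

          In this sketch a negative difference means the cluster-wise variant had the lower error on the held-out values; whether the abstract's reported RMSE differences follow the same sign convention is an assumption.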


                Author and article information

                Journal
                Journal of Clinical Medicine (J Clin Med)
                Publisher: MDPI
                ISSN: 2077-0383
                Published: 30 December 2020 (January 2021 issue)
                Volume 10, Issue 1, Article 103
                Affiliations
                [1] Department of Molecular and Functional Genomics, Geisinger Health System, Danville, PA 17822, USA; jli@geisinger.edu (J.L.); vavula1@geisinger.edu (V.A.)
                [2] NIMML Institute, Blacksburg, VA 24060, USA; rmagarzo@biotherapeuticsinc.com (R.H.); jbassaganya@biotherapeuticsinc.com (J.B.-R.)
                [3] Geisinger Medical Center, Biomedical Translational Informatics Institute, Danville, PA 17822, USA; manu.ksmanu@gmail.com
                [4] Geisinger Medical Center, Neuroscience Institute, Danville, PA 17822, USA; dpchaudhary@geisinger.edu (D.P.C.); rzand@geisinger.edu (R.Z.)
                [5] Geisinger Medical Center, Department of Gastroenterology and Hepatology, Danville, PA 17822, USA; mjshellenberger@geisinger.edu (M.J.S.); hskhara@geisinger.edu (H.S.K.)
                [6] Geisinger Medical Center, Genomic Medicine Institute, Danville, PA 17822, USA; yzhang1@geisinger.edu (Y.Z.); mlee2@geisinger.edu (M.T.M.L.)
                [7] Molecular and Microbial Diagnostics and Development, Geisinger Medical Center, Danville, PA 17822, USA; dmwolk@geisinger.edu
                [8] Department of Electrical and Computer Engineering, Memphis University, Memphis, TN 38152, USA; myeasin@memphis.edu
                [9] BioTherapeutics, Inc., Blacksburg, VA 24060, USA
                Author information
                https://orcid.org/0000-0001-7689-933X
                https://orcid.org/0000-0001-5141-1502
                https://orcid.org/0000-0002-9826-6370
                https://orcid.org/0000-0003-2850-625X
                https://orcid.org/0000-0003-2828-5135
                Article
                Publisher ID: jcm-10-00103
                DOI: 10.3390/jcm10010103
                PMCID: PMC7795258
                PMID: 33396741
                96b9493c-2eb3-43bb-8200-a749e94c59f2
                © 2020 by the authors.

                Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license ( http://creativecommons.org/licenses/by/4.0/).

                History
                Received: 25 November 2020
                Accepted: 25 December 2020
                Categories
                Article

                Keywords: imputation, electronic health records, machine learning, EHR, laboratory measures, medical informatics, inflammatory bowel disease, C. difficile infection, osteoarthritis, complex diseases
