      A novel interpretable machine learning system to generate clinical risk scores: An application for predicting early mortality or unplanned readmission in a retrospective cohort study

          Abstract

          Risk scores are widely used for clinical decision making and commonly generated from logistic regression models. Machine-learning-based methods may work well for identifying important predictors to create parsimonious scores, but such ‘black box’ variable selection limits interpretability, and variable importance evaluated from a single model can be biased. We propose a robust and interpretable variable selection approach using the recently developed Shapley variable importance cloud (ShapleyVIC) that accounts for variability in variable importance across models. Our approach evaluates and visualizes overall variable contributions for in-depth inference and transparent variable selection, and filters out non-significant contributors to simplify model building steps. We derive an ensemble variable ranking from variable contributions across models, which is easily integrated with an automated and modularized risk score generator, AutoScore, for convenient implementation. In a study of early death or unplanned readmission after hospital discharge, ShapleyVIC selected 6 variables from 41 candidates to create a well-performing risk score, which had similar performance to a 16-variable model from machine-learning-based ranking. Our work contributes to the recent emphasis on interpretability of prediction models for high-stakes decision making, providing a disciplined solution to detailed assessment of variable importance and transparent development of parsimonious clinical risk scores.
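A minimal conceptual sketch (Python) of the ensemble variable ranking idea, assuming a pandas DataFrame df with a binary outcome column "y" and numeric predictors; these names are illustrative. It is not the ShapleyVIC or AutoScore API: permutation importance on bootstrap-refitted logistic models stands in here for Shapley-based importance evaluated across a set of models.

import numpy as np
import pandas as pd
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression

def ensemble_variable_ranking(df, outcome="y", n_models=50, seed=0):
    """Rank variables by averaging their importance rank across many models."""
    rng = np.random.default_rng(seed)
    X, y = df.drop(columns=[outcome]), df[outcome]
    per_model_ranks = []
    for _ in range(n_models):
        # refit on a bootstrap resample to obtain a different plausible model
        # (assumes the outcome is frequent enough to appear in every resample)
        idx = rng.integers(0, len(df), len(df))
        model = LogisticRegression(max_iter=1000).fit(X.iloc[idx], y.iloc[idx])
        imp = permutation_importance(model, X, y, n_repeats=5, random_state=seed)
        # rank 1 = most important variable in this particular model
        ranks = pd.Series(imp.importances_mean, index=X.columns).rank(ascending=False)
        per_model_ranks.append(ranks)
    rank_df = pd.concat(per_model_ranks, axis=1)
    return pd.DataFrame({
        "mean_rank": rank_df.mean(axis=1),   # ensemble ranking
        "sd_rank": rank_df.std(axis=1),      # variability across models
    }).sort_values("mean_rank")

Variables whose contribution is consistently negligible across models would be filtered out, and the top-ranked remainder passed to a score generator such as AutoScore, which converts them into an integer-point risk score.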

          Author summary

          Risk scores help clinicians quickly assess the risk for a patient by adding up a few scores associated with key predictors. Given the simplicity of such scores, shortlisting the most important predictors is key to predictive performance, but traditional methods are sometimes insufficient when there are a lot of candidates to choose from. As a rising area of research, machine learning provides a growing toolkit for variable selection, but as many machine learning models are complex ‘black boxes’ that differ considerably from risk scores, directly plugging machine learning tools into risk score development can harm both interpretability and predictive performance. We propose a robust and interpretable variable selection mechanism that is tailored to risk scores, and integrate it with an automated framework for convenient risk score development. In a clinical example, we demonstrated how our proposed method can help researchers understand the contribution of 41 candidate variables to outcome prediction through visualizations, filter out 20 variables with non-significant contribution and build a well-performing risk score using only 6 variables, whereas a machine-learning-based method selected 16 variables to achieve a similar performance. We have thus presented a useful tool to support transparent high-stakes decision making.

Most cited references (32)

          A new method of classifying prognostic comorbidity in longitudinal studies: Development and validation

The objective of this study was to develop a prospectively applicable method for classifying comorbid conditions which might alter the risk of mortality for use in longitudinal studies. A weighted index that takes into account the number and the seriousness of comorbid disease was developed in a cohort of 559 medical patients. The 1-yr mortality rates for the different scores were: "0", 12% (181); "1-2", 26% (225); "3-4", 52% (71); and "≥5", 85% (82). The index was tested for its ability to predict risk of death from comorbid disease in the second cohort of 685 patients during a 10-yr follow-up. The percent of patients who died of comorbid disease for the different scores were: "0", 8% (588); "1", 25% (54); "2", 48% (25); "≥3", 59% (18). With each increased level of the comorbidity index, there were stepwise increases in the cumulative mortality attributable to comorbid disease (log-rank χ² = 165; p < 0.0001). In this longer follow-up, age was also a predictor of mortality (p < 0.001). The new index performed similarly to a previous system devised by Kaplan and Feinstein. The method of classifying comorbidity provides a simple, readily applicable and valid method of estimating risk of death from comorbid disease for use in longitudinal studies. Further work in larger populations is still required to refine the approach because the number of patients with any given condition in this study was relatively small.
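As a concrete illustration of the weighted-index idea, the sketch below (Python) sums condition weights for one patient. The weights shown are the commonly cited values from the original index for a small subset of conditions, and the condition names are hypothetical labels; consult the full published weight table for any real use.

# commonly cited weights from the original Charlson index (subset only)
CHARLSON_WEIGHTS = {
    "myocardial_infarction": 1,
    "congestive_heart_failure": 1,
    "chronic_pulmonary_disease": 1,
    "diabetes": 1,
    "hemiplegia": 2,
    "moderate_or_severe_renal_disease": 2,
    "any_malignancy": 2,
    "moderate_or_severe_liver_disease": 3,
    "metastatic_solid_tumor": 6,
    "aids": 6,
}

def charlson_index(conditions):
    """Sum the weights of a patient's recorded comorbid conditions."""
    return sum(CHARLSON_WEIGHTS.get(c, 0) for c in conditions)

# a patient with diabetes and a metastatic solid tumor scores 1 + 6 = 7,
# i.e. the highest ("5 or greater") stratum reported above
print(charlson_index({"diabetes", "metastatic_solid_tumor"}))  # 7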
            Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead

            Black box machine learning models are currently being used for high stakes decision-making throughout society, causing problems throughout healthcare, criminal justice, and in other domains. People have hoped that creating methods for explaining these black box models will alleviate some of these problems, but trying to explain black box models, rather than creating models that are interpretable in the first place, is likely to perpetuate bad practices and can potentially cause catastrophic harm to society. There is a way forward - it is to design models that are inherently interpretable. This manuscript clarifies the chasm between explaining black boxes and using inherently interpretable models, outlines several key reasons why explainable black boxes should be avoided in high-stakes decisions, identifies challenges to interpretable machine learning, and provides several example applications where interpretable models could potentially replace black box models in criminal justice, healthcare, and computer vision.
              Derivation and validation of an index to predict early death or unplanned readmission after discharge from hospital to the community.

              Readmissions to hospital are common, costly and often preventable. An easy-to-use index to quantify the risk of readmission or death after discharge from hospital would help clinicians identify patients who might benefit from more intensive post-discharge care. We sought to derive and validate an index to predict the risk of death or unplanned readmission within 30 days after discharge from hospital to the community. In a prospective cohort study, 48 patient-level and admission-level variables were collected for 4812 medical and surgical patients who were discharged to the community from 11 hospitals in Ontario. We used a split-sample design to derive and validate an index to predict the risk of death or nonelective readmission within 30 days after discharge. This index was externally validated using administrative data in a random selection of 1,000,000 Ontarians discharged from hospital between 2004 and 2008. Of the 4812 participating patients, 385 (8.0%) died or were readmitted on an unplanned basis within 30 days after discharge. Variables independently associated with this outcome (from which we derived the mnemonic "LACE") included length of stay ("L"); acuity of the admission ("A"); comorbidity of the patient (measured with the Charlson comorbidity index score) ("C"); and emergency department use (measured as the number of visits in the six months before admission) ("E"). Scores using the LACE index ranged from 0 (2.0% expected risk of death or urgent readmission within 30 days) to 19 (43.7% expected risk). The LACE index was discriminative (C statistic 0.684) and very accurate (Hosmer-Lemeshow goodness-of-fit statistic 14.1, p=0.59) at predicting outcome risk. The LACE index can be used to quantify risk of death or unplanned readmission within 30 days after discharge from hospital. This index can be used with both primary and administrative data. Further research is required to determine whether such quantification changes patient care or outcomes.
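The additive structure of the index can be shown in a few lines of Python. The point assignments below follow the commonly cited published LACE values (maximum score 19), but this function is an illustrative sketch, not a validated implementation; verify every cut-point against the original paper before use.

def lace_score(los_days, emergent_admission, charlson, ed_visits_6mo):
    """Illustrative LACE score: length of stay, acuity, comorbidity, ED use."""
    # L: length of stay in days (0, 1, 2, 3, 4, 5 or 7 points)
    if los_days < 1:
        l = 0
    elif los_days <= 3:
        l = los_days
    elif los_days <= 6:
        l = 4
    elif los_days <= 13:
        l = 5
    else:
        l = 7
    # A: acuity, 3 points for an emergent (nonelective) admission
    a = 3 if emergent_admission else 0
    # C: Charlson comorbidity index score, capped at 5 points
    c = charlson if charlson <= 3 else 5
    # E: emergency department visits in the prior six months, capped at 4
    e = min(ed_visits_6mo, 4)
    return l + a + c + e

# e.g. a 5-day emergent admission, Charlson score 2, one prior ED visit:
print(lace_score(5, True, 2, 1))  # 4 + 3 + 2 + 1 = 10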

                Author and article information

                Contributors
Roles: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Software, Validation, Writing – original draft, Writing – review & editing
Roles: Formal analysis, Methodology, Software, Validation, Writing – review & editing
Roles: Investigation, Methodology, Validation, Writing – review & editing
Roles: Investigation, Methodology, Validation, Writing – review & editing
Roles: Investigation, Methodology, Validation, Writing – review & editing
Roles: Investigation, Methodology, Validation, Writing – review & editing
Roles: Conceptualization, Investigation, Methodology, Project administration, Supervision, Validation, Writing – review & editing
                Role: Editor
Journal
PLOS Digital Health (PLOS Digit Health)
Publisher: Public Library of Science (San Francisco, CA, USA)
ISSN: 2767-3170
Published: 13 June 2022 (June 2022 issue)
Volume 1, Issue 6, Article e0000062
                Affiliations
                [1 ] Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore, Singapore
                [2 ] Programme in Health Services and Systems Research, Duke-NUS Medical School, Singapore, Singapore
                [3 ] Health Services Research Centre, Singapore Health Services, Singapore, Singapore
                [4 ] Department of Emergency Medicine, Singapore General Hospital, Singapore, Singapore
                [5 ] Department of Statistics and Data Science, National University of Singapore, Singapore, Singapore
                [6 ] Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, United States of America
                [7 ] Singapore Eye Research Institute, Singapore National Eye Centre, Singapore, Singapore
                [8 ] SingHealth AI Health Program, Singapore Health Services, Singapore, Singapore
                [9 ] Institute of Data Science, National University of Singapore, Singapore, Singapore
National Yang Ming Chiao Tung University, Taiwan (Editor)
                Author notes

                The authors have declared that no competing interests exist.

                Author information
                https://orcid.org/0000-0002-6758-4472
                https://orcid.org/0000-0002-1660-105X
                https://orcid.org/0000-0001-7874-7612
                https://orcid.org/0000-0002-0215-667X
                https://orcid.org/0000-0002-7366-0478
                https://orcid.org/0000-0003-3610-4883
Article
Manuscript ID: PDIG-D-22-00042
DOI: 10.1371/journal.pdig.0000062
PMCID: PMC9931273
PMID: 36812536
                © 2022 Ning et al

                This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

History
Received: 8 February 2022
Accepted: 10 May 2022
                Page count
                Figures: 4, Tables: 4, Pages: 20
Funding
Funded by: The Estate of Tan Sri Khoo Teck Puat
Award ID: Duke-NUS-KPFA/2021/0051
Award Recipient: Yilin Ning
                Yilin Ning is supported by the Khoo Postdoctoral Fellowship Award (Project No. Duke-NUS-KPFA/2021/0051) from the Estate of Tan Sri Khoo Teck Puat. The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Categories
Research Article
Medicine and Health Sciences > Critical Care and Emergency Medicine
Medicine and Health Sciences > Epidemiology > Medical Risk Factors
Medicine and Health Sciences > Health Care > Patients > Inpatients
Medicine and Health Sciences > Oncology > Cancers and Neoplasms > Metastatic Tumors
Computer and Information Sciences > Artificial Intelligence > Machine Learning
Medicine and Health Sciences > Nephrology > Renal Cancer
Medicine and Health Sciences > Oncology > Metastasis
Medicine and Health Sciences > Oncology > Basic Cancer Research > Metastasis
Research and Analysis Methods > Mathematical and Statistical Techniques > Statistical Methods > Forecasting
Physical Sciences > Mathematics > Statistics > Statistical Methods > Forecasting
Data availability
Due to ethical reasons and institutional guidelines, the data presented in the study cannot be shared publicly. Data are available to researchers upon request, with some access restrictions. Interested researchers may contact the SingHealth Health Services Research Centre (email: hsr@singhealth.com.sg) for more details.
