A novel interpretable machine learning system to generate clinical risk scores: An application for predicting early mortality or unplanned readmission in a retrospective cohort study

Abstract

Risk scores are widely used for clinical decision making and commonly generated from logistic regression models. Machine-learning-based methods may work well for identifying important predictors to create parsimonious scores, but such ‘black box’ variable selection limits interpretability, and variable importance evaluated from a single model can be biased. We propose a robust and interpretable variable selection approach using the recently developed Shapley variable importance cloud (ShapleyVIC) that accounts for variability in variable importance across models. Our approach evaluates and visualizes overall variable contributions for in-depth inference and transparent variable selection, and filters out non-significant contributors to simplify model building steps. We derive an ensemble variable ranking from variable contributions across models, which is easily integrated with an automated and modularized risk score generator, AutoScore, for convenient implementation. In a study of early death or unplanned readmission after hospital discharge, ShapleyVIC selected 6 variables from 41 candidates to create a well-performing risk score, which had similar performance to a 16-variable model from machine-learning-based ranking. Our work contributes to the recent emphasis on interpretability of prediction models for high-stakes decision making, providing a disciplined solution to detailed assessment of variable importance and transparent development of parsimonious clinical risk scores.
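The idea of pooling variable importance across several models, dropping variables whose contribution is not clearly above zero, and ranking the rest can be illustrated with a small sketch. This is not the ShapleyVIC implementation: the variable names and importance values are hypothetical, the interval is a rough normal approximation, and averaging within-model ranks is only one plausible way to form an ensemble ranking.

```python
from statistics import mean, stdev

# Hypothetical importance values (not from the study):
# one list per variable, one entry per fitted model.
importance = {
    "age": [0.30, 0.28, 0.33, 0.31],
    "num_prior_visits": [0.20, 0.25, 0.22, 0.21],
    "weekday_of_admission": [0.02, -0.03, 0.01, -0.01],
}


def summarize(values, z=1.96):
    """Return the mean and a rough normal-approximation interval across models."""
    m = mean(values)
    half_width = z * stdev(values) / len(values) ** 0.5
    return m, m - half_width, m + half_width


def ensemble_ranking(importance):
    """Average each variable's within-model rank (1 = most important)."""
    names = list(importance)
    n_models = len(next(iter(importance.values())))
    rank_sums = {name: 0 for name in names}
    for j in range(n_models):
        # Rank variables within model j by descending importance.
        ordered = sorted(names, key=lambda v: importance[v][j], reverse=True)
        for rank, name in enumerate(ordered, start=1):
            rank_sums[name] += rank
    return {name: rank_sums[name] / n_models for name in names}


# Keep only variables whose interval lies entirely above zero (assumed filter rule).
kept = {v: vals for v, vals in importance.items() if summarize(vals)[1] > 0}

# Order the remaining variables by average rank across models.
ranking = sorted(ensemble_ranking(kept).items(), key=lambda kv: kv[1])
```

In this toy input, the third variable fluctuates around zero across models and is filtered out, while the other two are ranked by how consistently they lead within each model.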

Author summary

Risk scores help clinicians quickly assess the risk for a patient by adding up a few scores associated with key predictors. Given the simplicity of such scores, shortlisting the most important predictors is key to predictive performance, but traditional methods are sometimes insufficient when there are a lot of candidates to choose from. As a rising area of research, machine learning provides a growing toolkit for variable selection, but as many machine learning models are complex ‘black boxes’ that differ considerably from risk scores, directly plugging machine learning tools into risk score development can harm both interpretability and predictive performance. We propose a robust and interpretable variable selection mechanism that is tailored to risk scores, and integrate it with an automated framework for convenient risk score development. In a clinical example, we demonstrated how our proposed method can help researchers understand the contribution of 41 candidate variables to outcome prediction through visualizations, filter out 20 variables with non-significant contribution and build a well-performing risk score using only 6 variables, whereas a machine-learning-based method selected 16 variables to achieve a similar performance. We have thus presented a useful tool to support transparent high-stakes decision making.
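A point-based risk score of the kind described above can be sketched in a few lines. The variables, cut-offs, and point values below are hypothetical placeholders chosen for illustration; they are not the score derived in the study.

```python
# Hypothetical point table (not from the study):
# each entry is (lower bound inclusive, upper bound exclusive, points).
POINTS = {
    "age_years": [(0, 50, 0), (50, 70, 2), (70, float("inf"), 4)],
    "prior_admissions": [(0, 1, 0), (1, 3, 1), (3, float("inf"), 3)],
}


def partial_points(variable, value):
    """Look up the points assigned to one predictor value."""
    for low, high, pts in POINTS[variable]:
        if low <= value < high:
            return pts
    raise ValueError(f"{variable}={value} is outside the point table")


def risk_score(patient):
    """Total score: the sum of the points for each predictor."""
    return sum(partial_points(name, patient[name]) for name in POINTS)


patient = {"age_years": 72, "prior_admissions": 2}
score = risk_score(patient)  # 4 points for age + 1 point for prior admissions
```

Because the total is a plain sum over a short table, each predictor's contribution to a patient's score can be read off directly, which is what makes such scores easy to apply and to audit.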

Most cited references 32

A new method of classifying prognostic comorbidity in longitudinal studies: Development and validation

Mary Charlson, Peter Pompei, Kathy Ales … (1987)

The objective of this study was to develop a prospectively applicable method for classifying comorbid conditions which might alter the risk of mortality for use in longitudinal studies. A weighted index that takes into account the number and the seriousness of comorbid disease was developed in a cohort of 559 medical patients. The 1-yr mortality rates for the different scores were: "0", 12% (181); "1-2", 26% (225); "3-4", 52% (71); and "≥5", 85% (82). The index was tested for its ability to predict risk of death from comorbid disease in the second cohort of 685 patients during a 10-yr follow-up. The percent of patients who died of comorbid disease for the different scores were: "0", 8% (588); "1", 25% (54); "2", 48% (25); "≥3", 59% (18). With each increased level of the comorbidity index, there were stepwise increases in the cumulative mortality attributable to comorbid disease (log rank χ² = 165; p < 0.0001). In this longer follow-up, age was also a predictor of mortality (p < 0.001). The new index performed similarly to a previous system devised by Kaplan and Feinstein. The method of classifying comorbidity provides a simple, readily applicable and valid method of estimating risk of death from comorbid disease for use in longitudinal studies. Further work in larger populations is still required to refine the approach because the number of patients with any given condition in this study was relatively small.

Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead

Cynthia Rudin (2019)

Black box machine learning models are currently being used for high stakes decision-making throughout society, causing problems throughout healthcare, criminal justice, and in other domains. People have hoped that creating methods for explaining these black box models will alleviate some of these problems, but trying to explain black box models, rather than creating models that are interpretable in the first place, is likely to perpetuate bad practices and can potentially cause catastrophic harm to society. There is a way forward - it is to design models that are inherently interpretable. This manuscript clarifies the chasm between explaining black boxes and using inherently interpretable models, outlines several key reasons why explainable black boxes should be avoided in high-stakes decisions, identifies challenges to interpretable machine learning, and provides several example applications where interpretable models could potentially replace black box models in criminal justice, healthcare, and computer vision.

Derivation and validation of an index to predict early death or unplanned readmission after discharge from hospital to the community.

Carl van Walraven, Irfan Dhalla, Chaim Bell … (2010)

Readmissions to hospital are common, costly and often preventable. An easy-to-use index to quantify the risk of readmission or death after discharge from hospital would help clinicians identify patients who might benefit from more intensive post-discharge care. We sought to derive and validate an index to predict the risk of death or unplanned readmission within 30 days after discharge from hospital to the community. In a prospective cohort study, 48 patient-level and admission-level variables were collected for 4812 medical and surgical patients who were discharged to the community from 11 hospitals in Ontario. We used a split-sample design to derive and validate an index to predict the risk of death or nonelective readmission within 30 days after discharge. This index was externally validated using administrative data in a random selection of 1,000,000 Ontarians discharged from hospital between 2004 and 2008. Of the 4812 participating patients, 385 (8.0%) died or were readmitted on an unplanned basis within 30 days after discharge. Variables independently associated with this outcome (from which we derived the mnemonic "LACE") included length of stay ("L"); acuity of the admission ("A"); comorbidity of the patient (measured with the Charlson comorbidity index score) ("C"); and emergency department use (measured as the number of visits in the six months before admission) ("E"). Scores using the LACE index ranged from 0 (2.0% expected risk of death or urgent readmission within 30 days) to 19 (43.7% expected risk). The LACE index was discriminative (C statistic 0.684) and very accurate (Hosmer-Lemeshow goodness-of-fit statistic 14.1, p=0.59) at predicting outcome risk. The LACE index can be used to quantify risk of death or unplanned readmission within 30 days after discharge from hospital. This index can be used with both primary and administrative data. Further research is required to determine whether such quantification changes patient care or outcomes.

Author and article information

Contributors

Yilin Ning:

ORCID: https://orcid.org/0000-0002-6758-4472

Roles: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Software, Validation, Writing – original draft, Writing – review & editing

Siqi Li:

ORCID: https://orcid.org/0000-0002-1660-105X

Roles: Formal analysis, Methodology, Software, Validation, Writing – review & editing

Marcus Eng Hock Ong:

ORCID: https://orcid.org/0000-0001-7874-7612

Roles: Investigation, Methodology, Validation, Writing – review & editing

Feng Xie:

ORCID: https://orcid.org/0000-0002-0215-667X

Roles: Investigation, Methodology, Validation, Writing – review & editing

Bibhas Chakraborty:

ORCID: https://orcid.org/0000-0002-7366-0478

Roles: Investigation, Methodology, Validation, Writing – review & editing

Daniel Shu Wei Ting:

Roles: Investigation, Methodology, Validation, Writing – review & editing

Nan Liu:

ORCID: https://orcid.org/0000-0003-3610-4883

Roles: Conceptualization, Investigation, Methodology, Project administration, Supervision, Validation, Writing – review & editing

Henry Horng-Shing Lu: Role: Editor

Journal

Journal ID (nlm-ta): PLOS Digit Health

Journal ID (iso-abbrev): PLOS Digit Health

Journal ID (publisher-id): plos

Title: PLOS Digital Health

Publisher: Public Library of Science (San Francisco, CA, USA)

ISSN (Electronic): 2767-3170

Publication date (Electronic): 13 June 2022

Publication date Collection: June 2022

Volume: 1

Issue: 6

Electronic Location Identifier: e0000062

Affiliations

[1] Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore, Singapore

[2] Programme in Health Services and Systems Research, Duke-NUS Medical School, Singapore, Singapore

[3] Health Services Research Centre, Singapore Health Services, Singapore, Singapore

[4] Department of Emergency Medicine, Singapore General Hospital, Singapore, Singapore

[5] Department of Statistics and Data Science, National University of Singapore, Singapore, Singapore

[6] Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, United States of America

[7] Singapore Eye Research Institute, Singapore National Eye Centre, Singapore, Singapore

[8] SingHealth AI Health Program, Singapore Health Services, Singapore, Singapore

[9] Institute of Data Science, National University of Singapore, Singapore, Singapore

National Yang Ming Chiao Tung University, TAIWAN

Author notes

The authors have declared that no competing interests exist.

* E-mail: liu.nan@duke-nus.edu.sg

Article

Publisher ID: PDIG-D-22-00042

DOI: 10.1371/journal.pdig.0000062

PMC ID: 9931273

PubMed ID: 36812536

SO-VID: b9b3d9ca-8756-47b1-b778-c2c007f755cd

License:

This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

History

Date received: 8 February 2022

Date accepted: 10 May 2022

Page count

Figures: 4, Tables: 4, Pages: 20

Funding

Funded by: The Estate of Tan Sri Khoo Teck Puat

Award ID: Duke-NUS-KPFA/2021/0051

Award Recipient: Yilin Ning (ORCID: https://orcid.org/0000-0002-6758-4472)

Yilin Ning is supported by the Khoo Postdoctoral Fellowship Award (Project No. Duke-NUS-KPFA/2021/0051) from the Estate of Tan Sri Khoo Teck Puat. The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Custom metadata

Data Availability: Due to ethical reasons and institutional guidelines, the data presented in the study cannot be shared publicly. Data are available to researchers with some access restrictions applied upon request. Interested researchers may contact SingHealth Health Services Research Centre (Email: hsr@singhealth.com.sg) for more details.
