Comparing different machine learning techniques for predicting COVID-19 severity

Abstract

Background

Coronavirus disease 2019 (COVID-19) continues to spread globally. Machine learning techniques have been used for disease diagnosis and for predicting treatment outcomes, with favorable performance. The present study aims to predict COVID-19 severity at admission using different machine learning techniques, including random forest (RF), support vector machine (SVM), and logistic regression (LR). The importance of individual features to COVID-19 severity was also identified.

Methods

A retrospective study was conducted at JinYinTan Hospital from January 26 to March 28, 2020. Feature selection was applied to eighty-six demographic, clinical, and laboratory features using the LassoCV method, Spearman’s rank correlation, experts’ opinions, and literature evaluation. RF, SVM, and LR models were built to predict severe COVID-19, and their performance was compared by the area under the curve (AUC). Additionally, the importance of features to COVID-19 severity was analyzed with the best-performing model.
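As an illustration of the comparison criterion named above, the following minimal sketch computes AUC as the fraction of (severe, non-severe) pairs that a model ranks correctly, with ties counted as half. It is not the authors' code: the function name `auc`, the label encoding (1 = severe, 0 = non-severe), the model names, and all scores are assumptions made up for this example.

```python
from itertools import product


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    labels: 1 for severe, 0 for non-severe (hypothetical encoding).
    scores: model outputs where a higher value means "more likely severe".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case of each class")
    # A correctly ordered pair counts 1, a tie counts 0.5, otherwise 0.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


# Toy data invented for illustration only (not taken from the study).
y_true = [1, 1, 1, 0, 0, 0, 0]
model_scores = {
    "model_a": [0.9, 0.8, 0.4, 0.5, 0.3, 0.2, 0.1],
    "model_b": [0.6, 0.4, 0.5, 0.7, 0.3, 0.2, 0.1],
}

# Compute one AUC per model and pick the model with the highest value.
aucs = {name: auc(y_true, s) for name, s in model_scores.items()}
best = max(aucs, key=aucs.get)
```

Selecting the model with the highest AUC mirrors the comparison described in the Methods. The pairwise form is quadratic in sample size, which is acceptable for a cohort of a few hundred patients but would be replaced by a rank-based computation on larger data.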

Results

A total of 287 patients were enrolled, of whom 36.6% were severe cases and 63.4% were non-severe cases. The median age was 60.0 years (interquartile range: 49.0–68.0 years). Three models were established using 23 features: 1 clinical feature, 1 chest computed tomography (CT) feature, and 21 laboratory features. Among the three models, RF yielded the best overall performance, with the highest AUC (0.970) compared with SVM (0.948) and LR (0.928). RF achieved a sensitivity of 96.7%, specificity of 69.5%, and accuracy of 84.5%. SVM had a sensitivity of 93.9%, specificity of 79.0%, and accuracy of 88.5%. LR achieved a sensitivity of 92.3%, specificity of 72.3%, and accuracy of 85.2%. Additionally, chest CT had the highest importance to illness severity, followed by neutrophil-to-lymphocyte ratio, lactate dehydrogenase, and D-dimer, in that order.
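For readers less familiar with the three threshold-based metrics reported above, the sketch below shows how sensitivity, specificity, and accuracy follow from confusion-matrix counts. The function name `classification_metrics` and the counts are hypothetical and chosen only to make the arithmetic easy to follow; they are not the study's data, and the abstract does not state on which split the reported values were computed.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts.

    tp: severe cases predicted severe      fn: severe cases predicted non-severe
    tn: non-severe predicted non-severe    fp: non-severe predicted severe
    """
    sensitivity = tp / (tp + fn)          # share of severe cases detected
    specificity = tn / (tn + fp)          # share of non-severe cases cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Hypothetical counts for illustration only (not from the study).
sens, spec, acc = classification_metrics(tp=18, fn=2, tn=24, fp=6)
print(f"sensitivity={sens:.1%} specificity={spec:.1%} accuracy={acc:.1%}")
```

The example makes one trade-off visible: a model can reach high sensitivity while its specificity stays lower, because the two metrics are computed on disjoint groups of patients.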

Conclusions

Our results indicated that RF could be a useful predictive tool to identify patients with severe COVID-19, which may facilitate effective care and further optimize resources.

Supplementary Information

The online version contains supplementary material available at 10.1186/s40249-022-00946-4.

Author and article information

Contributors

Cheng Lu: lv_cheng0816@163.com

Luqi Huang: Huangluqi01@126.com

Journal

Journal ID (nlm-ta): Infect Dis Poverty

Journal ID (iso-abbrev): Infect Dis Poverty

Title: Infectious Diseases of Poverty

Publisher: BioMed Central (London)

ISSN (Print): 2095-5162

ISSN (Electronic): 2049-9957

Publication date (Electronic): 17 February 2022

Publication date PMC-release: 17 February 2022

Publication date Collection: 2022

Volume: 11

Electronic Location Identifier: 19

Affiliations

[1] GRID grid.410318.f, ISNI 0000 0004 0632 3409, Institute of Basic Research in Clinical Medicine, China Academy of Chinese Medical Sciences, No. 16, Nanxiao Street, Dongzhimen, Dongcheng District, Beijing 100700, China

[2] GRID grid.507952.c, ISNI 0000 0004 1764 577X, Department of Infectious Diseases, JinYinTan Hospital, Wuhan 430040, China

[3] GRID grid.198530.6, ISNI 0000 0000 8803 2373, Information Center, Chinese Center for Disease Control and Prevention, Beijing 102206, China

[4] GRID grid.410318.f, ISNI 0000 0004 0632 3409, National Resource Center for Chinese Materia Medica, China Academy of Chinese Medical Sciences, No. 16, Nanxiao Street, Dongzhimen, Dongcheng District, Beijing 100700, China

Article

Publisher ID: 946

DOI: 10.1186/s40249-022-00946-4

PMC ID: 8851750

PubMed ID: 35177120

SO-VID: a0115170-403d-4555-b2c4-0d61d01e18ad

License:

Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

History

Date received: 31 August 2021

Date accepted: 9 February 2022

Custom metadata

Keywords: covid-19, severity, machine learning, support vector machine, random forest, logistic regression

