      XGBoost-Based Framework for Smoking-Induced Noncommunicable Disease Prediction

      research-article


          Abstract

          Smoking-induced noncommunicable diseases (SiNCDs) have become a significant threat to public health and a cause of death globally. Over the last decade, numerous studies have applied artificial intelligence techniques to predict the risk of developing SiNCDs. However, determining the most significant features and developing interpretable models remain challenging in such systems. In this study, we propose an efficient extreme gradient boosting (XGBoost) based framework combined with a hybrid feature selection (HFS) method for SiNCDs prediction among the general population in South Korea and the United States. First, HFS is performed in three stages: (I) significant features are selected by t-test and chi-square test; (II) multicollinearity analysis is used to retain dissimilar features; (III) the final, most representative features are selected using the least absolute shrinkage and selection operator (LASSO). The selected features are then fed into the XGBoost predictive model. The experimental results show that the proposed model outperforms several existing baseline models. In addition, the proposed model reports which features are important, which enhances the interpretability of SiNCDs prediction. Consequently, the XGBoost-based framework is expected to contribute to the early diagnosis and prevention of SiNCDs in public health.
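          The three HFS stages and the boosting step described in the abstract can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the authors' implementation: the function name `hybrid_feature_selection`, the thresholds (`alpha`, `corr_max`, `lasso_alpha`), the greedy pairwise-correlation filter used as a stand-in for the multicollinearity analysis, and the toy columns are all assumptions. If `xgboost` is not installed, the sketch falls back to scikit-learn's gradient boosting, which is a different implementation of the same family of models.

```python
import numpy as np
from scipy import stats
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

try:  # xgboost is optional in this sketch
    from xgboost import XGBClassifier as Booster
except ImportError:  # fallback: scikit-learn gradient boosting (not XGBoost)
    from sklearn.ensemble import GradientBoostingClassifier as Booster


def hybrid_feature_selection(X, y, is_cat, alpha=0.05, corr_max=0.8,
                             lasso_alpha=0.01):
    """Return the column indices of X that survive all three stages."""
    # Stage I: t-test for numeric columns, chi-square test for categorical ones.
    keep = []
    for j in range(X.shape[1]):
        if is_cat[j]:
            table = np.array([[np.sum((X[:, j] == v) & (y == c)) for c in (0, 1)]
                              for v in np.unique(X[:, j])])
            p = stats.chi2_contingency(table)[1]
        else:
            p = stats.ttest_ind(X[y == 1, j], X[y == 0, j],
                                equal_var=False).pvalue
        if p < alpha:
            keep.append(j)
    if not keep:
        return []

    # Stage II (assumed form): greedily drop a feature when its absolute
    # correlation with an already retained feature exceeds corr_max.
    corr = np.atleast_2d(np.abs(np.corrcoef(X[:, keep], rowvar=False)))
    dissimilar = []
    for a in range(len(keep)):
        if all(corr[a, b] <= corr_max for b in dissimilar):
            dissimilar.append(a)
    keep = [keep[a] for a in dissimilar]

    # Stage III: LASSO on standardised features; keep non-zero coefficients.
    Xs = StandardScaler().fit_transform(X[:, keep])
    coef = Lasso(alpha=lasso_alpha).fit(Xs, y).coef_
    return [j for j, w in zip(keep, coef) if abs(w) > 1e-8]


# Synthetic data (hypothetical columns, for illustration only).
rng = np.random.default_rng(0)
n = 600
y = rng.integers(0, 2, n)
signal = y + rng.normal(0, 1.0, n)                      # informative numeric
near_copy = signal + rng.normal(0, 0.05, n)             # collinear with signal
noise = rng.normal(0, 1.0, n)                           # irrelevant numeric
flag = (rng.random(n) < 0.3 + 0.4 * y).astype(float)    # informative binary
X = np.column_stack([signal, near_copy, noise, flag])
is_cat = [False, False, False, True]

selected = hybrid_feature_selection(X, y, is_cat)

# Fit the boosted model on the selected features and read its importances.
model = Booster(n_estimators=50, max_depth=3, learning_rate=0.1)
model.fit(X[:, selected], y)
importances = dict(zip(selected, model.feature_importances_))
print(selected, importances)
```

          In this toy setup the near-duplicate column is removed in stage II because it is almost perfectly correlated with the first column, while the informative numeric and binary columns pass all three stages. The per-feature importances read from the fitted model correspond to the interpretability output the abstract mentions; the exact importance measure used in the paper is not stated here.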


                Author and article information

                Journal
                Int J Environ Res Public Health (ijerph)
                International Journal of Environmental Research and Public Health
                MDPI
                ISSN: 1661-7827 (print), 1660-4601 (electronic)
                Published: 07 September 2020
                Volume: 17
                Issue: 18
                Article number: 6513
                Affiliations
                [1] Database and Bioinformatics Laboratory, College of Electrical and Computer Engineering, Chungbuk National University, Cheongju 28644, Korea; khishigsurend@gmail.com
                [2] Faculty of Information Technology, Ton Duc Thang University, Ho Chi Minh 700000, Vietnam; phamvanhuy@tdtu.edu.vn
                [3] Department of Electrical Engineering, Faculty of Engineering, Chiang Mai University, Chiang Mai 50200, Thailand; nipon.t@cmu.ac.th
                [4] Biomedical Engineering Institute, Chiang Mai University, Chiang Mai 50200, Thailand
                Author notes
                [*] Correspondence: khryu@tdtu.edu.vn; Tel.: +82-10-4930-1500
                [†] These authors contributed equally to the research.

                Author information
                https://orcid.org/0000-0002-2951-9610
                https://orcid.org/0000-0003-0394-9054
                Article
                ijerph-17-06513
                DOI: 10.3390/ijerph17186513
                PMCID: PMC7558165
                PMID: 32906777
                © 2020 by the authors.

                Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license ( http://creativecommons.org/licenses/by/4.0/).

                History
                Received: 22 July 2020
                Accepted: 05 September 2020
                Categories
                Article

                Public health
                smoking, noncommunicable disease, feature selection, extreme gradient boosting
