
      Accurate Prediction of Coronary Heart Disease for Patients With Hypertension From Electronic Health Records With Big Data and Machine-Learning Methods: Model Development and Performance Evaluation


          Abstract

          Background

          Prediction of cardiovascular disease risk from health records has long attracted broad research interest. Despite extensive efforts, prediction accuracy has remained unsatisfactory. This raises the question of whether data insufficiency, the statistical and machine-learning methods used, or intrinsic noise has hindered the performance of previous approaches, and how these issues can be alleviated.

          Objective

          Based on a large population of patients with hypertension in Shenzhen, China, we aimed to establish a high-precision coronary heart disease (CHD) prediction model through big data and machine-learning methods.

          Methods

          Data from a large cohort of 42,676 patients with hypertension, including 20,156 patients with CHD onset, were investigated from electronic health records (EHRs) 1-3 years prior to CHD onset (for CHD-positive cases) or during a disease-free follow-up period of more than 3 years (for CHD-negative cases). The population was divided evenly into independent training and test datasets. Various machine-learning methods were adopted on the training set to achieve high-accuracy prediction models and the results were compared with traditional statistical methods and well-known risk scales. Comparison analyses were performed to investigate the effects of training sample size, factor sets, and modeling approaches on the prediction performance.
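
The even division into training and test sets can be illustrated with a minimal sketch. The abstract does not say how the split was drawn, so the stratification by CHD label, the function name `stratified_even_split`, the fixed seed, and the toy labels below are all assumptions made for illustration, not the authors' procedure.

```python
import random
from collections import defaultdict

def stratified_even_split(labels, seed=0):
    """Split record indices into two halves, keeping the class ratio in each half.

    Stratifying by label is an assumption; the source only says the
    population was divided evenly into training and test datasets.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)           # randomize within each class
        half = len(indices) // 2
        train.extend(indices[:half])   # first half goes to training
        test.extend(indices[half:])    # remainder goes to testing
    return sorted(train), sorted(test)

# Synthetic labels (not study data): 1 = CHD-positive, 0 = CHD-negative.
labels = [1] * 40 + [0] * 60
train_idx, test_idx = stratified_even_split(labels)
```

Keeping the two index sets disjoint is what makes the test set independent: no patient used for fitting contributes to the reported performance.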

          Results

          An ensemble method, XGBoost, achieved high accuracy in predicting 3-year CHD onset for the independent test dataset with an area under the receiver operating characteristic curve (AUC) value of 0.943. Comparison analysis showed that nonlinear models (K-nearest neighbor AUC 0.908, random forest AUC 0.938) outperformed linear models (logistic regression AUC 0.865) on the same datasets, and machine-learning methods significantly surpassed traditional risk scales or fixed models (eg, Framingham cardiovascular disease risk models). Further analyses revealed that using time-dependent features obtained from multiple records, including both statistical variables and changing-trend variables, helped to improve the performance compared to using only static features. Subpopulation analysis showed that feature design had a more significant effect on model accuracy than population size. Marginal effect analysis showed that both traditional and EHR factors exhibited highly nonlinear characteristics with respect to the risk scores.
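
For readers unfamiliar with the metric, the AUC values quoted above can be read as the probability that a randomly chosen CHD-positive patient receives a higher risk score than a randomly chosen CHD-negative patient. The sketch below computes it directly from that definition. The function name `roc_auc` and the toy scores are illustrative assumptions, and the pairwise loop is quadratic, so it suits small examples only.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Synthetic example (not study data): 8 of the 9 pairs are ranked correctly.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
auc = roc_auc(labels, scores)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the reported values of 0.865 to 0.943 should be read.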

          Conclusions

          We demonstrated that accurate risk prediction of CHD from EHRs is possible given a sufficiently large population of training data. Sophisticated machine-learning methods played an important role in tackling the heterogeneity and nonlinear nature of disease prediction. Moreover, accumulated EHR data over multiple time points provided additional features that were valuable for risk prediction. Our study highlights the importance of accumulating big data from EHRs for accurate disease predictions.
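
As a rough illustration of how records accumulated over multiple time points can yield both statistical and changing-trend variables, the sketch below summarizes one patient's repeated measurements. The abstract does not list the actual variables, so the chosen summaries (mean, standard deviation, minimum, maximum, least-squares slope, last-minus-first change), the function name `temporal_features`, and the blood pressure readings are hypothetical.

```python
from statistics import mean, pstdev

def temporal_features(times, values):
    """Summarize repeated measurements as statistical and trend features.

    `times` and `values` are assumed to be in chronological order.
    """
    if len(times) != len(values) or len(values) < 2:
        raise ValueError("need at least two (time, value) pairs")

    t_bar, v_bar = mean(times), mean(values)
    var_t = sum((t - t_bar) ** 2 for t in times)
    if var_t == 0:
        raise ValueError("measurements must span more than one time point")

    # Ordinary least-squares slope of value against time.
    slope = sum((t - t_bar) * (v - v_bar) for t, v in zip(times, values)) / var_t
    return {
        "mean": v_bar,                                # statistical variables
        "std": pstdev(values),
        "min": min(values),
        "max": max(values),
        "slope": slope,                               # changing-trend variables
        "last_minus_first": values[-1] - values[0],
    }

# Hypothetical systolic blood pressure readings (mmHg) at months 0, 6, 12, 18.
months = [0, 6, 12, 18]
sbp = [140, 144, 148, 152]
features = temporal_features(months, sbp)
```

A single static reading would capture only the level of a measurement, whereas the slope and change features also encode its direction over time.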


                Author and article information

                Contributors
                Journal
                JMIR Med Inform
                JMI
                JMIR Medical Informatics
                JMIR Publications (Toronto, Canada)
                ISSN: 2291-9694
                Published: 6 July 2020
                Volume: 8
                Issue: 7
                Article: e17257
                Affiliations
                [1] Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
                [2] Fiberhome Technologies College, Wuhan Research Institute of Posts and Telecommunications, Wuhan, China
                [3] University of Chinese Academy of Sciences, Beijing, China
                [4] Shenzhen Health Information Center, Shenzhen, China
                [5] FuWai Hospital, Chinese Academy of Medical Sciences, Shenzhen, China
                Author notes
                Corresponding Author: Yunpeng Cai yp.cai@ 123456siat.ac.cn
                Author information
                https://orcid.org/0000-0002-7828-3392
                https://orcid.org/0000-0003-0752-0706
                https://orcid.org/0000-0003-2540-6077
                https://orcid.org/0000-0002-8943-3202
                https://orcid.org/0000-0002-0175-3229
                https://orcid.org/0000-0002-5351-8546
                https://orcid.org/0000-0002-7389-9112
                https://orcid.org/0000-0003-2968-1826
                https://orcid.org/0000-0002-6907-6885
                https://orcid.org/0000-0001-8797-4243
                Article
                Publisher ID: v8i7e17257
                DOI: 10.2196/17257
                PMCID: PMC7381262
                PMID: 32628616
                ©Zhenzhen Du, Yujie Yang, Jing Zheng, Qi Li, Denan Lin, Ye Li, Jianping Fan, Wen Cheng, Xie-Hui Chen, Yunpeng Cai. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 06.07.2020.

                This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on http://medinform.jmir.org/, as well as this copyright and license information must be included.

                History
                : 29 November 2019
                : 27 January 2020
                : 9 March 2020
                : 28 March 2020
                Categories
                Original Paper

                Keywords: coronary heart disease, machine learning, electronic health records, predictive algorithms, hypertension
