      Can machine-learning improve cardiovascular risk prediction using routine clinical data?

      research-article

          Abstract

          Background

          Current approaches to predicting cardiovascular risk fail to identify many people who would benefit from preventive treatment, while others receive unnecessary intervention. Machine-learning offers an opportunity to improve accuracy by exploiting complex interactions between risk factors. We assessed whether machine-learning can improve cardiovascular risk prediction.

          Methods

          Prospective cohort study using routine clinical data of 378,256 patients from UK family practices, free from cardiovascular disease at outset. Four machine-learning algorithms (random forest, logistic regression, gradient boosting machines, neural networks) were compared to an established algorithm (American College of Cardiology guidelines) to predict first cardiovascular event over 10 years. Predictive accuracy was assessed by the area under the receiver operating characteristic curve (AUC), and by sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) for predicting a 7.5% cardiovascular risk (the threshold for initiating statins).
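The AUC used as the headline accuracy measure can be read as the probability that a randomly chosen patient who had an event was given a higher predicted risk than a randomly chosen patient who did not. The following is a minimal sketch of that pairwise definition, not the authors' analysis code; the function name `auc` and the six example risks and outcomes are hypothetical and chosen only for illustration.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: predicted risks; labels: 1 for an event, 0 for no event.
    A tie between a case and a non-case counts as half a win.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    non_cases = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for c in cases:
        for n in non_cases:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cases) * len(non_cases))


# Hypothetical predicted 10-year risks and observed outcomes (not study data).
risks = [0.02, 0.05, 0.09, 0.12, 0.30, 0.04]
events = [0, 0, 1, 0, 1, 0]

print(auc(risks, events))

# Binary prediction at the 7.5% risk threshold named in the Methods.
flagged = [r >= 0.075 for r in risks]
print(flagged)
```

The pairwise loop is quadratic in the number of patients, so it suits small illustrations only; for a cohort of this size a rank-based computation would be the usual choice.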

          Findings

          24,970 incident cardiovascular events (6.6%) occurred. Compared to the established risk prediction algorithm (AUC 0.728, 95% CI 0.723–0.735), machine-learning algorithms improved prediction (absolute increase in AUC, in percentage points): random forest +1.7% (AUC 0.745, 95% CI 0.739–0.750), logistic regression +3.2% (AUC 0.760, 95% CI 0.755–0.766), gradient boosting +3.3% (AUC 0.761, 95% CI 0.755–0.766), neural networks +3.6% (AUC 0.764, 95% CI 0.759–0.769). The highest-achieving algorithm (neural networks) correctly predicted 4,998 of 7,404 cases (sensitivity 67.5%, PPV 18.4%) and 53,458 of 75,585 non-cases (specificity 70.7%, NPV 95.7%), identifying 355 (+7.6%) more patients who developed cardiovascular disease than the established algorithm.
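The sensitivity, specificity, PPV, and NPV reported for the neural network follow directly from the four counts given in the Findings. As a check on the arithmetic, here is a small sketch; the helper name `classification_metrics` is hypothetical, and only the counts come from the abstract.

```python
def classification_metrics(tp, fn, tn, fp):
    """Standard 2x2 confusion-matrix measures, returned as fractions."""
    return {
        "sensitivity": tp / (tp + fn),  # cases correctly flagged
        "specificity": tn / (tn + fp),  # non-cases correctly cleared
        "ppv": tp / (tp + fp),          # flagged patients who had an event
        "npv": tn / (tn + fn),          # cleared patients who had no event
    }


# Counts reported in the abstract for the neural network.
cases, true_pos = 7404, 4998
non_cases, true_neg = 75585, 53458

metrics = classification_metrics(
    tp=true_pos,
    fn=cases - true_pos,       # 2,406 missed cases
    tn=true_neg,
    fp=non_cases - true_neg,   # 22,127 non-cases flagged as high risk
)

for name, value in metrics.items():
    print(f"{name}: {value * 100:.1f}%")
```

Rounded to one decimal place, the four values match the percentages quoted above. The low PPV alongside the high NPV reflects the low event rate in the cohort.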

          Conclusions

          Machine-learning significantly improves accuracy of cardiovascular risk prediction, increasing the number of patients identified who could benefit from preventive treatment, while avoiding unnecessary treatment of others.

                Author and article information

                Contributors
                Role: Editor
                Journal
                PLoS ONE
                Public Library of Science (San Francisco, CA, USA)
                ISSN: 1932-6203
                Published: 4 April 2017
                Volume 12, Issue 4, e0174944
                Affiliations
                [1] NIHR School for Primary Care Research, University of Nottingham, Nottingham, United Kingdom
                [2] Division of Primary Care, School of Medicine, University of Nottingham, Nottingham, United Kingdom
                [3] Advanced Data Analysis Centre, University of Nottingham, Nottingham, United Kingdom
                [4] School of Computer Science, University of Nottingham, Nottingham, United Kingdom
                Harbin Institute of Technology Shenzhen Graduate School, China
                Author notes

                Competing Interests: The authors have declared that no competing interests exist.

                • Conceptualization: SW JR JK NQ JG.

                • Data curation: SW JR.

                • Formal analysis: SW JR.

                • Funding acquisition: SW NQ JK.

                • Investigation: SW JR.

                • Methodology: SW JR JK NQ JG.

                • Project administration: SW.

                • Resources: SW JR.

                • Software: SW JR.

                • Supervision: SW NQ.

                • Validation: SW JR JK NQ JG.

                • Visualization: SW JR JK NQ JG.

                • Writing – original draft: SW JR.

                • Writing – review & editing: SW JR JK NQ JG.

                ‡ These authors also contributed equally to this work.

                Author information
                http://orcid.org/0000-0002-5281-9590
                Article
                Manuscript ID: PONE-D-16-49429
                DOI: 10.1371/journal.pone.0174944
                PMCID: PMC5380334
                PMID: 28376093
                6c46cc3b-eca1-498d-8241-186e201649ff
                © 2017 Weng et al

                This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

                History
                Received: 14 December 2016
                Accepted: 18 March 2017
                Page count
                Figures: 2, Tables: 4, Pages: 14
                Funding
                Funded by: National Institute for Health Research (funder ID: http://dx.doi.org/10.13039/501100000272)
                Award ID: NIHR School for Primary Care Research Fellowship (2015-2018)
                This paper presents independent research funded by the National Institute for Health Research School for Primary Care Research (NIHR SPCR): personal training fellowship award for SW from 2015-2018. URL: https://www.spcr.nihr.ac.uk/trainees. The views expressed are those of the authors and not necessarily those of the NIHR, the NHS, or the Department of Health.
                Categories
                Research Article
                Medicine and Health Sciences > Cardiovascular Medicine > Cardiovascular Diseases
                Biology and Life Sciences > Biochemistry > Lipids > Cholesterol
                Computer and Information Sciences > Neural Networks
                Biology and Life Sciences > Neuroscience > Neural Networks
                Computer and Information Sciences > Artificial Intelligence > Machine Learning
                Physical Sciences > Mathematics > Applied Mathematics > Algorithms
                Research and Analysis Methods > Simulation and Modeling > Algorithms
                Physical Sciences > Mathematics > Applied Mathematics > Algorithms > Machine Learning Algorithms
                Research and Analysis Methods > Simulation and Modeling > Algorithms > Machine Learning Algorithms
                Computer and Information Sciences > Artificial Intelligence > Machine Learning > Machine Learning Algorithms
                Medicine and Health Sciences > Vascular Medicine > Blood Pressure
                Medicine and Health Sciences > Endocrinology > Endocrine Disorders > Diabetes Mellitus
                Medicine and Health Sciences > Metabolic Disorders > Diabetes Mellitus
                Custom metadata
                This dataset contains patient-level health records, with intellectual property rights held under Crown copyright, which are subject to UK information governance laws. The authors will make their data available upon specific request, subject to the requestor obtaining ethical and research approvals from the Clinical Practice Research Datalink Independent Scientific Advisory Committee (https://www.cprd.com/intro.asp) at the UK Medicines and Healthcare products Regulatory Agency.
