      Predicting cardiovascular risk from national administrative databases using a combined survival analysis and deep learning approach

      research-article

          Abstract

          Background

          Machine learning-based risk prediction models may outperform traditional statistical models in large datasets with many variables, by identifying both novel predictors and the complex interactions between them. This study compared deep learning extensions of survival analysis models with Cox proportional hazards models for predicting cardiovascular disease (CVD) risk in national health administrative datasets.

          Methods

          Using individual person linkage of administrative datasets, we constructed a cohort of all New Zealanders aged 30–74 who interacted with public health services during 2012. After excluding people with prior CVD, we developed sex-specific deep learning and Cox proportional hazards models to estimate the risk of CVD events within 5 years. Models were compared based on the proportion of explained variance, model calibration and discrimination, and hazard ratios for predictor variables.
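The two model families compared here share the same survival-analysis core: each assigns every person a log relative hazard, is fitted by maximising the Cox partial likelihood, and converts the score to an absolute risk through a baseline survival function. The following is a minimal sketch of that shared core, not the authors' implementation. The function names, the toy data, and the baseline survival value of 0.97 are illustrative assumptions, and ties between event times are ignored for simplicity.

```python
import math

def neg_log_partial_likelihood(times, events, scores):
    """Cox negative log partial likelihood (no tie correction).

    times  : follow-up time for each person
    events : 1 if a CVD event was observed, 0 if censored
    scores : log relative hazard per person (x . beta in a Cox model,
             or a neural network output in a deep learning extension)
    """
    total = 0.0
    for i, (t_i, e_i) in enumerate(zip(times, events)):
        if not e_i:
            continue  # censored people contribute only through risk sets
        # Risk set: everyone still under observation at time t_i.
        risk_set = [math.exp(s) for t, s in zip(times, scores) if t >= t_i]
        total -= scores[i] - math.log(sum(risk_set))
    return total

def risk_within_horizon(baseline_survival, score):
    """Absolute risk by the horizon: 1 - S0(t) ** exp(score)."""
    return 1.0 - baseline_survival ** math.exp(score)

# Toy data (hypothetical): three people, the third is censored.
loss = neg_log_partial_likelihood([1, 2, 3], [1, 1, 0], [0.0, 0.0, 0.0])

# Hypothetical baseline 5-year survival of 0.97; a score of log(2)
# corresponds to a hazard ratio of 2 relative to the reference group.
risk = risk_within_horizon(0.97, math.log(2))
```

In this framing, a deep learning extension changes only how `scores` is computed; the loss and the risk conversion stay the same.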

          Results

          First CVD events occurred in 61 927 of 2 164 872 people. Within the reference group, the largest hazard ratios estimated by the deep learning models were for tobacco use in women (2.04, 95% CI: 1.99, 2.10) and chronic obstructive pulmonary disease with acute lower respiratory infection in men (1.56, 95% CI: 1.50, 1.62). Other identified predictors (e.g. hypertension, chest pain, diabetes) aligned with current knowledge about CVD risk factors. Deep learning outperformed Cox proportional hazards models on the basis of proportion of explained variance (R²: 0.468 vs 0.425 in women and 0.383 vs 0.348 in men), calibration and discrimination (all P < 0.0001).
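A hazard ratio and its 95% confidence interval are conventionally obtained by exponentiating a log-hazard coefficient and its Wald limits. The sketch below shows that relationship using the reported figure for tobacco use in women. It assumes a symmetric Wald interval on the log scale; the standard error is back-derived from the published interval and is not a value given in the source, and the function names are hypothetical.

```python
import math

Z_95 = 1.96  # normal quantile for a two-sided 95% interval

def se_from_ci(lower, upper, z=Z_95):
    """Back-derive the log-scale standard error from a reported CI."""
    return (math.log(upper) - math.log(lower)) / (2 * z)

def hazard_ratio_ci(beta, se, z=Z_95):
    """Wald interval for a hazard ratio: exp(beta -/+ z * se)."""
    return math.exp(beta - z * se), math.exp(beta + z * se)

# Reported: HR 2.04 (95% CI: 1.99, 2.10) for tobacco use in women.
beta = math.log(2.04)          # log hazard ratio
se = se_from_ci(1.99, 2.10)    # assumption: Wald interval on the log scale
lower, upper = hazard_ratio_ci(beta, se)
```

The narrow interval reflects the size of the cohort: with more than two million people, even modest effects are estimated with small standard errors.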

          Conclusions

          Deep learning extensions of survival analysis models can be applied to large health administrative datasets to derive interpretable CVD risk prediction equations that are more accurate than traditional Cox proportional hazards models.

                Author and article information

                Contributors
                Journal
                International Journal of Epidemiology (Int J Epidemiol)
                Oxford University Press
                ISSN: 0300-5771, 1464-3685
                June 2022
                15 December 2021
                Volume: 51
                Issue: 3
                Pages: 931-944
                Affiliations
                Centre for Big Data Research in Health, University of New South Wales, Sydney, NSW, Australia
                Section of Epidemiology and Biostatistics, University of Auckland, Auckland, New Zealand
                National Drug and Alcohol Research Centre, University of New South Wales, Sydney, NSW, Australia
                Author notes
                Corresponding author. Centre for Big Data Research in Health, Level 2, AGSM Building (G27), UNSW Sydney, NSW 2052, Australia. E-mail: s.barbieri@unsw.edu.au
                Author information
                https://orcid.org/0000-0002-5919-372X
                https://orcid.org/0000-0003-0390-661X
                https://orcid.org/0000-0001-5914-6934
                Article
                Article ID: dyab258
                DOI: 10.1093/ije/dyab258
                PMCID: PMC9189958
                PMID: 34910160
                © The Author(s) 2021. Published by Oxford University Press on behalf of the International Epidemiological Association.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                10 May 2021
                16 November 2021
                26 November 2021
                Page count
                Pages: 14
                Funding
                Funded by: Health Research Council of New Zealand, DOI 10.13039/501100001505
                Award ID: 11/800
                Award ID: 14/010
                Funded by: New Zealand Health Research Council Clinical Research Training Fellowship
                Funded by: National Drug and Alcohol Research Centre (NDARC)
                Funded by: University of New South Wales Scientia PhD Scholarships
                Funded by: New Zealand Heart Foundation Hynds Senior Fellowship
                Categories
                Methods
                AcademicSubjects/MED00860

                Public health
                cardiovascular diseases, primary prevention, risk assessment, population health, health planning, machine learning, deep learning, survival analysis
