      Developing Clinical Prediction Models Using Primary Care Electronic Health Record Data: The Impact of Data Preparation Choices on Model Performance


          Abstract

          Objective

          To quantify prediction model performance in relation to data preparation choices when using electronic health records (EHR).

          Study Design and Setting

          Cox proportional hazards models were developed for predicting the first-ever main adverse cardiovascular events using Dutch primary care EHR data. The reference model was based on a 1-year run-in period, cardiovascular events were defined based on both EHR diagnosis and medication codes, and missing values were multiply imputed. We compared data preparation choices based on (i) length of the run-in period (2- or 3-year run-in); (ii) outcome definition (EHR diagnosis codes or medication codes only); and (iii) methods addressing missing values (mean imputation or complete case analysis) by making variations on the derivation set and testing their impact in a validation set.
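As a rough illustration of two of the missing-data strategies compared above, the following minimal sketch contrasts complete case analysis with mean imputation on a toy dataset. The column names (`sbp`, `chol`), the values, and the helper functions are hypothetical and are not taken from the study. Multiple imputation, which the reference model used, is not shown.

```python
from statistics import mean

def complete_cases(rows):
    """Complete case analysis: keep only rows without missing (None) values."""
    return [r for r in rows if all(v is not None for v in r.values())]

def mean_impute(rows, columns):
    """Mean imputation: replace None by the observed mean of each column."""
    means = {c: mean(r[c] for r in rows if r[c] is not None) for c in columns}
    return [
        {**r, **{c: (means[c] if r[c] is None else r[c]) for c in columns}}
        for r in rows
    ]

# Hypothetical toy records with missing predictor values.
rows = [
    {"sbp": 120.0, "chol": 5.0},
    {"sbp": None, "chol": 6.0},
    {"sbp": 140.0, "chol": None},
    {"sbp": 160.0, "chol": 7.0},
]

cc = complete_cases(rows)               # drops every row with a missing value
mi = mean_impute(rows, ["sbp", "chol"])  # keeps all rows, fills the gaps
```

Complete case analysis shrinks the derivation set and may change the case mix, whereas mean imputation keeps every patient but ignores the uncertainty about the filled-in values.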

          Results

          We included 89,491 patients in whom 6,736 first-ever main adverse cardiovascular events occurred during a median follow-up of 8 years. Outcome definition based only on diagnosis codes led to a systematic underestimation of risk (calibration curve intercept: 0.84; 95% CI: 0.83–0.84), while complete case analysis led to overestimation (calibration curve intercept: −0.52; 95% CI: −0.53 to −0.51). Differences in the length of the run-in period showed no relevant impact on calibration and discrimination.
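To make the sign convention of the calibration intercept concrete, here is a minimal sketch for a binary outcome. It estimates the intercept of a logistic model in which the logit of the predicted risk enters as an offset with slope fixed to 1. This is a simplification and an assumption on our part: the study used Cox models on time-to-event data, and the function name and toy numbers below are hypothetical.

```python
import math

def calibration_intercept(y, p_hat, iters=50, tol=1e-10):
    """Estimate a in logit(P(y=1)) = a + logit(p_hat) by Newton-Raphson.

    a > 0: observed risk exceeds predicted risk (underestimation).
    a < 0: predicted risk exceeds observed risk (overestimation).
    """
    offset = [math.log(p / (1 - p)) for p in p_hat]
    a = 0.0
    for _ in range(iters):
        fitted = [1 / (1 + math.exp(-(a + o))) for o in offset]
        grad = sum(yi - fi for yi, fi in zip(y, fitted))  # score
        info = sum(fi * (1 - fi) for fi in fitted)        # Fisher information
        step = grad / info
        a += step
        if abs(step) < tol:
            break
    return a

# Toy data: 20 events among 100 patients.
y = [1] * 20 + [0] * 80
under = calibration_intercept(y, [0.1] * 100)  # model predicts 10% risk
over = calibration_intercept(y, [0.3] * 100)   # model predicts 30% risk
```

In this toy setting `under` is positive and `over` is negative, matching the interpretation of the intercepts reported above.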

          Conclusion

          Data preparation choices regarding outcome definition or methods to address missing values can have a substantial impact on the calibration of predictions, hampering reliable clinical decision support. This study further illustrates the urgency of transparent reporting of modeling choices in an EHR data setting.

                Author and article information

                Contributors
                Journal: Frontiers in Epidemiology (Front. Epidemiol.)
                Publisher: Frontiers Media S.A.
                ISSN: 2674-1199
                Published: 02 June 2022
                Volume: 2
                Article number: 871630
                Affiliations
                [1] Department of Neurology, Leiden University Medical Hospital, Leiden, Netherlands
                [2] National eHealth Living Lab, Leiden University Medical Hospital, Leiden, Netherlands
                [3] Department of Public Health & Primary Care, Leiden University Medical Hospital, Leiden, Netherlands
                [4] Department of Neurology, University Medical Center Utrecht, Utrecht, Netherlands
                [5] Department of Biomedical Data Sciences, Leiden University Medical Hospital, Leiden, Netherlands
                [6] Department of Clinical Epidemiology, Leiden University Medical Hospital, Leiden, Netherlands
                Author notes

                Edited by: Huibert Burger, University Medical Center Groningen, Netherlands

                Reviewed by: Lauren Beesley, Los Alamos National Laboratory (DOE), United States; Kellyn F. Arnold, University of Leeds, United Kingdom

                *Correspondence: Hendrikus J. A. van Os, h.j.a.van_os@lumc.nl

                This article was submitted to Research Methods and Advances in Epidemiology, a section of the journal Frontiers in Epidemiology

                †These authors share first authorship

                Article
                10.3389/fepid.2022.871630
                10910909
                38455328
                5c20fe47-a8ee-4096-91a1-312c736a2f5b
                Copyright © 2022 van Os, Kanning, Wermer, Chavannes, Numans, Ruigrok, van Zwet, Putter, Steyerberg and Groenwold.

                This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

                History
                Received: 08 February 2022
                Accepted: 11 April 2022
                Page count
                Figures: 3, Tables: 3, Equations: 0, References: 35, Pages: 8, Words: 6305
                Funding
                Funded by: Hartstichting, doi 10.13039/501100002996;
                Funded by: ZonMw, doi 10.13039/501100001826;
                Funded by: Hersenstichting, doi 10.13039/501100008358;
                Funded by: European Commission, doi 10.13039/501100000780;
                Categories
                Epidemiology
                Original Research

                Keywords: prediction model, data preparation, electronic health records (EHRs), model performance, model transportability, clinical prediction model
