      Design and implementation of a standardized framework to generate and evaluate patient-level prediction models using observational healthcare data

      research-article

          Abstract

          Objective

          To develop a conceptual prediction model framework containing standardized steps and describe the corresponding open-source software developed to consistently implement the framework across computational environments and observational healthcare databases to enable model sharing and reproducibility.

          Methods

          Based on existing best practices, we propose a five-step standardized framework for: (1) transparently defining the problem; (2) selecting suitable datasets; (3) constructing variables from the observational data; (4) learning the predictive model; and (5) validating the model performance. We implemented this framework as open-source software that uses the Observational Medical Outcomes Partnership Common Data Model to enable convenient sharing of models and reproduction of model evaluation across multiple observational datasets. The software implementation contains default covariates and classifiers, but the framework enables customization and extension.
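
The five steps listed above can be illustrated with a minimal sketch. It does not use the authors' open-source software or the Common Data Model; it runs on synthetic data, and every name in it (`PROBLEM`, `simulate_database`, `build_covariates`, `fit_logistic`, `auc`), the two covariates, and the simulated effect sizes are assumptions made for illustration only.

```python
import math
import random

# Step 1: define the problem. Every value here is an illustrative assumption.
PROBLEM = {
    "target_population": "patients with depression",
    "outcome": "hypothetical outcome",
    "time_at_risk_days": 365,
}


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def simulate_database(n, baseline_shift, seed):
    """Step 2 stand-in: synthetic patient rows instead of a real observational database."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        age = rng.uniform(18, 90)
        prior_visits = rng.randint(0, 20)
        logit = -4.0 + 0.03 * age + 0.1 * prior_visits + baseline_shift
        outcome = 1 if rng.random() < sigmoid(logit) else 0
        rows.append({"age": age, "prior_visits": prior_visits, "outcome": outcome})
    return rows


def build_covariates(rows):
    """Step 3: build an intercept plus two roughly centered and scaled covariates."""
    X = [[1.0, (r["age"] - 54.0) / 21.0, (r["prior_visits"] - 10.0) / 6.0] for r in rows]
    y = [r["outcome"] for r in rows]
    return X, y


def fit_logistic(X, y, lr=1.0, epochs=100):
    """Step 4: fit a logistic regression by batch gradient descent."""
    w = [0.0] * len(X[0])
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def predict(w, X):
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]


def auc(y, p):
    """Step 5: probability that a random case is ranked above a random non-case."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum((a > b) + 0.5 * (a == b) for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Development database (split into train and internal test) and a second,
# "external" database with a different baseline risk.
development = simulate_database(3000, baseline_shift=0.0, seed=1)
external = simulate_database(1000, baseline_shift=0.5, seed=2)

X_train, y_train = build_covariates(development[:2000])
X_test, y_test = build_covariates(development[2000:])
X_ext, y_ext = build_covariates(external)

weights = fit_logistic(X_train, y_train)
internal_auc = auc(y_test, predict(weights, X_test))
external_auc = auc(y_ext, predict(weights, X_ext))

print(PROBLEM["outcome"], "in", PROBLEM["target_population"])
print("internal AUC:", round(internal_auc, 3))
print("external AUC:", round(external_auc, 3))
```

Because the second synthetic database differs from the first only in baseline risk, this sketch checks discrimination alone; detecting that shift would require a calibration comparison of predicted and observed outcome rates.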

          Results

          As a proof of concept demonstrating the transparency and ease of model dissemination using the software, we developed prediction models for 21 different outcomes within a target population of people suffering from depression across 4 observational databases. All 84 models (21 outcomes × 4 databases) are available in an accessible online repository to be implemented by anyone with access to an observational database in the Common Data Model format.

          Conclusions

          The proof-of-concept study illustrates the framework’s ability to develop reproducible models that can be readily shared, and it offers the potential to perform extensive external validation of models and improve their likelihood of clinical uptake. In future work, the framework will be applied to perform an “all-by-all” prediction analysis to assess the observational data prediction domain across numerous target populations, outcomes, and time-at-risk settings.

                Author and article information

                Journal
                J Am Med Inform Assoc
                Journal of the American Medical Informatics Association : JAMIA
                Oxford University Press
                1067-5027
                1527-974X
                August 2018
                27 April 2018
                Volume: 25
                Issue: 8
                Pages: 969-975
                Affiliations
                [1] Janssen Research and Development, Raritan, NJ, USA
                [2] Department of Biomathematics, UCLA School of Medicine, CA, USA
                [3] Department of Medical Informatics, Erasmus University Medical Center, Rotterdam, The Netherlands
                Author notes
                Corresponding Author: Dr Jenna M Reps, Janssen Research and Development, Raritan, New Jersey, USA; jreps@its.jnj.com
                Article
                Publisher ID: ocy032
                DOI: 10.1093/jamia/ocy032
                PMCID: PMC6077830
                PMID: 29718407
                © The Author(s) 2018. Published by Oxford University Press on behalf of the American Medical Informatics Association.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com.

                History
                Received: 30 May 2017
                Revised: 8 December 2017
                Accepted: 15 March 2018
                Page count
                Pages: 7
                Funding
                Funded by: National Science Foundation 10.13039/100000001
                Award ID: 1251151
                Categories
                Research and Applications

                Bioinformatics & Computational biology
                Keywords: prediction model, prediction framework, prognostic model, observational data