
      Can the Use of Bayesian Analysis Methods Correct for Incompleteness in Electronic Health Records Diagnosis Data? Development of a Novel Method Using Simulated and Real-Life Clinical Data



          Abstract

          Background: Patient health information is routinely collected in electronic health records (EHRs) and used for research; however, many health conditions are known to be under-diagnosed or under-recorded in EHRs. In research, missing diagnoses lead to under-ascertainment of true cases, which attenuates estimated associations between variables and biases them toward the null. Bayesian approaches allow prior information, such as the likely rate of missingness in the data, to be specified in the model. This paper describes a Bayesian analysis approach intended to reduce the attenuation of associations in EHR studies focussed on conditions characterized by under-diagnosis.
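To see why under-recording biases associations toward the null, consider the simplest case, under assumptions that are ours rather than the paper's: one binary risk factor, no false-positive diagnoses, and a single probability s that a true case is recorded, identical for exposed and unexposed patients. If p0 and p1 are the true risks, the recorded risks are s*p0 and s*p1, and the recorded odds ratio is p1(1 - s*p0) / [p0(1 - s*p1)]. For p1 > p0 this lies between the risk ratio p1/p0 (approached as s goes to 0) and the true odds ratio (reached at s = 1). If s were known exactly, dividing the recorded risks by s would undo the bias; a Bayesian model instead expresses uncertainty about s as a prior. The helper names and numbers below are illustrative only.

```python
def odds(p: float) -> float:
    """Odds for a probability p in (0, 1)."""
    return p / (1.0 - p)


def true_odds_ratio(p0: float, p1: float) -> float:
    """Odds ratio between true risks in exposed (p1) and unexposed (p0)."""
    return odds(p1) / odds(p0)


def recorded_odds_ratio(p0: float, p1: float, sensitivity: float) -> float:
    """Odds ratio seen in the records when every true case is recorded with
    the same probability `sensitivity` and there are no false positives
    (simplifying assumption for illustration)."""
    return odds(sensitivity * p1) / odds(sensitivity * p0)


def corrected_odds_ratio(q0: float, q1: float, sensitivity: float) -> float:
    """Undo the under-recording by dividing recorded risks q0, q1 by a known
    (assumed) sensitivity before forming the odds ratio."""
    return odds(q1 / sensitivity) / odds(q0 / sensitivity)


if __name__ == "__main__":
    p0, p1, s = 0.10, 0.20, 0.40  # illustrative risks and recording probability
    print(f"true OR      {true_odds_ratio(p0, p1):.3f}")
    print(f"recorded OR  {recorded_odds_ratio(p0, p1, s):.3f}")
    print(f"corrected OR {corrected_odds_ratio(s * p0, s * p1, s):.3f}")
```

With these values the true odds ratio is 2.25, the recorded odds ratio is about 2.09, and the corrected value returns to 2.25.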

          Methods: Study 1: We created synthetic data designed to mimic structured EHR data in which diagnoses were under-recorded. We fitted logistic regression (LR) models with and without Bayesian priors representing rates of misclassification in the data, and compared the LR parameters estimated by each. Study 2: We used EHR data from UK primary care in a case-control design with dementia as the outcome. We fitted LR models examining risk factors for dementia, with and without generic prior information on misclassification rates. We compared the LR parameters estimated by models with and without the priors, and estimated classification accuracy using the area under the receiver operating characteristic curve (AUROC).
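The Study 1 design can be sketched in miniature. The code below is not the authors' model: it is a minimal sketch that assumes a single binary risk factor, no false-positive diagnoses, and one recording probability s shared by all patients. It simulates under-recorded diagnoses, fits a naive LR that ignores under-recording, and then places a Beta prior on s and samples the posterior of the logistic model for the true diagnosis with a random-walk Metropolis sampler. All function names and parameter values (true coefficients, the Beta(40, 60) prior, sample sizes, step size, burn-in) are hypothetical.

```python
import math
import random
import statistics


def expit(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logit(p: float) -> float:
    """Log odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def simulate_counts(n_per_group, b0, b1, sensitivity, rng):
    """Hypothetical EHR-like data: binary risk factor x, true disease drawn from
    a logistic model, and each true case recorded with probability
    `sensitivity` (no false positives). Returns {x: (patients, recorded)}."""
    counts = {}
    for x in (0, 1):
        p = expit(b0 + b1 * x)
        recorded = sum(
            1 for _ in range(n_per_group)
            if rng.random() < p and rng.random() < sensitivity
        )
        counts[x] = (n_per_group, recorded)
    return counts


def naive_logistic(counts):
    """Maximum-likelihood LR on recorded diagnoses, ignoring under-recording.
    With one binary predictor the MLE comes straight from the 2x2 table."""
    (n0, k0), (n1, k1) = counts[0], counts[1]
    b0 = logit(k0 / n0)
    return b0, logit(k1 / n1) - b0


def log_posterior(b0, b1, s, counts, prior_a, prior_b):
    """Unnormalised log posterior: Beta(prior_a, prior_b) prior on the
    recording probability s, weak Normal(0, 10^2) priors on b0 and b1, and a
    diagnosis recorded with probability s * expit(b0 + b1 * x)."""
    if not 0.0 < s < 1.0:
        return -math.inf
    lp = (prior_a - 1) * math.log(s) + (prior_b - 1) * math.log(1.0 - s)
    lp -= (b0 ** 2 + b1 ** 2) / (2 * 10.0 ** 2)
    for x, (n, k) in counts.items():
        q = s * expit(b0 + b1 * x)
        lp += k * math.log(q) + (n - k) * math.log(1.0 - q)
    return lp


def metropolis(counts, prior_a, prior_b, n_iter, step, rng):
    """Random-walk Metropolis over (b0, b1, s); returns every draw."""
    s = prior_a / (prior_a + prior_b)       # start at the prior mean of s
    (n0, k0), (n1, k1) = counts[0], counts[1]
    b0 = logit(k0 / (n0 * s))               # start where s * expit(...)
    b1 = logit(k1 / (n1 * s)) - b0          # reproduces the recorded rates
    current = log_posterior(b0, b1, s, counts, prior_a, prior_b)
    draws = []
    for _ in range(n_iter):
        prop = (b0 + rng.gauss(0.0, step),
                b1 + rng.gauss(0.0, step),
                s + rng.gauss(0.0, step))
        cand = log_posterior(*prop, counts, prior_a, prior_b)
        # Accept with probability min(1, exp(cand - current)).
        if rng.random() < math.exp(min(0.0, cand - current)):
            (b0, b1, s), current = prop, cand
        draws.append((b0, b1, s))
    return draws


if __name__ == "__main__":
    rng = random.Random(2020)
    TRUE_B0, TRUE_B1, TRUE_SENS = -2.0, 0.8, 0.4   # hypothetical values
    counts = simulate_counts(20_000, TRUE_B0, TRUE_B1, TRUE_SENS, rng)
    naive_b1 = naive_logistic(counts)[1]
    draws = metropolis(counts, prior_a=40, prior_b=60, n_iter=30_000,
                       step=0.02, rng=rng)
    bayes_b1 = statistics.fmean(d[1] for d in draws[5_000:])  # drop burn-in
    print(f"true b1 = {TRUE_B1}, naive b1 = {naive_b1:.3f}, "
          f"posterior mean b1 = {bayes_b1:.3f}")
```

In runs of this kind the naive slope is typically attenuated below 0.8 and the posterior mean is pulled back up toward it; exact values depend on the seed. Because recorded data alone cannot distinguish a low recording rate from a low true risk, the posterior for s stays close to its prior, so the correction is only as good as the prior on misclassification.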

          Results: Study 1: In synthetic data, LR parameter estimates were much closer to the true parameter values when Bayesian priors were added to the model; without priors, parameters were substantially attenuated by under-diagnosis. Study 2: The Bayesian approach ran well on real-life clinical data from UK primary care, and adding prior information increased the LR parameter values in all cases. In multivariable regression models, the Bayesian methods showed no improvement in classification accuracy over traditional LR.
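Classification accuracy was measured by AUROC, which equals the probability that a randomly chosen case receives a higher score than a randomly chosen non-case. It therefore depends only on how the scores rank patients: multiplying every predicted probability by the same recording rate, for example, cannot change it. The minimal sketch below computes AUROC directly from that pairwise (Mann-Whitney) definition, counting ties as one half; the function name and the toy scores are hypothetical, and the quadratic loop is meant only to illustrate the definition, not for large datasets.

```python
from itertools import product


def auroc(scores, labels):
    """AUROC as the probability that a case (label 1) scores higher than a
    non-case (label 0), with ties counting one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for p, n in product(pos, neg):  # every (case, non-case) pair
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    scores = [0.10, 0.40, 0.35, 0.80]      # toy predicted probabilities
    labels = [0, 0, 1, 1]                  # 1 = recorded diagnosis
    rescaled = [0.4 * s for s in scores]   # same ranking, smaller values
    print(auroc(scores, labels), auroc(rescaled, labels))
```

Both calls return 0.75: rescaling changes the predicted probabilities but not their order, so the AUROC is identical.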

          Conclusions: The Bayesian approach showed promise but faced implementation challenges with real clinical data: prior information on rates of misclassification was difficult to find. Our simple model made several assumptions, such as diagnoses being missing at random. Further development is needed to integrate the method into studies using real-life EHR data. Nevertheless, our findings highlight the importance of developing methods to address missing diagnoses in EHR data.
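One assumption behind a simple correction of this kind is that every true case has the same chance of being recorded, whatever the patient's risk factors. The sketch below, with hypothetical risks and recording probabilities and a hypothetical helper name, contrasts that situation with one in which cases among exposed patients are recorded more often: dividing both groups by a single shared recording probability then fails to recover the true odds ratio.

```python
def odds(p: float) -> float:
    """Odds for a probability p in (0, 1)."""
    return p / (1.0 - p)


def check_correction(p0, p1, sens0, sens1, assumed_sens):
    """Return (true OR, corrected OR) when true cases in unexposed and exposed
    patients are recorded with probabilities sens0 and sens1, but both groups
    are corrected with one shared `assumed_sens` (no false positives)."""
    q0, q1 = sens0 * p0, sens1 * p1            # recorded risks
    true_or = odds(p1) / odds(p0)
    corrected_or = odds(q1 / assumed_sens) / odds(q0 / assumed_sens)
    return true_or, corrected_or


if __name__ == "__main__":
    # Recording independent of exposure: the correction recovers the truth.
    print(check_correction(0.10, 0.20, 0.4, 0.4, assumed_sens=0.4))
    # Hypothetical differential recording (0.3 vs 0.6), corrected with 0.45.
    print(check_correction(0.10, 0.20, 0.3, 0.6, assumed_sens=0.45))
```

The first call returns 2.25 for both values (up to floating-point rounding); the second returns a corrected odds ratio of about 5.09 (56/11), more than double the true value of 2.25.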


                Author and article information

                Journal
                Frontiers in Public Health (Front. Public Health)
                Frontiers Media S.A.
                ISSN 2296-2565
                Published 05 March 2020
                Volume 8 (2020), Article 54
                Affiliations
                [1] Department of Primary Care and Public Health, Brighton and Sussex Medical School, Brighton, United Kingdom
                [2] Department of Physics and Astronomy, University of Sussex, Brighton, United Kingdom
                Author notes

                Edited by: Michael Edelstein, Public Health England, United Kingdom

                Reviewed by: Laszlo Balkanyi, University of Pannonia, Hungary; Helen Isabel McDonald, London School of Hygiene and Tropical Medicine, University of London, United Kingdom

                *Correspondence: Elizabeth Ford, e.m.ford@bsms.ac.uk

                This article was submitted to Digital Public Health, a section of the journal Frontiers in Public Health

                Article
                DOI: 10.3389/fpubh.2020.00054
                PMCID: PMC7066995
                PMID: 32211363
                Copyright © 2020 Ford, Rooney, Hurley, Oliver, Bremner and Cassell.

                This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

                History
                Received: 11 June 2019
                Accepted: 14 February 2020
                Page count
                Figures: 3, Tables: 4, Equations: 2, References: 59, Pages: 12, Words: 8832
                Funding
                Funded by: Wellcome Trust (Funder ID: 10.13039/100004440)
                Categories
                Public Health
                Original Research

                Keywords: electronic health records, patient data, data quality, missing data, Bayesian analysis, methodology
