
      Approach to record linkage of primary care data from Clinical Practice Research Datalink to other health-related patient data: overview and implications

      Research article


          Abstract

          Record linkage is increasingly used to expand the information available for public health research. Understanding record linkage methods and their strengths and limitations is important for robust analysis and interpretation of linked data. Here, we describe the approach used by Clinical Practice Research Datalink (CPRD) to link primary care data to other patient-level datasets, and the potential implications of this approach for CPRD data analysis. General practice electronic health record software providers separately submit de-identified data to CPRD and patient identifiers to NHS Digital, excluding patients who have opted out of contributing data. Data custodians for external datasets also send patient identifiers to NHS Digital. NHS Digital uses these identifiers to link the datasets using an 8-stage deterministic methodology. CPRD subsequently receives a de-identified linked cohort file and provides researchers with anonymised linked data and metadata detailing the linkage process. This methodology has been used to generate routine primary care linked datasets, including data from Hospital Episode Statistics, the Office for National Statistics and the National Cancer Registration and Analysis Service. In June 2018, 10.6 million (M) patients from 411 English general practices were included in record linkage. Of these, 9.1M (86%) were of research quality, and of those, 8.0M (88%) had a valid NHS number and were eligible for linkage in the CPRD standard linked dataset release. Linking CPRD data to other sources improves the range and validity of research studies. This manuscript, together with the metadata generated on match strength and linkage eligibility, can be used to inform study design and to explore potential linkage-related selection and misclassification biases.
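          The abstract does not spell out the 8 linkage stages, so the following Python sketch is only a rough illustration of how stepwise (multi-pass) deterministic linkage generally works. Records must agree exactly on a set of identifiers, the sets become less strict pass by pass, and the pass at which a link is made is kept as a simple match-strength flag. GP patients without a valid NHS number are treated as ineligible. The `Person` layout, the `PASSES` list and the `link` function are hypothetical and are not NHS Digital's actual stages. The NHS number test uses the published Modulus 11 check-digit rule; whether this matches CPRD's definition of a "valid NHS number" is an assumption.

```python
"""Illustrative multi-pass deterministic linkage (hypothetical passes)."""
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Person:
    """Hypothetical identifier record; not the real NHS Digital layout."""
    record_id: str
    nhs_number: Optional[str]
    dob: str       # ISO date, e.g. "1970-01-31"
    sex: str       # "M" or "F"
    postcode: str


def nhs_number_is_valid(nhs: Optional[str]) -> bool:
    """Modulus 11 check-digit test for a 10-digit NHS number.

    Assumption: CPRD's 'valid NHS number' criterion may involve more than this.
    """
    if nhs is None:
        return False
    digits = nhs.replace(" ", "")
    if len(digits) != 10 or not digits.isdigit():
        return False
    # Weights 10, 9, ..., 2 apply to the first nine digits.
    total = sum(int(d) * w for d, w in zip(digits[:9], range(10, 1, -1)))
    check = 11 - total % 11
    if check == 11:
        check = 0
    if check == 10:  # no valid check digit exists for this prefix
        return False
    return check == int(digits[9])


# Hypothetical passes, strictest first; NOT the actual 8 NHS Digital stages.
PASSES = [
    ("nhs_number", "dob", "sex", "postcode"),
    ("nhs_number", "dob", "sex"),
    ("nhs_number", "dob"),
    ("dob", "sex", "postcode"),
]


def link(gp: list[Person], external: list[Person]) -> dict[str, tuple[str, int]]:
    """Return {gp_record_id: (external_record_id, pass_number)}.

    GP patients without a valid NHS number are treated as ineligible.
    The pass number serves as a crude match-strength flag.
    """
    eligible = [p for p in gp if nhs_number_is_valid(p.nhs_number)]
    links: dict[str, tuple[str, int]] = {}
    used: set[str] = set()
    for stage, fields in enumerate(PASSES, start=1):
        # Index still-unmatched external records on this pass's key.
        index: dict[tuple, list[Person]] = {}
        for e in external:
            if e.record_id not in used:
                key = tuple(getattr(e, f) for f in fields)
                index.setdefault(key, []).append(e)
        for p in eligible:
            if p.record_id in links:
                continue
            candidates = index.get(tuple(getattr(p, f) for f in fields), [])
            # Accept only a single, not-yet-used candidate (no ambiguous links).
            if len(candidates) == 1 and candidates[0].record_id not in used:
                links[p.record_id] = (candidates[0].record_id, stage)
                used.add(candidates[0].record_id)
    return links


gp = [
    Person("g1", "9434765919", "1970-01-31", "F", "AA1 1AA"),
    Person("g2", "4010232137", "1985-06-15", "M", "BB2 2BB"),
    Person("g3", None, "1990-03-02", "F", "CC3 3CC"),  # no NHS number
]
hes = [
    Person("h1", "9434765919", "1970-01-31", "F", "AA1 1AA"),
    Person("h2", "4010232137", "1985-06-15", "M", "BB2 9ZZ"),  # postcode differs
]
print(link(gp, hes))  # {'g1': ('h1', 1), 'g2': ('h2', 2)}
```

          In this toy example, g2 links only at the second, less strict pass because its postcode differs, and g3 is never considered because it has no NHS number. Per-patient flags of this kind (match strength, eligibility) are what let analysts probe the linkage-related selection and misclassification biases described above.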

                Author and article information

                Contributors
                Shivani.padmanabhan@mhra.gov.uk
                Journal: European Journal of Epidemiology (Eur J Epidemiol)
                Publisher: Springer Netherlands, Dordrecht
                ISSN: 0393-2990 (print), 1573-7284 (electronic)
                Published online: 15 September 2018
                Volume 34, Issue 1 (2019), pages 91-99
                Affiliations
                [1] Clinical Practice Research Datalink (CPRD), MHRA, 10 South Colonnade, Canary Wharf, London, E14 4PU, UK (GRID: grid.57981.32)
                [2] NHS Digital, 1 Trevelyan Square, Boar Lane, Leeds, LS1 6AE, UK (GRID: grid.498467.0)
                Article 442
                DOI: 10.1007/s10654-018-0442-4
                PMCID: PMC6325980
                PMID: 30219957
                © The Author(s) 2018

                Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License ( http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

                History: received 20 February 2018; accepted 7 September 2018
                Category: Data Resources
                © Springer Nature B.V. 2019

                Public health
                Keywords: electronic health records, record linkage, deterministic linkage, primary care data, clinical practice research datalink
