
      Data resource profile: Clinical Practice Research Datalink (CPRD) Aurum


          Abstract

Data resource basics

Clinical Practice Research Datalink (CPRD) is a UK government, not-for-profit research service that has been supplying anonymized primary care data for public health research for more than 30 years. In October 2017 CPRD launched a new data resource called CPRD Aurum. CPRD Aurum is a database containing routinely collected data from primary care practices in England, capturing diagnoses, symptoms, prescriptions, referrals and tests for over 19 million patients as of September 2018 (Figure 1). Primary care data in CPRD Aurum have been linked to national secondary care databases as well as deprivation and death registration data (Table 1).

Figure 1. CPRD Aurum population coverage and total patients by English region, September 2018. Circles represent total patients in CPRD Aurum in each region. Shading represents population coverage of current patients as a proportion of total regional population.

Table 1. Key details about CPRD Aurum

UK countries covered: Consenting practices in England (consenting practices from Northern Ireland available from 2019)
Who is included?: 19 million patients from 738 practices (10% of English practices), of whom 7 million were alive and currently contributing (13% of the population of England)
What is recorded?: Demographics, diagnoses, symptoms, prescriptions, referrals, immunizations, lifestyle factors, tests and results
Start and end dates: From 1995 (a) to September 2018, with a median follow-up of 4.2 years (IQR: 1.5–11.4) for all patients and 9.1 years (IQR: 3.3–20.1) for current patients. CPRD Aurum is updated on a monthly basis.
Standard linkages: Hospital Episode Statistics, Death Registration, Cancer data, Mental Health Services Dataset, Small Area-Level Data (deprivation measures and rural–urban classification)

(a) This is an arbitrary cut-off; the database includes records pre-dating 1995, but the completeness of recorded information following this point is expected to be more reliable.
IQR, interquartile range.

UK primary care

The United Kingdom’s (UK) National Health Service (NHS) is a publicly funded health service, free at the point of use. General practitioners (GPs) are considered the ‘gatekeepers’ of the NHS, referring patients to secondary care and diagnostic tests.[1] Over 98% of the population is registered at one of the 7300 GP practices in England.[2] A unique patient identifier, the NHS number, is used in primary, secondary and tertiary care settings, enabling linkages to other data sources.[3] There are four principal suppliers of GP IT systems (primary care patient management software) in England,[4] and the largest coverage is provided by EMIS Health® (EMIS Web® software is used in 56% of English practices).[5] CPRD Aurum, discussed in this Data Resource Profile, encompasses EMIS Web® GP practices that have agreed to contribute data to this database on a daily basis. CPRD also collects data from practices using Vision® software that contribute to the CPRD GOLD database, which has been used in epidemiological research for 30 years.[6]

CPRD Aurum

CPRD Aurum includes patient electronic healthcare records (EHR) collected routinely in primary care. When a practice agrees to contribute patient data to CPRD Aurum, CPRD receives a full historic collection of the coded part of the practice’s electronic health records, which includes data on deceased patients and those who have left the practice. Since 25 May 2018, individuals in England can opt out of sharing their confidential patient information for research purposes[7] and, as of 1 September 2018, 2.7% of the English primary care registered population had opted out.[8] As of September 2018, CPRD Aurum included 7 million patients who were alive and registered at currently contributing EMIS Web® practices (Table 2), representing around 13% of the population of England.
This number will increase as additional practices sign up as contributors to this data resource as part of an ongoing recruitment strategy. Consenting practices from Northern Ireland will start contributing data to CPRD Aurum from 2019.

Table 2. Demographic characteristics of Aurum patients, September 2018

Characteristic                                   All patients           Current
No. patients (practices)                         19 305 234 (738)       7 125 786 (731)
Gender
  Male                                           9 309 928 (48.2%)      3 552 291 (49.9%)
  Female                                         9 994 725 (51.8%)      3 573 360 (50.1%)
  Indeterminate                                  581 (<0.01%)           135 (<0.01%)
Age in 2018
  <18                                            −                      1 427 297 (20.0%)
  18–64                                          −                      4 463 385 (62.6%)
  65+                                            −                      1 235 104 (17.3%)
English Region
  North East                                     741 657 (3.8%)         327 786 (4.6%)
  North West                                     2 948 404 (15.3%)      1 186 522 (16.7%)
  Yorkshire & The Humber                         715 487 (3.7%)         262 132 (3.7%)
  East Midlands                                  551 130 (2.9%)         186 831 (2.6%)
  West Midlands                                  3 126 234 (16.2%)      1 298 818 (18.2%)
  East of England                                889 553 (4.6%)         356 308 (5.0%)
  South West                                     2 781 559 (14.4%)      980 184 (13.8%)
  South Central                                  2 359 844 (12.2%)      847 481 (11.9%)
  London                                         3 623 487 (18.8%)      1 125 905 (15.8%)
  South East Coast                               1 567 879 (8.1%)       553 819 (7.8%)
Follow-up since 1995 (a) (median years, IQR)     4.2 (1.5–11.4)         9.1 (3.3–20.1)

‘Current’ refers to patients who are alive and registered at actively contributing practices. IQR, interquartile range.
(a) The database includes records pre-dating 1995.

Key demographic information on current and total patients is presented in Table 2. Median follow-up since 1995 was 4.2 years [interquartile range (IQR): 1.5–11.4] for all patients and 9.1 years (3.3–20.1) for current patients. For patients in CPRD Aurum, the mean decile on the 2015 Index of Multiple Deprivation[9] was 5.3 (1 being the least deprived) compared with the mean decile in England (of 5.5), suggesting a slightly less deprived population in the database.
Linkage to other datasets

Data from patients from all practices in CPRD Aurum can be linked to a range of health-related data sources including secondary care, disease registries and death registration records (Table 3). CPRD Aurum data are linked to other patient-level health data by a trusted third party, NHS Digital, using NHS number, exact date of birth, sex and patient residence postcode (linkage methodology details are described in Padmanabhan et al., 2018).[10] CPRD does not receive or hold patient identifiers including name, full date of birth, postcode and NHS number. Identifiers are removed prior to transfer of data to CPRD to protect patient confidentiality. Personal identifiers are sent separately from GP practices to NHS Digital, the statutory body in England able to receive patient identifiable data, to enable linkage.

Table 3. Standard linkages with CPRD Aurum data

Linkage dataset                                      Coverage (a)   Key information (including coding/scoring system)
ONS Death Registration Data                          1998–2018      Date, place, and causes of death (ICD)
Hospital Episode Statistics (HES)
  Admitted Patient Care                              1997–2017      Diagnoses (ICD) and procedures (OPCS)
  Outpatient                                         2003–2017      Diagnoses (ICD) data
  Accident & Emergency                               2007–2017      Diagnoses (A&E codes) data
  Diagnostic Imaging Dataset                         2012–2017      Imaging tests data
  PROM                                               2009–2017      Quality of life & condition-specific scales
National Cancer Registration and Analysis Service
  Cancer registration                                1990–2015      Diagnoses (ICD) and tumour site
  Systemic Anti-Cancer Treatment                     2014–2015      Procedures and outcomes (ICD & OPCS)
  National Radiotherapy Dataset                      2012–2015      Procedures (ICD & OPCS)
  Cancer Patient Experience Survey                   2010–2013      Self-reported cancer patient data
Mental Health Services Data Set                      2007–2015      Diagnoses (ICD), functioning (HoNOS)
Small area-level data
  Index of Multiple Deprivation                      2004–2015      Patient or practice data, including domains
  Townsend Index                                     2001           Patient-level deprivation data
  Carstairs Index                                    2011           Practice-level deprivation data
  Rural–Urban Classification                         2011           Practice-level classification

PROM, Patient Reported Outcome Measures; ICD, WHO International Classification of Diseases; OPCS, Office of Population Censuses and Surveys Classification of Surgical Operations and Procedures; HoNOS, Health of the Nation Outcome Scale.
(a) Coverage is updated regularly based on data-provider releases. For up-to-date information on available linkages, visit cprd.com/linked-data

Primary care data in CPRD Aurum have been linked to Office for National Statistics Death Registration Data,[11] which are considered the gold standard for mortality data in the UK and contain the date, place and cause of death.[12] The Hospital Episode Statistics (HES) datasets include Admitted Patient Care (APC) data, which contain details of all admissions to, or attendances at, English NHS health care providers, including acute hospital trusts, primary care trusts and mental health trusts.[13] HES Outpatient (OP) contains records of outpatient appointments in England including dates, specialty, clinical diagnoses and procedures.[14] HES Accident and Emergency (A&E) consists of individual records of patient care administered in the accident and emergency setting in England, also including diagnoses and procedures.[15] The HES Diagnostic Imaging Dataset (DID) contains information about diagnostic imaging tests conducted, such as X-rays and MRI scans, taken from NHS radiological information systems.[16] Cancer data provided by Public Health England (PHE) via the National Cancer Registration and Analysis Service (NCRAS)[17] have also been linked to CPRD Aurum.
Linked NCRAS CPRD datasets include Cancer Registration data (record for each registrable tumour diagnosed or treated in England), the Systemic Anti-Cancer Treatment Dataset (SACT; chemotherapy treatment and outcome), the National Radiotherapy Dataset (RTDS; radiotherapy records for cancer, including teletherapy and brachytherapy) and the Cancer Patient Experience Survey (CPES; four waves of self-reported patient data). CPRD Aurum data have also been linked to the Mental Health Services Dataset (MHDS), which contains records of individuals who accessed secondary care adult, and child and adolescent mental health services, including diagnoses and episodes of care. Small area-level linkages on practice or patients’ residence postcodes include several measures of area-level deprivation (Index of Multiple Deprivation,[9] Townsend Index[18] and Carstairs Index[19]) and practice-level rural–urban classification.[20]

Data collected

CPRD Aurum is a dynamic database, and data are collected from contributing practices on a daily basis and processed to create monthly snapshots for observational research. This full de-identified coded clinical record includes symptoms, diagnoses, prescriptions, immunizations, tests, lifestyle factors and referrals recorded by the GP or other practice staff, but does not include free text medical notes.

Structure

The database structure is based on eight separate files, each containing patients’ pseudonymized identifiers (Figure 2). The patient file records basic patient demographics, date of death if applicable and details on when the patient registered/deregistered from the practice. The practice file contains the practice region and the most recent date of data collection for the practice, and the staff file contains the job category for each staff member in CPRD Aurum. The consultation file holds information relating to the type of consultation as entered by the GP (e.g. telephone, home visit, practice visit), which can be internally linked (within CPRD Aurum) to observations that occur during the consultation via the consultation identifier, and to the staff member who conducted the consultation via the staff file.

Figure 2. CPRD Aurum dataset structure. (1) Includes symptoms, diagnoses, immunizations, tests and lifestyle factors. Note: the problem and referral tables contain add-on information for certain types of observations. Some consultations are linked to observations. Some drug issues are linked to problem-type observations.

The observation file contains the medical-history data entered on the GP system including symptoms, clinical measurements, laboratory test results, and diagnoses, as well as demographic information recorded as a clinical code (e.g. patient ethnicity). Observations that occur during a consultation are linked via the consultation identifier. CPRD Aurum data are structured in a long format (multiple rows per subject), and observations are linked to a parent observation. For example, measurements of systolic and diastolic blood pressure will be grouped together via a parent observation for blood pressure measurement.

Data in the referral and problem files are linked to the observation file and contain ‘add-on’ data for referral-type and problem-type observations. The referral file contains information involving both inbound and outbound patient referrals to or from external care centres (most frequently from the practice to a secondary care provider). The problem file contains details of the patient’s medical history that have been defined by the GP as a ‘problem’, including the significance of the problem and its expected duration. GPs may use ‘problems’ to manage chronic conditions, thus enabling them to group clinical events (including drug prescriptions, measurements and symptoms) by problem rather than chronologically by consultation date.
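The linkage between files described above can be illustrated with a minimal sketch. It assumes hypothetical miniature records and illustrative field names (`consid`, `obsid`, `parentobsid`, `staffid`, `constype`); these are stand-ins chosen for the example, not taken from the CPRD Aurum data specification. It shows an observation being joined to its consultation via the consultation identifier, and child observations (systolic and diastolic readings) being grouped under a parent blood pressure observation.

```python
from collections import defaultdict

# Hypothetical records; every field name here is an assumption for illustration.
consultations = [
    {"consid": "c1", "patid": "p1", "staffid": "s1", "constype": "practice visit"},
    {"consid": "c2", "patid": "p1", "staffid": "s2", "constype": "telephone"},
]
observations = [
    {"obsid": "o1", "patid": "p1", "consid": "c1", "parentobsid": None,
     "term": "blood pressure", "value": None},
    {"obsid": "o2", "patid": "p1", "consid": "c1", "parentobsid": "o1",
     "term": "systolic BP", "value": 128.0},
    {"obsid": "o3", "patid": "p1", "consid": "c1", "parentobsid": "o1",
     "term": "diastolic BP", "value": 82.0},
    {"obsid": "o4", "patid": "p1", "consid": None, "parentobsid": None,
     "term": "ethnicity", "value": None},
]


def link_to_consultations(observations, consultations):
    """Attach consultation type and staff id to each observation (left join)."""
    by_id = {c["consid"]: c for c in consultations}
    linked = []
    for obs in observations:
        cons = by_id.get(obs["consid"])  # None when no consultation is linked
        linked.append({
            **obs,
            "constype": cons["constype"] if cons else None,
            "staffid": cons["staffid"] if cons else None,
        })
    return linked


def group_by_parent(observations):
    """Group child observations under the identifier of their parent."""
    children = defaultdict(list)
    for obs in observations:
        if obs["parentobsid"] is not None:
            children[obs["parentobsid"]].append(obs)
    return dict(children)


linked = link_to_consultations(observations, consultations)
blood_pressure = group_by_parent(observations)["o1"]
```

The left join keeps observations that have no consultation, which matches the note that only some consultations are linked to observations.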
Finally, the drug-issue file contains data relating to all prescriptions (for drugs and devices) issued by the GP; these are linked back to problem-type observations.

Coding

CPRD provides data dictionaries and code browsers to identify relevant codes in CPRD Aurum. The Medical Dictionary contains information on all medical history observations that have been recorded. Observations are coded using a combination of SNOMED CT (UK edition),[21] Read Version 2[22] and local EMIS Web® codes. The Drug Dictionary contains information on drug and device prescriptions recorded in EMIS Web®. This information is coded using the Dictionary of Medicines and Devices (dm+d), which exists within the SNOMED CT terminological structure.[23] Practice staff are able to add additional information to patient records as free text. However, for data governance reasons, CPRD does not collect free text as these fields may contain identifiable patient information.

Ethics

CPRD obtains annual research ethics approval from the UK’s Health Research Authority (HRA) Research Ethics Committee (REC) (East Midlands – Derby, REC reference number 05/MRE04/87) to receive and supply patient data for public health research. Therefore, no additional ethics approval is required for observational studies using CPRD Aurum data for public health research, subject to individual research protocols meeting CPRD data governance requirements.

Funding

CPRD is jointly sponsored by the UK government’s Medicines and Healthcare products Regulatory Agency and the National Institute for Health Research (NIHR). As a not-for-profit UK government body, CPRD seeks to recoup the cost of delivering its research services to academic, industry and government researchers through research user licence fees.

Data resource use

As CPRD Aurum is a relatively recent data resource, no studies have been published to date.
Nevertheless, since the recent launch of CPRD Aurum a number of approved research projects are already underway, including in pharmacovigilance, drug prescribing patterns, health services and policy evaluation, and disease risk factors. For instance, an ongoing academic research study is looking at patient treatment pathways in primary and secondary care using linked hospital and clinical audit data. A pharmacovigilance study is using CPRD Aurum and linked hospital data to examine the association between common drug therapies and heart arrhythmia. CPRD GOLD, which contains comparable NHS GP data to CPRD Aurum (but based on practices using a different GP IT system), has been used extensively in over 2000 publications that illustrate the potential research applications of CPRD Aurum.[24] A bibliography of all peer-reviewed published studies using CPRD data, dating back over the past 30 years, is available on the CPRD website (www.cprd.com/bibliography).

Strengths and weaknesses

With records on over 19 million patients as of September 2018, CPRD Aurum contains a wide range of diagnostic, prescription, procedure and lifestyle information. The key strengths of CPRD Aurum are its size and coverage, longitudinal follow-up, representativeness, standard linkages and data quality assurance processes.

Strengths

Database size and representativeness

The data currently cover 13% of the population of England, and are representative of the broader English population in terms of geographical spread (Figure 1) and deprivation (median decile on index of multiple deprivation (IMD) of 5.3), as well as age and gender [see Figure 3 comparing mid-2017 CPRD Aurum to mid-2017 data published by the Office for National Statistics (ONS)].[25]

Figure 3. Population pyramids for CPRD Aurum and ONS data. Based on mid-2017 ONS and mid-2017 CPRD Aurum data.
Data linkages

Patient-level data have been linked to secondary care and other data sets, providing a fuller picture of the patient care pathway and outcomes. All CPRD Aurum practices have consented to participating in the linkage scheme, which includes data from national secondary care databases (hospitals and mental health service providers), the national cancer registry, death registrations and deprivation measures (Table 3).

Data quality assurance processes

CPRD undertakes various levels of validation and quality assurance on the daily GP data collection, comprising over 900 checks covering the integrity, structure and format of the data. Issues highlighted by the checks are reviewed and addressed before data are incorporated into CPRD Aurum. Collection-level validation ensures integrity by checking that data received from EMIS Web® practices contain only expected data files and ensures that all data elements are of the correct type, length and format. Duplicate records are identified and removed. Transformation-level validation checks for referential integrity between records to ensure that there are no orphan records included in CPRD Aurum (e.g. that all event records link to a patient). Research-quality-level validation is the last level and covers the actual content of the data. CPRD provides a patient-level data quality metric in the form of a binary ‘acceptability’ flag. This is based on recording and internal consistency of key variables including date of birth, practice registration date and transfer out date. Separately, a derived death date (consolidating death-related information captured in different parts of the patient record) is currently undergoing validation against both GP-recorded and official ONS death records, and a practice-level quality metric (ascertaining temporal gaps in recording quality) is in development that will be added to future builds of CPRD Aurum.
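Two of the validation ideas named above, duplicate removal and the referential-integrity check for orphan records, can be sketched in a few lines. This is an illustrative sketch only: the record layout and the `patid` key are assumptions, and it does not reproduce any of the checks CPRD actually runs.

```python
def drop_duplicates(records):
    """Return records with exact duplicates removed, keeping first occurrences."""
    seen = set()
    unique = []
    for rec in records:
        key = tuple(sorted(rec.items()))  # order-independent fingerprint of the row
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


def find_orphans(events, patients, key="patid"):
    """Return event records whose patient identifier has no matching patient."""
    known = {p[key] for p in patients}
    return [e for e in events if e[key] not in known]


# Hypothetical toy data: one duplicated event and one event with no patient row.
patients = [{"patid": "p1"}, {"patid": "p2"}]
events = [
    {"patid": "p1", "obsid": "o1"},
    {"patid": "p1", "obsid": "o1"},   # exact duplicate
    {"patid": "p9", "obsid": "o2"},   # orphan: p9 is not in the patient file
]

clean = drop_duplicates(events)
orphans = find_orphans(clean, patients)
```

In this toy run the duplicate is dropped first, so the orphan check reports the unmatched event once.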
Weaknesses

Missing data

Though secondary care data, including key diagnoses, can be manually recorded by GPs, this information is often incomplete in primary care records. Additional data may be available in free text entries or letters received by GPs from secondary care facilities, but these are not available to CPRD or researchers for data governance reasons. However, additional information on patient pathways can be obtained through linkage to other data sources as described above. Data on prescriptions for medications and devices that have been issued in primary care are very reliable; however, information on medications dispensed, secondary care prescriptions and over-the-counter use is not recorded in primary care.

GP IT systems and coding

Possible variations in coding between practices and over time, as well as the current transition to SNOMED coding,[21] should be considered by researchers when planning a study using CPRD Aurum. Additionally, the database structure of CPRD Aurum differs somewhat from other UK databases, including CPRD GOLD,[6] due to underlying differences between EMIS Web® and Vision® software structures, which may affect data comparability. CPRD has published preliminary guidance for researchers on differences between the CPRD GOLD and CPRD Aurum databases.[26]

Data resource access

Researchers can apply for a limited licence to access CPRD data for public health research, subject to individual research protocols meeting CPRD data governance requirements. CPRD Aurum data are provided in tab-delimited text files and can be imported into any standard statistical software package. As a not-for-profit organization, CPRD recoups its costs through research user licence fees (annual multi-study licence or dataset-specific licence), with additional fees for linkage to other datasets. More details, including the data specification, applications process and access to linked data, are available on the CPRD website (https://www.cprd.com).
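Because the data are supplied as tab-delimited text, they can be read with general-purpose tools as well as statistical packages. The sketch below uses Python's standard `csv` module on an in-memory sample; the column names and values are invented for illustration and are not the CPRD Aurum specification.

```python
import csv
import io

# Hypothetical tab-delimited extract; columns and values are assumptions.
SAMPLE = (
    "patid\tpracid\tgender\tyob\n"
    "1000000000001\t20001\t1\t1980\n"
    "1000000000002\t20001\t2\t1975\n"
)


def read_tab_delimited(handle):
    """Read a tab-delimited file object into a list of dicts keyed by header."""
    return list(csv.DictReader(handle, delimiter="\t"))


rows = read_tab_delimited(io.StringIO(SAMPLE))
birth_years = [int(r["yob"]) for r in rows]  # convert only the fields that need it
```

`csv.DictReader` leaves every value as a string, so long identifiers are not silently rounded or reformatted. For a file on disk, the same function can be passed a handle opened with `open(path, newline="")`, where `path` is whatever location the extract was saved to.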
Researchers can also request feasibility counts from CPRD to inform sample-size estimates and decisions regarding suitability of CPRD Aurum for their proposed research. Any other queries can be directed to CPRD Enquiries [enquiries@cprd.com].

Profile in a nutshell

• CPRD Aurum is a UK primary care database set up for public health research and benefit, updated monthly for observational research, with standard linkages to hospital, mortality, cancer, mental health and deprivation data.
• As of September 2018, 738 GP practices in England had contributed data on 19 million patients, of whom over 7 million were currently registered at contributing practices.
• De-identified coded primary care data are collected from GP practices that have consented to provide data to CPRD Aurum.
• Symptoms, diagnoses, prescriptions, immunizations, tests and lifestyle factors are recorded by the GP or other practice staff, and CPRD Aurum has been linked to additional secondary care databases.
• Access to data for public health research is subject to data governance requirements and contractual obligations being met. Queries can be directed to CPRD Enquiries (enquiries@cprd.com).

Conflict of interest: None declared.

          Most cited references

          Data Resource Profile: Hospital Episode Statistics Admitted Patient Care (HES APC)

Data resource basics

Scope

Hospital Episode Statistics Admitted Patient Care (HES APC) data are collected on all admissions to National Health Service (NHS) hospitals in England. HES APC also covers admissions to independent sector providers (private or charitable hospitals) paid for by the NHS.[1] It is estimated that 98–99% of hospital activity in England is funded by the NHS.[2] A hospital admission includes any secondary care-based activity that requires a hospital bed, thus including both emergency and planned admissions, day cases, births and associated deliveries. HES APC does not cover accident and emergency (A&E, emergency department) attendances or outpatient bookings; these data are held in separate HES databases. All HES databases are collated and curated by NHS Digital (previously the Health and Social Care Information Centre). In the financial year 2014/15 (April to March), 18 731 987 hospital episodes from 451 different NHS hospital trusts (known as ‘providers’) were recorded in HES APC.[3]

Purpose of data collection

The need for national data collection on hospital activity to inform management and planning of services was first recognized in the early 1980s by a Department of Health working group.[4] Following these recommendations, a national programme was progressively rolled out, starting in 1987 and obtaining continual national coverage by (financial year) 1989/90.[5] Since 2004/05, HES APC has also served as the basis for ‘Payment by Results’ (PbR), a pay-for-performance system of secondary care reimbursement in the NHS internal market.[6]

Structure

HES APC data files are structured according to financial years. Each row in HES APC indicates a ‘Finished Consultant Episode’ (FCE). An FCE represents a continuous period of care under one consultant, and each is specified with a start and an end date. Episodes are labelled as ‘finished’ and entered in HES APC according to the financial year in which they end.
Consequently, episodes that start in one financial year and end in another will be classified as unfinished in the starting financial year, and finished in the ending financial year. Unfinished episodes need to be removed before analysis to prevent double counting. A hospital admission in HES APC is referred to as a ‘spell’, defined as an uninterrupted inpatient stay at one hospital. A spell may include several FCEs if the patient was seen by multiple consultants during the same stay, but does not include transfers between hospitals. If a patient is transferred to a different hospital, a new spell begins. In order to identify and measure continuous hospital stays, which include transfers to other hospitals, continuous inpatient spells (CIPs) need to be derived. Although CIP identifiers are not provided in standard HES APC extracts, methods for linking FCEs into CIPs are available,[7] including that recommended by NHS Digital.[8]

Research uses

HES APC has been frequently used for research and service evaluation, due to its universal coverage, long period of data collection and the ability to follow individuals over time. HES APC offers the opportunity to estimate population-based admission and procedure rates by condition and type of procedure, compare hospital performance and create hospital-based cohorts for short- or long-term follow-up. Since HES APC covers all births in NHS hospitals, representing 97.3% of births in England,[9] it is also possible to create nationally representative birth cohorts.

Processing cycle and frequency of data collection

Upon discharge from the care of a particular consultant, the treating clinician completes a discharge summary for the patient of diagnoses made and procedures carried out during that FCE (where procedures include surgery, diagnostic imaging, ventilation and infusion/transfusion therapy).
Discharge summaries are forwarded to a clinical coding department in the hospital, who enter the information onto the local electronic patient information database. Clinical coders undergo nationally accredited training programmes and follow standardized rules for translating information on discharge summaries into clinical codes.[10,11] Every month, data are extracted from local hospital databases to the Secondary User Service (SUS), a national data warehouse housed within NHS Digital.[12] Data from the SUS are extracted both for purposes of hospital reimbursement under PbR, and separately to create a provisional monthly HES extract. NHS Digital carry out basic data checks and cleaning, add geographical fields based on patient postcodes, and attach pseudonymized patient identifiers (‘HESIDs’) to each episode.[13,14] At the end of each financial year, NHS Digital allow hospitals one further data submission to HES (the ‘Annual Refresh’), after which a provisional annual HES extract is produced for final review by hospitals. Once the Annual Refresh has been checked, a final annual HES dataset is made available.[12]

Linkage within HES APC

From 1997/98 onwards (when patients’ NHS numbers became a mandated return from hospitals), HES APC episodes have been linked longitudinally to the same patient by tagging episodes with the HESID. This alphanumeric variable allows patient follow-up, yet avoids the need for supplying patient identifiers to researchers. The methods used to generate the HESID have been described elsewhere.[15] Each HES APC extract contains a unique set of HESIDs to reduce the risk of individual disclosure through merging separate data extracts supplied to different research teams.
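To make the episode, spell and HESID concepts above concrete, the following minimal sketch drops unfinished episodes and then groups finished consultant episodes into spells per patient. The field names (`hesid`, `provider`, `admidate`, `epistart`, `finished`) and the choice of patient, provider and admission date as the spell key are illustrative assumptions, not the HES APC specification or the method recommended by NHS Digital, and continuous inpatient spells spanning transfers are deliberately not derived.

```python
from collections import defaultdict

# Hypothetical episodes; ISO date strings sort correctly as plain text.
episodes = [
    {"hesid": "A1", "provider": "RX1", "admidate": "2014-05-01",
     "epistart": "2014-05-03", "epiend": "2014-05-07", "finished": True},
    {"hesid": "A1", "provider": "RX1", "admidate": "2014-05-01",
     "epistart": "2014-05-01", "epiend": "2014-05-03", "finished": True},
    # Transfer to another hospital: a new spell begins.
    {"hesid": "A1", "provider": "RY2", "admidate": "2014-05-07",
     "epistart": "2014-05-07", "epiend": "2014-05-10", "finished": True},
    # Still open at the end of the financial year: excluded to avoid double counting.
    {"hesid": "B2", "provider": "RX1", "admidate": "2015-03-30",
     "epistart": "2015-03-30", "epiend": None, "finished": False},
]


def finished_only(episodes):
    """Keep only finished consultant episodes (FCEs)."""
    return [e for e in episodes if e["finished"]]


def build_spells(episodes):
    """Group FCEs into spells keyed by (patient, provider, admission date)."""
    spells = defaultdict(list)
    for e in finished_only(episodes):
        spells[(e["hesid"], e["provider"], e["admidate"])].append(e)
    for fces in spells.values():
        fces.sort(key=lambda e: e["epistart"])  # chronological order within a spell
    return dict(spells)


spells = build_spells(episodes)
```

Here patient A1 ends up with two spells, one per hospital, and the first spell contains two episodes in date order.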
Linkage to other datasets

HES APC data can be linked to other datasets held by NHS Digital, including HES A&E attendances (from 2007/08), HES Outpatient appointments (from 2003/04), adult critical care (from 2008/09), diagnostic imaging data (covering all radiology procedures from 2012/13), the Mental Health Services Dataset (for all adult community and outpatient mental health care contacts from 2006/07) and Patient Reported Outcome Measures (pre- and postoperative questionnaires filled out by patients undergoing knee or hip replacements, varicose vein surgery or groin hernia repair from 2009/10). Secondary users can link these datasets because the same HESID algorithm is applied to each dataset.

HES APC is also routinely linked to a number of external datasets. The Clinical Practice Research Datalink,[16] a large UK primary care database, is linked to HES APC on a monthly basis. HES APC is linked to dates and causes of non-hospital deaths from the Register of Deaths in England and Wales held by the Office for National Statistics (for deaths registered since 1 January 1998), also on a monthly basis.[17] Only deaths of patients recorded in HES APC are available through this linkage (i.e. deaths of persons who have not had a hospital admission since April 1997 are not included).

NHS Digital also provides a trusted third-party bespoke linkage service, through which secondary users can request that HES APC data be linked to other external datasets. For example, both national disease registries (such as the National Joint Registry[18] and the UK Renal Registry[19]) and well-established cohort studies including Whitehall II[20] and the Hertfordshire Cohort Study[21] have been linked to HES APC. Secondary users need to obtain the appropriate approvals to enable these linkages.

Measures

Clinical and patient data

HES APC provides detailed clinical, demographic and organizational information for each FCE (see Table 1), with 270 variables available in the core dataset.
Apart from data on diagnoses and procedures, HES APC contains information on dates of admission, operations and discharge, admission method (e.g. emergency or planned), care provider and many geographical variables mapped from a patient’s postcode. The local health geographies and hospital providers in England have changed several times since 1997, and thus care needs to be taken to ensure continuity when carrying out local or provider level analyses that use HES APC data covering many years.

Table 1. Selection of key data fields available for each finished consultant episode (FCE) in HES APC data[22]

Patient: HESID; Age at admission; Age at discharge; Sex; Ethnic group
Admission/FCE: Episode start date; Episode end date; Date of admission; Date of discharge; Admission method (e.g. planned, emergency, birth); Discharge method; Admission source; Discharge destination; Waiting time (from date of decision to admit to date of admission)
Clinical: Diagnoses (up to 20); Operations (up to 24); Operation dates (up to 24); Consultant specialty (admitting and treating consultant)
Geography: Government office region; Local authority; Clinical commissioning group; Index of multiple deprivation (IMD) 2004 rank, deciles and domains
Provider/organisational: Care provider (hospital); General practice of patient
Maternity/birth (only in maternity tail): Gestational age; Number of previous births; Birth weight; Maternal age; Mode of delivery; Baby number (for multiple births)

Socioeconomic status is measured by the Index of Multiple Deprivation 2004 (IMD), a small area-based indicator constructed from several different measures of deprivation.[22] IMD is measured at Lower Super Output Area (LSOA) level, where an LSOA contains between 400 and 1200 households.[23] Individual-level measures of socioeconomic status (e.g. education level or income) are not available.
Detailed information on variables available, specific cleaning rules and coding used are available in the HES APC Data Dictionary provided by NHS Digital. 24 Diagnoses are coded using the International Classification of Diseases version 10 (ICD-10). 25 ICD-9 was used between April 1989 and March 1995. The number of diagnosis fields has increased over time: since April 2007, each FCE can have up to 20 ICD-10 codes entered (up from 7 codes before April 2002 and 14 in April 2002–March 2007). Each FCE has one primary diagnosis, which accounts for the majority of the length of stay of the FCE. The other diagnoses are referred to as comorbidities. According to NHS Digital cleaning rules, each FCE must have at least one primary diagnosis, although it may be recorded as unknown (ICD-10 code R69). Operations and other interventions are coded using a UK-specific system, the Office of Population Censuses and Surveys Classification of Interventions and Procedures (OPCS, currently version 4.7). 26 This has evolved over time as new techniques and technologies have been introduced. A history of versions in use is available from the NHS Digital coding standards website. 26 Each FCE may have up to 24 operations recorded (up from 4 before April 2002 and 12 in April 2002–March 2007), but procedure fields are left empty if patient management did not require an intervention covered by OPCS (e.g. where the primary treatment was a drug regimen or observation). A primary procedure is selected for each FCE as that which is the most resource-intensive, but a procedure may be described using more than one code to indicate surgical approach, anatomical location and side of procedure (e.g. stent placed under radiological control in femoral artery of left leg). Dates are also entered for each procedure. Birth and delivery information Each birth event in HES APC generates at least two FCEs: one delivery episode and one or more birth episodes. 
Each delivery and birth episode includes an additional ‘maternity tail’, with detailed fields including the baby’s birthweight, gestational age, birth order (for multiple births), mode of delivery and maternal age (Table 1). The maternity tail is based on information entered via local maternity databases. Unlike the diagnostic and procedure fields, the maternity tail data fields use HES-specific categories rather than standardized classifications, and the maternity tail is not a mandated return to NHS Digital. This leads to large variations in data completeness and quality. 27 , 28 It is not possible to directly link a mother and a baby in HES APC; that is, the mother’s HESID is not copied to the baby’s birth record. However, linkage between mother and baby is possible using probabilistic methods. 29 Hospital use in England Both numbers and rates of hospital admissions have increased during the period of HES APC data collection (Figure 1), particularly among older adults (aged 60–74 and 75+). Between 1998/99 and 2014/15, the overall FCE rate increased by 40%, from 24.5 to 34.3 per 100 person-years, with the steepest increase (73.0%) in adults aged 75+. Figure 1 A) Number of finished consultant episodes (FCEs) by age group from financial years 1998/99 to 2014/15; and B) episode rates by age group per 100 person-years. Denominators for rates are based on mid-year population estimates for England. 78 Since HES APC covers all hospital admissions, infants and older adults (aged 65+) are over-represented in HES APC compared with the general population of England (Table 2). 
Table 2 Demographic characteristics of HES APC patients compared with general population of England
Characteristic                   HES APC a          England b
Finished consultant episodes     18731964
Admissions                       15892434
Admission type
  Emergency                      5615707 (30.0)
  Waiting list                   6119234 (32.7)
  Planned                        2154564 (11.5)
  Other                          2002929 (10.7)
Sex
  Male                           8359362 (44.6)     26773200 (49.3)
  Female                         10370245 (55.4)    27543400 (50.7)
  Gender unknown                 2357 (0.01)        –
Age
  0 years                        1013476 (5.4)      664183 (1.2)
  1–4 years                      454461 (2.4)       2766774 (5.1)
  5–14 years                     568902 (3.0)       6245420 (11.5)
  15–24 years                    1167439 (6.2)      6837371 (12.6)
  25–34 years                    1880715 (10.0)     7425591 (13.7)
  35–44 years                    1573273 (8.4)      7103408 (13.1)
  45–54 years                    1986116 (10.6)     7635651 (14.1)
  55–64 years                    2319214 (12.4)     6100512 (11.2)
  65–74 years                    3013044 (16.1)     5162873 (9.5)
  75–84 years                    2941250 (15.7)     3099319 (5.7)
  85+ years                      1711354 (9.1)      1275516 (2.3)
  Missing                        102720 (0.5)
Numbers within parentheses represent proportions of FCEs (for HES APC) and proportions of persons (for England).
a Data source: HES APC 2014–15. 3
b ONS 2014 mid-year population estimates. 75

Data resource use Although no up-to-date bibliography of published research based on HES APC is curated by the data providers, a 2013 systematic review identified 148 articles using HES APC data published between 1989 and July 2011. 30 We carried out a subsequent search on PubMed on 8 June 2016 using the search term ‘Hospital Episode Statistics’ for article abstracts published since July 2011. We identified 264 relevant publications where the primary analysis involved the use of HES APC data, and a further 130 papers where HES data had been linked to cohorts created in other datasets. The annual number of publications using HES APC data has increased from 2 in 1993 30 to 88 in 2015. Published studies using HES APC data have covered a diverse range of topics. They have explored the incidence of conditions across regions and over time. 
31 , 32 They have also examined cross-sectional and longitudinal patterns of treatment by organization, 33 including comparing NHS and privately contracted providers 34 or regions, 35 , 36 both from descriptive and analytical perspectives. Regional comparisons have included evaluating the impact of clinical evidence 37 or guidelines 38 as well as health care policies. 39 They have examined the outcome of medical as well as surgical therapies (such as survival, 40 short-term postoperative mortality, 41 complications, 42 reoperation 43 and hospital readmissions 44 ), with some seeking to identify factors that are associated with these outcomes, in terms of both patient characteristics 45 , 46 and organizational factors such as surgical volume 47 or day of week. 48 Methodological studies include creating coding frameworks, 28 applying comorbidity scores, 49 developing risk prediction models 50 and using look-back methods to impute missing data items. 51 Many high profile routinely produced reports on the quality of secondary care are based on HES APC data. These include hospital mortality monitoring reports produced by NHS Digital 52 and commercial organizations, 53 and research reports by independent think-tanks 54 and Royal Medical Colleges. 55 Strengths and weaknesses Coverage The key strength of the HES APC database is its universal coverage, which provides an unselected sample of hospital episodes. The large size of HES APC makes it possible to precisely estimate admission rates and capture outcomes for rare conditions, including congenital anomalies or specific cancers. Longitudinal linkage Another strength is the possibility to longitudinally link patients using the HESID, allowing for the creation of HES-based cohort studies if a suitable inception date can be identified. 
The long period of data collection of HES (currently up to 19 years) allows long-term follow-up of admitted patients, which has allowed the development of risk prediction models for distal outcomes. 44 Standardized coding ICD-10 coding of clinical diagnoses offers the opportunity to use HES APC for international comparisons of secondary care use. Since ICD-10 is used in hospital administrative data across the UK, Europe, Canada, Australia and New Zealand, HES APC has been used to assess the impact of differential health policy between NHS systems and internationally. 56–58 International studies using HES APC include cross-country comparisons of the incidence of neonatal abstinence syndrome 59 and non-small cell lung cancer. 60 Nonetheless, international comparisons are challenging due to differences between countries in admission thresholds, organization of care provision, and whether secondary care is free at point of use or requires health insurance or other payment. HES APC episodes are readily linked to information on costs of care, due to the ability to match each episode to a Healthcare Resource Group, and hence a unit cost. 61 This makes HES APC an important data resource for health economics. 62–64 Coding variation One of the key challenges in interpreting HES APC is the reliance on diagnostic and procedure codes for identifying study participants and outcomes. Despite centrally issued coding rules, clinical coders rely on the quality and detail of completed discharge summaries to enter data consistently. Consequently, diagnostic coding practices vary between hospitals, particularly for comorbidities. 65 Since the roll-out of PbR, financial incentives now exist for hospitals to improve coding depth in order to ensure accurate reimbursement. This has led to an increase in the number of diagnostic codes used and improvements in coding accuracy. 
7 , 66 The introduction of PbR therefore poses challenges for interpreting time-series studies using HES APC data, and care must be taken to not overinterpret results identifying increasing complexity of cases admitted. 7 Sensitivity to admission thresholds Since HES APC covers only admitted patients, it is sensitive to variation between hospitals or over time in admission thresholds. The introduction of the four-hour waiting target in A&E departments in 2004 has been suggested as a contributing factor for the increase in rates of emergency admissions in children during the 2000s. 67 , 68 Changes in thresholds for emergency admissions can be examined using linked HES A&E data; 69 however, variation in admission thresholds for planned procedures cannot readily be determined using HES datasets. Missing data Although age, sex and clinical characteristics are well completed in HES APC (see Table 2), data on ethnicity are not. Ethnicity has been a mandated return for all NHS contacts since 1991. Although ethnicity recording has improved over time, the proportion of patients with a known ethnicity recorded was still only 85% in 2011, up from 41% in 1997. 70 Further, there is a high proportion of missing data in the maternity tail fields (see Figure 2). Postcodes were not extracted from the SUS for birth episodes prior to 2013/14, which means earlier birth episodes cannot be mapped to geographical variables, including the Index of Multiple Deprivation (IMD). 71 As an example, completeness of the IMD decile variable for singleton birth episodes in 2012/13 was 7.8%, compared with 81.9% in 2013/14. Figure 2 Proportion of birth records with missing data for selected variables in the maternity tail from financial years 1997/98 to 2013/14. Quality of internal linkage The HESID linkage algorithm relies heavily on the accurate recording of NHS number across all hospital episodes to avoid missed matches (FCEs that have failed to link to a patient). 
Consequently, there is a substantial proportion of missed matches in HES APC. A recent estimate puts the HESID missed-match rate at 4%, 72 leading to an underestimation of readmission rates by 3.8%. NHS numbers were not provided at birth until 2002, meaning that linkage within HES APC and to other HES and external datasets is not reliable for births before 2002/3. 73 Scope limitations HES APC covers higher dependency (HDU) or intensive care unit (ICU) periods, but it does not contain ‘flags’ to identify such stays, nor detailed information on level of care or HDU/ICU interventions. A separate HES dataset covers adult critical care from 2008/09, 74 whereas data relating to neonatal or paediatric intensive care are collected through systems external to NHS Digital. Data on drugs prescribed through hospital pharmacies to inpatients are not available in HES APC. There is currently no national individual-level hospital prescribing database for England. Opt-outs Patients who do not wish their records to leave NHS Digital can lodge a ‘type 2 opt-out’ with their primary care practice. 75 From 29 April 2016, any records (including in previous financial years) relating to persons who have opted out in any NHS Digital dataset (including HES APC) will therefore be removed before supply to secondary users. Overall, for the 2014/15 HES APC annual extract, 2.3% of episodes will be removed, with substantial geographical variation in opt-out rates. 75 Data resource access Access to HES APC data is provided by NHS Digital for the NHS, government, researchers and commercial health care bodies. Those requesting an extract of the data must show that their work will support health and social care and improve health. 76 Data cannot be released for solely commercial purposes. Data are requested through the online Data Access Request Service (DARS). 
Applications are evaluated by the Data Access Advisory Group, which checks all requests for patient-level data to confirm that there is an appropriate legal basis for data dissemination and that appropriate data security is in place. Details about HES applications and associated costs are available on the DARS website [http://content.digital.nhs.uk/DARS]. NHS Digital carries out audits to check that data users meet obligations regarding the terms and conditions of use, including disclosure control. 77 Profile in a nutshell HES APC contains data on all admissions to National Health Service (NHS) hospitals in England, or to independent hospitals where the costs are met by the NHS. It was originally set up for purposes of management and planning of hospital services. Data are now also collected for purposes of reimbursing hospital activity. HES APC includes all hospital care episodes from the financial year 1989/90 onwards (1 April 1989–31 March 1990). Pseudonymized patient identifiers that allow for longitudinal follow-up of patients are available from 1997/98 onwards. HES APC data are entered from medical records by clinical coders in each hospital, according to national clinical coding standards. The database is collated and processed centrally by NHS Digital (previously the Health and Social Care Information Centre). Data fields exist for diagnoses, procedures, patient demographics (including ethnicity and area-level deprivation), admission and discharge dates, hospital and other variables. HES APC data can be linked to outpatient and emergency department attendances as well as datasets external to NHS Digital, including death registrations. Aggregate data are accessible via the NHS Digital website and individual-level data are available through the NHS Digital Data Access Request Service, subject to approval and a cost recovery charge. Funding L.W. 
is supported by funding from the Department of Health Policy Research Programme through funding to the Policy Research Unit in the Health of Children, Young People and Families (grant reference number 109/0001). This is an independent report commissioned and funded by the Department of Health. The views expressed are not necessarily those of the Department. A.Z.’s PhD studentship is supported by awards to establish the Farr Institute of Health Informatics Research, London, from the Medical Research Council, Arthritis Research UK, British Heart Foundation, Cancer Research UK, Chief Scientist’s Office, Economic and Social Research Council, Engineering and Physical Sciences Research Council, National Institute for Health Research, National Institute for Social Care and Health Research and Wellcome Trust (grant MR/K006584/1). P.H. is funded by a National Institute for Health Research postdoctoral fellowship (number PDF-2013‐06‐004). This article represents independent research funded by the National Institute for Health Research (NIHR). The views expressed are those of the authors and not those of the NHS, the NIHR or the Department of Health. Conflict of interest: None declared.

            Approach to record linkage of primary care data from Clinical Practice Research Datalink to other health-related patient data: overview and implications

            Record linkage is increasingly used to expand the information available for public health research. An understanding of record linkage methods and the relevant strengths and limitations is important for robust analysis and interpretation of linked data. Here, we describe the approach used by Clinical Practice Research Datalink (CPRD) to link primary care data to other patient level datasets, and the potential implications of this approach for CPRD data analysis. General practice electronic health record software providers separately submit de-identified data to CPRD and patient identifiers to NHS Digital, excluding patients who have opted-out from contributing data. Data custodians for external datasets also send patient identifiers to NHS Digital. NHS Digital uses identifiers to link the datasets using an 8-stage deterministic methodology. CPRD subsequently receives a de-identified linked cohort file and provides researchers with anonymised linked data and metadata detailing the linkage process. This methodology has been used to generate routine primary care linked datasets, including data from Hospital Episode Statistics, Office for National Statistics and National Cancer Registration and Analysis Service. 10.6 million (M) patients from 411 English general practices were included in record linkage in June 2018. 9.1M (86%) patients were of research quality, of which 8.0M (88%) had a valid NHS number and were eligible for linkage in the CPRD standard linked dataset release. Linking CPRD data to other sources improves the range and validity of research studies. This manuscript, together with metadata generated on match strength and linkage eligibility, can be used to inform study design and explore potential linkage-related selection and misclassification biases.
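            A stepwise deterministic linkage of the kind described above can be sketched as follows. The abstract does not list the eight match stages, so the three stages, the field names and the records below are hypothetical and only illustrate the general technique: stricter rules are tried first, only unambiguous matches are accepted, and the stage that produced each match is kept as match-strength metadata.

```python
Record = dict[str, str]

# Hypothetical match rules, ordered from strictest to loosest.
# These are illustrative only and are NOT the eight stages used by NHS Digital.
STAGES: list[tuple[int, tuple[str, ...]]] = [
    (1, ("nhs_number", "dob", "sex", "postcode")),
    (2, ("nhs_number", "dob")),
    (3, ("dob", "sex", "postcode")),
]


def match_key(record: Record, fields: tuple[str, ...]):
    """Build a comparison key; return None if any required field is empty."""
    values = tuple(record.get(f, "") for f in fields)
    return values if all(values) else None


def link(source: list[Record], target: list[Record]) -> dict[str, tuple[str, int]]:
    """Return {source_id: (target_id, stage)} for unambiguous exact matches."""
    matches: dict[str, tuple[str, int]] = {}
    for stage, fields in STAGES:
        # Index the target dataset on the fields required by this stage.
        index: dict[tuple, list[str]] = {}
        for rec in target:
            k = match_key(rec, fields)
            if k is not None:
                index.setdefault(k, []).append(rec["id"])
        for rec in source:
            if rec["id"] in matches:
                continue  # already linked at a stricter stage
            k = match_key(rec, fields)
            candidates = index.get(k, []) if k is not None else []
            if len(candidates) == 1:  # skip ambiguous keys
                matches[rec["id"]] = (candidates[0], stage)
    return matches


# Made-up example records (not real patient data).
primary = [
    {"id": "p1", "nhs_number": "111", "dob": "1950-01-01", "sex": "F", "postcode": "AB1 2CD"},
    {"id": "p2", "nhs_number": "222", "dob": "1960-02-02", "sex": "M", "postcode": "CD3 4EF"},
    {"id": "p3", "nhs_number": "", "dob": "1970-03-03", "sex": "F", "postcode": "EF5 6GH"},
    {"id": "p4", "nhs_number": "444", "dob": "1980-04-04", "sex": "M", "postcode": "GH7 8IJ"},
]
hospital = [
    {"id": "h1", "nhs_number": "111", "dob": "1950-01-01", "sex": "F", "postcode": "AB1 2CD"},
    {"id": "h2", "nhs_number": "222", "dob": "1960-02-02", "sex": "M", "postcode": "ZZ9 9ZZ"},
    {"id": "h3", "nhs_number": "", "dob": "1970-03-03", "sex": "F", "postcode": "EF5 6GH"},
]

links = link(primary, hospital)
print(links)  # {'p1': ('h1', 1), 'p2': ('h2', 2), 'p3': ('h3', 3)}
```

            In this sketch p4 stays unlinked, and the stage number stored with each link plays the role of the match-strength metadata mentioned in the abstract. A production scheme would also need to stop one target record from being claimed by several source records.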

              Spatial distribution of clinical computer systems in primary care in England in 2016 and implications for primary care electronic medical record databases: a cross-sectional population study

              Objectives UK primary care databases (PCDs) are used by researchers worldwide to inform clinical practice. These databases have been primarily tied to single clinical computer systems, but little is known about the adoption of these systems by primary care practices or their geographical representativeness. We explore the spatial distribution of clinical computing systems and discuss the implications for the longevity and regional representativeness of these resources. Design Cross-sectional study. Setting English primary care clinical computer systems. Participants 7526 general practices in August 2016. Methods Spatial mapping of family practices in England in 2016 by clinical computer system at two geographical levels, the lower Clinical Commissioning Group (CCG, 209 units) and the higher National Health Service regions (14 units). Data for practices included numbers of doctors, nurses and patients, and area deprivation. Results Of 7526 practices, Egton Medical Information Systems (EMIS) was used in 4199 (56%), SystmOne in 2552 (34%) and Vision in 636 (9%). Great regional variability was observed for all systems, with EMIS having a stronger presence in the West of England, London and the South; SystmOne in the East and some regions in the South; and Vision in London, the South, Greater Manchester and Birmingham. Conclusions PCDs based on single clinical computer systems are geographically clustered in England. For example, Clinical Practice Research Datalink and The Health Improvement Network, the most popular primary care databases in terms of research outputs, are based on the Vision clinical computer system, used by <10% of practices and heavily concentrated in three major conurbations and the South. Researchers need to be aware of the analytical challenges posed by clustering, and barriers to accessing alternative PCDs need to be removed.

                Author and article information

                Journal
                Int J Epidemiol
                International Journal of Epidemiology
                Oxford University Press
                0300-5771
                1464-3685
                December 2019
                11 March 2019
                Volume: 48
                Issue: 6
                Pages: 1740-1740g
                Affiliations
                Clinical Practice Research Datalink, Medicines and Healthcare Products Regulatory Agency, London, UK
                Author notes
                Corresponding author. Clinical Practice Research Datalink, Medicines and Healthcare Products Regulatory Agency, 10 South Colonnade, London E14 4PU, UK. E-mail: achim.wolf@mhra.gov.uk
                Article
                dyz034
                10.1093/ije/dyz034
                6929522
                30859197
                © The Author(s) 2019. Published by Oxford University Press on behalf of the International Epidemiological Association.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License ( http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com

                History
                : 25 February 2019
                Page count
                Pages: 8
                Categories
                Data Resource Profiles

                Public health
