Is omission of free text records a possible source of data loss and bias in Clinical Practice Research Datalink studies? A case–control study

Abstract

Objectives

To estimate data loss and bias in studies of Clinical Practice Research Datalink (CPRD) data that restrict analyses to Read codes, omitting anything recorded as text.

Design

Matched case–control study.

Setting

Patients contributing data to the CPRD.

Participants

4915 bladder and 3635 pancreatic cancer cases diagnosed between 1 January 2000 and 31 December 2009, each matched on age, sex and general practitioner practice to up to 5 controls (bladder: n=21 718; pancreas: n=16 459). The analysis period was the year before cancer diagnosis.
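In this design each case is compared only with the controls in its own matched set, which shares the case's age, sex and practice and holds up to five controls per case. The outcome measures therefore name conditional logistic regression for the odds ratios. The following is a minimal sketch that assumes a single binary feature and exactly one case per matched set. It maximises the conditional likelihood by Newton-Raphson in plain Python. The set encoding, the function name `conditional_or` and the counts in the example are hypothetical. This is not the authors' code, and a real analysis would normally use an established package and also report confidence intervals.

```python
import math
from typing import Iterable, List, Tuple

# Hypothetical encoding of one matched set:
# (case has the feature?, members with the feature incl. the case, set size).
MatchedSet = Tuple[bool, int, int]


def conditional_or(sets: Iterable[MatchedSet], tol: float = 1e-10,
                   max_iter: int = 100) -> float:
    """Odds ratio for one binary feature from conditional logistic regression.

    Each set contributes exp(b*x_case) / sum_j exp(b*x_j) to the likelihood.
    With a binary feature the denominator is k*exp(b) + (n - k), where k of
    the n members have the feature. Sets where nobody or everybody has the
    feature carry no information and are dropped.
    """
    informative: List[Tuple[int, int, int]] = [
        (int(x), k, n) for x, k, n in sets if 0 < k < n
    ]
    if not informative:
        raise ValueError("no informative (discordant) matched sets")
    beta = 0.0
    for _ in range(max_iter):
        e = math.exp(beta)
        score = 0.0        # first derivative of the conditional log-likelihood
        information = 0.0  # minus the second derivative
        for x_case, k, n in informative:
            # Model probability that the case is one of the k members with the feature.
            p = k * e / (k * e + (n - k))
            score += x_case - p
            information += p * (1 - p)
        step = score / information  # Newton-Raphson update
        beta += step
        if abs(step) < tol:
            break
    return math.exp(beta)


# Hypothetical 1:1 example: 30 sets where only the case has the feature,
# 10 where only the control has it, and 5 concordant sets that drop out.
pairs = [(True, 1, 2)] * 30 + [(False, 1, 2)] * 10 + [(True, 2, 2)] * 5
print(round(conditional_or(pairs), 3))  # 3.0
```

For 1:1 pairs the estimate reduces to the ratio of discordant pairs, here 30/10. With larger sets, each control-exposed set weighs in according to how many members share the feature.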

Primary and secondary outcome measures

Frequency of haematuria, jaundice and abdominal pain, grouped by recording style: Read code or text-only (ie, hidden text). The association between recording style and case–control status (χ² test). For each feature, the odds ratio (OR; conditional logistic regression) and positive predictive value (PPV; Bayes' theorem) for cancer, before and after addition of hidden text records.
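A PPV calculated with Bayes' theorem combines three quantities. The first is the proportion of cases with a feature (sensitivity). The second is the proportion of controls with it (false-positive rate). The third is a prior probability of the cancer. A case–control sample cannot supply that prior by itself, because the case-to-control ratio is fixed by design, so the prior has to come from outside data such as incidence figures. The following is a minimal sketch of one standard form of that calculation and is not the authors' exact procedure. The function name `ppv_bayes`, the prior and the counts in the demonstration are hypothetical. The demonstration shows the mechanism the abstract describes. If hidden text adds a feature to proportionally more controls than cases, the false-positive rate rises faster than the sensitivity and the PPV falls.

```python
def ppv_bayes(cases_with: int, n_cases: int, controls_with: int,
              n_controls: int, prior: float) -> float:
    """P(cancer | feature) by Bayes' theorem.

    prior: assumed probability of the cancer in the population the controls
    represent (e.g. from incidence data), not the case-control ratio.
    """
    sensitivity = cases_with / n_cases           # P(feature | cancer)
    false_pos_rate = controls_with / n_controls  # P(feature | no cancer)
    true_pos = sensitivity * prior
    false_pos = false_pos_rate * (1 - prior)
    return true_pos / (true_pos + false_pos)


# Hypothetical counts: adding text records raises the feature count among
# controls (100 -> 160) proportionally more than among cases (80 -> 90).
prior = 0.001
codes_only = ppv_bayes(80, 1000, 100, 5000, prior)
codes_plus_text = ppv_bayes(90, 1000, 160, 5000, prior)
print(f"codes only: {codes_only:.2%}, codes plus text: {codes_plus_text:.2%}")
```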

Results

Of the 20 958 total records of the features, 7951 (38%) were recorded in hidden text. Hidden text recording was more strongly associated with controls than with cases for haematuria (140/336=42% vs 556/3147=18%) in bladder cancer (χ² test, p<0.001), and for jaundice (21/31=67% vs 463/1565=30%, p<0.0001) and abdominal pain (323/1126=29% vs 397/1789=22%, p<0.001) in pancreatic cancer. Adding hidden text records corrected PPVs of haematuria for bladder cancer from 4.0% (95% CI 3.5% to 4.6%) to 2.9% (2.6% to 3.2%), and of jaundice for pancreatic cancer from 12.8% (7.3% to 21.6%) to 6.3% (4.5% to 8.7%). Adding hidden text records did not alter the PPV of abdominal pain for bladder (codes: 0.14%, 0.13% to 0.16% vs codes plus hidden text: 0.14%, 0.13% to 0.15%) or pancreatic (0.23%, 0.21% to 0.25% vs 0.21%, 0.20% to 0.22%) cancer.
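The recording-style comparisons are χ² tests on 2×2 tables. The rows are controls and cases, and the columns are hidden text and Read code. The following sketch rebuilds those tables from the fractions quoted above. It reads each pair as a hidden-text count over the total count for that group, which is an interpretation of the abstract's notation, and derives the Read-code count by subtraction. It computes Pearson's statistic without continuity correction, using only the Python standard library. The abstract does not say whether a correction was applied, so treat the exact statistics as an assumption. For one degree of freedom the upper-tail p value equals erfc(√(χ²/2)). The helper names are hypothetical.

```python
import math
from typing import Tuple


def chi2_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square (no continuity correction) and p value, 1 df.

    Table layout: [[a, b], [c, d]]; rows controls, cases;
    columns hidden text, Read code.
    """
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


def hidden_vs_coded(hidden: int, total: int) -> Tuple[int, int]:
    """Split a group's count into (hidden text, Read code)."""
    return hidden, total - hidden


# Fractions quoted in the Results: (controls hidden/total, cases hidden/total).
features = {
    "haematuria (bladder)": ((140, 336), (556, 3147)),
    "jaundice (pancreas)": ((21, 31), (463, 1565)),
    "abdominal pain (pancreas)": ((323, 1126), (397, 1789)),
}

for name, (controls, cases) in features.items():
    a, b = hidden_vs_coded(*controls)
    c, d = hidden_vs_coded(*cases)
    stat, p = chi2_2x2(a, b, c, d)
    print(f"{name}: hidden text {a / (a + b):.1%} of controls vs "
          f"{c / (c + d):.1%} of cases, chi2={stat:.1f}, p={p:.1e}")
```

Under this reading, each recomputed p value falls below the threshold the abstract reports for that feature.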

Conclusions

Omission of text records from CPRD studies introduces bias that inflates outcome measures for recognised alarm symptoms. This potentially reinforces clinicians’ views of the known importance of these symptoms, marginalising the significance of ‘low-risk but not no-risk’ symptoms.

Author and article information

Journal

Journal ID (nlm-ta): BMJ Open

Journal ID (iso-abbrev): BMJ Open

Journal ID (hwp): bmjopen

Journal ID (publisher-id): bmjopen

Title: BMJ Open

Publisher: BMJ Publishing Group (BMA House, Tavistock Square, London, WC1H 9JR)

ISSN (Electronic): 2044-6055

Publication date Collection: 2016

Publication date (Electronic): 13 May 2016

Volume: 6

Issue: 5

Electronic Location Identifier: e011664

Affiliations

[1] Medical School, University of Exeter, College House, Exeter, UK

[2] Hoyland House, Painswick, UK

Author notes

Correspondence to: Sarah J Price; S.J.Price@exeter.ac.uk

Article

Publisher ID: bmjopen-2016-011664

DOI: 10.1136/bmjopen-2016-011664

PMC ID: 4874123

PubMed ID: 27178981

Copyright © Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://www.bmj.com/company/products-services/rights-and-licensing/

License:

This is an Open Access article distributed in accordance with the terms of the Creative Commons Attribution (CC BY 4.0) license, which permits others to distribute, remix, adapt and build upon this work, for commercial use, provided the original work is properly cited. See: http://creativecommons.org/licenses/by/4.0/

History

Date received: 24 February 2016

Date revision received: 5 April 2016

Date accepted: 13 April 2016