      Open Access

      Identifying children with Cystic Fibrosis in population-scale routinely collected data in Wales: A Retrospective Review

      Research article
      International Journal of Population Data Science
      Swansea University

          Abstract

          Introduction

          The challenges of identifying a cohort of people with a rare condition can be addressed by routinely collected, population-scale electronic health record (EHR) data, which provide large volumes of data at a national level. This paper describes the challenges of accurately identifying a cohort of children with Cystic Fibrosis (CF) using EHR data, and of validating that cohort against the UK CF Registry.

          Objectives

          To establish a proof of principle and provide insight into the merits of linked data in CF research; to identify the benefits of access to multiple data sources, in particular the UK CF Registry data; and to demonstrate the opportunity the Registry represents as a resource for future CF research.

          Methods

          Three EHR data sources within the Secure Anonymised Information Linkage (SAIL) Databank were used to identify children with CF born in Wales between 1st January 1998 and 31st August 2015. UK CF Registry data were later acquired by SAIL and linked to the EHR cohort to validate the cases and explore the reasons for misclassification.
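The linkage and validation step can be illustrated with a minimal Python sketch. It assumes each EHR source has already been reduced to a set of anonymised person identifiers flagged with CF, and that the UK CF Registry has been linked to the same identifiers. The source names (`source_a`, `source_b`, `source_c`), the toy IDs and both helper functions are hypothetical illustrations, not the authors' code or SAIL's actual schema.

```python
from collections import Counter

def sources_per_person(sources):
    """Count in how many EHR sources each anonymised ID was flagged as CF.

    `sources` maps a (hypothetical) source name to a set of anonymised IDs;
    because each source is a set, an ID is counted at most once per source.
    """
    counts = Counter()
    for ids in sources.values():
        counts.update(ids)
    return dict(counts)

def validate_against_registry(counts, registry):
    """Split EHR-identified IDs into those confirmed by the registry and the rest."""
    confirmed = {pid for pid in counts if pid in registry}
    unconfirmed = set(counts) - confirmed
    return confirmed, unconfirmed

# Hypothetical toy data: made-up IDs, not real SAIL or Registry records.
sources = {
    "source_a": {"p1", "p2", "p3", "p4"},
    "source_b": {"p1", "p2", "p5"},
    "source_c": {"p1", "p6"},
}
registry = {"p1", "p2", "p3"}

counts = sources_per_person(sources)
confirmed, unconfirmed = validate_against_registry(counts, registry)

for pid in sorted(counts):
    status = "confirmed" if pid in confirmed else "not in registry"
    print(pid, counts[pid], "source(s),", status)
```

Keeping the per-person source count alongside the registry match is what allows agreement across sources to be compared with confirmation rates, which is the comparison reported in the Results.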

          Results

          We identified 352 children with CF across the three EHR data sources, more than expected based on historical incidence rates in Wales. Subsequent validation using the UK CF Registry found that 257 (73%) of these were true cases. Of the individuals identified as CF cases in all three EHR data sources, 98.7% (156/158) were confirmed as true cases, compared with only 19.8% (20/101) of those identified in just a single data source.
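The percentages above are the proportions of EHR-identified children confirmed as true cases by the Registry (commonly called the positive predictive value, PPV; the term is our framing, not the abstract's). The Python sketch below reproduces them from the reported counts. The two-source stratum is not reported directly: here it is derived by subtraction, assuming every identified child appeared in exactly one, two or all three sources, so it should be read as an inference rather than a published figure.

```python
# Counts reported in the abstract: (identified in EHR, confirmed by UK CF Registry).
reported = {
    "all": (352, 257),
    "three_sources": (158, 156),
    "one_source": (101, 20),
}

def ppv(identified, confirmed):
    """Positive predictive value: share of identified cases that were true cases."""
    return confirmed / identified

# Assumption: the one-, two- and three-source groups partition all 352 children,
# so the two-source group is whatever remains after subtracting the other two.
two_identified = (reported["all"][0]
                  - reported["three_sources"][0]
                  - reported["one_source"][0])
two_confirmed = (reported["all"][1]
                 - reported["three_sources"][1]
                 - reported["one_source"][1])
reported["two_sources"] = (two_identified, two_confirmed)

for stratum, (n, tp) in reported.items():
    print(f"{stratum:>13}: {tp}/{n} = {ppv(n, tp):.1%}")
```

With these inputs it prints 73.0% overall, 98.7% for three sources and 19.8% for one source, plus 87.1% (81/93) for the derived two-source stratum.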

          Conclusion

          Identifying health conditions in EHR data can be challenging; without data quality assurance and validation, the merit of the research is undermined. This retrospective review describes some of the challenges of identifying CF cases and demonstrates the benefits of linking cases across multiple data sources to improve data quality.

                Author and article information

                Journal: International Journal of Population Data Science (Int J Popul Data Sci; IJPDS)
                Publisher: Swansea University
                ISSN: 2399-4908
                Published: 11 August 2020
                Volume 5, Issue 1, Article 1346
                Affiliations
                [1] Swansea University Medical School, Swansea University
                [2] Department of Public Health and Policy, University of Liverpool, Liverpool L69 7ZX
                [3] Health Data Research UK
                [4] Administrative Data Research Wales
                [5] Cystic Fibrosis Trust, One Aldgate, London EC3N 1R
                [6] Public Health Wales, Capital Quarter 2, Tyndall Street, Cardiff CF10 4BZ
                † Joint first authors
                Author notes
                [*] Corresponding author: R Griffiths, rowena.griffiths@swansea.ac.uk
                Article identifiers
                DOI: 10.23889/ijpds.v5i1.1346
                PMCID: PMC7898022
                PMID: 33644411
                Other identifiers: 5:1:24 S2399490819013399; a4d8d9d9-1af7-41fe-8e84-ef7b8b39abd6

                This work is licensed under a Creative Commons Attribution 4.0 International License.

                Categories
                Population Data Science
