NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Turner J, Siriwardena AN, Coster J, et al. Developing new ways of measuring the quality and impact of ambulance service care: the PhOEBE mixed-methods research programme. Southampton (UK): NIHR Journals Library; 2019 Apr. (Programme Grants for Applied Research, No. 7.3.)

Developing new ways of measuring the quality and impact of ambulance service care: the PhOEBE mixed-methods research programme.

Creating a linked data set

Introduction

Health-care information relating to a single person is often held by different services and is usually unconnected. In addition, different systems may be event rather than person based, so the same person can have multiple and different unconnected event records. The purpose of linking data is to match data from different information systems, bringing them together to create a single record of events for an individual person.

Having access to linked health data provides a real advantage for assessing health-care quality and performance. Patient pathways often involve multiple service providers or service contacts. If we base our assessments of how good care is on information from a single health provider or service, this provides only a ‘partial view’ of quality and performance and does not capture the range of services or complex care pathways available in today’s health care.32

This is important for the ambulance service. Although it is a key service providing immediate help to people with an emergency or urgent health-care problem, its contribution is very often a relatively short component of care, being only the first step in a longer set of contacts with different parts of the health-care system. In most cases, the impact of ambulance service care may not be obvious until further along the episode of care and this is particularly true of patient outcome information.

The availability of linked patient information enables important outcomes to be measured. In addition, making better use of the routine information collected along an episode or pathway of care for a population of patients, such as those who call 999, means that it becomes possible to monitor and compare processes and outcomes of care over time. Although linking information from different parts of the health service into a single patient record might seem obvious in the digital age, this is still not routinely available.

There have been previous attempts to link ambulance service data with ED data in the UK33 and Australia.34 In both of these previous studies, data linkage was achieved but only after problems with data quality, finding suitable patient identifiers (IDs) and developing statistical matching processes had been overcome. Within the PhOEBE programme, our aim for workstream 2 was to revisit this problem and attempt to create a data set that linked routinely collected health service and national mortality information for individuals who used the 999 emergency ambulance service. The objectives were to:

  1. develop data linkage processes that are acceptable to patients, data processors and data controllers, and comply with information legislation
  2. obtain the necessary research and data approvals
  3. link routinely collected ambulance service information about the 999 call and the clinical care given to patients with routinely collected hospital information and national mortality information, using a third-party data processor (NHS Digital)
  4. create a new information source that provides a single record of the emergency care pathway for each patient contacting the 999 ambulance service.

Types and sources of information included in the linked data

We used five different types of information from three sources to create the linked data set. A full list of data and variables included in the linked data set can be found in a supplementary file at www.sheffield.ac.uk/scharr/sections/hsr/mcru/phoebe/reports.25

Ambulance information

We used two different types of ambulance service data.

  1. Computer-aided dispatch (CAD) data: this is the information that is recorded in the ambulance control room for every 999 call it receives. It contains items relating to call management and triage (e.g. the assessment of what the health problem is and how urgent it is). Items include timings (e.g. call received, ambulance sent, arrival on scene), location, reason for the call, urgency category, resources sent, disposition and patient demographic information.
  2. Electronic patient report form (ePRF): a comprehensive record of clinical care provided to patients at the incident scene for those patients who are sent an ambulance response. It includes descriptions of condition, results of assessment and any treatment provided and is recorded directly to a hand-held computer.

Hospital information

We used two types of information on hospital events relating to ED care and hospital admission. The source of this information was Hospital Episode Statistics (HES) data. This is a centrally managed data warehouse containing details of all admissions to NHS hospitals in England. HES information is stored as a large collection of separate records, one for each period of care, in a secure data warehouse. Each HES record contains a wide range of information about an individual patient admitted to an NHS hospital. For example:

  • clinical information about diagnoses and operations
  • information about the patient, such as age group, sex and ethnic category
  • administrative information, such as time waited, date of admission and discharge destination
  • geographical information on where the patient was treated and the area in which they lived.

Within HES data, we obtained information from two subsets of data.

  1. HES A&E: these are individual records for all ED attendances occurring in England and contain information on patient details, dates and times, health problem or condition, investigations and treatments.
  2. HES admitted patient care (APC): these are individual records for all patients admitted to hospital in England and contain information on diagnosis, treatment, length of stay, ward or facility type and medical specialty (e.g. cardiology, orthopaedics).

The advantage of national HES data is that all episodes of care for an individual patient are recorded. This means that we could identify not only any hospital care associated with the initial 999 call but also any subsequent related ED attendances or hospital admissions within a defined period of time.

National mortality data

Mortality data were obtained from the ONS.35 The ONS collects information on the date and cause of death from the death certificate when the death of an individual is registered. The death certificate also records a list of other conditions or diseases that the patient had at the time of death.

Some people die in hospital and this is recorded in the HES data. However, other people die outside hospital. Adding ONS mortality information to our linked data provides a better and more accurate picture as it gives more detailed information on the cause of death and allows us to identify this important outcome for people who may have died without being admitted to hospital or after they have been discharged from hospital.

Study services and planned data collection periods

Two ambulance services in England took part in the programme: East Midlands Ambulance Service NHS Trust (EMAS) and Yorkshire Ambulance Service NHS Trust (YAS). When the PhOEBE programme started in 2011, the use of the ePRF was not widespread among ambulance services. EMAS had ePRF coverage for > 80% of the population that it served and YAS for one small operational area (about 15% of the service population). In our original plan we intended to create two linked data sets for each service (four data sets in total) for two time periods in year 2: July–December 2012 and January–June 2013. Delays in obtaining the right permissions meant that this moved by 6 months, so the final data linkage periods were January–June 2013 and July–December 2013. Two separate time periods were used as the intention was to use the first data sets to construct the performance and quality measures in workstream 3 and the second sets to test the measures in workstream 4.

Data permissions

Linking different sources of health data together must be done in a way which is ethical and secure, acceptable to patients and service users and meets the requirements of the Data Protection Act36,37 and other relevant legislation. A central principle of data-linking studies is that individual concerns about the use of personal information must be balanced against the research benefits for the general population, so measures to manage risk and safeguard personal health information must be in place.38 We therefore had to obtain a number of relevant permissions and put in place the required information governance and secure data management processes before we could request and obtain our linked data. Patient identifiable data were needed to enable the processes that linked ambulance service, hospital and mortality data. However, no identifiable data were transferred or processed outside the NHS and, therefore, no patient identifiable data were retained in the final linked data set we used for our research.

We obtained the following permissions:

  • NHS research ethics approval – as elements of the process required patient identifiable data, approval was sought and gained on 12 July 2012 from the NHS Health Research Authority through the National Research Ethics Service Committee East Midlands – Derby (Research Ethics Committee reference 12/EM/0251, Integrated Research Application System project number 84751).
  • Confidentiality Advice Group – approval is required from this group (previously the National Information Governance Board) where research studies wish to use personal-identifiable patient information without consent, for purposes other than the direct care of patients and where it is not possible to use anonymised data or to seek patient consent. In this project, seeking individual consent was not feasible as the number of patients was very large (the individual ambulance services respond to > 400,000 999 calls per year) and we anticipated that some patients would have died. Approval was confirmed on 17 August 2012.
  • NHS Digital Data Access Request Service – permission is required from the NHS Digital Data Access Advisory Group to process and receive data. As part of this approval process, the legal basis for accessing the data, information governance and security arrangements, including data storage systems, and whether or not the project has a purpose beneficial to the health system is assessed.

The process of applying for data permissions and approvals proved to be very challenging and time-consuming. When we started this process, the HES data were held by the Health and Social Care Information Centre (HSCIC). This organisation was also engaged to provide the data linkage service. We initially obtained the approvals that were necessary at that time in 2012 and began the process for obtaining the first set of ambulance data for linkage in January 2013. However, shortly after this a number of serious internal problems at HSCIC led to a major reorganisation into what is now NHS Digital. During this period no data were released. The reorganisation also meant that new approvals processes were put in place and the data permissions process had to begin again. This was completed in May 2015 and it was then a further 4 months before we received any NHS Digital data. Additional work and data approvals were then required owing to poor match rates for some patient groups, meaning it was not until October 2016 that we received the first adequate data set required for the workstream 3 work. Figure 5 provides a summary of the timelines and processes for the data linkage work.

FIGURE 5. Timetable of data permissions and processes for obtaining linked data.

CAG, Confidentiality Advice Group; DoB, date of birth; DSA, data sharing agreement; IG, Information Governance; NIGB, National Information Governance Board; R&D, research and (more...)

Creating the linked data sets

To create the linked data that we needed for the programme, a number of steps were needed to bring the different types of information together. Within NHS Digital, processes are already in place to link HES records and ONS mortality data. The new task for this project was to link ambulance service electronic records with these subsequent health records.

The first step was to retrieve the relevant information from ambulance service CAD and ePRF records. The starting point was all 999 calls received in the relevant time frame. Some calls were excluded at this point such as attendances with no ePRF, interhospital transfers, calls passed to other ambulance services and duplicate calls for the same incident. The exception was ‘hear and treat’ calls, defined as those calls that received input from a clinician (nurse or paramedic) but which have no ePRF record as no ambulance is sent.

The following stepwise process was then followed:

  • YAS and EMAS selected and extracted the study data sample, based on all included ambulance service contacts within the specified time period.
  • The study ambulance services linked the CAD and ePRF data (except ‘hear and treat’ calls) for all selected ambulance service contacts and produced a linked data set in Microsoft Excel® (Microsoft Corporation, Redmond, WA, USA). These data contain a large number of variables recording details of the patient, call processes, response provided, clinical assessment and treatment.
  • The ambulance services assigned a unique ID code to each individual patient record.
  • The ambulance services created a version of the data set that contained only the clinical data from the ePRF, non-identifiable emergency call and dispatch information from CAD and the unique ID number. This anonymised file, in the form of a password-protected Excel spreadsheet, was sent via secure encrypted e-mail to the research team at the University of Sheffield.
  • The ambulance services created a second version of the data set that contained only the variables required for data linking, including patient identifiable data. These included, for example, date, time and location of incident, patient name, date of birth, address, hospital attended, the unique ID number and (when available) NHS number. When no NHS number was available, it was traced by NHS Digital. This data set was sent to NHS Digital as a password-protected Excel spreadsheet via NHS Digital’s secure electronic file transfer system.
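
The split into an anonymised research file and an identifiable linkage file, joined only by a unique ID, can be sketched as follows. This is a minimal illustration in Python, not the ambulance services' actual procedure: the field names, the allow-lists and the use of a random UUID as the study ID are all assumptions made for the example (the real variable lists are in the supplementary file).

```python
import uuid

# Hypothetical allow-lists; the real variable lists are in the supplementary file.
RESEARCH_FIELDS = ("urgency_category", "disposition", "clinical_impression")
LINKAGE_FIELDS = ("patient_name", "date_of_birth", "postcode",
                  "nhs_number", "incident_datetime", "hospital_attended")


def split_for_linkage(records):
    """Return (research_file, linkage_file) sharing only a random study ID."""
    research_file, linkage_file = [], []
    for record in records:
        study_id = uuid.uuid4().hex  # random, so it carries no patient information
        research_file.append(
            {"study_id": study_id, **{f: record.get(f) for f in RESEARCH_FIELDS}})
        linkage_file.append(
            {"study_id": study_id, **{f: record.get(f) for f in LINKAGE_FIELDS}})
    return research_file, linkage_file


# Entirely fictitious example record.
example = [{
    "patient_name": "Jane Example",
    "date_of_birth": "1940-02-01",
    "postcode": "S1 1AA",
    "incident_datetime": "2013-01-05 23:50",
    "hospital_attended": "Example General",
    "urgency_category": "Red 2",
    "disposition": "conveyed to ED",
    "clinical_impression": "chest pain",
}]
research_file, linkage_file = split_for_linkage(example)
```

Using allow-lists rather than deleting known identifiers means that a new, unexpected identifiable column would be left out of both files by default.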

The next step was to link the ambulance service data with HES and ONS mortality data. This was undertaken by NHS Digital using its data-linking algorithm, a deterministic linkage of NHS number, sex, date of birth and postcode that uses a series of progressive steps39 to match the same information in one data set with that in another. When the NHS number was unavailable, we used NHS Digital’s NHS number-tracing service to look up NHS numbers using date of birth and patient name. NHS Digital linked ambulance data with a large number of variables from the HES A&E, HES APC and ONS death records so we could identify all patients who subsequently attended an ED, were admitted to hospital or died. The unique patient ID provided by the ambulance service was retained in this linked data set. After all possible records were linked, NHS Digital removed identifiable data and, when necessary, replaced them with a pseudonymised variable; for example, date of birth was transformed into age. The de-identified data were returned to the research team using the same secure transfer processes.
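
To illustrate what a deterministic linkage in progressive steps looks like, here is a minimal Python sketch. It is not NHS Digital's algorithm: the three passes, the field names and the rule that a match must point to exactly one hospital ID are assumptions chosen for the example. Each pass requires exact agreement on a set of fields, stricter passes run first, and a record with a missing field cannot match in any pass that uses that field.

```python
# Illustrative passes only; these are NOT NHS Digital's published steps.
MATCH_PASSES = (
    ("nhs_number", "sex", "date_of_birth", "postcode"),
    ("nhs_number", "sex", "date_of_birth"),
    ("sex", "date_of_birth", "postcode"),
)


def make_key(record, fields):
    """Build an exact-match key, or None if any required field is missing."""
    values = tuple(record.get(f) for f in fields)
    return None if any(v in (None, "") for v in values) else values


def link(ambulance_records, hospital_records):
    """Map ambulance study_id -> (hes_id, pass number that produced the match)."""
    links = {}
    for pass_number, fields in enumerate(MATCH_PASSES, start=1):
        index = {}
        for hosp in hospital_records:
            key = make_key(hosp, fields)
            if key is not None:
                index.setdefault(key, set()).add(hosp["hes_id"])
        for amb in ambulance_records:
            if amb["study_id"] in links:
                continue  # already matched in a stricter pass
            key = make_key(amb, fields)
            candidates = index.get(key, set()) if key is not None else set()
            if len(candidates) == 1:  # accept unambiguous matches only
                links[amb["study_id"]] = (next(iter(candidates)), pass_number)
    return links


# Fictitious records for illustration.
hospital = [
    {"hes_id": "H1", "nhs_number": "9990000001", "sex": "F",
     "date_of_birth": "1940-02-01", "postcode": "S1 1AA"},
    {"hes_id": "H2", "nhs_number": "9990000002", "sex": "M",
     "date_of_birth": "1975-06-30", "postcode": "NG1 1AA"},
]
ambulance = [
    {"study_id": "A1", "nhs_number": "9990000001", "sex": "F",
     "date_of_birth": "1940-02-01", "postcode": "S1 1AA"},
    {"study_id": "A2", "nhs_number": None, "sex": "M",
     "date_of_birth": "1975-06-30", "postcode": "NG1 1AA"},
    {"study_id": "A3", "nhs_number": None, "sex": "F",
     "date_of_birth": None, "postcode": "LN1 1AA"},  # no date of birth recorded
]
links = link(ambulance, hospital)
```

In this toy data, A1 matches in the strictest pass, A2 matches only once the NHS number is dropped from the key, and A3 never matches because every pass needs a date of birth.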

The final step was for the research team to re-link the clinical and CAD data provided by the ambulance services with the HES and ONS data provided by NHS Digital, using the unique ID number contained in each data set to produce our final linked data set. Figure 6 shows the data flow processes used for workstream 2.
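
This final re-linking step is, in effect, a join on the unique ID. The Python sketch below shows the idea under simplifying assumptions: the field names are hypothetical, and the returned outcome data are assumed to have been summarised to one row per study ID, whereas real HES data can hold several events per patient.

```python
def relink(ambulance_rows, outcome_rows):
    """Left join of anonymised ambulance rows to returned outcome rows by study_id."""
    outcomes_by_id = {row["study_id"]: row for row in outcome_rows}
    linked = []
    for row in ambulance_rows:
        outcome = outcomes_by_id.get(row["study_id"], {})
        # Untraced cases are kept, flagged, and simply lack the outcome fields.
        linked.append({**row, **outcome, "traced": bool(outcome)})
    return linked


# Fictitious rows for illustration.
ambulance_rows = [
    {"study_id": "a1", "disposition": "conveyed to ED"},
    {"study_id": "a2", "disposition": "hear and treat"},
]
outcome_rows = [
    {"study_id": "a1", "age": 73, "admitted": True, "died_within_30_days": False},
]
final = relink(ambulance_rows, outcome_rows)
```

Keeping the untraced rows, rather than dropping them, makes it possible to compare the characteristics of linked and unlinked cases later.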

FIGURE 6. Data flow processes.

Because of the delays in obtaining linked data, we were unable to use all four of the complete data sets intended in our original plan. The first data set of adequate quality, received in October 2016, was the EMAS data set for the period January–June 2013. We did subsequently obtain linked data for YAS for the same period and also the linked data for both EMAS and YAS for the second period of July–December 2013. However, given the time needed to then process these data sets into formats needed for the programme research, we were unable to use them within the time available. These data will be available for further research, but the description below of data processing and the number of cases included in the linked data used in this programme is confined to the first EMAS data set, which we were able to fully utilise.

Processing the linked data

Data were housed on a secure virtual machine and read for processing into R (The R Foundation for Statistical Computing, Vienna, Austria), which is an open source programming language and data management software programme for statistical computing. The processing involved data cleaning and standardisation to create variables required for the study, for example calculating time intervals.
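
As an illustration of the kind of derived variable mentioned above, the sketch below calculates a time interval from two CAD timestamps. It is written in Python rather than the R used by the research team, and the timestamp format and field names are assumptions made for the example.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed format, for illustration only


def minutes_between(start, end):
    """Interval in minutes, or None when either timestamp is missing."""
    if not start or not end:
        return None
    delta = (datetime.strptime(end, TIMESTAMP_FORMAT)
             - datetime.strptime(start, TIMESTAMP_FORMAT))
    return delta.total_seconds() / 60


# Fictitious call that spans midnight and has no recorded departure from scene.
call = {
    "call_received": "2013-01-05 23:50:00",
    "arrival_on_scene": "2013-01-06 00:02:30",
    "left_scene": "",
}
response_minutes = minutes_between(call["call_received"], call["arrival_on_scene"])
on_scene_minutes = minutes_between(call["arrival_on_scene"], call["left_scene"])
```

Working with full date-times rather than clock times keeps intervals correct when a call spans midnight, and returning None for missing timestamps avoids silently producing zero-length intervals.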

A full list of the variables included in the data sets from each information source, a detailed description of the processes for requesting and returning data, the technical specification of the linkage algorithm and a description of how each data package was created are provided in the supplementary file at www.sheffield.ac.uk/scharr/sections/hsr/mcru/phoebe/reports.25

Data complexity

Ambulance service data are complex and services hold data about patients in multiple data sets. For example, call data are stored within CAD, clinical patient data are stored in ePRF and process data about resources sent to incidents are stored in a separate resources data set. We also obtained another data file containing lower super output area to provide information about geographical area and deprivation index for each incident. This was used to calculate variables such as rural and urban incidents. The process of linking the ambulance data sets together was very complex. It is possible for multiple vehicles to be sent to the same incident. The first attending resource may not be the resource that takes the patient to hospital; therefore, calculating time on scene or total prehospital time involves multiple rows of data in multiple data sets. There can also be more than one patient for the same incident, meaning that one row of CAD data is linked with multiple ePRF files. Added to this complexity is that patients may re-contact the ambulance service many times within the 6-month data sample. For some of the PhOEBE programme indicators we were required to link together all ambulance contacts for the same person within a specific time period (e.g. 3 or 7 days).
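
Two of these complications can be sketched in Python: combining several resource rows for one incident, and flagging contacts that are followed by another contact from the same person within a given number of days. The field names, the timestamp format and the exact definitions used here (first arrival on scene, first arrival at hospital, re-contact measured from each call to the next) are assumptions for illustration, not the programme's definitions.

```python
from datetime import datetime, timedelta


def parse(timestamp):
    return datetime.strptime(timestamp, "%Y-%m-%d %H:%M")


def incident_times(resource_rows):
    """First arrival on scene and first arrival at hospital across all vehicles."""
    on_scene = [parse(r["arrived_scene"]) for r in resource_rows if r["arrived_scene"]]
    at_hospital = [parse(r["arrived_hospital"]) for r in resource_rows
                   if r["arrived_hospital"]]
    return (min(on_scene) if on_scene else None,
            min(at_hospital) if at_hospital else None)


def has_recontact(contacts, days):
    """For each contact, is there a later contact by the same person within `days`?"""
    window = timedelta(days=days)
    by_patient = {}
    for position, contact in enumerate(contacts):
        by_patient.setdefault(contact["patient_id"], []).append(
            (parse(contact["call_time"]), position))
    flags = [False] * len(contacts)
    for calls in by_patient.values():
        calls.sort()
        # After sorting, only the next call needs checking for each call.
        for (this_time, position), (next_time, _) in zip(calls, calls[1:]):
            flags[position] = (next_time - this_time) <= window
    return flags


# Fictitious incident: a rapid response vehicle arrives first, an ambulance conveys.
resources = [
    {"vehicle": "RRV1", "arrived_scene": "2013-03-01 10:08", "arrived_hospital": ""},
    {"vehicle": "AMB7", "arrived_scene": "2013-03-01 10:20",
     "arrived_hospital": "2013-03-01 11:05"},
]
first_on_scene, at_hospital = incident_times(resources)
prehospital_minutes = (at_hospital - first_on_scene).total_seconds() / 60

contacts = [
    {"patient_id": "P1", "call_time": "2013-03-01 10:00"},
    {"patient_id": "P1", "call_time": "2013-03-03 09:00"},
    {"patient_id": "P2", "call_time": "2013-03-01 12:00"},
    {"patient_id": "P1", "call_time": "2013-03-20 09:00"},
]
recontact_flags = has_recontact(contacts, days=3)
```

The incident example shows why a single resource row is not enough: the arrival on scene comes from one vehicle and the arrival at hospital from another.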

Analytical decisions

Calls can be analysed for individual patients or individual calls. We decided to count decisions, not patients, because we were interested in the performance of the ambulance service at each point of contact, rather than for each individual; multiple 999 calls still present multiple opportunities for a service to make an appropriate/inappropriate decision on each occasion, even if the calls all relate to the same patient. Counting decisions allows us to recognise this in a way that simply counting the overall experience of each patient (once) would not [i.e. if the same patient phoned three times in 3 days, was left at scene each time and then died (still within 3 days of the first call), that would be three care decisions even though they relate to a single patient].
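
The difference between counting decisions and counting patients can be shown with the example in the text. The Python sketch below is illustrative only: the day-based timing, the field names and the 3-day window are assumptions, not the programme's indicator definitions.

```python
# One fictitious patient who called on three consecutive days, was left at
# scene each time, and died on the day of the third call.
calls = [
    {"patient_id": "P1", "day": 0, "left_at_scene": True},
    {"patient_id": "P1", "day": 1, "left_at_scene": True},
    {"patient_id": "P1", "day": 2, "left_at_scene": True},
]
death_day = {"P1": 2}  # day of death, counted from the first call


def followed_by_death(call, death_day, window_days=3):
    """True if the patient was left at scene and died within the window."""
    died_on = death_day.get(call["patient_id"])
    return (call["left_at_scene"]
            and died_on is not None
            and 0 <= died_on - call["day"] <= window_days)


decisions = sum(followed_by_death(call, death_day) for call in calls)
patients = len({call["patient_id"] for call in calls
                if followed_by_death(call, death_day)})
```

Here `decisions` is 3 and `patients` is 1, which is the distinction the text draws: each call was a separate opportunity to make a different decision.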

Additional linkages

Poor initial matching results for one service and for specific types of patients within another service meant that investigation of the data quality for the linkages was required. Ambulance services were subsequently able to provide better-quality linking information by either accessing alternative data sources or obtaining missing patient data from previous contacts and attaching it to the study data.

Cases included in the final data set

In our first complete EMAS data set, 83% (154,927/187,426) of patients in the sample were successfully traced and their records linked. Unsuccessful traces were due to missing or incomplete patient ID data from the CAD or ePRF records, so these could not be linked to subsequent health-care information or deaths. However, subsequent re-contacts with the ambulance service were identifiable using a unique HES ID generated for each patient. Figure 7 shows the numbers of cases and proportions of cases included and traced and Table 4 shows the numbers and proportions of calls traced at each tracing step.
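
The reported figures can be checked with simple arithmetic. The short Python snippet below uses only the two counts given above.

```python
traced = 154_927
sample = 187_426

match_rate = 100 * traced / sample   # percentage of the sample traced and linked
untraced = sample - traced           # cases that could not be linked

print(f"{match_rate:.1f}% traced, {untraced:,} not traced")
# prints "82.7% traced, 32,499 not traced"
```

The unrounded rate of 82.7% is consistent both with the 83% quoted here and with the overall match rate of > 82% quoted for the programme data sample.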

FIGURE 7. Numbers of cases traced and included in the linked data set.

TABLE 4. Traced call numbers as a proportion of total call volume

Linkage of data for patients with an ePRF (‘see and treat’ and ‘see and convey’) was high, leading to a high overall (> 82%) match rate for the PhOEBE programme data sample (all calls attended with an ePRF and ‘hear and treat’). However, data-linking success for different patient groups was variable because of differences in the quality of data recorded for different types of patients. In particular, linkage rates for ‘hear and treat’ patients were very low. This is because at the time that the data were recorded, it was not standard practice for date of birth to be recorded on the CAD system. Date of birth was a key part of the NHS Digital data-linking algorithm and without this information the algorithm produced a non-match. As the CAD data system was the only data source available for ‘hear and treat’ patients, this resulted in an initial match rate of zero. A final linkage rate of 23.7% was achieved through searching subsequent and previous ambulance attendance data for additional linking information for ‘hear and treat’ patients. This potentially introduced a bias into the sample as the ‘hear and treat’ patients who were matched within our sample were those who had previously contacted and been seen by the ambulance service. These patients were likely to be sicker than the ‘hear and treat’ patients in our sample for whom linkage was not possible. We assessed whether or not there were differences between patient characteristics for those with linked or unlinked data and found little evidence of differences for those discharged at scene or conveyed to ED. We did, however, find differences for ‘hear and treat’ patients; for example, linked ‘hear and treat’ patients were older than patients with non-linked data. This was most likely because older people were more likely to have had other contacts with the ambulance service.

Summary

We were able to develop data linkage processes that were acceptable to patients and the public and to data controllers, complied with data legislation and were technically possible.

Although it was technically possible to link the data within the context of a research project, the complexity and time-consuming nature of data approvals, obtaining linked data and processing those data mean that it is not feasible for individual ambulance services to do this routinely at present.

We found the following:

  1. For cases involving patient contact with ambulance staff it was possible to link ambulance, hospital and mortality data for > 85% of ambulance calls.
  2. Much lower rates of linkage were possible with ‘hear and treat’ calls, resulting in a potentially biased sample. This also made it more difficult to accurately establish consequent events such as re-contacts with other parts of the urgent-care system.
  3. Recording date of birth was essential for linking data sets and ambulance services could improve this in future for data processing.
  4. We were able to define the steps and processes required to link ambulance, hospital and mortality data for future research studies, assuming that the regulatory requirements remain unchanged. Future data linkage for evaluation could be achieved more efficiently through data-sharing agreements between ambulance services and hospitals with linkage performed at an NHS organisation if there were sufficient resources and expertise to do this.

The completion of the data linkage work allowed us to proceed to the next activity: exploring the use of case mix and building statistical models to measure the six indicators identified in workstream 1.

Copyright © Queen’s Printer and Controller of HMSO 2019. This work was produced by Turner et al. under the terms of a commissioning contract issued by the Secretary of State for Health and Social Care. This issue may be freely reproduced for the purposes of private research and study and extracts (or indeed, the full report) may be included in professional journals provided that suitable acknowledgement is made and the reproduction is not associated with any form of advertising. Applications for commercial reproduction should be addressed to: NIHR Journals Library, National Institute for Health Research, Evaluation, Trials and Studies Coordinating Centre, Alpha House, University of Southampton Science Park, Southampton SO16 7NS, UK.
Bookshelf ID: NBK540548
