The previous chapter considered the mechanisms through which the feedback and public reporting of performance data might work. In this chapter, we review some of the circumstances or contexts that shape which of these mechanisms are triggered and thus how the feedback and public reporting of PROMs and other performance data may (or may not) improve patient care. It is important to note here that we are not simply analysing single contextual constraints. Programmes never operate in isolation, and the feedback of performance data has been inserted into complex health systems in which a range of concomitant innovations, policy initiatives and management directives also operate, which may sharpen or blunt the intended impact of PROMs and performance data feedback. Figure 15 illustrates how the intended outcome of a programme is often distorted in further contexts that are subject to further policy measures. Therefore, it is more appropriate to consider these as contextual configurations.
In this chapter, we explore how contextual configurations may trigger some of the intended and unintended consequences of feedback and public reporting. To recap, the intended consequence of public reporting is that clinicians take steps to improve the quality of patient care. The unintended consequences of feedback and public reporting of performance are that clinicians may:
- dismiss or ignore the data
- engage in ‘effort substitution or tunnel vision’ (i.e. focusing on the areas of care measured by the performance data to the detriment of other important, but unmeasured, areas of care)
- engage in ‘gaming’ the data (e.g. the manipulation of data to give the impression of change without any real change in the underlying performance).
The contextual configurations that may influence how providers respond to the feedback and public reporting of performance data include:
- whether any rewards or sanctions were attached to performance
- the perceived credibility and validity of performance data
- the ‘action-ability’ of performance data.
Theory 7: financial incentives and sanctions influence providers’ responses to the public reporting of performance data
Both benchmarking and public disclosure theories hypothesise that the power relationships and relative status of the organisations producing and acting on the performance data can influence the response of organisations whose performance is being assessed. This power may be exerted in a number of ways, for example through increased scrutiny, varying the relative freedom offered to organisations, and financial incentives and sanctions. A number of systematic reviews have highlighted the variable impact of financial incentives on professional behaviour.213,214 Pawson’s programme theory of public disclosure programmes also recognises that publicising performance data rarely works in isolation from other sanctions. His realist synthesis21 of public disclosure interventions across a range of different contexts found that public disclosure is more likely to achieve its intended outcomes when it is targeted at aspirational elites and can be dovetailed with existing market sanctions.21 It is evident that previous and existing public reporting programmes are often accompanied by a range of incentives and sanctions. For example, in England, under the ‘hospital star ratings’ system, trusts achieving a three-star rating were granted ‘earned autonomy’ in the form of less frequent monitoring and inspections from the CHI, retention of profits from the sale of hospital land to reinvest in services and the right to become a foundation trust. Their ratings also determined the level of discretion chief executives had to make use of the ‘NHS Performance Fund’ to incentivise QI at a local level. Trusts with a zero-star rating were required to produce a ‘Performance Action Plan’ indicating the steps taken to improve care, which had to be agreed with the Modernisation Agency and the trust’s Department of Health regional office.
Currently, PROMs data may form part of the indicators used to reward providers under the CQUIN system. Since 2014–15, PROMs have been included in the BPT for hip and knee replacement. Providers would qualify for the BPT if they met the following criteria:
- do not have an average health gain significantly below the national average (99.8% significance), and adhere to the following data submission standards:
- have a minimum PROMs pre-operative participation rate of 50%
- have a minimum NJR compliance rate of 75%
- have an NJR unknown consent rate of < 25%.
For 2015–16, NHS England proposed that the threshold for the NJR compliance rate be increased to 85%. Between 2004 and 2013, the QOF rewarded GPs for using standardised depression measures for screening and then reassessing patients suspected of having depression.
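To make the conjunction of these thresholds concrete, the sketch below shows one way the 2014–15 BPT eligibility criteria might be combined for a single provider. This is purely illustrative and is not drawn from the BPT guidance or from this report; the function name, parameter names and example figures are hypothetical.

```python
# A minimal, purely illustrative sketch (not from the report or NHS guidance)
# of how the BPT eligibility criteria listed above might be combined.
# All names and example figures are hypothetical.

def qualifies_for_bpt(health_gain_significantly_below_average: bool,
                      proms_preop_participation_rate: float,
                      njr_compliance_rate: float,
                      njr_unknown_consent_rate: float,
                      njr_compliance_threshold: float = 0.75) -> bool:
    """Return True only if all four criteria listed above are met.

    The first argument should be True only when average health gain is
    significantly below the national average at the 99.8% level.
    The 2015-16 proposal would raise njr_compliance_threshold to 0.85.
    """
    return (not health_gain_significantly_below_average
            and proms_preop_participation_rate >= 0.50
            and njr_compliance_rate >= njr_compliance_threshold
            and njr_unknown_consent_rate < 0.25)


# Hypothetical provider: qualifies under the 2014-15 threshold,
# but not under the proposed 2015-16 NJR compliance threshold of 85%.
print(qualifies_for_bpt(False, 0.62, 0.80, 0.10))                                  # True
print(qualifies_for_bpt(False, 0.62, 0.80, 0.10, njr_compliance_threshold=0.85))   # False
```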
In this section we test the theory that attaching financial incentives to the public reporting of quality may accelerate and amplify the impact of public reporting and feedback of performance data on improvements to patient care. We also consider the theory that they may have a detrimental impact on aspects of care that are not incentivised or lead to the gaming or manipulation of data. We begin by considering a collection of quantitative studies that have compared the impact of public reporting alone with the impact of public reporting and financial incentives on the quality of patient care.
Theory 7a: financial incentives accelerate and amplify the impact of public reporting and feedback of performance data on improvements to patient care
Lindenauer et al.215
This study makes use of a ‘natural experiment’, in which changes in hospital performance, measured by quality indicators, for hospitals that voluntarily participated in a public reporting scheme are compared with the performance of a subset of those hospitals that also voluntarily participated in a pay-for-performance scheme in the USA. The public reporting scheme, the Hospital Quality Alliance (HQA), was initiated in 2002, with all acute hospitals in the USA invited to participate and incentivised to do so by linking participation with the annual Medicare payment update; 98% of hospitals participated in the scheme. They were expected to collect and report on a minimum of 10 quality measures across three conditions: heart failure, myocardial infarction and pneumonia.
The pay-for-performance scheme, the Centers for Medicare and Medicaid Services (CMS)-Premier Hospital Quality Incentive Demonstration (HQID), was initiated in 2003. A total of 421 hospitals that subscribed to a quality benchmarking database were also invited to participate; 266 agreed but 11 later withdrew, leaving 255 hospitals. As part of this programme, hospitals were expected to collect and publicly report on 33 quality measures for five clinical conditions (heart failure, myocardial infarction, pneumonia, coronary artery bypass grafting, and hip and knee replacement), which included the 10 indicators reported as part of the HQA programme. In addition, for each clinical condition, hospitals in the top decile on a composite measure of quality for a given year received a 2% bonus payment, while hospitals in the second decile received a 1% bonus payment. Hospitals that, at the end of the third year of the programme, had failed to exceed the baseline performance of hospitals in the lowest two deciles incurred financial penalties of between 1% and 2%.
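As an illustration only (not taken from Lindenauer et al. or the CMS-Premier HQID rules), the sketch below expresses this decile-based payment adjustment; because the report gives the year 3 penalty only as ‘between 1% and 2%’, the exact figure used is an assumption.

```python
# Illustrative sketch only: one way to express the decile-based payment
# adjustments described above. The 2% penalty figure is an assumption, as the
# report states only that penalties were between 1% and 2%.

def hqid_payment_adjustment(composite_decile: int,
                            failed_year3_baseline: bool = False) -> float:
    """Return the payment adjustment for one clinical condition.

    composite_decile: 1 = top decile of the composite quality measure.
    failed_year3_baseline: True if, at the end of year 3, the hospital failed
    to exceed the baseline performance of the lowest two deciles.
    """
    if composite_decile == 1:
        return 0.02       # 2% bonus for the top decile
    if composite_decile == 2:
        return 0.01       # 1% bonus for the second decile
    if failed_year3_baseline:
        return -0.02      # penalty of 1-2%; 2% assumed here for illustration
    return 0.0            # no adjustment otherwise


print(hqid_payment_adjustment(1))                              # 0.02
print(hqid_payment_adjustment(5, failed_year3_baseline=True))  # -0.02
```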
The authors included hospitals if they submitted data on a minimum of 30 cases for a single condition annually as part of the HQA programme. They matched the 255 hospitals that participated in the HQID programme with at least one hospital that participated in the HQA programme alone, on the basis of number of beds, teaching status, region, location and ownership status. They matched 199 HQID hospitals each with two HQA hospitals and eight with only one; thus, a total of 207 HQID and 406 HQA-only hospitals were included. They compared the change in adherence to the 10 indicators shared by both programmes over eight quarters for each hospital between 2003 and 2005, and also calculated the change in adherence to composite measures for each of the three conditions (myocardial infarction, heart failure and pneumonia). They also recalculated the differences adjusting for confounding variables using a linear regression model.
The authors found that pay-for-performance hospitals showed statistically significantly greater improvement for 7 out of the 10 individual measures and in all of the composite measure scores. When differences in baseline performance and other confounding variables were taken into account, the incremental effect of financial incentives decreased from 4.3% to 2.6% for the composite measure for myocardial infarction, from 5.2% to 4.1% for heart failure and from 4.1% to 3.1% for pneumonia; all differences remained statistically significant. The authors concluded that this suggests that financial incentives have a modest effect on ‘catalysing quality improvement efforts among hospitals already engaged in public reporting’. However, some caution must be exercised in interpreting these results. Those who participated in the HQID programme were likely to be enthusiasts, and it is likely that they were the better performers at baseline and, thus, more likely to continue to improve their performance. It is unlikely that matching or statistical control of confounders accounted for all the possible differences between the HQA and HQID hospitals. Furthermore, the study assessed the HQID hospitals’ performance on only 10 indicators, although they publicly reported on 33. However, in terms of the theories under test, this study provides some evidence that financial incentives and sanctions can accelerate or amplify provider responses to public reporting programmes.
Friedberg et al.216
This study examined doctor groups’ responses to and use of publicly reported patient experience data, and compared the characteristics of groups that had different levels of ‘engagement’ with or use of the data. The data were collected by the Massachusetts Health Quality Partners (MHQP) collaborative, which had been publicly reporting patient experience data since 2006. A total of 117 doctor groups were invited to participate in a 30-minute semistructured interview, and 72 (62%) of the group leaders responded. The interviews explored group leaders’ use of patient experience reports and what sort of improvement activities had been initiated as a result, and collected data on the characteristics of the group (e.g. group size, organisational model, employment of doctors and exposure to financial incentives).
The initial step of their analysis identified three different levels of doctor group engagement with patient experience data: level 1 groups either did not recall receiving the patient experience reports or did not use them other than to distribute them to their staff (17% of the sample); level 2 groups took one or more actions to improve quality, but these were largely directed at doctors or sites that were low performers (22%); and level 3 groups reported one or more group-wide initiatives to improve patient experience, which included most or all staff or sites in the group. They found that level 3 groups were statistically significantly more likely to be integrated medical groups, to employ their own doctors, to be network affiliated and to be exposed to financial incentives based on measures of clinical quality. The authors concluded that their findings indicated that such improvement strategies require a ‘managerial infrastructure capable of starting and directing improvement activities’, supported by ‘payment incentives based on patient experience’. Some caution must be exercised in interpreting these results: the authors’ findings were based on self-report, respondents may have over-reported their QI activities, and there may have been other, unexplored factors that explained differences in the level of engagement between doctor groups. Nonetheless, in terms of our theory under test, the results provide some evidence to support the idea that financial incentives can increase the likelihood that providers will initiate QI activities in response to the public reporting of performance.
Alexander et al.217
This study examined the impact of public reporting and financial incentives on the extent to which small and medium-sized doctor practices in the USA engage in ‘care management practices’ (CMPs). The authors define CMPs as ‘organized processes implemented by doctor groups to systematically improve the quality of care for patients’.217 These include the use of patient registers, electronic medical records, doctor performance feedback and provider education. As such, the study does not examine the impact of public reporting and incentives on the quality of patient care per se, but on processes that are thought to lead to improvements in the quality of patient care.
Their analysis is based on survey data collected as part of a national study of small and medium-sized physician practices, that is, practices with < 20 practising doctors. This survey focused on the 14 ‘communities who were in receipt of support from the Aligning Forces for Quality programme’. Practices that were part of the Aligning Forces for Quality programme were provided with grants and with support from people with expertise in QI, to help them both to measure and publicly report on the quality of care and to take steps to improve care and involve patients in this process. The questionnaire was sent to a stratified random sample of the Aligning Forces for Quality practices (n = 1793), of which 67% responded (n = 1201). The authors of this paper focused on a subsample of 643 practices that were engaged in either private or public reporting of quality. They used the Physician Organisation of Care Management Index (PCOMI) as an indicator of the level of CMP use. The PCOMI is a summary measure (ranging from 0 to 24) of whether the practice uses reminders for preventative care, uses doctor feedback, has a disease register, has clinical practice guidelines and employs non-doctor staff educators in the care of people with four chronic conditions (heart failure, depression, asthma and diabetes). To explain variation in the PCOMI, the authors constructed binary indicators of (1) whether the quality performance of the practice was publicly reported; (2) whether the practice received a financial reward on the basis of its quality performance during the past year; and (3) whether the practice was aware of quality reports.
The authors found that, controlling for patient and practice characteristics, practices that received financial rewards engaged in a statistically significantly higher number of CMPs, with a PCOMI score 7 points higher than that of practices not in receipt of financial rewards. They also found that practices that discussed quality reports at their physician meetings engaged in a statistically significantly higher number of CMPs, with a PCOMI score 17 points higher than that of practices that did not discuss quality reports. Practices that were subject to public reporting did engage in a higher number of CMPs than practices that were not, with PCOMI scores 8 points higher; however, this difference was not statistically significant. Finally, they found that practices that were subject to both public reporting and financial rewards engaged in a higher number of CMPs, with PCOMI scores 10 points higher than those of practices subject to only public reporting or only financial rewards. The authors argued that their findings demonstrate there is a ‘significant joint effect of having both PR [public reporting] and financial incentives above and beyond just having one of them’.217 In their discussion, they acknowledged that their findings may indicate that doctors who participate in the public reporting of quality require financial rewards in order to invest the time required to change clinical care, or that those practices that have CMPs already in place are more likely to produce quality outcomes worthy of financial reward. Their study did not allow for an exploration of doctors’ motives.
This was a cross-sectional study and, therefore, it is not possible to infer a causal relationship between financial incentives, public reporting and CMPs. We can also question the validity of the PCOMI as a measure of provider engagement in CMPs. However, in terms of the theory under test, this study provides a further layer of evidence supporting the idea that financial incentives may amplify the impact of public reporting on the quality of patient care. The study also suggests that whether or not practices were aware of and discussed the public reports of quality was more important than whether or not they were subject to public reporting.
Doran et al.218
This paper reported on a longitudinal analysis to compare achievement rates for 23 activities included in the English primary care QOF incentive scheme and 19 activities not included. The achievement of the QOF indicators was also publicly reported through the HSCIC website and on the NHS Choices website. As such, the study allows for an exploration of the theory that financial incentive schemes can lead to the neglect of activities that are not included in the scheme, the so-called ‘tunnel vision’ hypothesis. The authors’ analyses drew on data from the General Practice Research Database, which contains patient data on morbidity, treatment, prescribing and referral from 500 general practices and covers 7% of the UK population. A sample of 148 practices that provided data continuously throughout the study period (2000–7) was selected to reflect practices with a range of list sizes. A random sample of 4500 patients was drawn from each practice.
The 42 indicators were selected from a pool of 428 indicators identified by the researchers as either already established indicators or indicators based on clinical consensus, as expressed in national guidelines. They excluded indicators that may have been affected by significant changes in the underlying evidence base or that were dropped from the QOF scheme, in an attempt to rule out some of the possible effects of other changes in the policy environment on achievement of the indicators. They classified the indicators into two subtypes: those relating to measurement (e.g. blood pressure measurement) and those relating to prescribing, as previous research suggested that responses to these two types of indicator may differ. Thus, the indicators were classified into four groups, by type and by whether or not they were incentivised.
The difference between the expected achievement rate and the actual achievement rate for the indicators was analysed using multivariate regression models over four different time periods: before the introduction of the QOF scheme (2000–3); during preparation for the QOF – when practices knew the scheme would be introduced but did not yet know details of the indicators (2003–4); immediately after the introduction of the QOF (2004–5); and longer term (2005–7). The authors examined the impact of the incentive scheme on the four indicator groups separately and then compared incentivised and non-incentivised indicators for each of the two types (measurement and prescribing).
The authors found that, prior to the introduction of the QOF (2000–3), achievement increased significantly for 32 of the 42 indicators, decreased significantly for two and did not change for eight. The authors argue that these findings suggest that quality initiatives introduced over this period, such as clinical audit and the establishment of statutory bodies focused on QI (such as NICE), helped to improve the quality of care in general practice. The authors found that achievement rates improved at the fastest rate prior to the introduction of the QOF (between 2000 and 2003) for those indicators that were subsequently incentivised under the QOF system. Thus, the QOF indicators focused on areas of care that had already shown the greatest improvements, suggesting that they reflected areas of care in which practices were already performing well and/or which were perceived to be important.
In the first year following the introduction of the QOF, achievement rates for incentivised indicators increased substantially for all measurement indicators, with increases above the predicted rates in 2004–5 of up to 38%. The prescribing indicators had a higher baseline rate than the measurement indicators, increased at a slower rate during the pre-intervention period and, although they also saw significant increases above the predicted rates in the first year, these increases were smaller than those found for the measurement indicators (1.2–8.3%). However, collectively, the incentivised indicators ‘reached a plateau in the second and third years of the scheme’ where ‘only 14 of the 23 incentivised indicators had achievement rates significantly higher than rates projected from pre-intervention trends after three years’.
For the non-incentivised indicators, achievement rates immediately following the introduction of the QOF improved in line with achievement rates projected from pre-intervention trends. However, in the second and third years, the rate of QI slowed relative to expected achievement rates. By 2006–7, the authors found that quality was significantly worse than expected from pre-intervention trends, especially for measurement activities. Improvement rates were also significantly lower, relative to projected rates, than achievement rates for the incentivised indicators.
This study did have a number of limitations, acknowledged by the authors, that may have affected its conclusions. Achievement rates may have been affected by changes in the consistency and accuracy of recording indicators over time, especially for incentivised indicators. They may also have been influenced by changes to the case mix of patients subject to the indicators, especially as the incentivised indicators also encouraged increased case finding (e.g. for depression). The practices selected may not have been representative of the population, although the trends found in this study replicated those seen nationally. The authors focused on a limited selection of indicators, which may limit the generalisability of their findings, but at the same time increases the likelihood that differences in achievement rates can be attributed to the incentivised/non-incentivised status of the indicators rather than to other contextual changes. As such, we can be reasonably convinced that this study provides a useful test of the theory that financial incentives may lead practitioners to focus on incentivised aspects of care, to the detriment of non-incentivised aspects.
This study found that although financial incentives increased the quality of care in the short term, in the longer term their impact petered out. The authors suggest three possible explanations for their findings: (1) that the improvements seen in the first year were due to better recording procedures; (2) that practices reached a ‘ceiling’ limit of quality in the third year, with little opportunity for further improvement; or (3) that practices took their foot off the accelerator for these incentivised activities because they had already reached the threshold at which they would receive the maximum amount of remuneration, and further effort would not result in increased income. Of particular interest to our synthesis is the finding that there was no detrimental impact on aspects of care that were not incentivised in the short term, but in the longer term the scheme had ‘some detrimental effects’ on certain areas of care, particularly measurement activities. In terms of the theory under test, this provides some support for the idea that financial incentives can lead to providers focusing on aspects of care that are incentivised at the expense of other areas of care.
Theory 7a summary
The studies reviewed in this section provide some evidence that financial incentives, together with public reporting, have a greater impact on improvements to the quality of patient care than either initiative alone. However, they also suggest that this impact may occur only in the short term, and that in the long term the impact of financial incentives may reduce, especially if providers reach the threshold at which they would receive the maximum amount of remuneration. Furthermore, they also provide some evidence to suggest that financial incentives may also lead to a ‘tunnel vision’ effect, whereby providers focus on incentivised aspects of care at the expense of non-incentivised aspects. The cross-sectional nature of most of the studies reviewed means that inferences of a causal link between public reporting combined with financial incentives and improvements in patient care need to be treated with caution. Furthermore, the studies reviewed above do not provide any insights into how providers themselves have responded to public reporting when financial incentives were attached. We now consider a series of qualitative studies that have examined how providers have responded to either public or private reporting of quality both when financial incentives are attached and when they are not.
Theory 7b: providers do not make improvements to patient care when no financial incentives are attached to performance feedback
We start by reviewing studies examining responses to private and public reporting of quality when no financial incentives are attached to performance. Here, we test the theory that providers do not make improvements to patient care when no financial incentives are attached to performance feedback.
Wilkinson et al.219
Wilkinson et al.219 explored the views and responses of 52 staff, in 15 general practices, to cardiovascular and stroke performance indicators that were developed by the authors. The indicators largely focused on process and were similar to those later included in the National Service Framework for coronary heart disease and, later, the QOF. However, this study was undertaken before those initiatives were implemented. For half of the practices, the academic team collected the data themselves, and for all practices they fed back data to the practice in a 1-hour presentation; as such, the indicators were fed back privately, rather than publicly. During this presentation, the academic team explained how the indicators were developed, how the indicators for that practice compared with those of other practices in the study and the potential clinical benefits if the practice achieved full uptake of the indicators. The practice was encouraged to develop an action plan to address changes that the practice felt were necessary. Two months after the presentation, the authors interviewed a range of staff at each practice, including the GPs who led audit activities within the practice (n = 15), other GPs (n = 14), practice nurses (n = 12) and practice managers (n = 11).
The authors found that almost all of the GPs and nurses, and half of the practice managers, questioned the validity of the data used to generate the indicators because of gaps in the data, computer-related difficulties and confusion in applying Read codes. The most common response to the feedback of the indicators was to improve the number, uniformity and accuracy of recording data, with almost half of the practices attempting to do this. The most common reason cited was to ‘demonstrate to other practices within the primary care group that their own practice was providing good care’219 and, less commonly, to prompt GPs to improve patient care. Three of the 15 practices initiated an audit to validate the data used to produce the indicators. The authors note that ‘all the professionals found the comparative nature of the results useful in interpreting their practice’s performance’.219 In their interviews, respondents mentioned that the comparative indicators had highlighted gaps between their actual performance and their own perceptions of it, and differences between their own performance and that of their peers:
[O]ne imagines that one is doing a fantastic job, then when you actually see it in writing you think oh that’s not quite as good as you think. I am sure that this sort of presentation really winds you up to do better.
GP
It is helpful to be able to compare to local means and see whether you are doing a bit better or worse, and that perhaps is one of the strongest ways of getting GPs to alter things . . . they like to be seen to doing things a bit better than their colleagues.
GP. Reproduced from Quality in Health Care, Wilkinson EK, McColl A, Exworthy M, Roderick P, Smith H, Moore M, Gabbay J, vol. 9, pp. 166–74, © 2000, with permission from BMJ Publishing Group Ltd219
Out of the 15 practices, 11 developed action plans for change, but these were largely in the form of ‘informal verbal agreements devised by one or two enthusiasts who were usually doctors’ and most focused on single changes. Communication of the action plans within practices was ‘ad hoc and informal’.219 The authors identified that change was constrained by a lack of time and resources (both financial and human) to act on the information. Change was supported if the indicator represented a personal interest of one of the practice members or there was someone in the practice who had been allocated responsibility for that clinical area. Change was also supported when the indicators ‘accorded with other ongoing local and national initiatives’, which served to increase ‘the status or relevance of the indicator results’.219
In terms of the theories under test, this study indicates that, in line with feedback and benchmarking theories, the private feedback of their performance to GP practices highlighted gaps between their own performance and the indicators, and highlighted differences between the performance of their practice and that of other practices. This prompted the practices to reflect on their own practice, both as individuals and as an organisation. However, it also suggests that GPs had little trust in the validity of the indicators and sought to improve the accuracy of their data, or, less commonly, to investigate the reasons behind the indicator findings via audit. We could hypothesise that these activities were important to (a) improve the trust the practices had in their data and (b) provide a further basis on which to initiate any changes. However, the authors found that, beyond these activities, practices made few formal, co-ordinated attempts to change practice because they did not have the time, resources or interest in doing so. The authors argued that the ‘absence of specific incentives to change, either positive or punitive, meant that responses were purely voluntary’.219 In terms of the theories under test, this suggests that the private feedback of data that are not trusted by their recipients, and without incentives attached to them, may prompt individuals or organisations to reflect on their practice and to improve the accuracy of the data collected, but it does not lead to any longer-term improvements to patient care.
Mannion and Goddard220,221
These papers report on provider and health board responses to the Clinical Resource and Audit Group (CRAG) indicators in Scotland. The CRAG indicators were compiled and disseminated by the then Scottish Executive. They consisted of seven reports comprising 38 clinical indicators covering a range of specialties, and individual trusts and health boards were named in the reports. The indicators were not part of a formal framework of performance management, and the Scottish Executive indicated that they should not be used to make definitive judgements about the quality of services.
The authors conducted case studies of eight Scottish NHS trusts that varied by size, geographical area and performance on the indicators. They focused on the impact of two specific indicators relating to 5-year survival of women with breast cancer and 30-day survival after admission for stroke. All eight trusts were ‘average’ for the breast cancer indicators; six of the eight were average for the stroke indicator and two were worse than average. In each trust, they interviewed the chief executive, medical directors, consultants with responsibility for stroke services, consultants with a responsibility for breast cancer services, nurse managers and junior doctors (n = 48). The authors also interviewed the director of public health (or deputy) at the local health board for each trust, and key staff from the Information and Statistics Division of the Scottish Executive and the CRAG secretariat, to explore the intended purpose of the indicators.
Within the trusts, the authors found that the CRAG indicators were ‘rarely cited by staff as the primary driver of QI or sharing best practice between organisations’.220 The indicators were not integrated into formal clinical governance systems, but were mainly used by trusts to argue for increased resources for services. For example, for the two trusts that were ‘worse’ than average for the stroke indicators, the CRAG data were used to argue (successfully in one case) for a new stroke unit in order to improve care. These trusts also conducted further audits to check the quality of the data, and one introduced patient protocols. Two of the ‘average’ trusts also used the CRAG data to argue for a new stroke unit, while, in another, the health board declined a request for additional funding because the CRAG data were satisfactory. For three other ‘average’ trusts, the CRAG data had no discernible impact on stroke services. In response to the breast cancer indicators, in three trusts the data were used alongside other information to inform or argue for the set-up of new services.
Similarly, they found that CRAG indicators were rarely used by health boards to make definitive judgements about the quality of care in trusts. If they were used at all, they were used to highlight potential problems requiring further scrutiny, especially if the trust was identified as a significant outlier. The health board would then meet senior staff within the trust to ‘express concern and explore the problem in further detail’.221
The authors identified some of the reasons for the low impact of the indicators. Both health boards and trust staff, especially consultants, questioned the quality of the data (e.g. inconsistent coding and quality of case-mix adjustment) and did not perceive them as credible. The data were also not perceived to be timely, due to the time lapse (at least 1 year) between their collection and publication. The data also appeared to be poorly disseminated to frontline trust staff; consultants and chief executives were aware of the data, but nurse managers and junior doctors were not. Similarly, within health boards, data were discussed at board level and by senior management, but were disseminated downwards only if ‘the Health Board or a specific speciality was identified as being a significant national outlier on the indicators’.221 The indicators were not part of a formal system of performance assessment and, as such, there were ‘weak formal incentives for staff to perform satisfactorily on the indicators’.221 Furthermore, health boards ‘did not hold staff accountable for their performance’222 and they were not used as a basis for changing contracts to a different trust. However, some trust staff did acknowledge that the indicators could sometimes enhance their ‘professional status and reputation’.221
These findings suggest that trust and health board staff did not perceive the CRAG indicators to be credible, and that the indicators were poorly disseminated within these organisations. Both good and poor performance on the indicators was used to justify requests for additional resources, but performance on the indicators was rarely used as a basis to initiate improvements in patient care. In terms of the theories under test, the CRAG indicators were not linked to any rewards and sanctions, and in their discussion the authors observed that ‘Many Trust and Health Board staff identified this as the key reason why the indicators effected little change in provider organizations’.221 However, they also noted that ‘the introduction of explicit incentives may lead to reduced performance if this crowds out intrinsic professional motivation’.221
Theory 7b summary
These studies suggest that when no financial incentives are attached to the public or private reporting of performance, and when stakeholders do not perceive data to be credible, they are rarely used as the basis to initiate improvements in patient care. The studies also suggest that, under these conditions, providers’ first response to indicators is to verify the data on which the indicators are based, to improve the quality of the data or carry out audits. These activities may serve to (a) improve trust in the data and (b) provide a further basis on which to initiate any changes. However, these studies also suggest that, without the co-ordination, resources or incentives to do so, these preliminary investigations may not lead to longer-lasting change, and quality reports are largely ignored.
Theory 7c: financial incentives attached to the feedback of performance data can lead to ‘tunnel vision’
The studies reviewed in theory 7a suggest that financial incentives can lead to providers focusing on aspects of care that are incentivised at the expense of other areas of care, or ‘tunnel vision’. We test this theory by considering a series of studies that have examined how providers respond to performance data when a specific set of incentives and sanctions are attached to performance.
Mannion et al.18
This study examined the impact of the NHS hospital ‘star ratings’ on acute hospital trusts in England. Recall from Chapter 3 that the star ratings were a single summary score of hospital performance based on their achievement according to a range of indicators that were made publicly available. This system was also accompanied by a combination of financial rewards and sanctions. Trusts achieving a three-star rating were granted ‘earned autonomy’ in the form of less frequent monitoring and inspections from the CHI, retention of profits from the sale of hospital land to reinvest in services and the right to become a foundation trust. Their ratings also determined the level of discretion chief executives had to make use of the ‘NHS Performance Fund’ to incentivise QI at a local level. Trusts with a zero-star rating were required to produce a ‘Performance Action Plan’ indicating the steps taken to improve care, which had to be agreed with the Modernisation Agency and the trust’s Department of Health regional office. The authors used a multiple case study design with purposeful sampling of high-performing (n = 2) and low-performing (n = 4) trusts based on 2000–1 performance data. They undertook documentary analysis (CHI reports and internal governance reports) and semistructured interviews with between 8 and 12 key managers and senior clinicians in each site. As such, their interview findings reflect the views of senior, rather than frontline, staff.
Participants expressed a general view that star ratings did not adequately reflect hospital performance, in terms of either their coverage or their sensitivity to local factors that were perceived to be beyond their control. Low-performing trusts especially felt that areas of excellent practice were not taken into account in the indicators. The study also reported that the star ratings had served to align internal performance management activities with national targets and to direct resources to those aspects of performance seen as important by government. Some staff in low-performing trusts reported that star ratings were useful in illuminating dysfunctional senior management that had previously remained unchallenged.
However, this study also found a number of unintended consequences of the star ratings. Some trusts reported manipulating and misrepresenting the data (e.g. not accurately reporting the number of 12-hour trolley waits) and gaming the data (e.g. cancelling operations the night before rather than on the day) to improve their ratings. These findings align with the hypotheses from benchmarking and public disclosure theories, that the lower the acceptance of the data, the more likely organisations are to engage in efforts to improve the presentation or appearance of the indicator. Another unintended consequence reported in this study was a perception that public disclosure had led to ‘tunnel vision’, with trusts focusing on the issues measured to the exclusion of unmeasured but important areas. One example reported in the paper was that the waiting time target of 13 weeks in children’s services had ‘forced the trust to concentrate on children referred to it by doctors, rather than professionals, even though the clinical needs of the patients may be very similar’.18
In their discussion, the authors concluded that hospitals use public reports as ‘a lever to influence staff behaviour’ and noted that the ‘unintended and negative consequences of the star rating system came across loud and clear’.18 The authors contrasted the high profile of the English star ratings with the relatively low profile of the Scottish CRAG indicators. They hypothesised that one reason for this was the effectiveness of the dissemination strategy and the simplicity of the rating system, making it easier for both professionals and the public to understand. However, a consequence of this simplicity was that few participants in this study felt that the star ratings adequately reflected the quality of the hospital and dismissed the ratings as invalid. We note here that a further key difference was that star ratings had a system of rewards and sanctions attached to them, whereas the CRAG indicators did not. In terms of the theory under test, this study suggests that when rewards and sanctions are attached to public reporting, but providers do not accept the validity of indicators, their efforts may focus on demonstrating the appearance of a high performance, resulting in unintended consequences such as effort substitution and gaming.
Dowrick et al.81
This study aimed to examine both GPs’ and patients’ views about the introduction of the routine collection of standardised measures of depression severity, which was incentivised in the QOF. Under this framework, GPs received QOF points for administering a standardised measure of depression for patients coded as being newly diagnosed with depression, and also received points for reassessing patients using a standardised measure of depression 2 weeks later. Furthermore, GP QOF scores were made available to the public via the HSCIC website from 2005 onwards. The authors interviewed 34 GPs and 24 patients from 38 practices in three locations in England. Here, we focus specifically on GPs’ views of the validity of the standardised depression measures and how the incentives built into the QOF influenced GPs’ use of the measures in practice.
The authors found that GPs questioned the validity of standardised measures of depression, both as clinical tools to aid patient management and as aggregate measures of the prevalence of depression. For example, one GP indicated, ‘I don’t have sufficient confidence that it’s an objective enough tool, really, to measure trends’.81 They also expressed scepticism about the perceived motivation behind this QOF indicator, suggesting that its introduction was based not on an extensive consideration of the evidence that the indicator would improve care but on the hunches of a few academics and policy-makers to make extra work for GPs. As one GP remarked, ‘I have a horrible feeling that a few academics got together and said this is a good idea and someone at the Department of Health said, oh yes, this is another hoop to make GPs jump through’.81 Coding a patient as having depression but then not using a standardised depression questionnaire to assess them would result in a lower QOF score and, thus, a loss of income for practices. However, the use of standardised depression questionnaires took up time in the consultation. The authors found that this set of conditions resulted in some GPs being reluctant to code people as having depression to avoid having to use a questionnaire, thereby saving time in the consultation. As one GP explained, ‘I think we stop and pause a little bit before we actually put the depression code in. And, of course, there was a mad scramble around the Read codes to find a Read code that wouldn’t get picked up by the QOF’.81
This study suggests that GPs did not perceive the use of standardised depression measures to be valid or necessary tools in their management of patients with depression. However, if GPs did not use them, they ran the risk of losing income. Under these circumstances, GPs avoided the potential loss of income through manipulating the ways in which they coded patients suspected of having depression. In terms of the theory under test, this study suggests that if providers perceive that the quality indicators that are subject to public reporting do not reflect what they perceive as good-quality patient care but they are financially incentivised to fulfil them, they may resort to the gaming and manipulation of data to avoid both having to fulfil them and the consequent loss of income.
Mitchell et al.80
This study also explored the impact of the QOF on the diagnosis and management of depression in primary care. The authors took a purposive sample of four GP practices and conducted one focus group in each practice. Focus group participants included GPs, practice nurses, community nurses, primary care mental health workers and practice managers. They were asked to describe how the introduction of the QOF and NICE guidelines had influenced how depression was diagnosed and managed.
The authors observed that GPs found the PHQ-9 (a standardised depression rating scale) time-consuming to use during the consultation and had adapted how they administered the questionnaire to fit with their consultation style. These adaptations included letting the patient self-complete the measure in the waiting room, reading out the questions to the patient and recording the answers themselves, recalling the questions from memory during the consultation and working out the score afterwards, and administering the questionnaire by telephone. The authors noted that a number of these ‘workarounds’ may have compromised the validity of the PHQ-9. GPs perceived that the PHQ-9 did not facilitate the clinical management of the patient, as they preferred to rely on their ‘gut feeling’ to determine how depressed a patient was, which they felt was often not reflected by the PHQ-9 score. Rather, the impetus to use the PHQ-9 was ‘the potential for missed targets’.80 The authors report that the financial incentives attached to the QOF acted as a ‘disincentive to code depression if a PHQ-9 was not completed by the patient’.80 Instead, GPs used alternative codes, such as ‘low mood’ or ‘stress’, to avoid recording ‘mild’ symptoms as depression. As one GP explained, ‘diagnoses of what would be “QOF-able” depression has probably dropped . . . we realised if we kept labelling people as depressed when they perhaps weren’t, then we weren’t going to see them again and lose the points’.80
This GP is referring to a scenario in which a patient with mild depression or stress consults their GP only once and does not return to the practice for follow-up, perhaps because their symptoms have resolved. In this situation, if the patient was initially coded as having depression but then did not return for a follow-up consultation, they would not be able to complete a PHQ-9 questionnaire at follow-up, the practice QOF score would go down and the practice would lose income. It reflects the clinical uncertainty regarding the diagnosis of depression and how the QOF created a penalty for getting this diagnosis ‘wrong’ that did not exist before its introduction. In response, GPs avoided coding patients as having depression. In terms of the theory under test, this study suggests that attaching financial incentives to quality of care indicators may create perverse incentives unless the quality indicators contribute to patient management and allow for the clinical uncertainty inherent in the practice of medicine. In situations where perverse incentives are created, this may lead to gaming or effort substitution. It also highlights the tensions between the use of PROMs as a tool to reward good practice at an aggregate level and their use as individual patient management tools.
Theory 7 summary
This collection of studies, using a range of different methods, has provided a useful test of the theory that financial incentives may amplify the impact of public reporting on QI but may also have a detrimental impact on non-incentivised aspects of care. A number of studies suggest that greater improvements in the quality of patient care occur when providers are subjected to both financial incentives and public reporting than when they are subjected to either initiative alone.215–217 Another set of studies, largely qualitative, suggests that feedback of performance indicators to providers subject to neither public reporting nor financial incentives rarely led to formal or sustained attempts to improve the quality of patient care, particularly when providers did not trust the indicators themselves.219–221 Under these conditions, the feedback of performance data was more likely to lead to providers improving the recording and coding of data, which may be an important first step in increasing their trust in the data, as well as providing a basis on which further QI initiatives may occur.
However, the evidence also suggests that financial incentives have only a short-term impact on QI if they are used to incentivise activities that providers already perform well in and when providers reach the threshold at which they would receive the maximum amount of remuneration.218 Furthermore, there is also both quantitative218 and qualitative evidence18 to indicate that financial incentives, together with public reporting, may lead to ‘tunnel vision’ or effort substitution, that is, focusing on aspects of care that are incentivised to the detriment of care that is not, especially when providers do not feel that the indicators adequately capture quality of care. There is also evidence to suggest that when providers are subjected to both public reporting and financial incentives attached to these indicators but they do not feel the indicators are valid or contribute to patient care, this can lead to the manipulation or gaming of the data.18,80,81 This is not always or necessarily the result of active attempts to ‘cheat’ the system on the part of providers. Rather, the use of financial rewards can create perverse incentives that are at odds with the inherent clinical uncertainty of conditions such as depression. Under these conditions, clinicians have to find a way to manage this clinical uncertainty at the same time as ensuring that they are not financially penalised for doing so.
Theory 8: the perceived credibility of performance data influences providers’ responses to the feedback of performance data
In Chapter 3, we highlighted a number of theories that suggested that data must be perceived as credible and must be trusted by providers if they are going to respond to them. The previous section of this report on financial incentives revealed that, unless the recipients view performance data as valid and relevant to the clinical care of patients, financial rewards attached to their feedback can create perverse incentives to meet targets at the expense of clinical care, and may lead to gaming and effort substitution. Benchmarking theories postulate that the lower an organisation’s acceptance of poor benchmarking scores and the more the data can be regarded as a ‘soft indicator’, the more likely it is that the organisation will respond by denouncing the validity of the indicator and/or improving the presentation of the data rather than improving performance.93,94 If data are not perceived as valid, it is unlikely that clinicians will respond by making changes to clinical care.
One theory to explain why clinicians do not trust data lies in the methodological aspects of the indicators themselves, their coverage and validity and the process of case-mix adjustment. In the USA, a particular bone of contention is the formulation of indicators based on routinely collected administrative data gathered by insurance companies at patient discharge to bill payers, which are deemed by many clinicians to be inaccurate, versus the use of data extracted from patient notes by hospital representatives, which requires additional resources to obtain. In contrast, PROMs and patient-reported experience data are based not on clinical or administrative data but on patients’ own reports of their health and experience. However, providers may question whether the subjective reports of patients can serve as reliable indicators of their health outcomes, and some have expressed concerns that patients’ ratings of their outcomes may be unduly influenced by their experiences.97,194,223 The underlying assumption of all of these claims is that it is the data and what is done to the data (i.e. case-mix adjustment) that providers object to.
Decisions about what data are collected and how they are manipulated to form indicators are made by those who design and initiate such reporting schemes. Those who initiate or mandate such public reporting of performance initiatives have a particular set of hopes and aspirations regarding the outcomes of such a scheme. An alternative, although not mutually exclusive, theory is that providers do not trust the underlying driver of the feedback and public reporting programme and question the designers’ anticipated outcomes. The studies reviewed in Chapter 4 on mechanisms revealed a tension between the idea that the goal of public reporting is to put pressure on providers to improve quality through increased competition to improve their market share versus the idea that the goal of public reporting is to improve quality through sharing data and learning from other organisations. Benchmarking theories have also drawn attention to two competing aims of benchmarking activities: competition versus collaboration.91 Wolfram-Cox et al.91 hypothesise that whether benchmarking is collaborative or competitive depends on structural factors such as the extent of interdependence between partners; the degree of geographical separation; the number of partners involved; and dynamic factors, for example who initiates the benchmarking, the primary motivation for initiating the benchmarking and the nature of the existing relationships between the organisations.
In the following section, we review a number of studies to try to unpick the relationships between, and relative importance of, the source of the data, the nature of the indicators and the perceived motivation behind the reporting of performance data. We start by considering studies that have attempted to understand the determinants of the success or failure of public reporting and feedback initiatives, with a particular focus on studies exploring how these factors contribute to the perceived credibility of these data.
Theory 8a: the perceived credibility of the performance data influences providers’ responses to performance data
Bradley et al.224
This study aimed to identify successful strategies and common difficulties in implementing data feedback initiatives in hospitals. The authors focused on exploring hospitals’ efforts to improve beta-blocker use after acute myocardial infarction and purposively selected eight hospitals from across the USA whose performance in this clinical area varied substantially. They conducted semistructured interviews with between four and seven staff at each hospital; in total, 45 participants were interviewed, comprising 14 medical staff, 15 nursing staff, 11 staff with responsibility for quality assurance and five senior administrative staff. Interview questions focused on how staff had collected and used the data and the degree to which they perceived the data had been effective in improving care. The authors’ analysis focused on identifying ‘what worked’ and what did not in collecting and implementing these data.
The authors identified seven key themes underpinning what made data feedback effective; three of these themes focused on the credibility of the data. The authors report that medical staff at every hospital felt that the data must be valid and perceived as valid by clinicians in order to have any impact on doctors' behaviour. If data were perceived as valid, clinicians were less likely to reject or 'argue with' them and were more likely to respond to them. However, participants also recognised that gaining clinicians' trust in the credibility of the data took time and required effort. Strategies used to increase the credibility of the data included nurses sitting alongside doctors to demonstrate that they could accurately abstract the information from patients' notes used to create the feedback reports, and investigating any perceived inaccuracies in the data quickly 'until we're sure it's clean'.224 Finally, participants explained that the timeliness of the data and the ways in which they were presented were also central to their perceived credibility. In particular, participants felt that collecting 'real-time' data that were 'no more than 3–6 months old' and ensuring that they were presented by someone who was 'clinically competent' were important ingredients in maintaining the credibility of the data. This study was based on the views of staff in only eight hospitals across the USA; however, it provides some initial evidence to unpick what is meant by, and what supported, data credibility. In terms of the theory under test, these findings suggest that the credibility of the data depends on the processes through which these data are collected and presented.
Mehrotra et al.195
Mehrotra et al.195 conducted interviews with 17 employers and 27 hospital managers to explore their views of and responses to employer-initiated report cards in 11 regions of the USA. The authors attempted to include hospital representatives who were supportive of, as well as those who were opposed to, report cards. Hospital managers were either chief executives or QI directors. The aim of the study was to explore the determinants of successful report card efforts and to understand why some report card initiatives failed.
The authors found a mix of successes and failures in terms of whether or not report cards were perceived to have stimulated QI in hospitals. In communities where report cards had been successful, interviewees felt that there had been increased attention to quality and an increase in the presence of quality directors at board of trustee meetings. They identified a number of contextual factors or system tensions that prevented the success of report cards. They found considerable ambiguity and tension between these different stakeholders concerning the purposes of report card initiatives. In some communities, hospitals were unclear what the purpose of the report card was. In others, there was tension between employers, who were perceived to have introduced report card systems to reduce costs, and clinicians, who felt that improving quality should be the primary goal of report card initiatives.
The authors also identified conflicts regarding how quality was measured, including concerns about case-mix adjustment, whether outcomes or process measures should be used, and the validity and cost of data used to produce report cards. In terms of case-mix adjustment, hospital leaders felt that it would never be possible to develop reliable methods of case-mix adjustment, while employers felt that imperfect case-mix adjustment was better than none. Hospital leaders felt that outcome measures did not enable them to identify the source of the quality problem, while employers felt that it was a hospital’s responsibility to undertake additional work to identify this. The authors found that the most ‘contentious’ issue between hospitals and employer coalitions was the data used to generate report cards. Many employer report card initiatives used administrative billing data that hospital leaders considered inadequate for QI, as they perceived these data to provide financial rather than quality information. Employers were more accepting of the use of administrative data. Hospitals preferred the use of clinical data to produce performance indicators but were frustrated that they had to pay for these data to be abstracted from patients’ notes in order to produce report cards that they did not want in the first place. Finally, the authors found that the degree to which hospitals were involved in report card design and modification influenced their acceptance of the data.
The authors concluded that they could find no consistent set of report card characteristics that predicted which report cards were successful in initiating QI activities by hospitals, apart from the finding that successful report cards did not use administrative or billing data. They hypothesise two possible explanations for this finding: (1) that hospitals ignored the report cards because they dismissed administrative data as inaccurate, or (2) that hospitals were not involved in the design of such cards and, as such, felt little ownership over the purpose of the scheme.
Boyce et al.58
This study explored Irish surgeons’ experiences of receiving peer-benchmarked feedback, replicating the same measures used as part of the UK national PROMs programme for hip surgery. However, unlike in the UK national PROMs programme, the feedback provided to surgeons in this study was at the individual surgeon level, rather than at the provider level. The feedback was not being implemented routinely and was not publicly reported, but was a ‘one-off’ private feedback intervention. The format of the feedback was also different from that provided by the national PROMs programme; rather than receiving a funnel plot, surgeons received a ‘caterpillar plot’ that graphically presented the average health gain on the OHS plus 95% confidence intervals for all surgeons (anonymised), with their own score highlighted. In this way, the surgeons were able to see how their own score compared with that of others.
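To illustrate how such a plot is typically constructed, a minimal sketch is given below in LaTeX notation; the study does not report the exact method, so the notation and the use of a normal-approximation 95% confidence interval are assumptions rather than a description of the PROFILE calculations.

\[
g_{ij} = \mathrm{OHS}^{\mathrm{post}}_{ij} - \mathrm{OHS}^{\mathrm{pre}}_{ij}, \qquad \bar{g}_j = \frac{1}{n_j}\sum_{i=1}^{n_j} g_{ij}, \qquad \mathrm{CI}_{95}(j) = \bar{g}_j \pm 1.96\,\frac{s_j}{\sqrt{n_j}},
\]

where \(g_{ij}\) is the health gain for patient \(i\) of surgeon \(j\), \(\bar{g}_j\) is that surgeon's mean gain over \(n_j\) patients and \(s_j\) is the corresponding standard deviation. Surgeons are ordered by \(\bar{g}_j\) and plotted with their intervals, giving the characteristic 'caterpillar' shape, with the recipient surgeon's own estimate highlighted.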
The paper reports on a qualitative research study that was nested within a larger RCT of PROMs feedback, the Patient Reported Outcomes: Feedback Interpretation and Learning (PROFILE) trial, the results of which were reviewed in the previous chapter.174 This study aimed to evaluate the effectiveness of the NHS PROMs programme methodology for surgeon-level feedback in an Irish context. PROFILE tests the hypothesis that surgeons who receive benchmarked PROMs feedback will have better future outcomes than those who do not. Surgeons were randomised to the intervention arm of the PROFILE trial and received peer-benchmarked feedback. All 11 surgeons in this feedback arm of the trial were invited, and agreed, to participate in face-to-face interviews. The participants varied in terms of the setting of their usual workplace, their relative performance ranking and their previous experience of using PROMs. The interviews explored surgeons' experiences of using PROMs, their attitudes to using PROMs as a peer benchmarking tool, the methodological and practical issues with collecting and using PROMs data and the impact of the feedback on their behaviour.
The authors found that surgeons had conceptual and methodological concerns about the use of PROMs data, which led the surgeons to question the validity of these data. Unlike other performance indicators, PROMs rely on the subjective judgement of patients, and surgeons questioned patients' ability to report on issues such as pain and function. Surgeons also confused PROMs with patient experience; they assumed that PROMs captured, and would be unduly influenced by, patients' experiences of their care, and they were also concerned that patients may either underestimate or overestimate preoperative and postoperative outcomes. Furthermore, many surgeons expressed disbelief about the percentage of patients who reported that they had not improved or had had problems after surgery, as these figures did not match their clinical experience and the verbal feedback received from patients. They also expressed concerns about the impact of patient case mix and differences in hospital resources and levels of community support that may affect comparisons between surgeons or providers. Finally, they questioned the timing of PROMs follow-up, as they did not feel that 6 months would capture the full benefit of the operation.
The surgeons also had difficulty interpreting and understanding the meaning of the data. They felt that PROMs feedback alone was not sufficient to provide an explanation for poor performance and that it did not enable them to identify opportunities for QI. This was because the surgeons perceived there to be a number of causal factors that may lead to poor PROMs scores, and thus felt that the PROMs scores did not, in themselves, highlight which of these factors required addressing. This relates to audit and feedback theories, which hypothesise that feedback must unambiguously provide information on the cause of poor performance and identify ways in which it can be rectified. The study also highlighted a number of practical issues around collecting and using PROMs data that created barriers to positive engagement with the exercise. Data collection added to workload pressures, and many surgeons stated that their supporting staff were not willing to accept this added workload. Political will at hospital and system level was considered important in order to sustain any QI, as improvement activities required local flexibility over resources. There were also concerns about training in the use of PROMs.
The study also sought to understand how surgeons’ attitudes to PROMs data related to their use of these data for QI activities. The authors’ analysis identified three distinct groups of participants in terms of attitudes towards the data (typology): advocates, converts and sceptics. The advocates expressed a positive attitude towards the feedback they received, which they believed had an impact through promoting a reflective process focusing on their clinical practice. However, specific changes to care were not discussed. The converts were uncertain about the value of PROMs, and this reduced their inclination to use these data. This group generally felt that it is important to know what patients think about their outcome, but emphasised the need to provide actionable feedback. The sceptics believed that the PROMs feedback they received was not clinically useful and had no impact on their behaviour. They felt that there were too many methodological concerns to trust these data, and that these data did not provide a useful source of ideas to stimulate QI.
In terms of the theories under test, this study suggests that surgeons questioned the validity of PROMs data because they mistrusted the idea that patients' subjective experiences formed a valid indicator of the quality of care, and because they felt that the instruments themselves, the timing of measurement and the ways in which the data were adjusted for case mix did not provide an accurate indicator of the quality of patient care. This study provides some support for the theory that, owing to the multiplicity of factors that may be causally linked to an outcome, providers find it more difficult to identify the possible causes of poor outcomes.
Theory 8a summary
Two of these three studies are small qualitative studies that rely on the self-reports of hospital staff in selected regions of the USA. The other is a small qualitative study of surgeons' experiences of and attitudes towards PROMs data. However, they all suggest that both the source of performance data and the process through which the data are collected and presented are important influences on whether or not performance data are perceived as credible by clinicians. Mehrotra et al.'s195 study in particular highlights that clinicians perceived data from patients' notes to be more credible than report cards based on administrative data. This suggests that the source of performance data is an important determinant of their perceived credibility.
Theory 8b: the source of performance data influences providers’ perceptions of their credibility
We can test the theory that report cards based on data from patients' notes are perceived as more credible than report cards derived from administrative databases by comparing two of the oldest cardiac reporting systems in the USA: the California Hospital Outcomes Project (CHOP) reports, which are based on administrative data, and the NYSCRS, which is derived from clinical data abstracted from patients' notes. Participation in both systems is mandated by state law. Although both systems are long established, there are a number of key differences between the two, in terms of how they were developed, the data used to produce the reports and the level at which performance is reported, that provide a useful comparison for our synthesis.
The CHOP reports, which began in 1993, are based on routinely collected administrative discharge data and are overseen by a government agency, the Office of State Wide Health Planning and Development. The risk-adjusted data are aggregated to the hospital level only. The initial report classified hospital performance for acute myocardial infarction as 'better' or 'not better' than expected, while the second report classified hospital performance as 'better', 'worse' or 'neither better nor worse' than expected. The NYSCRS was initiated in 1989, partly in response to the shortcomings of the HCFA mortality data report cards. It was developed as a collaborative venture between the New York State Department of Health and its appointed 21-member Cardiac Advisory Committee. The reports were produced from chart data collected specifically for this purpose by the hospitals and aggregated on a yearly and 3-yearly basis at both hospital and surgeon level. The reports contained the number of deaths, observed mortality rates, expected mortality rates and risk-adjusted ratios, and enabled the identification of hospitals and surgeons with statistically higher and lower rates than expected given their case mix.
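To show how such figures are conventionally combined, the sketch below sets out a standard indirect-standardisation calculation in LaTeX notation; the precise NYSCRS methodology is not reproduced here, so the notation and the flagging rule are assumptions based on the general approach described above.

\[
E_h = \sum_{i \in h} \hat{p}_i, \qquad \mathrm{RAMR}_h = \frac{O_h}{E_h} \times \bar{r},
\]

where \(O_h\) is the observed number of deaths for hospital or surgeon \(h\), \(\hat{p}_i\) is the modelled (case-mix-adjusted) probability of death for patient \(i\), \(E_h\) is the expected number of deaths and \(\bar{r}\) is the overall statewide mortality rate. A hospital or surgeon would then be identified as a statistically high or low outlier when the 95% confidence interval around \(\mathrm{RAMR}_h\) lies entirely above or below \(\bar{r}\).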
There have been several studies exploring both systems, and we now review those that have explored clinicians’ attitudes to and use of data produced by these reporting systems. It is important to note that many of these studies are based on surveys of providers, which are subject to the risk of response, recall and social desirability bias. Providers who respond to surveys may be those who have an especially positive (or negative) view of public reporting, and what providers say has happened may not always accurately reflect what has actually occurred. Nonetheless, taken together, the surveys do provide some evidence with which to test our theories.
California Hospital Outcomes Project reports
Luce et al.225
The authors surveyed 17 acute care public hospitals 1 year after the initial CHOP reports were first published to explore whether or not and how the hospitals had used the reports to initiate QI initiatives. They provide no information on the number of hospitals in their initial sampling frame, so it is difficult to determine how generalisable their findings are. The authors found that ‘few, if any’ QI activities were initiated in response to the CHOP data. The free-text responses to their survey suggested the main reasons for this were hospitals perceiving their outcomes to be adequate, questioning the validity of these data (as there were ‘too few’ patients in each diagnostic category), not having the resources (we do not know whether this refers to examining the data or addressing any issues) and not being concerned about the public release of data. In part, these findings can be explained by the fact that this study was conducted early in the history of public reporting of performance, so familiarity with these data and expectations about their use may not have created the same pressure on hospitals to respond. In line with theories discussed in the previous section on market competition, the authors also point out that public hospitals do not have to compete for patients because most patients attending are uninsured and have little choice of hospital. As such, public hospitals had less incentive to improve the quality of their care. They also explain the findings by highlighting two issues of relevance to the theories tested here: (1) that providers struggled to understand these data and (2) that they distrusted the method used to risk adjust these data. In their conclusions, Luce et al.225 noted that, at the time of writing, ‘hospitals continue to resent the fact that they are required at their own expense to provide the Office of State Wide Health Planning and Development with discharge data that can be used against them in the competitive medical marketplace’.225 Here we see that providers distrusted not only these data themselves, but also the perceived motivation behind the data’s collection: creating ‘winners’ and ‘losers’ in a competitive market.
Rainwater et al.226
This survey was conducted 2 years later than that carried out by Luce et al.225 The authors surveyed 249 hospitals (out of the 374 that received the CHOP report) and then interviewed a purposively selected subsample of 39 hospital quality managers from the state to explore how they had used the second publication of the CHOP reports for QI purposes. They found that managers expressed concerns about the quality of the data coding from which the reports were produced and whether or not the report provided a valid comparison of dissimilar hospitals. When respondents were asked what they found least useful about the report, the most frequent response was that the report 'was not timely and did not reflect current practices'.226 The respondents also felt that the report provided information on outcomes but not 'practical information about the process of care',226 which they regarded as key information for driving QI. The QI managers wanted to know what better-performing hospitals were doing differently. The respondents also indicated that quality information they obtained from other sources was more useful than the CHOP reports and cited systems that were characterised by process data and rapid feedback. Some felt that the CHOP reports simply confirmed what they already knew from other data.
Similar to Luce et al.,225 they observed that two-thirds of respondents had taken no specific action in response to the reports, although the reports were disseminated widely among hospital staff. Of those who had taken action, responses included (1) review of care and instigation of new care pathways, (2) changing medical staff and (3) improving the process of data coding. Interview participants explained that CHOP data had been useful for improving hospital coding and for educating doctors about the importance of coding, because this affects the compilation of the indicators. The authors conclude that the public reporting of performance 'although not completely ignored, is not a strong impetus for change or improvement in the process of care'.226 The authors observed that hospitals typically responded in a way that lies 'between these two extremes and can be viewed as largely ceremonial. Organisations responding in a ceremonial manner alter observable activities to create the impression that established processes are working, without actually altering core activities',226 namely patient care. This conclusion resonates with van Helden and Tillema's93 benchmarking theory, that when organisations do not accept the validity of the indicator, they are more likely to respond by improving the presentation or appearance of these data rather than improving performance. However, these findings can also be interpreted as indicating that providers' initial responses to the report cards focused on efforts to improve the validity and credibility of these data, through improving the process of hospital coding.
New York State Cardiac Reporting System
We now turn to the NYSCRS. These report cards are based on clinical data abstracted from patients’ notes and were overseen by a committee of cardiologists and cardiac surgeons. On this basis, therefore, we might expect that the cards would have been better received than the CHOP reports. One survey directly compared provider responses to the CHOP and the NYSCRS reporting systems, to test the theory that reporting systems based on clinical data were viewed more favourably by providers than those based on administrative data.
Romano et al.227
Romano et al.227 surveyed 249 of 374 hospitals in California and 25 of 31 hospitals in New York to compare the views of hospital leaders on the two reports. Some caution is required in interpreting the findings of this survey, as hospitals with high volumes of acute myocardial infarction were more likely to respond than those with low acute myocardial infarction volumes. The authors also noted their suspicion that hospital leaders with strongly negative or positive views were more likely to respond to the survey than those with neutral views, which may have skewed the findings to the extremes. Nonetheless, the study does provide a useful comparison between the two reporting systems.
This study found that 68% of hospital leaders in California, compared with 89% of leaders in New York, agreed that risk-adjusted mortality data were useful in improving the quality of care. The New York report was rated significantly better than the California report in its usefulness in improving the quality of care, accuracy in describing hospital performance and ease of interpretation. In California, 50% of respondents agreed that the state’s reporting system was better than other systems that used administrative data, while 81% of respondents in New York agreed with this statement. Conversely, 24% of hospital leaders in California agreed that their state’s reporting system was better than other systems based on clinical data abstracted from patients’ notes, whereas in New York 50% of respondents agreed with this statement.
Hospital leaders in New York were, in general, more knowledgeable than those in California about the methods of risk adjustment for their reporting system. However, only 8% of leaders in California and 22% in New York rated the report as 'very good' or 'excellent' in facilitating QI. This indicates that, although report cards based on clinical data may be better received than those based on billing data, hospital leaders are yet to be convinced of their value in QI. This suggests that it is not merely the nature of these data that determines their use and value in initiating QI initiatives. Indeed, the authors conclude that the NYSCRS's higher ratings 'may not be attributable to its use of detailed clinical data. Those ratings may, instead reflect New York's longer track record . . . [and] greater oversight by a Cardiac Advisory Committee, and a limited population of hospitals'.227 Chassin et al.191 (discussed previously under theory 3) also concluded that the durability of the NYSCRS was attributable to 'its integration into the routine process of a governmental agency . . . and the vigorous involvement of the state's leading cardiac surgeons and cardiologists in the advisory committee process'.
Theory 8b summary
These studies suggest that the CHOP reports were widely disseminated within hospitals but stimulated few, if any, attempts by providers to initiate QI activities. Instead, providers responded by taking steps to improve the validity of these data, as, by and large, they did not perceive them to be credible. This lack of credibility stemmed from the reports' reliance on administrative data, their lack of timeliness and their failure to provide information on the process of care that underpinned the outcomes data. The NYSCRS reporting system seems to have had a somewhat greater impact, with poorly performing hospitals taking steps to improve patient care. When compared head to head, the NYSCRS was, in general, better received by hospital leaders than the CHOP reports. However, the small relative advantage of the NYSCRS cannot be attributed simply to its use of clinical data, and the relative disadvantage of the CHOP reports was not solely due to the use of administrative data. Instead, studies point to the idea that clinicians distrusted the underlying rationale for collecting the CHOP data, while the NYSCRS was better received due to the involvement of leading clinicians in its design, through an advisory committee. In terms of the theory under test, this suggests that clinical involvement in the design of the report cards, in addition to the nature of these data, is a better explanation of their success or otherwise than the nature of these data alone. As we initially highlighted, these two conditions are not mutually exclusive and we need to understand what it is about clinician involvement in the design of report cards that influences their success.
Theory 8c: the perceived underlying driver of public reporting systems influences providers’ responses
We can test the theory that the perceived underlying driver of public reporting systems influences providers’ responses by comparing providers’ responses to mandatory and voluntary reporting systems. Mandatory reporting systems can be initiated by regulators, national or state governments, insurance companies or employers. They may have a range of different drivers for initiating such schemes, and evidence195 reviewed previously suggests that these may be at odds with what clinicians perceive to be important. In contrast, voluntary reporting systems are often initiated by professional groups or independent QI organisations whose values may better reflect what clinicians perceive to be important. We begin by reviewing studies that provide some data on hospital leaders’ and GPs’ views of mandatory public reporting of performance initiatives. Some studies also report providers’ views of how they have responded to voluntary versus mandatory reporting systems and their views on externally produced, publicly reported systems versus internally collected data.
Hafner et al.75
This study explored provider views of the nationally standardised acute myocardial infarction, heart failure and pneumonia performance measures produced and reported by the Joint Commission on Accreditation of Healthcare Organizations (JCAHO) in the USA. The Joint Commission is a non-profit organisation that accredits hospitals in the USA, and public reporting of quality indicators forms a mandatory part of that accreditation. The JCAHO also carries out inspections to check that hospitals are meeting the minimum standards. Thirty-six hospitals were randomly selected from a sampling frame of 555 Joint Commission-accredited hospitals and invited to take part in the study. Twenty-nine hospitals agreed to participate; nine had performance indicators consistently above the mean and were ‘high’ performers, seven were equal to or below the mean (the authors classed them as poor performers, but it could be argued that there is a difference between being at and being below the mean) and 13 hospitals had both high and low scores (‘mixed performers’).
Data were collected through 29 focus groups, one in each hospital, with a total of 201 participants, including managers and frontline staff such as doctors, nurses and administrators. The focus groups consisted of mixed groups of staff and were conducted by Joint Commission staff, who were responsible for both producing the performance indicators and accrediting the hospitals. The questions asked tended to focus on the positive elements of public reporting rather than the negative aspects. Thus, the study was at risk of numerous potential sources of reporting bias; frontline staff may not have felt able to express contrasting views to those of senior managers. The authors report that ‘in interviews involving both leadership and frontline staff, more detailed responses to questions were proffered by those in leadership roles with front line staff affirming response either with non-verbal cues or simple one word responses’,75 suggesting that ‘chatty bias’ might have been at play here. As such, findings are more likely to represent the views of managers than frontline staff. Furthermore, participants may have been reluctant to express negative views about performance indicators to the organisation responsible for accrediting their hospital and producing the indicators.
The paper reports largely positive impacts of performance reporting on QI initiatives. The authors found that the public reporting of performance resulted in managers, clinicians and administrators becoming more engaged in QI activities. A nurse commented that it provided justification for securing additional resources for QI activities. It also served to prioritise and focus attention on issues raised by the performance data. For high-performing hospitals, this arose from their desire to maintain this status. For low performers, the data led to an awareness of the need to critically analyse the data and respond to the findings.
Throughout the paper there is evidence that the drive to improve did not come simply from the intrinsic motivation of the staff but from the fact that the data have been made public. Participants reported that making the data public had ‘drawn their attention to it’ and ‘forced them’ to look at the data and respond to them. For example, an administrator indicated ‘When you tell a surgeon his numbers are going to be out there, you get their attention and they ask what they need to do!’.75 Similarly, nurses from a low-performing hospital indicated that public reporting had ‘forced us to look at it [the data], to compare it. Before, it just sat there, now it drives us to do better’ and that ‘knowing that the public is aware of the scores makes us more energised to do better’.75 The findings suggest that media scrutiny of the data put pressure on staff to respond to them, as one administrator noted: ‘I don’t think anyone believed it was going to be public until the newspaper article – that’s when people gasped!’.75 Media scrutiny was perhaps felt more strongly for lower-performing hospitals, as interviewees felt that the local media typically tended to focus on ‘why the numbers are low and rarely on why they are high’.75 These findings resonate with public disclosure theory, discussed in the previous chapter, which hypothesises that the media act to reinforce the shaming mechanism of public reporting and prompt the desired response.
A further issue to emerge from this paper was that interviewees in 17 of the 29 organisations reported that the validity and reliability of performance data were challenged by frontline staff. This occurred in both high-performing and low-performing organisations, but occurred more frequently in the low-performing hospitals. Concern was expressed that the data did not fully capture the quality of care in the hospital, that comparisons between hospitals did not adequately reflect differences in case mix and that the data were too old to reflect current practice. The authors noted that high-performing hospitals saw these challenges as ‘learning opportunities’, although they do not fully explain what is meant by this. These findings support audit and feedback and benchmarking theories, namely that when performance feedback is inconsistent with a provider’s own estimation of their performance it may not be accepted.
The methodological limitations relating to the numerous sources of reporting bias may explain why this study tended to find more positive views of the impact of mandatory performance reporting than other studies. In terms of the theories under test, this study suggests that mandatory public reporting had served to place QI at the top of the agenda for providers and that increased resources had been directed to addressing quality issues. It suggests that providers were sceptical about the quality and validity of the data. It also indicates that mandatory public reporting ‘forced’ and ‘energised’ providers to address quality issues because of the media scrutiny they expected to receive or actually received.
Asprey et al.223
This study aimed to explore primary care providers’ views of and responses to feedback from the national GP patient survey in England. This is a national survey of a random sample of patients registered at all GP practices in England, which asks about their experiences of care provided by their practice. The survey includes items on the ease of getting through on the telephone, the helpfulness of receptionists and being able to see your preferred doctor, as well as ratings of how good patients felt their doctor and nurse were on a number of dimensions. The survey is run by the CQC and is, essentially, mandatory. The results from the survey are publicly reported on a special GP patient survey website hosted by the CQC, where patients can compare the results of their own practice with those of two others of their choice. In addition, the results of some elements of the survey are used in calculating GP practice QOF payments. As such, the GP patient survey is mandatory and publicly reported, and has financial rewards and sanctions attached to it.
The authors selected four PCTs to represent different geographical regions across England, and within these four areas they selected the five highest-scoring and five lowest-scoring practices on the GP patient survey item ‘ease of obtaining an appointment with a doctor’,223 an item that contributed to the practices’ QOF scores. Their aim was to recruit one high- and one low-scoring practice from each of the four areas; the first practice to agree to take part within each sampling stratum was included in the study. Ten GP practices were recruited, four with high scores and six with low scores, which included two single-handed practices. In each practice, two GPs (except in the single-handed practices), one practice manager and one practice nurse were interviewed, giving a total of 37 interviews.
The authors found that participants were sceptical about the credibility of the survey findings for a number of reasons. They were concerned that respondents were unlikely to be representative of the practice population because the ‘vocal minority’ with negative views were more likely to reply, while other groups, such as older people, working people and people with mental health problems, were less likely to reply. They also felt that the items in the questionnaire did not necessarily represent what constituted ‘good’ care, as not all patients valued them in the same way, or because it was unclear whether a high score on an item reflected a good experience; for example, a high score on waiting times might indicate that patients were being rushed through appointments in order to shorten waiting times. Participants also felt that the lack of adjustment for case mix reduced the validity, and thus the utility, of the feedback for improving patient care.
The authors report that the ‘most emotive’ responses provided by interview participants related to their suspicion that the surveys were driven by ‘political motives’. One GP from a low-scoring practice described the ways in which the questions were asked as ‘stacked against us and I think most GPs have a cynical view about that’.223 A practice nurse from a low-scoring practice also felt that ‘this is a cost cutting exercise and little to do with a real commitment to patient satisfaction or to help those in primary care deliver a better service’.223 In other words, low-scoring practices in particular felt that the survey was a politically motivated attempt to cut costs rather than improve patient care. Such feelings sometimes led participants to reject and ignore the survey results, as one GP from a low-scoring practice explained:
I’m totally cynical about the government’s motivation and this is just part of that . . . So if they think they’ve got me over a barrel, forget it, because they haven’t. And I can just happily carry on and ignore this survey.
GP from a low-scoring practice223
One GP from a low-scoring practice commented that although the financial element of the QOF was important – ‘we like to maintain a high income, of course we do’ – financial concerns were not ‘paramount’;223 rather, the practice was more concerned about the ‘shame factor . . . information is shared so much, you don’t want to see yourself . . . on a bar chart at the bottom of the pile’.223
The suspicion that political motives were behind the survey was not limited to low-scoring practices; those in high-scoring practices also felt that the survey was being conducted for political ends, to make GPs work harder but with few clinical gains. One high-scoring GP commented that the GP patient survey was:
. . . a way of softening up primary care for extended hours, by showing there was a demand out there for it . . . I don’t think there’s going to be huge clinical gains from doing that.223
Furthermore, the questions in the survey were also perceived to measure what mattered to politicians, to the exclusion of other important aspects of care. As one GP from a low-scoring practice expressed:
It’s a bit of tail and dog isn’t it? . . . Because it has been measured, is it necessarily important? It’s important but is it as important as some of the things that haven’t been measured?223
Some participants also felt that items in the questionnaire that were driven by government rhetoric, such as rapid access, had unrealistically raised patients’ expectations of general practice, while, at the same time, the government had not provided additional funding to enable practices to deliver on those promises. Finally, some practices had acted on the survey findings and made changes, such as extending their opening hours, but these had not been reflected in their scores, which led to practices feeling discouraged and thinking:
Well why? What else can we do?
Practice manager from a low-scoring practice223
In terms of the theory under test, these findings suggest that this government-initiated, mandatory public reporting programme was perceived to be driven by political motives rather than a desire to improve patient care. Consequently, primary care providers perceived that the items in the programme reflected government definitions of what constituted ‘good’ patient care and measured what mattered to the government, rather than to primary care practices themselves. In turn, practices were sceptical about the credibility of the data as an aid to improving patient care. This feeling was further reinforced when, even after attempts to respond to the survey by making changes to patient care, these improvements were not reflected in the scores achieved on subsequent surveys, further challenging the credibility of the survey itself.
Pham et al.228
This paper explores provider responses to a range of performance reporting initiatives and pays particular attention to two of them. The first is the reporting required by JCAHO, an organisation responsible for the accreditation and regulation of hospitals in the USA; public reporting is a condition of accreditation and thus is mandatory. The second is the Hospital Quality Initiative, a voluntary system launched by the CMS and reported on the Hospital Compare website. Following its introduction, participation was poor until a state law was passed stipulating that non-participating hospitals would not receive a 0.4% annual payment update. The authors also examined providers’ views on other forms of reporting system; as such, the paper provides evidence on providers’ responses to the different drivers behind a range of reporting systems, including a contrast between mandatory, voluntary and clinician-led public reporting systems.
The paper drew on data collected as part of the Community Tracking Study. The Community Tracking Study is a longitudinal study based on site visits and surveys of health-care purchasers, insurers and providers, and is focused on tracking changes in the accessibility, cost and quality of health care. The data selected for the study were collected in 2004–5 and consisted of 111 interviews with five hospital association leaders, representatives from the JCAHO and CMS and six state reporting programmes, 21 chief executive officers, 21 vice presidents of nursing, 30 quality officers and 26 clinical directors from 2–4 of the largest hospitals or hospital systems in 12 health-care markets across the USA. The data were collected using semistructured interviews that explored specific reporting programmes and their perceived impact on the hospital’s organisational culture around QI, priorities, budget, data collection and review activities, feedback and accountability mechanisms. Clinical directors were also asked about their use of 11 QI tools targeted at chronic heart failure, as both the CMS and the JCAHO reporting systems included these.
The authors found that respondents mentioned involvement in 38 different reporting programmes, with each hospital participating in a mean of 3.3 programmes. These programmes varied along a number of axes, which correspond closely to the contextual factors identified within the public disclosure, audit and feedback and benchmarking theories. These included:
- ‘Sponsorship’, or who initiated the reporting system: by purchaser, regulator, private insurer, professional groups or other private organisations.
- Data type: hospitals submit primary data for public reporting (JCAHO), primary data for private benchmarking (e.g. hospital consortia) or secondary data (e.g. insurance claims or patient surveys).
- Mandatory versus voluntary: although most programmes are voluntary, this is influenced by the nature of the incentives attached to the programme; incentives explicitly tied to participation may be perceived as rewards (e.g. pay for performance) or as punitive (e.g. loss of accreditation).
- QI support: whether or not programmes provide prescriptive information to guide hospitals’ QI activities.
- Inclusion of clinical outcome measures.
Senior hospital leaders perceived that key drivers, such as linkages to payment, JCAHO accreditation and peer pressure from public benchmarking, had raised the priority given to quality measurement and improvement by hospital leaders. This was manifest in several ways, including the inclusion of QI priorities in strategic planning, boards and senior management taking more responsibility for the formal review of performance data and associated improvement strategies, and performance-related pay for chief executives. Respondents also believed that the CMS and JCAHO reporting programmes had positively changed doctors’ attitudes towards quality measurement and improvement. The accreditation and financial consequences of reporting programmes could be used as leverage by quality officers in their dealings with doctors. However, the respondents also felt that these programmes artificially focused on a limited number of indicators, which had directed both attention and resources away from other important clinical areas.
Participants felt that, because of their mandatory requirements, the drive to participate in CMS and JCAHO reporting was of a ‘push’ nature. Hospitals did direct resources at the reported clinical areas but ‘without taking standardised approaches to improving performance’.228 Motivation to be involved in other reporting initiatives was described more in terms of a ‘pull’, because these initiatives offered support by specifying changes in care processes. These forms of reporting were seen as attractive because they ‘don’t leave hospitals flailing about trying to identify evidence-based interventions on their own’228 and because they encouraged a culture of continuous QI. These sorts of programmes were most likely to be those initiated by QI organisations, state QI organisations and professional organisations.
Hospital respondents were divided on whether or not reporting had a significant impact on specific process changes to improve the quality of care. Those who had participated in QI programmes prior to being involved in mandatory public report programmes felt that public reporting had little impact on their QI interventions, as these were largely driven internally. The respondents were also divided regarding whether or not reporting had a ‘spill over’ effect on improving quality for the non-reporting conditions; some believed that QI was limited to the targeted conditions, while others felt that it had raised frontline staff’s ability and eagerness to identify and address problems in non-reported areas too.
In their conclusions, the authors argued that national programmes that mandate participation through regulatory or financial reward mechanisms ‘can influence nearly all hospitals and garner attention from those that would otherwise not prioritise QI highly’,228 whereas voluntary programmes, especially those which also provide ongoing support to implement QI initiatives, ‘help focus priorities at hospitals that are eager to take on the more challenging goals of QI’.228 Thus, national programmes with mandatory reporting and regulatory or payment consequences have increased hospital leaders’ and frontline staff’s attention to quality and directed resources towards QI. However, at the same time, such public reporting initiatives may ‘artificially narrow the scope of QI in which hospitals might otherwise engage, especially for those with long institutional histories of QI’.228 In terms of the theories under test, these findings suggest that, for those not already involved in QI activities, mandatory public reporting serves to raise awareness of quality-of-care issues and to direct resources towards issues raised by such reporting systems. However, similar to the findings of Mannion et al.,18 this can lead to tunnel vision, whereby other important clinical areas do not receive the same attention. For those hospitals already involved in QI activities, mandatory reporting systems were perceived to have little additional impact on these activities. Furthermore, hospitals were attracted to systems initiated or run by QI or professional organisations because they offered support to providers to take the further steps necessary to identify the source of the problem and implement QI activities.
Davies98
As described in Chapter 4, this study examined the responses of US providers based in California to both externally produced, publicly reported data and internally produced, privately fed back data, with a specific focus on cardiology. The author explored providers’ views of a publicly reported data system, the CHOP, which publicly reports data on 30-day mortality for acute myocardial infarction, and also of confidential data systems designed for internal use, for example the national registry for myocardial infarction run by Genentech and the peer-review organisations sponsored by the Health Care Financing Administration in the state of California. As such, the study provides a useful contrast between provider responses to data that are produced and fed back in different ways.
This was a multiple case study of six hospitals purposively selected because they were ‘high performers’, as the author expected that he would be more likely to find examples of QI in those hospitals (with the assumption that the hospitals had become ‘high performers’ as a result of QI activities). However, a range of different hospitals was in the sample, including two academic medical centres, one health maintenance organisation, two private but not-for-profit medical centres and one public provider ‘safety net’ hospital. The author conducted 35 interviews with 31 individuals lasting between 54 and 90 minutes. Interviews were conducted with key informants in each setting, including the chief executive, senior clinicians with management responsibilities, senior quality managers, the chief of cardiology, a senior nurse manager, and two or three frontline staff within cardiology. As such, this study provides insight into the views of frontline as well as managerial staff.
Participants questioned the validity and reliability of publicly reported data because they perceived that the reporting systems did not take case mix into account. Participants also expressed concerns about inconsistent coding practices and the poor quality of administrative data; differences in performance were seen as artefacts of the data collection process rather than reflecting real differences in performance. This led to efforts to reform the data collection process. Participants also felt that issues that were measured attracted more attention than was warranted, to the detriment of other, unmeasured services. Quotations from respondents suggested that a reason this was perceived negatively by some staff might have been because it drove efforts to address the problem that were not necessarily clinically appropriate: ‘It really fires people up to meet the task, rather than for clinically appropriate reasons’. Participants also reported instances in which clinicians challenged the pressure exerted on them by a purchasing group following the feedback of performance data, because they felt that the priorities being imposed on them by the purchasing group were counterproductive.
The author’s analysis of his data suggested that a provider’s action in response to publicly reported comparative performance data was most likely when external data indicated they were performing poorly: ‘being an outlier does motivate performance. There’s no doubt about that’.98 A response to these data was less likely if they were a ‘middle ranker’ because some providers were willing to tolerate being ‘middle of the pack’ and did not feel that they had an incentive to improve. However, in some instances being ‘middle of the rank’ was not acceptable when providers were ‘striving to be the best’, while, for others, a provider taking action depended on ‘our own perception as to whether [the data] were an accurate reflection of what we think is happening’.98 The paper also noted that publicly reported data were seen as only one source of information about the quality of care, with their own assessments and the views of peers and coworkers being as important. For example, one senior clinician noted, ‘It’s the opinion of peers that matter more than anything else about quality’ and another senior clinician explained that comparative performance data ‘merely reinforces already held opinions just based on other factors, you know, day to day experience’.98
The study also sheds light on the relative roles of externally produced and internally collected data on the implementation of QI initiatives in response to these data. The findings suggest that publicly reported data focused attention and acted as a ‘kick start’ to the QI process; as a senior nurse manager explained, ‘external data are the start of the process . . . that really gets the ball rolling in terms of an [internal CQI] investigation’.98 However, external data were not able to identify the cause of the problem and thus could not help to identify a solution because these data were not timely and thus lacked relevance to current care – ‘If you’re not doing it yourself and reacting to it immediately, there’s a whole time lag and opportunities for improvement that you’ve missed’ – and did not provide sufficient detail: ‘you just don’t get the details [from the external data].’98 It was the internally collected data that provided the necessary clinical detail to identify the cause of the problem and how to remedy it: ‘it’s the in-house data [that] drives us more than the outside data. I think it’s also better data and it’s more focused; it has many more elements to it’ and ‘our best successes [in using data to improve quality] were our very own internal ones.’98
Therefore, external data were useful to identify what needed to be looked at, but internal, clinically owned process-based data were needed to identify the cause of the problem and how it could be dealt with. To support this, providers also needed practical resources for the analysis, presentation and interpretation of such data and a culture that valued and supported continuous QI processes: ‘we have wonderful motivated people but if we didn’t have the resources to do this, we couldn’t. There’s not only people committed to excellence, there’s resources committed to excellence’.98 If good local data and supportive resources were absent, little QI was seen: ‘We don’t do it [benchmarking] and we don’t have the resources to do it . . . really, no way, since we don’t have ongoing databases’.98
Thus, in terms of the theories under test, this paper suggests that when publicly reported data highlighted issues that the clinicians themselves or their peers also felt were a problem, they served to amplify or kickstart an intrinsic desire to improve. It also indicates that although publicly reported data might focus attention on areas that need to be changed, it was only through analysing internally collected process data that providers could understand the cause of the problem and identify ways to address it. To be able to do this, providers needed practical resources and management support for internal data collection and the analysis and interpretation of those data.
Theory 8c summary
These studies suggest that mandatory public reporting systems had focused the attention of hospital leaders and frontline staff on quality issues, particularly for those who had no previous experience of QI activities. However, unless the issues raised were also identified as a problem based on clinicians’ day-to-day experience, by their peers or by internal data collected by the hospital, the focus on the indicators included in mandatory public reporting systems was perceived as leading to ‘tunnel vision’. In particular, for those who were already engaged in QI activities, public reporting could artificially focus attention on a limited number of issues, at the expense of other clinically important areas. Furthermore, although scoring poorly on an indicator included in a public reporting system could ‘kick start’ a response from providers, it was only through analysing internally collected data that providers could understand the source of the problem and identify a possible solution to rectify it. However, providers also reported that considerable resources were required to enable them to do this. It is likely that those who were already engaged in QI activities had set up internal data processes that inform QI on an ongoing basis and thus were better able to respond when external data shone a light on poor performance. It is therefore unsurprising that providers valued the additional support for QI activities that was offered by reporting systems led by clinical or QI organisations and felt ‘pulled’ rather than ‘pushed’ to engage with these reporting systems.
Theory 8d: clinicians have greater trust in clinician-led reporting systems
To test the theory that clinicians have greater trust in clinician-led public reporting systems, we now look in more detail at one programme that was initiated and led by a collaboration between clinicians, hospital providers, insurers and employers: the WCHQ.
Greer199
All quotations from this study are reproduced from Greer,199 with permission from the Commonwealth Fund.
Greer’s report of how the WCHQ was set up is overwhelmingly positive about its impact; this may be due, in part, to its reliance on data collected from the ‘enthusiasts’ who led the collaborative. The author recognises that, as the collaborative was a ‘pioneering effort’, it benefited from the Hawthorne effect. The report is based on 31 interviews with board members and medical practice executives involved in the WCHQ. They included eight CEOs, 10 chief medical officers and five executives responsible for the quality of care in their organisation. As such, it does not capture the perspectives of staff on the ground or their responses to this initiative. Nonetheless, the study provides some useful ideas about the mechanisms through which clinical engagement and trust in a public reporting system was achieved.
As Greer describes, the WCHQ was founded in 2003 by chief executives of several large multispecialty practices and their partner hospitals. The collaborative was brought together with the purpose of promoting QI among member organisations by (1) developing performance indicators, (2) openly sharing provider performance through public reporting and (3) identifying and sharing best practices to improve the performance of all members of the collaborative. One of Greer’s interviewees, a doctor by background who was the CEO of a doctor-led health-care organisation, noted:
What we sensed was that there were reports being published by people who had some knowledge, but perhaps not full knowledge of healthcare and its delivery . . . The reason for our meeting was: it looks like people are going to start writing reports, publishing reports on medical performance and that will be followed by dictating type of care and how care should be delivered. Shouldn’t we, the people responsible for care delivery, shouldn’t we be involved in the process?
p. 2199
In other words, the WCHQ was set up in order to allow clinicians to retain some control over the ways in which public reporting was designed and implemented. The underlying premise of this process was perceived to be one of mutual learning, as an antidote to the competitive climate providers perceived they were expected to work in. One of Greer’s respondents, a founding member of the WCHQ, noted:
I think in many ways WCHQ became an oasis from this highly competitive environment . . . a safe harbour where we are not talking about market dominance and control. We are talking about quality.199
A chief medical officer who joined the initiative later, when the opportunity for membership was opened up to a wider group of clinic groups and practices, also remarked:
[I enjoy] the sense of collaboration, and what is kind of fascinating, is that the discussion – of how we are doing, how we are doing relative to each other, how we can do better – constantly brings you back to your primary purpose and that is the patient you are taking care of.199
p. 13199
Here we see that the primary motivation for signing up to the WCHQ was its focus on how to improve patient care, rather than on improving profits through market dominance.
The mechanisms of change reported by interviewees were that (1) doctors were intrinsically motivated to do a good job, and receiving feedback that indicated their performance was poor prompted them to take steps to improve; and (2) doctors can identify what needs to be improved, and how, by sharing and learning from the best practices of other providers. These mechanisms resonate with both audit and feedback and collaborative benchmarking theories. Greer’s interviewees noted:
I think one of the constructs that the Collaborative built on is that physicians want to do a good job . . . By providing information about how physicians perform you can influence physicians behaviour . . . They are driven to change things when their performance does not look good.
p. 14199
WCHQ [provides] the actual benchmarking data for looking at where you are at, how to improve, and building on those connections with other organisations that are similar to you; where you can say ‘our numbers are not good here, how did yours get better? What can we learn from how you are doing it?
p. 17199
Greer argued that the WCHQ had a high level of clinical engagement, as evidenced by its growing numbers, with 50% of Wisconsin primary care physicians being members of the collaborative in 2008. Greer attributed this high degree of clinical engagement to the fact that the collaborative was clinician led, rather than led by government or an insurance company. This resulted in the development of indicators, and methods of attributing those indicators, that were perceived as valid reflections of clinical care. The indicators were accepted because there was a shared perception among the collaborative that the motivation behind their production was to improve patient care, rather than competition or external regulation. Participants in the WCHQ indicated that they accepted, rather than dismissed, the data collated by the WCHQ because the indicators had been developed by clinicians and were therefore seen as an accurate reflection of clinical care. The fact that the collaborative developed the indicators made it much more difficult for its members to then dismiss their validity. As one of Greer’s interviewees reports:
. . . we promised each other we would report our data, we would not fudge it, we would have it verified, we would make it public and we would not walk away from whatever we found.199
In contrast, Greer’s interviewees were sceptical of the motivation behind insurers’ claims-based indicators, which were perceived as inaccurate and as designed for competitive advantage rather than for improving care. Indicators developed by organisations that were representatives of the government, such as the Quality Improvement Organisation or the JCAHO, were seen as motivated by external regulation and were perceived to have only a short-term impact on QI initiatives, as one of Greer’s interviewees observed:
JCAHO is big brother coming in and the response is usually: ‘Oh there is a JCAHO visit. We will get ready for the JCAHO visit that we will have to pass.’ Then they leave and they don’t come back for 5 years and until the next one comes along nobody thinks about them.
p. 17199
In summary, Greer’s case study of the set-up of the WCHQ suggests that clinicians were engaged in setting up the collaborative because they supported its primary motive of improving the quality of care by comparing performance and sharing best practices. The perceived mechanisms of change were those evoked by audit and feedback and benchmarking theories; that is, clinicians have an intrinsic motivation to do a good job and will be prompted to improve if their performance is poorer than they would like or poorer than that of their peers. This engagement, in turn, led to the development of indicators that reflected the information that clinicians needed to improve patient care. As clinicians were involved in the design of the indicators, it was then difficult for them to dismiss the data as invalid. This suggests that clinicians are more likely to trust data that are publicly reported if they have a role in choosing the indicators and the means through which they are risk adjusted and reported.
Lamb et al.229
This paper reports on a longitudinal cohort study and survey of members of the collaborative to explore whether or not the WCHQ led to improvements in the quality of care provided. For each publicly reported indicator, the authors examined whether or not mean performance for the collaborative as a whole improved over time. Clinician groups were ranked according to their performance in the first year of reporting, and this rank was correlated with their rate of improvement. They conducted a postal survey of clinic groups to examine whether or not QI projects were undertaken specifically in response to reporting. Finally, for four indicators for which comparative data were available, they compared improvement in care for patients in the WCHQ with (1) patients in Wisconsin who were not part of the collaborative, (2) patients in Iowa and South Dakota, where there was no public reporting, and (3) residents in the remainder of the USA. Their analysis is based on responses from 17 out of the 20 doctor groups who were part of the collaborative, representing 409 out of the 582 clinics.
The authors found that, for the collaborative as a whole, performance on each reported measure improved over time, but there was wide variation in the amount of improvement, from 1.2 for low-density lipoprotein control to 17.3 for monitoring kidney function. Not all of these improvements were statistically significant (of 13 measures, 7 showed a statistically significant improvement over time). Groups that were initially low performers improved at a greater rate than those that were initially high performers. The survey found that 15 out of 16 groups reported formally giving priority to at least one QI measure in response to reporting. Nine out of 16 groups indicated they always or nearly always set priorities in response to reporting, while 6 out of 16 indicated that they sometimes did so. The mean number of QI interventions initiated by members of the collaborative increased over time. On three of the four measures for which comparative data were available, there was a trend for patients in the WCHQ to receive better care than patients in Wisconsin who were not part of the collaborative, in Iowa and South Dakota, and in the USA as a whole, but this was not statistically significant. The measure on which WCHQ members did not perform better was not publicly reported by the WCHQ.
It is not possible to attribute the improvements found in this study solely to public reporting, as improvements in the quality of care were also found in other locations where public reporting did not occur. As the authors acknowledge, clinicians in the WCHQ were volunteers and more likely to be enthusiastic about public reporting and their patients were more affluent. Nonetheless, these findings suggest that members of the WCHQ did take steps to improve patient care in response to public reporting. They also indicate that providers’ responses were variable, suggesting that no group was able to respond to all the reported measures and clinician groups prioritised a limited number of indicators to focus on. Of note here is the finding that those initially identified as low performers improved at a faster rate than average or high performers. The authors hypothesise that ‘public reporting creates a milieu in which parties compete for external recognition and strive to avoid the negative aspect of publicly being identified at the bottom of the list’.
Smith et al.230
This paper reports on a survey of doctor groups that were members of the WCHQ, exploring whether or not they had initiated any improvements in diabetes care in response to the collaborative’s public reporting. The authors invited 21 doctor groups, representing 582 clinics, to participate, of which 17 groups representing 409 clinics agreed. They received group responses from 231 clinics and individual responses from 178 clinics. They carried out two surveys: one for the doctor group as a whole and one for each clinic. They asked the clinics if they had implemented any of 55 QI initiatives, of which 22 were diabetes related, in each year between 2003 and 2008. The groups were asked to identify any year in which a metric was chosen to be a focus for QI and to indicate if this was in response to the public reporting. It is unclear how this was quantified, for example whether it was a simple yes/no answer. They used these data to generate an indication of whether, for each year, the doctor group formally adopted a focus on one or more diabetes metrics in response to the collaborative’s reporting, adopted a focus but not in response to public reporting, or did not adopt a focus. Given that the survey was based on self-report, there is a risk of both social desirability and recall bias.
They found that the implementation of diabetes QI initiatives increased between 2003 and 2008. Clinics in groups that focused on diabetes metrics in response to public reporting were more likely to implement both single and multiple initiatives than groups that did not formally adopt a focus on diabetes. In this group, a factor that appeared to influence whether clinics adopted single or multiple interventions was their experience in diabetes QI; clinics with less experience were more likely to implement single interventions. Clinics in groups that focused on diabetes metrics but not in response to public reporting were more likely to implement multiple interventions. The authors asked quality directors from four doctor groups why clinics chose to operate multiple or single interventions in a given year. Their responses indicated that clinics implementing single interventions were in the early stages of QI. The authors of the paper report the views of one quality director:
One quality director commented that, with the group’s participation in the collaborative, its doctors were seeing standard comparative reports for the first time. The director said that these reports motivated clinicians to ‘do something’, but ‘they just didn’t have the bandwidth to do more’.
The authors also noted that clinics implementing multiple interventions sometimes did so in response to public reporting, but they were also often involved in externally sponsored QI projects. The authors report:
In one case, clinics had implemented a single physician-directed intervention as a ‘first step’ but the quality director of that group noted that ‘we needed broader organisational change to sustain improvement’.230
Despite its methodological flaws, this study’s findings suggest that the public reporting of performance data alone does not stimulate sustainable organisational change. The public reporting of performance data is more likely to lead to sustainable QI if it occurs alongside other, large, externally supported QI initiatives. Single QI interventions implemented in isolation may not lead to sustainable QI. The study also suggests that QI occurs incrementally, and that organisations may achieve sustainable improvement through a process of trial and error; single interventions may be a first step along this pathway.
Theory 8 summary
These studies suggest that the clinicians who developed the WCHQ public reporting system were motivated by a desire to improve patient care, and to retain control over how the system was developed and operationalised, rather than to achieve market dominance. The indicators selected reflected clinical views of what constituted good care, and clinicians themselves were involved in the development of the case-mix adjustment algorithms. In terms of the theories under test, this involvement both ensured that the data were seen as valid and instilled a sense of ownership, which, in turn, made it much more difficult for practices to reject the data as invalid. When the programme was more widely implemented, Lamb et al.’s229 study indicated that clinicians did take steps to improve patient care, but responses were variable and no clinician group was able to respond to all of the indicators. Furthermore, Smith et al.’s230 study demonstrated that QI occurs incrementally and often requires support from other national QI initiatives to be successful. It also suggests that, as practices gain more experience in QI, they become better able to implement sustainable changes. This suggests that although clinical acceptance of the indicators as valid is more likely when the public reporting programme is led by clinicians, which in turn makes clinicians more likely to take steps to respond to such data, the success of QI initiatives also depends on a practice’s experience of implementing QI activities and the resources available to implement changes. This raises questions about what makes public reporting programmes more actionable and what support providers need to make sustainable changes. It is to this theory that we turn in the next section.
Theory 9: the degree to which performance data are ‘actionable’ influences providers’ responses to the feedback of performance data
We now turn to testing theories which focus on what makes performance data actionable. In Chapter 3, we identified a number of theories which specified the conditions under which performance data may support or constrain attempts by recipients to take action in response to them. Again, it is important to note that programmes are not implemented in isolation but have to work alongside other initiatives that may support or inhibit their impact. Furthermore, programmes themselves are complex and embody a collection of different characteristics that may have a differential impact on whether or not their intended outcomes are achieved. In Chapter 3, we suggested that the following configuration of programme characteristics may influence the extent to which providers use data to initiate improvements to patient care.
- Timeliness: if data are not fed back to recipients in a timely way, they do not reflect current care and are less likely to be used as a catalyst for QI.
- Problem identification: performance data rarely provide a definitive ‘answer’ regarding the quality of care provided; rather, what leads to change is the discussion and investigation of the underlying cause of the level of performance indicated from the data.
- Nature of the indicator: process data are more useful than outcome data for QI purposes as they are better able to provide an indication about the cause of the poor outcomes or what needs to be improved.
- Level/specificity of feedback: performance data are more useful if they relate to individual clinicians or departments, as this enables action plans to be developed and implemented at ward level.
We explore the impact of these contextual configurations by reviewing the evidence on providers’ views of, and responses to, patient experience data, because we hypothesise that these data can (but do not always) exemplify this configuration of programme characteristics. Patient experience data are a form of process data and, we can hypothesise, provide information on providers’ performance on different dimensions of the care experience, therefore giving an indication of which care processes need to be improved. A number of initiatives that collect patient experience data, such as the GP patient survey in England, collect and feed back those data more frequently (biannually) and with a shorter time lag between data collection and feedback (6 months) than is typical for PROMs, where there is often 1 year between data collection and feedback. Patient experience data can also be reported at ward level as well as at hospital level, so individual wards can compare how they perform with the hospital as a whole and, for national surveys, with the national average. We start by reviewing studies that have examined whether or not the feedback of patient experience surveys has led to improvements in patients’ experiences. We also review studies that have explored providers’ views of patient experience surveys and their self-report of the QI initiatives that were undertaken. Finally, we consider studies that have examined the impact of interventions designed to support providers in responding to patient experience data.
Theory 9a: patient experience data are actionable and enable providers to take steps to improve patient care
Vingerhoets et al.231
This study assessed whether or not the structured, individualised, benchmarked feedback of patient experiences to GPs in the Netherlands resulted in improvements to patients’ experiences of care. The study was conducted before national surveys of patient experiences were implemented in the Netherlands, and therefore the GPs in this study are unlikely to have had the same exposure to patient experience surveys as GPs do now. The sampling frame was a sample of 700 GPs in the Netherlands stratified by level of urbanisation. From this sample, 60 GPs from 43 practices were recruited to the study, and each was asked to recruit two cohorts of 100 consecutive attending patients, one at baseline and one 15 months later. Each cohort of patients was asked to complete a previously validated patient experience questionnaire covering nine dimensions of care. After matching for practice size, each practice was randomly allocated to either a control arm or an intervention arm. Practices in the control arm received no feedback from the questionnaire. GPs in the intervention arm practices received a written 15-page report detailing the patient experience scores provided by their patients on each dimension, total scores and reference figures for all GPs. The report also contained an abstract of a review of the determinants of patients’ evaluations of their care and a manual that explained how GPs might use the results of the survey. GPs in the intervention arm were also sent a questionnaire enquiring about any changes they had made to their own behaviour or the organisation of care. The results were analysed using multiple regression to examine whether there were any statistically significant differences in patient experience scores between the two arms of the trial.
The authors found that, after controlling for the effect of baseline patient experience scores, patients’ evaluations of continuity of care and medical care were statistically significantly less positive in the intervention arm. There were no statistically significant differences in the other seven dimensions of patient experience, despite GPs reporting that they had made changes to their own behaviour or the organisation of care. The authors hypothesise that the lack of effect of their intervention may have been because the follow-up period was too short for any improvements to register, that GPs may have been too busy to implement changes and that the general shortage of GPs in the Netherlands may have meant that GPs felt less pressure to respond to feedback. They argue that the intervention may have been more effective if ‘it is embedded in an educational programme or QI activity related to a specific clinical topic or group’.231 They also suggest that the feedback may have functioned as a means of identifying specific topics for QI that needed to be explored in more detail before implementing specific QI activities.
This study was conducted in a different context from that experienced by current GPs, who have considerably more exposure to feedback from patient experience questionnaires. However, it does suggest that a ‘one-off’ feedback of patient experience data to GPs, without any public reporting or financial incentives attached to it, does not lead to improvements in patients’ experiences of care.
Elliot et al.222
This study examined the feedback and public reporting of the Hospital CAHPS survey, which measures patient experiences and is publicly reported on a quarterly basis. The scheme was initially introduced by the CMS on a voluntary basis, and in 2008, 55% of eligible hospitals in the USA were involved in the scheme. However, the CMS imposed a penalty of a 2% reduction in annual payment on hospitals that failed to collect the data (from 2007) and to report them (from 2009). By 2009, the percentage of hospitals participating in the scheme had risen to 80%. Thus, the Hospital CAHPS scheme had financial incentives linked to hospital participation. It is perhaps also worth noting here that two of the authors who conducted the study worked for the CMS, which was responsible for introducing the scheme.
The data from the survey are reported as the proportion of responses in the most positive categories (i.e. ‘definitely yes’, ‘yes’ or ‘always’) across nine domains measuring nurse communication, doctor communication, responsiveness of hospital staff, pain management, communication about medicines, cleanliness of hospital environment, quietness of hospital environment and discharge information. The authors adjusted these data for survey mode and patient characteristics. They compared scores on the survey between data published in March 2008 and March 2009 for 2774 hospitals in the USA that publicly reported data for both time periods to examine whether or not there had been any improvements in patient experience over time.
The authors found statistically significant but very small changes in patient experience scores between March 2008 and March 2009 on all nine domains except doctor communication. Most changes were of < 1 percentage point in the proportion of scores in the top category between the two time periods. In their discussion, the authors describe these changes as ‘modest but meaningful improvements’ and argue that their findings provide evidence that ‘healthcare entities are able to use CAHPS feedback to improve patient experience’.222 However, it is difficult to know the real significance of these changes for patients, and the study did not explore whether or how providers used the data to improve care or what changes, if any, were made. Without a control group, it is also difficult to know whether such modest improvements in patient experience would have occurred in the absence of any feedback. Furthermore, the time period over which the study was conducted was short and may not have captured the impact of any changes. In terms of the theory under test, this study suggests that, in the short term, the feedback and public reporting of patient experience data to providers leads to only very small gains in some domains of patient experience, but not in doctor–patient communication. We now look at studies that have explored providers’ views of and attitudes towards patient experience data, and their reports of whether or not, and how, they used this feedback to initiate QI activities.
Barr et al.232
This study explored the impact of mandatory public reporting of patient experience on providers’ QI activities. The study focused on the state-wide mandatory reporting of patient experience in Rhode Island, which was initially fed back privately to providers in 2000 and, from 2001 onwards, was fed back publicly. The 56-item survey was carried out annually on a random sample of patients discharged from each state-licensed hospital in Rhode Island. The survey covered nine domains of patient experience: nursing care, doctor care, treatment results, patient education (including discharge information), comfort/cleanliness, admitting, other staff courtesy, food service and overall satisfaction/loyalty. The survey findings were publicly reported as each hospital’s score on each domain, expressed as whether it was the same as, above or below the national average. Hospitals also received survey item data (expressed as percentage scores).
The sampling frame comprised four key executives in each of the 11 hospitals (CEO, medical director, nurse executive and patient satisfaction co-ordinator). Of the 52 positions identified, 42 people agreed to take part in the study (13 CEOs, eight medical directors, eight nurse directors and 13 patient satisfaction co-ordinators). The authors interviewed participants 1 year after the initial release of the first public report, either face to face or by telephone, and explored what QI activities had taken place in response to the patient experience survey.
The authors found that every hospital reported at least two QI initiatives within the domains reported in the survey. The most commonly reported areas in which improvement initiatives had taken place were admitting (nine hospitals), patient education (nine hospitals), nursing care (eight hospitals), treatment results (eight hospitals) and food service (eight hospitals). Less common areas were other staff courtesy (six hospitals), doctor care (five hospitals) and comfort/cleanliness (four hospitals). Hospitals also reported being involved in other, broader QI initiatives, which could also have had an impact on the domains reported in the patient satisfaction survey. However, the authors did not explore how each hospital’s own scores on these domains related to its QI efforts. The authors also found that, although most hospitals had a decentralised approach to initiating QI initiatives, the reporting of the patient survey results was centralised and focused on senior management. Participants explained that they used the patient experience survey results to prioritise areas for improvement. They also noted that they had the greatest support for QI activities from the board and senior management and the least support from medical staff. Participants cited ‘widespread support for QI, a culture and leadership fostering QI and a team approach’232 as being important for successful QI activities.
This paper was a small-scale interview study conducted in a single US state. The authors relied on hospital leaders’ self-report of whether or not QI activities had taken place, which might have been subject to recall or social desirability bias. However, the paper provides another layer of evidence to suggest that some areas of patient experience were more likely than others to be subjected to QI efforts. It also suggests that senior managers played an important role in supporting QI activities.
Boyer et al.233
This study reports on providers’ views of, and responses to, a locally developed and implemented patient experience questionnaire for inpatients in a 2220-bed teaching hospital in France. The patient experience survey had been carried out yearly since 1998 and was ongoing at the time the paper was written (2006). It produced patient experience scores on a number of dimensions (medical information, relations with nurses, relations with doctors, living arrangements and health-care management) for the hospital as a whole and for each clinical department. The authors surveyed staff members in the hospital using a 26-item questionnaire, which examined whether staff had been informed about the overall hospital results and the results for their ward, how they were informed about the results, whether the results were discussed, whether or not any action plans were developed as a result of the survey, and staff attitudes towards patient experience surveys. The authors sent out 502 questionnaires to staff in the hospital, although they did not report what their sampling frame was or how it was determined. Of these, 261 (52%) were returned.
The authors found that the specific results for the ward were less well known than the overall hospital results, with 60% of respondents indicating that they were aware of their ward-specific results and 70% indicating that they were aware of the overall hospital results. However, 87% of staff indicated that they were more interested in the ward-specific results, compared with 13% who indicated that they were more interested in the overall hospital results. Respondents placed a higher value on open-ended comments than on standardised patient experience scores. Forty per cent of respondents indicated that the results of the patient satisfaction survey were discussed in staff meetings, 40% indicated that actions were taken to solve problems and 40% indicated that the survey had led to modifications to professional behaviour. In their conclusions, the authors argue that the insufficient use of the survey may be explained by ‘a lack of quality management culture’ and a lack of ‘discussion of the results within the department’.233
This is a poor-quality study: the sampling frame was not clear, and the sample of participants who responded was small. The survey was conducted in one hospital and, as such, its findings may not be generalisable. Nonetheless, the study provides some indication that ward-level data were perceived as more useful than data on overall hospital performance. It also suggests that patient experience surveys are not a panacea for QI; rather, their use depends on the extent to which the data are disseminated within the hospital and whether or not they are discussed in ward meetings. It also points to the importance of a broader, supportive hospital culture in facilitating the use of patient experience surveys to improve patient care.
Geissler et al.234
This study explored the motivators of, and barriers to, doctors’ use of patient experience data. The authors were particularly interested in doctors’ views of the patient experience surveys distributed confidentially as part of the activities of the MHQP collaborative. This survey was conducted and fed back to clinicians every 2 years. However, the authors also explored doctors’ views of patient experience data obtained from other sources. They developed a conceptual model to guide their investigation, theorising that the degree to which doctor practices were engaged in initiatives to improve patient experience influenced the extent to which they made improvements in patient experiences. The degree to which doctor practices engaged in such initiatives was, in turn, influenced by organisational characteristics, such as culture, incentives, IT management and leadership, and by the characteristics of the patient experience reports themselves, in terms of how they were disseminated, their ease of use, their timeliness and the level at which they were fed back. Here we focus on the findings relating to the nature of the reports themselves.
To test their model, Geissler et al.234 conducted 30-minute semistructured interviews with a sample of doctor groups in Massachusetts. The sampling frame was the 2007 MHQP state-wide doctor directory; groups with at least three doctors providing care to members of at least one of the five largest commercial health plans in Massachusetts were eligible, and 117 doctor groups were invited to participate. They interviewed leaders from 72 groups, giving a 62% response rate.
Their study did not specifically compare doctors’ views of different types of patient experience surveys, but their findings do provide some insight into how the characteristics of the reports, and the way they were fed back, served to support or constrain the use of patient experience surveys in improving patient care. Participants indicated that they valued free-text responses from patients and sent positive responses to staff to boost morale, especially if individual staff were named. Negative comments were used to target particular departments or wards named in the feedback. This suggests that the free-text responses were valued because they allowed a more specific understanding of what was going well and what was not.
Participants also valued patient experience surveys that provided support in interpreting and acting on the findings, such as those that provided a ‘priority list’ consisting of the ‘ten most important areas or things that you could address that would have the biggest impact on improving patient satisfaction’.234 The timeliness of the data was also mentioned as important: data provided on a frequent basis were seen as supporting efforts to improve care, whereas data with a large time lag between collection and analysis were seen as less useful, as the following quotations from respondents illustrate:
This data has been more useful . . . because it’s more timely. The data is available to us on an ongoing basis; we get it literally every day . . . so . . . the feedback is . . . more current.
I will get the MHQP and it’s on stuff that happened a year and a half ago. That’s very hard to go out to . . . practices and say we have got a . . . problem . . . you’ve got to do something about it . . . they say ‘well that was a year and a half ago’.
Doctors also valued reports that provided data at the level of the individual clinician and were benchmarked against other groups’ performance, so that they could compare their performance with that of others.
In terms of the theories under test, this study suggests that support with data interpretation, the level at which the data were reported and the timeliness with which the data were reported were seen by doctors as important in either constraining or supporting their efforts to improve patient experience.
Reeves and Seccombe235
This study aimed to explore providers’ attitudes towards patient experience surveys and to understand if and how they were being used in NHS hospitals. At the time of this study, annual patient experience surveys were conducted in specific patient populations: inpatients, emergency departments, outpatients and young patients (aged 0–18 years). Twenty-seven hospitals were purposively sampled from the 169 NHS trusts providing acute care; the sampling frame was organised according to the size of the trust and whether it was inside or outside London. The person listed in the Healthcare Commission’s records as the lead for patient surveys was contacted to check whether they were the lead and, if so, invited to take part in an interview. It is not clear if interviews were undertaken face to face or by telephone. The interviews focused on views and uses of patient surveys, but they were not tape recorded and only notes were taken. The study therefore focuses on the views of the trust leads for patient experience surveys and, as such, may not represent the views of frontline staff. As only notes were taken, it is possible that key issues were missed and that participants’ responses were filtered through the interviewer’s frame of reference, leaving open the possibility of misinterpretation, misunderstanding of what the interviewee meant and selective listening or remembering. The study was funded by the HSCIC.
Participants drew attention to the trade-off between the timeliness and robustness of different sources of patient experience data. Participants noted that comment cards and suggestion boxes offered immediate feedback, and comments written on questionnaires were seen as useful in gaining the attention of clinicians and often provided details of incidents of poor care. As one participant commented ‘Reading through the comments, even though our percentage scores are OK, you think “That shouldn’t have happened” ’.235 However, patient experience surveys were seen as more robust: ‘Without a doubt, the national patient surveys are given the most weight. We have nothing else that is so sophisticated and would give us such useful data’.235 There appeared to be a distinction drawn by participants between ‘soft’ information, such as comments or complaints, and ‘hard’ evidence such as clinical or routine data. Patient experience surveys appeared to be seen as more ‘robust’ than comments and complaints.
Participants also commented on the methods through which survey findings were disseminated, most commonly through the organisation’s intranet, newsletters, meetings where the contractors came in to present findings, and special events. In most organisations, the results were sent to senior staff, who were expected to cascade the information down to junior staff, but participants reported that some groups of staff, such as doctors or more junior staff, were less likely to receive the results. Participants also commented on how difficult or easy it was to interpret these data; almost all felt that the Healthcare Commission’s presentation of the published results was easy to interpret, especially the traffic light system, which showed whether the trust fell within the best 20% of trusts, the middle 60% or the bottom 20%. This helped trusts to ‘see quite clearly where you are and where you should be’.235
However, when it came to acting on the findings from patient experience surveys, opinions were more varied. Some participants felt that feedback from patient experience surveys was not specific enough to be relevant to recipients, who, it was hoped, would act on the information. This was particularly seen as an issue for doctors, who participants felt were focused on their ‘sphere of influence’. As one participant noted, ‘The main criticism we have from doctors is “Make it specific to the area I work in and I will take notice of it” ’.235 They also noted variation in clinicians’ ‘receptiveness’ to survey findings, with nurses perceived as being ‘easier to engage’ than doctors.
Almost all participants reported using patient survey results as the basis of action plans, and the authors give two examples of changes providers made. Both were in response to very specific issues highlighted by the survey: one was a range of efforts to reduce noise on the ward, in response to the surveys highlighting problems with ‘noise at night’, and the other was a change in the way information was provided at discharge, in response to findings of variable discharge information. Here, the surveys appeared to highlight problems with specific areas of care that were then addressed. However, some participants also noted difficulties in formulating and then implementing action plans in response to the data. One participant commented that ‘Just giving people the results doesn’t mean they will take action. They need direction to make them do things and the frameworks to help them’.235 Other participants commented that they found it difficult to identify the reasons behind their successes or failures, and had difficulty knowing how to address shortcomings. Policy documents published at the time of the study promoted the idea of spreading best practice across the NHS, in line with theories of collaborative benchmarking. However, participants appeared divided in their enthusiasm for this idea. Some were interested in learning how others had made improvements, while others, in the authors’ words, were ‘not particularly enthusiastic’ about identifying and learning from the best practices of others, although the exact nature of their opposition is not reported.
In terms of the theories under test, this study indicates that providers were aware of the trade-off between timely and robust feedback and felt that both types needed to be integrated to provide a fuller picture of their performance. It also suggests that providers preferred feedback that was specific to their ward or ‘sphere of influence’, and that this was an important determinant of whether or not they took action in response to it. However, it also suggests that providers needed support to identify the reasons behind their successes or failures and, in turn, to take steps to make improvements. This suggests that patient experience data do not always provide a clear picture of the causes of good or poor care.
Boiko et al.236
This study explored primary care staff’s views of, and responses to, the confidential feedback of a patient experience survey similar to that used in the GP patient survey in England. A random sample of 25 practices from Cornwall, Devon, Bristol, Bedfordshire, Cambridgeshire and North London agreed to take part in the study, and a random sample of their patients was mailed the patient experience survey. The practices received aggregate-level feedback for their practice, and each individual family doctor also received confidential feedback of their own scores on the patient experience survey. A purposively selected sample of 14 GP practices was then invited to take part in focus groups, which included 128 participants in all (40 GPs, 18 practice managers, 18 nurses, 20 receptionists, 13 administrators and 19 other staff members). The focus groups explored how practices had responded to the findings of the surveys; participants were also asked to comment on two hypothetical situations in which some doctors in the practice received less favourable patient experience scores than other doctors.
Participants questioned whether or not patient experience surveys could adequately capture the ‘complex reality of healthcare interactions’ and contended that they focused on what was measurable to the exclusion of other important aspects of care. As one GP explained:
A lot of this data that’s collected in a measurable kind of way doesn’t really represent reality. There’s a kind of fixation on measurable outcomes but they don’t really tell us what’s going on.236
Staff also drew attention to the trade-off between local surveys, which were more relevant but less robust, and national surveys, which were more robust but less specific to individual practitioners and did not include free-text comments. As one GP commented, ‘We want to see data tailored to individual practitioner because we all practice [sic] differently’.236 Patient complaints were seen as more useful because they allowed practices to understand where problems may lie. As one administrator noted, ‘I think we learn a lot more from patients that write to us individually about complaints’.236
Participants also reported a number of changes they had made to services in response to the survey, including modifying their facilities and appointment systems and providing staff training. The changes largely related to organisational aspects of service delivery and operational matters. However, the authors commented that, for most practices, changes were ‘rarely attributable directly to the survey feedback’;236 rather, the survey had provided a ‘nudge’ to implement changes they were already considering. Participants mentioned a number of difficulties in responding to issues highlighted by the patient survey, including not having the resources to meet what they perceived as unrealistic patient expectations (e.g. patients wanting the surgery to be open at weekends), balancing the sometimes conflicting demands of different groups of patients (e.g. some patients wanted music in the waiting room, whereas others did not) and the working patterns of GPs making it difficult to always fulfil patients’ preferences. As one doctor summarised:
Would you like the surgery to be open on Saturday? Yeah. Would you like us to go 24 hours? Yeah. Are you going to pay more taxes to have it open on Saturday? No.236
They also felt that, even though they had made changes to the organisational aspects of the delivery of care, these had not always been reflected in improved scores on patient experience measures, which, as one GP described, were ‘remarkably stubborn in terms of the change in perception by patients’.236
In particular, they acknowledged that it was very difficult to tackle an individual doctor’s poor performance, especially when findings were fed back confidentially. It was only perhaps when these findings were shared more widely within the practice that change might occur. As one practice manager commented:
If the survey results are between (the survey providers) and the doctor . . . there’s absolutely no reason for them to change their ways is there? What is the motivation to change . . .? It is only when this information becomes available to . . . the practice that things could start to change.236
However, this respondent was also unsure exactly who in the practice could be expected to put pressure on ‘poor performing’ GPs to change their behaviour. Teams acknowledged the difficulties of having an ‘unmanageable’ GP in the practice but most teams indicated that they would support a doctor who consistently received poor patient feedback through mentoring, peer support sessions and interventions by a partner or manager. They also recognised that some GPs may not be ‘a great communicator but they are great at doing something else’.236 Finally, staff felt that there was little external support for making changes in response to patient experience surveys. One GP complained that surveys had come out but that there was:
very little support from anyone to say, right, this is how you can improve things that might help or we understand why you might be having problems . . . It has always been: here is your survey results, it is up to you now to sort it out.236
The authors concluded that primary care staff view the role of patient experience surveys as serving a ‘quality assurance’ function, as they offered evidence that they were providing an acceptable standard of care. However, it was less clear that patient experience surveys fulfilled a QI function. Although patient experience surveys identified potential dimensions for change, ‘actual changes were usually confined to “easy targets” for modification such as décor or playing music’.236 They note that ‘issues such as the management of GPs with evidence of poor communication skills, or responding to other “interpersonal” aspects of care, were much harder to tackle’.236 They also argue that patient experience survey findings were only one of the ‘spurs to action’ to address problems that practices were often already aware of. In terms of the theories under test, this study suggests that although patient experience surveys may provide a clearer indication of areas of care that required improvement, there was no guarantee that this led to QI activities. The changes that were made focused on issues that staff were already aware of or on the organisational aspects of service delivery, rather than on the ‘harder to tackle’ issues of communication skills and interpersonal behaviour.
Theory 9a summary
These studies suggest that the timeliness of some forms of patient experience data was valued by providers and that ward-specific data were perceived as more useful than higher-level hospital data. However, they also draw attention to the trade-offs between the characteristics of different forms of patient experience data. Qualitative or ‘softer’ data from comment cards, patients’ responses to open-ended questions or complaints were seen as useful in providing a more detailed understanding of the nature and causes of problems, but were regarded as ‘less robust’ by providers; patient experience surveys were perceived as focusing on measurable but less relevant aspects of patient care, although they were acknowledged to be more robust. Furthermore, providers questioned whether patient experience surveys were able to capture the real-life complexities of patient care. Providers felt that both sources were needed to provide a more rounded picture of patient experience.
However, although patient experience surveys identified potential dimensions for change, the studies reviewed here suggest that this did not always lead to steps to improve patient care. Furthermore, where changes did occur, these were not always directly in response to the findings of patient experience surveys, but reflected issues staff were already aware of from other data sources. The studies indicate a number of reasons for this. Some forms of patient experience data were perceived as not being specific enough to fall within the ‘sphere of influence’ of certain clinicians. Providers were sometimes not clear on how to make changes and wanted guidance with this process. Others felt that they did not have the resources to meet patient demands, such as opening primary care services at the weekend. The studies suggested that, when changes were made, these tended to focus on addressing aspects of the organisation of care, or the so-called ‘easy stuff’, and that the more interpersonal aspects of patient care, relating to the behaviour and communication skills of individual clinicians, were much more difficult to address.
Theory 9b: making patient experience data more immediate and integrated into clinical discussions improves provider responses; theory 9c: providing tailored support to interpret and act on patient experience data improves providers’ responses
We now examine studies that have attempted to provide additional support to providers in order to enhance the impact of the feedback of patient experience data. These have included making patient experience data more immediate and better integrated into clinical discussions and providing tailored support to help providers interpret data, identify problems with care and develop solutions.
Reeves et al.237
This study was designed as a pilot study to test the feasibility and impact of an intervention designed to improve the ways in which patient experience data were fed back to recipients. The intervention had a number of components, including (1) increasing the immediacy of the feedback, (2) providing specific feedback at ward level, (3) including patients’ written comments in addition to the ward’s scores and (4) an enhanced version of the intervention that also included ward meetings with nurses to discuss the findings of the survey and support them to act on the findings. It was hypothesised that the intervention would increase the likelihood that clinicians (in this case nurses) would take steps to improve patient care and, in turn, the patient experience scores for the ward would improve.
The study design was an RCT in two single-site acute hospitals in London. All non-maternity inpatient wards were eligible for inclusion: 18 wards in trust A and 14 wards in trust B. Nine wards in each trust were randomly allocated to one of three arms, although the rationale for this number is not provided. Patient survey data were gathered in each trust specifically for the study, using the CQC’s Inpatient Questionnaire administered as a postal questionnaire. A random sample of 160 patients discharged from each of the included wards over a 2-month period was taken at 4-month intervals during the study period (on six occasions in trust A and three in trust B), and these patients were mailed a questionnaire. The overall response rate during the study period was 47%.
In the control arm, survey findings were provided to the director of nursing in each trust, with no special efforts made to disseminate them to ward nurses. In the ‘basic feedback’ arm, individual letters were sent to nurses on the wards and their matrons, which contained (1) bar charts of scores on questions about nursing care, comparing the target ward’s scores with the scores for other wards in the arm and the national average; (2) as the study progressed, graphs of how the ward’s scores had changed over time; and (3) transcriptions of patients’ responses to a series of open questions on the patient experience survey. The ‘feedback plus’ arm received the same letters and feedback, but ward managers were also asked to invite ward nurses to ward meetings with the researchers during working hours to discuss the survey findings. The main outcome measure of the study was the mean score on a subset of 20 questions from the CQC inpatient questionnaire, which was termed the Nursing Care Score. The authors do not provide details of why and how these questions were chosen or of whether or not the scale was psychometrically valid and tapped into a common factor of nursing care. Notes were taken at meetings with nurses in the feedback plus arm, and ward managers in the basic feedback arm were also contacted to establish whether or not actions had been taken in response to the survey, although details of how this occurred were not provided.
Multilevel regression was used to examine changes in the Nursing Care Score in wards in each arm. This analysis suggested that there were no statistically significant differences in the changes in ward patient experience scores over time between the control arm and the basic feedback arm. The changes in the Nursing Care Score from baseline to follow-up for the wards in the control and basic feedback arms were negative in all wards except two, suggesting that their patient experience scores had worsened over time. The authors report that, when asked, ‘none of the basic feedback ward managers identified specific actions resulted from the printed results’. This suggests that improving the timeliness of feedback and providing ward-level-specific feedback alone was not sufficient to lead to improvements in patient experience scores. There was a statistically significant difference in the changes in scores over time in the feedback plus arm compared with the other two arms, with scores improving over time. However, there was much more variation in changes in the Nursing Care Score from baseline to follow-up in the intervention arm, with some wards staying virtually the same (three wards with changes of < 1.0 point either way), some worsening (two wards) and one improving. Graphs showing the aggregated rate of improvement in the three groups also demonstrate that the differences between the three groups were largely due to the fact that the control and basic feedback patient experience scores worsened over the study period, while a very small overall improvement was observed in the intervention arm.
The authors reported some of their own impressions from the ward meetings to explain their function and why they led to improvements in patient experience scores. Attendance at meetings was variable, but the attendance of matrons ‘had a positive influence’, as they offered ‘suggestions for improvement’, encouraged ‘ward staff and nurses to take responsibility for results’ or supported ‘efforts to implement changes’.237 The authors felt that the ward meetings facilitated ‘nurses engagement’ and noted that the written comments in particular stimulated the nurses’ interest. The authors also noted that staff needed prompting from them to focus on understanding the patient feedback and planning strategies for improvement; without this, staff were more ‘inclined to discuss . . . the many difficulties they experienced in fulfilling their duties; staffing shortages, NHS policy or their perceptions of hospital managers’.237 We can hypothesise that the discussion of more general matters either was a lapse of focus on the nurses’ part or served to explain or justify the difficulties in understanding and acting on the data.
The verbatim quotations that are offered by the authors suggest that nurses felt that some of the survey findings had a rational explanation and did not constitute an indicator of poor care; for example, they explained that ‘Patients think we are talking in front of them as they are not there because we have to talk quietly to maintain confidentiality’.237 They also appeared to implicate patients themselves in making unreasonable demands by using ‘call buttons for trivial reasons’ so that ‘it would not be good use of our time to answer them all immediately’, suggesting, again, that they questioned if the survey indicators tapped into ‘good care’.237 The authors also note that it was ‘difficult to ascertain clear examples of innovations in patient care’ as a result of the patient survey feedback and the meetings. The most common responses were that nurse managers raised the issues in ward meetings and handovers and ‘reminded nurses of the importance of fulfilling their duties relating to ensuring patients’ experiences were positive’.237
This study was a pilot study and, as such, had a number of methodological shortcomings. The randomisation process was not masked, leaving open the possibility that the research team’s preferences or knowledge of the wards influenced the allocation. It was not a cluster RCT; wards in the same hospital were randomised to different arms, so contamination between the three arms may have occurred, diluting the impact of the intervention. It is not clear whether or not the changes in patient experience scores in the intervention arm were statistically significant over time, only that they were statistically significantly different from the changes that occurred in the other two arms. The qualitative data were not collected systematically and many of the insights into the meetings were derived from the researchers’ impressions, with few verbatim quotations from participants.
Despite these limitations, the study does provide some lessons about the elements of the feedback process that may support or constrain the actions taken in response to the feedback of patient experience data. In terms of the theory under test, this study suggests that improving the timeliness of feedback and providing ward-level feedback alone was not sufficient to lead to improvements in patient experience. Indeed, under these conditions, patient experience scores worsened. The facilitated meetings did lead to small improvements in patient experience as measured by patient experience scores, but it is not clear whether these improvements were in themselves statistically significant. These meetings functioned as an opportunity for nurses to air their concerns about the data (and their general working conditions) and as a means of raising nurses’ awareness about the data, rather than as sites for the strategic planning of improvements to patient care.
Davies et al.238
This study evaluated the impact of a peer and researcher support intervention coupled with a modified process of collecting and feeding back patient experience survey data on the QI activities of a small number of providers. Nine providers who had previously expressed an interest in learning to use patient experience data more effectively were invited to join a collaborative. One group left the collaborative early on in the project, leaving eight groups involved. They were invited to submit suggestions for how the current CAHPS survey could be modified to make it more useful for QI purposes, and the research team used the suggestions to refine the survey. The resulting survey covered five domains: scheduling and visit flow, access, communication and interpersonal care, preventative care and integration of care. It also included a question that measured a global rating of care, a question on whether or not patients would recommend the service to others and an open-ended question. The survey was administered to a random sample of recently discharged patients and fed back to providers on a rolling monthly basis to provide ‘real time’ data to providers, in the hope that this would ‘support a rapid cycle of quality improvement’.238
In addition, the providers participated in a patient experience ‘action group’, which followed a ‘model of collaborative learning’.238 This involved a group leader and team members attending a full-day bimonthly meeting facilitated by QI advisors, which focused on supporting the groups to understand and interpret the survey data, prioritise areas for improvement, set targets and action plans to address any issues raised by the patient experience data, and monitor their progress in implementing these plans. The impact of the intervention was evaluated using a mixed-methods cohort study, including measuring changes in patient experience scores during the 18-month intervention and interviews with members of the collaborative to explore their experiences. All eight leaders participated in an initial interview but, by the time of the follow-up survey, two groups had left the collaborative and two leaders had changed positions. Consequently, the six original leaders and one new leader were interviewed at follow-up. In addition, four leaders invited other members of their team to take part in interviews.
At follow-up, six groups had used at least one of the suggested tools to explore the possible reasons for their patient experience survey results. These tools included walkthroughs (five groups), patient interviews (two groups), patient focus groups (two groups) and cycle time surveys (one group). Four groups had used one of the suggested interventions designed to improve patient care, including scripting for clinic staff (two groups), communication skills training (one group) and patient education materials. Two groups had developed their own interventions. The leaders reported feeling that the group’s support had been useful in creating momentum and motivation to implement the changes, and had provided an opportunity to learn from others. As one member commented:
It lets you know that you are not alone . . . we tend to blame our workers if we get bad outcomes but if the whole world is getting bad outcomes . . . perhaps . . . it’s a common culture. We all have stories about success and failure and sharing those stories is helpful.238
Six leaders decided on and attempted to implement at least one intervention to improve patient care; four leaders reported that they had implemented the intervention as originally planned, while two reported problems with implementation. These latter two groups reported that other organisational changes had competed for priority and they had focused on those instead. For the other four groups, however, the impact of these interventions was not always reflected in follow-up patient experience survey results. Many of the changes in the scores were not statistically significant and were sometimes in domains that were not directly related to the focus of the intervention that had been implemented. Of the four groups who had implemented an intervention, ‘three had some change in the direction they had hoped for’.238 For example, one team had attempted to improve communication among staff, and with patients, about waiting times and test results, and found that more patients reported feeling that they had been kept informed about these in the last 6 months.
Two groups showed ‘mixed or negative effects’. One group had delivered communication skills training to staff and found a ‘slight improvement’ in patients reporting that doctors explained things to them in a way that was easy to understand, but a decline in the percentage of patients reporting that doctors spent enough time with them. Staff reported that some aspects of the communication skills package were perceived as conflicting with the group culture, in that they suggested that the problem lay with individual clinicians who needed to work on their skills, rather than with the team working together. Another group had tried to reorganise their clinic to improve visit flow, but found that patients reported longer waiting times and not being informed about their wait. This was thought to be because patients arrived earlier than they were asked to and therefore experienced the wait as an additional delay. Thus, attempts to improve one element of care seemed to have a detrimental impact on other aspects. Following a post hoc analysis of their findings, the authors identified that ‘the two groups that succeeded most clearly in improving their patient experience worked on interventions that required no major change in clinician behaviour specifically for the project’ and which ‘aimed for modest improvements that did not require complex changes’.238
This was a small case study of six doctor practices that were self-selected, highly motivated to take part in the study and relatively experienced in improvement activities. As such, the findings of this study may not be generalisable to other practices that are less motivated and less experienced. However, it provides some valuable lessons as to the circumstances under which the feedback of patient experience data does and does not lead to the successful implementation of QI strategies. It suggests that teams need considerable support to interpret and understand patient experience data, and that they need to conduct more specific investigations to identify the cause of negative patient experiences. Furthermore, it indicates that implementing change is challenging, and that the more complex the issue, the more challenging it is to effect change. Those that succeeded in this study attempted relatively simple interventions that did not require substantial changes to clinical behaviour. Finally, it also demonstrates that change to one aspect of patient care can have a detrimental impact on other aspects of the system. Thus, as the authors argue in their conclusion, it is possible to produce small improvements in patient experience by making changes to simple aspects of patient care, but ‘it is difficult to . . . leverage more substantial change without a more comprehensive strategy that is organisation-wide and regarded as fundamental to organisational success’.238
Theories 9b and 9c summary
Here we have reviewed two different interventions that aimed to increase the impact of the feedback of patient experience data on improvements to the quality of patient care. They suggest that enhancing the feedback alone, through providing timely, ward-level data to nurses, is not sufficient to lead to the implementation of QI strategies in response to these data. Supplementing this feedback with ward-level meetings served an important function of addressing nurses’ concerns about the validity of the data, raising awareness of the data and reminding nurses about their role in supporting positive patient experiences. However, these meetings did not serve as opportunities for the strategic planning of QI activities and led to only small improvements in patient experiences.
The other study evaluated an intervention that focused on hospital leaders who were involved in modifying the patient experience questionnaire used and received regular, timely feedback of patient experience data and both expert and peer support to interpret the data, investigate the causes of poor care and implement interventions to address any issues. Despite this considerable amount of support, the findings from this study suggest that some improvements in patient experience occurred but were hard-won. Those that succeeded focused on simple interventions that did not require complex changes or changes to clinical behaviour. This reinforces the findings from the studies reviewed previously in this section that changes to the behaviour of clinicians are more difficult to achieve. The study also demonstrated that changes to one aspect of patient care can have unintended effects on other aspects. The lesson from this study is that significant and sustained improvement in patient experience in response to feedback can only be achieved with system- and organisation-wide strategies.
Chapter summary
In this chapter we have explored the ways in which different contextual configurations influenced the mechanisms through which providers respond to the feedback and public reporting of performance data and the resulting outcomes. We have tested three main theories:
- theory 7: financial incentives and sanctions influence providers’ responses to the public reporting of performance data
- theory 8: the perceived credibility of performance data influences providers’ responses to the feedback of performance data
- theory 9: the degree to which performance data are ‘actionable’ influences providers’ responses to the feedback of performance data.
Theory 7
The findings of our synthesis suggest that greater improvements in the quality of patient care occur when providers are subjected to both financial incentives and public reporting than when they are subjected to either initiative alone.215–217 We also found that the feedback of performance indicators to providers who were subjected to neither public reporting nor financial incentives rarely led to formal or sustained attempts to improve the quality of patient care, particularly when providers themselves did not trust the indicators.219–221 Under these conditions, the feedback of performance data was more likely to lead to providers improving the recording and coding of data, which may be an important first step in increasing their trust in the data and in providing a basis from which further QI initiatives may develop.
However, we also found that financial incentives have only a short-term impact on QI if they are used to incentivise activities that providers already perform well in and when providers reach the threshold at which they would receive the maximum amount of remuneration.218 Furthermore, we also found quantitative218 and qualitative evidence18 to indicate that financial incentives, together with public reporting, may lead to ‘tunnel vision’ or effort substitution, that is, focusing on aspects of care that are incentivised to the detriment of care that is not, especially when providers do not feel that the indicators adequately capture quality of care. There is also evidence to suggest that when providers are subjected to both public reporting and financial incentives attached to these indicators, but they do not feel that the indicators are valid or contribute to patient care, this can lead to the manipulation or gaming of the data.18,80,81 This is not necessarily the result of active attempts to ‘cheat’ the system on the part of providers. Rather, the use of financial rewards can create perverse incentives that are at odds with the inherent clinical uncertainty of conditions such as depression. Under these conditions, clinicians have to find a way to manage this clinical uncertainty, while at the same time ensuring that they are not financially penalised for doing so.
Theory 8
Our synthesis suggests that adequate case-mix adjustment and the accurate coding and recording of data were essential for providers to have any trust in performance data.97,195,224 Both the source of performance data and the process through which they are collected and presented are important influences on whether or not such data are perceived as credible by clinicians. We also found support for the theory that clinicians perceived data from patients’ notes as being more credible than performance data derived from administrative data.225–227 However, our synthesis also indicated that clinical involvement in the design of the public reporting initiatives was a better explanation of their success, or otherwise, than the nature of the data alone. We therefore tested the theory that providers respond differently to public reporting initiatives that are imposed on a mandatory basis by national or state governments or regulatory authorities compared with those that are led by clinicians.
We found that mandatory public reporting systems were perceived by providers to be governed by political motives, rather than by a desire to improve the quality of patient care.223 Mandatory public reporting systems focused the attention of hospital leaders and frontline staff on quality issues, particularly for those who had no previous experience of QI activities.228 However, unless the issues raised were also identified as a problem based on clinicians’ day-to-day experience, by their peers or by the internal data collected by the providers themselves, mandatory public reporting systems were perceived as leading to ‘tunnel vision’.98 In particular, for those who were already engaged in QI activities, mandatory public reporting systems could artificially focus their attention on a limited number of issues, deemed important by government or regulatory bodies, at the expense of other clinically important areas.228 Furthermore, although scoring poorly on an indicator included in a mandatory public reporting system could ‘kick start’ a response from providers, it was only through analysing internally collected data that providers could understand the source of the problem and identify possible solutions to rectify it.98
We also found that clinicians engaged with clinician-led public reporting systems because such systems supported clinicians’ primary motive of improving the quality of care through comparing performance and sharing best practices.199 The perceived mechanisms of change were those evoked by audit and feedback and benchmarking theories; that is, clinicians had an intrinsic motivation to do a good job and were prompted to improve if their performance was poorer than they would like or poorer than that of their peers. This engagement in turn led to the development of indicators that reflected the information that clinicians needed to improve patient care. As clinicians were involved in the design of the indicators, it was difficult for them to then dismiss the data as invalid.
However, although clinicians were more likely to accept the indicators as valid when the public reporting programme was led by clinicians, and therefore more likely to take steps to respond to such data, the success of QI initiatives also depended on the practice’s experience of implementing QI activities and on the resources available to implement changes.230 Providers reported that considerable resources were required to enable them to respond to issues highlighted by the feedback of performance data. Those who were already engaged in QI activities may have been more likely to have set up internal data processes that informed QI on an ongoing basis and, thus, were better able to respond when external data shone a light on poor performance.
Theory 9
We tested the theory that the feedback of patient experience data can (but does not always) embody a cluster of characteristics that render it easier for providers to use these data to initiate improvements in patient care. These include the idea that patient experience data provide a clearer indication of which care processes need to be improved, can be fed back in a timely manner and can be reported at ward as well as provider level. We found that the timeliness of some forms of patient experience data was valued by providers and that ward-specific data were perceived as more useful than higher-level hospital data.233 Our synthesis also highlighted the trade-offs in the characteristics of different forms of patient experience data. Qualitative or ‘softer’ data from comment cards, patients’ responses to open-ended questions or complaints were seen as useful in providing a more detailed understanding of the nature and causes of problems, but were regarded as ‘less robust’ by providers, while patient experience surveys were acknowledged to be more robust but were perceived as focusing on measurable yet less relevant aspects of patient care. Providers felt that both sources were needed to provide a more rounded picture of patient experience.235
However, although patient experience surveys identified potential dimensions for change, we found that this did not always lead to substantial improvements in patient care.222 When changes did occur, these were not always directly in response to the findings of patient experience surveys but reflected issues staff were already aware of from other data sources, a similar finding to our synthesis of other forms of performance data.236 We identified a number of reasons for this, which were also similar to those identified in our synthesis of studies addressing the credibility of other forms of performance data. Providers were sometimes unclear about how to make changes and wanted guidance on this process.235 Others felt that they did not have the resources to meet patient demands, such as opening primary care services at the weekend.236 Our synthesis also suggested that, where changes were made, these tended to focus on addressing aspects of the organisation of care, or the so-called ‘easy stuff’, and that the more interpersonal aspects of patient care, relating to the behaviour and communication skills of individual clinicians, were much more difficult to address.222,232,236,237 Furthermore, when we reviewed studies evaluating interventions designed to support providers to interpret the data, investigate the causes of poor care and implement changes, we found that those that succeeded focused on simple interventions that did not require complex changes or changes to clinical behaviour.238 Our synthesis also indicated that changes to one aspect of patient care can have unintended effects on other aspects and that significant and sustained improvement in patient care in response to feedback can only be achieved with system- and organisation-wide strategies. This conclusion is shared by other realist syntheses of complex organisational interventions.89