Included under terms of UK Non-commercial Government License.
NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.
Grant AM, Boachie C, Cotton SC, et al. Clinical and economic evaluation of laparoscopic surgery compared with medical management for gastro-oesophageal reflux disease: 5-year follow-up of multicentre randomised trial (the REFLUX trial). Southampton (UK): NIHR Journals Library; 2013 Jun. (Health Technology Assessment, No. 17.22.)
Recruitment to the trial
Participants were recruited in 21 clinical centres, all within the UK (their locations are listed on the left-hand side of Table 1). Recruitment to the trial was open from March 2001 until the end of June 2004, although not all centres enrolled over the total period because of the staggered introduction of centres and early closure for logistical reasons in a few places.1
A total of 357 participants were recruited to the randomised component: 178 allocated to surgery and 179 allocated to medical management. A further 453 participants agreed to join the preference component: 261 choosing surgery and 192 choosing medical management. Table 1 shows recruitment by centre. Around one-fifth of the randomised participants were enrolled in Aberdeen; no centre contributed > 10% of participants in the preference component.
Analysis populations
Throughout the analyses presented later in this chapter, the participants in the randomised component are kept separate from those in the preference component (other than for rare surgical events). The numbers of participants in each of the four main analysis populations are shown in Table 2. All 357 who joined the randomised component are in the randomised ITT population; only the 280 within this group who actually received their allocated management over the first year are in the randomised PP population. All 453 participants who joined the preference component are in the preference ITT population; the 407 of these who, by the end of the first year, were managed as originally chosen are in the preference PP population.
Trial conduct
The derivation of the main study groups and their progress through the stages of follow-up in the trial are shown in Figure 1. This is in the form of a CONSORT (Consolidated Standards of Reporting Trials) flow diagram. In total, 1078 patients were considered for trial entry and 200 of these were found not to meet one or more of the eligibility criteria. Of the 68 patients eligible for the study but not recruited, 51 declined to participate, six were subsequently deemed inappropriate for the study by the surgeon responsible for care and the remaining 11 were missed.
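The participant flow described above can be checked with simple arithmetic. The following Python sketch uses only the counts quoted in this chapter; the variable names are illustrative and are not taken from the trial's own analysis code.

```python
# Counts quoted in the text; variable names are illustrative only.
considered = 1078
ineligible = 200
eligible_not_recruited = {"declined": 51, "deemed_inappropriate": 6, "missed": 11}
recruited = {"randomised": 357, "preference": 453}

eligible = considered - ineligible
not_recruited_total = sum(eligible_not_recruited.values())
recruited_total = sum(recruited.values())

# Every eligible patient should be either recruited or accounted for as not recruited.
flow_is_consistent = (eligible - not_recruited_total) == recruited_total

print(f"Eligible: {eligible}")
print(f"Eligible but not recruited: {not_recruited_total}")
print(f"Recruited: {recruited_total}")
print(f"Flow consistent: {flow_is_consistent}")
```

On these figures, the 878 eligible patients split into 68 not recruited and 810 recruited, which agrees with the sum of the randomised and preference components.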
Details of the clinical management actually received are described later in this chapter.
The mean (SD) time intervals in months between the receipt by the trial office of each subsequent annual postal questionnaire are shown in Table 3; all were near 12 months, as would be expected. There was, however, a difference between the randomised groups in the time interval between the 1-year and the 2-year questionnaires (mean 12.2 months surgical group vs 13.9 months medical group). In part, this was due to more late returns in the medical management group – the median intervals were closer: 12.00 and 13.00 months respectively. As described previously,1 early follow-up was adjusted to be at a time equivalent to 3 and 12 months after surgery. The adjustments in the medical group to match this could be only approximate and this is the explanation for the difference that remained between the randomised groups. An advantage of long-term follow-up to 5 years is that any difference in the timing of follow-up becomes proportionately smaller over time.
More details of the response rates to the annual questionnaires are provided in Table 4. The overall rates of return of annual follow-up questionnaires (years 1–5) were 89.5%, 77.7%, 76.7%, 69.8% and 68.9% of the study participants. Seven participants are known to have died up to the end of the 5-year follow-up; equivalent response rates among those not known to have died are 89.8%, 77.9%, 77.0%, 70.2% and 69.5%. There were no substantive differences in response rates between the groups.
Three participants died before the 1-year follow-up was reached: two in the preference surgery group and one in the randomised medical group. None of these participants actually had surgery. Four died subsequently; there is no evidence linking these deaths to trial participation.
Description of the groups at trial entry
Sociodemographic and clinical factors
Table 5 provides a description of the groups at trial entry. The main division within the table is between participants in the randomised component and those in the preference component. These two halves of the table are further divided according to the allocation of participants and then subdivided according to ITT or PP.
Randomised arms
Within the randomised groups there were no apparent imbalances between the medical and surgical intervention arms. The patients were, on average, 46 years old, 66% were men, around two-thirds were in full employment and participants had been on GORD medication for a median of 32 months. The baseline characteristics in the randomised PP groups were similar.
Preference arms
The sociodemographic characteristics of the preference participants were broadly similar to those of the randomised participants. However, preference medical participants tended to be older (mean age 50 years) and were more likely to be female, fewer were in full-time employment and participants had been on GORD medication for a shorter period (approximately 6 months less than randomised participants).
Prescribed medications
The prescribed medications at the time of trial entry are shown in Table 6. There was a similar profile of prescribed medications across the randomised and preference groups. As would be expected, nearly all participants reported taking a reflux-related drug in the previous 2 weeks. Over 90% had taken a PPI, of which lansoprazole was the most common.
Health status
Randomised arms
The HRQoL scores at study entry are displayed in Table 7. The scores were broadly similar in the randomised surgical and randomised medical groups, although they were slightly higher (better health) in the randomised medical group. When the DMC first met after the initial 143 participants had been recruited to the randomised component, the committee did ask us to change the enrolment procedure to ensure that baseline questionnaires were completed before formal entry and randomisation. We understand that this was because they were concerned about an apparent imbalance between the randomised groups in baseline health status at that time. After satisfying themselves that this was not due to a breakdown in the randomisation procedure, the DMC surmised that this might be due to prior knowledge of the treatment allocation affecting questionnaire responses (with those allocated surgery tending to project worse health status than those allocated medical management). Certainly, the groups based on the first 143 participants were well balanced in other respects, and there was subsequently good balance in health status as well. The apparent small imbalance between the randomised groups in health status measures is therefore likely to be a reflection of the imbalance in the first 143 participants.
The most prevalent reflux symptoms (those with lowest scores) were general discomfort and wind. The participants had lower SF-36 and EQ-5D scores than a normal UK population with the same average age and sex characteristics (SF-36 population norm approximately 50 for all domains; EQ-5D norm 0.88).
Preference arms
The preference for surgery participants reported worse REFLUX QoL scores and worse health in general than the preference for medicine participants. It can be seen from Table 7 that the randomised participants reported QoL measures in between these two extremes.
Baseline characteristics of groups compared at 5 years
There were differences in baseline characteristics between those who had completed a questionnaire at 5 years and those who had not (Table 8). For example, responders had a higher mean age (47.9 years vs 43.6 years), had been on prescribed medication for a shorter period at recruitment to the REFLUX trial (50.5 months vs 60.2 months) and had higher QoL scores at baseline (measured on the disease-specific REFLUX instrument, EQ-5D and SF-36).
However, the baseline characteristics of those in the randomised surgical and randomised medical groups who completed a questionnaire at 5 years were very similar, with the only notable difference being in BMI (Table 9). The mean baseline BMI among responders in the randomised surgical group was higher (29.0 kg/m²) than that for responders in the randomised medical management group (27.7 kg/m²). As described in Chapter 2, these results confirmed that a repeated measures analysis assuming no differential loss to follow-up could be considered.
Surgical management
Table 10 summarises the use of surgery in the four study groups over the full 5-year follow-up period. At the end of the first year, 111 participants (62.4%) randomised to surgery had actually undergone fundoplication. Over the next 4 years, one more member of this group had fundoplication, bringing the total to 112 (62.9%). In the randomised medical group, 10 participants (5.6%) had fundoplication in the first year, with a further 14 participants having fundoplication in subsequent years, bringing the total at 5 years to 24 (13.4%). In the preference surgical group, 218 participants (83.5%) had fundoplication in the first year, with four more in the period up to 5 years, taking the percentage to 85.1%. Surgical management applied to only three participants (1.6%) in the preference medical group in the first year, with a further three being operated on in the subsequent 4 years (total 3.1%).
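The percentages in the paragraph above follow directly from the group sizes given earlier in the chapter. The Python sketch below recomputes them and cross-checks the PP population sizes. It assumes that, for the medical groups, being managed as allocated or chosen means not having had fundoplication in the first year. This is our reading of the text rather than a stated definition, and the dictionary layout is illustrative.

```python
# Group sizes and fundoplication counts quoted in the text.
groups = {
    "randomised surgery": {"n": 178, "year1": 111, "year5": 112},
    "randomised medical": {"n": 179, "year1": 10, "year5": 24},
    "preference surgery": {"n": 261, "year1": 218, "year5": 222},
    "preference medical": {"n": 192, "year1": 3, "year5": 6},
}

def pct(count, total):
    """Percentage rounded to one decimal place."""
    return round(100 * count / total, 1)

uptake = {
    name: (pct(g["year1"], g["n"]), pct(g["year5"], g["n"]))
    for name, g in groups.items()
}
for name, (y1, y5) in uptake.items():
    print(f"{name}: {y1}% by 1 year, {y5}% by 5 years")

# Total number who had fundoplication at any time during the 5 years.
total_fundoplications = sum(g["year5"] for g in groups.values())

# Assumption: PP = had surgery in year 1 (surgical groups) or
# did not have surgery in year 1 (medical groups).
def pp_size(surgery_group, medical_group):
    s, m = groups[surgery_group], groups[medical_group]
    return s["year1"] + (m["n"] - m["year1"])

randomised_pp = pp_size("randomised surgery", "randomised medical")
preference_pp = pp_size("preference surgery", "preference medical")
print(total_fundoplications, randomised_pp, preference_pp)
```

Under that assumption, the sketch gives PP populations of 280 and 407, matching the figures quoted for the analysis populations, and a total of 364 participants who had fundoplication.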
Information about the reasons why participants allocated surgery did not receive it in the first year is available for 47. For 25 of these 47, this was a clinical decision, most commonly the surgeon deciding that surgery was not appropriate; most of the other 22 changed their minds about surgery for a variety of work- or home-related reasons. A further 20 withdrew for unknown reasons. There is no doubt, however, that a number of these participants suffered long delays before being formally offered surgery, and this was an important factor in their eventual decision to choose not to have surgery after all. The trial was conducted at a time when there was great pressure on surgical services in the NHS, with long delays for elective surgery for non-life-threatening benign conditions being common. Indeed, the average time between trial entry and surgery in the trial was 8–9 months.1
Details of the surgery received by the 111 participants (62.4%) randomised to surgery and the 218 preference participants (83.5%) who actually received surgery in the first year, the perioperative complications that they experienced and their hospital stay have been reported previously but are summarised in Appendix 2 for completeness. There were no perioperative deaths.
Table 11 shows the numbers of those who had fundoplication who subsequently had a second reflux-related operation during the 5 years of follow-up. Overall, this applied to 16 participants (4.4%) among the 364 who had a first operation: five (4.5%) in the randomised surgery group; one (4.2%) in the randomised medical group; eight (3.6%) in the preference surgery group; and two (33.3%) in the preference medical group. In total, five of the 16 operations were reconstructions of the same wrap, three were repairs of hiatus hernia only, six were conversions to a different type of wrap and two were reversals of the fundoplication. Two of these 16 participants had a third reflux-related operation; both were in the preference surgery group – one a reconstruction of the same wrap and one a repair of hiatus hernia only.
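The reoperation rates above use as denominators the number in each group who had a first fundoplication by 5 years (112, 24, 222 and 6, as reported for Table 10). The following Python sketch, with illustrative names, recomputes the rates and confirms that the operation types account for all 16 second operations.

```python
# First fundoplications by 5 years (denominators) and second operations (numerators),
# as quoted in the text.
first_operations = {
    "randomised surgery": 112,
    "randomised medical": 24,
    "preference surgery": 222,
    "preference medical": 6,
}
second_operations = {
    "randomised surgery": 5,
    "randomised medical": 1,
    "preference surgery": 8,
    "preference medical": 2,
}
operation_types = {
    "reconstruction of same wrap": 5,
    "hiatus hernia repair only": 3,
    "conversion to different wrap": 6,
    "reversal of fundoplication": 2,
}

rates = {
    group: round(100 * second_operations[group] / first_operations[group], 1)
    for group in first_operations
}
total_first = sum(first_operations.values())
total_second = sum(second_operations.values())
overall_rate = round(100 * total_second / total_first, 1)

for group, rate in rates.items():
    print(f"{group}: {second_operations[group]}/{first_operations[group]} = {rate}%")
print(f"overall: {total_second}/{total_first} = {overall_rate}%")
print(f"types sum to {sum(operation_types.values())}")
```

The 33.3% figure for the preference medical group rests on only six first operations, so it should be read with corresponding caution.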
Late postoperative complications
Table 12 describes late postoperative complications among those participants who had surgery, in each of the study groups and overall. Of the total 364 who had fundoplication, 12 (3.3%) had a late complication: four (1.1%) were oesophageal dilatations/stricture dilatations; three (0.8%) were repairs of incisional hernias; and five (1.4%) were a heterogeneous group of other complications as detailed in the table.
Medication
Figure 2 summarises reported use of any PPI medication in the previous 2 weeks across the follow-up time points of the trial. Full details are provided in the tables in Appendix 3. From the time of the first annual follow-up onwards, rates in both medical groups were consistently around 80%. The rates in the randomised surgical ITT group at the first, second and third annual follow-ups were approximately 36–38%, rising to 43% in the fifth year. The extent to which these rates reflected medication taking among those allocated to surgery and who had fundoplication (rather than those who did not have surgery) can be gauged from the randomised surgery PP group: 7.3% (3 months), 12.5% (1 year), 15.1% (2 years), 19.6% (3 years), 23.9% (4 years) and 25.6% (5 years).
Table 13 allows further exploration of the reasons for the rise in medication use in the randomised surgery group. It distinguishes those reporting taking medication at the end of the first year of follow-up from those who indicated that they were not taking medication at that time. It shows that around 10–20% of those taking medication at the end of the first year did not report medication use at subsequent annual follow-up. Among those not taking medication at the first annual follow-up in the surgical groups, around 10% rising to around 20% reported medication use at subsequent annual follow-up. This contrasts with the rates in the medical groups, with around 50–60% of those not taking medication at the end of the first year reporting anti-reflux drug use in subsequent annual follow-up. The pattern of type of PPI used changed over the course of the study. Although lansoprazole had been the most commonly used PPI at trial entry, omeprazole use increased over time to become the predominant PPI.
Outcome
Health status
Full details of the health status and QoL measures at each time point of follow-up are in the tables in Appendix 4. Details of the statistical testing of the health status and QoL scores can be found in the next section of this chapter.
REFLUX score
Figure 3 summarises changes in the disease-specific REFLUX score over the follow-up period. From this it can be seen that the scores at all time points are highest (indicating fewest symptoms) in the randomised surgical and preference surgical groups. However, the differences between the surgical and medical groups narrow over time. This is due principally to the scores in the randomised medical group improving over the first 3 years and, to a lesser extent, those in the preference medical group improving over the latter end of the follow-up period. The scores for the five components of the measure are summarised graphically in Figures 4–8. These show that the overall difference between the groups is principally due to the ‘general discomfort’ component and, to a lesser extent, the ‘nausea and vomiting’ and ‘activity limitations’ components.
Short Form questionnaire-36 items
The pattern of SF-36 scores, both for the composite physical and mental scores and for the individual dimensions (Figures 9–16), was similar to that seen for the REFLUX score, although more compact. Differences narrowed over the 5 years of follow-up, with the ‘general health’ dimension showing the clearest differences between the surgery and the medical management groups.
Use of health services
Table 14 shows use of health services for the randomised groups. The larger number of overnight hospital admissions in the medical group largely reflected admissions for surgery; as described above, 14 participants allocated to medical management had fundoplication after the first year. However, seven participants in the medical group compared with one in the surgical group had admissions for a non-surgery-related reason (data not shown).
Numbers of day-case hospital admissions were similar in the two groups. The larger number of visits to or from a GP for a reflux-related reason in the randomised medical group reflected both more individuals attending their GPs and a higher frequency of visits for those who sought GP care.
Individual symptoms of gastro-oesophageal reflux disease or its treatment
Table 15 shows the frequency with which participants reported symptoms of GORD or its treatment at 3 and 5 years of follow-up for the randomised groups. At both 3 and 5 years, heartburn was reported by a higher proportion of participants in the randomised medical group than in the randomised surgical group. In addition, a higher proportion of participants in the randomised medical group reported more frequent heartburn than in the randomised surgical group. At both time points, a higher proportion of participants in the randomised medical management group also reported regurgitation symptoms and burping/belching than in the randomised surgical group. At both 3 and 5 years, the proportions who reported no difficulty swallowing and no wind from the lower bowel were similar between the randomised surgical and the randomised medical groups. There was also little difference between the groups at each time point in the proportion of participants who reported a feeling of wanting to be sick but being physically unable to do so.
Statistical analyses
Primary outcome
The pre-chosen primary outcome was the REFLUX QoL score after 5 years of follow-up. The differences between groups with corresponding 95% CIs are shown in Table 16. Two types of analysis are presented for the randomised participants – ITT and adjusted treatment received. Table 16 also displays the impact of including adjustment for baseline score and randomised group × baseline score interaction terms.
Intention to treat
For the ITT analysis there was a mean difference of 6.4 between the groups in favour of surgery when only the minimisation variables were adjusted for (95% CI 1.6 to 11.2; p = 0.009). A repeated measures analysis across the 5 years gave a difference of 8.1 (95% CI 4.4 to 11.7). This was not the most parsimonious model – there was strong evidence of an interaction effect between randomised group and baseline REFLUX QoL score (interaction term was −0.23, 95% CI −0.43 to −0.03; p = 0.023). This implied that as baseline REFLUX QoL score increased the treatment effect decreased. Estimating the treatment difference at the trial baseline mean REFLUX QoL score of 65.2 resulted in a trial effect size of 8.5 (95% CI 3.9 to 13.1; p < 0.001). If the average patient had a lower mean REFLUX QoL score at baseline of 56.0, the effect size increased to 10.6 (95% CI 5.3 to 15.8). If the patient had a higher baseline score of 78.0, the treatment effect size decreased to 5.5 (95% CI 0.6 to 10.4). All results, however, showed strong evidence of increases in REFLUX QoL scores favouring surgery.
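The dependence of the treatment effect on baseline score can be illustrated with a linear interaction. The Python sketch below assumes that the treatment effect at a given baseline score equals the effect at the trial mean baseline plus the interaction coefficient multiplied by the distance from that mean. This is a simplified reading of the model, not the trial's fitted model itself, and the function name is illustrative. Because the published coefficients are rounded, the sketch reproduces the reported estimates only approximately.

```python
# Published (rounded) estimates quoted in the text.
MEAN_BASELINE = 65.2      # trial mean baseline REFLUX QoL score
EFFECT_AT_MEAN = 8.5      # treatment difference at the mean baseline score
INTERACTION = -0.23       # change in treatment difference per unit of baseline score

def treatment_effect(baseline_score):
    """Treatment difference at a given baseline score under a linear interaction.

    Assumption: effect = effect at mean + interaction * (baseline - mean).
    """
    return EFFECT_AT_MEAN + INTERACTION * (baseline_score - MEAN_BASELINE)

for score in (56.0, 65.2, 78.0):
    print(f"baseline {score}: estimated difference {treatment_effect(score):.2f}")
```

For a baseline score of 56.0 this gives roughly 10.6, and for 78.0 roughly 5.6, against the reported 10.6 and 5.5; the small discrepancy in the second case is consistent with rounding of the interaction term.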
Adjusted treatment received
The adjusted treatment received analyses attempted to mitigate the effect of non-compliance with the allocated treatment and hence provide an estimate of ‘efficacy’.40 As expected, this approach gave a larger difference, but with wider CIs (9.4, 95% CI 1.7 to 17.0; p = 0.017).
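To make the comparison with the ITT result concrete, the following Python sketch computes how much larger the adjusted treatment received estimate is than the ITT estimate adjusted for minimisation variables only, and compares the widths of the two 95% CIs. It uses only figures quoted in this section; the names are illustrative.

```python
# Estimates and 95% CIs quoted in the text for the primary outcome at 5 years.
itt = {"difference": 6.4, "ci": (1.6, 11.2)}
adjusted_treatment_received = {"difference": 9.4, "ci": (1.7, 17.0)}

def ci_width(estimate):
    """Width of the confidence interval, rounded to one decimal place."""
    low, high = estimate["ci"]
    return round(high - low, 1)

relative_increase = 100 * (
    adjusted_treatment_received["difference"] - itt["difference"]
) / itt["difference"]

print(f"Relative increase in estimated difference: {relative_increase:.0f}%")
print(f"ITT CI width: {ci_width(itt)}")
print(f"Adjusted treatment received CI width: {ci_width(adjusted_treatment_received)}")
```

On these figures the adjusted estimate is a little under 50% larger than the ITT estimate, while its CI is roughly 1.6 times as wide.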
Preference groups
The preference for surgery participants reported considerably worse mean REFLUX QoL scores at baseline than the preference for medicine participants (55.8 vs 77.5) (see Table 7). Despite starting from a much lower baseline score, at follow-up, the REFLUX QoL score slightly favoured the surgical group using an ITT analysis (difference = 0.61; 95% CI −3.44 to 4.66; p = 0.767) and an adjusted treatment received analysis (difference = 0.10; 95% CI −4.77 to 4.97; p = 0.967). The differences were not, however, statistically significant.
Secondary outcomes
The secondary outcomes were the health status measures (EQ-5D, SF-36) and REFLUX symptom score at times equivalent to 3 months and then annual follow-up after surgery, and REFLUX QoL (at time points other than 5 years, when it was the primary end point). Analyses of these outcomes are shown in Tables 17–22.
REFLUX symptom score
REFLUX QoL scores were statistically significantly higher in the surgical group at all time points, albeit with some diminution of the difference over time (see Figure 3). Although symptom category scores favoured surgery across all domains at all time points, the most marked and sustained difference was in ‘general discomfort’.
Short Form questionnaire-36 items
The SF-36 scores in all domains also favoured the surgical group at all time points. Differences decreased over time and this was reflected in most p-values being < 0.05 up to 3 years, whereas at year 5 this applied to only ‘norm-based general health’ and ‘norm-based role emotional’.
European Quality of Life-5 Dimensions
Differences in EQ-5D had a similar pattern to differences in REFLUX QoL and SF-36 scores – differences all favoured the surgical group but tended to narrow such that scores at years 2 and 3 were statistically significantly different, but at later time points they were not. Variability tended to increase over time. Despite the general narrowing of the EQ-5D difference over time, at year 5 it was actually the same as that at 12 months after surgery but with wider CIs.
Adjusted treatment received
As would be expected, with a small number of exceptions the adjusted treatment received analyses gave larger differences than the corresponding ITT analyses (around 25–50% higher), but with wider CIs.
Subgroup analyses
Removal of data from the single largest clinical centre (Aberdeen)
No formal exploration of centre effects was undertaken because of the small numbers of participants recruited in many of the clinical centres. However, a sensitivity analysis removing the data from the Aberdeen centre, the centre where the largest number of participants were recruited, did not significantly change the conclusions (adjusted difference in REFLUX score at 60 months = 5.43, 95% CI 0.96 to 9.90).
Partial compared with total wrap procedure
In an observational analysis, there was no evidence of a difference between a total wrap procedure and a partial wrap procedure. The difference in the REFLUX QoL score between these procedures at time equivalent to 5 years post surgery was −1.0 (95% CI −5.4 to 3.7; p = 0.649).
Discussion
Follow-up to 5 years after laparoscopic surgery described here provides clear evidence of sustained improvement in GORD symptoms, as judged by the REFLUX QoL scores. Differences between the groups as randomised did tend to diminish over the course of the study; nevertheless, the analyses at 5 years (the primary end point) showed highly statistically significant results with effect sizes of the order of 0.6 of a SD.
This report concentrates on the data collected annually at a time equivalent to between 2 and 5 years post surgery. Data were collected through self-complete postal questionnaires, backed up by postal and telephone reminders and occasional completion of the questionnaire over the telephone. The response rate did drop over time, from 90% at 1 year to around 70% at 5 years. The principal reason for not obtaining a follow-up questionnaire was a loss of contact, such as following a home move; the second most common reason was a decision by a participant to decline further follow-up. The category of ‘non-responder’ accounted for only around 8% of those without a follow-up questionnaire. Response analysis showed that responders at 5-year follow-up had a higher mean age, had been prescribed anti-reflux medication for a shorter period of time at recruitment and had higher QoL at baseline. However, the characteristics of responders and non-responders at 5 years were similar across the two randomised groups.
Randomised trials, such as the REFLUX trial, that compare surgery with medical management are challenging to mount because of the stark contrast between the treatments compared. As described in the previous report of this study, recruitment was not easy and it is to the credit of the many staff in the 21 centres involved in the trial that this was accomplished successfully. A second challenge was that, after randomisation, a sizable proportion of participants did not receive the treatment to which they had been allocated – again, reflecting the contrasts in the treatments. We explored the impact of this in a number of ways.
Figure 18 shows the results of a supplementary analysis of the group randomly allocated surgery stratified by whether or not they actually had surgery. It shows that those who had surgery started from a lower REFLUX QoL baseline score (had worse symptoms) than those who did not undergo surgery, and then had a sharp rise in score following the operation such that their scores were consistently higher than those who did not actually have fundoplication. To put this another way, the improvement seen among those who had surgery was greater than that in the randomised group overall.
Figure 19 shows a similar supplementary analysis of the group allocated medical management stratified by whether or not they in fact had surgery in the first year. This shows that those who had fundoplication (the lowest line) had more severe symptoms of GORD (low REFLUX QoL scores) at the time of trial entry, worse even than the preference surgical group. In contrast, those solely managed medically had relatively high baseline scores. Scores among those randomised to medical management who had surgery improved markedly over the course of the follow-up, such that by years 4 and 5 the scores in the two strata were similar. This indicates that much of the narrowing of the scores in the ITT groups over the 5 years can be explained by surgery in the randomised medical group.
We assessed more formally the extent to which surgery in the randomised medical management group might have affected the results by undertaking adjusted treatment received analyses. We decided to base these on treatment status at the first year follow-up point. We chose this partly to be consistent with our previous report of the results up to 1 year and partly because we considered that those who had surgery after that time point were likely to be highly selected. To put this another way, we were concerned that a PP analysis up to 5 years would be particularly prone to bias. The adjusted treatment received analyses, as expected, indicated larger effects of surgery – with differences in score around 25–50% higher. As illustrated by the preference groups in this study, the proportion of those recommended surgery and willing to have it who subsequently go on to have fundoplication is likely to be higher in everyday practice. Hence, we would argue that the results of the adjusted treatment received analyses are likely to provide a better estimate of the benefits of a policy of laparoscopic fundoplication as would apply in the health service.
The principal concern about laparoscopic fundoplication is possible risks associated with the surgery. We described intra- and postoperative surgical outcomes in our previous report.1 Among the 329 patients in the randomised surgical and preference surgical groups who had fundoplication in the first year, there were no major surgical complications. Two patients (0.6%; 95% CI 0.1% to 2.2%) required conversion to an open procedure; eight (2.4%; 95% CI 1.2% to 4.7%) had a visceral injury; and one (0.3%; 95% CI < 0.1% to 1.7%) had a blood transfusion. Three were admitted to a high-dependency unit, but none to an intensive care unit. The 5-year follow-up provides information about longer-term risks. We are aware of seven deaths among trial participants; however, none has an apparent link to the trial. Twelve (3.3%) of the total of 364 participants who had a fundoplication had a late complication: four were oesophageal dilatations/stricture dilatations, three had repairs of incisional hernias and five were a heterogeneous group of other complications (see Table 12). Sixteen (4.4%) of those who had fundoplication required further surgery (see Table 11): five reconstruction of the same wrap, six conversion to another type of wrap, three repair of hiatus hernia only and two reversal of fundoplication. These, albeit uncommon, complications need to be taken into account when surgery is being considered.
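The perioperative event rates above are proportions of the 329 participants operated on in the first year. The report does not state here which method was used for the 95% CIs, so the Python sketch below uses the Wilson score interval purely as an assumed, commonly used choice. Its limits are close to, but need not exactly match, the published ones. The function name is illustrative.

```python
import math

def wilson_interval(events, n, z=1.96):
    """Wilson score interval for a binomial proportion.

    Assumption: the source does not name its CI method; Wilson is used
    here only as an illustration.
    """
    p = events / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width

# 111 randomised and 218 preference participants had fundoplication in year 1.
n_operated = 111 + 218
events = {
    "conversion to open procedure": 2,
    "visceral injury": 8,
    "blood transfusion": 1,
}

intervals = {name: wilson_interval(k, n_operated) for name, k in events.items()}
for name, k in events.items():
    low, high = intervals[name]
    print(
        f"{name}: {100 * k / n_operated:.1f}% "
        f"(95% CI {100 * low:.1f}% to {100 * high:.1f}%)"
    )
```

An exact (Clopper–Pearson) interval would give slightly different limits, particularly for the rarest events, which is one reason the sketch should not be taken as a reproduction of the published analysis.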
Proton pump inhibitor use in the randomised medical group was consistently around 80%, although these participants were not always the same people at each follow-up. In our questionnaire, we chose to ask about anti-reflux drug use over the preceding 2 weeks as we thought that a recollection over a longer period would be unreliable. Nevertheless, taking of PPIs seems to be dynamic (patients stopping and restarting) and rates of use at any time over a longer period would likely have been higher. We did observe more visits to GPs in the medical groups for reflux-related reasons during the 5 years of follow-up but are not able to say whether this was due to routine reassessments or because symptoms were less stable or inadequately controlled in the medical group.
The pattern of PPIs used did change over the course of the study. At baseline, the commonest PPI was lansoprazole, but omeprazole superseded this over the course of the trial. Much of this change occurred in the first year and hence could be a consequence of the review of medical management that was part of the trial management for those randomised to medical management.
The larger number of overnight hospital admissions in the randomised medical management group was largely, but not totally, explained by the minority who went on to have surgery; as discussed in Chapter 5 describing the economic evaluation, this was the principal driver of extra resource use by the medical group during the longer-term follow-up.
Despite the methodological challenges alluded to above, the study, through the data presented here, has successfully addressed the first of the objectives of this longer-term follow-up: to assess whether or not short-term clinical benefits, principally in terms of symptom control, are sustained – they are, albeit attenuated. In the next chapter we consider the REFLUX trial in the context of the three other randomised trials that have been conducted worldwide comparing laparoscopic fundoplication with medical management, and assess whether or not the results of the REFLUX trial are consistent with those of the other trials.