NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Grant AM, Boachie C, Cotton SC, et al. Clinical and economic evaluation of laparoscopic surgery compared with medical management for gastro-oesophageal reflux disease: 5-year follow-up of multicentre randomised trial (the REFLUX trial). Southampton (UK): NIHR Journals Library; 2013 Jun. (Health Technology Assessment, No. 17.22.)

Clinical and economic evaluation of laparoscopic surgery compared with medical management for gastro-oesophageal reflux disease: 5-year follow-up of multicentre randomised trial (the REFLUX trial).

Chapter 3 Trial results and clinical effectiveness

Recruitment to the trial

Participants were recruited in 21 clinical centres, all within the UK (their locations are listed on the left-hand side of Table 1). Recruitment to the trial was open from March 2001 until the end of June 2004, although not all centres enrolled over the total period because of the staggered introduction of centres and early closure for logistical reasons in a few places [1].

TABLE 1. Number of participants by centre

A total of 357 participants were recruited to the randomised component: 178 allocated to surgery and 179 allocated to medical management. A further 453 participants agreed to join the preference component: 261 choosing surgery and 192 choosing medical management. Table 1 shows recruitment by centre. Around one-fifth of the randomised participants were enrolled in Aberdeen; no centre contributed > 10% of participants in the preference component.

Analysis populations

Throughout the analyses presented later in this chapter, the participants in the randomised component are kept separate from those in the preference component (other than for rare surgical events). The numbers of participants in each of the four main analysis populations are shown in Table 2. All 357 who joined the randomised component are in the randomised ITT population; only the 280 within this group who actually received their allocated management over the first year are in the randomised PP population. All 453 participants who joined the preference component are in the preference ITT population; the 407 of these who, by the end of the first year, were managed as originally chosen were in the preference PP population.

TABLE 2. Number of participants in each analysis population

Trial conduct

The derivation of the main study groups and their progress through the stages of follow-up in the trial are shown in Figure 1. This is in the form of a CONSORT (Consolidated Standards of Reporting Trials) flow diagram. In total, 1078 patients were considered for trial entry and 200 of these were found not to meet one or more of the eligibility criteria. Of the 68 patients eligible for the study but not recruited, 51 declined to participate, six were subsequently deemed inappropriate for the study by the surgeon responsible for care and the remaining 11 were missed.
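The recruitment figures quoted so far can be cross-checked against each other. The following is a minimal sketch of that arithmetic, using only counts stated in the text; the variable names are illustrative and are not taken from the trial's own analysis code.

```python
# Counts quoted in the text above; names are hypothetical, for illustration only.
considered = 1078
ineligible = 200
eligible_not_recruited = {
    "declined": 51,
    "deemed inappropriate by surgeon": 6,
    "missed": 11,
}

randomised = {"surgery": 178, "medical": 179}
preference = {"surgery": 261, "medical": 192}

not_recruited = sum(eligible_not_recruited.values())
recruited = considered - ineligible - not_recruited

# Internal consistency checks on the reported numbers.
assert not_recruited == 68
assert sum(randomised.values()) == 357
assert sum(preference.values()) == 453
assert recruited == sum(randomised.values()) + sum(preference.values())

print(f"Recruited: {recruited}")  # Recruited: 810
```

On these figures, the 1078 patients considered are fully accounted for by the 200 ineligible, the 68 eligible but not recruited and the 810 who joined either component.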

FIGURE 1. The CONSORT diagram.

Details of the clinical management actually received are described later in this chapter.

The mean (SD) time intervals in months between the receipt by the trial office of each subsequent annual postal questionnaire are shown in Table 3; all were near 12 months, as would be expected. There was, however, a difference between the randomised groups in the time interval between the 1-year and the 2-year questionnaires (mean 12.2 months surgical group vs 13.9 months medical group). In part, this was due to more late returns in the medical management group – the median intervals were closer: 12.00 and 13.00 months respectively. As described previously [1], early follow-up was adjusted to be at a time equivalent to 3 and 12 months after surgery. The adjustments in the medical group to match this could be only approximate and this is the explanation for the difference that remained between the randomised groups. An advantage of long-term follow-up to 5 years is that any difference in the timing of follow-up becomes proportionately smaller over time.

TABLE 3. Interval between randomisation and follow-up (months), mean (SD)

More details of the response rates to the annual questionnaires are provided in Table 4. The overall rates of return of annual follow-up questionnaires (years 1–5) were 89.5%, 77.7%, 76.7%, 69.8% and 68.9% of the study participants. Seven participants are known to have died up to the end of the 5-year follow-up; equivalent response rates among those not known to have died are 89.8%, 77.9%, 77.0%, 70.2% and 69.5%. There were no substantive differences in response rates between the groups.

TABLE 4. CONSORT table

Three participants died before the 1-year follow-up was reached: two in the preference surgery group and one in the randomised medical group. None of these participants actually had surgery. Four died subsequently; there is no evidence linking these deaths to trial participation.

Description of the groups at trial entry

Sociodemographic and clinical factors

Table 5 provides a description of the groups at trial entry. The main division within the table is between participants in the randomised component and those in the preference component. These two halves of the table are further divided according to the allocation of participants and then subdivided according to ITT or PP.

TABLE 5. Description of groups at trial entry

Randomised arms

Within the randomised groups there were no apparent imbalances between the medical and surgical intervention arms. The patients were, on average, 46 years old, 66% were men, around two-thirds were in full employment and participants had been on GORD medication for a median of 32 months. The baseline characteristics in the randomised PP groups were similar.

Preference arms

The sociodemographic characteristics of the preference participants were broadly similar to those of the randomised participants. However, preference medical participants tended to be older (mean age 50 years) and were more likely to be female, fewer were in full-time employment and participants had been on GORD medication for a shorter period (approximately 6 months less than randomised participants).

Prescribed medications

The prescribed medications at the time of trial entry are shown in Table 6. There was a similar profile of prescribed medications across the randomised and preference groups. As would be expected, nearly all participants reported taking a reflux-related drug in the previous 2 weeks. Over 90% had taken a PPI, of which lansoprazole was the most common.

TABLE 6. Description of groups at trial entry: prescribed medications

Health status

Randomised arms

The HRQoL scores at study entry are displayed in Table 7. The scores were broadly similar in the randomised surgical and randomised medical groups, although they were slightly higher (better health) in the randomised medical group. When the DMC first met after the initial 143 participants had been recruited to the randomised component, the committee did ask us to change the enrolment procedure to ensure that baseline questionnaires were completed before formal entry and randomisation. We understand that this was because they were concerned about an apparent imbalance between the randomised groups in baseline health status at that time. After satisfying themselves that this was not due to a breakdown in the randomisation procedure, the DMC surmised that this might be due to prior knowledge of the treatment allocation affecting questionnaire responses (with those allocated surgery tending to project worse health status than those allocated medical management). Certainly, the groups based on the first 143 participants were well balanced in other respects, and there was subsequently good balance in health status as well. The apparent small imbalance between the randomised groups in health status measures is therefore likely to be a reflection of the imbalance in the first 143 participants.

TABLE 7. Description of groups at trial entry: health status

The most prevalent reflux symptoms (those with lowest scores) were general discomfort and wind. The participants had lower SF-36 and EQ-5D scores than a normal UK population with the same average age and sex characteristics (SF-36 population norm approximately 50 for all domains; EQ-5D norm 0.88).

Preference arms

The preference for surgery participants reported worse REFLUX QoL scores and worse health in general than the preference for medicine participants. It can be seen from Table 7 that the randomised participants reported QoL measures in between these two extremes.

Baseline characteristics of groups compared at 5 years

There were differences in baseline characteristics between those who had completed a questionnaire at 5 years and those who had not (Table 8). For example, responders had a higher mean age (47.9 years vs 43.6 years), had been on prescribed medication for a shorter period at recruitment to the REFLUX trial (50.5 months vs 60.2 months) and had higher QoL scores at baseline (measured on the disease-specific REFLUX instrument, EQ-5D and SF-36).

TABLE 8. Baseline characteristics of responders and non-responders at 5 years

However, the baseline characteristics of those in the randomised surgical and randomised medical groups who completed a questionnaire at 5 years were very similar, with the only notable difference being in BMI (Table 9). The mean baseline BMI among responders in the randomised surgical group was higher (29.0 kg/m²) than that for responders in the randomised medical management group (27.7 kg/m²). As described in Chapter 2, these results confirmed that a repeated measures analysis assuming no differential loss to follow-up could be considered.

TABLE 9. Baseline characteristics of responders at 5 years by randomised allocation

Surgical management

Table 10 summarises the use of surgery in the four study groups over the full 5-year follow-up period. At the end of the first year, 111 participants (62.4%) randomised to surgery had actually undergone fundoplication. Over the next 4 years, one more member of this group had fundoplication, bringing the total to 112 (62.9%). In the randomised medical group, 10 participants (5.6%) had fundoplication in the first year, with a further 14 participants having fundoplication in subsequent years, bringing the total at 5 years to 24 (13.4%). In the preference surgical group, 218 participants (83.5%) had fundoplication in the first year, with four more in the period up to 5 years, taking the percentage to 85.1%. Surgical management applied to only three participants (1.6%) in the preference medical group in the first year, with a further three being operated on in the subsequent 4 years (total 3.1%).
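The percentages in this paragraph follow directly from the group sizes given earlier in the chapter. As a minimal sketch (the dictionary layout and the helper `pct` are illustrative assumptions, not part of the trial's analysis), the uptake of fundoplication at 1 year and by 5 years can be recomputed as follows:

```python
# (group size, fundoplications by end of year 1, fundoplications by year 5),
# all taken from the text; the structure itself is hypothetical.
groups = {
    "randomised surgery": (178, 111, 112),
    "randomised medical": (179, 10, 24),
    "preference surgery": (261, 218, 222),
    "preference medical": (192, 3, 6),
}

def pct(n, total):
    """Percentage of `total` represented by `n`, to one decimal place."""
    return round(100 * n / total, 1)

# Map each group to (% operated at 1 year, % operated by 5 years).
uptake = {
    name: (pct(year1, size), pct(year5, size))
    for name, (size, year1, year5) in groups.items()
}

# Everyone who had a first fundoplication during the 5 years of follow-up.
total_operated = sum(year5 for _, _, year5 in groups.values())

for name, (p1, p5) in uptake.items():
    print(f"{name}: {p1}% at 1 year, {p5}% by 5 years")
print(f"Total with fundoplication: {total_operated}")
```

The total of 364 is the denominator used later in the chapter for subsequent operations and late complications.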

TABLE 10. Initial fundoplication operations

Information about the reasons why participants allocated to surgery did not receive it in the first year is available for 47. For 25 of these 47, this was a clinical decision, most commonly the surgeon deciding that surgery was not appropriate; most of the other 22 changed their minds about surgery for a variety of work- or home-related reasons. A further 20 withdrew for unknown reasons. There is no doubt, however, that a number of these participants suffered long delays before being formally offered surgery, and this was an important factor in their eventual decision to choose not to have surgery after all. The trial was conducted at a time when there was great pressure on surgical services in the NHS, with long delays for elective surgery for non-life-threatening benign conditions being common. Indeed, the average time between trial entry and surgery in the trial was 8–9 months [1].

Details of the surgery received by the 111 participants (62.4%) randomised to surgery and the 218 preference participants (83.5%) who actually received surgery in the first year, the perioperative complications that they experienced and their hospital stay have been reported previously but are summarised in Appendix 2 for completeness. There were no perioperative deaths.

Table 11 shows the numbers of those who had fundoplication who subsequently had a second reflux-related operation during the 5 years of follow-up. Overall, this applied to 16 participants (4.4%) among the 364 who had a first operation: five (4.5%) in the randomised surgery group; one (4.2%) in the randomised medical group; eight (3.6%) in the preference surgery group; and two (33.3%) in the preference medical group. In total, five of the 16 operations were reconstructions of the same wrap, three were repairs of hiatus hernia only, six were conversions to a different type of wrap and two were reversals of the fundoplication. Two of these 16 participants had a third reflux-related operation; both were in the preference surgery group – one a reconstruction of the same wrap and one a repair of hiatus hernia only.
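The reoperation rates above use as denominators the numbers who had a first fundoplication in each group by 5 years (112, 24, 222 and 6, as derived from the surgical management figures earlier in this section). A minimal sketch of the calculation, with illustrative variable names, is:

```python
# First fundoplications by 5 years and second reflux-related operations,
# per group, as quoted in the text; names are hypothetical.
first_operations = {
    "randomised surgery": 112,
    "randomised medical": 24,
    "preference surgery": 222,
    "preference medical": 6,
}
second_operations = {
    "randomised surgery": 5,
    "randomised medical": 1,
    "preference surgery": 8,
    "preference medical": 2,
}
# Breakdown of the second operations by type.
operation_types = {
    "reconstruction of same wrap": 5,
    "hiatus hernia repair only": 3,
    "conversion to different wrap": 6,
    "reversal of fundoplication": 2,
}

rates = {
    group: round(100 * second_operations[group] / first_operations[group], 1)
    for group in first_operations
}
overall = round(
    100 * sum(second_operations.values()) / sum(first_operations.values()), 1
)

# The type breakdown should account for every second operation.
assert sum(operation_types.values()) == sum(second_operations.values())

print(rates)
print(f"Overall: {overall}%")
```

The 33.3% in the preference medical group rests on only six first operations, so it is far less stable than the rates in the other groups.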

TABLE 11. Subsequent reflux-related operations among participants who had fundoplication

Late postoperative complications

Table 12 describes late postoperative complications among those participants who had surgery, in each of the study groups and overall. Of the total 364 who had fundoplication, 12 (3.3%) had a late complication: four (1.1%) were oesophageal dilatations/stricture dilatations; three (0.8%) were repairs of incisional hernias; and five (1.4%) were a heterogeneous group of other complications as detailed in the table.

TABLE 12. Late postoperative complications (> 1 month after surgery)

Medication

Figure 2 summarises reported use of any PPI medication in the previous 2 weeks across the follow-up time points of the trial. Full details are provided in the tables in Appendix 3. From the time of the first annual follow-up onwards, rates in both medical groups were consistently around 80%. The rates in the randomised surgical ITT group at the first, second and third annual follow-ups were approximately 36–38%, rising to 43% in the fifth year. The extent to which these rates reflected medication taking among those allocated to surgery and who had fundoplication (rather than those who did not have surgery) can be gauged from the randomised surgery PP group: 7.3% (3 months), 12.5% (1 year), 15.1% (2 years), 19.6% (3 years), 23.9% (4 years) and 25.6% (5 years).

FIGURE 2. Use of PPI medication at baseline and at follow-up points up to 5 years.

Table 13 allows further exploration of the reasons for the rise in medication use in the randomised surgery group. It distinguishes those reporting taking medication at the end of the first year of follow-up from those who indicated that they were not taking medication at that time. It shows that around 10–20% of those taking medication at the end of the first year did not report medication use at subsequent annual follow-up. Among those not taking medication at the first annual follow-up in the surgical groups, around 10% rising to around 20% reported medication use at subsequent annual follow-up. This contrasts with the rates in the medical groups, with around 50–60% of those not taking medication at the end of the first year reporting anti-reflux drug use in subsequent annual follow-up. The pattern of type of PPI used changed over the course of the study. Although lansoprazole had been the most commonly used PPI at trial entry, omeprazole use increased over time to become the predominant PPI.

TABLE 13. Anti-reflux medication use after the first year

Outcome

Health status

Full details of the health status and QoL measures at each time point of follow-up are in the tables in Appendix 4. Details of the statistical testing of the health status and QoL scores can be found in the next section of this chapter.

REFLUX score

Figure 3 summarises changes in the disease-specific REFLUX score over the follow-up period. From this it can be seen that the scores at all time points are highest (indicating fewest symptoms) in the randomised surgical and preference surgical groups. However, the differences between the surgical and medical groups narrow over time. This is due principally to the scores in the randomised medical group improving over the first 3 years and, to a lesser extent, those in the preference medical group improving over the latter end of the follow-up period. The scores for the five components of the measure are summarised graphically in Figures 4–8. These show that the overall difference between the groups is principally due to the ‘general discomfort’ component and, to a lesser extent, the ‘nausea and vomiting’ and ‘activity limitations’ components.

FIGURE 3. Mean REFLUX QoL score at baseline and at follow-up points up to 5 years (score range 0–100; the higher the score, the better the patient felt).

FIGURE 4. Mean REFLUX QoL general discomfort symptom score at baseline and follow-up points to 5 years (score range 0–100; the higher the score, the better the patient felt).

FIGURE 5. Mean REFLUX QoL wind and frequency symptom score at baseline and follow-up points to 5 years (score range 0–100; the higher the score, the better the patient felt).

FIGURE 6. Mean REFLUX QoL nausea and vomiting symptom score at baseline and follow-up points to 5 years (score range 0–100; the higher the score, the better the patient felt).

FIGURE 7. Mean REFLUX QoL activity limitation symptom score at baseline and follow-up points to 5 years (score range 0–100; the higher the score, the better the patient felt).

FIGURE 8. Mean REFLUX QoL constipation and swallowing symptom score at baseline and follow-up points to 5 years (score range 0–100; the higher the score, the better the patient felt).

Short Form questionnaire-36 items

The pattern of SF-36 scores, both for the composite physical and mental scores and for the individual dimensions (Figures 9–16), was similar to that seen for the REFLUX score, although more compact. Differences narrowed over the 5 years of follow-up, with the ‘general health’ dimension showing the clearest differences between the surgery and the medical management groups.

FIGURE 9. Mean SF-36 norm-based physical functioning score at baseline and follow-up points to 5 years (score range 0–100; the higher the score, the better the patient felt).

FIGURE 10. Mean SF-36 norm-based role physical score at baseline and follow-up points to 5 years (score range 0–100; the higher the score, the better the patient felt).

FIGURE 11. Mean SF-36 norm-based bodily pain score at baseline and follow-up points to 5 years (score range 0–100; the higher the score, the better the patient felt).

FIGURE 12. Mean SF-36 norm-based general health score at baseline and follow-up points to 5 years (score range 0–100; the higher the score, the better the patient felt).

FIGURE 13. Mean SF-36 norm-based vitality score at baseline and follow-up points to 5 years (score range 0–100; the higher the score, the better the patient felt).

FIGURE 14. Mean SF-36 norm-based social functioning score at baseline and follow-up points to 5 years (score range 0–100; the higher the score, the better the patient felt).

FIGURE 15. Mean SF-36 norm-based role emotional score at baseline and follow-up points to 5 years (score range 0–100; the higher the score, the better the patient felt).

FIGURE 16. SF-36 norm-based mental score at baseline and follow-up points to 5 years (score range 0–100; the higher the score, the better the patient felt).

European Quality of Life-5 Dimensions

Figure 17 graphically displays the EQ-5D scores over the course of the follow-up period. The pattern is similar to that seen for the REFLUX score although differences are less marked and only clearly seen over the first 3 years.

FIGURE 17. EQ-5D at baseline and follow-up points to 5 years.

Use of health services

Table 14 shows use of health services for the randomised groups. The larger number of overnight hospital admissions in the medical group largely reflected admissions for surgery; as described above, 14 participants allocated to medical management had fundoplication after the first year. However, seven participants in the medical group compared with one in the surgical group had admissions for a non-surgery-related reason (data not shown).

TABLE 14. Use of health services

Numbers of day-case hospital admissions were similar in the two groups. The larger number of visits to or from a GP for a reflux-related reason in the randomised medical group reflected both more individuals attending their GPs and a higher frequency of visits for those who sought GP care.

Individual symptoms of gastro-oesophageal reflux disease or its treatment

Table 15 shows the frequency with which participants reported symptoms of GORD or its treatment at 3 and 5 years of follow-up for the randomised groups. At both 3 and 5 years, heartburn was reported by a higher proportion of participants in the randomised medical group than in the randomised surgical group. In addition, a higher proportion of participants in the randomised medical group reported more frequent heartburn than in the randomised surgical group. At both time points, a higher proportion of participants in the randomised medical management group also reported regurgitation symptoms and burping/belching than in the randomised surgical group. At both 3 and 5 years, the proportions who reported no difficulty swallowing and no wind from the lower bowel were similar between the randomised surgical and the randomised medical groups. There was also little difference between the groups at each time point in the proportion of participants who reported a feeling of wanting to be sick but being physically unable to do so.

TABLE 15. Frequency of GORD symptoms at 3 and 5 years

Statistical analyses

Primary outcome

The pre-chosen primary outcome was the REFLUX QoL score after 5 years of follow-up. The differences between groups with corresponding 95% CIs are shown in Table 16. Two types of analysis are presented for the randomised participants – ITT and adjusted treatment received. Table 16 also displays the impact of including adjustment for baseline score and randomised group × baseline score interaction terms.

TABLE 16. Primary outcome: REFLUX QoL scores after 5 years of follow-up

Intention to treat

For the ITT analysis there was a mean difference of 6.4 between the groups in favour of surgery when only the minimisation variables were adjusted for (95% CI 1.6 to 11.2; p = 0.009). A repeated measures analysis across the 5 years gave a difference of 8.1 (95% CI 4.4 to 11.7). This was not the most parsimonious model – there was strong evidence of an interaction effect between randomised group and baseline REFLUX QoL score (interaction term was −0.23, 95% CI −0.43 to −0.03; p = 0.023). This implied that as baseline REFLUX QoL score increased the treatment effect decreased. Estimating the treatment difference at the trial baseline mean REFLUX QoL score of 65.2 resulted in a trial effect size of 8.5 (95% CI 3.9 to 13.1; p < 0.001). If the average patient had a lower mean REFLUX QoL score at baseline of 56.0, the effect size increased to 10.6 (95% CI 5.3 to 15.8). If the patient had a higher baseline score of 78.0, the treatment effect size decreased to 5.5 (95% CI 0.6 to 10.4). All results, however, showed strong evidence of increases in REFLUX QoL scores favouring surgery.
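The way the interaction term changes the treatment effect can be illustrated with a short calculation. This is a sketch only: it assumes a simple linear interaction centred on the trial baseline mean, and it uses the rounded coefficients quoted above, so it reproduces the reported effect sizes only approximately and does not reproduce the CIs. The function name `effect_at` is hypothetical.

```python
# Values quoted in the text (rounded as reported).
BASELINE_MEAN = 65.2     # trial mean baseline REFLUX QoL score
EFFECT_AT_MEAN = 8.5     # treatment effect estimated at the baseline mean
INTERACTION = -0.23      # change in treatment effect per point of baseline score

def effect_at(baseline_score):
    """Approximate treatment effect for a given baseline REFLUX QoL score,
    assuming a linear group-by-baseline interaction."""
    return EFFECT_AT_MEAN + INTERACTION * (baseline_score - BASELINE_MEAN)

for score in (56.0, 65.2, 78.0):
    print(f"baseline {score}: estimated effect {effect_at(score):.1f}")
```

A negative interaction term therefore means that each additional point of baseline score reduces the estimated benefit of surgery by about 0.23 points, which is consistent with the reported values of about 10.6 at a baseline of 56.0 and about 5.5 at a baseline of 78.0.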

Adjusted treatment received

The adjusted treatment received analyses attempted to mitigate the effect of non-compliance with the allocated treatment and hence provide an estimate of ‘efficacy’ [40]. As expected, this approach gave a larger difference, but with wider CIs (9.4, 95% CI 1.7 to 17.0; p = 0.017).

Preference groups

The preference for surgery participants reported considerably worse mean REFLUX QoL scores at baseline than the preference for medicine participants (55.8 vs 77.5) (see Table 7). Despite starting from a much lower baseline score, at follow-up, the REFLUX QoL score slightly favoured the surgical group using an ITT analysis (difference = 0.61; 95% CI −3.44 to 4.66; p = 0.767) and an adjusted treatment received analysis (difference = 0.10; 95% CI −4.77 to 4.97; p = 0.967). The differences were not, however, statistically significant.
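The reported p-values can be checked for consistency with the quoted differences and 95% CIs. The sketch below assumes a symmetric interval and a normal approximation (the trial's own models may have used a t-distribution or other adjustments), so small discrepancies in the third decimal place are to be expected. The helper `p_from_ci` is illustrative and is not part of the trial's analysis.

```python
import math

def p_from_ci(estimate, lower, upper, z_crit=1.96):
    """Two-sided p-value implied by an estimate and its 95% CI,
    assuming a symmetric interval and a normal sampling distribution."""
    standard_error = (upper - lower) / (2 * z_crit)
    z = estimate / standard_error
    # Two-sided tail probability of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))

# (difference, lower, upper, reported p) as quoted in the text.
results = {
    "randomised ITT": (6.4, 1.6, 11.2, 0.009),
    "randomised adjusted treatment received": (9.4, 1.7, 17.0, 0.017),
    "preference ITT": (0.61, -3.44, 4.66, 0.767),
    "preference adjusted treatment received": (0.10, -4.77, 4.97, 0.967),
}

for name, (diff, lo, hi, reported) in results.items():
    print(f"{name}: implied p = {p_from_ci(diff, lo, hi):.3f}, reported p = {reported}")
```

Under these assumptions the implied p-values agree with the reported ones to within rounding, which supports the internal consistency of the estimates quoted for the randomised and preference groups.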

Secondary outcomes

The secondary outcomes were the health status measures (EQ-5D, SF-36) and REFLUX symptom score at times equivalent to 3 months and then annual follow-up after surgery, and REFLUX QoL (at time points other than 5 years, when it was the primary end point). Analyses of these outcomes are shown in Tables 17–22.

TABLE 17. Secondary outcomes at a time equivalent to 3 months after surgery: health status

TABLE 18. Secondary outcomes at a time equivalent to 12 months after surgery: health status

TABLE 19. Secondary outcomes at a time equivalent to 2 years after surgery: health status

TABLE 20. Secondary outcomes at a time equivalent to 3 years after surgery: health status

TABLE 21. Secondary outcomes at a time equivalent to 4 years after surgery: health status

TABLE 22. Secondary outcomes at a time equivalent to 5 years after surgery: health status

REFLUX symptom score

There were statistically significantly higher REFLUX QoL scores at all time points, albeit with some diminution over time in the surgical group (see Figure 3). Although symptom category scores favoured surgery across all domains at all time points, the most marked and sustained difference was in ‘general discomfort’.

Short Form questionnaire-36 items

The SF-36 scores in all domains also favoured the surgical group at all time points. Differences decreased over time and this was reflected in most p-values being < 0.05 up to 3 years, whereas at year 5 this applied to only ‘norm-based general health’ and ‘norm-based role emotional’.

European Quality of Life-5 Dimensions

Differences in EQ-5D had a similar pattern to differences in REFLUX QoL and SF-36 scores – differences all favoured the surgical group but tended to narrow such that scores at years 2 and 3 were statistically significantly different, but at later time points they were not. Variability tended to increase over time. Despite the general narrowing of the EQ-5D difference over time, at year 5 it was actually the same as that at 12 months after surgery but with wider CIs.

Adjusted treatment received

As would be expected, all (with a small number of exceptions) the adjusted treatment received analyses had larger differences than the corresponding ITT analyses (around 25–50% higher), but with wider CIs.

Subgroup analyses

Removal of data from the single largest clinical centre (Aberdeen)

No formal exploration of centre effects was undertaken because of the small numbers of participants recruited in many of the clinical centres. However, a sensitivity analysis removing the data from the Aberdeen centre, the centre where the largest number of participants were recruited, did not significantly change the conclusions (adjusted difference in REFLUX score at 60 months = 5.43, 95% CI 0.96 to 9.90).

Partial compared with total wrap procedure

In an observational analysis, there was no evidence of a difference between a total wrap procedure and a partial wrap procedure. The difference in the REFLUX QoL score between these procedures at time equivalent to 5 years post surgery was −1.0 (95% CI −5.4 to 3.7; p = 0.649).

Discussion

Follow-up to 5 years after laparoscopic surgery described here provides clear evidence of sustained improvement in GORD symptoms, as judged by the REFLUX QoL scores. Differences between the groups as randomised did tend to diminish over the course of the study; nevertheless, the analyses at 5 years (the primary end point) showed highly statistically significant results with effect sizes of the order of 0.6 of a SD.

This report concentrates on the data collected annually at a time equivalent to between 2 and 5 years post surgery. Data were collected through self-complete postal questionnaires, backed up by postal and telephone reminders and occasional completion of the questionnaire over the telephone. The response rate did drop over time, from 90% at 1 year to around 70% at 5 years. The principal reason for not obtaining a follow-up questionnaire was a loss of contact, such as following a home move; the second most common reason was a decision by a participant to decline further follow-up. The category of ‘non-responder’ accounted for only around 8% of those without a follow-up questionnaire. Response analysis showed that responders at 5-year follow-up had a higher mean age, had been prescribed anti-reflux medication for a shorter period of time at recruitment and had higher QoL at baseline. However, the characteristics of responders and non-responders at 5 years were similar across the two randomised groups.

Randomised trials, such as the REFLUX trial, that compare surgery with medical management are challenging to mount because of the stark contrast between the treatments compared. As described in the previous report of this study, recruitment was not easy and it is to the credit of the many staff in the 21 centres involved in the trial that this was accomplished successfully. A second challenge was that, after randomisation, a sizable proportion of participants did not receive the treatment to which they had been allocated – again, reflecting the contrasts in the treatments. We explored the impact of this in a number of ways.

Figure 18 shows the results of a supplementary analysis of the group randomly allocated to surgery, stratified by whether or not they actually had surgery. It shows that those who had surgery started from a lower REFLUX QoL baseline score (had worse symptoms) than those who did not undergo surgery, and then had a sharp rise in score following the operation such that their scores were consistently higher than those of participants who did not actually have fundoplication. To put this another way, the improvement seen among those who had surgery was greater than that in the randomised group overall.

FIGURE 18. Mean REFLUX QoL scores for (a) all randomised to surgery, (b) those randomised to surgery who had fundoplication and (c) those randomised to surgery who did not have surgery.


Figure 19 shows a similar supplementary analysis of the group allocated medical management, stratified by whether or not they in fact had surgery in the first year. This shows that those who had fundoplication (the lowest line) had more severe symptoms of GORD (low REFLUX QoL scores) at the time of trial entry, worse even than the preference surgical group. In contrast, those solely managed medically had relatively high baseline scores. Scores among those randomised to medical management who had surgery improved markedly over the course of the follow-up, such that by years 4 and 5 the scores in the two strata were similar. This indicates that much of the narrowing of the difference in scores between the ITT groups over the 5 years can be explained by surgery in the randomised medical group.

FIGURE 19. Mean REFLUX QoL scores for (a) all randomised to medical management, (b) those randomised to medical management who did not have surgery and (c) those randomised to medical management who had fundoplication.


We assessed more formally the extent to which surgery in the randomised medical management group might have affected the results by undertaking adjusted treatment received analyses. We decided to base these on treatment status at the first year follow-up point. We chose this partly to be consistent with our previous report of the results up to 1 year and partly because we considered that those who had surgery after that time point were likely to be highly selected. To put this another way, we were concerned that a PP analysis up to 5 years would be particularly prone to bias. The adjusted treatment received analyses, as expected, indicated larger effects of surgery – with differences in score around 25–50% higher. As illustrated by the preference groups in this study, the proportion of those recommended surgery and willing to have it who subsequently go on to have fundoplication is likely to be higher in everyday practice. Hence, we would argue that the results of the adjusted treatment received analyses are likely to provide a better estimate of the benefits of a policy of laparoscopic fundoplication as would apply in the health service.

The principal concern about laparoscopic fundoplication is the possible risk associated with the surgery. We described intra- and postoperative surgical outcomes in our previous report.1 Among the 329 patients in the randomised surgical and preference surgical groups who had fundoplication in the first year, there were no major surgical complications. Two patients (0.6%; 95% CI 0.1% to 2.2%) required conversion to an open procedure; eight (2.4%; 95% CI 1.2% to 4.7%) had a visceral injury; and one (0.3%; 95% CI < 0.1% to 1.7%) had a blood transfusion. Three were admitted to a high-dependency unit, but none to an intensive care unit. The 5-year follow-up provides information about longer-term risks. We are aware of seven deaths among trial participants; however, none has an apparent link to the trial. Twelve (3.3%) of the total of 364 participants who had a fundoplication had a late complication: four had oesophageal dilatations/stricture dilatations, three had repairs of incisional hernias and five had a heterogeneous group of other complications (see Table 12). Sixteen (4.4%) of those who had fundoplication required further surgery (see Table 11): five had reconstruction of the same wrap, six conversion to another type of wrap, three repair of hiatus hernia only and two reversal of fundoplication. These complications, albeit uncommon, need to be taken into account when surgery is being considered.
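For readers who wish to check the complication rates quoted above, the following is a minimal sketch of how such percentages and approximate 95% confidence intervals can be computed from the counts. The interval method used in the report is not stated in this section; the Wilson score interval used here is an assumption, as are the function and variable names, so the computed limits may differ slightly in the last digit from the published ones.

```python
from math import sqrt

Z_95 = 1.959964  # two-sided 95% quantile of the standard normal distribution


def wilson_interval(events, total, z=Z_95):
    """Return (proportion, lower, upper) using the Wilson score interval.

    Assumption: the report does not name its interval method here;
    Wilson is used only as a reasonable illustration.
    """
    p = events / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half_width = z * sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return p, centre - half_width, centre + half_width


# Counts taken from the paragraph above (events, number who had fundoplication)
outcomes = {
    "conversion to open procedure": (2, 329),
    "visceral injury": (8, 329),
    "blood transfusion": (1, 329),
    "late complication": (12, 364),
    "further surgery": (16, 364),
}

for label, (events, total) in outcomes.items():
    p, lower, upper = wilson_interval(events, total)
    print(f"{label}: {100 * p:.1f}% "
          f"(95% CI {100 * lower:.1f}% to {100 * upper:.1f}%)")
```

With event counts this small, exact (Clopper-Pearson) and score-based intervals can differ noticeably at the lower limit, which is one reason the choice of method is flagged as an assumption.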

Proton pump inhibitor use in the randomised medical group was consistently around 80%, although the participants taking PPIs were not always the same people at each follow-up. In our questionnaire, we chose to ask about anti-reflux drug use over the preceding 2 weeks as we thought that recollection over a longer period would be unreliable. Nevertheless, PPI use seems to be dynamic (patients stopping and restarting), so rates of use at any time over a longer period would probably have been higher. We did observe more visits to GPs in the medical groups for reflux-related reasons during the 5 years of follow-up but are not able to say whether this was due to routine reassessments or because symptoms were less stably or less adequately controlled in the medical group.

The pattern of PPIs used did change over the course of the study. At baseline, the commonest PPI was lansoprazole, but omeprazole superseded this over the course of the trial. Much of this change occurred in the first year and hence could be a consequence of the review of medical management that was part of the trial management for those randomised to medical management.

The larger number of overnight hospital admissions in the randomised medical management group was largely, but not totally, explained by the minority who went on to have surgery; as discussed in Chapter 5 describing the economic evaluation, this was the principal driver of extra resource use by the medical group during the longer-term follow-up.

Despite the methodological challenges alluded to above, the study, through the data presented here, has successfully addressed the first of the objectives of this longer-term follow-up: to assess whether or not short-term clinical benefits, principally in terms of symptom control, are sustained – they are, albeit attenuated. In the next chapter we consider the REFLUX trial in the context of the three other randomised trials that have been conducted worldwide comparing laparoscopic fundoplication with medical management, and assess whether or not the results of the REFLUX trial are consistent with those of the other trials.

Copyright © Queen's Printer and Controller of HMSO 2013. This work was produced by Grant et al. under the terms of a commissioning contract issued by the Secretary of State for Health. This issue may be freely reproduced for the purposes of private research and study and extracts (or indeed, the full report) may be included in professional journals provided that suitable acknowledgement is made and the reproduction is not associated with any form of advertising. Applications for commercial reproduction should be addressed to: NIHR Journals Library, National Institute for Health Research, Evaluation, Trials and Studies Coordinating Centre, Alpha House, University of Southampton Science Park, Southampton SO16 7NS, UK.

Included under terms of UK Non-commercial Government License.

Bookshelf ID: NBK260651
