NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Grant AM, Boachie C, Cotton SC, et al. Clinical and economic evaluation of laparoscopic surgery compared with medical management for gastro-oesophageal reflux disease: 5-year follow-up of multicentre randomised trial (the REFLUX trial). Southampton (UK): NIHR Journals Library; 2013 Jun. (Health Technology Assessment, No. 17.22.)

Chapter 5 Economic analysis

The economic evaluation aimed to determine the cost-effectiveness of laparoscopic fundoplication compared with continued medical management in patients with GORD symptoms that are reasonably controlled by medication and who are judged suitable for both surgical and medical management. The analysis entailed three components:

  1. systematic review of existing cost-effectiveness evidence
  2. within-trial (5-year) economic analysis
  3. validation of within-trial analysis and exploration of the need for a longer-term model.

Systematic review of existing cost-effectiveness evidence

The aim of this systematic review is to identify any existing cost-effectiveness studies that compare laparoscopic fundoplication with medical management for GORD. A previous HTA report included a review of the evidence available from 1995 to December 2005 and identified three relevant studies (described below).1 The updated search focuses on the period from December 2005 to April 2011. The methods used to identify studies and the results of the systematic search are discussed in the sections below.

Methods

The following databases were searched to identify published evidence: MEDLINE and MEDLINE In-Process & Other Non-Indexed Citations (1948 to present), EMBASE (1996 to week 15, 2011), Cochrane Database of Systematic Reviews (CDSR) and the NHS Centre for Reviews and Dissemination databases [Database of Abstracts of Reviews of Effects (DARE), NHS Economic Evaluation Database (NHS EED), HTA]. The search strategy incorporated broad reflux-related search terms as used in a recent Cochrane Review.57 The search also focused on identifying health-related and GORD-specific QoL evidence.

Studies were considered relevant for inclusion in the review if they were published in English and were full health economic evaluations (cost-effectiveness, cost-utility or cost–benefit analysis) comparing costs and outcomes associated with laparoscopic fundoplication and medical management. For the purpose of this study laparoscopic fundoplication includes both complete and partial wrap procedures. Publications outside the above criteria were excluded from this review. Details of the updated search strategy are presented in Appendix 6.

Results

A total of 3662 references were identified from the searches (MEDLINE: 1640, EMBASE: 1825, CDSR: 44, DARE: 56, NHS EED: 85, HTA: 12). Titles and/or abstracts were reviewed and studies that satisfied all inclusion criteria were included in the review. Papers describing five additional studies were obtained for inclusion. These were published between 2007 and 2011 and were related to the UK and Canadian settings. Of the total of eight studies, five are linked to three of the randomised trials described in Chapter 4: Anvari et al.,44–46 Mahon et al.51–53 and the REFLUX trial,1 the long-term follow-up of which is the topic of this report. There is no economic evaluation in the LOTUS trial.48 Three of the studies were based on the REFLUX trial. These were published as part of the earlier HTA report1 and in two journal articles.3,5 Summaries of the two within-trial economic evaluations are presented in Appendix 7. Below is a brief description of the eight reports – the five linked to the three randomised trials are considered first, followed by the three studies based on observational data.

Economic analyses based on clinical trials

Economic evaluation based on the Anvari et al. trial46

This was an economic evaluation conducted alongside the Anvari et al. trial described in Chapter 4. Laparoscopic fundoplication was compared with PPI for patients with chronic GORD. The follow-up period was 3 years and the analysis was conducted from a societal perspective. Cost-effectiveness was reported in terms of cost per QALY gained.

Three generic preference-based questionnaires were administered during the trial: Health Utilities Index Mark 3 (HUI3), EQ-5D and Short Form questionnaire-6 dimensions (SF-6D). Although these instruments have been valued by large general public samples, they differ in the attributes used for their descriptive system and the method of valuation applied. The EQ-5D has been valued using time trade-off whereas the SF-6D and HUI3 use the standard gamble. Utility scores showed an improvement in patients' HRQoL in both groups across the three utility instruments; however, the degree of improvement varied according to the utility instrument used. The base-case analysis (using the HUI3 instrument), after adjustment for baseline differences, indicated that, over the 3 years, laparoscopic fundoplication patients experienced a 0.109 gain in QALYs compared with PPI patients. The ICER for laparoscopic fundoplication patients was around C$29,400 (£19,000) per QALY gained. An increased ICER of C$76,300 (£49,300) was obtained using the EQ-5D as the HRQoL measure.

Economic evaluation based on the Mahon et al. trial52

This study looked at the cost-effectiveness of laparoscopic fundoplication compared with maintenance PPI medication for severe GORD based on the Mahon et al. randomised trial described in Chapter 4. Results based on the 12-month follow-up were extrapolated using other published data sets. Costs and outcomes for up to 12 months were obtained from a sample of patients in the trial (the first 100) and resource use was quantified using data from hospital records and GPs' notes. The incremental cost of laparoscopic fundoplication compared with PPI therapy per additional patient returned to a physiologically normal acid score (< 13.9) at 3 months was £5515 (95% CI £3655 to £13,400) and the incremental cost per point improvement in combined gastrointestinal and psychological well-being score at 12 months was £293 (90% CI £149 to £5250). The authors concluded that laparoscopic surgery would break even compared with medical management after 8 years and would be cost saving thereafter.

Economic evaluation based on the REFLUX trial1,3,5

Bojke et al.5 present a preliminary cost-effectiveness analysis conducted before the availability of the 1-year REFLUX trial results. The analysis compared the cost-effectiveness of surgery (laparoscopic fundoplication) with long-term medical management (PPIs) for GORD in an average 45-year-old man. A lifetime (30 years) Markov model that adopted the perspective of the NHS was developed. Effectiveness data were obtained from a fixed-effect meta-analysis that synthesised data from multiple sources. QALYs were estimated using utility scores (measured by the EQ-5D instrument) derived from a subset of UK patients included in the REFLUX trial. Over a lifetime, expected costs associated with surgery (£5014) were higher than expected costs associated with PPI (£4890). Expected QALYs associated with surgery (13.04) were greater than QALYs associated with PPIs (12.36). The incremental cost per QALY gained (ICER) for surgery compared with medical care was £180. The estimated probability that surgery was cost-effective at the threshold of £30,000 per QALY was 0.639. The authors highlighted important areas for further research, such as the HRQoL of patients on PPIs or post surgery.

The within-trial cost-effectiveness analysis, comparing laparoscopic fundoplication with medical management 1 year post surgery, was described in full in the 2008 report of the REFLUX trial.1 The analysis was conducted on an ITT basis from an NHS perspective. HRQoL was assessed at baseline and at 3 and 12 months' follow-up using the EQ-5D. Cost-effectiveness was reported in terms of the difference in mean QALYs between the treatment groups. This difference was estimated using ordinary least squares (OLS) regression, adjusting for baseline differences in EQ-5D between individuals. The estimated difference in mean costs between the groups was £1280 (95% CI £1054 to £1468). The HRQoL of patients randomised to surgery tended to improve on average by 0.066 more QALYs (95% CI 0.023 to 0.107) than in the medical management group. The estimated mean ICER was around £19,000. At a threshold of £30,000 per QALY, the probability of surgery being cost-effective was 0.86.

Epstein et al.3 developed a Markov model using 12-month data from the REFLUX trial and other sources in order to extrapolate the cost-effectiveness of laparoscopic fundoplication compared with medical management over the longer term (lifetime). Cost-effectiveness was reported in terms of the cost per QALY gained from surgery. The analysis was conducted from an NHS perspective. Under base-case assumptions, surgery had an additional mean cost of £847 and additional mean QALYs of 0.37 over the lifetime of the patients. The incremental cost per additional QALY gained was around £3000. At a threshold of £20,000 per QALY, the probability that surgery was cost-effective was around 0.74.

Economic analyses based on observational data

Economic evaluation based on Romagnuolo et al.58

This study is based on observational data and compares the cost-effectiveness of maintenance regimens of omeprazole and laparoscopic fundoplication within the Canadian medical system. The effectiveness, HRQoL and resource-use data were derived from studies published between 1985 and 2000. Outcomes were expressed as QALYs and costs were estimated from the perspective of a provincial health ministry. A two-stage Markov model (healing and maintenance phases) was used to estimate costs and utilities using a time horizon of 5 years. Laparoscopic fundoplication was the most cost-effective option at 3.3 years of follow-up and was cost saving at 5 years. These results were sensitive to the price of omeprazole. QALYs did not differ significantly between treatment groups.

Economic evaluation based on Arguedas et al.59

This study, also based on observational data, compared the cost-effectiveness of laparoscopic fundoplication and medical management in patients with severe reflux oesophagitis. Outcomes were quantified using QALYs with model inputs derived from the published literature. A Markov simulation model was used to extend a previous analysis to a 10-year time horizon. Procedure and hospitalisation costs were estimated using Medicare reimbursement rates from the authors' institution. Medical therapy was associated with a total cost of $8798 and 4.59 QALYs, whereas the surgery was more expensive ($10,475) and less effective (4.55 QALYs). The authors concluded that medical therapy dominated surgery.

Economic evaluation based on Comay et al.60

This is a cost-effectiveness analysis, based on observational data, principally concerned with assessing an endoscopic therapy (Stretta procedure) compared with PPIs and laparoscopic fundoplication in the management of GORD. The Stretta procedure is out of the scope of our analysis; however, the data on costs and QALYs provided by the authors allow us to better understand QoL related to these technologies and make comparisons with other authors' estimates. The authors constructed a Markov model that tracked patients over a period of 5 years. Analysis was undertaken from the Canadian Ministry of Health perspective. A literature review for published studies before 2004 was carried out to derive effectiveness and utility data. Symptom-free months and QALYs were used to measure benefit. PPI was the dominant strategy, producing more symptom-free months at lower costs than the other strategies. Laparoscopic fundoplication was associated with higher costs and generated more QALYs. The discounted mean QALYs over 5 years were 4.6487 for laparoscopic fundoplication and 4.6357 for PPI. The ICER for laparoscopic fundoplication compared with PPI was C$384,692 (£240,470). This is unlikely to be considered cost-effective.

Conclusions

The different outcomes used make it difficult to compare the results of the various studies analysed here. For those studies quantifying the benefits associated with the two treatments using QALYs, the results differ depending on the type of analysis conducted. Although the trial-based results suggest that there is good short- and medium-term evidence indicating that surgery may well represent a cost-effective alternative intervention, the model-based studies are not so optimistic.

The ICER for surgery ranged from £180 to £49,000 per QALY gained. However, the limitations of the studies included in this review suggest that we should be cautious when interpreting these results. The decision model developed as part of the REFLUX trial extrapolated from data at 12 months and was based on the assumption that the treatment effect of surgery (in terms of impact on HRQoL) remains constant over the lifetime of patients. However, as would be expected, the results of the sensitivity analysis suggested that surgery was less cost-effective when the beneficial effect of surgery was limited to 5 years (increasing the ICER to £11,300) and when HRQoL was worse in those for whom surgery failed (increasing the ICER to £11,310 when considering very high rates of surgical failure).

The value of conducting additional research to reduce any uncertainty in the REFLUX model was demonstrated. The expected value of perfect information (EVPI) is the maximum amount that a decision-maker should be willing to pay to eliminate all uncertainty that arises because of imprecision in the parameters of the model. The value of information analysis suggested that further research could be worthwhile. At a threshold of £30,000, the per-patient EVPI was £15,106.
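The EVPI definition above can be read as the expected net benefit of choosing with perfect information minus the expected net benefit of the best choice under current information. The following is a minimal sketch of that calculation, assuming a set of simulated net-benefit draws per strategy. The function name and the two-draw example are hypothetical and are not output from the REFLUX model.

```python
def evpi_per_patient(net_benefit_draws):
    """Per-patient EVPI from simulated net benefits.

    net_benefit_draws: list of simulation draws; each draw is a list holding
    one net monetary benefit per strategy, in the same strategy order.
    """
    if not net_benefit_draws:
        raise ValueError("at least one simulation draw is required")
    n_draws = len(net_benefit_draws)
    n_strategies = len(net_benefit_draws[0])
    # Perfect information: the best strategy is chosen within each draw.
    expected_of_max = sum(max(draw) for draw in net_benefit_draws) / n_draws
    # Current information: one strategy is chosen on its average net benefit.
    max_of_expected = max(
        sum(draw[s] for draw in net_benefit_draws) / n_draws
        for s in range(n_strategies)
    )
    return expected_of_max - max_of_expected


# Hypothetical draws for two strategies (illustrative values only).
draws = [[10.0, 0.0], [0.0, 10.0]]
print(evpi_per_patient(draws))  # 5.0
```

The EVPI is zero when the same strategy is best in every draw, because resolving the uncertainty would not change the decision.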

Within-trial economic evaluation

Follow-up data from the REFLUX trial up to 5 years after surgery are now available. These economic data represent the longest follow-up of randomised patients currently available. These data can help to inform the question regarding the sustainability of initial improvement in HRQoL following surgery. This section describes the updating of the cost-effectiveness analysis using these data to reduce the level of uncertainty about the cost-effectiveness of surgery and thus its role in the NHS.

Overview

Differences in mean costs and QALYs at 5 years (based on data collected within the REFLUX trial) were used to derive an estimate of the cost-effectiveness of laparoscopic surgery (laparoscopic fundoplication) and continued medical management. The extent of missing data throughout the trial follow-up is significant; therefore, the base case consists of the multiple imputed data set following ITT analysis. A separate scenario – complete-case analysis, in which patients with any missing data are excluded – was employed for ITT and PP for 1-year analyses. Costs and QALYs were evaluated on the basis of costs falling on the NHS and Personal Social Services expressed in UK pounds sterling at a 2010 price base. All analysis and modelling were undertaken in Stata/SE 11.1 (StataCorp LP, College Station, TX, USA).

Methods

Patient population

As described in earlier chapters, the patient population in the REFLUX trial was patients with GORD whose symptoms required medication for reasonable control and for whom either surgery or continued medical management appeared to be an acceptable treatment option. A policy of offering relatively early laparoscopic fundoplication was compared with the alternative policy of continued medical management. The analysis used data only from the randomised trial component of the REFLUX trial (i.e. not from the preference groups). As described in Chapter 3, 357 patients were randomised to either surgical treatment (n = 178) or medical management (n = 179) and patients were followed for up to 5 years.

Health-care resource use

Health-care resource-use data were collected prospectively as part of the clinical report forms and patient questionnaires at 3 and 12 months and 2, 3, 4 and 5 years. Patient questionnaires at 3 and 12 months collected information for the previous 3 and 9 months respectively. In addition, a questionnaire at 12 months recorded resource use for the whole of the first year (see following section on costs). Patient questionnaires from the second year onwards collected information for the previous 12 months on hospital admissions (day and overnight admissions) and GP visits, and data on medication for the previous 2 weeks. Clinical report forms collected data on surgery and perioperative complications of surgery.

Costs

The cost for each individual patient in the trial was calculated by multiplying his or her use of health-care resources by the associated unit costs (Table 31). Discounting was applied from year 2. Unit costs were all sourced from published data (see Table 31). Total costs include the costs of surgery, GP visits, hospital admissions and medication. Incremental costs (laparoscopic fundoplication vs medical management) for each year and per category of resource use, according to ITT allocation, were calculated using OLS regression.

TABLE 31

Unit costs employed to calculate the costs of reflux-related health-care use

The questionnaires asked for details of anti-reflux medication taken in the previous 2 weeks: name, dose and number of tablets/capsules. The cost of anti-reflux medication during these 2 weeks was calculated by multiplying the prices published in the Drug Tariff for December 201061 for each medicine by the number of tablets taken. Yearly medication costs are calculated using the area under the curve method,62 which assumes linear interpolation between follow-up points. The costs of reflux-related inpatient, outpatient and day-case visits were derived from the NHS Reference Costs 2009–10,63 in which the relevant codes were weighted by activity level.

For the base-case analysis, total costs included the costs of surgery, complications due to surgery, reoperations, reflux-related prescribed medication, reflux-related visits to and from the GP and reflux-related hospital inpatient, outpatient and day visits. For the sensitivity analysis, all GP visits and all hospital admissions are included in the calculation of total costs (see Incremental analysis for more details on sensitivity analysis). Costs of hospital admissions and GP visits were obtained by multiplying the relevant unit costs by the numbers of admissions and visits reported by the patients respectively. Patients themselves classified how many visits and admissions were reflux related in relation to the total number of visits. There is a possibility that patients may not have fully understood the clinical consequences of GORD; hence, they may misclassify the reason for a consultation. If such misclassification is different across treatment groups, estimates of incremental costs may be biased.

For the first year of the trial, data on resource use were collected at 3 months and 12 months, and for the whole year using an additional questionnaire. To make the most efficient use of the data available for the first year of the trial, resource use at 1 year was estimated as the greater of the area under the curve between the first and second questionnaire and the 12-month health-care survey. This is in line with the procedure employed for the earlier publication evaluating the REFLUX trial.1

The cost of surgery included the costs of (1) presurgical procedures (endoscopy, pH monitoring and manometry), (2) the surgery team, (3) operative complications, (4) hospital stay, (5) capital costs and overheads and (6) consumables. The cost of reoperations was assumed to be equivalent to the mean cost of the first surgery. The cost of reflux-related visits to and from the GP was assumed to be equivalent to the average cost of visits to and from the GP.64

Quality-adjusted life-years

Health outcomes were expressed in terms of QALYs. HRQoL was assessed in the REFLUX trial at baseline and 3 months and then yearly until 5 years using the EQ-5D.65,66 The EQ-5D is a standardised and validated generic instrument for the measurement of HRQoL. It has five dimensions: mobility, ability to self-care, ability to undertake usual activities, pain and discomfort, and anxiety and depression. Each dimension has three possible responses (no problems, moderate problems or severe problems), creating 243 (3^5) mutually exclusive health states, or 245 when the states 'unconscious' and 'dead' are counted. Each of these health states has been valued in a large UK population study using the time trade-off method, in which 1 corresponds to perfect health (thus the maximum value possible) and 0 corresponds to death.65,66

QALYs for each patient were calculated as the area under the curve following the trapezium rule,67 which assumes linear interpolation between follow-up points. Incremental mean QALYs between treatment groups were estimated with and without adjustment for baseline utility, using OLS regression.
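The area-under-the-curve calculation can be sketched as follows. This is a minimal illustration of the trapezium rule, assuming utility scores observed at known times in years. The function name and the example patient are hypothetical, and discounting is left out.

```python
def qalys_area_under_curve(times_years, utilities):
    """Undiscounted QALYs as the trapezium-rule area under the utility curve."""
    if len(times_years) != len(utilities) or len(times_years) < 2:
        raise ValueError("need matching time and utility lists with at least two points")
    total = 0.0
    for i in range(1, len(times_years)):
        width = times_years[i] - times_years[i - 1]
        # Linear interpolation: the mean of the two end-point utilities.
        mean_height = (utilities[i] + utilities[i - 1]) / 2.0
        total += width * mean_height
    return total


# Hypothetical patient: EQ-5D at baseline, 3 months, then yearly to 5 years.
times = [0.0, 0.25, 1.0, 2.0, 3.0, 4.0, 5.0]
scores = [0.70, 0.80, 0.80, 0.80, 0.80, 0.80, 0.80]
print(round(qalys_area_under_curve(times, scores), 4))
```

Because the rule interpolates linearly, a patient who improves immediately after baseline is credited with only half of that improvement over the first interval.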

Discounting

Costs and outcomes from year 2 were discounted using a 3.5% annual discount rate, in line with current guidelines.65,68
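As an illustration, discounting from year 2 can be sketched as below. The sketch assumes one common convention, which the text does not spell out: year 1 is left undiscounted and year t is divided by 1.035^(t − 1). The function name and the example values are hypothetical.

```python
def discount_yearly_values(yearly_values, rate=0.035):
    """Discount a list of yearly costs or QALYs, leaving year 1 undiscounted.

    yearly_values[0] is year 1, yearly_values[1] is year 2, and so on.
    Assumed convention: year t is divided by (1 + rate) ** (t - 1).
    """
    return [
        value / (1.0 + rate) ** year_index
        for year_index, value in enumerate(yearly_values)
    ]


# Hypothetical cost of 100 in each of 5 years.
discounted = discount_yearly_values([100.0] * 5)
print([round(v, 2) for v in discounted])
print(round(sum(discounted), 2))
```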

Missing data and multiple imputation

Given the extent of missing data, the multiple imputed data set is presented as the base case. This was created using all available data and multiple imputation with chained equations.69 Mean imputation was used to predict missing data at baseline,70 as randomisation should ensure equal distribution of potentially confounding variables. Complete-case analysis refers to only those patients who returned all questionnaires and completed all EQ-5D profiles.

Missing or inconsistent answers to questions on resource use were dealt with as follows. For medication use, patients were asked at each follow-up questionnaire whether or not they were using prescribed medication for reflux and, if so, to indicate the name, strength and the number of tablets taken in the past 2 weeks. It was evident from preliminary analyses that the answers to the first question were not necessarily consistent with the answers to the second question. Therefore, the following rule was applied for the costing of drugs: (1) if the patient provided the name, strength and number of tablets taken, he/she was assumed to be taking medication; (2) if the patient did not specify either a drug or the number of tablets taken, he/she was considered not to be taking medication; (3) if the patient specified a particular drug but no dosage, the missing data were imputed as the median of all other patients on that medication. Similarly, missing answers to the questions regarding GP visits and hospital admissions were assumed to indicate that no visits or admissions occurred. Because of the nature of the questionnaire, it is reasonable to assume that absence of an answer indicates no use of services.

Multiple imputation71 was the statistical technique chosen to deal with missing cost and HRQoL data because of non-returned questionnaires and incomplete EQ-5D profiles, using the user-defined programme ‘ice’ in Stata 11.1. Multiple imputation presents three major advantages over standard ad hoc methods for dealing with missing data (such as mean imputation and last value carried forward): (1) it makes full use of all of the available data, (2) it incorporates uncertainty associated with the missing data and (3) it ensures unbiased estimates and standard errors as long as data are MAR.69 [Little and Rubin72 defined three missing data mechanisms: (1) MCAR if the probability of data being unobserved is independent of both observed and unobserved values; (2) MAR if the probability of data being unobserved is dependent on the observed values but independent of unobserved ones and (3) missing not at random (MNAR) if the probability of data being unobserved is dependent on unobserved values.]

Multiple imputation follows three steps. First, regression models are used to predict plausible values for the missing observations from the observed values. A random component is included to reflect the uncertainty around the predictions. These values are then used to fill in the gaps in the data set. This process is repeated m number of times (m being the number of imputations), creating m number of imputed data sets. Second, each data set is analysed independently using complete-case methods. Third, the estimates obtained from each imputed data set are combined to generate mean estimates of costs and QALYs, variances and CIs using Rubin's rules,73 in such a way that the uncertainty around the predicted values is fully taken into account.69,74 Because the REFLUX trial has missing data for both costs and EQ-5D scores, multiple imputation using chained equations (MICE) was employed. For MICE, each variable is predicted with its own regression model. Each imputed data set is created by running the regression models over several cycles, in which each variable informs the prediction of the other variables.69,74 To obtain overall estimates of mean and incremental costs and QALYs across all of the imputed data sets, the ‘mim’ command was used.75 Semi-parametric bootstrapping in Stata 11.1 was employed to estimate the probability that surgery is cost-effective, while maintaining the correlation between costs and QALYs (see Incremental analysis for more details).76
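The third step, pooling with Rubin's rules, can be sketched as follows. This is a minimal illustration, assuming that each imputed data set has already produced a point estimate and its variance (the squared standard error). The function name and the numbers are hypothetical, and the sketch is not the 'mim' implementation.

```python
from statistics import mean, variance


def pool_with_rubins_rules(estimates, variances):
    """Combine per-imputation estimates into a pooled estimate and total variance."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    pooled_estimate = mean(estimates)
    within_variance = mean(variances)        # average sampling variance
    between_variance = variance(estimates)   # sample variance across imputations
    # The (1 + 1/m) factor accounts for using a finite number of imputations.
    total_variance = within_variance + (1.0 + 1.0 / m) * between_variance
    return pooled_estimate, total_variance


# Hypothetical incremental QALY estimates from three imputed data sets.
estimate, total_var = pool_with_rubins_rules([0.20, 0.25, 0.30], [0.01, 0.01, 0.01])
print(round(estimate, 4), round(total_var, 6))
```

The between-imputation term is what carries the uncertainty about the missing values into the pooled standard error.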

Plausible prediction of the missing data depends on the appropriate specification of the regression models used in MICE.74 If a model is misspecified, the distribution of imputed values may not resemble that of the observed values, and thus the estimates of treatment effect may be biased.69 The regression model specified will depend on the type and distribution of the variable to be predicted.70 The variables required for the economic evaluation are costs for each year and EQ-5D scores at each time point. Both are continuous variables and neither is normally distributed; EQ-5D scores in the REFLUX trial are bounded between −0.594 and 1,66 and costs are bounded at zero and tend to present a positive skew. Two approaches to deal with non-normality with MICE have been suggested in the literature:69 (1) transformation towards normality and (2) predictive mean matching. [In predictive mean matching the missing observation is imputed with an observed value from an individual with a similar linear predictor.70 Consequently, the distribution of imputed values tends to closely match the distribution of the observed values.69] Using the REFLUX data set none of the transformation approaches (Box–Cox,77 log-transformation and log-transformation of non-zero values with generation of an indicator variable78) were successful in transforming the data distribution to normality. As a result, predictive mean matching was the strategy employed to ensure that the distribution of imputed values closely resembled the distribution of observed values. All known covariates thought to be associated with the missingness mechanism, costs and EQ-5D scores were included in the prediction equations: EQ-5D scores at each follow-up point, costs at each year, allocation, BMI, age and sex. A total of 100 imputations (m = 100) was used to ensure efficient and reproducible estimates.69
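The idea behind predictive mean matching can be sketched as follows. This is a deliberately simplified illustration with a single predictor and one imputation. It assumes a simple least-squares line and a random draw among the k nearest donors, and it omits the parameter draws and chained cycles that a full MICE procedure uses. All names and values are hypothetical.

```python
import random


def fit_line(x, y):
    """Least-squares intercept and slope for one predictor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def pmm_impute(x, y, k=3, rng=None):
    """Fill missing y (None) with an observed y from a donor with a similar prediction."""
    rng = rng or random.Random(0)
    observed = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    intercept, slope = fit_line([o[0] for o in observed], [o[1] for o in observed])
    # Each donor is stored as (linear predictor, observed value).
    donors = [(intercept + slope * xi, yi) for xi, yi in observed]
    completed = []
    for xi, yi in zip(x, y):
        if yi is not None:
            completed.append(yi)
            continue
        target = intercept + slope * xi
        nearest = sorted(donors, key=lambda d: abs(d[0] - target))[:k]
        completed.append(rng.choice(nearest)[1])  # always an observed value
    return completed


# Hypothetical baseline (x) and follow-up (y) EQ-5D scores; one follow-up is missing.
baseline = [0.2, 0.4, 0.6, 0.8, 0.5]
follow_up = [0.3, 0.5, 0.6, 0.9, None]
print(pmm_impute(baseline, follow_up, k=2))
```

Because every imputed value is borrowed from an observed patient, imputed EQ-5D scores and costs cannot fall outside the range of the observed data.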

Multiple imputation provides unbiased estimates of treatment effect if data are MAR. Whether or not data are MAR is an untestable assumption by definition, as unobserved values are unknown. Departure from the MAR assumption may have implications for decision-making if the results from the cost-effectiveness analysis differ from those of the base case. Sensitivity analysis was used to test the impact on the cost-effectiveness results if data were MNAR, that is, if patients with worse outcomes or greater costs were more likely to have missing data.70,79 Four scenarios were tested. In scenario (1), all patients with missing data had their total QALYs reduced by 10%, 20%, 30%, 40% and 50%. Conversely, in scenario (2), for all patients with missing data costs were increased by the same proportions (10%, 20%, 30%, 40% and 50%). In scenario (3), only surgery patients with missing data had their QALYs reduced. In scenario (4), costs were increased only for patients undergoing surgery.
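The four scenarios amount to rescaling the imputed totals for patients with missing data. The sketch below is one way to express them, assuming a hypothetical record layout with the fields `arm`, `had_missing_data`, `qalys` and `cost`, and assuming that scenario (4) applies only to surgery patients with missing data. It is not the procedure coded in the trial analysis.

```python
def apply_mnar_scenario(patients, qaly_reduction=0.0, cost_increase=0.0, surgery_only=False):
    """Return adjusted copies of patient records under an MNAR scenario.

    qaly_reduction and cost_increase are proportions (0.1 means 10%).
    Only patients flagged as having missing data are adjusted. When
    surgery_only is True the adjustment is limited to the surgery arm.
    """
    adjusted = []
    for patient in patients:
        affected = patient["had_missing_data"] and (
            patient["arm"] == "surgery" or not surgery_only
        )
        qaly_factor = 1.0 - qaly_reduction if affected else 1.0
        cost_factor = 1.0 + cost_increase if affected else 1.0
        adjusted.append({
            **patient,
            "qalys": patient["qalys"] * qaly_factor,
            "cost": patient["cost"] * cost_factor,
        })
    return adjusted


# Hypothetical records (illustrative values only).
cohort = [
    {"arm": "surgery", "had_missing_data": True, "qalys": 4.0, "cost": 3000.0},
    {"arm": "medical", "had_missing_data": True, "qalys": 3.8, "cost": 1500.0},
    {"arm": "surgery", "had_missing_data": False, "qalys": 4.1, "cost": 2900.0},
]
# Scenario (3): QALYs reduced by 20% for surgery patients with missing data.
for row in apply_mnar_scenario(cohort, qaly_reduction=0.2, surgery_only=True):
    print(row["arm"], round(row["qalys"], 2), row["cost"])
```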

Incremental analysis

The cost-effectiveness of surgery was evaluated by comparing the costs and QALYs incurred in the surgery arm with the costs and QALYs in the medical management arm at 5 years of follow-up, using conventional decision rules and estimating ICERs as appropriate.80 If one intervention is associated with greater mean QALYs and lower mean costs it is deemed cost-effective by dominance. The ICER is calculated if neither treatment arm dominates. The ICER summarises the additional costs associated with one intervention over another and relates this to the additional benefits. This ICER is then compared with a threshold for the cost per QALY. The National Institute for Health and Care Excellence (NICE) uses a threshold cost per QALY of around £20,000–30,000 to determine whether or not an intervention represents good value for money in the NHS.65 Consequently, if the ICER is < £20,000, laparoscopic fundoplication could be considered potentially cost-effective. ICERs between £20,000 and £30,000 per QALY are considered borderline and an ICER > £30,000 is not typically considered cost-effective.
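These decision rules can be sketched as a small function. This is a minimal illustration, assuming that the inputs are differences in mean costs and mean QALYs (surgery minus medical management). The function name and the verdict labels are hypothetical. The two examples reuse figures quoted in the review above: the 1-year REFLUX analysis and the totals from Arguedas et al.

```python
def assess_increment(delta_cost, delta_qaly, lower=20000.0, upper=30000.0):
    """Apply dominance and threshold rules to incremental costs and QALYs.

    Returns (verdict, icer); icer is None when one option dominates.
    """
    if delta_qaly == 0:
        raise ValueError("ICER is undefined when the QALY difference is zero")
    if delta_qaly > 0 and delta_cost <= 0:
        return "dominant", None
    if delta_qaly < 0 and delta_cost >= 0:
        return "dominated", None
    icer = delta_cost / delta_qaly
    if delta_qaly < 0:
        # Cheaper but worse: the ratio is a saving per QALY lost, so the
        # usual threshold bands are not applied here.
        return "less costly and less effective", icer
    if icer < lower:
        return "potentially cost-effective", icer
    if icer <= upper:
        return "borderline", icer
    return "not typically cost-effective", icer


# 1-year REFLUX analysis: cost difference 1280, QALY difference 0.066.
print(assess_increment(1280.0, 0.066))
# Arguedas et al.: surgery cost 10,475 vs 8798 and QALYs 4.55 vs 4.59.
print(assess_increment(10475.0 - 8798.0, 4.55 - 4.59))
```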

The ICER can be re-expressed using the net monetary benefit (NMB). The NMB of an intervention is the value of the health benefits gained from a particular intervention compared with standard care in monetary terms, minus the incremental costs of the intervention. The translation of health benefits into the monetary scale was made using a cost-effectiveness threshold of £20,000. This is the threshold commonly used by NICE (this corresponds to 1 QALY being valued at £20,000). Therefore, the NMB provides a measure of the gain (or loss) in resources of investing in a particular intervention when those resources could have been used elsewhere.81 The NMB of laparoscopic fundoplication and medical management were calculated and used to demonstrate the influence of trial duration on the estimates of cost-effectiveness of surgery.
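The NMB calculation can be sketched as follows. This is a minimal illustration of the definition above, in which the incremental NMB equals the threshold multiplied by the incremental QALYs, minus the incremental cost. The function name is hypothetical. The example reuses the 1-year REFLUX differences quoted earlier in this chapter (£1280 and 0.066 QALYs).

```python
def incremental_nmb(delta_cost, delta_qaly, threshold=20000.0):
    """Incremental net monetary benefit: health gain valued at the threshold, minus extra cost."""
    return threshold * delta_qaly - delta_cost


# 1-year REFLUX differences, valued at 20,000 and 30,000 per QALY.
print(round(incremental_nmb(1280.0, 0.066), 2))
print(round(incremental_nmb(1280.0, 0.066, threshold=30000.0), 2))
```

When the QALY difference is positive, a positive incremental NMB is equivalent to an ICER below the threshold, which is why the two measures lead to the same decision.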

As discussed previously, the multiple imputed data set was used as the base case for the cost-effectiveness analysis because of the large proportion of data lost for the complete-case analysis. Because total costs and total QALYs are cumulative quantities, any missing data at any of the follow-up points will result in that patient being dropped from a complete-case analysis. The cost-effectiveness results using the complete case are presented for comparison. Complete-case analysis will provide unbiased estimates only if the data are MCAR, that is, the probability of data being unobserved is independent of both observed and unobserved values. Multiple imputation ensures unbiased estimates if the data are MAR (the probability of data being unobserved is dependent on the observed values but independent of unobserved ones). Because unobserved values are unknown, the missing data mechanism and hence the validity of either assumption is untestable. Nevertheless, multiple imputation presents two advantages. First, it requires a less stringent assumption for ensuring unbiased estimates. Second, if data are MCAR, both complete-case and multiple imputation estimates will be unbiased whereas, if data are MAR, complete-case analysis will be biased.

Analysis of uncertainty for incremental analysis

Sensitivity analysis is used to explore and quantify any uncertainty in the cost-effectiveness results. Three types of sensitivity analysis were undertaken: structural, scenario and probabilistic sensitivity analysis. Structural and scenario sensitivity analyses were carried out on the complete-case data set. Probabilistic sensitivity analysis was carried out in both the complete case and the multiple imputation data set.

Structural sensitivity analysis consisted of a PP analysis that classified patients according to treatment compliance at 1 year of follow-up, that is, whose management at 1 year was consistent with their original random allocation. Consequently, the PP data set consisted of the patients randomised to surgery who actually had surgery, and of the patients randomised to medical management who did not undergo surgery at 1 year. Patients randomised to medical management who had surgery might differ from those randomised to medical management who were managed medically without surgery, for several reasons. A patient's condition might have worsened, prompting surgery, or patients might have changed their preferences and wish to be taken off medication. The latter implies that, had they been screened for the study at the point in time when they had surgery, they would not have been eligible for the study. These patients would have had a preference and would not have accepted randomisation. The condition itself is complex because of its recurrent and cyclical nature (patients suffering from reflux have punctual exacerbations, which can lead them to change their preferences and request surgery). Therefore, the reasons for not complying with randomisation are likely to be a combination of the two motives (worsening of condition and change in preference). PP was chosen because it was thought to be more similar to clinical practice, where patients can experience a wait for surgery and change their preferences during this period. Any switching of treatment after 1 year is assumed to be because of a change in clinical status, which would preclude inclusion in the clinical trial.

The base-case analysis included only the costs of reflux-related GP visits and hospitalisations. Two alternative costing scenarios were tested in sensitivity analysis: including either all GP visits or all hospital use, regardless of whether they had been classified as reflux or non-reflux related.

Probabilistic sensitivity analysis attempts to quantify the joint effect of uncertainty around the costs and QALYs. Semiparametric bootstrapping was used to estimate the probability that each intervention is cost-effective for a range of cost-effectiveness threshold values. In bootstrapping, the original data are sampled with replacement to create a new data set, in order to calculate estimates of treatment effect. Repeating this process a large number of times results in a vector of replicated statistics, which ultimately provide an empirical estimate of the CIs around mean incremental costs and QALYs. The probability of an intervention being the most cost-effective is the conventional method of presenting the uncertainty around the cost-effectiveness results. The CIs around the ICER are not presented because they are difficult to interpret and are not easy to use: a negative ICER can indicate that an intervention dominates (because it is associated with more benefits and lower costs than its comparator) or it is dominated (because it is associated with fewer benefits and higher costs).76
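The bootstrap estimate of the probability of cost-effectiveness can be sketched as follows. This is a simplified, fully non-parametric bootstrap rather than the semiparametric procedure used in the analysis, and the patient-level data are simulated; the function name `prob_cost_effective` and all numerical inputs are hypothetical.

```python
import random


def prob_cost_effective(arm_a, arm_b, threshold, n_boot=2000, seed=0):
    """Proportion of bootstrap replicates in which arm_a has a higher mean
    net monetary benefit than arm_b.

    Each arm is a list of (total_cost, total_qaly) tuples, one per patient.
    """
    rng = random.Random(seed)

    def mean_nmb(sample):
        return sum(q * threshold - c for c, q in sample) / len(sample)

    wins = 0
    for _ in range(n_boot):
        # Resample patients with replacement within each arm, keeping each
        # patient's cost and QALY paired so their correlation is preserved.
        boot_a = rng.choices(arm_a, k=len(arm_a))
        boot_b = rng.choices(arm_b, k=len(arm_b))
        if mean_nmb(boot_a) > mean_nmb(boot_b):
            wins += 1
    return wins / n_boot


# Simulated (hypothetical) patient-level totals for two arms.
data_rng = random.Random(42)
surgery = [(data_rng.gauss(2700, 800), data_rng.gauss(3.7, 0.9))
           for _ in range(150)]
medical = [(data_rng.gauss(1300, 900), data_rng.gauss(3.5, 0.9))
           for _ in range(150)]

# Probability that surgery is cost-effective over a range of thresholds.
for threshold in (0, 10_000, 20_000, 30_000):
    print(threshold, prob_cost_effective(surgery, medical, threshold))
```

Plotting these probabilities against the threshold gives a cost-effectiveness acceptability curve. Working on the NMB scale avoids the sign ambiguity of the ICER described above.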

Validation

Several procedures were used to ensure the validity of the analysis. First, two statistical analysis codes (written in Stata) were developed in parallel and their results compared. Second, the code was developed by one analyst and checked independently by another. Third, the results were cross-checked in Microsoft Excel (Microsoft Corporation, Redmond, WA, USA) for a sample of the data set. Lastly, selected results were represented graphically and examined for face validity. The validity of the imputation strategy was explored by (1) analysing the data for predictors of missingness,70 (2) comparing the distributions of the observed and imputed values graphically70 and (3) estimation of Monte Carlo errors.69 Appendix 8 describes the validation process in more detail.

Results

Patient population

Complete-case analysis consisted of the patients who returned all questionnaires and completed all EQ-5D profiles. Overall, there are 172 patients in the complete-case analysis (88 randomised to medical management and 84 randomised to surgery). Table 32 shows the numbers of questionnaires returned (includes those with some missing data) and the numbers of completed questionnaires returned for each year. As expected, the number of questionnaires returned in each year of follow-up decreases with time. The return of questionnaires does not follow a monotonic pattern, that is, patients who did not return the questionnaire for one particular year may have returned a questionnaire in subsequent years. Therefore, the number of patients in the complete-case analysis is lower than the number of completed questionnaires in year 5. The large number of patients not included in the complete-case analysis because of missing data strengthens the rationale for using the multiple imputation data sets in the base case.
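The effect of a non-monotone return pattern on the complete-case count can be seen in a toy example. The four response patterns below are hypothetical and serve only to show why the number of complete cases can be lower than the number of questionnaires completed in year 5.

```python
# Hypothetical return patterns: True = questionnaire completed in years 1..5.
responses = {
    "p1": [True, True, True, True, True],
    "p2": [True, False, True, True, True],   # skipped year 2, returned later
    "p3": [True, True, True, False, False],  # stopped responding after year 3
    "p4": [True, True, False, True, True],   # skipped year 3, returned later
}

# Patients with a completed questionnaire in year 5.
completed_year5 = sum(r[4] for r in responses.values())

# Patients with a completed questionnaire at every follow-up point.
complete_cases = sum(all(r) for r in responses.values())

print(completed_year5, complete_cases)  # 3 1
```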

TABLE 32

Numbers of questionnaires returned and completed questionnaires returned and corresponding proportions per trial arm, according to ITT analysis

Health-care resource use

Table 33 summarises yearly health-care resource use in the two trial arms according to ITT analysis. During the first year of the trial, 111 patients randomised to surgery and 10 patients randomised to medical management underwent laparoscopic fundoplication. The 111 patients who were randomised to and received surgery constituted the surgery group in the PP analysis. The 169 patients who were randomised to medical management and did not undergo surgery during the first year of follow-up constituted the medical management group in the PP analysis. In the subsequent years of follow-up there were 15 patients who underwent surgery (one patient who had been randomised to surgery and 14 patients who had been randomised to medical management). These patients are included in the overnight hospital admissions category. Patients randomised to medical management reported more hospital and GP visits than the surgery patients over the 5 years of follow-up.

TABLE 33

Health-care resource use per year per trial arm, according to ITT analysis

Table 34 shows the costs associated with health-care use according to ITT analysis for all available cases (see Appendix 9 for corresponding table for PP). All available cases uses data from all questionnaires returned at each time point. Per annum costs and costs per category refer to all available data, that is, to all participants who returned the questionnaire for that particular year or for that particular category. Therefore, the sum of the costs per category is different from the sum of the costs per annum. Similarly, total costs for complete-case analysis do not correspond to the sum of the costs per category or to the sum of the costs per annum because complete case is a subset of all available data because of the non-monotone missing data pattern. Total costs for complete-case analysis refer to the patients who returned all questionnaires and completed all EQ-5D profiles (84 surgery patients and 88 medical management patients).

TABLE 34

Costs associated with resource use for all available cases, discounted from year 2 at 3.5%, according to ITT analysis

Patients randomised to medical management accumulate lower costs than patients randomised to surgery. Table 34 indicates that surgery patients accrued a large proportion of the total costs in the first year, and accumulated lower costs during the remaining 4-year follow-up than the medical management group. In contrast, the costs accrued by medical management patients are evenly distributed across the duration of the trial. These results suggest that the cost trend in medical management patients is steeper than in surgery patients; hence, that cumulative costs in medical management patients tend to increase at a greater rate than in surgery patients. Costs associated with surgery were the major cost driver for the surgery group. Costs associated with reflux-related medication were significantly greater for the medical management group than for the surgery group. Costs associated with admissions to hospital and GP visits were not statistically significantly different between the two groups. Surgery during years 2–5 is accounted for in the overnight hospital admissions. There were a few crossovers from medical management to surgery from year 2; hence, the difference in costs associated with overnight hospital admissions between the two treatment groups is small. These results suggest that patients undergoing surgery in subsequent years are not a major cost driver in determining the cost-effectiveness of surgery.

Quality-adjusted life-years

Table 35 summarises the EQ-5D scores reported at each follow-up point for all available cases (see Appendix 9 for the corresponding table for PP). All available cases uses data from all questionnaires returned at each time point. The surgery group appears to have better HRQoL than the medical management group, despite starting from a lower baseline EQ-5D on average (0.7201 in the medical management group and 0.7107 in the surgery group). The difference in HRQoL between the two treatment groups decreased with time. This may be due to patients randomised to medical management undergoing surgery throughout the follow-up period and/or to diminishing treatment effect over time.

TABLE 35

Health-related quality of life (EQ-5D) for all available cases according to ITT analysis

Comparison of costs and quality-adjusted life-years between multiple imputation and complete case

Table 36 shows the comparison of the total costs per year between the complete-case data set and the multiple imputation results. Complete case includes only those participants who returned all questionnaires and fully completed the EQ-5D questionnaires. The similarity of both the means and the CIs provides some reassurance of the validity of the multiple imputation model. The distribution of costs and EQ-5D scores in the imputed data sets matches reasonably well the distribution of the original data (see Appendix 8 for details). Furthermore, the Monte Carlo errors are < 15% of the coefficient and CI estimates, suggesting that 100 imputations are sufficient to ensure reproducibility and statistical efficiency.69

TABLE 36

Comparison between the complete-case and multiple imputation data sets for costs and HRQoL, according to ITT allocation

For both the complete-case and multiple imputation data sets, the participants randomised to laparoscopic fundoplication accrued greater costs but also reported greater HRQoL than participants randomised to continued medical management. The 95% CI for mean incremental QALYs crosses zero for the estimates unadjusted for baseline, whereas it remains above zero for the adjusted values. This result reflects the baseline imbalance in mean utility between treatment groups. Therefore, these results strongly indicate that surgery is associated with a greater QALY improvement than medical management. The sum of the differences in EQ-5D for the ITT groups does not correspond to the incremental mean QALYs because of the effect of discounting.
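The effect of discounting on a 5-year total can be sketched as follows. The convention used here, in which year 1 is undiscounted and year t is divided by 1.035 raised to the power t − 1, is an assumption consistent with "discounted from year 2 at 3.5%"; the yearly values are hypothetical.

```python
def discounted_total(yearly_values, rate=0.035):
    """Sum yearly values, leaving year 1 undiscounted and dividing the value
    for year t (t >= 2) by (1 + rate) ** (t - 1)."""
    # enumerate starts at 0, so the exponent is 0 for year 1, 1 for year 2, ...
    return sum(v / (1 + rate) ** t for t, v in enumerate(yearly_values))


# Hypothetical yearly QALY differences between two groups over 5 years.
yearly_qaly_gain = [0.05, 0.05, 0.05, 0.05, 0.05]

undiscounted = sum(yearly_qaly_gain)
discounted = discounted_total(yearly_qaly_gain)
print(f"undiscounted: {undiscounted:.4f}  discounted: {discounted:.4f}")
```

The discounted total is smaller than the simple sum of the yearly differences, which is why the two quantities in the tables do not coincide.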

Cost-effectiveness

The results of the incremental analysis suggest that laparoscopic fundoplication is a cost-effective strategy for GORD patients eligible for the REFLUX trial (Table 37). The results for the complete-case analysis concur with those for the multiple imputation data set; whether or not incremental QALYs are adjusted for baseline EQ-5D, the ICERs range between £5468 and £8410, well below conventional cost-effectiveness thresholds of £20,000 and £30,000 per additional QALY. For both data sets (complete case and multiple imputation), the probability of surgery being the more cost-effective intervention is > 0.82 for incremental analysis unadjusted for baseline EQ-5D and > 0.93 once incremental QALYs are adjusted for baseline EQ-5D. In the ITT analysis the ICER is higher for the multiple imputed data than for the complete case if QALYs are adjusted for baseline EQ-5D, but lower if QALYs are unadjusted. This might reflect the effect of having baseline EQ-5D in the prediction model, which would preclude the need for adjustment.

TABLE 37

Incremental analysis for the ITT analysis at 5 years of follow-up for the complete-case and multiple imputation data sets

Figure 20 shows how the NMB associated with laparoscopic fundoplication increases with the duration of the trial. This reflects the increase in costs associated with the medical group, which offsets the initial investment made in laparoscopic fundoplication in the surgery group.

FIGURE 20. Net monetary benefit (incremental QALYs × £20,000 per QALY – incremental costs) over the duration of the REFLUX trial for the multiple imputation and complete-case data sets (QALYs adjusted by baseline EQ-5D).

Structural sensitivity analysis: per-protocol status for the complete case

Structural sensitivity analysis consisted of PP status at 1 year for the complete case. In the PP analysis patients are classified according to the treatment actually received at 1 year of follow-up. The PP group consists of 111 patients who were randomised to surgery and who actually had surgery during the first year of the trial and 169 patients who were randomised to medical management and who did not undergo surgery during this time period. However, complete-case data exist only for 84 medical management patients and 66 laparoscopic fundoplication patients. Appendix 9 presents detailed results for costs and HRQoL according to PP analysis. As expected, patients who actually had surgery have higher costs than patients who did not undergo surgery, regardless of their randomisation. Table 38 summarises the incremental results of the PP analysis. Similar to the ITT analysis, the surgical policy is likely to be cost-effective at conventional (NICE) thresholds for cost-effectiveness. The incremental costs are higher and the incremental QALYs lower for the PP analysis (for surgery compared with medical management) than for the ITT analysis if no adjustment is made for baseline imbalances in EQ-5D. Therefore, the ICER is also greater (surgery is less cost-effective than suggested by the ITT analysis). Once total QALYs are adjusted for baseline EQ-5D, however, the incremental mean QALYs increase substantially and the ICER is reduced. Nevertheless, the adjusted ICER in the ITT analysis is lower than that in the PP analysis by around £2000.

TABLE 38

Incremental analysis for the PP analysis at 5 years of follow-up for the complete-case data set

Scenario sensitivity analysis: all general practitioner and all hospital costs for complete case

The results of the scenario analyses strengthen the case for the surgical policy (Table 39). For scenario 1, replacing reflux-related GP costs by all GP costs, the ICER increased slightly in relation to the base case. Nevertheless, the ICER remains well below conventional thresholds and the probability of surgery being cost-effective is > 0.83, for both adjusted and unadjusted analyses. In scenario 2, replacing reflux-related hospital costs by all hospital costs, medical management was ‘dominated’ by the surgical policy because of this intervention being associated with greater benefits in terms of QALYs and lower costs. For this scenario the probability of surgery being cost-effective was > 0.93.

TABLE 39

Incremental analysis for the scenario sensitivity analysis at 5 years of follow-up for the complete-case data set

Sensitivity analysis for the multiple imputation model: departure from missing at random assumption

The multiple imputation procedure assumes that the individuals who completed and returned all questionnaires are similar to the individuals who did not, conditional on their observed characteristics (MAR assumption).69,79 However, this may not be the case: patients who did not return a questionnaire may have experienced worse HRQoL and accrued higher health service costs, or vice versa. Sensitivity analysis on the multiple imputation model tested how sensitive the cost-effectiveness results are to the MAR assumption. Figure 21 represents the change in NMB adjusted for baseline EQ-5D as costs and QALYs are varied in patients with missing data. The origin, marked as ‘base case’, refers to the incremental results from the multiple imputed data set (ICER = £7028 per additional QALY). The right quadrant plots NMB after increasing the total costs in steps of 10% for patients for whom there was missing data, for both treatment groups and for surgery-allocated patients. The left quadrant plots NMB after decreasing total QALYs in similar fashion. Positive values for NMB indicate that surgery is cost-effective; negative values indicate that surgery is not cost-effective for a threshold of £20,000 per additional QALY.

FIGURE 21. Net monetary benefit (incremental QALYs adjusted for baseline EQ-5D × £20,000 per QALY – incremental costs) over variation in total costs and total QALYs in the multiple imputed data set.

The cost-effectiveness of surgery is relatively insensitive to any increase in costs; the NMB changes little if costs are increased for patients with missing data in both treatment groups and if costs are increased just for surgery-allocated patients with missing data. A similar result is observed for the reduction in total QALYs for all patients with missing data. In contrast, the cost-effectiveness of surgery is highly sensitive if it is assumed that surgery-allocated patients with missing data experience lower HRQoL than patients with complete data. A 10% decrease in QALYs for patients randomised to surgery with missing data results in NMB decreasing to negative values. This scenario shows that missing data can have an impact on the results under certain conditions. It is impossible to empirically confirm or refute the scenario from the data in the trial, but it could be considered an extreme case. It seems improbable in practice that surgical patients with poor quality of life are less likely to respond to follow-up questionnaires than similar participants undergoing medical management.
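The mechanics of this sensitivity analysis can be sketched in a few lines. The example is a minimal illustration under stated assumptions: the four patients, their costs and QALYs, the `had_missing` flag and both function names are hypothetical, and the sketch ignores adjustment for baseline EQ-5D and the pooling across imputed data sets.

```python
def incremental_nmb(surgery, medical, threshold=20_000):
    """Incremental NMB of surgery versus medical management.

    Each arm is a list of dicts with keys 'cost', 'qaly' and 'had_missing'.
    """
    def mean(arm, key):
        return sum(p[key] for p in arm) / len(arm)

    d_cost = mean(surgery, "cost") - mean(medical, "cost")
    d_qaly = mean(surgery, "qaly") - mean(medical, "qaly")
    return d_qaly * threshold - d_cost


def scale_missing(arm, key, factor):
    """Copy of the arm with `key` multiplied by `factor` for patients whose
    totals relied on imputed data; other patients are left unchanged."""
    return [dict(p, **{key: p[key] * factor}) if p["had_missing"] else dict(p)
            for p in arm]


# Hypothetical patient-level totals (not trial data).
surgery = [{"cost": 2800.0, "qaly": 3.80, "had_missing": False},
           {"cost": 2600.0, "qaly": 3.60, "had_missing": True}]
medical = [{"cost": 1400.0, "qaly": 3.55, "had_missing": False},
           {"cost": 1200.0, "qaly": 3.45, "had_missing": True}]

base = incremental_nmb(surgery, medical)
costs_up_both = incremental_nmb(scale_missing(surgery, "cost", 1.1),
                                scale_missing(medical, "cost", 1.1))
qalys_down_surgery = incremental_nmb(scale_missing(surgery, "qaly", 0.9),
                                     medical)
print(round(base), round(costs_up_both), round(qalys_down_surgery))
```

In this toy data set the NMB barely moves when costs are inflated in both arms but turns negative when QALYs are reduced only for surgery-allocated patients with missing data, which mirrors the qualitative pattern described above.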

Conclusion

The results of the within-trial economic analysis suggest that laparoscopic fundoplication is the more cost-effective option for the management of the sorts of patients suffering from GORD who were eligible for the REFLUX trial. The ICER for the ITT approach in the complete case was between £5468 and £8410 per additional QALY, and for the multiple imputed data set was between £7028 and £7792 per additional QALY, depending on whether QALYs are unadjusted or adjusted for baseline. Adjusted results are likely to be more reflective of the improvement in HRQoL associated with surgery. The probability of surgery being cost-effective was > 0.80 for all analyses. The results are robust to the scenario analyses testing assumptions regarding resource-use and missing data mechanism apart from when surgery-allocated patients with missing data were assumed to experience lower HRQoL than other patients. In all scenarios the ICERs were similar to the base case ICERs and well below NICE cost-effectiveness thresholds.

Validation of within-trial (5-year) analysis and exploration of the need for a long-term model

Introduction

The within-trial analysis found that surgery was cost-effective over a 5-year time horizon. A sufficient condition for surgery to be unambiguously cost-effective over a longer term is that, in each year after 5 years, HRQoL is lower and costs are the same or increasing faster in the medical group than in the surgical group. The results from both the multiple imputation and the complete-case analysis suggest that surgery is likely to be a cost-effective alternative over the longer term. Based on the ITT analyses undertaken so far, it is unlikely that mean HRQoL in patients who had surgery will become lower than that in patients on medical management after 5 years, and it is also very unlikely that mean annual costs incurred by surgery patients will exceed those incurred by medical management patients. If these results are robust, then there is no need to develop an economic model to extrapolate the 5-year results over a longer time horizon. Surgery would simply become more cost-effective over time.
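The logic of the sufficient condition can be made explicit with a short sketch. The yearly increments below are hypothetical, discounting is ignored for simplicity, and `cumulative_nmb` is an illustrative helper rather than part of the trial analysis. If, in every additional year, surgery yields no fewer QALYs and no higher costs than medical management, each year adds a non-negative amount to the cumulative NMB.

```python
def cumulative_nmb(yearly_d_cost, yearly_d_qaly, threshold=20_000):
    """Cumulative incremental NMB (surgery minus medical) by year."""
    total = 0.0
    path = []
    for d_cost, d_qaly in zip(yearly_d_cost, yearly_d_qaly):
        total += d_qaly * threshold - d_cost
        path.append(total)
    return path


# Hypothetical increments: a large up-front cost for surgery in year 1,
# then lower yearly costs and a constant yearly QALY gain thereafter.
yearly_d_cost = [2000.0, -150.0, -150.0, -150.0, -150.0, -150.0, -150.0]
yearly_d_qaly = [0.05] * 7

path = cumulative_nmb(yearly_d_cost, yearly_d_qaly)
print([round(v) for v in path])
```

Under these assumptions the cumulative NMB never falls after year 1, so extending the time horizon can only make surgery look more favourable.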

This section develops a statistical model to investigate whether or not the results obtained in the within-trial economic analysis are robust to alternative assumptions and methods, and uses the results to consider whether or not the evidence supports this sufficient condition over the longer term.

Methods

Overview

The aim of this analysis was to estimate the difference in costs and the difference in HRQoL (measured with the EQ-5D) between the surgical and medical management randomised groups and describe how this difference evolves over time. A simple way of doing this would be to estimate the difference in costs and outcomes at each time point independently. The results of this analysis were shown in Table 34 (for costs) and Table 35 (for EQ-5D). These showed that costs were greater in the surgical group in the first year but greater in the medical group thereafter. EQ-5D tended to be higher in the surgical group in years 4 and 5 but the CIs crossed zero. There are two main limitations of this simple analysis:

  1. The outcomes at each time point are unlikely to be independent. If the outcomes at one time point are correlated with those at other time points this analysis may lead to biased estimates of standard errors.
  2. The analysis does not take account of missing data. If missing data are not MCAR then this analysis may lead to biased estimates of the mean of the coefficients.

The multiple imputation accounts for the correlation of responses from the same individuals and for the missing data (see Table 36). However, the validity of this analysis depends on the correct specification of the equations used to impute the missing data. Moreover, other regression-based methods are available for handling missing data in longitudinal studies, principally mixed models, and results may be sensitive to the methods used. This section uses a mixed model to handle the missing data and compares predicted outcomes with those using multiple imputation.

Mixed models

A mixed model is a regression-based method for handling continuous data that is measured at more than one time point during follow-up. It allows estimation of treatment effects under the assumption that the data are MAR, that is, dropout may depend on intermediate values. Analysing each time period separately assumes that dropouts are MCAR, a stronger assumption. A mixed model uses all of the observed data. Individuals who dropped out after providing intermediate data contribute to the estimation of the final outcomes. This analysis has the same aims as multiple imputation but uses a different method and with different assumptions. Therefore, it can also be viewed as a sensitivity analysis to test the robustness of the multiple imputation.

The mixed model can be written, for an individual i, as:

$$Y_i = \alpha + \beta R_i + X_i + e_i, \qquad e_i \sim \mathrm{MVN}(0, \Sigma)$$

where

$R_i$ = randomised group

$Y_i$ = vector of all outcomes (at times 1…T)

$X_i$ = vector of covariates

The variance-covariance matrix $\Sigma$ is unstructured, that is, no prior assumptions are made about the values of the correlations. Separate models are fitted for costs and for EQ-5D. Baseline values of the EQ-5D are included as an ‘outcome’ (i.e. at t = 1). Dummies representing time points 1 to T were included as covariates $X_i$. Treatment effects are included as time × randomised group interactions, although no treatment effect at baseline is allowed. No other covariates are included in the model.
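The structure of the fixed-effects part of such a model can be illustrated by constructing one row of the design matrix. This is a sketch of the covariate coding only, not of the model fitting; the function name `design_row` and the column ordering are assumptions, and the overall intercept is omitted here because a full set of time dummies already spans it.

```python
def design_row(t, randomised_to_surgery, n_times):
    """Design-matrix row for an observation at time point t (1 = baseline).

    Columns: one dummy per time point 1..n_times, followed by one
    time-by-group interaction per post-baseline time point 2..n_times.
    No interaction column exists for t = 1, so the two randomised groups
    are constrained to share the same baseline mean.
    """
    if not 1 <= t <= n_times:
        raise ValueError("t must lie between 1 and n_times")
    time_dummies = [1 if t == k else 0 for k in range(1, n_times + 1)]
    interactions = [1 if (randomised_to_surgery and t == k) else 0
                    for k in range(2, n_times + 1)]
    return time_dummies + interactions


# With three time points (baseline plus two follow-ups):
print(design_row(1, True, 3))   # [1, 0, 0, 0, 0]
print(design_row(3, True, 3))   # [0, 0, 1, 0, 1]
print(design_row(3, False, 3))  # [0, 0, 1, 0, 0]
```

The coefficients on the interaction columns are then the treatment effects at each follow-up point, which is what Figures 22 and 23 report for costs and EQ-5D.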

Results

Costs

Figure 22 shows the difference in costs (excluding initial surgery) in years 1–5. Mean costs are greater in the medical management arm of the trial after the first year and the CIs only just cross zero. These results are very similar to those of Table 34.

FIGURE 22. Difference in costs (£) excluding initial surgery (mean, 95% CI).

European Quality of Life-5 Dimensions

Figure 23 shows the difference in EQ-5D at 3 months and in years 1–5. Mean HRQoL tends to be greater in the surgical group during the trial, although the CIs cross zero in some periods. These results are very similar to those of Table 35.

FIGURE 23. Difference in EQ-5D, adjusted for baseline (mean, 95% CI).

Conclusion

The results of the mixed model (taking account of correlations and missing data) are very similar to those of the complete-case analyses (which assumed that data at different time points were independent) and the multiple imputation (see Table 36). All of these analyses show that follow-up costs are significantly greater in the medical management arm of the trial (because of greater reflux-related hospital admissions, GP visits and use of medication). The analyses also show that surgery tends to be more effective, in terms of HRQoL, than medical management over the 5-year follow-up. Although this treatment difference appears to weaken over time, there is no reason to expect that surgery will become less effective with a longer follow-up. Consequently, the evidence suggests that the cost-effectiveness of laparoscopic fundoplication will not diminish if measured over a longer follow-up time. Nevertheless, there is uncertainty surrounding these conclusions because of the large proportion of missing data.

Discussion

The results of the cost-effectiveness analysis strongly suggest that a policy of offering laparoscopic fundoplication to people with GORD who require long-term PPI treatment for symptom control is more cost-effective than continuing to manage them with PPIs (with selective use of surgery if symptoms are poorly controlled), assuming that the cost-effectiveness thresholds used by NICE (£20,000–30,000 per QALY) are appropriate for the NHS. Surgery represents a greater initial investment and lower medium-term costs, whereas costs associated with medical management remain relatively constant or slightly increase over time. The difference in HRQoL achieved with surgery is sustained over 5 years, although the results indicate that mean EQ-5D scores for surgery and medical management tended to converge (as discussed in Chapter 3, in part this reflects later surgery in patients randomised to medical management). The ICER favours surgery when incremental QALYs are both adjusted and unadjusted for baseline EQ-5D. Nevertheless, adjusted incremental QALYs are likely to be a more reliable estimate of treatment effect as they account for differences in baseline utility. Patients randomised to medical management reported higher baseline utility than patients randomised to surgery. Failure to adjust for these baseline differences could result in a biased ICER, as discussed elsewhere.62 The results from the multiple imputed data set are likely to be more accurate than the results from the complete-case analysis because of the large number of patients with incomplete data (> 50%). Therefore, multiple imputation was chosen for the base case. Nevertheless, the results are similar across the data sets and laparoscopic surgery is the more cost-effective intervention for both.

There is little uncertainty regarding the cost-effectiveness results once adjustment for EQ-5D at baseline is performed. The probability of surgery being cost-effective ranged between 0.932 and 0.999 for the base case and across the scenarios tested. Furthermore, it is clear from the results of the scenario analysis that the base-case results are robust to alternative costing assumptions. The PP analysis is used to test whether or not the ITT analysis is potentially misleading because of the dilution of treatment effect (some patients randomised to surgery did not have surgery and some patients randomised to medical management actually had surgery). The PP analysis has the advantage of mimicking clinical practice and could be thought to be more relevant to decision-makers. However, the PP analysis is not without its limitations. First, and as with any PP analysis, it is sensitive to selection bias because of breaking randomisation. Second, the PP analysis may still underestimate the effect of surgery because patients having surgery between 2 and 5 years are counted as medical management. Third, the PP analysis is actually a subset of the ITT groups, which further reduces the data set. For these reasons, the ITT results are likely to be more reliable. It is important to characterise any uncertainty in the analysis as failure to do so can result in inaccurate estimates of cost-effectiveness, particularly when costs and benefits are highly skewed.82 In addition, any analyses of uncertainty can help to illustrate where caution should be exercised when interpreting the results of a cost-effectiveness analysis. The results of the sensitivity analyses suggest that the uncertainty is likely to be driven by HRQoL. If QALYs for randomised surgery patients with missing data are reduced, surgery may no longer be cost-effective.

For the within-trial analysis no assumptions are needed about the longer-term effectiveness and costs associated with surgery and medical management. However, the within-trial analysis has some disadvantages. First, it does not account for any differences in costs and QALYs that may be expected over the longer term (> 5 years), which could be due to differences in recurrence/relapse, medication use, NHS service utilisation or HRQoL. Second, it uses data only from the REFLUX trial and does not consider other sources of evidence. Third, only a limited range of sensitivity analyses was possible. Finally, the large proportion of missing observations required an assumption regarding the mechanism of missing data, which may have some impact on the cost-effectiveness estimates. The exploration of the need for a longer-term model aimed to address the first limitation of the within-trial analysis. A mixed model was used to examine the trend in the difference in costs and the difference in QALYs between treatment groups over time.

No evidence was found to suggest that the cost-effectiveness of surgery diminishes over a longer follow-up time. Both multiple imputation and mixed models are commonly used methods to handle missing data. Multiple imputation was used in the previous section because, by imputing missing data, it naturally allows the estimation of the total cost and total QALYs for each patient in the trial. Furthermore, it can handle correlation between several outcomes (in this case costs and QALYs) as well as correlation between outcomes over time. Mixed models do not explicitly impute missing data but adjust the estimates of the differences between treatment groups at each discrete follow-up time to take account of the missing data. The approach therefore offers an alternative method to multiple imputation to examine trends in the difference in costs and the difference in QALYs between treatment groups over time. Because the analyses using multiple imputation and mixed models agree, we can have more confidence that the results are valid and that surgery is the most cost-effective intervention.

A number of other studies have quantified the cost-effectiveness of laparoscopic fundoplication and medical management. Not all of these, however, use a common metric (such as QALYs) to measure benefits. Of those studies quantifying the benefits associated with the two treatments using QALYs, the ICER for surgery ranged from £180 to £49,000. There are a number of key differences between the methodologies used in the studies, which limit the extent to which comparisons can be made between results. Importantly, only two of the studies are based on within-trial analysis: those by Grant et al.1 and Goeree et al.46 The remainder use modelling techniques either to extrapolate short-term trial results over the longer term or to pool available evidence to generate estimates of costs and outcomes. Comparing the results from Grant et al.1 with those from Goeree et al.46, we can see that there are substantial differences in the estimates of cost per QALY, from £19,000 in Grant et al.1 to £49,000 in Goeree et al.46 This difference is primarily driven by the difference in QALYs. In Goeree et al.46 the EQ-5D score is actually lower in the surgical group than in the medical management group (this is unadjusted for baseline imbalances), whereas the HUI3 score is higher for the surgical group than for the medical management group. The reason for the difference between the EQ-5D and the HUI3 scores is not discussed in the paper. The cost differences in the two studies were similar. Comparing the results from Goeree et al.46 with those from the updated trial analysis, we see an even starker contrast between the ICERs produced (£49,000 vs £7028). Again, this is driven by the differences in EQ-5D scores observed throughout the trial period. The EQ-5D scores in the REFLUX 5-year analysis are consistently higher in the surgery group than in the medical management group, although there is a tendency for convergence towards the end of the follow-up period. Further research is required to look at why the trials produce such different results using the EQ-5D.
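The sensitivity of the ICER to the QALY difference can be illustrated with a minimal sketch. The ICER is the difference in mean cost divided by the difference in mean QALYs (surgery minus medical management). The function name `icer` and all input values below are hypothetical and are not taken from REFLUX or from any of the studies cited; they simply show how, for a fixed cost difference, a smaller QALY gain produces a much larger cost per QALY.

```python
def icer(delta_cost: float, delta_qaly: float) -> float:
    """Incremental cost per QALY gained (surgery minus medical management)."""
    if delta_qaly <= 0:
        # With no QALY gain the ratio is not a meaningful cost per QALY gained.
        raise ValueError("ICER is not meaningful when no QALYs are gained")
    return delta_cost / delta_qaly


# Hypothetical inputs for illustration only, NOT data from any trial.
delta_cost = 1500.0  # assumed additional cost of surgery, in pounds
for delta_qaly in (0.20, 0.10, 0.03):  # assumed QALY gains
    value = icer(delta_cost, delta_qaly)
    print(f"QALY gain {delta_qaly:.2f}: ICER = £{value:,.0f} per QALY")
```

Under these assumed inputs, holding the cost difference constant while the QALY gain shrinks from 0.20 to 0.03 moves the ICER from a few thousand pounds to tens of thousands of pounds per QALY, which is the pattern described above when cost differences are similar but QALY differences are not.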

Other considerations

The generalisability of these findings to the GORD population in the UK is difficult to ascertain because the proportion of GORD patients meeting the entry criteria for this trial is uncertain. The surgeons participating in the trial may be more proficient in the procedure than those in routine practice. Furthermore, capacity constraints may prevent surgery from being offered to all potentially eligible patients.

Copyright © Queen's Printer and Controller of HMSO 2013. This work was produced by Grant et al. under the terms of a commissioning contract issued by the Secretary of State for Health. This issue may be freely reproduced for the purposes of private research and study and extracts (or indeed, the full report) may be included in professional journals provided that suitable acknowledgement is made and the reproduction is not associated with any form of advertising. Applications for commercial reproduction should be addressed to: NIHR Journals Library, National Institute for Health Research, Evaluation, Trials and Studies Coordinating Centre, Alpha House, University of Southampton Science Park, Southampton SO16 7NS, UK.

Included under terms of UK Non-commercial Government License.

Bookshelf ID: NBK260636
