Included under terms of UK Non-commercial Government License.
NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.
Johnston R, Uthman O, Cummins E, et al. Canagliflozin, dapagliflozin and empagliflozin monotherapy for treating type 2 diabetes: systematic review and economic evaluation. Southampton (UK): NIHR Journals Library; 2017 Jan. (Health Technology Assessment, No. 21.2.)
Canagliflozin, dapagliflozin and empagliflozin monotherapy for treating type 2 diabetes: systematic review and economic evaluation.
Three submissions were received, from:
- Janssen for canagliflozin
- AstraZeneca for dapagliflozin
- Boehringer Ingelheim for empagliflozin.
The submissions had three main sections:
- a review of the evidence on clinical effectiveness and safety
- an NMA comparing SGLT2 inhibitors with comparators
- cost-effectiveness analysis.
Clinical effectiveness
As regards clinical effectiveness, the evidence provided by the manufacturers was very similar to that presented earlier in this report. The same trials were presented. The submissions were of good quality and we have very few comments.
The Janssen submission included 52-week results from an extension to the CANTATA-M study,92 which we omitted because there was no comparison group. In brief, the 52-week data showed that the reductions in HbA1c were largely maintained (reductions on 100 mg 0.91% at 26 weeks and 0.81% at 52 weeks; reductions on 300 mg 1.16% at 26 weeks and 1.11% at 52 weeks). However, a little more weight was lost by 52 weeks.
The Boehringer submission included data from a 76-week extension study, which had been published in abstract form only.82 Almost 40% of patients dropped out, leading to extensive use of last observation carried forward (LOCF), which is not a reliable method because people do not drop out at random. It is likely that those who stayed in were doing better than those who dropped out.
The Boehringer submission made a useful point about adherence to therapy. This would apply not just to diabetes medications: people with diabetes tend to have comorbidities such as hypertension and osteoarthritis (due to excess weight) and so may be on other medications for other conditions. Donnan et al.182 reported that the more medications were prescribed and the more complex the regimen, the poorer the compliance. Eli Lilly now market a combination tablet with empagliflozin and linagliptin.
One omission from the AstraZeneca submission was any mention of cancer risk. The FDA were concerned about breast, prostate and bladder cancer, even though in none of these cases was the risk statistically significantly raised.183 In the trials, there were nine cases of bladder cancer amongst 5501 subjects in the dapagliflozin group versus one amongst 3516 in the placebo arms. Some of these cancers appeared too soon after the patients started dapagliflozin for credible causality, and all but one of the patients had had microscopic haematuria, suggestive of bladder pathology, before starting the drug or within 6 months of doing so.184 One hypothesis is that an increased UTI rate in patients on dapagliflozin leads to increased testing of urine and hence to detection of bladder tumours, but 7 of the 10 patients diagnosed with bladder cancer had not had UTIs.184
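As a rough illustration of the size of the bladder cancer imbalance, the counts quoted above can be turned into crude proportions. This is only a sketch: it ignores differences in follow-up time between arms, which a formal analysis would account for, and the function name is hypothetical.

```python
# Crude event proportions from the bladder cancer counts quoted in the text.
# ASSUMPTION: simple proportions with no adjustment for exposure time.

def crude_proportion(events: int, subjects: int) -> float:
    """Return events divided by subjects."""
    if subjects <= 0:
        raise ValueError("subjects must be positive")
    return events / subjects

dapagliflozin = crude_proportion(9, 5501)  # dapagliflozin arms
placebo = crude_proportion(1, 3516)        # placebo arms

print(f"dapagliflozin: {dapagliflozin * 1000:.2f} per 1000 subjects")
print(f"placebo:       {placebo * 1000:.2f} per 1000 subjects")
print(f"crude ratio:   {dapagliflozin / placebo:.1f}")
```

With only 10 events in total, any such ratio is very imprecise, which is consistent with the risk not being statistically significantly raised.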
Breast cancer was observed in nine patients (0.04% of female patients) in the dapagliflozin arms but in none of the placebo groups. However, two cases were diagnosed within 6 weeks of starting dapagliflozin so were certainly not due to the drug.
There were 10 cases of prostate cancer in the dapagliflozin arms (0.34%) versus three cases in the placebo arms (0.16%).
Some of the less common cancers occurred less often in the dapagliflozin groups (though the 95% CIs included no difference), and overall there was no difference in rates for all cancers.
It is difficult to explain the differences in bladder and breast cancer, but it seems unlikely that dapagliflozin is the cause.
Network meta-analyses
There were marked differences amongst the NMAs. For example, the AstraZeneca meta-analysis included seven trials of sulfonylureas, with five trials involving glibenclamide. The Janssen meta-analysis included nine trials of sulfonylureas, with five trials comparing glibenclamide with other sulfonylureas and two of glibenclamide against pioglitazone. Only one trial was in both NMAs.
The Boehringer NMA included 22 trials involving sulfonylureas: glibenclamide, seven; glimepiride, six; gliclazide, six; glipizide, three; and tolbutamide, one.
Of the seven sulfonylurea trials in the AstraZeneca NMA, four were also in the Boehringer NMA. Of the nine sulfonylurea trials in the Janssen NMA, three were also in the Boehringer NMA. Only one trial was in all three of the manufacturers’ NMAs.
AstraZeneca
The AstraZeneca NMA starts with a major assumption with which the AG disagrees, which is that the classes of drugs (sulfonylureas, TZDs, DPP-4 inhibitors and SGLT2 inhibitors) can be grouped. In the case of the TZDs, this does not matter because all of the trials cited include pioglitazone. However, our view is that the sulfonylureas have different effects, and that gliclazide is the sulfonylurea of choice, as explained in Chapter 1.
We also disagree with the assumption by AstraZeneca that when monotherapy fails, NPH insulin would be started. This seems strange when there is such a range of oral medications that can be tried. We note that a recommendation to introduce insulin as a second drug was one option in the consensus statement by a group on behalf of the ADA and the European Association for the Study of Diabetes in 2006.185 However, this consensus was strongly criticised by a larger group of experts as being based more on opinion than evidence.186
One problem with the AstraZeneca NMA is the data reported in the forest plot (figure 4.6) for the pooled sulfonylureas, which include glibenclamide, glimepiride, glipizide and one gliclazide trial. The net effect size in HbA1c lowering is 0.12%, which is unusually low. Two trials provide 85% of the weight in this meta-analysis, Rosenstock 2013187 and Shihara 2011.188 In the forest plot for the Shihara trial,188 glimepiride is shown as reducing HbA1c by 0.10%, and in the Rosenstock trial187 glipizide is shown as increasing HbA1c by 0.03%. These results are not credible.
In the Rosenstock trial,187 about half of the patients left the trial before conclusion, with 21.5% of the glipizide group doing so because they needed additional ‘rescue’ treatment because of hyperglycaemia. About half of the recruits had been on glucose-lowering drugs before entry, and had a 4-week washout period. However, the primary analysis included the rescued patients and this is reflected in one of the analyses, which reported a 0.09% reduction in HbA1c. (It is not clear where the rise of 0.03% in the AstraZeneca forest plot comes from.) The baseline HbA1c in the glipizide group was 7.45%, and 33% had baseline HbA1c of 7.0% or less. So a large reduction in HbA1c would not be expected. However, if the rescue group is removed, those completing the trial had a mean reduction in HbA1c of 0.31% (from text) or about 0.5% (from graph).
The Shihara 2011 trial188 compared glimepiride and pioglitazone monotherapy in drug-naive Japanese patients. Baseline HbA1c was 7.8% in the glimepiride group and it fell to 6.8% by 6 months (from graph; the text reports a reduction to 6.9% at 3 months). It is not clear where the 0.1% figure used in the AstraZeneca meta-analysis comes from, though we note that the HbA1c difference between glimepiride and pioglitazone at 3 months was 0.1%.
One other sulfonylurea trial in the forest plot is shown as having a very small reduction in HbA1c. This is Erem 2014,158 which was used in the AG NMA. The AstraZeneca forest plot reports a reduction in HbA1c of 0.14% compared with placebo. There was no placebo group in Erem 2014158 which compared gliclazide with pioglitazone and metformin. The HbA1c was reduced from 8.26% at baseline in the gliclazide group to 6.92% at 6 months, so a more credible reduction against placebo might have been to use the 1.34% before-and-after figure.
Given that the Rosenstock187 and Shihara188 trials dominate the meta-analysis, the sulfonylurea section of it is not credible. It contains eight trials but the others are smaller and carry less weight. Apart from the Erem trial,158 the HbA1c results in the other five are as expected from sulfonylureas, showing reductions ranging from 0.6% to 1.8%.
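The way two heavily weighted trials can pull a pooled estimate towards their own values can be sketched with simple inverse-variance weighting. This is an illustrative fixed-effect calculation, not the Bayesian model used in the submission; the standard errors and the effects assigned to the smaller trials are hypothetical, and only the two near-zero effects echo figures quoted from the forest plot.

```python
def inverse_variance_pool(effects, std_errors):
    """Fixed-effect pooled estimate and each trial's share of the total weight."""
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total
    shares = [w / total for w in weights]
    return pooled, shares

# HYPOTHETICAL inputs: change in HbA1c (%) versus placebo and its standard error.
# The first two trials are precise and show almost no effect; the remaining
# three are less precise and show typical sulfonylurea reductions.
effects = [-0.10, 0.03, -0.6, -1.0, -1.8]
std_errors = [0.05, 0.05, 0.15, 0.15, 0.15]

pooled, shares = inverse_variance_pool(effects, std_errors)
print(f"pooled effect: {pooled:.2f}")
print(f"weight of first two trials: {shares[0] + shares[1]:.0%}")
```

Under these assumed inputs the pooled value sits close to the two dominant trials, even though most trials show much larger reductions.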
However, these problems may just affect the forest plot. In appendix 8.9, the reduction attributed to glipizide in the Rosenstock trial187 is 0.23%, still smaller than usual but more credible. The reduction stated in this table for glimepiride in the Shihara trial188 is 1.0%. In addition, the caterpillar figure 8.9 in the appendices looks reasonable and is followed by a reported difference for sulfonylureas versus placebo of 0.80% in table 8.21.
Table 4.4 in the AstraZeneca NMA gives a reduction in HbA1c of 0.99% with sulfonylureas, compared with placebo. In the modelling a figure of –0.95% is used, which corresponds with both of the submitted AstraZeneca models and table 5.3 of their submission. So the forest plot figures are a minor mishap that does not affect the AZ modelling.
Review of statistical methods
The AstraZeneca submission estimated both fixed- and random-effects meta-analyses for the continuous and count-based outcome measures. It used the DIC to assess model fit, with at least a 3-point change signifying an improved model. Also, the manufacturer submission (MS) compared the posterior distribution of between-study SDs with the prior distributions to assess whether or not it was updated by the available evidence (i.e. the additional information had had an effect). Random-effects models were fitted first, as they were considered a priori as the appropriate model. Fixed-effects models were selected only if they significantly improved model fit as demonstrated by DIC and changes to the posterior distribution of between-study SDs. Clinical and statistical heterogeneity were assessed through an evaluation of sources and the I²-statistic for pairwise comparisons, respectively. Heterogeneity was examined through a sensitivity analysis using meta-regression to adjust for the effects of baseline HbA1c. Consistency was also assessed through a comparison of the direct and indirect evidence using pairwise meta-analyses of the active treatments versus placebo for the outcome of HbA1c only. The overall modelling strategy used in the MS seemed appropriate.
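The two numerical rules mentioned in this paragraph can be sketched as follows. This is a minimal illustration with hypothetical function names: the DIC rule covers only the 3-point threshold (the submission also looked at the posterior distribution of the between-study SD, which is not modelled here), and the second function is the usual Higgins formula for the I²-statistic derived from Cochran's Q.

```python
def prefer_fixed_effects(dic_random: float, dic_fixed: float,
                         threshold: float = 3.0) -> bool:
    """True if the fixed-effects DIC is lower than the random-effects DIC
    by at least `threshold` points (random effects is the default model)."""
    return (dic_random - dic_fixed) >= threshold

def i_squared(q: float, n_studies: int) -> float:
    """I-squared (%) for one pairwise comparison, from Cochran's Q."""
    df = n_studies - 1
    if df < 1 or q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0

# HYPOTHETICAL values, for illustration only.
print(prefer_fixed_effects(dic_random=105.0, dic_fixed=101.0))
print(i_squared(q=20.0, n_studies=11))
```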
AstraZeneca undertook Bayesian Markov chain Monte Carlo (MCMC) NMAs for continuous and count-based outcome measures. It specifies that vague priors were used for unknown parameters; however, no details were provided as to the distributions or link functions used in the models. Vague priors are usually specified; however, there are occasions when other priors should be assessed to establish the possible effects on the posterior estimates [e.g. binomial model with a logit link function or a rate model with log link function (when a uniform prior is used for the SD) or when data are sparse and the model fails to converge (when vague gamma priors are used for precision)]. No sensitivity analyses assessing the effects of different distributions, link functions or priors were presented. As the treatments considered in the NMAs were assessed by class, this may be less of a concern. The MS reports that MCMC models were run using three chains starting from different values of the unknown parameters, used a burn-in of ≥ 20,000 iterations, an update of ≥ 100,000 iterations and a parameter thin of 10. Convergence was assessed using history plots of the chains for the relevant parameters (overlapping histories indicating convergence) and a Monte Carlo error for each parameter (error of ≤ 5% of posterior SD indicating convergence). No assessment is reported regarding the influence of autocorrelation. The approach taken in the MS to MCMC models appears appropriate.
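The burn-in, thinning and Monte Carlo error check described above can be sketched in a few lines. This is an assumption-laden illustration: the chain is simulated independent noise standing in for real MCMC output, the function names are hypothetical, and the batch-means estimator is one common way of computing a Monte Carlo error, not necessarily the one used in the submission.

```python
import random
import statistics

def thin_chain(samples, burn_in, thin):
    """Discard the first `burn_in` draws, then keep every `thin`-th draw."""
    return samples[burn_in::thin]

def batch_means_mc_error(samples, n_batches=50):
    """Monte Carlo standard error of the mean, estimated by batch means."""
    size = len(samples) // n_batches
    if size < 2:
        raise ValueError("too few samples for the requested number of batches")
    means = [statistics.fmean(samples[i * size:(i + 1) * size])
             for i in range(n_batches)]
    return statistics.stdev(means) / n_batches ** 0.5

def mc_error_ok(samples, max_ratio=0.05):
    """Apply the rule quoted in the text: MC error <= 5% of the posterior SD."""
    return batch_means_mc_error(samples) <= max_ratio * statistics.stdev(samples)

# Stand-in for one chain: 20,000 burn-in draws plus 100,000 updates.
rng = random.Random(0)
raw = [rng.gauss(-0.8, 0.1) for _ in range(120_000)]
kept = thin_chain(raw, burn_in=20_000, thin=10)
print(len(kept), mc_error_ok(kept))
```

Real MCMC draws are autocorrelated, so a chain that passes this check on independent noise may not pass it in practice; that is one reason an explicit assessment of autocorrelation would have been useful.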
The MS reports NMAs on classes of treatments (i.e. SGLT2s, DPP-4s, SUs, TZDs) rather than comparing individual treatments. Such ‘lumping’ of evidence is a concern as regards the assumption of consistency, leading to heterogeneity, difficulties in interpreting results and potential conflict between the direct and indirect evidence. The MS states that the rationale for considering the treatments as a class was due to the limited evidence base for some treatments; that previous NICE CGs had indicated that they could be considered as a class; and that heterogeneity among some individual studies in terms of study characteristics within a class of treatments meant that comparison of individual studies may be affected by a risk of bias. The MS should have considered an NMA of individual treatments as well as presenting one of class effects. This would have shown results similar to the AG NMA, in which dapagliflozin has slightly less effect than empagliflozin 10 mg and canagliflozin 100 mg, the other starting doses. Although it is not clear which treatment was the reference treatment in the NMAs, results are presented for comparisons of the treatment classes with both placebo and SGLT2 only.
Continuous outcomes of mean change from baseline in HbA1c, weight and SBP (MD scale) and count-based outcomes of proportion of patients experiencing hypoglycaemia (odds ratios) were used in the NMA. Although data for the continuous outcomes were for ITT populations using LOCF, any missing data were based on estimates from the primary study. Data time points ranged from 18 to 30 weeks.
Comparison of the baseline characteristics of the 32 studies showed variability. Although the MS stated that the trials were generally similar in baseline characteristics, it identified that nine RCTs were conducted only in Asian patients, one RCT had a higher mean age, one RCT had a higher mean baseline HbA1c, eight RCTs had higher mean baseline weights and that average duration of diabetes and baseline BMI varied. It should be noted that the included studies were conducted between 1994 and 2014 and study duration ranged from 18 to 102 weeks. Although the effect of baseline HbA1c was assessed through meta-regression and the effects associated with the Asian-only studies through exclusion of those studies in a sensitivity analysis, possible heterogeneity associated with the other factors was not considered further.
The MS presents network diagrams of the decision space for the classes of treatment. The number of RCTs linking each treatment class is not clear, and subsequent forest plots are presented for comparisons with placebo only. It is difficult to judge whether or not sparse evidence networks or zero values were a concern, although the ‘lumping’ of evidence into treatment classes may well have overcome this issue. It is also unclear which treatment was used as the reference treatment.
The MS clearly specified the approach it had taken to the majority of the elements of its NMAs. It lacked details concerning the prior distributions and link functions used; its assessment of autocorrelation in MCMC models; and sensitivity analyses concerning the elements of the models themselves (e.g. prior distributions, link functions and priors for parameters). Although it assessed some possible causes of heterogeneity, others were not considered (e.g. participant characteristics, length of study follow-up). It appropriately examined consistency of the outcomes from the NMAs. The MS identified several limitations underlying its analysis, including high placebo effects associated with the assessment of body weight in a dapagliflozin monotherapy study and a study focusing on Asian patients; a lack of evidence on specific patient groups (i.e. metformin intolerant); limited duration of follow-up; different definitions of hypoglycaemia; and inconsistent reporting of safety outcomes. However, the key limitation that affects the NMA is the lack of evidence on individual treatments. As a result, the MS ‘lumps’ together the evidence by treatment class. This can cause concerns with regards to the assumption of consistency, and lead to heterogeneity, difficulties in interpreting results and potential conflict between the direct and indirect evidence.
Boehringer Ingelheim
The Boehringer NMA is shaded as confidential. It was very complex and included 37 studies, including some that the AG rejected for our NMA. All of the sulfonylureas trials were pooled into one node, which we think is undesirable, given the mix of drugs from tolbutamide to gliclazide. The NMA includes both the Saleem 2011149 and Jibran 2006151 papers with their striking similarities.
Janssen
The Janssen NMA included 40 studies, including some that the AG did not think relevant, such as dapagliflozin 5 mg. It included four DPP-4 inhibitors and four sulfonylureas. It did not include repaglinide but this was included in a sensitivity analysis.
Review of statistical methods
A Bayesian hierarchical model was used for the Janssen NMA. Although not explicitly stated, it is evident that both fixed- and random-effects models were estimated. No analysis was undertaken of possible effect modifiers using meta-regression; instead, sensitivity analyses excluded trials with different characteristics. The MS used the DIC to assess the goodness of fit of the models, selecting the model with the lowest DIC as the most appropriate. A threshold of 3 points on the DIC is used to judge significant change. Where a random-effects model was selected as the base-case analysis, a fixed-effects model was estimated in a sensitivity analysis. Given other statements in the MS, it is assumed that random-effects models may have also been estimated as a sensitivity analysis when a fixed-effects model was the base case. When trials had multiple arms, the MS correctly made adjustments to the statistical approach to account for the correlation between treatment effects from the same trials. The approach taken was based on a conditional distribution formulation of the multivariate normal distribution. The influence of heterogeneity was assessed through an analysis of the direct pairwise comparison of treatments using Cochran’s Q test (p = 0.1), I²-statistic (threshold > 50%), comparisons using forest plots and comparison of the characteristics of the trials. Consistency of the direct and indirect evidence was compared using the difference in the respective point estimates and their p-values, testing whether they differed statistically significantly from zero. As well as producing point estimates (and CrIs) of the MD and odds ratios, the MS ranked the probability of the different treatments as being the most effective based on the Surface Under the Cumulative Ranking (SUCRA). SUCRA produces probabilities that range from 100%, showing the treatment ranks first, to 0%, which shows it ranks last.
These rankings formed the basis of the comparison of the different treatments, along with an assessment of the probability that canagliflozin performed better than the other treatments. The comparative ranks were interpreted on the basis that a treatment with > 70% was judged the best, between 30% and 70% no difference between treatments, and < 30% the alternative treatment was considered best. Although the analysis lacked an assessment of heterogeneity through meta-regression, the overall modelling strategy used in the MS appeared appropriate.
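For readers unfamiliar with SUCRA, the calculation can be sketched from a treatment's rank probabilities, together with the 70%/30% interpretation bands described above. The function names and the example probabilities are hypothetical; in a real NMA the rank probabilities would come from the MCMC output.

```python
def sucra(rank_probs):
    """SUCRA for one treatment, as a fraction between 0 and 1.

    rank_probs[j] is the probability that the treatment has rank j + 1,
    where rank 1 is best. SUCRA is the mean of the cumulative rank
    probabilities over the first (number of treatments - 1) ranks.
    """
    n = len(rank_probs)
    if n < 2:
        raise ValueError("need at least two treatments")
    if abs(sum(rank_probs) - 1.0) > 1e-6:
        raise ValueError("rank probabilities must sum to 1")
    cumulative = 0.0
    total = 0.0
    for p in rank_probs[:-1]:
        cumulative += p
        total += cumulative
    return total / (n - 1)

def interpret(probability):
    """Apply the bands described in the text to a probability in [0, 1]."""
    if probability > 0.70:
        return "treatment judged best"
    if probability < 0.30:
        return "alternative judged best"
    return "no difference"

# HYPOTHETICAL rank probabilities for one treatment among four.
example = [0.5, 0.3, 0.15, 0.05]
print(f"SUCRA = {sucra(example):.0%}; {interpret(sucra(example))}")
```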
For the NMAs of continuous outcomes, the MS correctly assumed that a Normal distribution and identity link function should be used. Similarly, for binary outcomes, the MS appropriately selected a binomial distribution and logit link function. The MS states that it uses non-informative priors for unknown parameters. Priors for the Normal distributions for treatment effects (0, 10⁴) and the uniform distributions for between-trial SDs (binary outcomes range (0,2); continuous outcomes range based on outcome scale with assessment of posterior distribution to select prior distribution) were specified. Although the priors are considered suitable, issues concerning sparse data may require other priors to be considered, particularly if the model fails to converge. Although not specifically stated in the MS, this issue appears to have been considered as a sensitivity analysis on the prior distributions for between-trial precision uses a gamma distribution (0.001, 0.001) for the random-effects model. No other prior distributions appear to have been examined in sensitivity analyses.
The NMAs used MCMC simulation in WinBUGS, running three chains with different starting values. It assessed convergence through history and Gelman–Rubin plots, although these are not presented. Fixed-effects NMAs used a burn-in of 20,000 iterations, which were discarded, and a further 20,000 iterations to monitor the parameters. Random-effects NMAs used a burn-in of 100,000 iterations (which were discarded) and monitored parameters for a further 100,000 iterations. Where convergence was not achieved, iterations were increased (numbers of iterations used not stated).
Treatments included in the NMA had to be in common use in the UK. The NMAs assessed both treatment- and dose-specific outcomes in the classes of SGLT2, TZD and DPP-4, with those for sulfonylurea pooled to reflect dose adjustments on a per-patient basis. The MS appropriately selected placebo as the reference treatment for all of the evidence network diagrams; however, all results were compared with canagliflozin. No comparisons were made between the other active treatments, which may reflect the sparse nature of the evidence.
Continuous outcomes measured the change from baseline in each treatment arm for HbA1c, FPG, weight, BMI and SBP. If data were missing, values were estimated as the difference between the final value and the value at baseline, with the variance calculated using an approach recommended by the NICE Decision Support Unit. Sensitivity analyses were conducted on the approach to estimating the variance of the mean change (i.e. within-patient correlation varied from base case of 0.5 to 0.7). Binary outcomes used the number of events and total patients in each treatment arm for calculating the proportion of patients reaching HbA1c of < 7%, proportion of patients with one or more hypoglycaemic events and proportion of patients reaching HbA1c of < 6.5%. Handling of missing data from binary outcomes is not discussed. Outcomes were assessed at 26 weeks ± 4 weeks, with a sensitivity analysis of 26 weeks ± 10 weeks. This variation may have led to heterogeneity in the outcomes reported, although the MS states that these were based on expert clinical opinion. Additional sensitivity analyses were also conducted including studies reporting outcomes from 16 to 21 weeks and/or 31–36 weeks.
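The role of the within-patient correlation in the sensitivity analysis can be illustrated with the standard formula for the SD of a difference between two correlated measurements. This is a sketch that assumes the approach used in the MS corresponds to this formula; the function name and the example SDs are hypothetical.

```python
import math

def sd_of_change(sd_baseline: float, sd_final: float, correlation: float) -> float:
    """SD of (final - baseline) given the SD at each time point and the
    within-patient correlation between the two measurements."""
    if not -1.0 <= correlation <= 1.0:
        raise ValueError("correlation must lie between -1 and 1")
    variance = (sd_baseline ** 2 + sd_final ** 2
                - 2.0 * correlation * sd_baseline * sd_final)
    return math.sqrt(variance)

# HYPOTHETICAL SDs of 1.0 (HbA1c %) at baseline and at follow-up.
for r in (0.5, 0.7):
    print(f"correlation {r}: SD of change = {sd_of_change(1.0, 1.0, r):.3f}")
```

A higher assumed correlation gives a smaller SD of change, and so gives the trial more weight in the analysis, which is why the choice between 0.5 and 0.7 is worth testing.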
There appeared to be some heterogeneity in the participant characteristics. Patients in the included studies ranged in age from 48 to 72 years; the proportion of males ranged from 11% to 80%; the proportion who were white from 6% to 80%; and the duration of diabetes from 1.1 years to 13 years. In many instances studies did not report the characteristics of their participant populations. As a result, heterogeneity was identified in the NMAs and sensitivity analyses were undertaken.
The MS presented evidence networks for the different comparisons undertaken. It was evident from the network diagrams that some of the treatments were in parts of the network that were unconnected, and these were excluded from the analyses. Other parts of the evidence networks were sparsely populated with only one trial. Such limited data may have resulted in posterior distributions of the SDs that included extreme values and the possibility of non-convergence of the model. This increased the uncertainty around the outcome of the NMAs. Trials including binary outcomes were affected by zero events. Where this occurred, the MS appropriately used a continuity correction (0.5 added to all cell counts of studies with at least one arm with a zero). Trials with no event in any arm or that were considered to affect convergence (basis of exclusion not stated) were excluded from the analysis.
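The continuity correction described above can be sketched for a single two-arm trial. The function name and the example counts are hypothetical, and the sketch applies the correction whenever any cell of the 2 × 2 table is zero, which is a slight generalisation made to avoid division by zero.

```python
def corrected_odds_ratio(events_a, total_a, events_b, total_b, correction=0.5):
    """Odds ratio of arm A versus arm B with a continuity correction.

    Returns None when neither arm has any events, mirroring the exclusion
    of such trials from the analysis.
    """
    if events_a == 0 and events_b == 0:
        return None
    a, b = events_a, total_a - events_a  # arm A: events, non-events
    c, d = events_b, total_b - events_b  # arm B: events, non-events
    if 0 in (a, b, c, d):
        # Add the correction to every cell of the trial, not only the zero one.
        a, b, c, d = (x + correction for x in (a, b, c, d))
    return (a * d) / (b * c)

# HYPOTHETICAL counts: no hypoglycaemic events in arm A, 3 of 10 in arm B.
print(corrected_odds_ratio(0, 10, 3, 10))
print(corrected_odds_ratio(0, 10, 0, 10))
```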
The trials included in the evidence network were assessed through sensitivity analyses that excluded trials considered a source of heterogeneity or inconsistency; identified as lower quality (not double blind), where it was unclear if it assessed monotherapies; assessed a single ethnic group; or published in a non-peer reviewed journal or as part of a regulatory process. Further sensitivity analyses were conducted, which included an unpublished trial (DIA3011) assessing canagliflozin 100 mg and canagliflozin 300 mg, and repaglinide trials that included metformin and sulfonylurea.
The MS clearly outlined the key aspects of the NMAs. It estimated fixed- and random-effects Bayesian hierarchical models using MCMC simulation in WinBUGS, evaluating the fit of the models through DIC. Prior distributions and values were correctly assumed, with an alternative assessed through sensitivity analysis to examine the effects of sparse data. The MS discussed the simulation process in terms of chains run, iterations for burn-in and monitoring parameters, and the process for assessing convergence. The analysis also assessed heterogeneity, inconsistency between direct and indirect meta-analyses, and made adjustments for multiple treatment arms. The NMAs presented point estimates and CrIs for outcomes, and ranked treatments as to which performed the best. Treatments were compared with canagliflozin, with no comparisons of the other active treatments. Missing data were appropriately estimated for continuous measures; however, there is no discussion of missing data for binary outcomes. Outcomes were assessed at 26 weeks ± 4 weeks, with a sensitivity analysis at 26 weeks ± 10 weeks, which may have resulted in some heterogeneity. It was evident that the network was sparsely populated in certain comparisons and that there were zero values for binary outcomes. Although the zero values were handled appropriately through a continuity correction, the effects of sparse data for the continuous variables may lead to increased uncertainty around the estimates. The MS produced a range of sensitivity analyses to explore the robustness of the models. Overall, the methods used in the NMAs appeared appropriate and identified most limitations in the evidence. The sparse evidence base may influence the outcomes produced.
Comments
Despite the different approaches and inclusions, some findings from the different meta-analyses were similar. For example, the differences in effect sizes of HbA1c of canagliflozin 100 mg and dapagliflozin 10 mg were reported as 0.33% (Janssen), 0.365% (Boehringer) and 0.36% (AG).
There appears to be a systematic difference between results of the AG NMA and the Boehringer NMA, with effects on HbA1c being higher in the latter, with the AG results being closer to the trial results. The Janssen figures are similar to the AG ones. This is shown in Table 14.
However, the relative differences between drugs are similar, and those are what matter in the modelling.