Internal Validity
All studies used an appropriate method of randomization using an interactive voice/Web response system with appropriate allocation concealment. Patient characteristics were generally well balanced in most studies, although in Study 3668 the baseline characteristics were not well balanced with respect to gender. With respect to blinding, only DEVOTE, SWITCH-1, SWITCH-2, and Study 3944 were double blinded, with the insulins provided in “visually identical” vials. All other studies were open label and therefore subject to bias, particularly for subjective outcomes such as HRQoL, which could be influenced by patient knowledge of their assigned treatment. A blinded EAC was employed in DEVOTE, and an interim analysis was prepared and submitted to the FDA.
The BEGIN studies that included an insulin comparator used a noninferiority design for testing the primary outcome (change from baseline to end of treatment in A1C), and all used the same noninferiority margin of 0.4% for the change in A1C. The manufacturer provided a rationale for the choice of this noninferiority margin, and this rationale appeared to be reasonable. This margin is also suggested by the FDA, which considers an A1C reduction of > 0.3% to be clinically meaningful; therefore, a difference in A1C of 0.3% to 0.4% between treatments could be considered clinically significant.36 Noninferiority was also tested for the primary outcome in DEVOTE; again, the rationale for the noninferiority margin was described and the margin appeared reasonable. The SWITCH studies placed a noninferiority test within a testing hierarchy that determined whether the primary outcome would be tested. In both studies, noninferiority for change from baseline in A1C, a secondary end point, had to be met before the primary end point and subsequent secondary end points in the hierarchy were tested.
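As an illustration of how a 0.4% margin is typically applied, the following is a minimal sketch and not the analysis used in the trials. It assumes the treatment difference is expressed as IDeg minus comparator in change in A1C (so larger values are worse for IDeg) and that noninferiority is declared when the upper bound of a two-sided 95% confidence interval lies below the margin. The function name and the input values are hypothetical.

```python
from statistics import NormalDist


def noninferiority_met(diff, se, margin=0.4, alpha=0.05):
    """Check noninferiority from an estimated treatment difference.

    diff   : estimated difference in A1C change (test minus comparator), in % points
    se     : standard error of that difference
    margin : noninferiority margin (0.4 % points, as described in the text)
    alpha  : two-sided significance level (assumed here to be 0.05)

    Returns (upper_bound, met). Uses a normal approximation, which is an
    assumption of this sketch rather than the trials' actual model.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    upper_bound = diff + z * se
    return upper_bound, upper_bound < margin


# Hypothetical inputs, chosen only to show both outcomes.
upper, met = noninferiority_met(diff=0.09, se=0.07)
print(f"upper bound {upper:.2f}, noninferior: {met}")   # upper bound is about 0.23

upper, met = noninferiority_met(diff=0.30, se=0.07)
print(f"upper bound {upper:.2f}, noninferior: {met}")   # upper bound exceeds 0.4
```

The sketch shows why the width of the interval matters as much as the point estimate: a difference smaller than the margin can still fail the test if the estimate is imprecise.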
A hierarchical testing procedure was employed to account for type I error in all studies that included confirmatory secondary end points, and the hierarchy was adhered to. The studies that did not include confirmatory secondary end points were Studies 3770, 3668, 3943, and 3944.
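The logic of a hierarchical procedure can be sketched as a fixed-sequence test, in which each end point is tested at the full significance level only if every end point before it was confirmed. This is an assumption about the general form of the procedure; the exact ordering and thresholds used in each study are not reproduced here, and the end point labels and p values below are hypothetical.

```python
def fixed_sequence(ordered_p_values, alpha=0.05):
    """Apply a fixed-sequence (hierarchical) testing procedure.

    ordered_p_values : list of (end_point_name, p_value) in prespecified order
    alpha            : significance level carried down the hierarchy (assumed 0.05)

    Testing stops at the first end point that is not confirmed; all later
    end points are reported as not tested, which preserves the overall
    type I error rate.
    """
    results = []
    stopped = False
    for name, p in ordered_p_values:
        if stopped:
            results.append((name, "not tested"))
        elif p < alpha:
            results.append((name, "confirmed"))
        else:
            results.append((name, "not confirmed"))
            stopped = True
    return results


# Hypothetical hierarchy: the third end point is never tested, even though
# its p value is small, because the second end point failed.
hierarchy = [
    ("step 1", 0.001),
    ("step 2", 0.200),
    ("step 3", 0.010),
]
for name, status in fixed_sequence(hierarchy):
    print(name, "->", status)
```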
In the SWITCH-1 study, a per-protocol sensitivity analysis of the primary end point (a test of noninferiority) does not appear to have been conducted, as a per-protocol population was not defined in the study. A per-protocol analysis is recommended for noninferiority trials because analyses that include participants who deviated from or discontinued treatment tend to make the groups appear more similar. Without it, differences between the groups could have been masked, particularly given the high withdrawal rates in these studies.
With the exception of DEVOTE, no studies conducted a true intention-to-treat analysis; however, the small number of participants excluded from the main analysis is unlikely to have biased results.
The proportion of participants withdrawing varied greatly between studies, with the highest withdrawal rates seen in the SWITCH studies, ranging from 18% to 23% across groups. The direction of bias is confounded by the crossover design; however, given the high proportion of withdrawals, it is likely that the results were affected in some way, as the composition of the original randomized population would have been altered significantly throughout the trial. Proportions of withdrawals above 20% were also seen in Studies 3579 (22%) and 3580 (24%), although generally no differences in proportion of withdrawals were evident between groups. The largest difference in proportion of withdrawals was in Study 3944, with the IDeg + liraglutide group having a much lower proportion than the placebo + liraglutide group (8% versus 24%, respectively). In Study 3770, there was a numerically higher proportion of withdrawals in both IDeg groups versus IGlar (16% in each, versus 7%). A high proportion of withdrawals may lead to important outcomes, such as hypoglycemic events, being understated. A higher proportion of withdrawals in one group versus another may also bias results in favour of the group with more withdrawals, as participants in that group have less exposure to the risk of hypoglycemia. Additionally, extensive withdrawals are a concern, given the noninferiority study designs employed across the studies.
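The exposure argument can be made concrete with a small worked calculation. The sketch below is purely illustrative: it assumes two hypothetical arms of 100 participants with an identical underlying event rate, a one-year treatment period, and all withdrawals occurring at the midpoint. None of these figures are taken from the included studies; only the withdrawal proportions echo the magnitude of the differential described above.

```python
def arm_summary(n, withdrawal_fraction, rate_per_patient_year,
                full_years=1.0, withdrawal_time=0.5):
    """Expected events and exposure for one hypothetical trial arm.

    Assumes completers contribute `full_years` of exposure each and
    withdrawals contribute `withdrawal_time` years each.
    """
    completers = n * (1 - withdrawal_fraction)
    withdrawn = n * withdrawal_fraction
    patient_years = completers * full_years + withdrawn * withdrawal_time
    events = rate_per_patient_year * patient_years
    return {
        "patient_years": patient_years,
        "events": events,
        "events_per_randomized": events / n,
        "events_per_patient_year": events / patient_years,
    }


# Same true rate (2 events per patient-year) in both hypothetical arms.
more_withdrawals = arm_summary(n=100, withdrawal_fraction=0.24, rate_per_patient_year=2.0)
fewer_withdrawals = arm_summary(n=100, withdrawal_fraction=0.08, rate_per_patient_year=2.0)

for label, arm in [("24% withdrawn", more_withdrawals), ("8% withdrawn", fewer_withdrawals)]:
    print(label,
          f"events per randomized participant: {arm['events_per_randomized']:.2f},",
          f"events per patient-year: {arm['events_per_patient_year']:.2f}")
```

Under these assumptions, the arm with more withdrawals shows fewer events per randomized participant even though the rate per patient-year is the same in both arms, which is the direction of bias described in the text.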
The included studies typically accounted for missing data using an LOCF approach. Sensitivity analyses were also performed and appeared to support the results of the primary analysis. The LOCF approach can introduce bias into the results, and the risk of bias would be expected to increase with higher proportions of withdrawals and when there are differential withdrawals between groups within studies; both of these phenomena were seen among the included studies. The fact that the sensitivity analyses supported the conclusions of the LOCF results does allay some concern about the use of this approach for the imputation of missing data; however, a major assumption in the sensitivity analysis is that the data were missing at random, which is rarely the case and could also bias the results. The DEVOTE study, which was an event-driven study, employed a tipping-point analysis to account for missing data due to early withdrawals.
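For readers unfamiliar with LOCF, the following minimal sketch shows what the imputation does to one participant's record. The visit values are hypothetical, and the function is a generic illustration rather than the imputation code used in any of the studies.

```python
def locf(values):
    """Last observation carried forward.

    values : measurements in visit order, with None marking a missing visit.
    Each missing value is replaced by the most recent observed value.
    Leading missing values stay None because there is nothing to carry forward.
    """
    filled = []
    last_observed = None
    for value in values:
        if value is not None:
            last_observed = value
        filled.append(last_observed)
    return filled


# Hypothetical A1C (%) at four visits for a participant who withdrew after visit 2.
observed = [8.2, 7.9, None, None]
print(locf(observed))  # [8.2, 7.9, 7.9, 7.9]
```

The end-of-treatment value for this participant is frozen at the visit 2 measurement, so any change that would have occurred afterward is ignored. This is why the potential for bias grows with the proportion of withdrawals and with differences in withdrawal between groups.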
All studies employed a treat-to-target design with respect to dosing; therefore, differences in A1C would not be expected. This approach is recommended by the FDA for assessing differences in safety, tolerability, and clinical utility when insulin dosing and efficacy are maximized. However, these studies have limited utility for evaluating treatment efficacy.
External Validity
There were numerous clinical trials included in this review, with representation across the globe. Across all the included studies, there was a relatively low proportion of Indigenous participants (< 1% across the studies). The consistent majority of participants were Caucasian, with the exception of studies conducted in Asia. The lack of representation of Indigenous populations is a generalizability issue for Canada, given the relatively high proportion of Indigenous peoples diagnosed with diabetes mellitus. The clinical expert also noted that participants had a relatively long duration of disease at baseline, but did not note any other generalizability issues.
The trials largely focused on IGlar as a comparator, with IDet a comparator in only one of the studies. These are the two most commonly used intermediate-acting and long-acting insulins; however, NPH is still a popular option for some patients due to its lower cost. Thus, at least one trial comparing IDeg with NPH might have provided additional useful insight into the comparative efficacy and harms of IDeg.
DEVOTE had the longest follow-up of all the included studies, with a mean exposure of approximately 24 months, and was powered to assess clinical outcomes in a T2DM population. However, there were no trials that similarly assessed diabetes complications in T1DM. Such trials would likely require a much longer follow-up than those included in this review. Other included trials focused on hypoglycemia as a primary outcome (SWITCH studies); the majority of included studies (BEGIN trials) focused on A1C, a widely used surrogate marker of disease in diabetes mellitus.
Two studies had extensive run-in periods, which can suggest enrichment of the study population. In Study 3943, the run-in period was to establish that the study population was one requiring high-dose insulin (participants were included only if they failed to reach target while on high-dose IGlar); this does seem appropriate, given the study objective. In Study 3944, participants were initiated on liraglutide in the 15-week run-in, which they then continued on in the study. The purpose of the study was to assess the combination of IDeg with liraglutide versus liraglutide alone (i.e., placebo plus liraglutide) in patients on metformin. A large proportion of participants were screened out during the run-in (in most cases due to failure to reach A1C targets); this was consistent with the planned 941 participants in the run-in versus 320 that were to be randomized. The study set a relatively narrow target for A1C (7.0% to 9.0%) for inclusion in the study. The rationale was that, as it was placebo-controlled, this narrow range would reduce the risk of intensified treatment being needed during the 26-week treatment phase.
HRQoL was consistently assessed in the included studies, but not as a confirmatory (i.e., high-priority) secondary outcome and not at all in the largest study, DEVOTE. Thus, HRQoL appears to have been given a lower priority in the included studies than would be expected based on the importance that patients place on this outcome.