Internal Validity
Severity of AD and improvements in AD were measured with the ISGA score by treating physicians or investigators. The validity of this score (range 0 to 4) for measuring mild-to-moderate severity, and the judgment regarding what constitutes an improvement, remain unknown, as does the MCID of the treatment effect as measured by the proportion of patients with a change from baseline to day 29 in the score. Of note, remarkable vehicle (placebo) effects were observed in both trials: 25% and 18% of vehicle-treated patients in the two trials, respectively, achieved success at the primary end point over the 29-day study period, which by definition was clear or almost clear with at least a 2-grade improvement in ISGA score. Also, as per protocol, no other concomitant therapies including TCS were allowed, and the rate of protocol violation was low (3%). This strong placebo effect could reflect the natural variation of disease symptoms over time, or it could reflect patients’ response to the emollient effect of the vehicle itself, as emollients are used in managing mild AD. As a consequence, it could be highly unreliable to judge a difference of, for example, 1 to 2 or 3 to 2 in severity based on a score that is mostly a reflection of symptoms in an affected area. This is particularly problematic when the treatment effect is assessed without considering the degree of change in grade from baseline to day 29. Therefore, a primary end point of clear or almost clear with at least a 2-grade improvement from baseline would be less prone to subjective bias than a judgment of clear or almost clear alone, as a one-grade difference would be difficult to discern (e.g., from 1 to 0).
Subjectivity is also an issue when patients experienced drug-related adverse effects. For example, the incidence of application site pain was higher with crisaborole than with vehicle (6.2% and 2.7% versus 1.2%, crisaborole versus vehicle, respectively, in both studies), which might have led the assessing physician to become aware of the treatment assignment and thus biased the estimate of ISGA score in favour of the study drug. The same may apply to other adverse effects such as upper respiratory tract infection, pyrexia, nasopharyngitis, vomiting, and nasal congestion, for which the incidence was consistently higher in the crisaborole arm than in the vehicle arm.
The manufacturer conducted calculations to determine the appropriate sample size for each of the included studies, and the studies appeared to meet the minimum enrolment targets in each group. The calculations assumed treatment success of 20% with crisaborole and 10% with vehicle; the basis for these assumptions is not clear.
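To illustrate how the assumed success rates drive the enrolment target, the following is a minimal sketch of a standard normal-approximation sample size calculation for two independent proportions. It is not the manufacturer's calculation: the two-sided alpha of 0.05, the power of 90%, and the allocation ratio are assumptions chosen for illustration, and the function name is hypothetical. Only the 20% and 10% success rates come from the text above.

```python
from math import ceil
from statistics import NormalDist


def sample_size_two_proportions(p_treat, p_ctrl, alpha=0.05, power=0.90, ratio=1.0):
    """Approximate group sizes for comparing two independent proportions.

    Uses the unpooled normal approximation. `ratio` is n_treat / n_ctrl.
    alpha, power, and ratio are illustrative assumptions, not values
    reported for the included studies.
    Returns (n_treat, n_ctrl), each rounded up.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for the target power
    # Per-control-patient variance of the difference in proportions
    variance = p_treat * (1 - p_treat) / ratio + p_ctrl * (1 - p_ctrl)
    n_ctrl = (z_alpha + z_beta) ** 2 * variance / (p_treat - p_ctrl) ** 2
    return ceil(ratio * n_ctrl), ceil(n_ctrl)


if __name__ == "__main__":
    # Success rates from the text: 20% with crisaborole, 10% with vehicle
    print(sample_size_two_proportions(0.20, 0.10))
    # Same rates with a hypothetical 2:1 allocation in favour of active treatment
    print(sample_size_two_proportions(0.20, 0.10, ratio=2.0))
```

Under these assumed settings, equal allocation requires on the order of 260 patients per group. The result is sensitive to the assumed vehicle response: a vehicle success rate above the assumed 10% with the same active-arm rate would narrow the difference and increase the required sample size.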
The manufacturer appeared to adjust for multiple statistical comparisons on the two secondary end points of ISGA success by applying a hierarchical testing procedure. Important outcomes such as HRQoL, assessed by the DLQI, and symptoms were not part of the statistical hierarchy and were not statistically assessed in either included study. Because of random variability and possible subjective bias, the potential benefit of crisaborole on HRQoL remains uncertain. AD can have a significant effect on quality of life, both in adults and children; thus, this represents a significant gap in knowledge about crisaborole.
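As a generic illustration of how a hierarchical (fixed-sequence) testing procedure controls for multiplicity, the sketch below tests end points in a prespecified order at the full alpha level and stops claiming significance at the first end point that fails. The end point labels and P values are hypothetical and are not results from the included studies; the exact procedure used by the manufacturer is not described here.

```python
def fixed_sequence_test(ordered_p_values, alpha=0.05):
    """Apply a fixed-sequence (hierarchical) testing procedure.

    `ordered_p_values` is a list of (label, p_value) pairs in the
    prespecified testing order. Each end point is tested at the full
    alpha, but only if every earlier end point was significant.
    Returns a list of (label, p_value, rejected) tuples.
    """
    results = []
    gate_open = True  # stays True only while all earlier tests were significant
    for label, p_value in ordered_p_values:
        rejected = gate_open and p_value <= alpha
        results.append((label, p_value, rejected))
        if not rejected:
            gate_open = False  # later end points can no longer be declared significant
    return results


if __name__ == "__main__":
    # Hypothetical labels and P values, for illustration only
    hypothetical = [
        ("secondary_1", 0.01),
        ("secondary_2", 0.20),
        ("outside_hierarchy_example", 0.001),
    ]
    for label, p_value, rejected in fixed_sequence_test(hypothetical):
        print(label, p_value, "significant" if rejected else "not significant")
```

In this hypothetical run, the third end point is not declared significant despite its small P value, because the second end point failed. Outcomes that sit outside the hierarchy altogether, as HRQoL and symptoms did here, have no multiplicity-controlled test at all.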
Subgroup analyses were planned and reported. No test of heterogeneity of treatment effect was reported. Additionally, no subgroup analysis based on baseline AD severity was reported, although a post hoc analysis was provided to CADTH following a request to the manufacturer. Such a subgroup analysis could be useful in revealing a possible differential treatment effect by severity. In particular, it would help identify the major driver (mild versus moderate AD) of the treatment effect in the overall population.
The manufacturer has requested that crisaborole be reimbursed for patients who have failed or are intolerant to TCS, yet no subgroup analyses were provided assessing whether responses differed based on these subpopulations.
Both included studies were double-blinded; this was accomplished by the use of a vehicle control. There were numerically more withdrawals in the vehicle-treated group than in the crisaborole group, and the difference seems to have been largely accounted for by withdrawals by parent or guardian. It is not clear whether this difference is an indication of a problem with the blinding. Unlike a typical matched control for an orally administered drug, a topically applied vehicle needs to be matched in all respects, including texture and odour, which presents additional challenges. It is not clear whether the manufacturer matched the vehicle in all these respects. Additionally, there was a numerical increase in topical AEs with crisaborole, such as application site pain, and this might also have made it difficult to maintain blinding in these patients.
The manufacturer used an MCMC method to impute missing data, and this method assumes data are missing at random. The fact that more patients withdrew in the vehicle group than in the crisaborole group in both studies, and that most of this difference was accounted for by lack of efficacy, might indicate that an assumption of data missing at random is not appropriate. Of note, the FDA statistical reviewer conducted a sensitivity analysis assuming all missing values constituted treatment failures, and this did not change the overall conclusions on the relative effect of crisaborole versus vehicle on the primary end point.18 It is likely that the amount of missing data on the primary end point due to withdrawal was small and therefore did not have a substantial impact on the findings.
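The logic of a missing-as-failure sensitivity analysis can be sketched as follows. This is a simplified illustration, not the FDA reviewer's analysis: the patient counts are hypothetical, the function names are invented for this example, and the comparison is against a complete-case analysis rather than the MCMC imputation used by the manufacturer.

```python
def response_rate(responders, completers, dropouts, missing_as_failure=True):
    """Proportion of patients achieving success.

    With missing_as_failure=True, patients with missing outcomes stay in
    the denominator and are counted as treatment failures. Otherwise only
    patients with observed outcomes (completers) are analyzed.
    """
    denominator = completers + dropouts if missing_as_failure else completers
    return responders / denominator


def risk_difference(treat, ctrl, missing_as_failure=True):
    """Difference in success proportions (treatment minus control)."""
    return (response_rate(**treat, missing_as_failure=missing_as_failure)
            - response_rate(**ctrl, missing_as_failure=missing_as_failure))


if __name__ == "__main__":
    # Hypothetical counts for illustration only; not data from the included studies
    crisaborole = {"responders": 160, "completers": 470, "dropouts": 30}
    vehicle = {"responders": 45, "completers": 220, "dropouts": 30}
    print("complete case:     ", risk_difference(crisaborole, vehicle, False))
    print("missing as failure:", risk_difference(crisaborole, vehicle, True))
```

When the two analyses give similar differences, as they would whenever the number of withdrawals is small relative to the group sizes, the conclusion is less likely to depend on the assumption made about missing data.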
The use of concomitant TCS was prohibited in the included studies; however, a small percentage of patients in each of the included studies were reported as using TCS during the study. This represents a protocol violation, and potentially an important one, as TCS are considered the standard therapy for treating AD. However, the proportion of patients using TCS was small (3% in each group in AD-301, and 3% with crisaborole and 6% with vehicle in AD-302), and this may have mitigated the impact on the overall analysis.
External Validity
The populations in both studies were generalizable to the Canadian population that might use crisaborole with respect to age and sex, according to the clinical experts consulted for this review. However, both studies were conducted entirely in the US, and there was a relatively high proportion of African-Americans enrolled in each study, compared with the proportion one would expect to see in Canada. The clinical experts also noted that instruments relying on visual assessment of AD may be less reliable in patients with darker skin as changes in colour and other morphology may be more challenging to detect. The clinical experts also noted the relatively high proportion of responders in the vehicle-treated group, particularly in AD-301 (treatment success in 25% of patients), and speculated as to whether this high vehicle response might have been at least in part due to difficulties in assessing patients with more pigmented skin tones.
The primary outcome in the included studies was based on the ISGA, which relies on investigator assessment, in this case looking for responses of clear or almost clear on AD lesions. Thus, this is a subjective measure; however, it is widely accepted as a standard instrument for assessment of AD in clinical trials, according to the clinical experts consulted on this review. Yet the ISGA score is not used in routine clinical practice to judge improvement with treatment. It is therefore uncertain whether the observed treatment effect as measured by this score would be readily interpretable and meaningful in a real-world setting. The time frame for detecting an improvement appears short at 29 days; however, the clinical experts believed this to be adequate for detecting a response to therapy.
There was no active comparator in either of the included studies. The two most relevant comparators would be the TCI and the TCS. The manufacturer-requested reimbursement criteria include patients who are intolerant to or have failed TCS, yet the efficacy and harms of crisaborole relative to TCS are unknown. TCI are typically considered to be an alternative to TCS in these unresponsive or intolerant patients, or in patients who wish to avoid using TCS due to their side effects; thus, the lack of comparative data versus the TCI is a significant gap in knowledge.
Crisaborole employs a unique mechanism of action for topically applied drugs and is first in its class; therefore, it is important to have long-term safety data regarding this drug. The double-blind phase of the included studies ended after 29 days, which is not of sufficient duration to determine whether there are any long-term safety issues associated with the use of this drug. AD is a chronic condition and patients would be expected to use a drug like crisaborole for long periods of time. There are data available from a longer-term extension; however, there is no control group in this phase of the study.