Description of Studies
Two trials, Study 141 and Study 701, met the inclusion criteria for this review. Study 141 was a pivotal trial.
Study 141 (N = 241) was a phase III, multi-centre, double-blind, prospective, randomized, placebo-controlled, single treatment cycle study that assessed the efficacy of aboBoNTA compared with placebo in children with dynamic equinus foot deformity associated with CP. Study 141 consisted of a screening period (day −7 to day 1), and patients received treatment on day 1 and were followed up for a minimum of 12 weeks and a maximum of 28 weeks (double-blind treatment period). All patients who had had at least 12 weeks of follow-up were considered to have completed the study. Patients were randomized into one of three treatment groups (aboBoNTA 10 U/kg, aboBoNTA 15 U/kg, or placebo) in a ratio of 1:1:1, stratified according to age range (2 to 9 years and 10 to 17 years) and botulinum toxin (BoNT)–naive or non-naive status, as assessed at baseline. After randomization, aboBoNTA or placebo was administered by intramuscular injections into the gastrocnemius-soleus complex (GSC) of each affected lower limb. The dose of aboBoNTA administered was either 10 U/kg or 15 U/kg per affected GSC, so the total dose was 10 U/kg or 15 U/kg for unilateral injections and 20 U/kg or 30 U/kg for bilateral injections. Patients who required re-treatment at week 12, 16, 22, or 28 were offered entry into an open-label extension study (Study 147).
Study 701 (N = 52) was a phase III, multi-centre, double-blind, prospective, randomized placebo-controlled study that compared the efficacy and safety of a single administration of aboBoNTA or placebo in the treatment of pediatric dynamic equinus spasticity associated with CP. Following initial assessment of their LLS, eligible patients were randomized to receive a single treatment of either aboBoNTA (30 U/kg) or placebo. Study medication was distributed equally between both legs by injection into each of the gastrocnemius muscles of each limb. Each muscle was injected at two sites. The effects of the treatment were monitored over a minimum 16-week period. Post-treatment assessments were made at weeks 4, 8, and 16. If an investigator believed that a treatment effect was maintained at week 16, additional visits at weeks 24 and 36 were scheduled.
Interventions
In Study 141, before administration, the powder was reconstituted at the investigational site with sterile, preservative-free saline (sodium chloride for injection 0.9%). Sterile saline was added to obtain a total volume to inject per patient of 2 mL per lower limb (i.e., 2 mL for unilateral and 4 mL for bilateral injections). Two aboBoNTA doses (10 U/kg or 15 U/kg per GSC injected into the affected leg[s]) were compared with placebo. The total dose was either 10 U/kg or 15 U/kg for unilateral injections, or 20 U/kg or 30 U/kg for bilateral injections. A total dose of either 10 U/kg or 15 U/kg of aboBoNTA was injected intramuscularly into the gastrocnemius muscle and soleus muscle in four injection sites per affected lower limb. The 2 mL volume of injection per lower limb was split between the gastrocnemius and soleus muscles according to a ratio of 3:2. The injection volume for each site is illustrated below. The maximum dose injected in patients was not to exceed 1,000 U or 30 U/kg, whichever was the lower value. The intramuscular injections were administered at the treatment visit into clinically indicated lower-limb muscles, using electrical stimulation or ultrasound (combined with complementary technique), in single dosing sessions. aboBoNTA was provided by the manufacturer as a white lyophilized powder in type I, 3 mL glass vials. Placebo was provided by the manufacturer in type I, 3.0 mL glass vials and was indistinguishable from aboBoNTA. The placebo contained only the excipients described for aboBoNTA. To maintain the blind, an independent reconstitutor prepared the study treatment in the syringes.
Injection Volume in Gastrocnemius-Soleus Complex per Leg Without Hamstring Injections for Study 141.
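The dosing rules described for Study 141 (a per-limb dose of 10 U/kg or 15 U/kg, a cap of 1,000 U or 30 U/kg, whichever is lower, and a 3:2 split of the 2 mL per-limb volume between the gastrocnemius and soleus) can be sketched as below. This is an illustration only and not dosing guidance. The function names and the example body weights are hypothetical, and per-site volumes are not computed because the source table is not reproduced here.

```python
# Illustrative sketch of the Study 141 dosing arithmetic described in the text.
# Function names and example weights are hypothetical; this is not dosing guidance.

def total_dose_units(weight_kg, dose_per_kg_per_limb, limbs_injected):
    """Total aboBoNTA dose in units, after applying the stated cap."""
    if dose_per_kg_per_limb not in (10, 15):
        raise ValueError("Study 141 used 10 U/kg or 15 U/kg per affected limb")
    if limbs_injected not in (1, 2):
        raise ValueError("limbs_injected must be 1 (unilateral) or 2 (bilateral)")
    planned = weight_kg * dose_per_kg_per_limb * limbs_injected
    # Cap: not to exceed 1,000 U or 30 U/kg, whichever is lower.
    cap = min(1000.0, 30.0 * weight_kg)
    return min(planned, cap)


def volume_split_ml(volume_per_limb_ml=2.0, ratio=(3, 2)):
    """Split the per-limb volume between gastrocnemius and soleus (3:2)."""
    gastrocnemius_share, soleus_share = ratio
    parts = gastrocnemius_share + soleus_share
    return (
        volume_per_limb_ml * gastrocnemius_share / parts,
        volume_per_limb_ml * soleus_share / parts,
    )


if __name__ == "__main__":
    print(total_dose_units(20, 15, 2))  # bilateral, 20 kg child (hypothetical)
    print(total_dose_units(40, 15, 2))  # bilateral, 40 kg child: the cap applies
    print(volume_split_ml())
```

For bilateral injections at 15 U/kg per limb, the planned dose equals the 30 U/kg limit, so the 1,000 U ceiling is the binding cap only for heavier children.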
Concomitant use of anticholinergic drugs and concomitant treatment with dantrolene, tizanidine, or a gamma-aminobutyric acidergic (GABAergic) opioid or other anti-spasticity drug, including baclofen and benzodiazepines, were permitted during this study if the dosage had been stable for the four weeks prior to study treatment and was expected to remain at this stable dose throughout the study. Physiotherapy and the use of casts and orthoses were also permitted if they had been initiated prior to study entry (at least four weeks prior in the case of physiotherapy). In addition, both physiotherapy and the use of casts or orthoses had to continue at the same pre-study frequency and intensity until at least week 12. No new casts or orthoses were to be initiated until week 12, and no new physiotherapy was to be initiated less than four weeks prior to study entry or during the course of the study up to the week 12 visit. The following were not permitted during the study:
the administration of BoNT into any site of the body other than the lower limb
use of any investigational new drug or device or off-label use of any drug
treatment with any drug that interfered either directly or indirectly with neuromuscular function (e.g., aminoglycoside antibiotics)
use of neuroblocking drugs, such as those used during surgery (e.g., curare).
In Study 701, patients were randomized to one of two treatment groups, receiving either aboBoNTA (30 U/kg) or placebo. Study medication was prepared to a final volume of 2 mL and was administered equally into two sites in the gastrocnemius muscle in both legs (0.5 mL per site). aboBoNTA was presented as a freeze-dried white pellet containing 500 units of Clostridium BoNTA–hemagglutinin complex together with 125 mcg of human albumin and 2.5 mg of lactose in a clear glass vial. Matching placebo supplies were presented in identical clear glass vials containing 125 mcg of human albumin and 2.5 mg of lactose. Blinding was achieved by supplying the study medication for each patient in identical patient packs. The use of BoNT during the study or during the nine months preceding the study was prohibited. Any oral anti-spasticity medication being taken prior to the study was to be continued at the same dose throughout the study period. Other concomitant medications were allowed at the discretion of the investigator. Regular physiotherapy and the use of walking aids and orthoses were also permitted to continue during the study. If orthoses were changed at entry, it was recommended that the baseline assessments be delayed until the patient had stabilized. Nine patients (35%) in the placebo treatment group and seven patients (27%) in the aboBoNTA treatment group were taking concomitant medications at entry. Antiepileptics and psychoanaleptics were the most frequently used concomitant medications.
Outcomes
In Study 141, the primary outcome was the change from baseline to week 4 in the Modified Ashworth Scale (MAS) score in the GSC at the ankle joint of the (most) affected lower limb. The first secondary outcome was Physician’s Global Assessment (PGA) at week 4. The second secondary outcome was goal attainment scaling (GAS) at week 4. Tertiary outcomes included:
mean change from baseline to week 12 (and at end of study [EOS] or early withdrawal [EW]) in the MAS score in the GSC at the ankle joint of the (most) affected lower limb
proportion of patients with at least one grade reduction in MAS score from baseline to week 4 (and to week 12 and EOS/EW) in the GSC at the ankle joint of the (most) affected lower limb
mean PGA score at week 12 (and EOS/EW)
mean GAS score at week 12 (and EOS/EW)
▬▬▬▬▬
▬▬▬▬▬
▬▬▬▬▬
▬▬▬▬▬
If relevant, the aforementioned tertiary efficacy end points were also assessed at week 16, week 22, and week 28.
In Study 141, effort was made in each centre to ensure that the same evaluating investigator assessed the same patients for the duration of the study. All investigators were trained in the use of the assessment scales prior to the start of the study in an attempt to minimize variability between centres. They were also given follow-up training during the study. The assessor who conducted the PGA of treatment response was different from the person who evaluated the MAS. When making their assessment, none of the assessors had knowledge of the scores obtained by the other assessor.
In Study 701, the primary efficacy variable was functional change, as assessed by the change from baseline in overall GMFM score without walking aids or orthoses at week 4 of the study. Secondary efficacy outcomes were change in GMFM overall score at weeks 8 and 16 compared with baseline, change in GMFM total score at weeks 4, 8, and 16 compared with baseline, change in Leeds Videographic Gait Assessment (VGA) scores at weeks 4 and 16 compared with baseline, change in Leeds Functional Mobility Questionnaire (FMQ) at weeks 4 and 16 compared with baseline, and subjective functional assessment of gait at weeks 4, 8, and 16.
Please refer to Appendix 5 for more information on the validity of the outcome measures described in this section.
Modified Ashworth Scale
The MAS is commonly used to measure increased muscle tone and spasticity due to different pathologies and neurologic conditions.31 The MAS was derived from the original Ashworth Scale to measure muscle resistance while moving the affected joint through its full range of movement in order to passively stretch the muscle.31 It provides a semiquantitative measure of this resistance to passive movement.32,33 The MAS is easy to use as it requires no additional equipment; hence, it is one of the most commonly used tools to measure spasticity and muscle rigidity in patients with CP34 or hypertonia.35 It is administered by a physician or therapist during the patient visit and comprises a six-point scale used to measure the degree of spasticity (intensity of muscle tone) as follows: 0 = no increase in muscle tone; 1 = slight increase in muscle tone, manifested by a catch and release or by minimal resistance at the end range of motion when the affected part(s) is moved in flexion or extension; 1+ = slight increase in muscle tone, manifested by a catch, followed by minimal resistance throughout the remainder (less than half) of the range of movement; 2 = more marked increase in muscle tone through most of the range of movement, but the affected part(s) is easily moved; 3 = considerable increase in muscle tone, passive movement is difficult; 4 = affected part(s) rigid in flexion or extension.25,35,36 The MAS score is normally a categorical variable; however, for this review, it was treated as a continuous variable and, hence, needed to be transformed. The derived MAS scores that were used in this review were 0, 1, 2, 3, 4, and 5, which corresponded to the aforementioned original MAS scores of 0, 1, 1+, 2, 3, and 4 (as previously described), respectively.25 A higher MAS score indicates increased muscle tone, rigidity, or spasticity. There is no evidence of the validity of the MAS in children with spasticity and there is conflicting evidence on reliability.
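The transformation from original MAS grades to the derived 0-to-5 scores can be written as a simple lookup. The sketch below is illustrative; the function names are hypothetical, and only the grade-to-score mapping itself comes from the text.

```python
# Mapping of original MAS grades to the derived scores used in this review.
# Only the mapping is from the text; the helper names are hypothetical.
MAS_TO_DERIVED = {"0": 0, "1": 1, "1+": 2, "2": 3, "3": 4, "4": 5}


def derived_mas(grade):
    """Return the derived (0 to 5) score for an original MAS grade."""
    key = str(grade).strip()
    if key not in MAS_TO_DERIVED:
        raise ValueError(f"Unknown MAS grade: {grade!r}")
    return MAS_TO_DERIVED[key]


def change_from_baseline(baseline_grade, follow_up_grade):
    """Change in derived MAS score; negative values indicate reduced tone."""
    return derived_mas(follow_up_grade) - derived_mas(baseline_grade)


if __name__ == "__main__":
    # A patient moving from grade 2 at baseline to grade 1+ at week 4.
    print(change_from_baseline("2", "1+"))
```

Because grade 1+ occupies its own step, a move from grade 2 to grade 1 is a two-point change on the derived scale.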
In Study 141, all investigators were trained in the use of the assessment scales prior to the start of the study in an attempt to minimize variability between centres. They were also given follow-up training during the study. One of the clinical experts consulted for this review indicated that a one-point difference in the MAS (in either direction) was clinically relevant; however, no peer-reviewed evidence was identified regarding a minimal clinically important difference (MCID) for the MAS in pediatric patients with LLS. The other clinical expert consulted for this review indicated that defining an MCID is challenging for the MAS, but considered that a clinically important change in a single patient must be at least a one-point change due to the nature of the MAS. However, based on his clinical experience, a difference between treatment groups as low as 0.38 would be considered clinically significant when applied to a group of patients receiving treatment.
Physician’s Global Assessment
In the pivotal study of this submission, the PGA of treatment response was conducted by the investigator by scoring responses to the question: “How would you rate the response to treatment in the patient’s lower limb(s) since the last injection?” on a nine-point categorical scale where −4 = markedly worse, −3 = much worse, −2 = worse, −1 = slightly worse, 0 = no change, +1 = slightly improved, +2 = improved, +3 = much improved, and +4 = markedly improved. Assessment of the PGA was undertaken independently by an investigator who was different from the one who assessed the MAS.22 No literature was identified regarding the psychometric properties (validity, reliability, or responsiveness) of the PGA for pediatric patients with LLS. In addition, no MCID for the PGA in this population has been identified.
Goal Attainment Scaling
GAS is a method of integrating the achievement of a number of individually set goals into a single goal attainment score.37 It has been applied in various areas of complex interventions, including spasticity management.38 Before the treatment, one or more individual goals are established by the patient (or their caregiver, if the patient is a child)39 and one or more researchers or practitioners (or others agreed upon by the practitioner). The clinician/researcher requires sufficient knowledge and experience when supporting patients to set realistic goals. In addition, they must be able to respect the patient’s ideology and what is important to them when setting goals (and thus able to avoid projecting their own goals and what they perceive to be important onto the patient) and they must have good negotiating skills in order to manage potentially unrealistic goals set by the patient.37 The number of goals can vary between patients in the same study and between patients in different studies. Numerical values ranging from −2 to +2 (a five-point scale) are used to describe the degree to which the goal(s) were or were not met.37 The expected target of achievement is set by the patient and treating team and given a value of 0. Outcomes that are less than expected are given values of −1 or −2 (the most unfavourable outcome) and outcomes that are better than expected are given values of +1 or +2 (the most favourable outcome). The originators of the GAS score transformed it to a standard variable (the T score), with scores ranging from 0 to 100, a mean of 50, and a standard deviation of 10. 
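The text does not reproduce the T-score formula. The sketch below uses the formula commonly attributed to the originators of GAS, T = 50 + 10 Σ(w·x) / sqrt((1 − ρ) Σw² + ρ (Σw)²), with the conventional assumed inter-goal correlation ρ = 0.3. Both the formula and the value of ρ are assumptions here, since the source does not state how Study 141 computed the score; the function name and equal default weights are hypothetical.

```python
# Sketch of the GAS T-score using the commonly cited formula (an assumption;
# the report does not state the exact computation used in Study 141).
from math import sqrt


def gas_t_score(attainment, weights=None, rho=0.3):
    """Combine goal attainment levels (-2 to +2) into a single T score."""
    if not attainment:
        raise ValueError("at least one goal is required")
    if any(x not in (-2, -1, 0, 1, 2) for x in attainment):
        raise ValueError("attainment levels must be integers from -2 to +2")
    if weights is None:
        weights = [1.0] * len(attainment)  # equal weighting (assumption)
    if len(weights) != len(attainment):
        raise ValueError("one weight per goal is required")
    weighted_sum = sum(w * x for w, x in zip(weights, attainment))
    denominator = sqrt(
        (1 - rho) * sum(w * w for w in weights) + rho * sum(weights) ** 2
    )
    return 50 + 10 * weighted_sum / denominator


if __name__ == "__main__":
    print(gas_t_score([0, 0, 0]))   # all goals met as expected
    print(gas_t_score([1, 1, 1]))   # all goals slightly better than expected
```

When every goal is achieved exactly as expected (all values 0), the score is 50 regardless of the number of goals or their weights.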
A change in the GAS T score of more than 10 appeared clinically important in adult patients with upper-limb spasticity (ULS) who had suffered diffuse brain injury or stroke or who had been diagnosed with multiple sclerosis and had been classified as responders (positive clinical outcome associated with BoNT treatment as identified by the treating physician) and nonresponders (negative or non-significant clinical outcome associated with BoNT treatment as identified by the treating physician).40 However, no validity or reliability studies have been conducted in children and, as a result, it is unclear if the psychometric properties observed in adults (particularly the responsiveness with GAS) apply to children. No MCID was identified for the GAS score in pediatric patients with LLS.
Tardieu Scale
The TS was developed by Tardieu et al. in 1954 to clinically measure spasticity by measuring the different angles of reaction when passing the muscle through stretches at different predefined velocities.41,42 This outcome measure was developed to more closely align with the 1979 Lance definition of spasticity, specifically, a “motor disorder characterized by a velocity-dependent increase in tonic stretch reflexes (muscle tone), with exaggerated tendon jerks, resulting from hyperexcitability of the stretch reflex, as one component of the upper motor neuron syndrome.”41 Spasticity is thus rated by examining the reaction difference of the muscle in question between the slowest and fastest stretch speed, both of which are performed by the same practitioner at the same time of day with the muscle being in the same resting position.41 The slow stretch assesses the passive range of motion and is slow enough to avoid producing a significant stretch reflex. The stretch at the fastest velocity is performed to maximize the involvement of the stretch reflex, thus producing a catch-and-release sensation (also termed clonus) that is dependent on the amount of spasticity present.41 Two parameters are used to measure the muscle spasticity, namely the spasticity angle X (which is the difference between slow-speed angle of arrest [V1] and the clonus or catch-and-release angle at the highest speed [V3]) and the spasticity grade Y (the grading of the intensity of the muscle reaction to the fastest stretch [V3] and is an ordinal variable). Larger spasticity angles correspond to more spasticity in the muscle. 
The spasticity is graded as follows: grade 0 = absence of spasticity as defined by a catch that is not followed by a release; grade 1 = passive movement is slowed down by mild resistance; grade 2 = passive movement (the catch and release) is transiently interrupted; grades 3 and 4 = severe spasticity; and non-ratable = a catch that is not followed by an obvious release occurring at inconsistent angles.41 Training has been shown to enhance the reliability of the TS, particularly in the angle of catch at fast speed (XV3), in all muscles except the knee flexors.41 In Study 141, all investigators were trained in the use of the assessment scales prior to the start of the study in an attempt to minimize variability between centres. They were also given follow-up training during the study. No MCID was identified in the literature with regard to pediatric patients with LLS. The TS at the ankle joint of the most affected lower limb was reported in Study 141.
Observational Gait Scale
The OGS is an objective outcome measure used to document gait changes (or impairments) of the upper motor neuron syndrome in young children who have received injections of BoNT.22,43 It was derived from the Physician Rating Scale by expanding the scale from six to eight sections, including putting more emphasis on the knee-to-foot relationship during the standing phase. The gait parameter sections that make up the OGS include knee position in mid stance, initial foot contact, foot contact mid stance, timing of heel rise, hindfoot at mid stance, base of support, gait assistive devices, and change. The maximum score is 22 for each leg, which denotes a normal gait. In older children, the standard of assessing gait includes instrumented three-dimensional gait analysis; however, this is not always appropriate for children due to their potential to be uncooperative and their small size.43 The child is recorded while walking and the investigator (e.g., someone with extensive knowledge of gait analysis) looks at the video in order to score each component.43 The OGS is a validated and reliable instrument to assess response to treatment for pediatric patients with spasticity. No MCID was identified in the literature regarding pediatric patients with LLS. The OGS in the most affected leg was reported in Study 141.
Pediatric Quality of Life Inventory Version 4.0 Generic Core Scales
The original Pediatric Quality of Life Inventory (PedsQL) was developed as a health-related quality-of-life (HRQoL) measure that addressed the paucity of appropriately validated and reliable instruments incorporating both the child and parental experience with chronic health conditions. The PedsQL uses a modular approach and incorporates generic and disease- and symptom-specific items that are appropriate for the assessment of pediatric chronic conditions.44 The PedsQL 4.0 Generic Core Scales comprise 23 items under the following modules: Physical Functioning (eight items), Emotional Functioning (five items), Social Functioning (five items), and School Functioning (five items).45 The Generic Core Scales comprise both a parent-proxy report and a child self-report that assess health perceptions. The child self-report format is specifically for three age groups: five to seven, eight to 12, and 13 to 18 years of age, while the corresponding parent-proxy reports are specifically for toddlers (ages two to four, for which there is no child self-assessment report), young children (ages five to seven), children (ages eight to 12), and adolescents (ages 13 to 18). The questions ask how much of a problem each item has been in the past month. A five-point Likert response scale is used across the child reports (from ages eight to 18) and the corresponding parent report and includes the following responses with corresponding scores: 0 = never a problem; 1 = almost never a problem; 2 = sometimes a problem; 3 = often a problem; and 4 = almost always a problem.
In addition, a three-point scale is used for simplification and ease of use for children aged five to seven years (0 = not at all a problem; 2 = sometimes a problem; and 4 = a lot of a problem), with each of the response choices on the scale anchored to a happy, neutral, or sad face.45 The scores, which are reverse scored, are transformed linearly to a 0 to 100 scale, whereby 0 = 100, 1 = 75, 2 = 50, 3 = 25, and 4 = 0, with higher scores indicative of a higher HRQoL. The PedsQL Generic Core Scales have been validated and determined to be reliable and responsive in pediatric patients with chronic conditions. However, whether the validity and responsiveness of the PedsQL hold true in pediatric patients with LLS is unknown, as the PedsQL has never been evaluated in this population and currently no known MCID exists for the PedsQL in pediatric patients with LLS.
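The reverse scoring and linear transformation of PedsQL item responses can be sketched as follows. The item-level mapping (0 = 100 through 4 = 0) is taken from the text; averaging the transformed items over the answered items to obtain a scale score is an assumption about the usual scoring convention, and the function names are hypothetical.

```python
# Sketch of PedsQL item transformation. The 0..4 -> 100..0 mapping is from the
# text; averaging over answered items is an assumed scoring convention.

def transform_item(raw):
    """Reverse score a raw response (0 to 4) onto the 0 to 100 scale."""
    if raw not in (0, 1, 2, 3, 4):
        raise ValueError("raw PedsQL responses must be 0, 1, 2, 3, or 4")
    return (4 - raw) * 25


def scale_score(raw_items):
    """Mean transformed score over answered items; None marks a missing item."""
    answered = [transform_item(r) for r in raw_items if r is not None]
    if not answered:
        return None
    return sum(answered) / len(answered)


if __name__ == "__main__":
    print([transform_item(r) for r in range(5)])
    # Young-child three-point format uses only the responses 0, 2, and 4.
    print(scale_score([0, 2, 4, None, 2]))
```

The same transformation applies to the three-point young-child format, because its responses are coded as 0, 2, and 4.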
Faces Pain Scale — Revised
The Faces Pain Scale (FPS) and Faces Pain Scale — Revised (FPS-R) were developed to measure pain in pediatric patients.46,47 Bieri et al.46 developed the FPS using a five-phase approach, with each phase helping lead to the development of the seven-faces (seven items) scale construct. The final phase examined the test–retest reliability and subsequently showed that a rank correlation coefficient of 0.79 was obtained when six-year-old children rated a painful experience over a two-week time period.46 Hicks et al.47 undertook the revising of the original FPS, as the seven-point version was not easily rescaled to either a 0 to 5 or 0 to 10 metric. Instead, they adapted the FPS to a six-face scale, with corresponding scoring of 0, 2, 4, 6, 8, and 10 (or 0, 1, 2, 3, 4, 5); a higher score indicates more pain.47 The clinical expert consulted for this review explained that pain is not normally associated with spasticity; therefore, the relevance of this outcome measure remains under question. No literature was identified regarding the psychometric properties (validity, reliability, or responsiveness) of the FPS for pediatric patients with LLS. In addition, no MCID for the FPS-R in this population has been identified.
Gross Motor Function Measure
The GMFM (and, subsequently, the 88-item GMFM [GMFM-88]48,49) is an outcome measure used to evaluate change in gross motor function over time in children with varying degrees of CP.50 The 85 items that made up the original GMFM (and the subsequent three additional items included in the GMFM-8848,49) were chosen because they were the items most likely to show change in patients with CP. Individual items were combined into five separate areas of motor function to facilitate scoring. These dimensions include: A = lying and rolling; B = sitting; C = crawling and kneeling; D = standing; and E = walking, running, and jumping.48,50 Each individual item is scored on a four-point Likert scale (0 to 3), with assignments as follows: 0 = cannot do; 1 = initiates (< 10% of the task is completed); 2 = partially completes (10% to < 100% of the task); and 3 = task completion (100% of the task). Each dimension contributes equal weight; therefore, dimension scores are calculated using the following formula: child’s score ÷ maximum score × 100%. The total score is then obtained by adding up all of the dimension scores (per cent) and then dividing the sum by the total number of dimensions (five dimensions). To increase responsiveness, and if the therapist identifies specific goals, a goal-score total can also be calculated (using the same algorithm as for the total score, but dividing by the number of dimensions that were part of the goal setting).48–50 It should be noted that the GMFM (and GMFM-88) only assesses how much of the task the child can perform (quantity) and does not measure how well the task is performed (quality).49 The GMFM is a validated instrument to assess response to treatment for pediatric patients with spasticity. No MCID was identified in the literature for the GMFM-88 with regard to pediatric patients with LLS. In Study 701, the total score was named the overall score.
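The GMFM scoring algorithm described above (dimension percentage, total score as the mean of dimension percentages, and an optional goal total over selected dimensions) can be sketched as below. The function names and the toy item scores are hypothetical; a real assessment would supply all five dimensions with their full item lists.

```python
# Sketch of GMFM scoring as described in the text. Names and toy data are
# hypothetical; each item is scored 0 to 3 and each dimension is equally weighted.

def dimension_percent(item_scores):
    """Child's score divided by maximum score, times 100, for one dimension."""
    if not item_scores or any(s not in (0, 1, 2, 3) for s in item_scores):
        raise ValueError("each item must be scored 0, 1, 2, or 3")
    return sum(item_scores) / (3 * len(item_scores)) * 100


def total_score(dimensions):
    """Mean of the dimension percentages (five dimensions in the GMFM)."""
    percents = [dimension_percent(items) for items in dimensions.values()]
    return sum(percents) / len(percents)


def goal_total_score(dimensions, goal_dimensions):
    """Same algorithm, restricted to the dimensions chosen as goal areas."""
    selected = {name: dimensions[name] for name in goal_dimensions}
    return total_score(selected)


if __name__ == "__main__":
    toy = {"A": [3, 3], "B": [3, 2], "C": [2, 1], "D": [1, 1], "E": [0, 1]}
    print(round(total_score(toy), 1))
    print(round(goal_total_score(toy, ["D", "E"]), 1))
```

Because each dimension is converted to a percentage before averaging, a dimension with few items carries the same weight as one with many.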
Leeds Functional Mobility Questionnaire
In Study 701,51 the investigators used the Leeds FMQ, a 50-item questionnaire that was developed to identify and assess changes in the patient’s ability to manage everyday activities that are typically impaired in patients with LLS. It is administered as a structured interview with the patient’s parents and was administered at 0, 4, and 16 weeks post–aboBoNTA treatment. It is subdivided into three separate domains: sitting and standing, mobility, and other activities.51 There is no overall score for this rating instrument and each question is summarized and analyzed separately. Categorical data are generated from each question and assess the degree of difficulty when performing certain activities. A lower score indicates improved function. The Leeds FMQ was developed by the Regional Child Development Centre at St James’s University Hospital in Leeds, UK; however, it is still in the process of development.51 Hence, there has been no literature identified regarding its psychometric properties (validity, reliability, or responsiveness) for pediatric patients with LLS. In addition, no MCID for the Leeds FMQ in this population has been identified.
Leeds Videographic Gait Assessment
In Study 701,51 the investigators used the Leeds VGA to observe patient gait, viewed in both the sagittal and coronal planes. It was developed by the Leeds Regional Child Development Centre at St James’s University Hospital in Leeds, UK.51 In the study, patients walked along a walkway both with and without their normal splints and footwear at weeks 0, 4, and 16. The video clips were blinded and randomized to be reviewed by a panel of clinicians and physiotherapists who had experience in the management of children with walking difficulties associated with muscle spasticity. A standard score sheet was used to rate the following parameters, with each leg scored separately: initial foot contact, degree of knee flexion, presence/absence of rocker-bottom foot, hindfoot deformity (presence of valgus or varus), and walking aids used.51 No literature was identified regarding the psychometric properties (validity, reliability, or responsiveness) of the Leeds VGA for pediatric patients with LLS. In addition, no MCID for the Leeds VGA in this population has been identified.
Subjective Functional Assessment of Gait
This subjective functional assessment of gait was used by both the parent and investigator at each post-treatment visit to assess functional changes in response to treatment with aboBoNTA.51 Specifically, the parent and the investigator each gave an opinion (scored separately) on the child’s functional changes, choosing from the following: good response; minimal response; no response; worse response; and not recorded.51 No literature was identified regarding any psychometric properties (validity, reliability, or responsiveness) of the subjective functional assessment of gait for pediatric patients with LLS. In addition, no MCID for the outcome measure in this population has been identified.
Harms
Adverse events (i.e., treatment-emergent adverse events [TEAEs], serious adverse events [SAEs], withdrawal due to adverse events [WDAEs], and notable adverse events [i.e., adverse events of special interest in this review]) were reported in both randomized controlled trials (RCTs).
In Study 141, a TEAE was defined as any adverse event that occurred during the treatment phase of the study if it was not present prior to receiving the first intake of study medication; was present prior to receiving the first intake of study medication but increased in intensity during the treatment phase of the study; or was present prior to receiving the first intake of study medication at the same intensity but, during the active phase of the study, was related to the medication intake. An SAE was defined as any adverse event that was life-threatening or resulted in death, patient hospitalization, or prolongation of an existing hospitalization, a persistent or significant disability or incapacity, or a congenital anomaly or birth defect in the offspring of a patient who received the study treatment. An SAE was also defined as an important medical event that, based on appropriate medical judgment, may jeopardize the patient and may require medical and/or surgical intervention to prevent one of the outcomes listed previously.
In Study 701, an adverse event included any noxious, pathologic, or unintended change in anatomical, physiologic, or metabolic functions as indicated by physical signs, symptoms, and/or laboratory changes occurring in any phase of the clinical trial, whether associated with a drug or placebo and whether or not considered drug-related. An SAE was defined as any event that is fatal; life-threatening; permanently or temporarily disabling or incapacitating or results in hospitalization; or prolongs a hospital stay or is associated with congenital abnormality, cancer, or overdose (either accidental or intentional). In addition, any event the investigator regards as serious, or which would suggest any significant hazard, contraindication, side effect, or precaution that may be associated with the use of the drug, should be reported as a serious event.
Statistical Analysis
Study 141
In Study 141, the primary (MAS) and first secondary (PGA) efficacy end points were taken into account in the sample size calculation. The power used for the sample size calculations was equal to 85% for the primary efficacy end point and equal to 90% for the first secondary efficacy end point. The sample size needed per group was calculated for each end point separately, and then the larger one was retained. A total of 228 randomized patients (i.e., 76 randomized patients per treatment group) were necessary to demonstrate a statistically significant treatment effect on the primary efficacy end point with a type I error rate controlled at level 0.05 and a power of 85%, assuming: mean changes from baseline to week 4 in the MAS score of −1.3 and −0.9 in the aboBoNTA and placebo groups, respectively, a common standard deviation for the change from baseline to week 4 in the MAS score of 0.8, and a 3% dropout rate from baseline to week 4. A total of 165 randomized patients (i.e., 55 patients per treatment group) were necessary to demonstrate a statistically significant treatment effect on the mean PGA score with a two-sided comparison-wise type I error rate controlled at 0.05 and a power of 90%, assuming a between-group mean score difference of the PGA at week 4 of 0.7, a common standard deviation of the PGA score at week 4 of 1.1, and a 3% dropout rate at week 4. A targeted total sample size of 228 randomized patients (i.e., 76 randomized patients per treatment group) was considered sufficient to detect a treatment effect on both the primary and first secondary efficacy end points. Using a sample size of 228 as the larger of the two required figures meant the actual power for the PGA score comparison rose to 97%. 
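The sample size figures above can be roughly checked with the standard normal-approximation formula for comparing two means, n per group = 2 (z₁₋α/₂ + z_power)² (SD/Δ)², inflated for dropout. This is a sketch under stated assumptions: the report does not say which method the sponsor used, so the approximation is expected to land slightly below the reported 76 and 55 per group (a t-distribution-based calculation would add roughly one patient per group). Function names are hypothetical.

```python
# Rough check of the Study 141 sample size using a normal approximation.
# The sponsor's exact method is not stated, so small differences are expected.
from math import ceil, sqrt
from statistics import NormalDist

_Z = NormalDist()  # standard normal distribution


def n_per_group(delta, sd, power, alpha=0.05, dropout=0.0):
    """Randomized patients per group to detect a mean difference `delta`."""
    z_alpha = _Z.inv_cdf(1 - alpha / 2)  # two-sided type I error
    z_power = _Z.inv_cdf(power)
    n_evaluable = 2 * ((z_alpha + z_power) * sd / delta) ** 2
    return ceil(n_evaluable / (1 - dropout))


def approx_power(delta, sd, n_evaluable_per_group, alpha=0.05):
    """Approximate power of the two-sided comparison for a given group size."""
    standard_error = sd * sqrt(2 / n_evaluable_per_group)
    return _Z.cdf(delta / standard_error - _Z.inv_cdf(1 - alpha / 2))


if __name__ == "__main__":
    # MAS: -1.3 versus -0.9 (difference 0.4), SD 0.8, 85% power, 3% dropout.
    print(n_per_group(0.4, 0.8, 0.85, dropout=0.03))
    # PGA: difference 0.7, SD 1.1, 90% power, 3% dropout.
    print(n_per_group(0.7, 1.1, 0.90, dropout=0.03))
    # Power for the PGA with 76 randomized per group and 3% dropout.
    print(round(approx_power(0.7, 1.1, 76 * 0.97), 3))
```

The last call illustrates why adopting the larger MAS-driven sample size raises the power of the PGA comparison to about 97%.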
The rationale for the above threshold for the MAS score was based on a previous clinical trial conducted in children with CP for lower extremity spasticity.52 That study assessed three different doses of onaBoNTA (low-dose group [1 U/kg], middle-dose group [3 U/kg], and high-dose group [5 U/kg]). The rationale for the aforementioned threshold for the PGA was based on a previous clinical trial for ULS conducted in adult patients after a stroke.53
Two different statistical methodologies for the efficacy analyses were applied for the registrations in the US and non-US countries. Only the non-US approach and data are presented for the purposes of this review.
In non-US countries, superiority was based on the primary efficacy end point only. To control the family-wise type I error rate, a two-step hierarchical testing procedure was applied to test the superiority of each of the two aboBoNTA doses to placebo. In step 1, the superiority of aboBoNTA 15 U/kg to placebo for MAS at week 4 (primary efficacy outcome) was tested at a significance level of 0.05. If the P value from step 1 was lower than 0.05, then in step 2 the superiority of aboBoNTA 10 U/kg to placebo for MAS at week 4 was tested at a significance level of 0.05 and was considered significant if the associated P value was lower than 0.05. If the hierarchical testing procedure stopped at the end of step 1, the comparison of aboBoNTA 10 U/kg with placebo on the primary efficacy end point was still performed to characterize the full clinical effect, but no formal statistical conclusion was drawn. Each of the two secondary efficacy end points was analyzed to compare each aboBoNTA dose with placebo at a 0.05 type I error rate.
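The two-step procedure can be illustrated with a minimal sketch. The function name and the P values in the usage lines are hypothetical and are not study results; the logic only mirrors the ordering described above (15 U/kg first, then 10 U/kg, each at 0.05).

```python
def hierarchical_test(p_15, p_10, alpha=0.05):
    """Fixed-sequence test of two doses versus placebo.

    Step 1 tests 15 U/kg; step 2 (10 U/kg) can only yield a formal
    conclusion of superiority if step 1 was significant.
    """
    significant = {"15 U/kg": p_15 < alpha, "10 U/kg": False}
    if significant["15 U/kg"]:
        significant["10 U/kg"] = p_10 < alpha
    return significant


# Hypothetical P values for illustration only.
both_steps_pass = hierarchical_test(p_15=0.01, p_10=0.03)
stopped_at_step_1 = hierarchical_test(p_15=0.20, p_10=0.001)
```

In the second call, the 10 U/kg comparison has a small P value but is not declared significant, because the procedure stopped at step 1. Testing in a fixed order is what allows each step to use the full 0.05 level.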
Each tertiary efficacy end point was analyzed for exploratory purposes only to compare each aboBoNTA dose with placebo. No adjustment for multiplicity was made for these analyses.
The primary efficacy analysis consisted of two contrast analyses within a single analysis of covariance (ANCOVA) model controlling for the baseline MAS score, the randomization stratification factors (age range and BoNT-naive or non-naive status as assessed at baseline), and the centre, all as fixed effects. The least squares (LS) means and the associated 95% CIs were calculated for the aboBoNTA and placebo groups, as were the differences in the LS means between these groups and the associated P values. The first secondary efficacy end point (mean PGA score at week 4) and the second secondary efficacy end point (mean GAS score at week 4) were analyzed using an analysis of variance (ANOVA) model controlling for the randomization stratification factors (age range and BoNT treatment status at baseline) and the centre, all as fixed effects. The LS means and the associated 95% CIs were calculated for the aboBoNTA and placebo groups, as were the differences in the LS means between these groups and the associated P values. For each of the tertiary end points, summary tables of raw values and change from baseline were provided at each visit. In the ANOVA or ANCOVA models for these end points, the LS means and the associated 95% CIs were calculated for the aboBoNTA and placebo groups, as were the differences in the LS means between these groups and the associated P values. The odds ratios and their 95% CIs were calculated from a logistic regression. To assess the impact of missing efficacy data at week 4, sensitivity analyses were performed with missing data imputed with baseline values (primary end point) or with the “markedly better” or “markedly worse” data (sensitivity analysis of the first secondary end point PGA data).
Subgroup analyses were performed on the primary and secondary efficacy end points in the intention-to-treat (ITT) population by BoNT status (naive or non-naive) at baseline. In the protocol for this review, a subgroup analysis by baseline severity of spasticity was identified; however, such analyses were not conducted.
Study 701
In Study 701, a sample size of 50 patients (25 patients per treatment group) was planned to provide 90% power to detect a clinically significant between-group difference of 10% in the overall GMFM score at the 0.05 significance level, allowing for a dropout rate of 5% to 10%. No rationale was provided for the choice of 10% as the clinically significant between-group difference. The primary efficacy end points in GMFM scores were analyzed using ANCOVA. For all other efficacy variables, analysis was performed using logistic regression. Centre, strata, and baseline scores were included in the model, as appropriate. No adjustments for multiplicity were performed. Missing data were imputed using the last observation carried forward (LOCF). No subgroup analysis was conducted in Study 701.
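As an illustration of LOCF imputation, the sketch below fills each missing visit with the most recent observed value for the same patient. The function name and the scores are hypothetical; whether a baseline value was eligible to be carried forward in Study 701 is not stated in the source, so that behaviour is an assumption here.

```python
def locf(values):
    """Replace each missing value (None) with the last observed value.

    Leading missing values stay None because there is nothing to carry.
    """
    filled, last = [], None
    for value in values:
        if value is not None:
            last = value
        filled.append(last)
    return filled


# Hypothetical scores at baseline and weeks 4, 8, and 16 for one patient
# who missed the last two visits.
imputed = locf([52.0, 55.5, None, None])
```

Here the week 4 value is carried forward to weeks 8 and 16, which implicitly assumes no change after the last observed visit.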
Analysis Populations
In Study 141, efficacy analyses were performed using the ITT population, which included all randomized patients who received at least one injection of study treatment and who had a MAS score in the GSC assessed both at baseline and at week 4. The per-protocol (PP) population was defined as all patients in the ITT population who did not have major protocol violations between baseline and week 4, inclusive. The safety population was defined as all randomized patients who received at least one injection of study treatment. The ITT population should be considered a modified ITT population, given that a true ITT population would include all randomized patients regardless of whether they received treatment or had an assessment after receiving treatment.
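The Study 141 population definitions can be expressed as nested filters, as in the following sketch. The `Patient` record, its field names, and the example patients are hypothetical and serve only to show how the three populations relate to one another.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    patient_id: str
    randomized: bool
    injected: bool          # received at least one injection
    has_baseline_mas: bool  # MAS score in the GSC assessed at baseline
    has_week4_mas: bool     # MAS score in the GSC assessed at week 4
    major_violation: bool   # major protocol violation, baseline to week 4


def safety_population(patients):
    return [p for p in patients if p.randomized and p.injected]


def itt_population(patients):
    # The study's ITT definition (effectively a modified ITT population).
    return [p for p in safety_population(patients)
            if p.has_baseline_mas and p.has_week4_mas]


def pp_population(patients):
    return [p for p in itt_population(patients) if not p.major_violation]


# Hypothetical patients for illustration only.
cohort = [
    Patient("A", True, True, True, True, False),    # safety, ITT, PP
    Patient("B", True, True, True, True, True),     # safety, ITT
    Patient("C", True, True, True, False, False),   # safety only
    Patient("D", True, False, True, False, False),  # randomized, untreated
]
safety_ids = [p.patient_id for p in safety_population(cohort)]
itt_ids = [p.patient_id for p in itt_population(cohort)]
pp_ids = [p.patient_id for p in pp_population(cohort)]
```

Patients C and D would both count in a true ITT population, which includes every randomized patient, but both are excluded under the study's definition.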
In Study 701, all safety and efficacy analyses were performed using the all-patients-treated (APT) population, which comprised all patients randomized to the study who received some study medication. The PP population comprised all patients in the APT population who did not have major protocol violations.