VALIDITY OF OUTCOME MEASURES

NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Entyvio (Vedolizumab) [Internet]. Ottawa (ON): Canadian Agency for Drugs and Technologies in Health; 2016 Dec.


APPENDIX 5: VALIDITY OF OUTCOME MEASURES

Issues considered in this section were provided as supporting information. The information has not been systematically reviewed.

Aim

To summarize the measurement properties (e.g., reliability, validity, minimal clinically important difference [MCID]) of the following outcome measures used in the GEMINI II and GEMINI III studies:

Crohn’s Disease Activity Index (CDAI)
Inflammatory Bowel Disease Questionnaire (IBDQ)
Short Form (36) Health Survey (SF-36)
EuroQol 5-Dimensions Questionnaire (EQ-5D-3L)

Findings

1. Crohn’s Disease Activity Index

The National Cooperative Crohn’s Disease Study Group developed the CDAI using prospective data gathered from 187 visits by 112 patients with Crohn’s disease.⁷⁰ It is a disease-specific index and is considered the standard for assessing Crohn’s disease activity. The CDAI consists of eight weighted items, which are used to evaluate overall disease severity. The overall score is the sum of the weighted item values and ranges from 0 to 600. A score of 150 is the threshold between remission (< 150) and active disease: scores from 150 to 219 indicate mild to moderate Crohn’s disease, scores from 220 to 450 indicate moderate to severe Crohn’s disease, and scores above 450 indicate very severe Crohn’s disease.^71,72 Item scores are derived from patient diaries covering the seven days preceding each visit. The CDAI is generally considered impractical for use in clinical practice, and no MCID has been clearly defined.^72,73 Originally, a 50-point change in the CDAI was associated with physician evaluations of “slightly better” or “slightly worse” compared with baseline.^70,72,73 However, clinical trials have commonly defined clinical response as a decrease of 50, 60, 70, or 100 points in the CDAI.⁷² More recently, the FDA and European Medicines Agency have suggested that a 100-point decrease in the CDAI represents a more meaningful response (i.e., enhanced clinical response).⁷²
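The severity bands and response definitions above can be expressed as a small lookup. This is a minimal sketch: the function names are hypothetical, the bands follow the cut-offs quoted in this section, and the response threshold is a parameter because trials have used 50, 60, 70, or 100 points.

```python
def cdai_severity(score: float) -> str:
    """Map a CDAI score to the severity bands quoted in this section."""
    if score < 150:
        return "remission"
    if score < 220:
        return "mild to moderate"
    if score <= 450:
        return "moderate to severe"
    return "very severe"


def cdai_response(baseline: float, follow_up: float, min_drop: float = 100) -> bool:
    """Return True when the CDAI fell by at least `min_drop` points.

    The default of 100 points mirrors the 'enhanced clinical response'
    suggested by the FDA and EMA; pass 50, 60, or 70 to mirror other
    trial definitions.
    """
    return baseline - follow_up >= min_drop


print(cdai_severity(180))           # mild to moderate
print(cdai_response(300, 190))      # True (110-point drop)
print(cdai_response(300, 230, 70))  # True (70-point drop)
```

Keeping the threshold as an argument makes it explicit that "response" depends on which trial definition is being applied.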

Development of the Crohn’s Disease Activity Index

Gastroenterologists considered 18 parameters for inclusion in the CDAI, covering the following Crohn’s disease domains: subjective patient symptoms and need for symptomatic medications, objective clinical findings on physical examination, extraintestinal manifestations of Crohn’s disease, complications of Crohn’s disease (e.g., fistulas), radiologic and endoscopic examinations, and laboratory parameters. A global assessment score was also recorded by the gastroenterologist at each visit using the following scheme: “very well” = 1, “fair to good” = 3, “poor” = 5, and “very poor” = 7.

Multiple regression with backward stepwise deletion was used to assess the relationship between the 18 parameters and the physician global assessment score. Based on these results, eight independent variables, with weights ranging from 1 to 30, were included in the final CDAI formula.

Table 38: Final Items Included in the CDAI and Their Weights

Item (Daily Sum Per Week)	Weight
Number of liquid or very soft stools	2
Abdominal pain score in one week (rating: 0 to 3)	5
General well-being (rating: 0 to 4)	7
Sum of findings per week: arthritis/arthralgia; mucocutaneous lesions (e.g., erythema nodosum, aphthous ulcers); iritis/uveitis; anal disease (fissure, fistula, etc.); external fistula (enterocutaneous, vesical, vaginal, etc.); fever > 37.8°C	20
Antidiarrheal use (e.g., diphenoxylate hydrochloride)	30
Abdominal mass (none = 0, equivocal = 2, present = 5)	10
47 minus hematocrit (males) or 42 minus hematocrit (females)	6
100 × (1 - [body weight divided by standard weight])	1

CDAI = Crohn’s Disease Activity Index.
Source: Best et al.⁷⁰
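The weighted sum in Table 38 can be written directly as code. This is a minimal sketch that applies the Table 38 weights literally; the class and field names are hypothetical, the patient values in the usage line are invented, and no additional truncation or capping conventions are applied.

```python
from dataclasses import dataclass


@dataclass
class CdaiInputs:
    liquid_stools_7d: int     # sum of daily liquid/very soft stool counts over 7 days
    abdominal_pain_7d: int    # sum of daily pain ratings (0 to 3) over 7 days
    well_being_7d: int        # sum of daily well-being ratings (0 to 4) over 7 days
    complications: int        # number of listed findings present (0 to 6)
    antidiarrheal_use: bool   # any antidiarrheal use
    abdominal_mass: int       # 0 = none, 2 = equivocal, 5 = present
    hematocrit: float         # percent
    is_male: bool
    body_weight: float        # same units as standard_weight
    standard_weight: float


def cdai_score(x: CdaiInputs) -> float:
    """Weighted sum of the eight CDAI items using the weights in Table 38."""
    hematocrit_ref = 47 if x.is_male else 42
    weight_deficit = 100 * (1 - x.body_weight / x.standard_weight)
    return (
        2 * x.liquid_stools_7d
        + 5 * x.abdominal_pain_7d
        + 7 * x.well_being_7d
        + 20 * x.complications
        + 30 * int(x.antidiarrheal_use)
        + 10 * x.abdominal_mass
        + 6 * (hematocrit_ref - x.hematocrit)
        + 1 * weight_deficit
    )


# Hypothetical patient: 14 liquid stools, pain sum 7, well-being sum 14,
# one listed finding, hematocrit 40% (male), at standard weight.
patient = CdaiInputs(14, 7, 14, 1, False, 0, 40.0, True, 70.0, 70.0)
print(cdai_score(patient))  # 223.0
```

A score of 223 would fall in the moderate to severe band described above.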

Reliability of the Crohn’s Disease Activity Index

Reliability was not formally assessed during the original development of the CDAI; however, test–retest reliability evaluated over two successive visits in 32 patients was good to very good.^70,71 The CDAI was subsequently re-evaluated and re-derived using data from 1,058 patients; the re-derived index differed little from the original formulation, so the original version was recommended.⁷⁴

Validity of the Crohn’s Disease Activity Index

Construct validity: The items included in the CDAI were selected by gastroenterologists and are based on accepted features of Crohn’s disease, thereby demonstrating construct validity.⁷¹

Content validity: The CDAI appears to be responsive, as it can detect changes in Crohn’s disease severity (i.e., it can differentiate levels of disease severity). The CDAI is also widely used in clinical trials and is accepted by gastroenterologists as a primary end point for assessing Crohn’s disease activity. However, the CDAI does not appear to reflect disease activity in pediatric patients with Crohn’s disease, nor does it address all aspects of the disease, such as quality of life.⁷¹

Criterion validity: Selecting a gold standard for comparison is difficult in Crohn’s disease because of the heterogeneous nature of its manifestations. Generally, the overall CDAI score does not correlate significantly with objective measures such as mucosal healing; however, given the multifaceted nature of Crohn’s disease, this lack of correlation does not necessarily indicate a lack of criterion validity.⁷¹ Predictability is another component of criterion validity. One study found that CDAI scores increased in the two months preceding exacerbations of Crohn’s disease and decreased in the month following them, supporting criterion validity.⁷¹

Limitations of the Crohn’s Disease Activity Index

CDAI scores appear to vary between observers, even when the same case histories are evaluated.⁷⁵ In addition, the overall CDAI score includes subjective items, such as “general well-being” and “intensity of abdominal pain,” that depend on patient perception.

2. Inflammatory Bowel Disease Questionnaire

The IBDQ, developed by Guyatt et al.,^36,37 is a physician-administered questionnaire that assesses health-related quality of life (HRQoL) in patients with IBD (e.g., ulcerative colitis and Crohn’s disease).⁷⁶ It is a 32-item Likert-based questionnaire divided into four dimensions: bowel symptoms (10 items), systemic symptoms (five items), emotional function (12 items), and social function (five items). Patients are asked to recall symptoms and quality of life over the previous two weeks, and each response is graded on a 7-point Likert scale (1 = worst, 7 = best), giving a total IBDQ score between 32 and 224, with higher scores representing better quality of life. Scores of patients in remission typically range from 170 to 190.
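The scoring arithmetic described above (32 items rated 1 to 7, summed by dimension and overall) can be sketched as follows. The mapping of individual questionnaire items to dimensions is not given here, so this hypothetical function expects responses already grouped by dimension; only the item counts come from the text.

```python
from typing import Dict, List

# Items per dimension as described above (10 + 5 + 12 + 5 = 32).
IBDQ_DIMENSIONS = {"bowel": 10, "systemic": 5, "emotional": 12, "social": 5}


def score_ibdq(responses: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum 7-point Likert responses into dimension scores and a 32-224 total."""
    scores: Dict[str, int] = {}
    for dimension, n_items in IBDQ_DIMENSIONS.items():
        items = responses[dimension]
        if len(items) != n_items:
            raise ValueError(f"{dimension}: expected {n_items} items, got {len(items)}")
        if any(not 1 <= r <= 7 for r in items):
            raise ValueError(f"{dimension}: responses must be between 1 and 7")
        scores[dimension] = sum(items)
    scores["total"] = sum(scores[d] for d in IBDQ_DIMENSIONS)
    return scores


# Hypothetical respondent answering 5 on every item.
all_fives = {d: [5] * n for d, n in IBDQ_DIMENSIONS.items()}
print(score_ibdq(all_fives)["total"])  # 160
```

Validating item counts and ranges up front guards against a short or mis-keyed response set silently producing a total outside 32 to 224.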

This questionnaire has been validated in a variety of settings, countries, and languages.⁷⁶ A review⁷⁶ of nine validation studies of the IBDQ in patients with IBD reported that the IBDQ could detect clinically important differences between patients in remission and patients in relapse. In a randomized placebo-controlled trial in patients with ulcerative colitis, the IBDQ was able to discriminate changes in patients’ social and emotional state.⁷⁷ The IBDQ has high test–retest reliability for all four dimension scores. Six studies evaluated the IBDQ’s sensitivity to change, and all found that changes in HRQoL correlated with changes in clinical activity in patients with Crohn’s disease.⁷⁶

A study by Gregor et al.⁷ reported that a clinically meaningful improvement in quality of life in patients with Crohn’s disease would be an increase of ≥ 16 points in the IBDQ total score, or ≥ 0.5 points per question (16 points spread across 32 items).
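The 16-point threshold and its per-item equivalent can be checked with a few lines. This is a minimal sketch; the constant and function names are hypothetical, and the totals in the usage lines are invented.

```python
IBDQ_ITEMS = 32
IBDQ_MCID_TOTAL = 16  # Gregor et al.; equals 0.5 points per item across 32 items


def ibdq_meaningful_improvement(baseline_total: float, follow_up_total: float) -> bool:
    """True when the IBDQ total increased by at least the 16-point threshold."""
    return follow_up_total - baseline_total >= IBDQ_MCID_TOTAL


print(IBDQ_MCID_TOTAL / IBDQ_ITEMS)           # 0.5
print(ibdq_meaningful_improvement(150, 168))  # True
print(ibdq_meaningful_improvement(150, 160))  # False
```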

3. Short Form (36) Health Survey

The SF-36 is a generic health assessment questionnaire that has been used in clinical trials to study the impact of chronic disease on HRQoL. The SF-36 consists of eight domains: physical functioning, role physical, bodily pain, general health, vitality, social functioning, role emotional, and mental health. The SF-36 also provides two component summaries, the physical component summary (PCS) and the mental component summary (MCS), which are created by aggregating the eight domains. The PCS, MCS, and eight domains are each measured on a scale of 0 to 100, with an increase in score indicating improvement in health status. In general use of the SF-36, a change of 2 to 4 points in a domain or 2 to 3 points in a component summary indicates a clinically meaningful improvement as determined by the patient.⁷⁸

Validation work reports satisfactory reliability and discriminant ability for all SF-36 dimensions in patients with ulcerative colitis. As symptoms increase, HRQoL scores decrease, and the reduction is statistically significant. In a population-based cohort followed for 10 years, SF-36 scores of patients with ulcerative colitis were comparable with those of a general population sample after adjustment for age, gender, and education. One study indicated that individual domains may show ceiling effects in patients with less severe ulcerative colitis. Individual domain scores were also less responsive in patients with mild ulcerative colitis, although it is unclear whether this applies to the broader PCS and MCS scores.⁷⁹
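A ceiling effect is usually quantified as the share of respondents at the top of the scale. The sketch below assumes the commonly cited rule of thumb that more than 15% of respondents at the maximum score indicates a ceiling effect; that cut-off is not taken from the studies cited here, and the domain scores are invented.

```python
from typing import Sequence


def ceiling_proportion(scores: Sequence[float], max_score: float = 100.0) -> float:
    """Fraction of respondents scoring at (or above) the top of the scale."""
    if not scores:
        raise ValueError("no scores supplied")
    return sum(1 for s in scores if s >= max_score) / len(scores)


# Hypothetical SF-36 domain scores (0 to 100) from patients with mild disease.
role_physical = [100, 100, 75, 100, 50, 100, 100, 25, 100, 100]
share = ceiling_proportion(role_physical)
print(share)         # 0.7
print(share > 0.15)  # True, i.e., flagged under the assumed 15% rule of thumb
```

When most respondents already sit at the maximum, improvement cannot be registered, which is consistent with the reduced responsiveness reported for mild disease.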

A study by Coteur et al.³⁸ explored MCID estimates in patients with Crohn’s disease using data from multinational, multi-centre, double-blind, placebo-controlled, parallel-group clinical trials in which clinical remission, assessed with the CDAI, was the primary outcome. Secondary outcomes included the IBDQ and SF-36. All end points were measured at weeks 0, 6, 16, and 26 using standardized procedures. Six MCID estimates were derived for each SF-36 component summary score: two using anchor-based methods and four using distribution-based methods. For the anchor-based estimates, linear regression was performed against each of the two anchors, the CDAI and the IBDQ, and the MCID for the SF-36 was read from the regression equation at a meaningful change of 16 points in the IBDQ total score or 50 points in the CDAI score. The distribution-based estimates rely on the statistical distribution of the HRQoL data and comprise effect size measures (effect sizes of 0.2 and 0.5, corresponding to small and moderate effects), the standard error of measurement, and the standard error of the difference. Overall, the MCIDs for the SF-36 PCS and MCS ranged from 1.6 to 7.0 and from 2.3 to 8.7, respectively, depending on the approach. Because changes in SF-36 scores correlated more strongly with changes in the IBDQ than with changes in the CDAI, the IBDQ was selected as the “best anchor,” with corresponding MCIDs of 4.1 for the PCS and 3.9 for the MCS. These IBDQ anchor-based values were similar to the distribution-based values and represented small to moderate effect sizes.
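The two families of MCID estimates described above can be sketched as follows. The distribution-based estimates use the standard formulas (effect size × baseline SD; SEM = SD × √(1 − reliability); SE of the difference = √2 × SEM). Which reliability coefficient Coteur et al. used is not stated here, so it is an input assumption. The anchor-based estimate fits an ordinary least-squares line of outcome change on anchor change and reads off the fitted value at the anchor’s meaningful change. Function names and data are hypothetical.

```python
import math
from statistics import mean
from typing import Dict, Sequence


def distribution_based_mcids(baseline_sd: float, reliability: float) -> Dict[str, float]:
    """Four distribution-based MCID estimates from a baseline SD and a reliability coefficient."""
    sem = baseline_sd * math.sqrt(1 - reliability)  # standard error of measurement
    return {
        "effect size 0.2": 0.2 * baseline_sd,
        "effect size 0.5": 0.5 * baseline_sd,
        "SEM": sem,
        "SE of difference": sem * math.sqrt(2),
    }


def anchor_based_mcid(anchor_change: Sequence[float],
                      outcome_change: Sequence[float],
                      anchor_threshold: float) -> float:
    """OLS of outcome change on anchor change, evaluated at the anchor's meaningful change."""
    mx, my = mean(anchor_change), mean(outcome_change)
    sxx = sum((x - mx) ** 2 for x in anchor_change)
    sxy = sum((x - mx) * (y - my) for x, y in zip(anchor_change, outcome_change))
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept + slope * anchor_threshold


# Hypothetical change scores: IBDQ total (anchor) and SF-36 PCS (outcome).
ibdq_change = [0, 10, 20, 30]
pcs_change = [1.0, 3.0, 5.0, 7.0]
print(round(anchor_based_mcid(ibdq_change, pcs_change, 16), 2))  # 4.2
print(distribution_based_mcids(baseline_sd=8.0, reliability=0.84))
```

If change is computed as follow-up minus baseline, improvement on the CDAI is a decrease, so the CDAI anchor would be evaluated at −50 rather than 50.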

4. EuroQol 5-Dimensions Questionnaire

The EQ-5D is a generic HRQoL instrument that may be applied to a wide range of health conditions and treatments.^39,40 The first of its two parts is a descriptive system that classifies respondents (aged ≥ 12 years) on five dimensions: mobility, self-care, usual activities, pain/discomfort, and anxiety/depression. The EQ-5D-3L has three possible levels (1, 2, or 3) for each dimension, representing “no problems,” “some problems,” and “extreme problems,” respectively. Respondents choose the level that best reflects their health state on each of the five dimensions, giving 3⁵ = 243 possible health states. A scoring function can be used to assign a value (the EQ-5D-3L index score) to self-reported health states using a set of population-based preference weights.^39,40 The second part is a 20 cm visual analogue scale (EQ-VAS) with end points labelled 0 and 100, anchored by “worst imaginable health state” and “best imaginable health state,” respectively. Respondents rate their health by drawing a line from an anchor box to the point on the EQ-VAS that best represents their health on that day. Hence, the EQ-5D produces three types of data for each respondent:

A profile indicating the extent of problems on each of the five dimensions represented by a five-digit descriptor, such as 11121, 33211, etc.
A population preference-weighted health index score based on the descriptive system
A self-reported assessment of health status based on the EQ-VAS.
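The descriptive system above can be enumerated to confirm the 243 possible profiles and to validate a five-digit descriptor such as 11121. This is a minimal sketch; the names are hypothetical, and the dimension order follows the order listed in the text.

```python
from itertools import product

DIMENSIONS = ("mobility", "self-care", "usual activities",
              "pain/discomfort", "anxiety/depression")
LEVELS = "123"  # 1 = no problems, 2 = some problems, 3 = extreme problems

# Every EQ-5D-3L profile as a five-digit descriptor, from "11111" to "33333".
ALL_PROFILES = ["".join(p) for p in product(LEVELS, repeat=len(DIMENSIONS))]


def parse_profile(descriptor: str) -> dict:
    """Validate a five-digit descriptor and map each digit to its dimension."""
    if len(descriptor) != len(DIMENSIONS) or any(c not in LEVELS for c in descriptor):
        raise ValueError(f"invalid EQ-5D-3L descriptor: {descriptor!r}")
    return {dim: int(c) for dim, c in zip(DIMENSIONS, descriptor)}


print(len(ALL_PROFILES))                          # 243
print(ALL_PROFILES[0], ALL_PROFILES[-1])          # 11111 33333
print(parse_profile("11121")["pain/discomfort"])  # 2
```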

The EQ-5D index score is generated by applying a multi-attribute utility function to the descriptive system. Different utility functions are available that reflect the preferences of specific populations (e.g., US or UK). The lowest possible score for the 3L version (corresponding to extreme problems on all five dimensions) depends on the utility function applied (e.g., −0.59 for the UK algorithm and −0.109 for the US algorithm). Scores < 0 represent health states that society values as worse than dead, while scores of 0 and 1.00 are assigned to the health states “dead” and “perfect health,” respectively. Reported MCIDs for the 3L version have ranged from 0.033 to 0.074.⁸⁰
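The general shape of an additive utility function can be sketched as 1 minus a constant and a decrement for each dimension above level 1. The coefficients below are made up for illustration and are NOT the UK, US, or any published EQ-5D-3L value set; published value sets can also include further terms (the UK model, for example, adds a term when any dimension is at level 3).

```python
# HYPOTHETICAL coefficients for illustration only; not a published value set.
HYPOTHETICAL_CONSTANT = 0.10  # applied once if any dimension is above level 1
HYPOTHETICAL_DECREMENTS = [
    {2: 0.05, 3: 0.20},  # mobility
    {2: 0.05, 3: 0.15},  # self-care
    {2: 0.03, 3: 0.10},  # usual activities
    {2: 0.10, 3: 0.30},  # pain/discomfort
    {2: 0.05, 3: 0.20},  # anxiety/depression
]


def eq5d_index(descriptor: str,
               constant: float = HYPOTHETICAL_CONSTANT,
               decrements=HYPOTHETICAL_DECREMENTS) -> float:
    """Additive utility: 1 minus a constant and the per-dimension decrements."""
    if len(descriptor) != 5 or any(c not in "123" for c in descriptor):
        raise ValueError(f"invalid EQ-5D-3L descriptor: {descriptor!r}")
    levels = [int(c) for c in descriptor]
    if all(lv == 1 for lv in levels):
        return 1.0  # full health
    loss = constant + sum(d.get(lv, 0.0) for d, lv in zip(decrements, levels))
    return 1.0 - loss


print(eq5d_index("11111"))            # 1.0
print(round(eq5d_index("11121"), 3))  # 0.8
print(round(eq5d_index("33333"), 3))  # -0.05, i.e., "worse than dead" under these made-up weights
```

Swapping in a different coefficient set changes the floor of the scale, which is why the lowest possible score differs between the UK and US algorithms.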

Emerging evidence supports the validity of the EQ-5D in patients with IBD, including Crohn’s disease. Both EQ-VAS and EQ-5D index scores correlated well with disease activity indices and differed significantly between patients with active disease and those in remission. Test–retest reliability was high. The EQ-VAS was more responsive to deterioration in health than to improvement and tended to be more responsive than the index score.⁸¹

The study by Coteur et al.³⁸ also explored MCID estimates for the EQ-5D VAS score in patients with Crohn’s disease, using data from multinational, multi-centre, double-blind, placebo-controlled, parallel-group clinical trials in which clinical remission, assessed with the CDAI, was the primary outcome. Secondary outcomes included the IBDQ and EQ-5D VAS scores. All end points were measured at weeks 0, 6, 16, and 26 using standardized procedures. Six MCID estimates were derived for the EQ-5D VAS score: two using anchor-based methods and four using distribution-based methods. For the anchor-based estimates, linear regression was performed against each of the two anchors, the CDAI and the IBDQ, and the MCID for the EQ-5D VAS score was read from the regression equation at a meaningful change of 16 points in the IBDQ total score or 50 points in the CDAI score. The distribution-based estimates rely on the statistical distribution of the HRQoL data and comprise effect size measures (effect sizes of 0.2 and 0.5, corresponding to small and moderate effects), the standard error of measurement, and the standard error of the difference. Overall, the MCID for the EQ-5D VAS score ranged from 4.2 to 14.8, depending on the approach. Because changes in the EQ-5D VAS score correlated more strongly with changes in the IBDQ than with changes in the CDAI, the IBDQ was selected as the “best anchor,” with a corresponding MCID of 8.2. This IBDQ anchor-based value was similar to the distribution-based values and represented a small to moderate effect size.
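The “best anchor” step, choosing the anchor whose change scores correlate most strongly with the outcome’s change scores, can be sketched with a Pearson correlation. This is a minimal sketch under stated assumptions: the function names and data are hypothetical, and absolute correlation is used because CDAI scores fall as patients improve, so a good CDAI anchor correlates negatively.

```python
import math
from typing import Dict, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def best_anchor(outcome_change: Sequence[float],
                anchor_changes: Dict[str, Sequence[float]]) -> str:
    """Name of the anchor with the largest absolute correlation with the outcome."""
    return max(anchor_changes,
               key=lambda name: abs(pearson_r(anchor_changes[name], outcome_change)))


# Hypothetical change scores for five patients.
vas_change = [1, 2, 3, 4, 5]
anchors = {
    "IBDQ": [2, 4, 6, 8, 10],
    "CDAI": [-10, -50, -20, -80, -30],
}
print(best_anchor(vas_change, anchors))  # IBDQ
```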

Table 39: Summary of Outcome Measures

Measure	Definition	Evidence of Validity	MCID	Reference
CDAI	Physician-evaluated 8-item CD-specific index used to assess CD severity	Yes	NA	Best et al.⁷⁰
IBDQ	Physician-administered 32-item questionnaire used to assess HRQoL in patients with IBD	Yes	16	Gregor et al.⁷
SF-36	Patient-reported generic QoL instrument	Yes	PCS 4.1; MCS 3.9	Coteur et al.³⁸
EQ-5D	Patient-reported generic QoL instrument	Yes	VAS 8.2	Coteur et al.³⁸

CD = Crohn’s disease; CDAI = Crohn’s Disease Activity Index; EQ-5D = EuroQol 5-Dimensions Questionnaire; HRQoL = health-related quality of life; IBD = Inflammatory Bowel Disease; IBDQ = Inflammatory Bowel Disease Questionnaire; MCID = minimal clinically important difference; MCS = mental component summary; NA = not applicable; PCS = physical component summary; QoL = quality of life; SF-36 = Short Form (36) Health Survey; VAS = visual analogue scale.

Conclusion

The CDAI, IBDQ, SF-36, and EQ-5D have all been validated in the Crohn’s disease population. Although no universally accepted MCID has been established for the CDAI, IBDQ, SF-36, or EQ-5D, some regulatory agencies regard a 100-point reduction in the CDAI as a meaningful change, while other studies suggest MCIDs of 16, 4.1, 3.9, and 8.2 for the IBDQ total score, SF-36 PCS, SF-36 MCS, and EQ-5D VAS, respectively.

Except where otherwise noted, this work is distributed under the terms of a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International licence (CC BY-NC-ND), a copy of which is available at http://creativecommons.org/licenses/by-nc-nd/4.0/

Bookshelf ID: NBK424360
