Validity of Outcome Measures

NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Clinical Review Report: Insulin Degludec (Tresiba): (Novo Nordisk Canada Inc): Indication: For once-daily treatment of adults with diabetes mellitus to improve glycemic control [Internet]. Ottawa (ON): Canadian Agency for Drugs and Technologies in Health; 2017 Dec.


Appendix 4: Validity of Outcome Measures

Aim

To summarize the validity of the following outcome measures:

Short Form (36) Health Survey (SF-36)
Treatment-Related Impact Measure — Diabetes (TRIM-D)
Treatment-Related Impact Measure — Hypoglycemic Events (TRIM-HYPO)

Findings

A focused literature search was conducted to identify the psychometric properties and minimal clinically important difference of each of the stated outcome measures. Table 38 summarizes the findings.

Table 38: Validity and Minimal Clinically Important Difference of Outcome Measures

Instrument	Type	Evidence of Validity	MCID	References
SF-36v2	Generic tool to measure multidimensional health concepts and to capture a full range of health states	Yes	MID benchmarks in diabetes are 1-point changes in SF-36v2 scores; general (non–disease-specific) MID: 2 points in PCS, 3 points in MCS, and 2 to 4 points for individual dimensions	Bjorner 2013;³⁷ SF-36v2 User’s manual²⁶
TRIM-D	Disease-specific tool designed to evaluate the impact of treatment in both T1DM and T2DM	Yes	No MCID	Brod 2009;²⁸ Brod 2011²⁷
TRIM-HYPO	Disease-specific tool designed to evaluate the impact of treatment-related non-severe hypoglycemia in both T1DM and T2DM	Yes	No MCID	Brod 2015²⁹

MCS = mental component summary; MCID = minimal clinically important difference; MID = minimal important difference; PCS = physical component summary; SF-36v2 = Short Form 36 Health Survey, version 2.0; T1DM = type 1 diabetes mellitus; T2DM = type 2 diabetes mellitus; TRIM-D = Treatment-Related Impact Measure — Diabetes; TRIM-HYPO = Treatment-Related Impact Measure — Hypoglycemic Events.

Short Form (36) Health Survey

The SF-36 is a generic health assessment questionnaire that has been used in clinical trials to study the impact of chronic disease on health-related quality of life (HRQoL). The SF-36 consists of 36 items representing eight dimensions: physical functioning, role physical, bodily pain, general health, vitality, social functioning, role emotional, and mental health. Item response options are presented on a 3-point to 6-point Likert-like scale. Each item is scored on a 0 to 100 range, and item scores are averaged together to create the eight domain scores. The SF-36 also provides two component summaries, the physical component summary and the mental component summary, which are created by aggregating the eight domains according to a scoring algorithm. The physical and mental component summaries and the eight dimensions are each measured on a scale of 0 to 100 and reported as t-scores (mean of 50 and standard deviation of 10) standardized to the US general population.²⁶ Thus, a score of 50 on any scale is at the average or norm of the general US population, and a score 10 points lower (i.e., 40) is one standard deviation below the norm.²⁶ On any of the scales, an increase in score indicates improvement in health status. For general use of the SF-36 version 2.0 (SF-36v2), the User’s Manual proposes the following minimal important differences (MIDs): a change of 2 points on the physical component summary and 3 points on the mental component summary. The manual also proposes the following minimal mean group differences, in terms of t-score points, for SF-36v2 individual dimension scores: physical functioning, 3; role physical, 3; bodily pain, 3; general health, 2; vitality, 2; social functioning, 3; role emotional, 4; and mental health, 3.
These MID values were determined as appropriate for groups with mean t-scores in the range of 30 to 40; for higher t-score ranges, MID values may be higher.²⁶ The MIDs for the SF-36v2 are based on clinical and other non–patient-reported anchors and therefore do not represent patient-derived scores.²⁶
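
As an illustration of the norm-based scoring and MID thresholds described above, the following minimal Python sketch converts a 0 to 100 scale score to a t-score and checks a mean group difference against the manual's general MIDs. The function names and the norm mean and standard deviation used in the example are hypothetical placeholders; they are not the published US general-population norms, and the sketch is not the licensed SF-36v2 scoring algorithm.

```python
# Norm-based scoring sketch: T = 50 + 10 * (score - norm_mean) / norm_sd.
# The norm_mean and norm_sd arguments are supplied by the caller; the values
# used in the example below are HYPOTHETICAL, not published norms.

def to_t_score(score_0_100: float, norm_mean: float, norm_sd: float) -> float:
    """Convert a 0-100 scale score to a t-score (mean 50, SD 10 in the norm group)."""
    z = (score_0_100 - norm_mean) / norm_sd
    return 50.0 + 10.0 * z

# General (non-disease-specific) MIDs quoted in the text, in t-score points.
SF36V2_MID = {
    "physical component summary": 2,
    "mental component summary": 3,
    "physical functioning": 3,
    "role physical": 3,
    "bodily pain": 3,
    "general health": 2,
    "vitality": 2,
    "social functioning": 3,
    "role emotional": 4,
    "mental health": 3,
}

def meets_mid(scale: str, mean_group_difference: float) -> bool:
    """True if the absolute mean group difference reaches the quoted MID."""
    return abs(mean_group_difference) >= SF36V2_MID[scale]

# Hypothetical example: a score of 70 against an assumed norm of mean 80, SD 20.
example_t = to_t_score(70, norm_mean=80, norm_sd=20)
```

With these assumed norms, the example score sits half a standard deviation below the norm. Because the quoted MIDs were derived for groups with mean t-scores of 30 to 40, a check like `meets_mid` is only a rough guide outside that range.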

Two versions of the SF-36 exist: the original and version 2.0 (SF-36v2 was made available in 1996).²⁶ The SF-36v2 contains minor changes to the original survey, including changes to instructions (reduced ambiguity), questions and answers (better layout), item-level response choices (increased), and cultural and language comparability (increased), and the elimination of a response option from the items in the mental health and vitality dimensions.²⁶

One study has investigated benchmarks for MIDs for 1-point lower SF-36 scores in populations with diabetes.³⁷ SF-36 surveys of three general US patient populations were analyzed to derive statistical models using non–patient-reported anchors of two-year mortality, seven-year mortality, ability to work, hospitalization within six months from baseline, and loss of ability to work within six months from baseline. The authors accounted for certain variables, including age, number of comorbidities, education, marital status, and score levels as well as interactions and nonlinear effects in their analyses. The three surveys produced different outcome risks associated with 1-point changes in SF-36 dimension and component scores. The models were then applied to the diabetes subpopulations within each patient population to estimate the relative risk associated with each outcome and a 1-point hypothetical decrease in SF-36 scores. Different risks were associated with each population and each outcome. For example, using the Medicare Health Outcome Survey, 1-point lower dimension and component scores were associated with increased risks of two-year mortality ranging from 1.8% to 6.4%, while the Medical Outcomes Survey data generated increased risks of seven-year mortality ranging from 2.0% to 9.0%. One-point lower scores using Medical Outcomes Survey data were associated with a six-month increased risk of hospitalization ranging from not statistically significant to 3.7% and an increased risk of losing the ability to work within six months of baseline ranging from 2.8% to 6.9%.³⁷ While MID benchmarks can be helpful in interpreting SF-36 scores in the absence of minimal clinically important differences (MCIDs), the magnitude of the increased risk, while statistically significant, can be difficult to interpret from a clinical and patient perspective. 
Furthermore, the 1-point score decrease associated with a small risk of hospitalization within a six-month time frame is difficult to interpret as clinically meaningful. Finally, the study failed to adjust for potentially important confounding variables relating to diabetes, including disease type (type 1 diabetes mellitus [T1DM] versus type 2 diabetes mellitus [T2DM]), disease duration, treatment type, glycemic control, lifestyle factors (such as smoking), and socioeconomic factors (such as income level).³⁷ As such, the validity of these 1-point score difference benchmarks remains unclear.

Validation of the Short Form (36) Health Survey in Type 1 and Type 2 Diabetes

Validation of the SF-36 has been performed in a number of studies in T1DM and T2DM combined populations³⁸–⁴¹ and in T2DM general populations in Germany (N = 144),⁴² the United Kingdom (N = 131),⁴³ Pima Indian adults (N = 54),⁴⁴ older Chinese adults (N = 182),⁴⁵ and US veterans (N = 331; 98% male).⁴⁶ All validation studies were performed in male and female adults; none assessed the SF-36 in T1DM patients exclusively. Validation tests in these populations are described in the sections below.

Reliability

Cronbach’s alpha coefficients measure internal consistency reliability, conveying how well an item relates to its hypothesized dimension. Alpha coefficients varied according to study and population: some studies reported internal reliability of 0.7 to 0.94 for all dimensions,⁴¹,⁴³ while others found some dimensions to have lower reliability, namely social functioning,³⁸,⁴⁴ role emotional, role physical, vitality,⁴² and general health.³⁹,⁴²,⁴⁵ Internal reliability discrepancies (dimensions with alpha lower than 0.7) may relate to the specific characteristics, health states, and socioeconomic or cultural traits of the population used to validate the instrument. No dimensions were found to have alpha coefficients ≥ 0.95, though some exceeded 0.9 (higher alpha coefficients may suggest redundancy).
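
For readers unfamiliar with the statistic, the sketch below computes Cronbach's alpha using the standard textbook formula, alpha = k/(k − 1) × (1 − sum of item variances / variance of total scores). The function name and the small data sets are hypothetical and are not drawn from any of the cited validation studies.

```python
from statistics import variance

def cronbach_alpha(responses):
    """Cronbach's alpha for one dimension.

    responses: list of rows, one per respondent; each row holds that
    respondent's scores on the k items of the dimension.
    """
    k = len(responses[0])
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across respondents.
    item_variances = [variance([row[i] for row in responses]) for i in range(k)]
    # Sample variance of each respondent's summed score.
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

# Hypothetical data: three items that move together perfectly.
consistent = [[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6]]
# Hypothetical data: two items that agree only partly.
partly_consistent = [[1, 2], [2, 1], [3, 4], [4, 3]]

alpha_consistent = cronbach_alpha(consistent)
alpha_partly = cronbach_alpha(partly_consistent)
```

Under the thresholds used in this appendix, the second data set would fall just above the 0.7 minimum, while the first would raise the redundancy concern noted for very high alphas.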

One US study of an adult population (18 to 60 years of age; 64% T1DM, 31% T2DM) measured test-retest reliability by comparing baseline to six-month surveys. All correlations were positive, but coefficients varied across dimensions: 0.902 for physical function; > 0.6 to 0.9 for social function, role physical, role emotional, mental health, vitality, and general health perception; and 0.433 for pain. To gauge whether health state was maintained over this interval, a diabetes-specific health status questionnaire served as a reference point for each patient at both time points, with a correlation coefficient of 0.827.⁴⁰ Test-retest reliability was also measured in a German population with T2DM (approximately 70 patients, approximately 50% taking insulin, approximately 50% male) within one to three days of the original test. Measures of internal consistency at both time points were captured, but no correlations were calculated. Internal consistency ranged from 0.67 to 0.96 at baseline and from 0.61 to 0.89 at retest. Upon retest, some dimensions were more affected than others, including role emotional and role physical (both lower) and general health (higher).⁴²

Responsiveness has been assessed in a single study of 331 US veterans (98% male, mean age of 63.5 years, 91% T2DM). The observational, prospective study of 25 diabetic complications, sampled at two time points over a mean interval of 3.1 years, was powered to detect a minimum difference of 5 points across all dimensions of the SF-36 and used Cohen’s effect size to evaluate responsiveness (effect size ranges were defined as “trivial” [< 0.20], “small” [≥ 0.20 to < 0.50], “moderate” [≥ 0.50 to < 0.80], or “large” [≥ 0.80]).⁴⁶ Six of the SF-36 dimensions (general health, physical functioning, social functioning, role physical, bodily pain, and vitality) were found to be responsive when patients who developed two or more complications were compared with those who were stable or improved (effect size 0.31 to 0.66); an increase of more than one complication was associated with a loss of 4.1 points to 23.6 points on these six scales. Statistically significant changes in SF-36 dimension scores were related to any renal complication in five of these six dimensions (general health, physical functioning, social functioning, role physical, vitality) or to any neuropathy complication in four of them (general health, physical functioning, role physical, vitality).⁴⁶
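
The effect size bands quoted above can be expressed as a small lookup. This is a minimal sketch with a hypothetical function name; it only applies the thresholds stated in the text and does not reproduce how the cited study calculated its effect sizes.

```python
def classify_effect_size(effect_size: float) -> str:
    """Label an effect size using the bands quoted in the text."""
    magnitude = abs(effect_size)  # direction of change does not affect the band
    if magnitude < 0.20:
        return "trivial"
    if magnitude < 0.50:
        return "small"
    if magnitude < 0.80:
        return "moderate"
    return "large"

# The reported range for the six responsive dimensions was 0.31 to 0.66.
lower_band = classify_effect_size(0.31)
upper_band = classify_effect_size(0.66)
```

Applied to the reported range, the responsive dimensions span the small and moderate bands.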

Validity

Two cross-sectional studies conducted in Taiwan³⁸ and mainland China⁴⁵ primarily studying T2DM patients (mean ages 63³⁸ and 69⁴⁵) evaluated the internal validity of the SF-36 by factor analysis (eigenvalues ≥ 1.0 and factor loadings ≥ 0.4 were considered significant). In one study, all dimensions loaded onto their hypothesized component summary (physical or mental).³⁸ In the other study, factor analysis revealed appropriate loading except for general health on the mental component summary and role physical on both the mental component summary and the physical component summary. Item-dimension correlations ranged from 0.27 to 0.81 across all dimensions and summary scores; only the physical functioning dimension had a scaling success rate < 100% (physical functioning, 99%).⁴⁵ In a large observational cohort study of chronic disease in the United States (T1DM and T2DM subgroup, N = 624), item-dimension correlations ranged from 0.62 to 0.76 in all dimensions except general health (0.38 to 0.71) and physical functioning (0.52 to 0.82).⁴¹ Scaling success rates from 280 tests, based on item correlation with hypothesized dimension exceeding that of all others by more than two standard errors, were 100% in all dimensions except general health (90%) and physical functioning (99%).⁴¹ General health was found to correlate with both the physical component summary and the mental component summary during SF-36 development.²⁶

Inter-dimension correlations of the SF-36 in a T1DM and T2DM patient population ranged from 0.179 (mental health correlation with physical functioning) to 0.637 (role physical with pain),⁴⁰ suggesting that different dimensions are measuring somewhat different constructs.

One challenge when validating a pre-established, generic HRQoL instrument for use in a specific disease population is in the identification of appropriate measures against which to test the instrument (construct validity) when no gold standard is available (criterion validity). A number of studies have assessed the association between glycated hemoglobin (A1C), a known surrogate marker in both forms of diabetes, and SF-36 dimensions, or have performed known-group comparisons based on A1C level stratification. These studies have established that there is no clear relationship between dimensions of the SF-36 and A1C levels, reporting unexpected, poor, or negligible correlations⁴⁴,⁴⁷ or an inability of the SF-36 to discriminate between known groups based on A1C levels.³⁸,⁴⁰ An initial study comparing physician assessment of patient health to the patient-reported SF-36 dimension scores reported unsatisfactory correlations (0.39 to 0.64).⁴⁰ Construct validity testing was based on exploratory and a priori hypotheses. The SF-36 showed evidence of measuring effects of diabetic complications,⁴³ treatment type, and changes following diabetes interventions,⁴²,⁴⁴ but it was also influenced by non-diabetic comorbidity⁴³,⁴⁴ and other non–diabetes-specific factors, such as age:⁴³,⁴⁴

Age: Physical functioning, role physical, social functioning, and mental health deteriorated in older age groups (Spearman rank correlation coefficients, −0.52 to −0.40; P < 0.005)⁴⁴; physical functioning and role physical were impaired in older age groups (P < 0.05),⁴³ but role emotional was impaired in younger age groups (P < 0.01).⁴³
Sex: No statistically significant differences were found in two studies,⁴³,⁴⁴ whereas women had lower scores on multiple dimensions in a third (P < 0.05).⁴²
Education level: No correlation was found.⁴⁴
Socioeconomic status and income: No statistically significant differences were found.⁴³,⁴⁴
Diabetes-related laboratory markers: No correlation was found.⁴⁴
Diabetic complications: These were associated with lower dimension scores for social functioning, role emotional, vitality (P < 0.01), and role physical (P < 0.05)⁴³; all dimensions of the SF-36 were lower with more than one late complication (P < 0.01).⁴²
Non-diabetic comorbidities: These showed lower scores in physical functioning and role physical (both with P = 0.001), vitality and general health (P < 0.01), and mental health (P < 0.05).⁴³
Comorbidities: These showed lower scores in physical functioning, role physical, role emotional, mental health, and social functioning (Spearman rank correlation coefficients, −0.42 to −0.32; P < 0.02).⁴⁴
Diabetic treatment: Insulin was associated with lower scores than non-insulin treatment in physical functioning, role physical, social functioning, and general health (Spearman rank correlation coefficients, 0.31 to 0.40; P < 0.03),⁴⁴ and in vitality and mental health (P < 0.01).⁴²
Response to diabetes intervention (treatment, education, or both): Statistically significant score changes were found for role physical, general health, vitality, and social functioning (P < 0.05 or less).⁴²

Validity of the SF-36 dimensions was also evaluated using diabetes-specific HRQoL measures. The Audit of Diabetes Dependent Quality of Life is a validated tool for measuring the impact of diabetes on general quality of life across 13 domains. SF-36 correlated better with this tool in patients without any other disease or comorbidity than in those with comorbidities (Spearman’s rank coefficients, 0.30 to 0.44) across five domains: social functioning, role physical, mental health, vitality, and general health (P < 0.05).⁴³ Another study compared validation of the SF-36 with Diabetes-39, a five-dimension measure consisting of 39 items that probe diabetes-related HRQoL.³⁸ The SF-36 performed better than Diabetes-39 on some dimensions and in the physical component summary for cardiovascular disease and cerebrovascular complications (Cohen’s effect sizes highest in the physical dimensions) and for the diabetic all-complication summary known-group comparison, where the effect size of the SF-36 was 0.38 compared with 0.15 for the Diabetes-39 summary score. The Diabetes-39 had greater discriminative power than the SF-36 (based on C-statistic) for two-hour post-prandial glucose (0.7 versus 0.63; P < 0.05), whereas the SF-36 generally performed better than the Diabetes-39 for complication-known groups. SF-36 dimensions performed better at a statistically significant level than Diabetes-39 subscales for cardiovascular disease and the all-complication–known groups;³⁸ in the German T2DM population, SF-36 showed statistically significant multidimensional changes after diabetes intervention when the Diabetes-39 did not.⁴² Based on a priori hypotheses, known-groups comparisons of self-reported high blood pressure, heart problems, and measured depression levels showed significantly higher SF-36 dimension scores for patients without high blood pressure, heart problems, or moderate to high depressive levels (no effect sizes presented).⁴⁵

Critical Appraisal

The SF-36 requires further and more comprehensive validation across the combined population of T1DM and T2DM patients, across different ethnic and cultural populations, and in T1DM patient groups. As SF-36 was developed as a generic instrument, it has been suggested that the tool be evaluated and possibly revalidated whenever a new study is undertaken in any diabetes population, as some items and dimensions of the SF-36 did not respond optimally during validation in various groups.³⁸,⁴⁵ Furthermore, in the CADTH Common Drug Review (CDR) search of the literature, few studies were identified that attempted to validate the test-retest reliability,⁴⁰,⁴² responsiveness,⁴⁶ or MCID of the SF-36 in the general diabetes population, in separate T1DM and T2DM populations, and in more specific diabetes subgroups. The SF-36 has shown evidence of measuring the effects of diabetic complications,⁴²,⁴³ but it is also influenced by non-diabetic comorbidity⁴³,⁴⁴ and other non–diabetes-specific factors, such as age.⁴³,⁴⁴ It does not demonstrate evidence of association with surrogate markers of disease severity,³⁸,⁴⁰,⁴⁴,⁴⁷ but does respond to treatment type and changes following diabetes interventions.⁴²,⁴⁴ The SF-36 and diabetes-specific instruments likely provide some degree of overlap, but also address different features of a patient’s overall HRQoL.³⁸,⁴⁵ Taken together, the evidence suggests that the SF-36 is not likely an appropriate stand-alone tool for the evaluation of all facets of HRQoL in patients with diabetes, but can provide useful insight when used in combination with the appropriate, complementary diabetes-specific treatment evaluation and HRQoL instruments. Comprehensive validation of the SF-36 in T1DM and T2DM is incomplete, and no MCID specifically in diabetes has been established.

Treatment-Related Impact Measure — Diabetes

TRIM-D was developed in English by The Brod Group and by Novo Nordisk as a questionnaire appropriate for patients with either T1DM or T2DM. This patient-reported outcome measure was developed to address gaps in reporting of treatment impact in both forms of diabetes. TRIM-D is a 28-item, self-reported questionnaire encompassing five domains: treatment burden (six items), daily life (five items), diabetes management (five items), psychological health (eight items), and compliance (four items). Response options are presented on a 5-point Likert-like scale. An increase in score indicates an improvement in health state. Domains can be scored individually or the measure can be scored as a total of these domains.²⁸ No MCID has been determined for TRIM-D.

Validation of the Treatment-Related Impact Measure — Diabetes in Type 1 and Type 2 Diabetes Mellitus

Content validity was addressed during instrument development. Items were initially drawn from the literature and from T1DM and T2DM patient and expert input, then compiled²⁸,⁴⁸ and assembled into an early version of the survey measuring the multifaceted impact of diabetes. Five individual telephone interviews of pre-filled early surveys were conducted; findings were reviewed and decisions made about changes to measures. These blocks of five interviews continued until a consensus was reached by an entire block. The initial validation study recruited 507 diabetes patients ranging from 18 to 80 years (mean 51.4 years) to respond to Web-based questionnaires (initial TRIM-D and a battery of other patient-reported measures). The group was stratified across income, age, ethnicity, and diabetes medications: 53% female, 84% white, 6% African American, 74% T2DM.²⁸ Analysis of ceiling effect (> 50%), inter-item correlations (> 0.7), and conceptual framework led to the refined 28-item TRIM-D.

Reliability

Evaluation of internal consistency produced Cronbach’s alpha coefficients of 0.94 for the total score and 0.86 to 0.91 for the subscale scores;²⁸ follow-up internal reliability alphas exceeded 0.7 and fell within 0.1 of those found in the development study.²⁷ Test-retest analysis was performed using data from a subset of 56 patients who completed the questionnaire within the permitted time gap of two weeks ± one day, with coefficients for total score measured at 0.85, and those for the subscales ranging from 0.71 to 0.83²⁸ (coefficients ≥ 0.7 are considered acceptable, ≥ 0.8 are good, and ≥ 0.9 are excellent).

Validity

Validation of the TRIM-D total questionnaire and domains was performed using a battery of Web-based survey outcomes measures (validated and not validated in diabetes).

Convergent validity was reported based on a priori hypotheses using a two-tailed Pearson’s correlation coefficient (r), significance < 0.05, with r > 0.40 considered evidence of moderate to strong associations. The following significant correlations were found between TRIM-D (total or subdomain) and the indicated outcome measure:²⁸

r = 0.63: Global satisfaction scale of the Treatment Satisfaction Questionnaire for Medication
r = 0.45: Diabetes Medication Satisfaction Measure, burden subscale
r = −0.67: Activity Impairment Assessment total score
r = 0.66 and 0.60: Diabetes Medication Satisfaction Measure efficacy scale and Treatment Satisfaction Questionnaire for Medication effectiveness scale, respectively
r = −0.75: TRIM-D, psychological health, with the Problem Areas in Diabetes
r = −0.69: TRIM-D, compliance, with the Medication Compliance Scale.
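
The convergent validity criterion above (an absolute Pearson correlation greater than 0.40) can be sketched as follows. The function names are hypothetical; the correlation formula is the standard one, and the listed values are simply those reported in the text, checked against the stated threshold.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least two values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)

def is_moderate_to_strong(r: float, threshold: float = 0.40) -> bool:
    """Apply the |r| > 0.40 criterion; the sign only reflects scale direction."""
    return abs(r) > threshold

# Correlations as reported in the list above.
REPORTED_R = [0.63, 0.45, -0.67, 0.66, 0.60, -0.75, -0.69]
all_meet_criterion = all(is_moderate_to_strong(r) for r in REPORTED_R)
```

Negative coefficients arise where the comparator is scored in the opposite direction (for example, higher scores indicating more impairment), which is why the check uses the absolute value.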

A number of known-groups validity a priori hypotheses were tested for the TRIM-D total score and subscales by one-way analysis of variance (groups as fixed factors; ANOVA F-value [F]; significance at P < 0.05).²⁸

The total TRIM-D distinguished between willingness of respondents to change diabetes treatment (F = 83.7; P < 0.001) and between those compliant versus not compliant with their treatment (F = 136.6; P < 0.001).
The TRIM-D burden domain distinguished between the types of treatment (oral, pump, and syringe, F = 27.7; P < 0.001), but not between number of daily injections.
The TRIM-D daily life domain distinguished (P < 0.001) between levels of satisfaction (measured by the Quality of Life Enjoyment and Satisfaction Questionnaire, F = 47.5) and work days lost due to diabetes (F = 43.1).
The TRIM-D diabetes management domain distinguished between A1C levels (F = 16.6; P < 0.001), the number of medical visits (F = 4.8; P < 0.01), the changing of diabetes treatment plans (none, 1 to 2 times, or > 3, F = 8.5; P < 0.001), and diabetes control (F = 115.8; P < 0.001).
The psychological health subscale distinguished between depression severity (F = 152.9; P < 0.001) and level of social support (F = 92.6; P < 0.001).
The TRIM-D compliance domain distinguished between the type of treatment (oral versus other, F = 14.3; P < 0.001).
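
The F-values above come from one-way analysis of variance, in which F is the ratio of the between-group mean square to the within-group mean square. The sketch below shows that calculation on small hypothetical groups; the function name and data are illustrative and unrelated to the TRIM-D validation data, and a P value would additionally require the F distribution with (k − 1, N − k) degrees of freedom.

```python
def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA.

    groups: list of lists, one list of scores per known group.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more observations than groups")
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Variation of individual scores around their own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

# Hypothetical domain scores for three known groups.
example_f = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
```

A larger F indicates that the domain score separates the known groups more clearly relative to the spread within each group.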

Responsiveness

Internal and external responsiveness of the TRIM-D were assessed in a 2-by-12-week, crossover, randomized controlled trial (RCT) using two different pre-filled insulin pens, with participation of 242 patients aged 18 years or older with T1DM or T2DM.²⁷ Internal responsiveness measurements found statistically significant score changes ranging from 18.6 (effect size = 0.84, TRIM-D treatment burden) to 3.1 (effect size = 0.17, TRIM-D psychological health).²⁷ External responsiveness using the Insulin Treatment Satisfaction Questionnaire change found a strong association with the TRIM-D total score (r = 0.72; P < 0.001). The Insulin Treatment Satisfaction Questionnaire summary score showed the following correlations with TRIM-D domain items: treatment burden items (r ranging between 0.32 and 0.53), daily life items (r = 0.37 to 0.45), diabetes management items (r = 0.22 to 0.38), psychological health items (r = 0.35 to 0.51), and compliance domain items (r = 0.14 to 0.25). Five of 28 items within the domains were not responsive. Responsiveness of each domain may vary according to study design, and this should be taken into account when defining, a priori, the TRIM-D domains that will be expected to respond to change within a study.²⁷

Preliminary, exploratory estimates of MIDs in this study²⁷ were based on self-reported anchor items, without longitudinal data. The statistical analysis plan defined the MID threshold criterion to be half the standard deviation of the TRIM-D domain score differences (Δ) corresponding with minimal important anchor response intervals of “slightly” and “somewhat.” Based on this criterion, each of the TRIM-D domains met the MID threshold except for the compliance domain, for which no overall anchor item had been established:²⁸

Treatment burden, Δ = 10.6; ½ SD = 9.5
Daily life, Δ = 16.0; ½ SD = 9.2
Diabetes management, Δ = 12.0; ½ SD = 8.2
Psychological, Δ = 17.8; ½ SD = 8.7
TRIM-D total score, Δ = 17.6; ½ SD = 7.8.²⁷
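
The half-standard-deviation criterion can be checked directly against the values listed above. This minimal sketch uses hypothetical names and only compares each reported Δ with its reported ½ SD; it does not re-derive either quantity.

```python
# (delta, half_sd) pairs as reported in the list above.
REPORTED_MID_DATA = {
    "treatment burden": (10.6, 9.5),
    "daily life": (16.0, 9.2),
    "diabetes management": (12.0, 8.2),
    "psychological": (17.8, 8.7),
    "total score": (17.6, 7.8),
}

def meets_half_sd_threshold(delta: float, half_sd: float) -> bool:
    """True if the score difference is at least half a standard deviation."""
    return delta >= half_sd

mid_results = {
    name: meets_half_sd_threshold(delta, half_sd)
    for name, (delta, half_sd) in REPORTED_MID_DATA.items()
}
```

Every listed domain and the total score pass the check, with treatment burden clearing the threshold by the smallest margin (10.6 versus 9.5).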

Critical Appraisal

The TRIM-D demonstrated good internal consistency (with Cronbach’s alphas > 0.7 and < 0.95) and acceptable test-retest reliability (with coefficients > 0.7). Good construct validity of the five domains and the total score of the TRIM-D was supported by a priori hypotheses (demonstrating moderate to strong associations) and known-groups methods. Most items of the TRIM-D were responsive in an RCT setting of T1DM and T2DM patients, but five did not respond as expected. Further validation of the TRIM-D should also be considered (1) in different subpopulations of T1DM and T2DM, (2) in different countries or languages and cultural settings, (3) using non–Web-based methods, and (4) using non–patient-reported outcomes and clinical factors to assess validity. At present, no MCID has been determined for the TRIM-D.

Treatment-Related Impact Measure — Hypoglycemic Events

The Treatment-Related Impact Measure — Hypoglycemic Events (TRIM-HYPO), a patient-reported outcome measure, was developed in English by The Brod Group in collaboration with Novo Nordisk A/S and Health Research Associates, Inc. The TRIM-HYPO was developed to measure the impact of non-severe hypoglycemic events on patients’ HRQoL arising from the use of insulin to treat both forms of diabetes (T1DM and T2DM). TRIM-HYPO is a self-reported questionnaire, comprising 33 Likert-like scale items (scored 1 to 5) in five domains: daily functioning, emotional well-being, diabetes management, work productivity, and sleep disruption.²⁹

Domains are scored individually. A total score is also calculated using three of the five domains (daily functioning, emotional well-being, and diabetes management), as work productivity and sleep disruption do not apply to all patients. Lower scores on the TRIM-HYPO indicate a better health state. Raw scores are obtained by aggregating scale items into their respective domain scales. A weighted score is then generated based on the number of non-severe hypoglycemic occurrences in the past 30 days: the higher the number, the greater the impact on the weighted score. This weighting helps account for the difference in HRQoL between patients experiencing few and those experiencing many hypoglycemic events. A standard algorithm transforms the weighted scores into a 0 to 100 score. No MCID has been determined for the TRIM-HYPO.²⁹

Validation of the Treatment-Related Impact Measure — Hypoglycemic Events in Type 1 and Type 2 Diabetes Mellitus

Content validity was addressed during instrument development. Items were initially generated from literature review, expert opinion, and focus groups (T1DM and T2DM patients requiring insulin). Responses were coded, grouped into domains, and assembled into an early version of the survey. Blocks of three individual telephone interviews of pre-filled early surveys were conducted to assess the instructions, readability, and relevance of items. Findings were reviewed and decisions made about changes to measures. These blocks of interviews continued until a consensus was reached by an entire block. Edits made to the survey based on these interviews were reviewed by a final block before the validation began. The non-interventional validation study recruited 407 diabetes patients in the United States ranging from 18 years to 89 years (mean 50.2 years) to respond to Web-based questionnaires (including the initial five-domain, 46-item TRIM-HYPO along with a battery of other patient-reported measures relevant to hypoglycemia and HRQoL). Group characteristics included 48% female, 77.9% white, 10.3% African American, and 67.3% T2DM.²⁹

Removal of items due to floor and ceiling effects (> 50% responses at extremes) and inter-item Pearson’s correlations (> 0.7) led to the refined 33-item TRIM-HYPO. Principal component analysis supported the retention of all five domains. Item-to-scale correlations (or item-total correlations, which describe how well individual items behave relative to the scale average) ranged from 0.492 to 0.662. Item Response Theory was applied during the questionnaire refinement process. Rasch item reliability ranged from 0.95 to 0.98 (> 0.9 was considered an acceptable threshold), and item separation coefficients ranged from 4.20 to 7.37 (a threshold > 2.0 was considered appropriate). Fit analyses found that there were many responses above and below the items’ coverage and that gaps between items existed.²⁹
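
The floor and ceiling screen used during item reduction can be sketched as below. This is an illustrative reading of the "> 50% responses at extremes" rule, assuming each extreme is assessed separately on the 1 to 5 response scale; the function name and data are hypothetical, and the developers' exact procedure may have differed.

```python
def floor_ceiling_flags(responses, lowest=1, highest=5, threshold=0.50):
    """Flag an item when more than `threshold` of responses sit at one extreme.

    ASSUMPTION: floor and ceiling are evaluated separately, each against
    the same 50% threshold.
    """
    n = len(responses)
    if n == 0:
        raise ValueError("no responses supplied")
    floor_share = sum(1 for r in responses if r == lowest) / n
    ceiling_share = sum(1 for r in responses if r == highest) / n
    return {"floor": floor_share > threshold, "ceiling": ceiling_share > threshold}

# Hypothetical item responses: four of six respondents chose the top option.
skewed_item = floor_ceiling_flags([5, 5, 5, 4, 5, 3])
# Hypothetical item responses spread evenly across the scale.
balanced_item = floor_ceiling_flags([1, 2, 3, 4, 5])
```

An item flagged in this way contributes little information, because most respondents cannot register change in one direction.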

Reliability

Evaluation of internal consistency produced Cronbach’s alpha coefficients of 0.86 to 0.95 (minimum threshold for consistency was 0.7). Test-retest analysis was performed using data from a subset of 42 patients who completed the questionnaire within the permitted time gap of three weeks, with coefficients for total score measured at 0.84 and those for the subscales ranging from 0.75 to 0.98.²⁹

Validity

Convergent validity was reported based on a priori hypotheses using a two-tailed Pearson’s correlation coefficient (significance P < 0.05, with coefficients > 0.40 considered evidence of moderate to strong associations). All hypotheses tested showed significant correlations (P < 0.01) between the total (or subscale) scores and the related measures tested. Anchors included the following:

Activity Impairment Assessment total score
Psychological General Well-being Index global score
Insulin Treatment Satisfaction Questionnaire, HYPO control domain score
Medical Outcomes Survey — Sleep, problems index I and II
Work Productivity and Activity Impairment, per cent overall work impairment due to problem
Sheehan Disability Scale
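
The two criteria used for convergent validity (statistical significance and a coefficient above 0.40) answer different questions, and a short calculation shows why. The significance test for a Pearson correlation uses t = r × sqrt(n - 2) / sqrt(1 - r²) with n - 2 degrees of freedom. The sketch below evaluates this at r = 0.40, assuming for illustration that all 407 respondents contributed to a correlation; the source does not state the sample size for each comparison, and the function name is hypothetical.

```python
from math import sqrt


def pearson_t(r, n):
    """Return (t, df) for testing H0: rho = 0, given a sample
    correlation r from n paired observations."""
    if n < 3 or not -1 < r < 1:
        raise ValueError("need n >= 3 and -1 < r < 1")
    return r * sqrt(n - 2) / sqrt(1 - r * r), n - 2


# Illustrative only: r at the 0.40 magnitude threshold, n assumed to be 407.
t_stat, df = pearson_t(0.40, 407)
```

Here t is about 8.8 on 405 degrees of freedom, far beyond conventional two-tailed critical values. In a sample of this size, then, significance is easily reached, and the magnitude threshold carries most of the weight in judging the strength of association.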

Known-group validity testing confirmed all but one known-group hypothesis (P < 0.001 or P < 0.0001) and the expected discrimination between the known relationships.²⁹ These known groups were based on the following:

Self-reported hypoglycemia management
Quality of Life Enjoyment and Satisfaction Questionnaire — Short Form, total score
Self-reported recovery time after hypoglycemic event
UCLA Loneliness Scale, total score
Self-reported level of emotional well-being
Treatment Satisfaction Questionnaire for Medication (ver. 2), side effect subscale score
Self-report on extra blood glucose monitoring after a hypoglycemic event
Fatigue Symptom Inventory, severity of fatigue domain score
Self-reported number of nocturnal hypoglycemic events within last 30 days
Self-reported recovery time after nocturnal hypoglycemic event
Self-reported degree of hypoglycemia interference with work

Critical Appraisal

The TRIM-HYPO was specifically developed for use by patients with diabetes as a means of comprehensively assessing the impact of non-severe hypoglycemia, an important side effect of insulin therapy. A preliminary study confirmed aspects of the measure’s validity (content validity, construct validity, and reliability).²⁹ Further steps may be required to fully develop the TRIM-HYPO for use in the diabetes population, as item analysis identified deficiencies in item coverage and gaps between some items. Further validation should also be considered (1) in different subpopulations of T1DM and T2DM, (2) in different countries or languages and cultural settings, (3) using non–Web-based methods, (4) using non–patient-reported outcomes and clinical factors to assess validity, and (5) to assess MCID and responsiveness using longitudinal data and the selection of anchors appropriate to the study and the long-term outcomes in diabetes. To date, no MCID is available for this measure.

Summary

The SF-36 was developed as a generic HRQoL measure and has shown good validity and reliability in diabetes populations; however, the performance of each dimension, and of the summary component scores, varies between populations and according to study design. No MCID has been established in diabetes populations. The SF-36 should be used in combination with other instruments when studying the HRQoL of patients with diabetes. The TRIM-D is a patient-reported outcome measure that was developed to address gaps in the reporting of treatment impact in both forms of diabetes. The TRIM-D demonstrated good internal consistency and acceptable test-retest reliability. Most items of the TRIM-D were responsive in an RCT setting of T1DM and T2DM patients, but five did not respond as expected. No MCID has been determined for the TRIM-D. The TRIM-HYPO was developed to comprehensively assess the impact of non-severe hypoglycemia in patients with diabetes, and only one study was found to assess the validity of the measure. No MCID is currently available.

The copyright and other intellectual property rights in this document are owned by CADTH and its licensors. These rights are protected by the Canadian Copyright Act and other national and international laws and agreements. Users are permitted to make copies of this document for non-commercial purposes only, provided it is not modified when reproduced and appropriate credit is given to CADTH and its licensors.

Except where otherwise noted, this work is distributed under the terms of a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International licence (CC BY-NC-ND), a copy of which is available at http://creativecommons.org/licenses/by-nc-nd/4.0/

Bookshelf ID: NBK533976

Cite this Page
Clinical Review Report: Insulin Degludec (Tresiba): (Novo Nordisk Canada Inc): Indication: For once-daily treatment of adults with diabetes mellitus to improve glycemic control [Internet]. Ottawa (ON): Canadian Agency for Drugs and Technologies in Health; 2017 Dec. Appendix 4, Validity of Outcome Measures.