Included under terms of UK Non-commercial Government License.
NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.
Brazier J, Connell J, Papaioannou D, et al. A systematic review, psychometric analysis and qualitative assessment of generic preference-based measures of health in mental health populations and the estimation of mapping functions from widely used specific measures. Southampton (UK): NIHR Journals Library; 2014 May. (Health Technology Assessment, No. 18.34.)
This chapter examines the validity and responsiveness of two generic preference-based measures of health (the EQ-5D and SF-6D) and two related generic non-preference-based measures (the SF-36 and SF-12) in populations with mental health problems. The assessment is based on a systematic review of studies reporting one or more of these measures alongside various condition-specific indicators of mental health that can be used to assess validity using known-group comparisons and convergence, and responsiveness to changes in health over time. It forms the first study presented in this report.
This review covers five mental health conditions: schizophrenia, bipolar disorder, personality disorders, depression and anxiety. Four separate systematic reviews were undertaken from one common search of the literature, with depression and anxiety reviewed together. This chapter presents the methods and an overview of the findings. The detailed findings with tables of results by study are available in published articles or in discussion paper form.60–62
Methods
Selecting review studies
Inclusion and exclusion criteria
Studies were eligible for inclusion if they contained HRQoL data obtained using one or more of the instruments under study (SF-36, SF-12, SF-6D or EQ-5D) within the specified population of adults (aged ≥ 18 years) suffering from one of the five conditions. HRQoL data could be from descriptive systems (i.e. their items and dimensions) or health-state utility values generated by the EQ-5D or SF-6D, or the EQ-VAS. Studies whose primary focus was on individuals with alcohol and/or drug dependency were excluded whether or not those individuals had one of the five conditions. The outcomes had to include data that allowed measurement of the construct validity (i.e. known groups, convergent or discriminant) or the responsiveness of the HRQoL instrument(s). Responsiveness data had to be in the form of effect sizes, standardised response means (SRMs) or correlation with change scores on symptom measures. Studies that only provided data on other psychometric properties, such as reliability, face validity and content validity, were not included.
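The eligibility rules above can be read as a single screening predicate. The following is only an illustrative sketch in Python: the `Study` record, its field names and the outcome labels are hypothetical and were not part of the review's actual screening process, which was carried out by reviewers reading citations and full texts.

```python
from dataclasses import dataclass

# Instruments and conditions named in the inclusion criteria.
INSTRUMENTS = {"SF-36", "SF-12", "SF-6D", "EQ-5D"}
CONDITIONS = {
    "schizophrenia",
    "bipolar disorder",
    "personality disorder",
    "depression",
    "anxiety",
}
# Outcome labels are hypothetical names for the properties listed in the text.
ELIGIBLE_OUTCOMES = {
    "known-group validity",
    "convergent validity",
    "discriminant validity",
    "responsiveness",
}


@dataclass
class Study:
    """Hypothetical record of what a screener would note about a study."""
    instruments: set
    condition: str
    minimum_age: int
    primary_focus_substance_dependency: bool
    outcomes: set


def is_eligible(study: Study) -> bool:
    """Return True if the study satisfies every criterion described above."""
    return (
        bool(study.instruments & INSTRUMENTS)            # uses a target measure
        and study.condition in CONDITIONS                # one of five conditions
        and study.minimum_age >= 18                      # adults only
        and not study.primary_focus_substance_dependency
        and bool(study.outcomes & ELIGIBLE_OUTCOMES)     # validity/responsiveness
    )


example = Study(
    instruments={"EQ-5D"},
    condition="depression",
    minimum_age=18,
    primary_focus_substance_dependency=False,
    outcomes={"responsiveness"},
)
reliability_only = Study(
    instruments={"SF-36"},
    condition="schizophrenia",
    minimum_age=18,
    primary_focus_substance_dependency=False,
    outcomes={"reliability"},
)
```

Under these assumptions `is_eligible(example)` holds, whereas `reliability_only` is rejected because it reports none of the eligible outcome types.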
Identification of studies
A literature search was performed to identify relevant research for all mental health conditions being investigated within the wider review, using a database thesaurus and free text terms. Two sets of search terms were combined: terms for each of the four HRQoL measures and terms for each mental health condition (the search strategy is in Appendix 2). Ten databases were searched for published research from inception: Cochrane Database of Systematic Reviews (CDSR), Cochrane Central Register of Controlled Trials (CENTRAL), NHS Economic Evaluation Database (NHS EED), Health Technology Database, Database of Abstracts of Reviews of Effects (DARE), MEDLINE, PreMEDLINE, Cumulative Index to Nursing and Allied Health Literature (CINAHL), EMBASE and Web of Science. Searches were limited to English language only but not by any date restriction. All searches were initially conducted in August 2009, though updates were undertaken for personality and bipolar disorders until March 2011, and for depression and anxiety until December 2010. The reference lists of relevant studies were searched for further papers.
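As a rough illustration of how two sets of terms are combined, the sketch below joins each set with OR and then joins the two blocks with AND. The term lists and the generic Boolean syntax are assumptions made for illustration only; the actual strategy, including thesaurus headings and database-specific syntax, is the one given in Appendix 2.

```python
def or_block(terms):
    """Join quoted terms with OR inside one set of parentheses."""
    return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"


def build_query(measure_terms, condition_terms):
    """Require at least one measure term AND at least one condition term."""
    return or_block(measure_terms) + " AND " + or_block(condition_terms)


# Illustrative free-text terms only, not the review's real search terms.
MEASURE_TERMS = ["EQ-5D", "SF-36", "SF-12", "SF-6D"]
CONDITION_TERMS = [
    "schizophrenia",
    "bipolar disorder",
    "personality disorder",
    "depression",
    "anxiety",
]

query = build_query(MEASURE_TERMS, CONDITION_TERMS)
print(query)
```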
Citations identified by the searching process were screened by one reviewer (DP or TP) using the inclusion criteria. The full texts of papers were retrieved for any titles or abstracts that appeared to satisfy the inclusion criteria, or for which inclusion or exclusion could not be definitely determined. The same inclusion and exclusion criteria were used to assess full-text papers and any queries over inclusion were resolved by discussion and consensus between two reviewers (DP and JB or TP and JB).
Data extraction
Data from all included trials were extracted using a form designed specifically for this review, and piloted on a sample of papers. Data extracted included: country of publication, type of disorder, study sample characteristics (numbers, age, sex), other measures used, mean scores on HRQoL measures, type and method of validity assessment, type and method of responsiveness assessment, and validity and responsiveness data. Extractions were performed by one reviewer (DP or TP). Where duplicate publications reported on similar data, the most complete and recent data were extracted.
Quality assessment
There is no formal method for assessing the quality of these studies (i.e. there are no quality assessment checklists), and thus we used the methods described by Fitzsimmons et al. 63 to evaluate HRQoL data in their systematic review on the use and validation of HRQoL instruments with older cancer patients. These included examining whether or not tests of statistical significance were applied, differences between treatment groups reported (where applicable, e.g. in known-group validity), clinical significance discussed and missing data documented. We also report on response and completion rates where these were provided.
Evidence synthesis and meta-analysis
Owing to the large degree of heterogeneity between studies (including types of study designs, HRQoL measure, population characteristics and methods of determining construct validity and responsiveness), it was not appropriate to perform meta-analysis. Analysis was by narrative synthesis and data were tabulated. All analyses were performed based on the HRQoL measure, with data analysis grouped by type of validity (convergent/discriminant or known groups) or responsiveness test used.
Defining validity and responsiveness
Validity and responsiveness were assessed using the definitions presented in Chapter 1. For convergent validity, the strength of correlation between the two measures is calculated using statistical tests (e.g. Pearson’s product moment correlation or Spearman’s rank correlation). We have used the following categories for evidence of correlation: > 0.6 = very strong, ≥ 0.5 to < 0.6 = strong, ≥ 0.3 to < 0.5 = moderate, and < 0.3 = weak. Statistical significance is also attached to correlations (p < 0.05). Responsiveness can be measured in a number of ways by effect size statistics,64 which are standardised in different ways, such as dividing through by the SD at baseline or by the SD of the change in scores over time (i.e. SRMs). Within this review, Cohen’s65 categories for magnitude of effect size were used: ≥ 0.80 = large, ≥ 0.50 to < 0.80 = moderate, and ≥ 0.30 to < 0.50 = small. As pointed out in Chapter 1, these tests need to be used with some care as there is no gold standard and the application of these tests sometimes uses indirect indicators of the concept (e.g. symptoms rather than quality of life).
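The categories and statistics described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the procedure used in the included studies: the function names are hypothetical, the sample SD (n − 1 denominator) is assumed, correlations are classified by absolute value, a coefficient of exactly 0.6 is treated as very strong (the text leaves that boundary unspecified), and values below 0.30 are given a label of our own because the text does not name that range.

```python
from statistics import mean, stdev


def correlation_strength(r):
    """Label a correlation coefficient using the bands quoted in the text."""
    size = abs(r)  # assumption: strength is judged on the absolute value
    if size >= 0.6:  # assumption: exactly 0.6 counts as very strong
        return "very strong"
    if size >= 0.5:
        return "strong"
    if size >= 0.3:
        return "moderate"
    return "weak"


def effect_size(baseline, follow_up):
    """Mean change divided by the SD of the baseline scores."""
    changes = [after - before for before, after in zip(baseline, follow_up)]
    return mean(changes) / stdev(baseline)


def standardised_response_mean(baseline, follow_up):
    """Mean change divided by the SD of the change scores (SRM)."""
    changes = [after - before for before, after in zip(baseline, follow_up)]
    return mean(changes) / stdev(changes)


def effect_magnitude(value):
    """Label an effect size or SRM using the categories quoted in the text."""
    size = abs(value)
    if size >= 0.80:
        return "large"
    if size >= 0.50:
        return "moderate"
    if size >= 0.30:
        return "small"
    return "below small"  # label is ours; the text does not name this range


# Made-up utility scores for five hypothetical patients.
baseline_scores = [0.50, 0.60, 0.40, 0.70, 0.55]
follow_up_scores = [0.60, 0.75, 0.45, 0.80, 0.70]

es = effect_size(baseline_scores, follow_up_scores)
srm = standardised_response_mean(baseline_scores, follow_up_scores)
print(effect_magnitude(es), effect_magnitude(srm))
```

The two statistics share a numerator and differ only in the denominator, so the same data can yield quite different magnitudes; this is one reason why the standardisation used needs to be noted when comparing responsiveness across studies.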
Study characteristics
The initial search for studies for the wider review retrieved 4115 unique citations across the five mental health conditions (Figure 1). Of these, 3849 were excluded at the title and abstract stage and 266 were examined in full text. From these, 154 studies were found that met the inclusion criteria. A further 12 studies were identified through reference list checking. Overall, the findings from 91 studies are discussed in this chapter for the five conditions. SF-36 and SF-12 studies were not ultimately included in the depression and anxiety review as a sufficiently large number of studies used the SF-6D to be able to extrapolate to these longer versions of the measure. Figure 1 shows a flow diagram of study identification and the characteristics of the studies reviewed by condition are presented in Appendix 2.
Schizophrenia
Thirty-one studies were identified that provided data on the validity and/or responsiveness of the EQ-5D, SF-36, SF-12 or SF-6D within individuals diagnosed with schizophrenia, schizophreniform disorder or schizoaffective disorder (see Appendix 2, Table 32).48,66–95 Six studies were undertaken internationally across more than one country.66–71 The numbers of participants in the studies with schizophrenia or related conditions ranged from 15 to 2657. Participants included males and females. The mean age of participants with a schizophrenia spectrum disorder, reported in 21 of the 31 studies, ranged between 20.3 and 57.9 years.48,68,70–73,75–80,82,84,86,88–91,94,95
All studies obtained HRQoL information from patients; seven of these studies compared patient HRQoL values with published general population ‘normative’ values,70–76 three compared HRQoL values with normal comparison participants that were recruited to the study77–79 and two used ‘norms’ from healthy participants who had taken part in large surveys.80,96
Bipolar disorder
Twenty-two studies were identified that provided data on the validity and/or responsiveness of the generic HRQoL measures in bipolar disorder (see Appendix 2, Table 33).74,97–117 Nineteen studies74,97,98,102–117 contained data on the SF-36, one study101 involved the SF-12 and four studies98–101 contained data on the EQ-5D. No studies were identified that examined the SF-6D in individuals diagnosed with bipolar disorder. The numbers of participants in the studies with bipolar disorder ranged from 30 to 1999. Participants included men and women. The mean age of participants with a bipolar disorder, reported in 19 of the 22 studies,97–103,106–117 ranged between 29.3 and 60.2 years.
All studies obtained HRQoL information from patients; six of these studies compared patient HRQoL values with ‘norms’ derived from published general population ‘normative’ values,48,76,80,81,96,118 three compared HRQoL values with normal comparison subjects that were recruited to the study74,79,82 and one study used ‘norms’ from healthy subjects taking part in large surveys.83 Four studies investigated differences in HRQoL between mood groups in bipolar disorder.75,84–86 Two of the four studies investigating the EQ-5D used general population preferences for EQ-5D health states to generate EQ-5D index values.87,96
Personality disorder
In total, there were 10 studies reporting HRQoL data on patients with personality disorder.96,119–128 Six studies120,124–128 looked at the EQ-5D, two119,121 at the SF-36 and two96,122,123 at the SF-12 (corresponding to three articles). No studies were found investigating the validity or responsiveness of the SF-6D in this patient group. Studies were undertaken in four countries. Nine96,119–128 of the 10 studies presented data for different personality disorders together. One study looked exclusively at individuals with borderline personality disorder. The numbers of individuals included within the studies that were diagnosed or screened as having one or more personality disorders ranged from 48 to 1708. Participants included males and females (proportions can be seen in Appendix 2, Table 34). The mean age of participants with personality disorders, reported in 9 of the 10 studies, ranged between 29.4 and 45 years.96,119–121,124–128
Two studies120,127 investigated the known-group validity of the EQ-5D, one study127 investigated the convergent validity of the EQ-5D and four studies124–126,128 investigated the responsiveness of the EQ-5D. Two studies119,121 investigated the known-group validity of the SF-36 and two studies96,122,123 investigated this property in the SF-12. One study119 investigated the responsiveness and convergent validity of the SF-36.
Depression and anxiety
Owing to the large number of studies reporting SF-36 and SF-12 data in this group, it was decided to focus on EQ-5D and SF-6D data. SF-36 and SF-12 are not preference-based and have been included in the other studies to give an indication of the likely performance of the derivative SF-6D. In all, there were 22 studies50,129–149 identified with data on the validity and/or responsiveness of the generic HRQoL measures in depression and anxiety for EQ-5D and SF-6D. Fourteen studies50,129,136–139,142–149 had data on the EQ-5D and seven130,131,133–135,140,141 contained data on the SF-6D. Studies were undertaken in at least 12 countries (a number covered Europe). The numbers of participants with depression and anxiety in these studies ranged from 44 to 3815. Participants included men and women, with mean ages ranging between 39.2 and 49 years.
Six studies139,142,144,147–149 investigated the known-group validity of the EQ-5D across severity groups, five studies129,136,137,143,145 reported convergent validity of the EQ-5D and 14 studies50,130–138,140,141,145,146 investigated the responsiveness of the EQ-5D. Two studies139,142 had known-groups differences in the SF-6D, three132,134,137 had convergent validity and two136,145 had responsiveness.
Quality of included studies
Most studies reported tests for statistical significance of the properties measured for difference between groups (e.g. known-group validity) and responsiveness to change over time. A minority of studies considered what constituted a clinically significant difference in HRQoL scores, by either providing a predefined value or discussing whether or not the results were clinically meaningful. There was little discussion or inclusion, however, of clinical significance defined in terms of patient perception, and thus, from the perspective of preference-based measures, the lack of patient preference undermines the concept of clinical significance. Most studies did not report missing HRQoL data. This has implications for the representativeness of these samples due to possible selection bias.
Results
Detailed findings with tables of results on the validity and responsiveness have been reported elsewhere.60–62 This section summarises the results using a simple classification of the evidence: ✓ indicates results in support of validity or responsiveness and ✗ indicates an inconsistent or non-significant result. The results on validity have been divided into known-group differences across severity groups typically defined using symptoms, known differences against a normal case–control group and convergence with a measure of the condition.
Schizophrenia
The majority of the evidence (25 studies) examined the validity and responsiveness of the SF-36.66,67,69–85,88,92–95 Although there appears to be strong evidence that the SF-36 is able to distinguish between general population norms and scores of people with schizophrenia (known-group validity), the evidence for convergent validity and responsiveness is less certain (Table 4). Similar findings exist for the EQ-5D, with mixed evidence for the properties of convergent validity and responsiveness. Indeed, when strong associations were found between individual EQ-5D health-state dimensions (e.g. anxiety/depression or self-care) and symptom or functioning measures, this did not necessarily translate into comparable changes in overall EQ-5D index scores, i.e. health-state utility values.48,90 There was some evidence that associations with measures of depression were comparatively stronger than those with symptom measures of schizophrenia [e.g. the Positive and Negative Syndrome Scale (PANSS)].71,88–90
When testing associations between measures for convergent validity (or change scores in responsiveness), there are reasons to predict that stronger and more consistent correlations might exist between generic HRQoL measures and functioning [e.g. Global Assessment of Functioning (GAF), Social and Occupational Functioning Assessment Scale (SOFAS)] or mental health/schizophrenia-specific HRQoL [e.g. Quality of Life Scale for Schizophrenia (QLS)] measures than purely symptom-based measures such as the PANSS. By their very nature, symptom measures are measuring different concepts from HRQoL measures, so it might be reasonable to predict that it is less likely that a strong correlation might exist. A re-examination of the evidence, taking into account evidence for the type of measure used to assess convergent validity (symptom vs. functioning or HRQoL measures, subjective vs. objective measures), produced mixed results. Functioning and schizophrenia HRQoL measures did not fare much better than clinical and symptom-based measures, with four studies indicating strong evidence for convergent validity,82–84,86 and four indicating uncertain or no evidence of such a relationship.69,81,88,89
Bipolar disorder
There was positive evidence that the SF-36 is a valid and responsive measure in bipolar disorder when individuals are in a depressed, euthymic or mixed state (Table 5). There was little evidence available on the EQ-5D and SF-12 and none for the SF-6D. What evidence there is on the EQ-5D is mixed, with three tests supporting the validity of the EQ-5D in this group and three against it across four studies.98–101
It is unclear if these generic measures are valid in manic or hypomanic individuals. Only 7 out of 22 SF-36 studies included individuals in a manic or hypomanic state,74,99,101,104,106,110,117 and these suggest it is not a valid and responsive measure within this population. However, where studies examined convergent validity with clinical measures of mania, the numbers of patients in the manic or hypomanic mood state were too small to be meaningful. More generally, there is some concern around how to obtain reliable HRQoL ratings from individuals with bipolar disorder who are in manic or hypomanic states, as this relies on self-report.
Depression and anxiety
The SF-6D and EQ-5D demonstrate good construct validity and responsiveness for patients with depression (Table 6). They can both distinguish between groups that are known to vary according to severity of depression, and across differences in quality of life of depressed patients. Both measures respond to clinical and quality-of-life improvement and deterioration. Indeed, in many cases they are more responsive than depression-specific measures (this may be due to the integrated nature of mental and physical health problems and potential simultaneous improvement in comorbid conditions).
The performance of the EQ-5D for patients with anxiety is a little more mixed. The measures were found to be more highly correlated with depression scales than with clinical anxiety scales in patients with anxiety.
The relationship between the EQ-5D and the SF-6D reflects that found for other conditions. The EQ-5D shows a lower level of utility at the most severe end for depression, and the SF-6D shows equal or greater detriment at the milder end. The SF-6D identifies utility loss in patients that report full health on the EQ-5D, though patient averages for mild depression and anxiety are still able to show lower than normal population utility using the EQ-5D.
Personality disorder
The EQ-5D appears responsive in individuals with personality disorders (Table 7). Data on other properties such as convergent and known-group validity were very limited. There was also little evidence on the SF-36 or SF-12 and none on the SF-6D. Nevertheless, the studies which did exist provided some positive evidence that the measures are valid for use in personality disorders.100,102,107 An exception was Narud et al.,119 who found that most dimensions on the SF-36 were not able to detect changes in patients in the same way as clinical measures. They concluded that this may be because some SF-36 dimensions are not relevant to HRQoL, so that, even if patients change clinically, this does not translate to a change in HRQoL.
Discussion
This review is the first to have comprehensively identified studies that report on the construct validity and responsiveness of these four generic HRQoL measures, and to tabulate and give a narrative synthesis of the findings. Overall, the evidence suggests that generic HRQoL measures are appropriate for patients with depression and personality disorder, but it is more mixed in relation to anxiety, bipolar disorder and schizophrenia.
The findings for depression are encouraging, but there is a concern that this may be driving the differences between groups found for other conditions. For anxiety, the ability of generic preference-based measures to distinguish between subgroups of patients with anxiety may be driven by aspects of depression within anxiety disorder and the presence of comorbid depression. There was some evidence that associations with measures of depression were comparatively stronger than those with symptom measures of schizophrenia (e.g. the PANSS).71,88–90 This may indicate that: (a) the generic HRQoL measures were only able to detect this component of HRQoL, or (b) depression is the only component of HRQoL within these groups of patients that is important within the context of HRQoL measurement. The issue is whether schizophrenia or anxiety has quality-of-life implications not adequately described by the five dimensions of the EQ-5D. It is also difficult to predict how HRQoL is affected by the manic or hypomanic states from the perspective of the individual with bipolar disorder. These non-depression consequences of these conditions are explored later in this report through qualitative interviews with patients.
The review has some limitations, resulting from the need to compromise on some elements of the review process because of the large scope of the project. Although the search for studies was reasonably comprehensive, it was limited to key databases and reference list checking of included studies, and study selection was undertaken by one reviewer. Ideally, further searching could be undertaken in trial registries, conference proceedings and by citation searching to make the search more comprehensive in terms of process. Study quality assessment has been pragmatic and focused on the elements that contribute to HRQoL analysis. The populations included in this review were heterogeneous in terms of the nature of their conditions, particularly for conditions such as schizophrenia and personality disorder where there are numerous subgroups, and not all studies provided detailed or uniform information on these characteristics. Such clinical variables clearly have an impact on HRQoL, and thus these factors will have had an impact upon the results of individual studies.
It is also difficult to draw any firm conclusions on the basis of this review, owing to the limited nature of much of the evidence in terms of the number of studies, the size of some of the studies and the heterogeneity within the conditions. There is very limited evidence of validity or responsiveness for the SF-12 and SF-6D, and though these are derivatives of the SF-36, their more limited item coverage (12 and 11, respectively) means that they may not perform as well. Therefore, further research needs to be directed towards demonstrating these properties for these instruments. To improve the evidence base, the next chapter will conduct further psychometric tests on existing data sets containing the EQ-5D and SF-6D. More evidence is also required on the validity and responsiveness of generic measures for older people with depression, as this group may be different from the younger adults typically found in published trials.
There is another general concern regarding whether or not it is reasonable to assess HRQoL when an individual is in a particular state, such as a manic or hypomanic state, as he or she may view the effect that the state had on his or her HRQoL very differently when not actually in that health state.
The findings are also limited by the measures used to establish validity and responsiveness. It is difficult to determine, in theory, how strongly correlated generic HRQoL measures should be with symptom and/or other clinical measures, and there is little guidance on what constitutes reasonable correlation. Indeed, it is impossible to prove validity of HRQoL instruments, as no ‘gold standard’ exists. Also, as discussed previously, where health dimensions and changes appear to have been missed by preference-based HRQoL measures, these may not actually be important to patients or valued by the general population. The former will be examined in the qualitative research reported in Chapters 5 and 6.
The dominance of physical health in the EQ-5D may explain why it is not sensitive to differences in some mental health populations.48 Although this does not seem to have been a problem in depression, it may account for the more mixed results in other conditions. There are also concerns that the descriptive systems of the generic measures are too narrow in terms of the dimensions they cover. Some of the questions raised are addressed later in this report using the findings of qualitative interviews of people with mental health problems, who can provide some insight into the shortcomings of the content of the descriptions contained in these generic measures (see Chapters 5 and 6).
Research needs to be directed towards developing robust methods of demonstrating validity and responsiveness for generic HRQoL measures. For known-group validity, the evidence discriminating between healthy and non-healthy individuals could be considered fairly crude; large differences should be obviously apparent between such groups. Therefore, research is required to test instruments in terms of the ability to reflect known-group differences using indicators of condition severity that are important to patients. For convergent validity, this might mean consideration of which measures to choose for assessment of strength of correlation, both by considering the type of measure (e.g. symptom, functioning or HRQoL) and the nature of the measure (subjective or objective). Studies need to be explicit at their outset about the hypothesised associations when investigating validity and responsiveness. In addition, studies that investigate the feasibility of administering generic HRQoL measures alongside construct validity and responsiveness, using quantitative and qualitative methods within this disease area, will allow a greater overall understanding of which measures are useful within schizophrenia.
Conclusion
Despite the shortcomings identified in the evidence base, this review gives an overall picture of the validity and responsiveness of the EQ-5D and SF measures across mental health populations. It has shown a mixed picture, with the generic measures appearing to perform acceptably well in depression and personality disorder, but less well in anxiety, schizophrenia and bipolar disorder. This has highlighted the need for further quantitative research, and the insights that can be gained from people regarding the content validity of the measures in terms of coverage of the dimensions of their life impacted upon by their mental health problems. The following chapters report both quantitative and qualitative studies that further investigate the validity of these measures.