Included under terms of UK Non-commercial Government License.
NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.
Brazier J, Connell J, Papaioannou D, et al. A systematic review, psychometric analysis and qualitative assessment of generic preference-based measures of health in mental health populations and the estimation of mapping functions from widely used specific measures. Southampton (UK): NIHR Journals Library; 2014 May. (Health Technology Assessment, No. 18.34.)
We have presented the results of a detailed examination of the appropriateness of the EQ-5D and the SF-36 (and its derivatives) in people with mental health problems. We have used mixed methods to evaluate the key properties of validity and responsiveness. Studies employing quantitative methods, including systematic reviews of the literature and psychometric analyses of existing data sets, have been used to examine the construct validity (testing for known-group validity and convergent validity) and the responsiveness to change in mental health status of these measures. Qualitative evidence on content validity has been obtained from a systematic review of the literature and analyses of interviews with people with a range of mental health problems. This chapter begins by presenting a brief overview of the results of each study. It then discusses the main findings of this research and presents the implications for policy and recommendations for further research.
Summary of main studies
Psychometric evidence
Systematic reviews were undertaken of the psychometric literature in five mental health conditions: depression, anxiety, bipolar disorder, personality disorder and schizophrenia. Overall, the results from 91 studies identified by an exhaustive search of the literature were used to assess the construct validity and responsiveness of the EQ-5D and SF-36.
Generic measures were found to adequately reflect differences between groups or changes over time in populations diagnosed with depression. In populations with anxiety, the evidence was less clear as the differences between known groups may have been driven by comorbid depression rather than anxiety disorders themselves. For personality disorder, most studies supported the construct validity of the EQ-5D, but there was insufficient evidence on the SF-36. Within schizophrenia the evidence demonstrated known-group differences, but this was mostly limited to differences between individuals diagnosed with schizophrenia and the general population. Contradictory evidence was found in studies using clinical measures of symptom severity, where the generic measures failed to reflect differences or correlate with the clinical indicators. In bipolar disorder, generic measures reflected known differences in clinical measures of depression but not mania.
The amount of evidence found in the literature was limited in terms of size and coverage, so it was decided to expand the evidence base by undertaking further psychometric analysis of a number of existing data sets. This provided more evidence in depression, anxiety, personality disorder and schizophrenia and used more patient-based assessments in the validation. These analyses broadly supported the findings of the reviews, with the EQ-5D and the SF-6D being found to be valid in samples with mild and moderate depression and anxiety. For schizophrenia, the findings were less clear, with the EQ-5D and SF-6D not being responsive to change in comparison with the condition-specific measures.
The tests of construct validity and responsiveness tend to be rather crude, as they depend on the validity of the construct used to assess the criteria. Tests of known-group validity, for example, usually either rely on comparison with the general population or use clinical assessments of severity that may not reflect meaningful differences in quality of life from the perspective of the population with mental health problems. Such evidence cannot prove or disprove the validity of a measure; at best it can provide support for appropriateness of the measure. The findings from our review and the further analyses seem to highlight potential concerns in anxiety, bipolar disorder and schizophrenia (particularly with regard to responsiveness to treatments). It is important to judge these findings alongside qualitative evidence on the content validity of the measure.
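As an illustration only, and not the analysis code used in this report, the three kinds of test discussed above can be sketched with standard summary statistics. The sketch assumes that known-group validity is summarised by a standardised mean difference between two groups, convergent validity by a Pearson correlation with a clinical measure, and responsiveness by the standardised response mean. All function names and the example values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(group_a, group_b):
    """Standardised mean difference between two known groups (pooled SD)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * stdev(group_a) ** 2 + (n_b - 1) * stdev(group_b) ** 2
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


def pearson_r(xs, ys):
    """Pearson correlation, e.g. between a generic index and a symptom score."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def standardised_response_mean(baseline, follow_up):
    """Mean change divided by the SD of change, for paired observations."""
    changes = [after - before for before, after in zip(baseline, follow_up)]
    return mean(changes) / stdev(changes)


if __name__ == "__main__":
    # Invented utility values, for illustration only (not data from the report).
    general_population = [0.85, 0.90, 0.80, 0.95, 0.88]
    clinical_sample = [0.60, 0.72, 0.55, 0.68, 0.64]
    print("Known-group effect size:", cohens_d(general_population, clinical_sample))

    # Invented symptom scores for the same five clinical respondents.
    symptom_scores = [14, 9, 17, 11, 12]
    print("Convergent correlation:", pearson_r(clinical_sample, symptom_scores))

    after_treatment = [0.70, 0.75, 0.66, 0.70, 0.77]
    print("SRM:", standardised_response_mean(clinical_sample, after_treatment))
```

Each statistic is only as informative as the comparator it is computed against, which is the limitation described in the paragraph above.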
The psychometric evidence presented an interesting and mixed picture in terms of the performance of the EQ-5D and SF-36 in populations with mental health problems. However, this quantitative evidence is not able to offer an explanation for these mixed results. For this reason, qualitative research was undertaken on the impact of mental health problems on quality of life.
Qualitative evidence
The review of previous research found 13 studies that had interviewed people with mental health problems about the way their condition impacts on their lives. It was difficult to be sure that all informative studies had been identified given differences in terminology. The findings from those studies located were synthesised using the ‘framework’ approach.198 We identified six domains from this review: well-being and ill-being; control, autonomy and choice; self-perception; belonging; activity; and hope and hopelessness.
The complementary primary research involved semistructured interviews of people with mental health problems. These interviews expanded the scope of diagnosis and severity of illness covered by the studies in the review to include people with affective and anxiety disorders referred via NHS services, and those with severe and enduring mental health problems and mild to moderate depression/anxiety. We found that our interview data corresponded well with the themes from the review, and any differences tended to be within the themes and related to the degree of impact of the different levels of severity, chronicity and diagnosis. The only change to the themes was that physical health was found to be more important among the interviewees than suggested from the review, so this was added as a seventh theme. The review and interview data found that each theme had a spectrum of positive and negative components.
Mapping between mental health and generic scales
When generic measures have not been collected, one solution is to estimate mapping functions, in which a generic measure is regressed on a condition-specific measure so that health-state values can be predicted from the condition-specific scores. We estimated mapping functions between widely used condition-specific measures for depression and anxiety (HADS, CORE-OM, PHQ-9, GAD-7 and GHQ-12) and the EQ-5D and SF-6D, using four data sets available to the investigators. Mapping functions were not estimated for other conditions as the psychometric evidence suggested that the generic measures were not appropriate.
The statistical models mapping the HADS onto the EQ-5D had poor predictive performance. The mapping functions for the SF-6D fitted the data sets better, particularly the CORE-OM onto the SF-6D, but they still suffered from some degree of over- and underprediction towards the ends of the ranges. Given the psychometric evidence that the generic measures performed satisfactorily in this group, this result is perhaps surprising. These analyses were limited by the small size of some of the data sets and the condition-specific measures available, with only the HADS being mapped onto the EQ-5D. Furthermore, most of the mental health measures focus only on mental health, and often on very narrow aspects of symptoms of mental health, which may not translate into a general health score. We conclude that these mapping functions should not be used, but that the original generic measures should be used in trials and other clinical applications in order to obtain accurate estimates.
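To make the idea of a mapping function concrete, the following is a minimal sketch under stated assumptions: a single condition-specific total score is the only predictor, the model is ordinary least squares, and the data are invented. The model specifications actually estimated in this research are not given in this chapter, so none of this should be read as the authors' method. The helper names are hypothetical. The second function reports the mean signed prediction error separately for respondents with low and high observed utility, which is one simple way to look for over- and underprediction towards the ends of the range.

```python
from statistics import mean


def fit_ols(scores, utilities):
    """Fit utility = intercept + slope * score by ordinary least squares."""
    mx, my = mean(scores), mean(utilities)
    slope = sum((x - mx) * (y - my) for x, y in zip(scores, utilities)) / sum(
        (x - mx) ** 2 for x in scores
    )
    intercept = my - slope * mx
    return intercept, slope


def predict(model, score):
    """Predicted utility for one condition-specific score."""
    intercept, slope = model
    return intercept + slope * score


def errors_by_observed_band(model, scores, utilities, threshold):
    """Mean signed error (predicted minus observed) below and at or above
    an observed-utility threshold. Both bands must be non-empty."""
    low, high = [], []
    for score, utility in zip(scores, utilities):
        error = predict(model, score) - utility
        (low if utility < threshold else high).append(error)
    return mean(low), mean(high)


if __name__ == "__main__":
    # Invented paired observations: symptom score and generic utility.
    symptom_scores = [2, 5, 8, 12, 15, 20, 24, 30]
    observed_utility = [0.95, 0.88, 0.80, 0.71, 0.69, 0.52, 0.30, 0.05]

    model = fit_ols(symptom_scores, observed_utility)
    low_err, high_err = errors_by_observed_band(
        model, symptom_scores, observed_utility, threshold=0.6
    )
    print("Intercept and slope:", model)
    print("Mean error, observed utility < 0.6:", low_err)
    print("Mean error, observed utility >= 0.6:", high_err)
```

A positive mean error in the low band together with a negative one in the high band would indicate that predictions are pulled towards the middle of the scale.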
Implications of findings
The content of the EQ-5D and the SF-36 was compared with the seven themes identified in the qualitative research in order to provide an assessment of content validity. In summary, the EQ-5D would seem to cover little of the content of these seven themes owing to its focus on physical health. Only physical health seems to be covered, along with activities and functioning, which is included in a rather crude way through usual activities, and ill-being in terms of depression and anxiety. The SF-36 covers more by having a multi-item dimension on mental health and a vitality dimension that covers more aspects of well-being and ill-being, and some aspects of social functioning. However, like the EQ-5D it fails to include the response of participants and others to activities and social well-being; self-perception; control, autonomy and choice; and hope and hopelessness. It was decided to extend this analysis to a new measure, the ICECAP-A, as this was derived from interview data, although it was obtained from the general public and not people with mental health problems. The ICECAP-A has greater overlap with the themes, but it is limited to the positive manifestations of these themes, and this may be because people with mental health problems were not targeted in the interview sample used to develop the measure. It should also be emphasised that there is no psychometric evidence on the performance of ICECAP-A in a mental health population, and it may miss important changes, particularly in populations with more severe problems.
Care must be taken in drawing any firm conclusions about the generic preference-based measures of health reviewed in this report due to the following limitations:
- The quantitative psychometric evidence is limited to a comparatively small number of studies that in many cases have small sample sizes. Furthermore, the tests were reliant on measures of symptoms and clinical diagnosis which, although widely accepted in mental health research and validated in the populations concerned, are not measures of the construct of HRQoL.
- The population of people with mental health problems in the qualitative research was not comprehensive despite an extensive review of the literature and an attempt to recruit across the spectrum of conditions. An important problem is the risk of selection bias by professionals in assessing the suitability of service users for interview.
- There has not been time to undertake a full theoretical analysis of the qualitative data. Further work is required to fully understand the themes and consider them in relation to existing models of quality of life and related concepts.
A more general problem arises from the lack of an agreed definition for quality of life or mental health-related quality of life, and hence there is no gold standard for examining the validity of these or any other measures. This is a major handicap for the field. However, improving the mental health-related quality of life of people with mental health problems is a key objective, so it is incumbent on researchers to develop measures and assess their performance as best they can, and to use accepted tests such as known-group validity, convergent validity and content validity to provide support or otherwise for the measures they use. Policy-makers also need guidance on which measures to use for assessing the cost-effectiveness of interventions and monitoring the outcomes of services.
Despite these concerns, there are conclusions to be drawn about the implications of the findings.
- The combined evidence from the psychometric research and the qualitative research suggests that the generic measures of health do not capture many of the concerns of importance to people with mental health problems. While the ICECAP-A is better in covering many of the themes, it does not consider the negative end of the spectrum of these themes, more relevant to people with mental health problems.
- For depression and personality disorders, the generic measures of health would seem to be adequate in picking up differences between known groups, but there is less support for their responsiveness to changes over time. These also exclude key themes.
- For anxiety, bipolar disorder and schizophrenia, the generic measures fail to capture many of the problems that arose in the interviews and this is reflected in the psychometric evidence on validity and responsiveness.
Further research
This has been an extensive and rigorous testing of generic measures of health in mental health populations through the application of quantitative and qualitative techniques. However, there are important gaps remaining in the literature and limitations in the research reported here that need to be addressed.
Further research is required to improve the robustness of the findings as follows:
- The quantitative analysis needs to be extended to include further data sets that can allow the further testing of the construct validity and responsiveness of the generic measures. The relevance of the tests would be improved by administering other measures of self-perceived HRQoL rather than relying on clinical indicators to examine construct validity and responsiveness.
- This report has examined tests drawn mainly from classical psychometric theory, but further insight could be gained from the application of modern methods using latent trait models, such as Rasch and item response theory, to examine how well the items used in the generic measures reflect the dimension in general. This would require the collection of generic measures alongside more specific measures of the dimension of interest, such as depression or anxiety, before pooling the items and running these models to examine item performance in terms of characteristics such as model fit, differential item functioning across groups, item severity and ordering of responses.246
- Further interviews need to be carried out in those conditions not well represented, such as obsessive compulsive disorder and alcohol and drug misuse. It is also important to recruit some respondents through different channels in order to avoid the risk of a selection bias caused by relying on professionals. This is difficult to achieve from an ethical viewpoint, but there may be scope through patient groups.
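The latent trait models mentioned in the list above can be illustrated with the simplest case. In the dichotomous Rasch model, the probability of endorsing an item depends only on the difference between the person's location on the latent trait and the item's severity. The sketch below assumes dichotomous items for simplicity; the items in the generic measures have several response levels, so a polytomous extension would be needed in practice. Function and parameter names are hypothetical.

```python
from math import exp


def rasch_probability(theta, severity):
    """Probability of endorsing a dichotomous item under the Rasch model.

    theta    -- person location on the latent trait (logits)
    severity -- item location (severity) on the same scale (logits)
    """
    return 1.0 / (1.0 + exp(-(theta - severity)))


def expected_score(theta, severities):
    """Expected number of items endorsed by a person at location theta."""
    return sum(rasch_probability(theta, b) for b in severities)


if __name__ == "__main__":
    # Invented item severities, for illustration only.
    item_severities = [-1.5, 0.0, 2.0]
    for theta in (-2.0, 0.0, 2.0):
        print(theta, expected_score(theta, item_severities))
```

Comparing expected scores of this kind with observed responses is the basis of the item fit and item ordering checks that such models support.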
Research implications of the findings reported here include the following:
- The analysis of content validity should be extended to mental health-specific outcome measures. This is important for guiding the choice of mental health measures for use in research to measure mental health-related quality of life.
- Consideration should be given to the development of a preference-based measure for calculating QALYs that is more appropriate to mental health. This could be an enhanced version of an existing generic measure. A recent development has been the addition of extra dimensions or ‘bolt-ons’ in order to make the EQ-5D more relevant.247 The problem for the EQ-5D and SF-6D is that adding more dimensions makes them difficult to value using one of the preference elicitation techniques (such as TTO). A more fruitful avenue might be to develop preference-based measures more specific to mental health either from existing measures51,248–252 or by the development of new measures. Developing from an existing measure has the advantage of building on past work and can be applied to existing data sets containing the measure. A recent example of this has been the development of the CORE-6D preference-based measure from the mental health-specific CORE-OM.52,253 A limitation in this case is that it is concerned with common mental health problems such as depression and anxiety rather than more complex conditions such as schizophrenia and personality disorder. The other option would be to develop a new preference-based measure for use in mental health populations. This could build on the findings from the review of qualitative evidence and the interviews presented in this report and elsewhere, but it would be a major research endeavour.
Conclusion
The results of this mixed-method study are that the generic EQ-5D and SF-36 seem to achieve an adequate level of performance against some psychometric tests and may be acceptable for use in depression, and to some extent in anxiety and personality disorder. However, there are concerns regarding the way depression and anxiety are combined into a single dimension in the EQ-5D and SF-36. Results from the psychometric analyses in schizophrenia and bipolar disorder have been more mixed. These measures provide only a limited coverage of the themes found in the qualitative research carried out with people with mental health problems, and so may present a misleading impression of the impact of these problems on the lives of those affected. This has implications for the validity of economic evaluation in mental health. Recommendations for further work include the development of a new preference-based measure in mental health based on the themes identified in this research and existing measures.