
National Collaborating Centre for Mental Health (UK). Antenatal and Postnatal Mental Health: Clinical Management and Service Guidance: Updated edition. Leicester (UK): British Psychological Society; 2014 Dec. (NICE Clinical Guidelines, No. 192.)

April 2018: Footnotes and cautions have been added and amended by NICE to link to the MHRA's latest advice and resources on sodium valproate. Sodium valproate must not be used in pregnancy, and only used in girls and women when there is no alternative and a pregnancy prevention plan is in place. This is because of the risk of malformations and developmental abnormalities in the baby.


APPENDIX 12: EVIDENCE SYNTHESIS METHODS

Synthesising the evidence from test accuracy studies

Meta-analysis

Review Manager was used to summarise test accuracy data from each study using forest plots and summary ROC plots. Where more than two studies reported appropriate data, a bivariate test accuracy meta-analysis was conducted using Meta-DiSc (Zamora et al., 2006) in order to obtain pooled estimates of sensitivity, specificity, and positive and negative likelihood ratios.
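
The guideline analyses themselves were run in Review Manager and Meta-DiSc, but the logic of pooling can be illustrated in a few lines of code. The Python sketch below, with entirely hypothetical 2×2 counts, pools sensitivity and specificity separately using inverse-variance weighting on the logit scale; this is a deliberate simplification for illustration, not the bivariate model described above, which models sensitivity and specificity jointly.

```python
import math

# Hypothetical per-study 2x2 counts: (true positives, false negatives,
# false positives, true negatives). Invented data for illustration only.
studies = [
    (45, 5, 30, 170),
    (38, 12, 25, 175),
    (50, 10, 40, 160),
]

def pooled_logit(pairs):
    """Inverse-variance pooled proportion on the logit scale.

    `pairs` is a list of (events, non_events) tuples; a 0.5 continuity
    correction guards against zero cells.
    """
    num = den = 0.0
    for events, non_events in pairs:
        e, ne = events + 0.5, non_events + 0.5
        logit = math.log(e / ne)
        var = 1.0 / e + 1.0 / ne            # variance of the logit
        num += logit / var
        den += 1.0 / var
    pooled = num / den
    return 1.0 / (1.0 + math.exp(-pooled))  # back-transform to a proportion

sens = pooled_logit([(tp, fn) for tp, fn, fp, tn in studies])
spec = pooled_logit([(tn, fp) for tp, fn, fp, tn in studies])
print(f"pooled sensitivity = {sens:.3f}, pooled specificity = {spec:.3f}")
```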

Sensitivity and specificity

The sensitivity of an instrument refers to the probability that it will produce a true positive result when given to a population with the target disorder (as compared to a reference or “gold standard”). An instrument that detects a low percentage of cases will not be very helpful in determining the numbers of service users who should receive further assessment or a known effective intervention, as many individuals who should receive the treatment will not do so. This would lead to an under-estimation of the prevalence of the disorder, contribute to inadequate care and make for poor planning and costing of the need for treatment. As the sensitivity of an instrument increases, the number of false negatives will decrease.

The specificity of an instrument refers to the probability that a test will produce a true negative result when given to a population without the target disorder (as determined by a reference or “gold standard”). This is important so that people without the disorder are not offered further assessment or interventions they do not need. As the specificity of an instrument increases, the number of false positives will decrease.

To illustrate this: from a population in which the point prevalence rate of anxiety is 10% (that is, 10% of the population has anxiety at any one time), 1000 people are given a test that has 90% sensitivity and 85% specificity. It is known that 100 people in this population have anxiety, but the test detects only 90 (true positives), leaving 10 undetected (false negatives). It is also known that 900 people do not have anxiety, and the test correctly identifies 765 of these (true negatives), but classifies 135 incorrectly as having anxiety (false positives). The positive predictive value of the test (the number correctly identified as having anxiety as a proportion of positive tests) is 40% (90/(90 + 135)), and the negative predictive value (the number correctly identified as not having anxiety as a proportion of negative tests) is 98% (765/(765 + 10)). Therefore, in this example, a positive test result is correct in only 40% of cases, while a negative result can be relied upon in 98% of cases.
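
The arithmetic of this worked example is easy to reproduce. The Python sketch below (the function name `screen` is ours, purely for illustration) recomputes the cell counts and predictive values above, then repeats the calculation at a lower prevalence to show how the predictive values, unlike sensitivity and specificity, shift with prevalence, as the next two paragraphs discuss.

```python
def screen(population, prevalence, sensitivity, specificity):
    """Return (TP, FN, TN, FP, PPV, NPV) for a screening test."""
    cases = population * prevalence
    non_cases = population - cases
    tp = cases * sensitivity          # true positives
    fn = cases - tp                   # false negatives
    tn = non_cases * specificity      # true negatives
    fp = non_cases - tn               # false positives
    ppv = tp / (tp + fp)              # positive predictive value
    npv = tn / (tn + fn)              # negative predictive value
    return tp, fn, tn, fp, ppv, npv

# The worked example from the text: 1000 people, 10% prevalence,
# 90% sensitivity, 85% specificity.
tp, fn, tn, fp, ppv, npv = screen(1000, 0.10, 0.90, 0.85)
print(tp, fn, tn, fp)                       # 90.0 10.0 765.0 135.0
print(f"PPV = {ppv:.1%}, NPV = {npv:.1%}")  # PPV = 40.0%, NPV = 98.7%

# The same test at 2% prevalence: sensitivity and specificity are
# unchanged, but the PPV falls sharply while the NPV rises.
print(screen(1000, 0.02, 0.90, 0.85)[4:])   # (~0.109, ~0.998)
```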

The example above illustrates some of the main differences between positive predictive values and negative predictive values in comparison with sensitivity and specificity. For both positive and negative predictive values, prevalence explicitly forms part of the calculation (see Altman & Bland, 1994a). When the prevalence of a disorder is low in a population, this is generally associated with a higher negative predictive value and a lower positive predictive value. Therefore, although these statistics are concerned with issues probably more directly applicable to clinical practice (for example, the probability that a person with a positive test result actually has anxiety), they are largely dependent on the characteristics of the population sampled and cannot be universally applied (Altman & Bland, 1994a).

On the other hand, sensitivity and specificity do not necessarily depend on the prevalence of anxiety (Altman & Bland, 1994b). For example, sensitivity is concerned with the performance of an identification instrument conditional on a person having anxiety. Therefore, the higher rate of false positives often associated with low-prevalence samples will not affect such estimates. The advantage of this approach is that sensitivity and specificity can be applied across populations (Altman & Bland, 1994b). However, the main disadvantage is that clinicians tend to find such estimates more difficult to interpret.

When describing the sensitivity and specificity of the different instruments, the GDG defined values above 0.9 as ‘excellent’, 0.8 to 0.9 as ‘good’, 0.5 to 0.7 as ‘moderate’, 0.3 to 0.4 as ‘low’, and less than 0.3 as ‘poor’.

Receiver operator characteristic curves

The qualities of a particular tool are summarised in a receiver operator characteristic (ROC) curve, which plots sensitivity (expressed as a percentage) against 100 − specificity (see Figure 1).

Figure 1. Receiver operator characteristic (ROC) curve.

A test with perfect discrimination would have an ROC curve that passed through the top left-hand corner; that is, it would have 100% specificity and pick up all true positives with no false positives. While this is never achieved in practice, the area under the curve (AUC) measures how close the tool gets to this theoretical ideal. A perfect test would have an AUC of 1, and a test with an AUC above 0.5 performs better than chance. As discussed above, because these measures are based on sensitivity and 100 − specificity, theoretically these estimates are not affected by prevalence.
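
To make the AUC concrete, the sketch below computes the area under a small hypothetical ROC curve by the trapezoidal rule; the cut-off points are invented for illustration.

```python
# Hypothetical (1 - specificity, sensitivity) points observed at
# successive cut-off scores, plus the (0, 0) and (1, 1) end points.
points = [(0.0, 0.0), (0.05, 0.60), (0.15, 0.85), (0.30, 0.95), (1.0, 1.0)]

def auc_trapezoid(pts):
    """Area under the ROC curve by the trapezoidal rule."""
    pts = sorted(pts)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

print(f"AUC = {auc_trapezoid(points):.3f}")  # 0.905; 1 = perfect, 0.5 = chance
```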

Negative and positive likelihood ratios

Positive (LR+) and negative (LR−) likelihood ratios are thought not to be dependent on prevalence. LR+ is calculated as sensitivity/(1 − specificity) and LR− as (1 − sensitivity)/specificity. A value of LR+ > 5 and LR− < 0.3 suggests the test is relatively accurate (Fischer et al., 2003).
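
Applying these formulas to the test from the earlier worked example (90% sensitivity, 85% specificity) gives LR+ = 6.0 and LR− ≈ 0.12, which would count as relatively accurate by the rule of thumb above. A minimal Python sketch:

```python
def likelihood_ratios(sensitivity, specificity):
    """LR+ = sens / (1 - spec); LR- = (1 - sens) / spec."""
    return (sensitivity / (1.0 - specificity),
            (1.0 - sensitivity) / specificity)

lr_pos, lr_neg = likelihood_ratios(0.90, 0.85)
print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}")  # LR+ = 6.0, LR- = 0.12
```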

Heterogeneity

Heterogeneity is usually much greater, and is to be expected, in meta-analyses of test accuracy studies compared with meta-analyses of RCTs (Macaskill et al., 2010). Therefore, a higher threshold for acceptable heterogeneity in such meta-analyses is required. However, when pooling studies resulted in I² > 90%, meta-analyses were not conducted.

Synthesising the evidence for the effectiveness of interventions

Meta-analysis

Where appropriate, meta-analysis was used to synthesise evidence for the effectiveness of interventions using Review Manager Version 5.2. If necessary, reanalyses of the data or sub-analyses were used to answer review questions not addressed in the original studies or reviews.

Dichotomous outcomes were analysed as relative risks (RRs; also called risk ratios) or odds ratios (ORs) with the associated 95% CI (see Figure 2 for an example of a forest plot displaying dichotomous data). An RR is the ratio of the treatment event rate to the control event rate. An RR of 1 indicates no difference between treatment and control. In Figure 2, the overall RR of 0.73 indicates that the event rate (in this case, the rate of non-remission) associated with intervention A is about three-quarters of that of the control intervention or, in other words, the reduction in relative risk is 27%.

Figure 2. Example of a forest plot displaying dichotomous data.

The CI shows a range of values within which it is possible to be 95% confident that the true effect will lie. If the effect size has a CI that does not cross the ‘line of no effect’, then the effect is commonly interpreted as being statistically significant.
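
As a hypothetical illustration of both points, the sketch below computes an RR and its 95% CI from invented counts (44/100 non-remissions with intervention A versus 60/100 with control), chosen so the RR comes out at about 0.73 as in Figure 2; the CI is formed on the log scale, the usual approach for ratio measures.

```python
import math

def relative_risk(events_t, n_t, events_c, n_c):
    """Relative risk with a 95% CI from one study's 2x2 counts."""
    rr = (events_t / n_t) / (events_c / n_c)
    # Standard error of log(RR); the CI is built on the log scale
    # and then exponentiated.
    se = math.sqrt(1/events_t - 1/n_t + 1/events_c - 1/n_c)
    lo = math.exp(math.log(rr) - 1.96 * se)
    hi = math.exp(math.log(rr) + 1.96 * se)
    return rr, lo, hi

# Invented counts: 44/100 non-remissions with intervention A versus
# 60/100 with control.
rr, lo, hi = relative_risk(44, 100, 60, 100)
print(f"RR = {rr:.2f} (95% CI {lo:.2f} to {hi:.2f})")
# The CI (0.56 to 0.96) does not cross 1, the 'line of no effect', so
# the result would conventionally be read as statistically significant.
```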

Continuous outcomes were analysed using the mean difference (MD), or the standardised mean difference (SMD) when different measures were used in different studies to estimate the same underlying effect (see Figure 3 for an example of a forest plot displaying continuous data). If reported by study authors, intention-to-treat (ITT) data, using a valid method for imputation of missing data, were preferred over data only from people who completed the study.

Figure 3. Example of a forest plot displaying continuous data.
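
For continuous outcomes, the SMD is the difference in means divided by a pooled standard deviation. The sketch below, with invented summary statistics, computes Cohen's d and applies Hedges' small-sample correction; we assume the adjusted (Hedges') variant is the one reported, so treat the exact variant as an assumption here.

```python
import math

def smd(m1, sd1, n1, m2, sd2, n2):
    """Standardised mean difference: Cohen's d and Hedges' adjusted g."""
    pooled_sd = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2)
                          / (n1 + n2 - 2))
    d = (m1 - m2) / pooled_sd
    j = 1 - 3 / (4 * (n1 + n2) - 9)   # small-sample correction factor
    return d, d * j

# Invented end-point scores (mean, SD, n): intervention vs control.
d, g = smd(12.0, 5.0, 50, 15.0, 6.0, 50)
print(f"Cohen's d = {d:.2f}, Hedges' g = {g:.2f}")  # both about -0.54
```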

Heterogeneity

To check for consistency of effects among studies, the I² statistic, the chi-squared test of heterogeneity, and a visual inspection of the forest plots were used. The I² statistic describes the proportion of total variation in study estimates that is due to heterogeneity (Higgins & Thompson, 2002). For meta-analyses of comparative effectiveness studies, the I² statistic was interpreted in the following way, based on guidelines from the Cochrane Collaboration (Higgins & Green, 2011):

  • 0% to 40%: might not be important
  • 30% to 60%: may represent moderate heterogeneity
  • 50% to 90%: may represent substantial heterogeneity
  • 75% to 100%: considerable heterogeneity.

The Cochrane Collaboration advice suggests that overlapping categories are less misleading than simple thresholds since the importance of inconsistency depends on (1) the magnitude and direction of effects, and (2) the strength of evidence for heterogeneity (for example, p value from the chi-squared test, or a CI for I2).
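
The I² statistic itself falls out of Cochran's Q, the weighted sum of squared deviations of the study estimates from the pooled estimate: I² = max(0, (Q − df)/Q) × 100%. The sketch below computes both for a handful of hypothetical log risk ratios under a fixed-effect inverse-variance model.

```python
# Hypothetical per-study log risk ratios and their standard errors.
effects = [-0.60, 0.05, -0.45, -0.20]
ses = [0.15, 0.20, 0.18, 0.25]

# Fixed-effect inverse-variance pooling.
weights = [1 / se**2 for se in ses]
pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)

# Cochran's Q: weighted squared deviations from the pooled effect.
q = sum(w * (e - pooled)**2 for w, e in zip(weights, effects))
df = len(effects) - 1

# I-squared, floored at zero (Higgins & Thompson, 2002).
i_squared = max(0.0, (q - df) / q) * 100
print(f"Q = {q:.2f} on {df} df, I² = {i_squared:.0f}%")  # about 60%
```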

Publication bias

Where there were sufficient data, funnel plots were used to explore the possibility of publication bias. Asymmetry of the plot would be taken to indicate possible publication bias and would be investigated further.
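
A funnel plot is simply each study's effect estimate plotted against a measure of its precision, conventionally with the standard error on an inverted y-axis so that large, precise studies cluster at the top. A minimal sketch with invented data, using matplotlib:

```python
import matplotlib.pyplot as plt

# Invented study results: log odds ratios and their standard errors.
log_ors = [-0.42, -0.30, -0.15, -0.55, -0.05, -0.38]
ses = [0.10, 0.15, 0.25, 0.30, 0.35, 0.12]

plt.scatter(log_ors, ses)
plt.axvline(sum(log_ors) / len(log_ors), linestyle="--")  # crude centre line
plt.gca().invert_yaxis()           # most precise studies at the top
plt.xlabel("log odds ratio")
plt.ylabel("standard error")
plt.title("Funnel plot: asymmetry may suggest publication bias")
plt.show()
```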

Where necessary, an estimate of the proportion of eligible data that were missing (because some studies did not include all relevant outcomes) was calculated for each analysis.

Synthesising the evidence for the harms of interventions

Meta-analysis

Where appropriate, meta-analysis was used to synthesise evidence for the harms of interventions using Review Manager Version 5.2. Dichotomous outcomes were analysed as odds ratios (ORs): the OR is the ratio of the odds of an event in one group divided by the odds of the event in the other group. ORs can be more difficult to interpret than RRs; however, it is not meaningful to calculate a risk ratio for a case-control study, because participants are selected on the basis of the outcome of interest (rather than on the basis of exposure status) and are not tracked over time. Unlike cohort studies, which examine the risk of the incidence of an outcome in different groups, case-control studies examine the strength of an association between a risk factor and an outcome. Using ORs therefore allowed case-control and cohort study designs to be combined in meta-analysis. However, the difference between odds and risk is small when an event is rare, as is usually the case with respect to harms. Consistent with the RR, an OR of 1 indicates no difference between treatment and control or, in this case, between exposed and unexposed. Where possible, the absolute risk difference (here, the difference between the proportion of the exposed group with the harm and the proportion of the unexposed group with the harm) was calculated and considered.

Mantel-Haenszel methods were used as standard. However, where there were zero counts in the same cell across studies and sample sizes were unequal between exposed and unexposed arms, the Peto odds ratio method was used instead. This was because Mantel-Haenszel methods apply a zero-count correction (adding a fixed value of 0.5 to all cells of a study's results table) to avoid computational problems; where the sizes of the study arms are unequal, this correction introduces a directional bias in the treatment effect. The Peto odds ratio method only encounters computational problems when no events occur in any arm of any study.
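
As a hypothetical illustration of the Peto method, the sketch below pools invented rare-harm tables, including zero cells and unequal arm sizes, without any continuity correction. The pooled log odds ratio is the sum of observed-minus-expected event counts divided by the sum of the hypergeometric variances.

```python
import math

def peto_odds_ratio(studies):
    """Pooled Peto OR from (events_exposed, n_exposed,
    events_unexposed, n_unexposed) tuples."""
    sum_o_minus_e = sum_v = 0.0
    for a, n1, c, n2 in studies:
        n, events = n1 + n2, a + c
        expected = n1 * events / n  # expected events in the exposed arm
        variance = n1 * n2 * events * (n - events) / (n**2 * (n - 1))
        sum_o_minus_e += a - expected
        sum_v += variance
    log_or = sum_o_minus_e / sum_v
    se = 1.0 / math.sqrt(sum_v)
    return (math.exp(log_or),
            math.exp(log_or - 1.96 * se), math.exp(log_or + 1.96 * se))

# Invented rare-harm data with zero cells and unequal arm sizes;
# no continuity correction is needed.
studies = [(0, 120, 2, 60), (1, 200, 3, 100), (0, 90, 1, 45)]
or_, lo, hi = peto_odds_ratio(studies)
print(f"Peto OR = {or_:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```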
