Data Synthesis Methods
The results of studies included for questions addressed by an SR of primary studies (questions 2 and 3) will be pooled using MA, if appropriate. The decision to pool all studies or subsets of studies will be made after review and exploration of heterogeneity. Clinical and methodological heterogeneity will be assessed in consultation with the clinical experts, considering patient and study design factors that might be expected to affect test performance, including heterogeneity in the composite reference standards used in the primary studies. If pooling is not appropriate because of substantial clinical heterogeneity, or because methodological or statistical heterogeneity cannot be addressed analytically, the findings will be synthesized narratively.
For each outcome of interest, analysis will be conducted for the overall study population and also for each subgroup listed in , as the data permit.
Meta-Analysis of Diagnostic Test Accuracy Studies
Based on the scoping review, studies that assessed diagnostic accuracy varied in terms of their applied reference standards, diagnostic pathways, settings, and patient populations. Thus, it is unclear whether pooling will be appropriate. If MA is deemed inappropriate, studies that report on diagnostic accuracy will be reviewed and results will be reported narratively.
Between-study heterogeneity within groups of studies being considered for pooling will be assessed using graphical presentations, including forest plots and plots of sensitivity and specificity in ROC space, and by calculating the between-study variance (τ²) and the confidence and prediction intervals around the summary estimates.53
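The heterogeneity quantities named above can be illustrated with a simplified, univariate sketch. It assumes a DerSimonian–Laird moment estimate of τ² on logit-transformed sensitivity; this is not the bivariate or HSROC model the protocol will fit in R (it ignores the correlation between sensitivity and specificity), and it uses a normal-quantile prediction interval rather than the t-based interval often recommended for small numbers of studies. The function names and study counts are hypothetical.

```python
import math
from statistics import NormalDist


def logit_with_var(events, total, cc=0.5):
    """Logit of a proportion and its approximate (delta-method) variance.

    Assumption: a continuity correction `cc` is applied when the proportion is 0 or 1.
    """
    if events == 0 or events == total:
        events, total = events + cc, total + 2 * cc
    p = events / total
    return math.log(p / (1 - p)), 1 / events + 1 / (total - events)


def expit(x):
    """Inverse logit: map a logit back to the probability scale."""
    return 1 / (1 + math.exp(-x))


def dersimonian_laird(y, v, level=0.95):
    """Random-effects summary, tau^2, confidence interval, and prediction interval.

    y: study estimates on the logit scale; v: their within-study variances.
    Requires at least two studies.
    """
    k = len(y)
    if k < 2:
        raise ValueError("need at least two studies")
    w = [1 / vi for vi in v]
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)  # moment estimator, truncated at 0

    w_re = [1 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    ci = (mu - z * se, mu + z * se)
    # Interval for the true sensitivity in a new study (normal approximation).
    half = z * math.sqrt(tau2 + se ** 2)
    return mu, tau2, ci, (mu - half, mu + half)


# Hypothetical (true positives, patients with PE) per study; not real data.
studies = [(45, 50), (80, 90), (30, 40), (95, 100)]
y, v = zip(*(logit_with_var(tp, n) for tp, n in studies))
mu, tau2, ci, pred = dersimonian_laird(list(y), list(v))
print(f"summary sensitivity {expit(mu):.3f}, tau^2 = {tau2:.3f}")
print(f"95% CI {expit(ci[0]):.3f} to {expit(ci[1]):.3f}")
print(f"95% prediction interval {expit(pred[0]):.3f} to {expit(pred[1]):.3f}")
```

The prediction interval is always at least as wide as the confidence interval, because it adds τ² to the uncertainty in the summary estimate.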
Reasons for observed heterogeneity will be explored by subgroup or multivariate regression analyses, as covariate data permit. Individual comparisons will be summarized separately, and their consistency assessed. Additional sensitivity analyses addressing study outliers, study size, study quality, study design, and other study- or design-related factors will also be considered to establish the robustness of the findings. Because some variation in patient populations, and hence in the prevalence of PE, is anticipated, the risk of verification bias determined during critical appraisal will be examined in sensitivity analyses. If substantial verification bias is detected, models will be adjusted using the method of de Groot et al.54
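To show why partial verification matters, the sketch below applies the classic Begg and Greenes correction to a single study. This is a named substitute for illustration only: it is not the de Groot et al. method the protocol will use, and it assumes that verification by the reference standard depends only on the index test result. The function names and counts are hypothetical.

```python
def begg_greenes(n_pos, n_neg, ver_pos, dis_pos, ver_neg, dis_neg):
    """Verification-bias-corrected sensitivity and specificity (Begg and Greenes).

    n_pos, n_neg: patients with a positive / negative index test (all tested).
    ver_pos, ver_neg: how many of those received the reference standard.
    dis_pos, dis_neg: how many verified patients had PE.
    Assumes verification depends only on the index test result.
    """
    p_d_given_pos = dis_pos / ver_pos  # P(PE | test positive), from verified patients
    p_d_given_neg = dis_neg / ver_neg  # P(PE | test negative), from verified patients
    sens = n_pos * p_d_given_pos / (n_pos * p_d_given_pos + n_neg * p_d_given_neg)
    spec = n_neg * (1 - p_d_given_neg) / (
        n_neg * (1 - p_d_given_neg) + n_pos * (1 - p_d_given_pos)
    )
    return sens, spec


def naive_verified_only(ver_pos, dis_pos, ver_neg, dis_neg):
    """Sensitivity and specificity from verified patients only (subject to verification bias)."""
    sens = dis_pos / (dis_pos + dis_neg)
    spec = (ver_neg - dis_neg) / ((ver_neg - dis_neg) + (ver_pos - dis_pos))
    return sens, spec


# Hypothetical study: all 200 test positives verified, only 100 of 800 negatives verified.
print(naive_verified_only(200, 160, 100, 10))  # sensitivity inflated, specificity deflated
print(begg_greenes(200, 800, 200, 160, 100, 10))
```

In this example the verified-only estimates (sensitivity about 0.94, specificity about 0.69) differ sharply from the corrected ones (about 0.67 and 0.95), which is the kind of distortion the planned sensitivity analyses are meant to detect.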
There are no established thresholds for determining whether pooling of diagnostic test accuracy studies is appropriate,53 so the findings of the above assessments will be appraised in terms of their usefulness in answering our clinical and policy questions. Should we decide that MA is appropriate, we will use the model developed by Rutter and Gatsonis to generate hierarchical summary receiver operating characteristic (HSROC) curves,55 as well as pooled sensitivity, specificity, and diagnostic odds ratio (DOR) values, with their 95% CIs. The area under the curve (AUC) will be used as a quantitative measure of the diagnostic accuracy of imaging studies for PE diagnosis, with values closer to 1 indicating better diagnostic performance and values closer to 0.5 indicating poor performance.56 Positive likelihood ratios above 10 and negative likelihood ratios below 0.1 will be taken to indicate low misdiagnosis rates.
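The likelihood-ratio thresholds and the DOR above can be checked on a single 2×2 table. A minimal sketch, assuming a 0.5 continuity correction when any cell is zero (a common convention, not one stated in the protocol); the function name and counts are hypothetical. Pooled values would come from the HSROC model rather than from this per-study arithmetic.

```python
def accuracy_summary(tp, fp, fn, tn, cc=0.5):
    """Sensitivity, specificity, likelihood ratios, and DOR from one 2x2 table."""
    if 0 in (tp, fp, fn, tn):
        # Assumed convention: add cc to every cell when any cell is zero.
        tp, fp, fn, tn = (x + cc for x in (tp, fp, fn, tn))
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    lr_pos = sens / (1 - spec)
    lr_neg = (1 - sens) / spec
    dor = (tp * tn) / (fp * fn)  # equals lr_pos / lr_neg
    return {
        "sensitivity": sens,
        "specificity": spec,
        "LR+": lr_pos,
        "LR-": lr_neg,
        "DOR": dor,
        # Thresholds stated in the protocol: LR+ > 10 and LR- < 0.1.
        "LR+ > 10": lr_pos > 10,
        "LR- < 0.1": lr_neg < 0.1,
    }


# Hypothetical study: 100 patients with PE, 100 without.
for name, value in accuracy_summary(tp=90, fp=5, fn=10, tn=95).items():
    print(name, value)
```

Here a sensitivity of 0.90 and specificity of 0.95 give LR+ of about 18 but LR- of about 0.105, so the test would meet the positive threshold and narrowly miss the negative one.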
In the event that we observe substantial variation in reference standards between studies put forth for pooling, particularly involving composites of multiple tests, we will use an extension of the Rutter and Gatsonis model that accommodates imperfect and composite reference standards.57,58 Unknown parameters will be estimated using a Bayesian approach with non-informative prior distributions, which allow the observed data to dominate the final estimates of sensitivity and specificity. Our models will assume conditional independence of the combined tests (i.e., independence given true disease status), while acknowledging evidence that the results may be affected if the tests are dependent.59,60
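The role of the independence assumption can be seen in how the accuracy of a two-test composite reference standard follows from the accuracy of its components. A minimal sketch, assuming two component tests combined by an “any positive” or “both positive” rule; the optional covariance parameters `cov_d` and `cov_nd` (hypothetical names) show how dependence between the tests shifts the composite's accuracy. This is arithmetic for a known composite, not the Bayesian model the protocol will fit.

```python
def composite_accuracy(se1, sp1, se2, sp2, rule="any", cov_d=0.0, cov_nd=0.0):
    """Sensitivity and specificity of a two-test composite reference standard.

    rule="any": composite positive if either test is positive.
    rule="both": composite positive only if both tests are positive.
    cov_d / cov_nd: covariance between the two test results among patients
    with / without PE; 0 means conditional independence.
    """
    # Probabilities of both tests negative / both positive, given disease status.
    both_neg_d = (1 - se1) * (1 - se2) + cov_d
    both_pos_d = se1 * se2 + cov_d
    both_neg_nd = sp1 * sp2 + cov_nd
    both_pos_nd = (1 - sp1) * (1 - sp2) + cov_nd
    if rule == "any":
        return 1 - both_neg_d, both_neg_nd
    if rule == "both":
        return both_pos_d, 1 - both_pos_nd
    raise ValueError("rule must be 'any' or 'both'")


# Hypothetical component accuracies.
print(composite_accuracy(0.8, 0.95, 0.7, 0.90, rule="any"))              # independent tests
print(composite_accuracy(0.8, 0.95, 0.7, 0.90, rule="any", cov_d=0.03))  # positive dependence in PE
```

With positive dependence among patients with PE, the “any positive” composite gains less sensitivity than independence would predict (about 0.91 rather than 0.94 here), which illustrates how an incorrect independence assumption can bias the estimates.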
Exploration of heterogeneity, plotting, and MA will be conducted using the statistical software R,61 with packages mada62 and HSROC.63
If pooling is not appropriate, a narrative synthesis will include the presentation of findings within summary tables, alongside study and clinical characteristics believed to contribute to heterogeneity, as determined during the exploration of the data. A narrative description will aim to synthesize observed test performance in the absence of an MA.
Meta-Analysis of Primary Clinical Utility and Safety Studies
The clinical utility of risk stratification strategies and imaging studies for PE diagnosis will be assessed on the basis of findings about benefits (e.g., diagnostic efficiency, influence on choice of treatment and the resulting reduced exposure to imaging harms or harms of unnecessary intervention, and indirect effects on clinical outcomes) and harms (e.g., failure rate).
Dichotomous outcomes (e.g., mortality) will be summarized using relative risks and 95% CIs (or odds ratios and 95% CIs, if case-control studies are included). Continuous outcomes will be summarized using differences in means and 95% CIs, if appropriate. If indicated (e.g., for quality-of-life scales), standard methods for converting between units of measurement will be used, and standardized mean differences will be calculated if possible. For outcomes reported as time-to-event, if individual patient data are available in the form of a survival curve or a table of events and patients at risk, analyses will be performed using Kaplan–Meier curves and Cox regression. If studies report adjusted effect measures, the adjusted results will be used in the primary analysis; unadjusted results will be presented in exploratory analyses, with comment on any differences between the two. If required measures of variance are not available, variances will be imputed if possible.64 Forest plots will be presented for all individual summary estimates. Findings will be reported as “not statistically significant” if the 95% CI of the overall estimate includes 1 (unity) for ratio measures of dichotomous data or 0 for continuous data.
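The effect measures and the significance rule above can be sketched for two-arm summary data. A minimal sketch, assuming normal-approximation 95% CIs (on the log scale for the risk ratio) and no zero-event arms, which would need a continuity correction not shown here; function names and data are hypothetical, and pooled estimates would come from Review Manager or metafor rather than from this code.

```python
import math
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # about 1.96


def risk_ratio(events_1, n_1, events_2, n_2):
    """Risk ratio with a 95% CI computed on the log scale."""
    rr = (events_1 / n_1) / (events_2 / n_2)
    se_log = math.sqrt(1 / events_1 - 1 / n_1 + 1 / events_2 - 1 / n_2)
    return rr, (math.exp(math.log(rr) - Z95 * se_log), math.exp(math.log(rr) + Z95 * se_log))


def mean_difference(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Difference in means with a 95% CI (normal approximation)."""
    md = mean_1 - mean_2
    se = math.sqrt(sd_1 ** 2 / n_1 + sd_2 ** 2 / n_2)
    return md, (md - Z95 * se, md + Z95 * se)


def statistically_significant(ci, null_value):
    """Protocol rule: not significant if the 95% CI includes the null value."""
    low, high = ci
    return not (low <= null_value <= high)


# Hypothetical data: deaths 10/100 vs 20/100; a score of 5 (SD 2) vs 3 (SD 2), 50 per arm.
rr, rr_ci = risk_ratio(10, 100, 20, 100)
md, md_ci = mean_difference(5.0, 2.0, 50, 3.0, 2.0, 50)
print(rr, rr_ci, statistically_significant(rr_ci, 1.0))  # null value for ratios is 1
print(md, md_ci, statistically_significant(md_ci, 0.0))  # null value for differences is 0
```

In this example the risk ratio of 0.5 has a CI of roughly 0.25 to 1.01, which includes 1, so it would be reported as not statistically significant.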
Between-study heterogeneity within groups of studies being considered for pooling will be assessed using graphical presentations (including forest plots and plots of outcomes against covariates) and by calculating the I² statistic and Cochran’s Q test statistic. An I² ≥ 75% will be interpreted as indicating considerable heterogeneity across studies, as suggested by the Cochrane Handbook for Systematic Reviews of Interventions. Under the null hypothesis of homogeneity, Cochran’s Q follows a chi-squared distribution with k − 1 degrees of freedom (df), where k is the number of studies; it will be tested at a 10% level of significance. I² is derived from Q as I² = (Q − df)/Q × 100%, with negative values set to 0. Clinical and methodological heterogeneity will be assessed in consultation with the clinical experts.
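The relationship between Q, its chi-squared test, and I² can be sketched from study-level estimates and variances. A minimal sketch, assuming inverse-variance weights and a standard-library series evaluation of the chi-squared tail probability (adequate for the moderate Q values of small meta-analyses); the function names and data are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-squared variable with df degrees of freedom.

    Uses the power series for the regularized lower incomplete gamma function.
    """
    if x <= 0:
        return 1.0
    s, z = df / 2, x / 2
    term = total = 1 / s
    n = 1
    while term > 1e-15 * total:
        term *= z / (s + n)
        total += term
        n += 1
    lower = total * math.exp(s * math.log(z) - z - math.lgamma(s))
    return max(0.0, 1.0 - lower)


def heterogeneity(y, v, alpha=0.10):
    """Cochran's Q, its p-value, and I^2 (%) from study estimates y and variances v."""
    w = [1 / vi for vi in v]
    pooled = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)  # inverse-variance (fixed-effect) mean
    q = sum(wi * (yi - pooled) ** 2 for wi, yi in zip(w, y))
    df = len(y) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0  # negative values set to 0
    p = chi2_sf(q, df)
    return {
        "Q": q,
        "df": df,
        "p": p,
        "heterogeneous at alpha": p < alpha,  # protocol uses alpha = 0.10
        "I2 (%)": i2,
        "considerable (I2 >= 75%)": i2 >= 75,
    }


# Hypothetical log risk ratios and their variances from five studies; not real data.
print(heterogeneity([-0.7, -0.2, 0.1, -1.1, -0.4], [0.05, 0.08, 0.04, 0.10, 0.06]))
```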
Reasons for observed heterogeneity will be explored by subgroup or multivariate regression analyses, as the data permit. Individual contrasts will be summarized separately, and their consistency assessed. Additional sensitivity analyses addressing study outliers, study size, study quality, study design, and other study- or design-related factors will also be considered to establish the robustness of the findings.
If pooling of outcome data is appropriate, summary measures and CIs for the reported outcomes will be reported. Random-effects models will be used. In the event that both randomized and non-randomized studies report on the same outcome, RCTs will be analyzed separately from non-randomized studies, and the influence of study design will be explored in sensitivity analyses. MAs will be carried out using the Cochrane Review Manager software, version 5.3, or using R with the package metafor.65
If pooling is not appropriate, a narrative synthesis will include the presentation of findings within summary tables, alongside study and clinical characteristics believed to contribute to heterogeneity, as determined during the exploration of the data. A narrative description will aim to synthesize the direction and size of any observed effects across studies in the absence of an MA and will include an assessment of the likelihood of clinical benefit or harm.
Publication bias will be assessed visually using funnel plots and tested statistically using Egger’s regression test and Begg’s rank correlation test.66
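Egger’s test can be sketched as an ordinary least-squares regression of the standardized effect (effect divided by its standard error) on precision (one divided by the standard error), testing whether the intercept differs from 0. A minimal sketch with hypothetical names and data; it returns the t statistic and its k − 2 degrees of freedom, and a two-sided p-value would then be taken from the t distribution (for example, `2 * scipy.stats.t.sf(abs(t), df)` if SciPy is available). Begg’s rank correlation test is not shown.

```python
import math


def egger_test(effects, std_errors):
    """Egger's regression test for funnel-plot asymmetry.

    Regresses effect / SE on 1 / SE by ordinary least squares; an intercept
    far from 0 suggests small-study effects such as publication bias.
    Returns the intercept, its standard error, the t statistic, and df = k - 2.
    """
    k = len(effects)
    if k < 3:
        raise ValueError("Egger's test needs at least three studies")
    x = [1 / se for se in std_errors]                    # precision
    z = [e / se for e, se in zip(effects, std_errors)]   # standardized effect
    x_bar, z_bar = sum(x) / k, sum(z) / k
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxz = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z))
    slope = sxz / sxx
    intercept = z_bar - slope * x_bar
    resid_var = sum((zi - intercept - slope * xi) ** 2 for xi, zi in zip(x, z)) / (k - 2)
    se_intercept = math.sqrt(resid_var * (1 / k + x_bar ** 2 / sxx))
    t = intercept / se_intercept if se_intercept > 0 else math.copysign(math.inf, intercept)
    return intercept, se_intercept, t, k - 2


# Hypothetical log risk ratios and standard errors from six studies; not real data.
log_rr = [-0.9, -0.6, -0.4, -0.3, -0.2, -0.1]
se_log_rr = [0.45, 0.35, 0.30, 0.20, 0.15, 0.10]
print(egger_test(log_rr, se_log_rr))
```

With few studies the test has low power, so a non-significant intercept does not rule out publication bias.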
Network Meta-Analysis
The results of studies assessing the diagnostic accuracy of imaging studies for PE diagnosis (questions 2 and 3) will be pooled using network meta-analysis (NMA), if appropriate. If an NMA is deemed feasible, methods will be further developed in consultation with an expert. The scope of this analysis is presented in Appendix 6. All possible comparisons between the diagnostic imaging studies (interventions) of interest will be evaluated.