NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Ensor J, Riley RD, Jowett S, et al.; on behalf of the PIT-STOP collaborative group. Prediction of risk of recurrence of venous thromboembolism following treatment for a first unprovoked venous thromboembolism: systematic review, prognostic model and clinical decision rule, and economic evaluation. Southampton (UK): NIHR Journals Library; 2016 Feb. (Health Technology Assessment, No. 20.12.)

Prediction of risk of recurrence of venous thromboembolism following treatment for a first unprovoked venous thromboembolism: systematic review, prognostic model and clinical decision rule, and economic evaluation.

Chapter 3 Systematic review of existing prognostic models for the recurrence of venous thromboembolism following treatment for a first unprovoked venous thromboembolism

Introduction

The aim of this chapter was to undertake a systematic review of studies developing or validating a prognostic model for individual recurrence risk prediction following cessation of therapy for a first unprovoked VTE. The review identified current prognostic models and critically appraised the development and validation of these models. This systematic review highlights the current work in this area and informed the development of a new prognostic model (see Chapter 4) within the context of the existing models.

In areas where prognostic models are useful, it is common for several models to be developed, owing to the myriad ways of developing a model and differences between the underlying populations used to develop them. As such, it can be difficult for practitioners to identify which model is the most appropriate to their problem and to understand the shortcomings of the model.12 A definitive review and critique of the existing models could help clinicians and other practitioners to better understand the strengths and weaknesses of each model, allowing informed decisions to be made on which (if any) models to use in practice.

The systematic review could also help to identify predictors for which evidence towards their prognostic effect was strong or weak, highlighting predictors for consideration within the model development (see Chapter 4). Further issues found within the development of existing models could also be considered within the development of a prognostic model for this project (e.g. related to model development methodology). Finally, as detailed in the research aims (see Chapter 2), the performance of existing models found in the review could be compared with the newly developed prognostic model (see Chapter 4), where the included predictors are similar.

A protocol for this systematic review, outlining the methods which follow, was submitted to the National Institute for Health Research, published in the BioMed Central journal Systematic Reviews13 and registered on PROSPERO (CRD42013003494).

Methods

Aims of the review

The primary aim for the review was to identify studies that had developed or validated a prognostic model utilising multiple (at least two) predictors to predict the risk of recurrent VTE or adverse outcome (mortality or bleeding) following cessation of therapy for a first unprovoked VTE. A further aim was to critique and summarise the development and validation (internal and external performance) of the identified prognostic models, with a view to identifying issues which may be considered in the development of a new prognostic model (see Chapter 4). For all models, their context was summarised qualitatively, including the predictors modelled, the development population and the setting of the model.

Search strategy

The following bibliographic databases were searched: The Cochrane Library (Wiley) (including the Cochrane Database of Systematic Reviews, Database of Abstracts of Reviews of Effects, Health Technology Assessment databases and the Cochrane Central Register of Controlled Trials), MEDLINE (Ovid) 1950–2014, MEDLINE In-Process & Other Non-Indexed Citations (Ovid) to date and EMBASE (Ovid) 1980–2014. Searches used index terms and text words that encompassed the patient group supplemented by terms relating to recurrence or adverse outcome and prognostic factors (see sample MEDLINE search in Appendix 1).

Publicly available trials registers were also searched, such as ClinicalTrials.gov, UK Clinical Research Network Study Portfolio Database, World Health Organization International Clinical Trials Registry Platform and the metaRegister of Current Controlled Trials. Reference lists of all included papers were checked and subject experts were contacted. No restrictions on publication language were applied.

In addition, abstracts from the following national and international conferences from 2005 onwards were hand-searched in order to capture studies that were not yet fully published:

  • haematology conferences: International Society of Thrombosis and Haemostasis, American Society of Hematology, European Hematology Association, British Society of Haematology
  • cardiology conferences: British Cardiac Society, American College of Cardiology, European Society of Cardiology, American Heart Association, ACCP.

Selection criteria

Inclusion criteria

Study design

Studies of any design [e.g. cohorts, randomised controlled trials (RCTs)] or systematic reviews that developed, compared or validated a prognostic model (or clinical prediction rule based on a model) utilising multiple (at least two) predictors to predict the risk of recurrent VTE or adverse outcome (mortality or bleeding) following cessation of therapy for a first unprovoked VTE.

Patient group

Patients aged ≥ 18 years with a first unprovoked VTE who had received at least 3 months’ treatment with an OAC therapy. Studies with mixed populations (including those outside the remit) were included provided that appropriate data for the defined group of patients were extractable.

Setting

Studies in any setting were included.

Potential prognostic models

Studies must have reported a prognostic model utilising multiple prognostic factors to predict the risk of recurrent VTE or adverse outcome following cessation of therapy for a first unprovoked VTE. A prognostic model was defined as a combination of at least two predictors within a statistical model, used to predict an individual’s risk of outcome (e.g. VTE recurrence).

Study selection

Study selection followed a two-step process. Titles (and abstracts where available) were initially screened by two reviewers independently, using predefined screening criteria (see Appendix 2). These were broadly based on whether or not studies (1) included patients with a first unprovoked VTE, who received a minimum of 3 months OAC therapy, and (2) developed or examined prognostic models in relation to individual prediction of VTE recurrence or other clinical outcomes. Full texts of any potentially relevant articles were then obtained and two reviewers independently applied the full inclusion criteria (see Appendix 2). Any discrepancies between reviewers were resolved by discussion or by referral to a third reviewer. Portions of non-English-language studies were translated where necessary to facilitate study selection and subsequent data extraction. The study selection process was documented using the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) flow diagram.14 Any relevant systematic reviews identified were screened for further primary studies. Reference management software was used to record reviewer decisions, including reasons for exclusion.

Data extraction

Data extraction was conducted independently by two reviewers using an in-depth piloted data extraction form. Disagreements were resolved through discussion or referral to a third reviewer.

Data extraction included the following elements:

  • study characteristics (e.g. sample size, country, year)
  • study design characteristics (e.g. RCT, prospective, length of follow-up)
  • patient characteristics (e.g. summaries of age, sex, family history, treatment details in the sample)
  • candidate prognostic factors considered (e.g. any thresholds used for continuous predictors, methods of measurement, timing of measurement post cessation of therapy)
  • outcome measures (e.g. recurrence of VTE, mortality, bleeding)
  • statistical methods employed and how prognostic factors included in the analysis were handled (e.g. continuous, dichotomised)
  • prognostic models {e.g. the final model (its specification and included factors), how it was developed and an individual risk probability was produced, and any internal and external validation performance statistics for discrimination [such as the c-statistic (area under the curve; AUC)] and for calibration [such as the E/O ratio (expected/observed events)], together with their confidence intervals (CIs)}.

Assessment of study quality

The quality (risk of bias) of any studies developing or evaluating a prognostic model was assessed by piloting the Prediction study Risk Of Bias Assessment Tool (PROBAST), a tool for assessing risk of bias and applicability of prognostic model studies that was nearing completion and ready for piloting when this review was undertaken.15

Particular elements were considered in the following domains:

  • Patient selection (such as whether or not it was a prospective design, what study design was used, if appropriate inclusions and exclusions were used, and whether or not patients had similar disease presentation, or if this was accounted for in analyses).
  • Outcomes (such as whether or not the outcome definition was pre-specified, predictors were excluded from the definition, the same definition and assessment was used for all patients and whether or not the outcome was determined blind to predictor information).
  • Predictors (such as whether or not the same predictor definitions were used for all patients, predictors were measured blinded to outcome data, all predictor information was available at the time the model was intended for use and whether or not non-linear associations for continuous predictors were considered and categorisation was not data driven).
  • Sample size (such as whether or not there was a pre-specified sample size consideration accounting for numbers of events and multiple comparisons in selection of factors, whether or not all enrolled patients were included in analyses and how many data were available for external validation).
  • Missing data (such as adequate reporting on completeness of data and whether or not imputation was investigated).
  • Statistical analysis (such as handling of continuous variables, selection of possible predictors irrespective of univariable analyses, whether or not overfitting and optimism were accounted for using bootstrapping or shrinkage and whether or not weights assigned to predictors related to regression coefficients).
  • Internal and external model validation (such as whether or not model validations are reported and how these were carried out).

Evidence synthesis

Details on the methodology of model development and findings from studies reporting a prognostic model were summarised narratively, and in the context of any potential risk of bias identified. Key components were the predictors included in the final model; how the included predictors were coded; what the specification of the model was, and how it produces an individual outcome probability or risk score; the reported predictive accuracy of the model; and whether or not the model was validated internally and externally and, if so, how.

The consistency of development methods used and main findings was examined to identify if studies at higher risk of bias produced different results and conclusions to those considered to be at low risk of bias.

If multiple studies were found that validated the same prognostic model, then it was planned to synthesise calibration statistics (such as E/O events) and discriminatory statistics (such as the c-statistic, AUC), using the random-effects meta-analysis of DerSimonian and Laird,16,17 to summarise the model’s average performance across different settings and its predicted performance in a future setting.18,19
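The planned pooling step can be illustrated with a minimal sketch of the DerSimonian and Laird moment estimator. The function name and inputs are hypothetical, and the software actually intended for the review is not stated here. The sketch assumes that each validation study supplies an estimate on a suitable scale (for example, the log of the E/O ratio) together with its within-study variance.

```python
import math


def dersimonian_laird(estimates, variances):
    """Pool study estimates using DerSimonian-Laird random effects.

    estimates: per-study estimates on an approximately normal scale
               (e.g. log E/O ratio); illustrative input only.
    variances: corresponding within-study variances.
    Returns (pooled estimate, standard error, between-study variance tau^2).
    """
    k = len(estimates)
    w = [1.0 / v for v in variances]              # inverse-variance weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sw
    # Cochran's Q: weighted squared deviations from the fixed-effect mean
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    c = sw - sum(wi ** 2 for wi in w) / sw
    # Moment estimate of between-study variance, truncated at zero
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, tau2


# Hypothetical log(E/O) values from three validation studies (not real data)
pooled, se, tau2 = dersimonian_laird([0.05, -0.10, 0.20], [0.02, 0.03, 0.04])
low, high = pooled - 1.96 * se, pooled + 1.96 * se
print(math.exp(pooled), math.exp(low), math.exp(high), tau2)
```

The pooled estimate summarises average performance across settings. A prediction interval for performance in a future setting would additionally use the estimated between-study variance, which this sketch returns but does not develop further.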

Amendments to protocol

The original protocol for this review stated that a systematic review of all potential prognostic factors would be undertaken, which would include any individual factor shown to be associated with risk of recurrence or adverse outcome. Owing to the wealth of information on potential prognostic factors and the high levels of heterogeneity in the evidence, a review of all prognostic factors would have required significant resource for limited conclusions. Given this heterogeneity between studies, it was thought unwise to attempt to synthesise such evidence, as no firm conclusions could be drawn from such a review.

It is widely accepted within the prognostic research community that individual risk prediction requires the use of a combination of factors.18 Therefore summarising the prognostic ability of individual factors in isolation would also not meet the primary aim of this project. As a result, a protocol amendment was submitted to undertake a systematic review of only prognostic models, including any study utilising a combination of multiple factors in a model to predict individual risk of recurrence or adverse outcome after cessation of therapy in patients with a first unprovoked VTE. Thus studies only considering the prognostic ability of a candidate prognostic factor were no longer included or synthesised, unless they also reported the development and/or validation of a multivariable prognostic model.

These amendments were discussed with, and agreed by, the National Institute for Health Research and a revised protocol submitted.

Results

Quantity of research available

Searching of bibliographic databases as described in Search strategy resulted in 13,516 records identified after automatic removal of 1879 duplicate records. A further 2747 records were manually removed as duplicates, leaving 10,769 records to be screened for inclusion. Screening of titles and abstracts identified that 10,485 were not relevant to the review question. Full-text articles were sought for eligibility assessment; three articles were unobtainable,20–22 and while 16 non-English-language articles were translated, a further three articles could not be translated into English despite extensive efforts to obtain translations23–25 (see Appendix 3). This resulted in a total of 278 full-text articles sourced for assessment. Of the 278 full-text articles assessed for inclusion, 258 were excluded (see Appendix 3): 91 were excluded as discussion or review articles not developing or updating a prognostic model, 150 were excluded based on issues related to the model (e.g. not developed for individual prediction, adjusted a single predictor, etc.), three were excluded based on the study population, and 14 were excluded based on both study population and issues around the model used (Figure 1).
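The record counts in this paragraph can be checked with a short arithmetic sketch. The variable names are illustrative only, and the final count of included articles is derived here by subtraction rather than quoted from the paragraph.

```python
# Record flow as reported in the text (variable names are illustrative)
records_after_automatic_dedup = 13_516
manual_duplicates = 2_747
screened = records_after_automatic_dedup - manual_duplicates

not_relevant_on_title_abstract = 10_485
full_texts_sought = screened - not_relevant_on_title_abstract

unobtainable = 3
untranslatable = 3
full_texts_assessed = full_texts_sought - unobtainable - untranslatable

# Reasons for exclusion at full-text stage
full_text_exclusions = {
    "discussion or review article": 91,
    "model-related issues": 150,
    "study population": 3,
    "population and model issues": 14,
}
excluded_full_text = sum(full_text_exclusions.values())
included = full_texts_assessed - excluded_full_text

print(screened, full_texts_sought, full_texts_assessed,
      excluded_full_text, included)
```

Each intermediate total agrees with the figure reported in the text, which suggests that the flow diagram counts are internally consistent.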

FIGURE 1. Preferred Reporting Items for Systematic Reviews and Meta-Analyses flow diagram.

As a result of the screening process a total of 20 articles met the inclusion criteria, including seven ongoing studies8,26–35 (see Ongoing studies), eight conference abstracts relating to studies which appeared to meet the inclusion criteria,30–40 one record of this Health Technology Assessment report29 and four full-text peer-reviewed articles,2,9,41,42 one of which appeared to be an update to one of the other three articles.42 The authors of the 15 included conference abstracts and ongoing studies were contacted to seek additional information, such as a subsequent publication. Based on author responses, 13 of the 15 articles were associated with the four full-text articles included. The authors of the remaining two articles did not respond to further enquiry and so no further publications could be found to supplement the available abstracts.26,37 One, a study by Raskob et al.,37 is based on data from the EINSTEIN extension study43 and aimed to identify a subgroup of patients at high and low risk of recurrent VTE. The analyses investigated patient characteristics including age (dichotomised), sex, body mass index (BMI) (dichotomised), idiopathic presentation, site of index event, malignancy, creatinine clearance, known thrombophilia, immobilisation, and comorbid cardiac and pulmonary disorders; it was unclear from the abstract if analyses were univariable or multivariable. Further information regarding the study was unavailable from the included abstract; it was therefore unclear whether or not a prognostic model was developed and if individual recurrence risk could be predicted from such a model. The second abstract relates to the ongoing VISTA study,26 which is further discussed in Ongoing studies.

The four full-text articles included in the review (and 13 associated abstracts) will be discussed in further detail throughout the remaining sections of this chapter: Main study and patient characteristics, Description, critique and main findings of model studies, Comparison of included studies quality (which compare and contrast the three main articles and discuss common strengths and weaknesses), and Discussion (which provides an overall discussion on the issues found which may help to inform the development of a new prognostic model). The ongoing studies identified will be discussed in Ongoing studies. Three of the articles developed three independent prognostic models or rules (whereas the fourth is an update to one of the models), outlined briefly below.

HER DOO 2: Rodger et al.9

Rodger et al.9 used a conditional logistic regression model to develop a clinical decision rule which suggested that a female patient with fewer than two predictors (post-thrombotic signs, D-dimer level ≥ 250 µg/l, BMI ≥ 30 kg/m2 or aged ≥ 65 years) could potentially safely discontinue OAC therapy after 5–7 months of initial OAC therapy for an unprovoked VTE. A low-risk (< 3% annual recurrence risk) group of males could not be identified in the study and therefore Rodger et al.9 recommended that all male patients continue OAC therapy.9
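The rule as summarised above can be sketched in code. This is an illustrative reading of the description in this chapter only, not a validated clinical tool; the class and function names are hypothetical and the thresholds are those quoted in the preceding paragraph.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    """Illustrative container; field names are assumptions, not from the study."""
    female: bool
    post_thrombotic_signs: bool
    d_dimer_ug_per_l: float
    bmi_kg_per_m2: float
    age_years: float


def herdoo2_predictor_count(p: Patient) -> int:
    """Count how many of the four predictors are present."""
    return sum([
        p.post_thrombotic_signs,
        p.d_dimer_ug_per_l >= 250,
        p.bmi_kg_per_m2 >= 30,
        p.age_years >= 65,
    ])


def could_consider_stopping(p: Patient) -> bool:
    """Rule as described: only women with fewer than two predictors qualify."""
    return p.female and herdoo2_predictor_count(p) < 2


example = Patient(female=True, post_thrombotic_signs=False,
                  d_dimer_ug_per_l=180.0, bmi_kg_per_m2=31.0, age_years=52.0)
print(herdoo2_predictor_count(example), could_consider_stopping(example))
```

Because no low-risk group of men was identified, the sketch returns False for every male patient regardless of predictor count.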

Vienna prediction model: Eichinger et al.2,42

Eichinger et al.2,42 used a Cox proportional hazards model to develop a prognostic model including sex, site of index event and D-dimer as predictors. A nomogram based on the prognostic model was derived to allow easy implementation of the model and can be used to calculate a patient’s cumulative recurrence rate at 12 and 60 months from cessation of therapy, with estimated 95% CIs.2 An extension to the Vienna model has been proposed which aimed to utilise D-dimer measurements over time to allow prediction using the Vienna model from time points at 3, 9 and 15 months after cessation of therapy.42

DASH score: Tosetto et al.41

Tosetto et al.41 used a Cox proportional hazards model to develop a clinical prediction guide including predictors for abnormal D-dimer levels (+2 score), age ≤ 50 years (+1 score), male sex (+1 score) and hormone use (–2 score). This proposed score can be used to calculate patients’ cumulative recurrence rate at 1, 2 and 5 years from cessation of therapy, with estimated 95% CIs. Tosetto et al.41 suggest that a combined D-dimer, Age, Sex, Hormone therapy (DASH) score of ≤ 1 would indicate an annual recurrence risk < 5% and therefore indicate that a patient could potentially stop OAC therapy; conversely, a DASH score of ≥ 2 would indicate an annual recurrence risk > 5% and thus suggest that patients should potentially continue OAC therapy.41
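The scoring described above can be expressed as a short sketch. The function names are hypothetical and the points and cut-off are taken directly from the preceding paragraph. The sketch does not reproduce the cumulative recurrence rates or CIs, which depend on the underlying Cox model.

```python
def dash_score(abnormal_d_dimer: bool, age_years: float,
               male: bool, hormone_use: bool) -> int:
    """Sum the DASH points as described in the text (illustrative only)."""
    score = 0
    if abnormal_d_dimer:
        score += 2   # abnormal D-dimer level
    if age_years <= 50:
        score += 1   # age of 50 years or younger
    if male:
        score += 1   # male sex
    if hormone_use:
        score -= 2   # hormone use
    return score


def suggests_stopping(score: int) -> bool:
    """A score of 1 or less corresponds to annual recurrence risk < 5%."""
    return score <= 1


s = dash_score(abnormal_d_dimer=True, age_years=45, male=True, hormone_use=False)
print(s, suggests_stopping(s))
```

The score ranges from –2 to 4, so the cut-off of ≤ 1 versus ≥ 2 divides patients into two groups in the same way as a dichotomised predictor would.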

Throughout this chapter these articles will be referred to using their authors’ names, while the corresponding models will be referred to using their given names as above.

Main study and patient characteristics

In this section study and patient characteristics will be compared across the three included articles where possible. Due to the large heterogeneity in the methods used for model development and the presentation of the final models, a detailed assessment and discussion of the methods used and proposed models will follow in a critical appraisal of the included articles (see Description, critique and main findings of model studies).

The study characteristics of included articles are described in Table 1. Two of the articles, Rodger et al.9 and Eichinger et al.,2,42 conducted prospective cohort studies to develop their prognostic models. Tosetto et al.41 collected IPD from seven prospective cohort studies with which to build their prognostic model. All studies included recurrent VTE as their outcome of interest, and each included patients from several different centres across a range of countries creating the opportunity for heterogeneity in overall results and making it difficult to define the population in which the developed models can be applied. Table 2 shows the variable inclusion and exclusion criteria used across the three studies, with different criteria applied for treatment duration; for example, Rodger et al. including those treated for 5–7 months with OAC therapy, and Eichinger et al. including any patient treated for greater than 3 months OAC therapy.

TABLE 1. Study characteristics for the included articles

TABLE 2. Study inclusion/exclusion criteria and definitions of unprovoked for the included articles

Importantly, there were some differences across the studies in terms of the definition of unprovoked VTE used. Although surgery, trauma, immobility, pregnancy and cancer all appear as provoking factors across the studies, only the Vienna study2,42 includes hormone therapy as a provoking factor. Hormone therapy is considered to be a weak risk factor for recurrent VTE and as such is often included in the definition of unprovoked VTE. However, hormone therapy is a transient risk factor and unprovoked VTE is defined as an absence of transient risk factors, making its inclusion concerning. The patient population may differ in terms of recurrence risk due to including patients with a lower risk of recurrence than the unprovoked VTE population, which will impact on the estimated predictor effects in any developed model (see Description, critique and main findings of model studies).

The study populations differed in various ways across the three included studies, and differing patient characteristics and predictors were recorded. Those characteristics which were commonly reported across the studies are presented in Table 3, including patient age, sex, site of index VTE, BMI, D-dimer level, presence of factor V Leiden, duration of OAC therapy, duration of follow-up and definition of unprovoked VTE.

TABLE 3. Commonly reported patient population characteristics of included articles

There were differences in the presentation of results across the studies, with the Vienna2,42 and DASH41 studies presenting the median of continuous characteristics and frequency of categorical characteristics, whereas the HER DOO 2 [Hyperpigmentation, Edema, Redness/D-dimer, Obese (body mass index > 30 kg/m2), Old (age > 65 years)/2 or more factors should indicate for patients to continue therapy]9 study presented means for continuous characteristics (see Table 3). Both the HER DOO 29 and DASH41 studies split characteristic data by event status (recurrent VTE/no recurrence), whereas the Vienna2,42 study presented overall population characteristics, making comparisons across the studies difficult.

There were distinct differences in the number of observed events across the included studies, with 176, 239 and 91 events for the Vienna,2,42 DASH41 and HER DOO 29 studies respectively (see Table 3). The small number of events seen in the Vienna2,42 and HER DOO 29 studies may lead to insufficient statistical power, which will be discussed within the critical appraisal (see Description, critique and main findings of model studies). The DASH41 study combined IPD from seven source studies providing greater sample size and statistical power; the same IPD database was utilised in the development of a prognostic model in Chapter 4.11

Other characteristics, such as patient age, proportion of males, BMI and duration of OAC therapy, appeared to be somewhat consistent in terms of reporting across the studies (see Table 3). D-dimer levels were consistent across the Vienna2,42 and HER DOO 29 models, where both measured the median D-dimer level. Comparison to the DASH41 study was not possible as D-dimer was dichotomised (normal vs. abnormal) within the study. The proportion of patients with factor V Leiden appears to be greater in the Vienna2,42 study than the comparable HER DOO 29 study, though this may be explained by greater chance discrepancies in the prevalence of factor V Leiden likely to occur in smaller studies.

Description, critique and main findings of model studies

Quality assessment of the three included articles was undertaken using an early version of the PROBAST for assessing risk of bias and applicability of prognostic model studies.15 The results of this assessment formed the structure and content of this critical appraisal covering areas including patient selection, outcomes, predictors, sample size and analysis methods of the three included studies (see Assessment of study quality).

Key similarities and differences between the models are then summarised, focussing on the implications for the robustness of the respective findings. The findings from the PROBAST assessment, in terms of areas of potential risk of bias in the included studies, are summarised in Table 4, and presented validation statistics for the studies are summarised in Table 5.

TABLE 4. Quality issues for the included articles

TABLE 5. Model performance statistics for internal validation of proposed models presented for the included articles

HER DOO 29

Rodger et al.9 aimed to develop and internally validate a clinical decision rule to identify a subgroup of patients at low risk of recurrent VTE, in whom OAC therapy could be stopped after 6 months.

Patient selection

Rodger et al.9 used a prospective cohort study with consecutive unselected patients from 12 tertiary care centres in four countries. The prospective cohort design ensures that predictor information can be collected blinded to patient outcome. Inclusion criteria were patients treated with low-molecular-weight heparin for at least 5 days, OAC therapy for between 5 and 7 months with an INR between 2 and 3, and with no recurrence during therapy. Patients were excluded if they would not provide consent, were aged < 18 years, had already ceased therapy, required therapy for another reason, were inaccessible for follow-up, or were being treated for unprovoked recurrence or known thrombophilia. Thrombophilia testing was not performed as part of the study, but any patients with known thrombophilia prior to the start point were excluded.

The definition of unprovoked VTE used by Rodger et al.9 was based on an absence of the following provoking factors:

  • leg fracture or plaster cast
  • immobility for > 3 days
  • surgery using general anaesthetic (in the 3 months prior to the index event)
  • diagnosis of cancer (in the past 5 years).

This definition of unprovoked VTE therefore includes women who have currently, or previously, used OCs or HRT. Hormone therapy should be considered as a provoking factor for recurrent VTE; though some argue that because the effect of hormone therapy is weak it may be included within the definition.41,44 Strictly, the definition of unprovoked VTE refers to a VTE in the absence of any transient risk factors, hence why hereditary thrombophilia can be considered unprovoked VTE.45 As hormone therapy is a transient risk factor patients with a history of hormone therapy could be considered provoked. As a result there may be patients included in the model development who are at potentially lower risk than other unprovoked patients. This could lead to biased estimation of effect sizes within analyses, particularly for the effect of sex, which could lead to poor validation in external populations which may not include patients with hormone-related index events.

Outcomes

The primary outcome of the study was recurrent VTE, with associated deaths also recorded. Suspected DVT was confirmed by compression ultrasound, whereas suspected PE was confirmed by high-probability ventilation/perfusion scanning and/or spiral computed tomography. All events were adjudicated by independent physicians who were blinded to predictor information. As such, outcomes were pre-specified with the same definition and assessment used for all patients reducing the risk of detection bias, where there are differences in the determination of outcomes. The outcome definition excluded any candidate predictors again reducing the risk of detection bias and overall providing a low risk of bias with regard to outcome assessment.

Predictors

Rodger et al.9 report that information was collected on 69 potential predictors based on evidence from a pre-specified systematic review. Summary information was provided for 21 candidate predictors (with categories equating to 36 candidate predictors) including:

  • sex
  • ethnicity
  • age (years)
  • weight (kg)
  • height (cm)
  • BMI (kg/m2)
  • abnormal baseline imaging
  • abnormal baseline compression ultrasound
  • D-dimer (µg/l)
  • homocystine (mmol/l)
  • haemoglobin (g/l)
  • heterozygous for prothrombin gene mutation
  • factor VIII (U/ml)
  • factor V Leiden
  • ventilation/perfusion scan or pulmonary vascular obstruction result < 95%
  • post-thrombotic signs
  • history of chronic obstructive pulmonary disease or emphysema
  • family history of VTE
  • previous secondary VTE
  • use of OC in year before event
  • HRT in year before the event.

All predictors were measured before outcomes occurred, with laboratory predictors measured from samples taken while still on OAC therapy, and predictor definitions were consistent for all patients. Therefore the risk of selection bias due to differences in the characteristics of patients, and also the risk of reporting bias through differences in the reporting of predictors based on outcomes, could be considered low.

The use of a systematic review to identify potential predictors, the consistent updating of the systematic review and the assessment of all identified predictors provide confidence that there is a low risk of selection bias, where patients are selected based on their characteristics, and of attrition bias based on exclusions from analyses.

Rodger et al.9 categorised all continuous predictors that had a p-value < 0.2 at univariable analysis, by dichotomising at various thresholds and identifying ‘optimal’ thresholds as those with the highest chi-squared value. The process of categorisation was therefore completely data driven, which often leads to reporting bias, where the most significant results are presented.46 Rodger et al.9 indicate that dichotomised predictors were used to enhance the applicability and acceptance of the proposed clinical decision rule. Although this is valid reasoning, with researchers and clinicians looking for parsimony in any decision rule,12 the dichotomisation of continuous predictors leads to a loss of information. Dichotomisation splits patients’ risks into two groups, treating patients on either side of a threshold as distinctly different when in reality they may be very similar on the original scale.47 Best practice recommends that continuous predictors remain continuous in any analysis and that non-linear associations are investigated.47–50

Sample size and patient flow

A total of 646 patients had at least one follow-up visit and were included within the analysis. Only 600 of these patients completed follow-up, with 14 lost to follow-up, 10 patients dying after their first follow-up, nine patients withdrawn and 13 patients restarting OAC therapy for another reason. There were 91 events recorded out of the 646 patients included in the analysis, which could be considered insufficient power to investigate the 36 predictors described within the article. As a rough guide at least 10 events are required for each candidate predictor being investigated to give enough power for the analysis to yield valid conclusions.51 Therefore, to investigate 36 predictors would require roughly 360 events. It is unclear from the article whether or not model building included the 36 predictors summarised within the article, or the 96 predictors for which information was recorded: roughly 960 events would be required to adequately power such an analysis.51 As the study was substantially underpowered (with a maximum of 2.5 events per predictor), the results of model building and conclusions drawn must be carefully interpreted. Predictor effects could be substantially biased, with overestimation or underestimation of both the effect size and its associated uncertainty.51
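The events-per-predictor arithmetic above can be reproduced with a short sketch. The function names are illustrative; the figures (91 events, 36 or 96 candidate predictors, and the rule of thumb of 10 events per predictor) are those quoted in the text.

```python
def events_per_variable(n_events, n_predictors):
    """Number of outcome events available per candidate predictor."""
    return n_events / n_predictors


def events_required(n_predictors, epv_target=10):
    """Events needed to reach the rule-of-thumb target (default 10 per predictor)."""
    return n_predictors * epv_target


# Figures quoted for the Rodger et al. study.
print(round(events_per_variable(91, 36), 1))  # 36 summarised predictors
print(round(events_per_variable(91, 96), 2))  # 96 recorded predictors
print(events_required(36), events_required(96))
```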

Rodger et al.9 conducted a complete-case analysis, excluding patients with any missing predictor information. There was missing predictor information quantified for five predictors included in the predictor selection procedure:

  • abnormal baseline compression ultrasound (events = 14/non-events = 180)
  • factor V Leiden (events = 0/non-events = 1)
  • ventilation/perfusion scan or pulmonary vascular obstruction result < 95% (events = 53/non-events = 283)
  • post-thrombotic signs (hyperpigmentation, oedema or redness in either leg) (events = 18/non-events = 83)
  • family history of VTE (events = 0/non-events = 1).

The use of a complete-case analysis for the final model may introduce attrition bias by excluding patients who have outcome data but are missing information from one or more candidate predictors. The drop in the number of events is also a cause for concern; a potential total of 85 events were missing across the five predictors, enhancing the possibility of spurious relationships being seen or important prognostic effects being missed during the predictor selection procedure. In terms of the final model, only post-thrombotic signs were included and so the complete-case analysis sample size was reduced by 101 patients, of whom 18 were events. Overall the sample size and handling of missing data within the study indicate a potentially high risk of bias within the model development.

Analysis

Rodger et al.9 used a conditional logistic regression model and selected predictors using a stepwise forward selection process. As the outcome of interest was time to recurrent VTE, a time-to-event analysis may have been more appropriate here, as logistic regression does not account for the censoring of patients over time and variable lengths of follow-up. The analysis also did not consider the potential heterogeneity across the three tertiary centres from four different countries; stratification by centre or country would take into account any heterogeneity in the baseline risk of recurrence within these potentially different populations.52

Predictors were only included in multivariable analysis if univariable analysis yielded a p-value < 0.2, though univariable results were not presented in the article. This exclusion of candidate predictors from multivariable analysis was therefore completely data driven based on univariable results, which could lead to potential bias in results because predictors which may be important in combination (e.g. in a multivariable model) were not considered for multivariable analysis. Univariable analyses are not recommended for decisions about inclusion criteria in a multivariable model.53

The use of a forward selection procedure in model development can lead to overoptimism in regression coefficients (betas) and therefore a method such as shrinkage or bootstrapping should be used to account for this optimism. Rodger et al.9 did not use methods to account for optimism in their analyses and as such there is a risk that the performance of their proposed model could be weaker when applied to a new population.

Rodger et al.9 decided to split patients into two groups based on sex in a post-hoc analysis because a specified low-risk group (< 3% annual risk of recurrence) could not be identified. Post-hoc subgroup analyses such as this are often not considered in any assessment of study power. Stratifying by sex reduced the number of events for females and males to 28 and 63 events, respectively, creating similar issues to those discussed above in the estimation of regression coefficients within the model.51

Five clinical decision rules were developed for women and two for men. The final decision rule was selected based on criteria including classification performance; the proportion of patients identified as low risk; the face validity of the model; ease of use of the model; and parsimony. The performance of decision rules for men was considered poor, particularly in identifying a low-risk group of men who could be considered for cessation of therapy; as such, the study recommended that men continue with OAC therapy. The final decision rule for women included predictors for:

  • post-thrombotic signs:
    • hyperpigmentation
    • oedema
    • redness in either leg
  • D-dimer level ≥ 250 µg/l
  • BMI ≥ 30 kg/m²
  • aged ≥ 65 years.

Rodger et al.9 suggested that any female patient with fewer than two of these predictors could potentially safely cease OAC therapy after 5–7 months of initial OAC therapy for an unprovoked VTE. The specification of the model is not described in full, with regression coefficients presented for the final model only, but without any measurement of variability such as a standard error or 95% CI for the coefficients. It is unclear if or how the decision rule is related to the regression coefficients, though it may be that a rounding of the coefficients to the nearest integer was used. There is no indication of the level of risk associated with a particular score using the proposed decision rule; for example, what the risk of recurrence at 1 year post cessation of therapy is for a female with three of the included factors.
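The decision rule for women, as described above, amounts to counting how many of the four predictors are present and classing a woman as low risk if the count is below two. The following is a minimal sketch under that reading; the function and parameter names are hypothetical, and it does not reproduce the underlying regression model, whose link to the rule is not reported.

```python
def herdoo2_predictor_count(post_thrombotic_signs, d_dimer_ug_per_l,
                            bmi_kg_per_m2, age_years):
    """Count the rule's predictors that are present for one female patient.

    post_thrombotic_signs -- True if hyperpigmentation, oedema or redness in either leg
    d_dimer_ug_per_l      -- D-dimer level in µg/l (threshold >= 250)
    bmi_kg_per_m2         -- body mass index (threshold >= 30)
    age_years             -- age in years (threshold >= 65)
    """
    return sum([
        bool(post_thrombotic_signs),
        d_dimer_ug_per_l >= 250,
        bmi_kg_per_m2 >= 30,
        age_years >= 65,
    ])


def is_low_risk_woman(predictor_count):
    """Fewer than two predictors: classed as low risk under the rule as described."""
    return predictor_count < 2


count = herdoo2_predictor_count(False, 300, 27.0, 52)
print(count, is_low_risk_woman(count))
```

Note that the output is only a class (low risk or not); as stated above, no predicted recurrence risk is attached to a given count.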

The decision rules developed were internally validated using split-sample cross-validation. Five hundred subsamples, half the size of the study sample, consisting of randomly selected patients from the population were used to assess rule performance. The mean annual recurrence risk predicted by the clinical decision rule was recorded within each subsample and showed that for all subsamples the decision rule identified a low-risk group of women with mean annual risk of recurrence between 0% and 3%, suggesting that the rule performed well in internal validation. No measure of performance in terms of calibration or discrimination was presented, and no external validation of the rule was conducted. An external validation of the clinical decision rule is currently being undertaken comparing use of the decision rule to decide on cessation of therapy versus standard therapy.27,28
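The split-sample procedure can be sketched as below. Several details are assumptions because they are not reported: subsamples are drawn without replacement, annual risk is approximated as events per person-year among women classed as low risk, and the data layout and function name are hypothetical.

```python
import random


def split_sample_rates(patients, n_subsamples=500, seed=0):
    """Annual recurrence rate in the low-risk group for repeated half-size subsamples.

    patients -- list of (is_low_risk, had_recurrence, years_followed) tuples
    Returns one rate (events per person-year) per subsample that contains
    at least some low-risk follow-up time.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    half = len(patients) // 2
    rates = []
    for _ in range(n_subsamples):
        subsample = rng.sample(patients, half)  # assumed: without replacement
        low_risk = [p for p in subsample if p[0]]
        events = sum(1 for p in low_risk if p[1])
        person_years = sum(p[2] for p in low_risk)
        if person_years > 0:
            rates.append(events / person_years)
    return rates


# Hypothetical cohort: 100 low-risk women (2 recurrences) and 100 others, 1 year each.
cohort = ([(True, True, 1.0)] * 2 + [(True, False, 1.0)] * 98
          + [(False, True, 1.0)] * 10 + [(False, False, 1.0)] * 90)
rates = split_sample_rates(cohort)
print(len(rates), min(rates), max(rates))
```

A summary of this kind shows only whether the low-risk group stays below a target rate across subsamples; it does not measure calibration or discrimination.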

Summary

Overall there are significant concerns over the robustness of the proposed clinical decision rule. The testing of many more predictors than the study was powered for could lead to potentially spurious predictor effects. The data-driven dichotomisation of continuous predictors does not allow for non-linear effects, and instead suggests a constant effect in patients at either side of the chosen threshold.48 The effects of missing predictor information were not assessed, which could have resulted in different conclusions had there been more data available. Potentially inappropriate analyses were performed, without accounting for heterogeneity in baseline risk across differing populations and with a post-hoc subgroup analysis. The decision rule was poorly presented, with a lack of explanation linking regression coefficients with the decision rule. Furthermore, there was no reporting of uncertainty surrounding coefficients and no description of calibration or discrimination. Therefore, there is substantial concern that the proposed decision rule would not perform as presented when applied in a new population and examined in new data independent of that used to develop the model.

Vienna prediction model2,42

Eichinger et al.2,42 aimed to develop and internally validate a prognostic model to improve VTE recurrence risk prediction for patients following a first unprovoked VTE.

Patient selection

Eichinger et al.2,42 also used a prospective cohort design with consecutive unselected patients recruited from four thrombosis centres in Vienna, between July 1992 and August 2008. As in Rodger et al.,9 a prospective cohort design ensures that predictor information can be collected blinded to patient outcome. Patients were included if they were at least 18 years old and had been treated with OAC therapy for at least 3 months for a first unprovoked VTE.

The definition of unprovoked VTE used by Eichinger et al.2,42 was based on an absence of the following provoking factors:

  • surgery
  • trauma
  • pregnancy
  • hormone intake
  • deficiency in antithrombin, protein C, protein S
  • presence of lupus anticoagulant, or
  • cancer.

This definition of unprovoked VTE therefore follows a standard definition based on an absence of any transient risk factors. As a result the study population should be representative of the unprovoked VTE population and therefore provides a low risk of bias in the model development.

Outcomes

The main outcome of the study was recurrent VTE, with suspected DVT confirmed by venography or colour duplex ultrasonography. Suspected PE was confirmed by ventilation/perfusion scan and/or spiral computed tomography. All events were adjudicated by a committee of independent radiologists. Outcomes were pre-specified, with the same definition and assessment used for all patients, which limited detection bias by avoiding systematic differences in the diagnosis, determination and reporting of outcomes. The outcome definition excluded any candidate predictors and was also determined blind to any predictor information, again reducing the risk of detection bias and overall resulting in a low risk of bias based on the study outcomes.

Predictors

Eichinger et al.2,42 pre-specified a selection of clinical and laboratory predictors based on criteria including independent confirmation of the association with risk of recurrence (literature), simplicity of assessment and reproducibility. Using pre-specified candidate predictors reduces the risk of bias through unnecessary investigations, limiting the number of hypothesis tests and therefore reducing the chance of multiple testing issues. Candidate predictors included:

  • sex
  • age (years)
  • BMI (kg/m²)
  • site of index event (distal DVT, proximal DVT, PE)
  • D-dimer (µg/l)
  • factor V Leiden
  • factor II mutation
  • peak thrombin.

All predictors were measured blinded to outcome data (with laboratory predictors measured at cessation of OAC therapy), and predictor definitions were consistent for all patients. These methods reduce the risk of selection bias due to differences in the patient characteristics and also the risk of bias in the reporting of predictors based on outcomes.

Eichinger et al.2,42 investigated linear forms of all continuous predictors including age, BMI and D-dimer level. The study authors also investigated a dichotomisation of BMI based on the standard threshold for obesity (BMI > 30 kg/m²) used in clinical practice. The study therefore avoided introducing bias from data-driven methods of categorising continuous predictors, but this approach can still lead to a loss of information by splitting patients’ risk into distinct groups.47

Sample size and flow

A total of 929 patients were included within the analysis with 176 recurrent events being recorded. Given that Eichinger et al.2,42 investigated a total of eight predictors (15 predictors including categorisations of predictors), the number of events seen could be considered sufficient given the rule of thumb of 10 events per predictor.51 As the study could be considered suitably powered (with a maximum of 12 events per predictor), the inclusion of predictors and their effects are likely to be at low risk of bias, with effect sizes and associated uncertainty likely to be reliably estimated.51

Eichinger et al.2,42 conducted a complete-case analysis, excluding patients with any missing predictor information. There was missing predictor information quantified for five predictors included in predictor selection, but no indication of the number of associated events:

  • BMI = 20
  • D-dimer = 97
  • peak thrombin = 300
  • factor V Leiden = 13, and
  • factor II G20210A mutation = 14.

Complete-case analysis introduces attrition bias through excluding patients with outcome data because they have missing information related to one or more candidate predictors. The drop in the number of events could have an impact on the regression coefficients and selection of predictors for inclusion within the final model, particularly for peak thrombin, with a third of patients missing predictor information. Any analyses including peak thrombin would exclude a third of the study population, markedly reducing sample size and causing issues with the estimation of predictor effects. As D-dimer is the only one of these predictors included in the final model, the complete-case analysis would have reduced the sample size by 97; it is unclear how many events this represents, and this could therefore present a high risk of bias in the results. Overall the sample size and handling of missing data within the study indicate a potentially high risk of bias within the model development.

Analysis

Eichinger et al.2,42 used a Cox proportional hazards model and selected predictors using a forwards stepwise selection process. A Cox regression model accounts for the censoring of patients over time and variable length of follow-up in a time-to-event analysis, making it an appropriate choice for the recurrent VTE outcome. No stratification by centre was performed as part of the analysis, which did not account for potential heterogeneity in the baseline risk of recurrence for patients from the four different thrombosis centres. Failing to account for differences in baseline risk (or at least to investigate whether or not stratification is necessary), could lead to biased predictor effects which may not be replicated when the model is applied in a new population.52

Eichinger et al.2,42 first fitted a saturated model using all clinical predictors and then performed forward selection to investigate laboratory predictors (using an inclusion threshold of p-value < 0.5). To account for optimism associated with the stepwise selection procedure, Eichinger et al.2,42 evaluated the statistical significance of included predictors using bootstrap zero-corrected 95% CIs. From 1000 bootstrap resamples of the population, only factors for which the 95% CI did not overlap zero were included in the model. Further to this, Eichinger et al.2,42 applied a shrinkage factor (calculated by bootstrap resampling) to the final beta coefficients to adjust for optimism, which may affect the model’s performance in new study populations. The extensive use of methods to account for optimism in their analyses suggests that performance of the model is unlikely to be affected when applied to a new population.
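The effect of a uniform shrinkage factor on a predictor effect can be illustrated with a minimal sketch, assuming the factor simply multiplies each beta (log hazard ratio). The hazard ratio of 2.0 and the function name are hypothetical; 0.88 is the optimism-adjusted calibration slope reported for the Vienna model.

```python
import math


def shrink_hazard_ratio(hazard_ratio, shrinkage):
    """Apply uniform shrinkage on the log scale: beta_shrunk = shrinkage * beta."""
    beta = math.log(hazard_ratio)
    return math.exp(shrinkage * beta)


# Hypothetical unshrunken hazard ratio of 2.0, shrunk with a factor of 0.88.
print(round(shrink_hazard_ratio(2.0, 0.88), 2))
```

A factor below 1 pulls every hazard ratio towards 1, so predictions for a new population are less extreme than those implied by the unadjusted coefficients.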

Two prognostic models were presented within the study including the following predictors:

  • sex (female as the reference category)
  • site of index event (distal DVT as the reference category)
  • peak thrombin (included in the first model)
  • D-dimer (included in the second model).

Both models included sex and site of index event as predictors, but during initial predictor selection peak thrombin was identified as significantly prognostic and D-dimer did not enter the model [hazard ratio (HR) 1.21, 95% CI 0.87 to 1.53; p = 0.622]. The authors then decided to evaluate a model including D-dimer without peak thrombin, and in this model D-dimer was a significant predictor of recurrent VTE. Eichinger et al.2,42 chose to take forward the model including D-dimer levels as a final model because ‘D-dimer is a well-standardised and widely established parameter’. The ad hoc selection of predictors may be a cause for concern as it was not pre-specified that D-dimer was to be included and there was strong evidence against the prognostic value of D-dimer compared with peak thrombin levels. However, as discussed by the authors, the inclusion of D-dimer within the model could potentially improve the implementation of the model, as D-dimer is an established predictor more readily measured in practice. Prognostic models should aim to include predictors with standard definitions, which are easily available at the time the model is intended for use; nevertheless, the performance of the model may have been significantly improved by the inclusion of peak thrombin as a predictor (which showed strong evidence of prognostic value). It is also worth noting the substantial amount of missing data for peak thrombin (300/929 patients missing peak thrombin data), which could have influenced the estimated effect of peak thrombin within the complete-case analysis performed.

Eichinger et al.2,42 proposed a nomogram based on the final model including sex, site of index event and D-dimer as predictors. The nomogram can be used to calculate a patient’s cumulative recurrence rate at 12 and 60 months from cessation of therapy, with estimated 95% CIs. The relationship between the regression coefficients of the model and the simplified nomogram is not explicitly stated, though it is suggested that the coefficients are first multiplied by the estimated shrinkage factor. No estimate of baseline risk is provided and therefore it is only possible to estimate patients’ predicted risk of recurrence at the specified time points within the study (1 and 5 years). The use of estimated recurrence risk at specific time points (with associated uncertainty) improves on the HERDOO2 model9 by allowing practitioners to apply their own judgement based on the predicted risk, using current guidelines and patient consultation, to make an informed decision on treatment strategy for the individual patient.

Internal validation of the model was performed using bootstrap cross-validated risk scores. Patients were randomly drawn with replacement from the original sample to make a new bootstrap sample of 929 patients. Eichinger et al.2,42 re-evaluated their model within this new bootstrap sample and validated it in the patients who were not selected from the original study data. The process was repeated 1000 times and an average risk score for each patient was calculated from the 1000 replications. Using these averaged risk scores, the performance of the model (within the validation subsets) was assessed using the AUC, which is a measure of model discrimination (where AUC = 1 represents perfect discrimination). For recurrence risk at 12 and 60 months from cessation of therapy the optimism-adjusted (bootstrapped) AUC was estimated at 0.674 and 0.646, respectively, indicating a moderate ability of the model to separate groups of patients such as high- and low-risk patients. The study also reported an apparent c-statistic (assessing discrimination across all time points, and without adjustment for optimism) of 0.651 for the developed model (where c-statistic = 1 represents perfect discrimination), which suggests a small reduction in model performance after accounting for optimism. The study also reported a bootstrap optimism-adjusted calibration slope (or uniform shrinkage factor) of 0.88, which showed moderate calibration performance (with 1 indicating perfect calibration). This shrinkage factor was also used to adjust the predictor effect values in the final model for overoptimism. However, the performance of a model measured in the same data set used to develop the model will always be biased, showing greater performance than can be expected in an external setting.
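The resampling step and the discrimination measure can be sketched as follows. This is a simplified illustration, not the authors' procedure: it treats the outcome as binary at a fixed time point and ignores censoring, and the function names are hypothetical.

```python
import random


def bootstrap_split(n_patients, rng):
    """Draw n_patients indices with replacement; return (in_bag, out_of_bag).

    in_bag     -- indices used to re-fit the model (may contain repeats)
    out_of_bag -- indices never drawn, used to validate that re-fitted model
    """
    in_bag = [rng.randrange(n_patients) for _ in range(n_patients)]
    drawn = set(in_bag)
    out_of_bag = [i for i in range(n_patients) if i not in drawn]
    return in_bag, out_of_bag


def auc(risk_scores, outcomes):
    """Probability that a patient with an event has a higher score than one without.

    Ties count as half. Binary outcomes only; censoring is ignored (simplification).
    """
    cases = [s for s, y in zip(risk_scores, outcomes) if y]
    controls = [s for s, y in zip(risk_scores, outcomes) if not y]
    if not cases or not controls:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                concordant += 1.0
            elif case_score == control_score:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


rng = random.Random(2024)
in_bag, out_of_bag = bootstrap_split(929, rng)
# On average roughly 37% of patients are left out of any one bootstrap sample.
print(len(out_of_bag) / 929)
```

An AUC of 0.5 corresponds to no discrimination and 1 to perfect discrimination, which is the scale on which the reported values of 0.674 and 0.646 are interpreted.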

An external validation of the Vienna prediction model is currently being undertaken, which should provide a more reliable indication of model performance in a new patient population. It should also be noted that the internal validation of the Vienna prediction model relates to the multivariable Cox regression model developed, and it is unclear whether or not internal validation of the simplified nomogram was conducted.

Summary

In summary, there was a moderate risk of bias associated with the Vienna prediction model, mainly because no external validation has yet been performed. Model development itself was undertaken well. Patient selection avoided inappropriate exclusions and outcomes were defined consistently for all patients and blinded to predictor information. Continuous predictors were assessed in their linear form as opposed to categorising, therefore avoiding a loss of information. Overoptimism in the estimated regression coefficients was accounted for by using bootstrapping methods to adjust these coefficients.

However, there were also some areas for concern which could have introduced bias into the model development. The analysis did not investigate potential heterogeneity in the baseline risk of recurrence in the different populations from the four thrombosis centres used, potentially affecting the model’s applicability to a new population. The study authors made an ad hoc decision to include D-dimer as a predictor in the model, despite its lack of prognostic value during predictor selection. There was strong evidence of an important effect for peak thrombin, which meant D-dimer was originally excluded. The decision to include D-dimer was based on practical implications for model use, as it is a more established predictor. There were also some issues with missing predictor information, with marked reductions in data for both D-dimer and peak thrombin, and no information on the number of events excluded from complete-case analyses investigating these predictors. Furthermore, it was not clear how the nomogram was created and whether or not the internal validation of the nomogram was examined.

Overall the Vienna prediction rule was presented well and classed as low risk of bias in terms of model development. External validation is, however, now essential for the proposed nomogram. If performance was found to be acceptable, the model would allow individual prediction of recurrence risk limited to the two specified time points. The model was presented in a nomogram which may facilitate uptake of the model in practice, though this format limits the precision available (e.g. in specifying a patient’s exact D-dimer level). The predicted recurrence rate is provided with associated uncertainty (95% CI), which allows both clinician and patient to make an informed decision regarding treatment.

Finally, the extension to the Vienna model aimed to allow prediction of recurrence risk at further time points post cessation of therapy by measuring D-dimer over time.42 Measurements were made at 3, 9 and 15 months post cessation of therapy, and three more nomograms were developed to allow risk prediction using the Vienna model at these time points. D-dimer levels did not vary over the observation time and the associated HRs remained very similar (only the point estimate slightly decreased over time).42 A web-based calculator allows users to predict recurrence risk at any time between baseline (3 weeks) and 15 months post cessation of therapy.42 The model was adjusted for optimism using leave-one-out resampling to calculate shrinkage factors for 3, 9 and 15 months of 0.79, 0.81 and 0.7, respectively, indicating moderate calibration of the model at all time points but reduced performance compared with the original Vienna model (optimism-adjusted calibration slope = 0.88). In terms of discrimination performance (for 5-year predictions) at each time point, optimism-adjusted AUC values were 0.61, 0.61 and 0.58, representing a small reduction in performance compared with the original model (AUC = 0.646).2,42 However, although the earlier Vienna model has recently been externally validated, this extended model has not been externally validated to date.

DASH score41

Tosetto et al.41 aimed to develop and internally validate a clinical prediction guide to stratify unprovoked VTE patients by their risk of recurrence and identify those suitable for long-term OAC therapy. The study performed a meta-analysis of IPD from seven prospective studies (see Main study and patient characteristics), so as to alleviate the issues of statistical power often encountered in a single prospective study.

Patient selection

Tosetto et al.41 used IPD from seven prospective cohort studies with consecutive unselected patients, described previously by Douketis et al.11 The prospective cohort design, as used in the previous studies, ensures that predictor information can be collected blinded to patient outcome. Patients were included if they were at least 18 years old and had been treated with OAC therapy for at least 3 months for a first unprovoked VTE. Patients were excluded if follow-up ended before D-dimer measurement, or if they had a distal DVT; only proximal DVT and PE were included as valid index sites.

The definition of unprovoked VTE used by Tosetto et al.41 was based on an absence of the following provoking factors:

  • surgery
  • trauma
  • pregnancy and the puerperium
  • immobility
  • cancer.

This definition of unprovoked VTE therefore includes women who have currently, or previously, used OCs or HRT. As described for the Rodger et al.9 study, hormone therapy is sometimes included in the definition of unprovoked VTE because the effect of hormone therapy is weak;41,44 however, hormone therapy should be considered as a provoking factor. The definition of unprovoked VTE refers to a VTE in the absence of any transient risk factors, which is why hereditary thrombophilia can be considered unprovoked VTE.45 As hormone therapy is a transient risk factor, patients with a history of hormone therapy could be considered provoked. This could lead to effect sizes within the analyses being incorrectly estimated, particularly the effect of sex, because there may be subgroups of patients included in the model development who are at potentially lower risk than other unprovoked patients.

Outcomes

The primary outcome of the study was recurrent VTE, with associated deaths also recorded. All suspected outcomes were objectively confirmed and independently adjudicated.11 Outcomes were pre-specified with the same definition and assessment used for all patients, therefore reducing the risk of differences in the determination of outcomes. The outcome definition excluded any candidate predictors and outcomes were also determined blind to any predictor information, again reducing the risk of detection bias.

Predictors

Tosetto et al.41 used a backward elimination approach, starting with a saturated model including the following candidate predictors:

  • sex
  • age (years)
  • site of index event (DVT alone, or DVT and PE)
  • D-dimer (ng/ml)
  • hormone use at time of VTE (women)
  • previous history of cancer.

All predictors were measured blinded to outcome data (with D-dimer measured 3–5 weeks after cessation of OAC therapy), and predictor definitions were consistent for all patients. These methods reduce the risk of selection bias due to differences in the patient characteristics and also the risk of bias in the reporting of predictors based on outcomes.

Tosetto et al.41 also stratified their analyses by source study to allow for potential heterogeneity in the baseline risk of recurrence within these seven different populations. The adjustment for underlying differences in the source study populations may make the final model more robust when applied within a new population, not used in the development process.

Tosetto et al.41 categorised all continuous predictors, creating a dichotomisation of D-dimer (normal vs. abnormal), while categorising age into quartiles. Quartiles of age were used to control for a non-linear relationship between patient age and recurrence risk. The study may therefore have introduced bias from categorisation of continuous predictors, which can lead to a loss of information by separating patients’ risk into distinct groups.47 Categorisation of age appeared to be data driven, whereas dichotomisation of D-dimer was likely based on the instrument used (though this was not stated); both introduce a risk of reporting bias.

Sample size and flow

A total of 1818 patients were included within the analysis with 239 recurrent events being recorded. Given that Tosetto et al.41 investigated a total of six predictors (14 predictors including categorisations of predictors), the number of events seen could be considered sufficient given the rule of thumb of 10 events per predictor.51 As the study could be considered suitably powered (with a maximum of 17 events per predictor), the inclusion of predictors and their effects are likely to be at low risk of bias, with effect sizes and associated uncertainty likely to be reliably estimated.51

There was no missing predictor information in the analyses performed by Tosetto et al.41 and therefore it was not necessary to allow for the effects of potential attrition bias. The previous two studies both suffered from missing predictor information and conducted complete-case analyses, potentially introducing attrition bias into their analyses. Tosetto et al.41 were able to use all patient data in their analyses and thus statistical power remained appropriate to assess all predictors for inclusion. However, the study also used a selection procedure in which more predictors were considered, some of which had a proportion of missing data. The DASH model considered predictors including BMI, for which only 802 out of 1818 patients had complete predictor information, which may have affected the ability to detect a BMI effect. Overall the sample size and handling of missing data within the study resulted in a low risk of bias associated with the model development.

Analysis

Tosetto et al.41 used a Cox proportional hazards model and selected predictors using a backwards elimination process. A Cox regression model accounts for variable lengths of follow-up and the censoring of patients over time in a time-to-event analysis, making it an appropriate choice for the time to recurrent VTE outcome. The analyses were stratified by source study to allow for the differences in the baseline risk across the seven different study populations, thereby avoiding biased predictor effects, which may improve the external performance of the model.52

Tosetto et al.41 first fitted a saturated model using all clinical predictors, and then performed backward elimination to investigate candidate predictors (using an exclusion threshold of p-value > 0.1). To account for excessive optimism associated with the backward selection procedure, Tosetto et al.41 evaluated this using a heuristic formula and linear shrinkage by bootstrapping. A correction factor was calculated and applied to the final beta coefficients to adjust for optimism, which may affect the model’s performance in new study populations. The use of shrinkage methods to account for overoptimism in their selection process and analyses provides a low risk that the performance of their proposed model could differ when applied to a new population.

The final DASH score was developed by multiplying the regression coefficients by the calculated correction factor, doubling these coefficients and rounding them to the nearest integer. The final model included the following predictors:

  • abnormal D-dimer (post therapy), score = +2
  • age (≤ 50 years), score = +1
  • sex (male), score = +1
  • hormone use (at time of index event, in women), score = –2.
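
The scoring rule listed above can be sketched in a few lines of Python. This is an illustrative sketch only, not code from Tosetto et al.41: the function and argument names are hypothetical, and the beta and correction-factor values used in any call to coefficient_to_points are placeholders, as the regression coefficients themselves are not reported in this chapter.

```python
def coefficient_to_points(beta: float, correction_factor: float) -> int:
    """Shrink a log-hazard coefficient, double it and round to an integer.

    Mirrors the three steps described in the text. Note that Python's
    round() resolves exact .5 ties to the nearest even integer.
    """
    return round(2 * correction_factor * beta)


def dash_score(abnormal_d_dimer: bool, age_years: float, male: bool,
               hormone_use_at_index: bool) -> int:
    """Sum the integer points of the DASH score as listed in the text."""
    score = 0
    if abnormal_d_dimer:          # abnormal D-dimer post therapy: +2
        score += 2
    if age_years <= 50:           # age <= 50 years: +1
        score += 1
    if male:                      # male sex: +1
        score += 1
    elif hormone_use_at_index:    # hormone use at index event, women only: -2
        score -= 2
    return score
```

For example, dash_score(True, 45, True, False) sums the D-dimer, age and sex terms, whereas a 60-year-old woman with a normal D-dimer and hormone use at the index event receives only the hormone term.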

The proposed score can be used to calculate patients’ cumulative recurrence rate at 1, 2 and 5 years from cessation of therapy, with estimated 95% CIs. Despite stratification for source study in the analysis, there is no reported estimate of baseline risk and therefore patients’ predicted risk of recurrence can only be estimated at the specified time points presented. The use of estimated recurrence risk at specific time points (with associated uncertainty) is similar to that presented for the Vienna prediction model.2,42 This is a substantial improvement compared with the HER DOO 2 model9 as it allows physicians and patients to make an informed decision on treatment duration for the individual patient.

Internal validation of the model was performed using a bootstrap procedure similar to that described for the Eichinger et al. study.2,42 Patients were randomly drawn with replacement from the original sample, to make a new bootstrap sample of 1818 patients. Tosetto et al.41 then re-estimated the DASH score within this new bootstrap sample, to confirm the recurrence rate and associated CI for a DASH score of < 1 (identified as having an annual recurrence risk < 5%). The process was repeated 500 times and an average risk of recurrence was calculated to be less than the agreed 5% annual recurrence risk. Apparent c-statistics (which represent the discriminatory performance within the development data without adjustment for optimism using, for example, bootstrapping) were 0.71 and 0.72 for the score and model (beta terms), respectively, indicating moderate discrimination ability even for the simplified score for use in practice; however, apparent performance is likely to be optimistic. Like the Vienna model, the DASH model also provided a bootstrap optimism-adjusted calibration slope (or uniform shrinkage factor), which showed strong calibration performance of 0.97 (with 1 indicating perfect calibration). The shrinkage factor was then used to adjust the predictor effect values for overoptimism in the final model. However, the performance of a model measured within the development data set is likely to be biased, indicating stronger performance than could be expected in a new population. The use of IPD meta-analysis in the derivation of the model and stratification for source studies may make the DASH score more robust to departures from the development population’s characteristics, but external validation should be sought to identify the true performance of the model in a new patient population.
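
The resampling loop behind such a bootstrap procedure can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: bootstrap_average and event_proportion are hypothetical names, and a simple proportion of events stands in for the re-estimation of the DASH score and recurrence rate that the study performed within each bootstrap sample.

```python
import random
from statistics import mean
from typing import Callable, Sequence


def bootstrap_average(data: Sequence[int],
                      statistic: Callable[[Sequence[int]], float],
                      n_boot: int = 500,
                      seed: int = 0) -> float:
    """Average a statistic over n_boot samples drawn with replacement.

    Each bootstrap sample has the same size as the original data,
    as in the procedure described in the text.
    """
    rng = random.Random(seed)          # fixed seed for reproducibility
    n = len(data)
    estimates = []
    for _ in range(n_boot):
        resample = rng.choices(data, k=n)   # draw n patients with replacement
        estimates.append(statistic(resample))
    return mean(estimates)


def event_proportion(sample: Sequence[int]) -> float:
    """Stand-in statistic (assumption): proportion of patients with an event."""
    return sum(sample) / len(sample)
```

In a real analysis the statistic would refit the model within each resample; the stand-in here only shows the mechanics of drawing and averaging.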

Summary

In summary, the DASH score proposed by Tosetto et al.41 could be considered at moderate risk of bias, mainly due to the lack of external validation and the categorisation of continuous predictors in the model development. Many aspects of model development were done well. Patient selection avoided inappropriate exclusions and outcomes were defined consistently for all patients and blinded to predictor information. Stratification was used in analyses to account for heterogeneity in the baseline risk of recurrence across the source studies. Missing predictor information was not an issue in the model development, avoiding attrition bias and preserving statistical power. Overoptimism in the selection of predictors and estimated effects was accounted for by a correction factor calculated by bootstrapping. Reporting of the final score was clear, with the recurrence risks associated with particular scores presented together with their uncertainty.

However, there were also some areas for concern which could have introduced bias into development of the DASH score. There were issues with categorisation of continuous predictors, which could lead to a loss of important prognostic information. Furthermore, and most importantly, there was no external validation.

External validation is therefore now essential. The DASH score provides individual recurrence risk prediction at specific time points post cessation of therapy. If externally validated and found to perform well, then the score could be useful in practice as the included predictors are well defined and readily available at the time the decision rule would be applied. However, the true performance of the DASH score within a new patient population is unclear given the lack of internal and external validation statistics. Any physician or patient using the DASH score should therefore interpret the predicted recurrence risk with care; importantly, 95% CIs are presented within the study, allowing informed decision-making regarding treatment.

Comparison of included studies quality

All studies performed suitable patient selection avoiding inappropriate exclusions, used appropriate study designs, pre-specified outcomes and assessed outcomes blinded to predictor information, giving low risk of bias across all studies. All studies recruited patients from different centres or countries; however, only one (Tosetto et al.41) stratified by source in their analyses. Stratification accounts for heterogeneity in the baseline recurrence risk in different patient groups. Ignoring the clustering of patients within centres or countries could lead to poor model calibration,52 where model predictions do not closely fit observed recurrence rates, and could diminish performance in a new setting.

The three studies investigated a wide variety of candidate predictors, including clinical and laboratory predictors. Eichinger et al.2,42 avoided the categorisation of continuous candidate predictors (see Table 4), whereas Tosetto et al.41 investigated patient age in quartiles, but pre-specified the analysis to allow for non-linear associations between age and recurrence risk. Rodger et al.,9 in contrast, performed chi-squared testing to identify the optimal threshold to dichotomise every continuous predictor under consideration. The data-driven nature of the analysis invites reporting biases, where the optimal thresholds are reported without any clinical meaning. Dichotomisation of continuous predictors is also methodologically poor, as it seeks to separate patients' risk into two categories, treating those above and below the threshold as having different constant risks, which is unrealistic in practice.48

The HER DOO 2 model development by Rodger et al.9 was markedly underpowered, having collected information on 69 predictors and assessed at least 36 candidate predictors, with only 91 recurrent events. Given a rule of thumb based on at least 10 events per candidate predictor to be investigated,51 Rodger et al.9 only had 2.5 events per predictor, indicating a lack of power that could lead to biased estimates of predictor effects. Following the same rule, the other two studies had sufficient numbers of events to assess the predictors of interest with appropriate statistical power (see Table 4).
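
The events-per-predictor arithmetic above can be checked with a short sketch. The function names are hypothetical and the default threshold of 10 is the rule of thumb quoted in the text; the inputs (91 events, 36 candidate predictors) are the figures reported for the HER DOO 2 development.

```python
def events_per_predictor(n_events: int, n_candidate_predictors: int) -> float:
    """Number of outcome events available per candidate predictor."""
    if n_candidate_predictors <= 0:
        raise ValueError("at least one candidate predictor is required")
    return n_events / n_candidate_predictors


def meets_rule_of_thumb(n_events: int, n_candidate_predictors: int,
                        minimum: float = 10.0) -> bool:
    """True if the events-per-predictor ratio reaches the chosen minimum."""
    return events_per_predictor(n_events, n_candidate_predictors) >= minimum


# HER DOO 2 development: 91 recurrent events, at least 36 candidate predictors
herdoo2_ratio = events_per_predictor(91, 36)   # roughly 2.5 events per predictor
```

Under the same rule, 36 candidate predictors would call for at least 360 events.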

All included studies suffered from some degree of missing predictor information (either in predictor selection or final model predictors) and used a complete-case analysis to overcome this issue within the final models (see Table 4). No methods to assess the impact of this missing predictor information were used (e.g. an imputation procedure), and in the study by Eichinger et al.2,42 the number of missing recurrent events was not reported, so statistical power could not be assessed accurately. Attrition bias can lead to unbalanced groups of patients, and exclusion of patients reduces sample size, which can bias the estimation of predictor effects and make the performance of the model specific to the subgroup of the population for whom information was not missing (there may be a risk of bias due to the nature of the missing data).

Two studies used bootstrapping and shrinkage methods to adjust predictor coefficients for over-optimism (see Table 4),2,41,42 whereas the HER DOO 2 development did not account for optimism in predictor estimates.9 The use of optimism correction methods provides a lower risk of biased, unrealistic predictor effects, and should ensure the model performance is more consistent in a new patient population. Another methodological issue relating to model performance is validation; internal validation was performed across all of the studies, but only one has since been externally validated (see Table 4).2,42 Internal validation was reported in terms of both calibration and discrimination within the DASH41 and Vienna models2,42 (though not for the simplified nomogram), whereas Rodger et al.9 presented neither (see Table 5): both calibration and discrimination are vitally important performance statistics for any prognostic model. External validation is the true indication of model performance, as a model validated within its development data set will always give optimistic performance statistics.18 The Vienna model has now been externally validated (see Relevant studies identified after the search cut-off dates),2,42 but issues remain because (i) performance on validation was shown to be lower than expected and uncertainty was high;54 (ii) a new Weibull model component was added, which itself requires additional validation; (iii) the nomogram version of Vienna, which is the most used, was not validated; and (iv) validation was not performed by authors independent of the original model development. Thus, until further external validation is undertaken, the true performance in new populations cannot be ascertained. Further external validation studies are currently being undertaken to validate both the HER DOO 2 decision rule8,27,28 and the Vienna prediction model, which will provide a true indication as to the overall performance (in terms of calibration and discrimination) of these models in new patient populations where they are intended for use.

Finally, the application of the proposed models was described in various ways across the studies. Both the Vienna prediction model2,42 and the DASH score41 were presented well, with an indication of how the predictors are combined to calculate a patient’s recurrence risk at a specific time point (see Table 4). Both provided cumulative recurrence rates at specific time points after cessation of therapy including an estimate of the uncertainty surrounding these estimates (95% CIs). This information could be used to direct the decision-making process, informing clinicians and patients of the individual’s level of risk and therefore allowing individualised treatment strategies. Conversely, the HER DOO 2 model9 derived a clinical decision rule splitting patients into those with fewer than two predictors (from their model) and those with two or more predictors, suggesting that one group could continue OAC therapy, while the other could safely stop. Rodger et al.9 did not report individuals' risk at specific time points, only that fewer than two predictors would indicate a < 3% annual risk of recurrence. This therefore does not allow clinicians or patients to make decisions based on their preference of recurrence risk threshold, limiting the applicability of the decision rule.

Ongoing studies

There were two ongoing studies identified through the literature searches: the REVERSE II study (Recurrent Venous thromboembolism Risk Stratification Evaluation II)27,28 related to the HER DOO 2 rule, and the VISTA study26 related to the Vienna prediction model.

The first is an external validation trial of the HER DOO 2 rule proposed by Rodger et al.,9 which was internally validated within the original study. This ongoing randomised trial aims to compare use of the proposed decision rule to decide on cessation of OAC therapy with standard practice.27,28 The second is an ongoing randomised trial comparing use of the Vienna prediction model to decide on treatment duration with usual care, in which treatment duration is based on physician judgement.26

Relevant studies identified after the search cut-off dates

Subsequent to the completion of our review searches, one additional highly relevant study was identified related to one of the ongoing studies found through the systematic review.34,54 This was an external validation of the Vienna prediction model using IPD from five studies, which aimed to assess the performance of the Vienna model in terms of both discrimination and calibration in a new population.11,54

The study reported that the derivation and validation populations were homogeneous after removal of patients with provoked VTE and those with missing predictor information.54 Discrimination was calculated using the c-statistic for comparison with the original Vienna model, with a c-statistic in the validation cohort of 0.626 compared with 0.646 (the optimism-adjusted discrimination; see Table 5) for the derivation data, indicating a reduction in the discriminatory performance of the model in a new setting.
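
For readers unfamiliar with the c-statistic, the sketch below shows one common estimator for censored time-to-event data, Harrell's concordance index. The choice of this estimator is an assumption made for illustration; the estimator used in the validation study is not stated in this chapter, and the function name and arguments are hypothetical.

```python
from typing import Sequence


def harrell_c(times: Sequence[float], events: Sequence[bool],
              risk_scores: Sequence[float]) -> float:
    """Harrell's concordance index for right-censored data.

    A pair is usable when the patient with the shorter follow-up time
    had an event. The pair is concordant when that patient also has the
    higher predicted risk; tied risk scores count as half.
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue                      # censored patients cannot anchor a pair
        for j in range(n):
            if times[i] < times[j]:       # i recurred before j's observed time
                usable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable
```

A value of 0.5 corresponds to discrimination no better than chance and 1 to perfect ranking of patients by risk, which puts the reported values of 0.626 and 0.646 in context.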

The true calibration of the model in the validation data could not be assessed without the baseline hazard function.55 As the original Vienna model was developed using a Cox model which does not parameterise the baseline hazard function, this meant that assumptions about the shape of the baseline hazard function had to be made.54,56 The authors recalibrated the Vienna model assuming a Weibull distribution; however, because this new component of the model was developed, this new model would itself require further external validation.56 As the authors could not use the Cox model directly to predict survival probabilities (due to the lack of baseline hazard function), they could only assess calibration using the prognostic index to make predictions within the validation data.55 Comparison of observed and expected survival probabilities in five risk groups showed a general trend for the Vienna model to underpredict the risk of VTE recurrence at 12 months post cessation of therapy.54
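
The dependence of predicted survival on the baseline hazard can be made concrete with a short sketch. Under a proportional hazards model, the predicted survival probability at time t for a patient with prognostic index PI is the baseline survival at t raised to the power exp(PI), so absolute predictions are impossible without the baseline. The Weibull form and parameter names below are assumptions for illustration; the parameterisation and fitted values used in the recalibrated Vienna model are not reported in this chapter.

```python
import math


def weibull_baseline_survival(t: float, scale: float, shape: float) -> float:
    """Baseline survival S0(t) = exp(-scale * t**shape) (assumed Weibull form)."""
    return math.exp(-scale * t ** shape)


def predicted_survival(baseline_survival: float, prognostic_index: float) -> float:
    """Proportional hazards prediction: S(t | PI) = S0(t) ** exp(PI)."""
    return baseline_survival ** math.exp(prognostic_index)


def predicted_recurrence_risk(t: float, prognostic_index: float,
                              scale: float, shape: float) -> float:
    """Cumulative recurrence risk by time t, i.e. 1 - S(t | PI)."""
    s0 = weibull_baseline_survival(t, scale, shape)
    return 1.0 - predicted_survival(s0, prognostic_index)
```

With the baseline fixed, a higher prognostic index always yields a higher predicted recurrence risk, but the absolute level of risk depends entirely on the assumed scale and shape values.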

Discussion

The systematic review of prognostic models for recurrence risk identified three full-text articles developing three independent prognostic models, or clinical decision tools,2,9,41 from 257 eligible full texts which met the full inclusion criteria. Data extraction of the three included articles showed that study characteristics and patient populations differed in some respects, particularly in terms of the definition of unprovoked VTE and in the number of patients and events included within their analyses (see Main study and patient characteristics). A critique of the included studies described and identified the strengths and weaknesses of the studies with a particular focus on methods of patient selection, outcome reporting, predictor selection, sample size, model development and validation (see Description, critique and main findings of model studies).

Data extraction highlighted the variable definitions of unprovoked VTE across the included studies (see Table 2). Eichinger et al.2,42 excluded patients whose VTE was provoked by use of female hormones, such as the OC pill or HRT, whereas Rodger et al.9 and Tosetto et al.41 defined patients taking hormones as unprovoked. Risk factors consistently defined as provoking across the studies included surgery, trauma, immobility and pregnancy. The use of varying definitions to describe the unprovoked population creates confusion as to what population the proposed models apply to. Tosetto et al.41 justify including hormone intake as unprovoked because evidence suggests hormone therapy is a weak risk factor for VTE.41,44 However, hormone intake should be considered a transient risk factor, provoking the initial VTE but easily removable, whereas unprovoked VTE is not characterised by removable risk factors. Further research in developing prognostic models to predict recurrence risk in an unprovoked population should use a standard, consistent definition, excluding transient/removable risk factors to ensure that model predictions are reliable for intended patients. Given the definition of unprovoked VTE used by Tosetto et al.,41 the proposed DASH score is not applicable within an unprovoked population as it is defined in this report (see Chapter 1).

Across the included studies various predictors were included within the proposed final models, with sex, age and D-dimer level being included consistently within all three models, indicating strong evidence of an association with recurrence risk. As such, any future model development should investigate the effect of these predictors (along with other important predictors) in multivariable modelling due to their repeatable association with recurrence.

Quality assessment based on an early version of the PROBAST showed that there was evidence throughout the included studies of a moderate to high risk of bias, predominantly because of a lack of external validation (see Tables 4 and 5). The HER DOO 2 model9 development suffered high risk of bias, and some marked methodological issues, including the choice of analysis model, substantially underpowered analyses, data-driven categorisation of predictors, lack of adjustment for optimism and the presentation of the model for use. The Vienna prediction model2,42 and DASH score41 were more methodologically sound than the HER DOO 2 model,9 but had moderate risk of bias due to a lack of external validation. Both had statistical power to investigate their candidate predictors and accounted for optimism in their selection procedures, and Eichinger et al.2,42 assessed continuous predictors without categorisation and loss of information (though Tosetto et al.41 did categorise continuous predictors). Both studies presented their proposed models more clearly than the HER DOO 2 model,9 indicating the recurrence rate associated with predictor values and the uncertainty around those estimates. However, predictions are only provided for particular, discretised values of risk; for example, both models provide predictions for only a small selection of time points (Vienna model for 12 and 60 months post therapy, DASH score for 1, 2 and 5 years from cessation of therapy), and both models only provide 95% CIs for a small selection of predicted annual recurrence rates. Moreover, until further external validation is undertaken, the true performance in new populations cannot be ascertained. Further research should aim to consider some of the issues discussed here with regard to study quality to improve the performance of any proposed models within practice, provide transparent reporting of model development and, finally, improve statistical analyses to ensure model predictions are more robust.

Copyright © Queen’s Printer and Controller of HMSO 2016. This work was produced by Ensor et al. under the terms of a commissioning contract issued by the Secretary of State for Health. This issue may be freely reproduced for the purposes of private research and study and extracts (or indeed, the full report) may be included in professional journals provided that suitable acknowledgement is made and the reproduction is not associated with any form of advertising. Applications for commercial reproduction should be addressed to: NIHR Journals Library, National Institute for Health Research, Evaluation, Trials and Studies Coordinating Centre, Alpha House, University of Southampton Science Park, Southampton SO16 7NS, UK.

Included under terms of UK Non-commercial Government License.

Bookshelf ID: NBK344099
