3.1. 2006 Guideline method
This guideline was commissioned by NICE and developed in accordance with the guideline development process outlined in the NICE technical manual.36
Literature search strategy
Initial scoping searches were executed to identify relevant guidelines (local, national and international) produced by other development groups. The reference lists in these guidelines were checked against subsequent searches to identify missing evidence.
Relevant published evidence to inform the guideline development process and answer the clinical questions was identified by systematic search strategies. The questions are shown in Appendix B. Additionally, stakeholder organisations were invited to submit evidence for consideration by the guideline development group (GDG) provided it was relevant to the clinical questions and of equivalent or better quality than evidence identified by the search strategies.
Systematic searches to answer the clinical questions formulated and agreed by the GDG were executed using the following databases via the ‘Ovid’ platform: Medline (1966 onwards), Embase (1980 onwards), Cumulative Index to Nursing and Allied Health Literature (1982 onwards), British Nursing Index (1985 onwards) and PsycINFO (1967 onwards). The most recent search conducted for the three Cochrane databases (Cochrane Central Register of Controlled Trials, Cochrane Database of Systematic Reviews, and the Database of Abstracts of Reviews of Effects) was Quarter 1, 2006. The Allied and Complementary Medicine Database (AMED) was also used for alternative therapies (1985 onwards via the Datastar platform). Searches to identify economic studies were undertaken using the above databases and the NHS Economic Evaluations Database (NHS EED).
Search strategies combined relevant controlled vocabulary and natural language in an effort to balance sensitivity and specificity. Unless advised by the GDG, searches were not date specific. Language restrictions were not applied to searches. Both generic and specially developed methodological search filters were used appropriately.
There was no systematic attempt to search grey literature (conferences, abstracts, theses and unpublished trials). Hand searching of journals not indexed on the databases was not undertaken.
Towards the end of the guideline development process, searches were updated and re-executed, thereby including evidence published and included in the databases up to 17 March 2006. Any evidence published after this date was not included. This date should be considered the starting point for searching for new evidence for future updates to this guideline.
Further details of the search strategies, including the methodological filters employed, are available in the appendix.
Synthesis of clinical effectiveness evidence
Evidence relating to clinical effectiveness was reviewed using established guides37–43 and classified using the established hierarchical system shown in .36 This system reflects the susceptibility to bias that is inherent in particular study designs.
Levels of evidence for intervention studies.
The type of clinical question dictates the highest level of evidence that may be sought. In assessing the quality of the evidence, each study receives a quality rating coded as ‘++’, ‘+’ or ‘−’. For issues of therapy or treatment, the highest possible evidence level (EL) is a well-conducted systematic review or meta-analysis of randomised controlled trials (RCTs; EL = 1++) or an individual RCT (EL = 1+). Studies of poor quality are rated as ‘−’. Usually, studies rated as ‘−’ should not be used as a basis for making a recommendation, but they can be used to inform recommendations. For issues of prognosis, the highest possible level of evidence is a cohort study (EL = 2). A level of evidence was assigned to each study, and to the body of evidence for each question.
For each clinical question, the highest available level of evidence was selected. Where appropriate, for example if a systematic review, meta-analysis or RCT existed in relation to a question, studies of a weaker design were not included. Where systematic reviews, meta-analyses and RCTs did not exist, other appropriate experimental or observational studies were sought. For diagnostic tests, test evaluation studies examining the performance of the test were used if the efficacy of the test was required, but where an evaluation of the effectiveness of the test in the clinical management of patients and the outcome of disease was required, evidence from RCTs or cohort studies was optimal.
The system described above covers studies of treatment effectiveness. However, it is less appropriate for studies reporting the accuracy of diagnostic tests. In the absence of a validated ranking system for this type of test, NICE has developed a hierarchy for evidence of accuracy of diagnostic tests that takes into account the various factors likely to affect the validity of these studies ().36
Levels of evidence for studies of the accuracy of diagnostic tests.
For economic evaluations, no standard system of grading the quality of evidence exists. Economic evaluations that are included in the review have been assessed using a quality assessment checklist based on good practice in decision-analytic modelling.44
Evidence was synthesised qualitatively by summarising the content of identified papers in evidence tables and agreeing brief statements that accurately reflected the evidence. Quantitative synthesis (meta-analysis) was performed where appropriate. Where confidence intervals were calculated, this was done in accordance with accepted methods.45 Summary results and data are presented in the guideline text. More detailed results and data are presented in the evidence tables in the appendix, where a list of excluded studies is also provided.
Specific considerations for this guideline
It was anticipated that some evidence relevant to this guideline would not be specific to women with urinary incontinence (UI) and thus studies with mixed populations (men and women, and/or with UI of different aetiology) were considered if the majority of the population was women with idiopathic UI or overactive bladder (OAB).
Published guidance from the NICE Interventional Procedures (IP) Programme was considered, alongside all relevant evidence in women with UI or OAB when an interventional procedure was approved for use. Where the IP guidance states that an interventional procedure is not for routine use, the procedure was not considered within this guideline.
The NICE health technology appraisal on tension-free vaginal tape (2003) was updated within this guideline by addressing a question on the intervention. The associated NICE guidance will be withdrawn on publication of this guideline.
The classification of adverse effect frequency used by the Medicines and Healthcare products Regulatory Agency (MHRA) was adopted within the guideline, as shown in .
MHRA classification of adverse effect frequency.
Health economics
The aims of the economic input into the guideline were to inform the GDG of potential economic issues relating to UI in women and to ensure that recommendations represent a cost-effective use of healthcare resources.
The health economist helped the GDG by identifying topics within the guideline that might benefit from economic analysis, reviewing the available economic evidence and, where necessary, conducting economic analysis. Reviews of published health economic evidence are presented alongside the reviews of clinical evidence, and modelling is presented in the appendices, with cross references from the relevant chapters.
Outcome measures used in the guideline
For this guideline, treatment has been assessed against a number of outcome domains, as follows:
the woman's observations, including changes in symptoms and satisfaction
generic and incontinence-specific aspects of quality of life (QOL)
the clinician's observations including urodynamic investigation and quantification of incontinence
harm (adverse effects, surgical complications)
health economic outcomes, for example quality-adjusted life years (QALYs).
Forming and grading recommendations
For each guideline question, recommendations were derived using, and explicitly linked to, the evidence that supported them. In the first instance, informal consensus methods were used by the GDG to agree evidence statements and recommendations. Additionally, in areas where no substantial evidence existed, the GDG considered other guidelines or consensus statements to identify current best practice. Shortly before the consultation period, formal consensus methods were used to agree guideline recommendations (modified Delphi technique) and to select five to ten key priorities for implementation (nominal group technique).
Each recommendation was graded according to the level of evidence upon which it was based, using the established systems shown in and . For issues of therapy or treatment, the best possible level of evidence (a systematic review or meta-analysis or an individual RCT) equates to a grade A recommendation. For issues of prognosis, the best possible level of evidence (a cohort study) equates to a grade B recommendation. However, this should not be interpreted as an inferior grade of recommendation because it represents the highest level of relevant evidence.
Classification (grading) of recommendations for intervention studies.
Classification (grading) of recommendations for studies of the accuracy of diagnostic tests.
In addition, the GDG made research recommendations in areas where evidence is lacking.
External review
This guideline has been developed in accordance with the NICE guideline development process. This has included giving registered stakeholder organisations the opportunity to comment on the scope of the guideline at the initial stage of development and on the evidence and recommendations at the concluding stage. In addition, the guideline was peer reviewed by nominated individuals. The developers carefully considered all comments made by registered stakeholders during the consultation periods, with validation by NICE.
3.3. Methodology for 2013 Update
This guidance was commissioned by NICE and developed in accordance with the guideline development process outlined in the 2009 edition of The Guidelines Manual.
In accordance with NICE's Equality Scheme, ethnic and cultural considerations and factors relating to disabilities have been considered by the GDG throughout the development process and specifically addressed in individual recommendations where relevant. Further information is available from: www.nice.org.uk/aboutnice/howwework/NICEEqualityScheme.jsp.
Developing review questions and protocols and identifying evidence
The GDG formulated review questions based on the topics agreed with the stakeholders and included in the scope (see Appendix A) and prepared a protocol for each review question (see Appendix D). These formed the starting point for systematic reviews of relevant evidence. Published evidence was identified by applying systematic search strategies (see Appendix E) to the following databases: Medline (1950 onwards), Embase (1980 onwards), Cumulative Index to Nursing and Allied Health Literature (CINAHL; 1982 onwards), and three Cochrane databases (Cochrane Central Register of Controlled Trials, Cochrane Database of Systematic Reviews, and the Database of Abstracts of Reviews of Effects). Searches to identify economic studies were undertaken using the above databases, the NHS Economic Evaluation Database (NHS EED) and the Health Technology Assessment (HTA) database. None of the searches were limited by date. Searches in Embase were limited to English language, and searches in Medline were limited to English language and studies in humans. None of the other searches were limited by language of publication (although publications in languages other than English were not reviewed). Validated search filters were used to identify particular study designs, such as RCTs. There was no systematic attempt to search grey literature (conference abstracts, theses or unpublished trials), nor was hand searching undertaken of journals not indexed on the databases.
Towards the end of the guideline development process, the searches were updated and re-executed to include evidence published and indexed in the databases by 30 November 2012.
Reviewing and synthesising evidence
Evidence relating to clinical effectiveness was reviewed and synthesised according to the Grading of Recommendations Assessment, Development and Evaluation (GRADE) approach. In the GRADE approach, the quality of the evidence identified for each outcome in the review protocol is assessed according to the factors listed below, and an overall quality rating (high, moderate, low or very low) is assigned by combining the ratings for the individual factors.
Risk of bias (in study design using either NICE or CASP methodological checklists; see http://www.nice.org.uk/guidelinesmanual and http://www.casp-uk.net/). This also includes limitations in the design or execution of the study (including concealment of allocation, blinding, loss to follow up; these can reduce the quality rating).
Inconsistency of effects across studies – occurs when there is variability in the treatment effect demonstrated across studies (heterogeneity). (This can reduce the quality rating.)
Indirectness – the extent to which the available evidence fails to address the specific review question (this can reduce the quality rating).
Imprecision – present when there is uncertainty around the estimate of effect, for example when the confidence intervals are wide and cross the ‘imaginary’ lines of clinically significant effect (see ‘Outcome measures’ below). This reflects the confidence in the estimate of effect. (This can reduce the quality rating.)
Other considerations (including large magnitude of effect, evidence of a dose–response relationship, or confounding variables likely to have reduced the magnitude of an effect; these can increase the quality rating in observational studies, provided no downgrading for other features has occurred).
The type of review question determines the highest level of evidence that may be sought. For issues of therapy or treatment, the highest possible evidence level is a well-conducted systematic review or meta-analysis of RCTs, or an individual RCT. In the GRADE approach, a body of evidence based on RCTs has an initial quality rating of high, but this may be downgraded to moderate, low or very low if the factors listed above are not addressed adequately. For issues of prognosis, the highest possible level of evidence is a controlled observational study (a cohort study or case–control study), and a body of evidence based on such studies would have an initial quality rating of low, which might be downgraded to very low or upgraded to moderate or high, depending on the factors listed above.
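The way a starting rating is adjusted can be illustrated with a minimal sketch. It assumes that each downgrade or upgrade moves the rating by a whole level, which simplifies what is in practice a judgement made by the reviewers; the function name and its parameters are hypothetical and are not part of any GRADE software.

```python
# Illustrative only: GRADE ratings are judgements, not arithmetic.
# The whole-level steps below are an assumption made for this sketch.
LEVELS = ["very low", "low", "moderate", "high"]


def grade_quality(study_design, downgrades=0, upgrades=0):
    """Return an overall quality rating for a body of evidence.

    study_design: "rct" (starts at high) or "observational" (starts at low).
    downgrades: levels deducted for risk of bias, inconsistency,
        indirectness or imprecision.
    upgrades: levels added for large effect, dose-response or confounding;
        applied only to observational evidence with no downgrading.
    """
    start = {"rct": 3, "observational": 1}[study_design]
    if study_design == "observational" and downgrades == 0:
        score = start + upgrades
    else:
        score = start - downgrades
    # Keep the result within the four-level scale.
    score = max(0, min(score, len(LEVELS) - 1))
    return LEVELS[score]
```

For example, under these assumptions a body of RCT evidence downgraded once for imprecision and once for inconsistency would be rated low.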
For each review question the highest available level of evidence was sought. Where appropriate, for example, if a systematic review, meta-analysis or RCT was identified to answer a question directly, studies of a weaker design were not considered. Where systematic reviews, meta-analyses and RCTs were not identified, other appropriate experimental or observational studies were sought.
Within the full guideline, summary GRADE tables are presented. The full GRADE tables can be found in Appendix I.
For the review in this update we used the study types and methodology checklists shown in .
Study types per question and corresponding NICE methodological checklist used.
We used the CASP checklist for observational studies as none of the NICE checklists were appropriate for non-comparative studies.
The quality items for each study are reported in the study's evidence table and are summarised in the footnotes of each GRADE profile. For this guideline, we inserted footnotes to explain the choices we made while assessing the quality of evidence for each outcome. These footnotes indicated if we upgraded the evidence level, downgraded the evidence level or left the evidence level unchanged, and gave the rationale for doing this.
Some studies were excluded from the guideline reviews after obtaining copies of the publications because they did not meet inclusion criteria specified by the GDG (see Appendix G). These studies are listed in alphabetical order for each question and the reason for exclusion provided for each one.
Basic characteristics of each included study were summarised in evidence tables for each review question (see Appendix H) along with the quality assessment. Where outcome data were presented, results were entered in text-boxes exactly as reported in the full-text report of the study. The data grids in the ‘Results’ column contain data we exported to RevMan 5.1 (see http://ims.cochrane.org/revman) for meta-analysis. Where the standard deviation of the mean change from baseline was not reported, we imputed this using either the baseline standard deviation (SD) from the control group or the SD from a similar group.
Where possible, dichotomous outcomes were presented as relative risks (RRs) with 95% confidence intervals (CIs), and continuous outcomes were presented as mean differences with 95% CIs or SDs.
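As an illustration of how a relative risk and its 95% CI can be obtained from a two-arm study, the sketch below uses the standard log-scale (Wald) approximation. The guideline does not state the exact formula applied, so this method is an assumption, and the function name and the counts in the usage note are hypothetical.

```python
import math


def relative_risk(events_t, n_t, events_c, n_c, z=1.96):
    """Relative risk of treatment versus control with an approximate CI.

    Uses the normal approximation on the log scale; z=1.96 gives a 95% CI.
    Assumes at least one event in each arm (no continuity correction).
    """
    rr = (events_t / n_t) / (events_c / n_c)
    # Standard error of log(RR) for two independent binomial proportions.
    se_log_rr = math.sqrt(1 / events_t - 1 / n_t + 1 / events_c - 1 / n_c)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper
```

With hypothetical counts of 30/100 events in the treatment arm and 20/100 in the control arm, `relative_risk(30, 100, 20, 100)` gives an RR of 1.5 with a 95% CI of roughly 0.92 to 2.46.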
The body of evidence identified for each therapy or treatment review question (or part of a review question) was presented in the form of a GRADE evidence profile summarising the quality of the evidence and the findings (pooled relative and absolute effect sizes and associated CIs). Where possible, the body of evidence corresponding to each outcome specified in the review protocol was subjected to quantitative meta-analysis. In such cases, pooled effect sizes were presented as pooled risk ratios (RRs), pooled odds ratios (ORs), or mean differences. By default, meta-analyses were conducted using a random effects model as this is regarded as a more conservative method.
Where quantitative meta-analysis could not be undertaken, the range of effect sizes reported in the included studies was presented in a GRADE profile.
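The reason a random effects model is regarded as more conservative can be seen in a minimal sketch of inverse-variance pooling with the DerSimonian and Laird estimate of between-study variance. The meta-analyses for this guideline were run in RevMan; the code below is not that implementation, the choice of estimator is an assumption, and the function name is hypothetical.

```python
import math


def dersimonian_laird(effects, variances):
    """Pool study effects (for example log risk ratios) with random effects.

    effects: per-study effect estimates on an additive scale.
    variances: per-study variances of those estimates.
    Returns (pooled effect, standard error, tau squared).
    """
    weights = [1 / v for v in variances]
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum_w
    # Cochran's Q measures how much studies deviate from the fixed estimate.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # Adding tau2 to every variance widens the pooled confidence interval.
    re_weights = [1 / (v + tau2) for v in variances]
    pooled = sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)
    se = math.sqrt(1 / sum(re_weights))
    return pooled, se, tau2
```

When the studies agree, tau squared is zero and the result matches a fixed effect analysis; when they disagree, the standard error grows, so the pooled CI is wider.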
Outcome measures
For this guideline update, the effectiveness of interventions to treat urinary incontinence has been assessed against a variety of outcomes. The justification for using these outcomes is based on their relevance to women with the condition and to stakeholders involved in the consultation for this guideline, and on the expert consensus opinion of members of the multidisciplinary GDG. Outcomes included those that were felt to be desirable states (for example improvement in continence status) and the unwanted side-effects of treatment (for example the need for self-catheterisation). When assessing the effectiveness of a particular treatment, information about the effect of that treatment on one or more primary outcomes was sought.
Primary outcomes agreed in stakeholder consultation were:
continence status (zero episodes per day)
self-reported rate of absolute symptom reduction; for example number of episodes of incontinence per day
adverse effects; for example tolerability of drugs, development of new OAB symptoms after surgery for stress urinary incontinence, need for self-catheterisation after botulinum toxin A
incontinence-specific quality of life; for example Incontinence – Quality of Life, Bristol Female Lower Urinary Tract Symptoms questionnaire (BFLUTS) or the King's Health Questionnaire
psychological outcomes; such as anxiety and depression
clinical measures; such as cystometric capacity, post-void residual volume.
Once the GDG was convened, each member was surveyed to reach agreement on how to measure outcomes in a clinically meaningful way. The GDG members were asked individually to consider the time-point at which a specific outcome should be measured and the important adverse effects, and to prioritise the outcomes. (The questionnaire and feedback are available in Appendix V.)
Throughout the review we used the confidence intervals to decide imprecision, using a ‘zone’ rule.
Zones approach to imprecision.
The three zones, for example for the risk ratio, are less than 0.75, 0.75 to 1.25, and greater than 1.25. As demonstrated in , if the confidence interval:
was in a single zone, we rated the findings as precise and did not upgrade or downgrade
crossed into two zones, we downgraded to ‘serious imprecision’
crossed into three zones, we downgraded to ‘very serious imprecision’.
Where the GDG selected a minimal important difference (MID) for a continuous outcome, this MID defined the three zones, for example for the OAB-Q quality of life scale the MID used in the literature was 10 points, so this was used to define the zones for imprecision. The mean number of episodes differed across studies at baseline so it was not feasible to define a study-based MID such as percentage reduction in symptoms. A default MID of 1 episode per day difference between the treatments was chosen to define the zones.
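The zone rule described above can be sketched as a short function. The default thresholds are those given for the risk ratio; for a continuous outcome the MID-based thresholds would be passed in instead. How a confidence limit lying exactly on a threshold is treated is not stated in the guideline, so the boundary handling here is an assumption, and the function name is hypothetical.

```python
def imprecision_rating(ci_low, ci_high, lower=0.75, upper=1.25):
    """Rate imprecision by counting the zones a confidence interval touches.

    Zones: below `lower`, from `lower` to `upper`, and above `upper`.
    A limit exactly on a threshold is counted as inside the middle zone
    (an assumption made for this sketch).
    """
    if ci_low > ci_high:
        raise ValueError("ci_low must not exceed ci_high")
    zones = 0
    if ci_low < lower:
        zones += 1  # interval reaches into the lower zone
    if ci_low <= upper and ci_high >= lower:
        zones += 1  # interval overlaps the middle zone
    if ci_high > upper:
        zones += 1  # interval reaches into the upper zone
    return {
        1: "precise",
        2: "serious imprecision",
        3: "very serious imprecision",
    }[zones]
```

For instance, a risk ratio CI of 0.6 to 1.0 spans two zones and would be downgraded for serious imprecision, whereas a CI of 1.3 to 2.0 lies wholly in the upper zone and would not be downgraded.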
The GDG consensus was that patient satisfaction with treatment was the best overall indicator of treatment success since it includes those women who, while not on optimal treatment, may nevertheless have improved quality of life compared with before treatment.
Network meta-analysis
A network meta-analysis (NMA) can be undertaken where there is a comparison of multiple treatments. The approach is an extension of meta-analysis that includes multiple different pairwise comparisons across a range of interventions to treat one condition.
For this guideline, a hierarchical Bayesian NMA was undertaken to evaluate the effectiveness of antimuscarinic drugs for the treatment of overactive bladder. Trial populations were sufficiently homogeneous to allow indirect comparisons of treatments that had not been directly evaluated, as trials were identified that compared treatments with a common comparator. The analysis was strengthened by incorporating direct evidence from head-to-head trials as well as indirect comparisons from placebo-controlled trials. The output of the NMA was odds ratios and median probabilities of effectiveness with 95% credible intervals (comparable with confidence intervals). The probabilities of effectiveness were used to parameterise a new health economic model developed for this guideline update.
The NMA was undertaken in WinBUGS® with additional expert support provided by the Technical Support Unit at NICE.
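The idea of an indirect comparison through a common comparator can be shown with a much simpler calculation than the hierarchical Bayesian model used for this guideline. The sketch below is the frequentist adjusted indirect comparison (often attributed to Bucher), not the WinBUGS analysis; it assumes two independent sets of placebo-controlled trials, and the function name and inputs are hypothetical.

```python
import math


def indirect_odds_ratio(log_or_a, se_a, log_or_b, se_b, z=1.96):
    """Indirect odds ratio of drug A versus drug B via a shared placebo arm.

    log_or_a, se_a: log odds ratio of A versus placebo and its standard error.
    log_or_b, se_b: log odds ratio of B versus placebo and its standard error.
    Returns (odds ratio, lower limit, upper limit) for A versus B.
    """
    # On the log scale the placebo effect cancels out.
    log_or_ab = log_or_a - log_or_b
    # Variances add because the two sets of trials are independent.
    se_ab = math.sqrt(se_a ** 2 + se_b ** 2)
    return (
        math.exp(log_or_ab),
        math.exp(log_or_ab - z * se_ab),
        math.exp(log_or_ab + z * se_ab),
    )
```

Because the variances add, an indirect estimate is always less precise than either of the direct estimates it is built from, which is one reason for also incorporating head-to-head trials where they exist.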
Incorporating health economics
The aims of the health economic input to the guideline were to inform the GDG of potential economic issues relating to urinary incontinence and to ensure that recommendations represented a cost effective use of healthcare resources. Health economic evaluations aim to integrate data on benefits (ideally in terms of QALYs), harms and costs of different care options.
The GDG prioritised a number of review questions where it was thought that economic considerations would be particularly important in formulating recommendations. Systematic searches for published economic evidence were undertaken for these questions. For economic evaluations, no standard system of grading the quality of evidence exists and included papers were assessed using a quality assessment checklist based on good practice in economic evaluation. Reviews of the relevant published health economic literature are presented alongside the clinical effectiveness reviews.
Health economic considerations were aided by original economic analysis undertaken as part of the development process. For this guideline the areas prioritised for economic analysis were:
The cost effectiveness of antimuscarinic drugs for overactive bladder after conservative management has been unsuccessful (incorporating a network meta-analysis of evidence of effectiveness).
The cost effectiveness of botulinum toxin A versus sacral nerve stimulation in the treatment of overactive bladder once pharmacological treatment has been unsuccessful.
A third analysis comparing surgical approaches for mid-urethral procedures in women undergoing their primary surgical tape procedure was considered. However, there was insufficient evidence of difference in effectiveness or cost between each type of procedure to undertake a health economic analysis.
Evidence to recommendations
For each review question recommendations for clinical care were derived using, and linked explicitly to, the evidence that supported them. In the first instance, informal consensus methods were used by the GDG to agree short clinical and, where appropriate, cost effectiveness evidence statements which were presented alongside the evidence profiles.
Statements summarising the GDG's interpretation of the evidence and any extrapolation from the evidence used to form recommendations were also prepared to ensure transparency in the decision-making process. The criteria used in moving from evidence to recommendations were:
relative value placed on the outcomes considered
consideration of clinical benefits and harms
consideration of net health benefits and resource use
quality of the evidence
other considerations (including equalities issues).
In areas where no substantial clinical research evidence was identified, the GDG members considered other evidence-based guidelines and consensus statements or used their collective experience to identify good practice. The health economics justification in areas of the guideline where the use of NHS resources (interventions) was considered was based on GDG consensus in relation to the likely cost effectiveness implications of the recommendations. The GDG members also identified areas where evidence to answer their review questions was lacking and used this information to formulate recommendations for future research.
Towards the end of the guideline development process, formal consensus methods (voting) were used to consider all the clinical care recommendations and research recommendations that had been drafted previously. The GDG identified ten ‘key priorities for implementation’ (key recommendations) and five high-priority research recommendations. The key priorities for implementation were those recommendations thought likely to have the biggest impact on clinical care and outcomes in the NHS as a whole. The priority research recommendations were selected in a similar way. Only a single round of voting was needed to reach consensus on the key priorities for implementation and the priority research recommendations.
Stakeholder involvement
Registered stakeholder organisations were invited to comment on the draft scope and the draft guideline. Stakeholder organisations were also invited to undertake a pre-publication check of the final guideline to identify factual inaccuracies. The GDG carefully considered and responded to all comments received from stakeholder organisations. The comments and responses, which were reviewed independently by NICE, are published on the NICE website.
Specific considerations for this guideline
Formal consensus voting
A formal consensus approach was used where it was agreed that a recommendation was required, but where the GDG was unable to reach a conclusion using discussion alone.
Methods
The formal consensus approach involved a series of action statements relating to management or treatment under review being drafted by the NCC-WCH technical team. These were collated into a consensus questionnaire. The GDG members were asked to independently complete the questionnaire stating their level of agreement (“strongly agree” to “strongly disagree”) with each statement and provide comments on where statements should be amended. The results of the voting were collated by the technical team. If 70% or more of the GDG members agreed or disagreed with a statement then consensus was reached. If there was no consensus the statement could be adapted based on comments and presented for a second round of voting, applying the same majority threshold. This process would go on until consensus was reached, at which point the statements were then used to draft recommendations. These were discussed and ratified at a subsequent GDG meeting.
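The 70% rule can be expressed as a minimal sketch. The guideline does not list the intermediate response categories or say how neutral responses were counted, so a five-point scale with neutral votes kept in the denominator is assumed here; the function name and category labels are hypothetical.

```python
# Assumed five-point scale; only the two end labels appear in the guideline.
AGREE = {"strongly agree", "agree"}
DISAGREE = {"strongly disagree", "disagree"}


def consensus_outcome(votes, threshold_pct=70):
    """Classify one statement from a list of GDG members' responses.

    Consensus is reached when at least `threshold_pct` percent of all
    responses agree, or at least that share disagree.
    """
    total = len(votes)
    if total == 0:
        raise ValueError("at least one vote is required")
    agree = sum(1 for v in votes if v in AGREE)
    disagree = sum(1 for v in votes if v in DISAGREE)
    # Integer comparison avoids rounding problems at exactly 70%.
    if agree * 100 >= threshold_pct * total:
        return "consensus: agree"
    if disagree * 100 >= threshold_pct * total:
        return "consensus: disagree"
    return "no consensus"
```

A statement returning "no consensus" would be revised using the members' comments and put to a further round of voting with the same threshold.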
The GDG made ‘a priori’ decisions regarding outcomes. For each outcome it defined thresholds for clinically important differences (also known as ‘minimal important difference’ [MID]) for all outcome measures which are summarised here:
For the outcome ‘Patient satisfaction with treatment’ the GDG agreed that, where possible, outcomes should be dichotomised into ‘improved’ and ‘not improved’ by combining categories, for example ‘very improved’ and ‘improved’. The outcome statistic (RR) default definitions of MID were 0.75 and 1.25.
For the outcome ‘Self reported rate of absolute symptom reduction’ the GDG agreed that a 50% reduction in symptoms constituted a clinically significant difference for both episodes of incontinence and episodes of urgency.
For the outcome ‘Continence status (zero episodes per day)’ the GDG accepted that this was a valid definition in itself. Again, we used the default definitions of MID for RR as above.
For the outcome ‘Incontinence-specific quality of life (QOL)’ the GDG agreed that only incontinence-specific quality of life should be used. The developers of these scales have published MIDs which can be used as the thresholds for clinically significant difference.
For the outcome ‘Adverse effects’ the GDG agreed that this should vary from question to question. For example, for BoNT-A, the need for self-catheterisation was specified as the single most important adverse effect. Default definitions of MID for relative risk were adopted as above.
For the outcome ‘Psychological outcomes’ the GDG agreed that depression and anxiety were important outcomes. As with the I-QOL, an MID from the published literature would be used.
For the outcome ‘Clinical measures’ the GDG agreed that post-void residual volume was the single most important of the different clinical measures used. In the absence of data, a default MID of 25% change in post-void residual volume was used. This meant that if the intervention or control led to an improvement or worsening of 25% of the baseline values then this was considered clinically meaningful for both patient and clinician.
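Two of the percentage-based thresholds above (a 50% reduction in symptom episodes and a 25% change in post-void residual volume) can be checked as in the sketch below. It assumes that a change of at least the stated percentage counts as meeting the threshold; the function names and the example values are hypothetical.

```python
def percent_change(baseline, follow_up):
    """Percentage change from baseline; negative values mean a reduction."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (follow_up - baseline) * 100 / baseline


def meets_symptom_mid(baseline_episodes, follow_up_episodes, mid_pct=50):
    """True if episodes fell by at least `mid_pct` percent of baseline."""
    return -percent_change(baseline_episodes, follow_up_episodes) >= mid_pct


def meets_pvr_mid(baseline_ml, follow_up_ml, mid_pct=25):
    """True if post-void residual volume changed by at least `mid_pct`
    percent of baseline in either direction (improvement or worsening)."""
    return abs(percent_change(baseline_ml, follow_up_ml)) >= mid_pct
```

For example, a fall from 4 to 2 episodes per day meets the 50% threshold, while a change in post-void residual volume from 100 ml to 90 ml does not meet the 25% threshold.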