This chapter sets out in detail the methods used to review the evidence and to develop the recommendations that are presented in subsequent chapters of this guideline. There were unusual methodological challenges for this guideline: i) most of the review questions focused on initial presentations that might raise suspicion of a number of possible neurological diagnoses, rather than on the traditional diagnostic questions about the signs and symptoms of a single, specific condition; ii) the large number of neurological conditions covered by the guideline; and iii) the sparsity or lack of published evidence. A mixture of alternative approaches, together with NICE standard methodology where possible, was adopted to address these challenges. Where systematic reviews of evidence were undertaken, NICE standard methodology was used in accordance with the NICE guidelines manual, 2014 version,11 as described in section 4.2.

The challenges that arose for each key area of the scope, and the methods chosen to address these, are summarised below:

  • For questions on symptoms, initial literature searches retrieved extremely large numbers of papers with very limited or no relevance for inclusion. There was a lack of high-quality evidence in the key areas the guideline covers, and the available evidence was likely to be limited because of the relative rarity of neurological conditions.
  • For questions on examinations and investigative tests, there was limited available evidence because general examination techniques are accepted practice and either free or low cost, so they are unlikely to be a research priority. In addition, clinicians will have already examined the patient to some extent before they suspect a neurological condition and will be familiar with standard practice examination techniques from their medical training.
  • For the question on information and support for patients, developing a focused protocol would be difficult because the population of this guideline is very broad, covering a large number of neurological conditions. It would therefore be difficult to identify, and give guidance on, the specific support needs of these patients.

Therefore, an alternative approach was undertaken and is summarised below:

  • Prior to the first guideline committee meeting, the Chair (adult neurologist) and a committee member (paediatric neurologist) drafted an initial list of the signs and symptoms for which patients are currently referred to them for neurological assessment, based on their knowledge and expertise.
  • During the first 2 committee meetings, committee members discussed the circumstances under which referrals should and should not be made (based on their knowledge and expertise), considering both red and green flags, and identifying areas of uncertainty, contention and disagreement.
  • The committee then categorised the initial list of signs and symptoms according to whether an evidence review would add value. Decisions took into account whether the sign or symptom is already recognised by non-specialists, whether there are issues with inappropriate referrals, whether there is disagreement about which patients should be referred, and whether, based on the committee's knowledge of the field, published evidence is lacking.
  • The list of signs and symptoms was then split into 4 categories based on the committee’s discussions, and a mix of committee consensus and evidence reviews was used to formulate draft recommendations. A summary of the committee’s decisions and rationales is provided in appendix O.

The 4 categories were as follows:

  1. Category 1: signs and symptoms for which there are current issues of over-referral or under-referral, where based on their knowledge and expertise the committee were in agreement that the optimal referral advice is clear and non-contentious and where an evidence review would not change this or add value. The committee drafted the recommendations for these signs and symptoms by consensus and no systematic reviews of evidence were undertaken.
  2. Category 2: signs and symptoms for which it is unclear in which circumstances patients should be referred. Systematic evidence reviews were carried out for these signs and symptoms.
  3. Category 3: signs and symptoms covered by recognition and referral recommendations in existing NICE guidance. The committee signposted to these recommendations where appropriate.
  4. Category 4: signs and symptoms that are not prioritised for inclusion within the guideline either because current referral patterns are already working well or the presentation is too infrequent to require national guidance.

As the draft recommendations for signs and symptoms in category 1 are based on the committee’s knowledge and expertise and had not been produced with the benefit of an evidence review, the committee agreed that further validation by other external experts was necessary. This validation was conducted through a targeted engagement exercise with a sample of expert advisors (outlined in section 4.1 below), which took place before the main stakeholder consultation.

Draft recommendations for signs and symptoms in categories 2 and 3 were not included in the targeted engagement exercise. Standard guideline methodology was followed for the signs and symptoms in category 2 (see section 4.2), and recommendations in category 3 consist of cross-references to published NICE recommendations. Neither category required feedback from expert advisors, as both were handled in line with standard methodology.

As indicated at the start of this section, the committee judged that there was limited value in undertaking evidence reviews for most of the questions on investigative tests and all the questions on information for patients. Therefore, NICE agreed that the committee could make recommendations for these questions by consensus.

4.1. Targeted engagement exercise with external experts

The targeted engagement exercise focussed on draft recommendations for referral (or non-referral) of signs and symptoms in category 1. Once the committee had formulated draft consensus recommendations for the identified questions, the targeted engagement exercise provided an opportunity for expert advisors to ratify and suggest changes to the recommendations for inclusion in the final draft of the guideline before the main consultation.

The targeted engagement exercise is summarised below:

  1. The targeted engagement exercise took place towards the end of the development phase. An online survey using modified RAND rating for agreement was run for 4 weeks, from 23 January to 17 February 2017.
  2. Recruitment of external experts: recruitment took place in late autumn 2016. A formal open recruitment process was followed. An advertisement was posted on the NICE website for 1 month, which explained the reasons for recruitment and highlighted the time commitment and input required. Key stakeholder organisations including the Association of British Neurologists, British Paediatric Neurology Association, Primary Care Neurology Society, Royal College of GPs and the Royal College of Emergency Medicine, and other specialist societies were notified of the advertisement. External experts were invited to apply with their CV and to declare any conflicting interests. The Chair and Guideline Lead vetted applicants based on their level of expertise and the relevance of their role. External experts were asked to sign consent and confidentiality forms. The expert advisors are listed in appendix P.
  3. Composition of the external expert group: the aim was to recruit approximately 30 external experts, to allow for withdrawals and non-responders, and to include both adult and paediatric neurology specialists and non-specialists, including primary care physicians.
  4. Online survey using modified RAND rating for agreement: An online survey was developed asking external experts whether they agreed with the draft recommendations. If they did not agree, external experts were asked to provide comments on their reasoning (and if possible, alternative suggestions or wording) for the committee’s consideration.
  5. Results and analysis:
    1. There were 43 external experts in total, with the following areas of expertise: 9 adult neurologists (20.9%), 7 paediatric neurologists (16.3%), 9 general practitioners (20.9%) and 18 other professionals, including paediatricians, psychiatrists and physiotherapists (41.9%).
    2. The external experts were given the option of commenting on either the adult or the children’s recommendations or both. Fifteen external experts (34.9%) provided feedback on adult recommendations, 21 external experts (48.8%) provided feedback on the children’s recommendations and 7 external experts (16.3%) provided feedback on recommendations for both age groups.
    3. The committee agreed a threshold for agreement of 75%, which is widely used in other consensus methods such as Delphi and RAND and was identified as a suitable agreement threshold in a recent systematic review by Diamond et al. (2014)4. Although most recommendations reached the 75% agreement threshold, many external experts provided very helpful comments that would improve the clarity and content of the recommendations. The committee therefore revisited the majority of the recommendations in light of the external experts’ feedback. A summary of the feedback from the targeted engagement exercise, and the action the committee took for each recommendation, is included in the ‘recommendations and link to evidence’ sections under ‘other considerations’.
  6. Following the targeted engagement exercise, the revised draft recommendations were submitted for the main consultation with all stakeholders along with the other recommendations in categories 2 and 3.
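The 75% agreement rule described above amounts to a simple proportion check. As a hedged illustration (the function name and structure are ours, not part of the guideline process):

```python
def meets_agreement_threshold(n_agree, n_responses, threshold=0.75):
    """Return True if the proportion of experts agreeing with a draft
    recommendation meets or exceeds the consensus threshold (default 75%)."""
    if n_responses == 0:
        raise ValueError("at least one response is required")
    return n_agree / n_responses >= threshold

# Hypothetical example: 33 of 43 experts agree (76.7%), meeting the threshold
result = meets_agreement_threshold(33, 43)
```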

All results from the targeted engagement exercise are available in appendix S.

4.2. Evidence reviews

Sections 4.2, 4.3, 4.4 and 4.5 describe the process used to identify and review clinical evidence (summarised in Figure 1), section 4.6 describes the process used to identify and review the health economic evidence, and section 4.7 describes the process used to develop recommendations.

Figure 1. Step-by-step process of review of evidence in the guideline.


4.3. Developing the review questions and outcomes

Review questions were developed using a framework of population, index tests, reference standard and target condition for reviews of diagnostic test accuracy and using population, presence or absence of factors under investigation and outcomes for clinical prediction reviews. This use of a framework guided the literature searching process, critical appraisal and synthesis of evidence, and facilitated the committee’s development of the recommendations. The NGC technical team drafted the review questions, and the committee refined and validated them. The questions were based on the key clinical areas identified in the scope (appendix A).

A total of 10 review questions were identified.

Full literature searches, critical appraisals and evidence reviews were completed for all the specified review questions.

For questions on category 2 symptoms, the committee wanted to consider any studies that determine whether a certain sign or symptom accompanying a main presenting symptom (for example, hearing loss in the presence of dizziness) is indicative of a neurological condition that requires onward referral for a specialist assessment. Therefore, measures of diagnostic accuracy, including sensitivity, specificity, positive predictive value, negative predictive value and the area under the receiver operating characteristic (ROC) curve (AUC), were considered the main outcomes of interest. However, the committee was aware that evidence in this area was limited; it therefore also considered clinical prediction studies in which a multivariate analysis was conducted and adjusted odds ratios for outcomes of interest were presented. Hence, some of the clinical questions were reviewed using a mix of diagnostic and clinical prediction strategies.

Table 1. Review questions on category 2 symptoms.


4.4. Searching for clinical evidence

4.4.1. Clinical literature search

Systematic literature searches were undertaken to identify all published clinical evidence relevant to the review questions. Searches were undertaken according to the parameters stipulated within the NICE guidelines manual 2014.11 Databases were searched using relevant medical subject headings, free-text terms and study-type filters where appropriate. Where possible, searches were restricted to papers published in English. Studies published in languages other than English were not reviewed. All searches were conducted in Medline and Embase, and additional searches made in The Cochrane Library for diagnostic test questions. Searches were run between 3 June 2016 and 19 October 2016. No papers published after 19 October 2016 were considered.

Search strategies were quality assured by cross-checking reference lists of highly relevant papers, analysing search strategies in other systematic reviews, and asking committee members to highlight any additional studies. Searches were quality assured by a second information specialist before being run. The questions, the study types applied, the databases searched, the years covered and the date each search was run can be found in appendix G.

The titles and abstracts of records retrieved by the searches were sifted for relevance, with potentially relevant publications obtained in full text. These were assessed against the inclusion criteria.

During the scoping stage, a search was conducted for guidelines and reports on the websites listed below from organisations relevant to the topic.

All references sent by stakeholders were considered. Searching for unpublished literature was not undertaken.

Due to the broad population of the guideline and the inclusion of non-randomised studies, initial evidence searches retrieved very large numbers of abstracts (tens of thousands for each question). Therefore, to make the most efficient use of time and resources, the searches had to be specific. The committee was aware of a recent, widely agreed classification published by the Neurological Intelligence Network (NIN)17, which used International Classification of Diseases codes (ICD-10, 2015) to define the conditions included in adult neurology in England. The committee used this classification to map each sign and symptom to the relevant conditions, which were then used to narrow the populations in the search strategy and hence obtain more precise and focused sets of abstracts. For childhood conditions, the paediatric specialists on the committee used the NIN list as a basis for producing a list of relevant conditions for children. The conditions and populations included in each search strategy are listed in the protocols in appendix C and in the search strategies in appendix G.

4.4.2. Health economic literature search

Systematic literature searches were also undertaken to identify health economic evidence within published literature relevant to the review questions. The evidence was identified by conducting a broad search relating to suspected neurological conditions in the NHS Economic Evaluation Database (NHS EED) and the Health Technology Assessment database (HTA) with no date restrictions (NHS EED ceased to be updated after March 2015). Additionally, the search was run in Medline and Embase using a health economic filter, from January 2015 onwards, to ensure that recent publications not yet indexed by the economic databases were identified. Where possible, searches were restricted to papers published in English; studies published in other languages were not reviewed.

The health economic search strategies are included in appendix G. All searches were updated on 9 March 2017. No papers published after this date were considered.

4.5. Identifying and analysing clinical evidence

Research fellows conducted the tasks listed below, which are described in further detail in the rest of this section:

  • Identified potentially relevant studies for each review question from the relevant search results by reviewing titles and abstracts. Full papers were then obtained.
  • Reviewed full papers against pre-specified inclusion and exclusion criteria to identify studies that addressed the review question in the appropriate population and reported on outcomes of interest (review protocols are included in appendix C).
  • Critically appraised relevant studies using the appropriate study design checklist as specified in the NICE guidelines manual.11 Clinical prediction studies were critically appraised using NGC checklists.
  • Generated summaries of the evidence by outcome. Outcome data were combined, analysed and reported according to study design:
    • Clinical prediction data were meta-analysed where appropriate and reported in GRADE-like profile tables.
    • Diagnostic accuracy data were meta-analysed where appropriate or presented as a range of values in adapted GRADE profile tables.
  • A minimum 10% sample of the abstract lists from the first 3 sifts carried out by new reviewers was double sifted by a senior research fellow, who also double sifted the abstract lists for complex review questions (for example, clinical prediction reviews). Any discrepancies were rectified. All of the evidence reviews were quality assured by a senior research fellow. This included checking:
    • papers were included or excluded appropriately
    • a sample of the data extractions
    • correct methods were used to synthesise data
    • a sample of the risk of bias assessments.

4.5.1. Inclusion and exclusion criteria

The inclusion and exclusion of studies were based on the criteria defined in the review protocols, which can be found in appendix C. Excluded studies by review question (with the reasons for their exclusion) are listed in appendix L. The committee was consulted about any uncertainty regarding inclusion or exclusion.

The key population inclusion criterion was:

  • children, young people and adults who present to non-specialist settings with symptoms suggestive of a neurological condition.

The key population exclusion criterion was:

  • neonates (infants aged 28 days and under).

Conference abstracts were not automatically excluded from any review. The abstracts were initially assessed against the inclusion criteria for the review question and further processed when a full publication was not available for that review question. If the abstracts were included, the authors were contacted for further information. No relevant conference abstracts were identified for this guideline. Literature reviews, posters, letters, editorials, comment articles, unpublished studies and studies not in English were excluded.

4.5.2. Type of studies

For diagnostic accuracy review questions, diagnostic RCTs, cross-sectional studies and retrospective cohort studies were considered for inclusion. For clinical prediction review questions, only prospective and retrospective cohort studies that conducted a multivariate analysis including at least some of the predictors and key confounders identified in the protocols were included. Case–control studies were not included.

4.5.3. Methods of combining clinical studies

4.5.3.1. Data synthesis for diagnostic test accuracy reviews

Diagnostic test accuracy measures used in the analysis were the area under the receiver operating characteristic (ROC) curve (AUC) and, for different thresholds (if appropriate), sensitivity and specificity. The threshold of a diagnostic test is defined as the value at which the test can best differentiate between those with and without the target condition; in practice, this varies amongst studies. If a test has a high sensitivity, then very few people with the condition will be missed (few false negatives). For example, a test with a sensitivity of 97% will miss only 3% of people with the condition. Conversely, if a test has a high specificity, then few people without the condition will be incorrectly diagnosed (few false positives). For example, a test with a specificity of 97% will incorrectly diagnose only 3% of people who do not have the condition as positive. For most of the reviews in this guideline, sensitivity and specificity were considered to be of equal importance. Sensitivity was important because missing a patient with a neurological condition could have serious consequences, including rapid deterioration of health or death. Specificity was important because incorrectly diagnosing an individual may result in inappropriate administration of medications or treatments. Coupled forest plots of sensitivity and specificity with their 95% CIs across studies were produced for each test using RevMan5.20 To do this, 2×2 tables (the numbers of true positives, false positives, true negatives and false negatives) were taken directly from the study where given, or else derived from raw data or calculated from the reported test accuracy statistics.
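The accuracy measures derived from a 2×2 table can be sketched as follows (an illustrative snippet, not the RevMan implementation; the variable names are ours):

```python
def diagnostic_accuracy(tp, fp, fn, tn):
    """Compute accuracy measures from a 2x2 diagnostic table
    (tp/fp = true/false positives, tn/fn = true/false negatives)."""
    sensitivity = tp / (tp + fn)  # proportion with the condition who test positive
    specificity = tn / (tn + fp)  # proportion without the condition who test negative
    ppv = tp / (tp + fp)          # positive predictive value
    npv = tn / (tn + fn)          # negative predictive value
    return sensitivity, specificity, ppv, npv

# Hypothetical counts: a test with sensitivity 0.97 misses only 3% of
# people who have the condition
sens, spec, ppv, npv = diagnostic_accuracy(tp=97, fp=3, fn=3, tn=97)
```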

Diagnostic meta-analysis was conducted where appropriate, that is, when 3 or more studies were available per threshold. Test accuracy for the studies was pooled using the bivariate method for the direct estimation of summary sensitivity and specificity, using a random-effects approach in WinBUGS software.25 The advantage of this approach is that it produces summary estimates of sensitivity and specificity that account for the correlation between the 2 statistics; other advantages of this method have been described elsewhere.18,23,24 The bivariate method uses logistic regression on the true positives, true negatives, false positives and false negatives reported in the studies. Overall sensitivity, specificity and confidence regions were plotted using methods outlined by Novielli (2010).16 Pooled sensitivity and specificity and their 95% CIs were reported in the clinical evidence summary tables.

Heterogeneity or inconsistency amongst studies was assessed by visual inspection of the forest plots and pooled diagnostic meta-analysis plots.

4.5.3.2. Data synthesis for clinical prediction reviews

Odds ratios (ORs), risk ratios (RRs), or hazard ratios (HRs), with their 95% CIs, for the effect of the pre-specified clinical prediction factors were extracted from the studies. Studies were only included if the confounders the committee pre-specified were either matched at baseline or were adjusted in multivariate analysis.

Studies of lower risk of bias were preferred, taking into account the analysis and the study design. In particular, prospective cohort studies were preferred if they reported multivariate analyses that adjusted for the key confounders that the committee identified at the protocol stage for that outcome.

Data were not combined in meta-analyses for clinical prediction studies.

4.5.4. Appraising the quality of evidence by outcomes

4.5.4.1. Diagnostic studies

Risk of bias and indirectness of evidence for diagnostic data were evaluated by study using the Quality Assessment of Diagnostic Accuracy Studies version 2 (QUADAS-2) checklists (see appendix H in the NICE guidelines manual 201411). In QUADAS-2, risk of bias and applicability in primary diagnostic accuracy studies are assessed across 4 domains (see Table 2):

  • patient selection
  • index test
  • reference standard
  • flow and timing.

Table 2. Summary of QUADAS-2 with list of signalling, risk of bias and applicability questions.


4.5.4.1.1. Inconsistency

Inconsistency refers to an unexplained heterogeneity of results for an outcome across different studies. Inconsistency was assessed by inspection of the sensitivity or specificity (based on the most important outcome for each particular review question) using the point estimates and 95% CIs of the individual studies on the forest plots. Particular attention was placed on values above or below 50% (diagnosis based on chance alone) and the threshold the committee set (the threshold above which it would be acceptable to recommend a test). For example, the committee might have set a threshold of 90% sensitivity as an acceptable level to recommend a test. The evidence was downgraded by 1 increment if the individual studies varied across 2 areas (for example, 50–90% and 90–100%) and by 2 increments if the individual studies varied across 3 areas (for example, 0–50%, 50–90% and 90–100%).
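The area-based downgrading rule above can be expressed as a small sketch (illustrative only; the 50% and 90% boundaries follow the worked example in the text, and in practice the acceptability threshold was set per review question):

```python
def inconsistency_downgrade(point_estimates, chance=0.5, acceptable=0.9):
    """Downgrade for inconsistency: 0 if all study point estimates fall in
    one area, -1 if they span 2 areas, -2 if they span all 3 areas.
    Areas are bounded by chance (here 50%) and the committee's
    acceptability threshold (here 90%)."""
    def area(x):
        if x < chance:
            return 0
        if x < acceptable:
            return 1
        return 2
    return -(len({area(x) for x in point_estimates}) - 1)
```

For example, studies reporting sensitivities of 85% and 95% span 2 areas and would be downgraded by 1 increment.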

4.5.4.1.2. Imprecision

The judgement of precision was based on visual inspection of the confidence region around the summary sensitivity and specificity point from the diagnostic meta-analysis, if a diagnostic meta-analysis was conducted. Where a diagnostic meta-analysis was not conducted, imprecision was assessed according to the range of point estimates or, if only 1 study contributed to the evidence, the 95% CI around the single study. As a general rule (after discussion with the committee), a variation of 0–20% was considered precise, 20–40% serious imprecision, and >40% very serious imprecision. Imprecision was assessed on the primary outcome measure for decision-making.
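As a rough sketch of the rule agreed with the committee (illustrative only; the actual judgements were made by visual inspection rather than by formula):

```python
def imprecision_downgrade(estimates):
    """Downgrade for imprecision from the spread of the point estimates
    (or, for a single study, the bounds of its 95% CI): 0 for a spread of
    up to 20% (precise), -1 for 20-40% (serious), -2 for over 40%
    (very serious)."""
    spread = max(estimates) - min(estimates)
    if spread <= 0.20:
        return 0
    if spread <= 0.40:
        return -1
    return -2
```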

4.5.4.1.3. Overall grading

Once an outcome had been appraised for the main quality elements, as above, an overall quality grade was calculated for that outcome. The scores (0, −1 or −2) from each of the main quality elements were summed to give a score that could be anything from 0 (the best possible) to −8 (the worst possible). However, scores were capped at −3. This final score was then applied to the starting grade that had originally been applied to the outcome by default, based on study design. Quality rating started at High for prospective and retrospective cross-sectional studies, and each major limitation (risk of bias, indirectness, inconsistency and imprecision) brought the rating down by 1 increment. Each of these studies started at High and the overall quality became Moderate, Low or Very Low if the overall score was −1, −2 or −3 points respectively. The significance of these overall ratings is explained in Table 3. The reasons for downgrading in each case were specified in the footnotes of the GRADE tables.
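The summing-and-capping procedure can be sketched as follows (an illustration of the scoring rule described above, not NICE tooling):

```python
def overall_grade(risk_of_bias, indirectness, inconsistency, imprecision):
    """Sum the scores for the 4 quality elements (each 0, -1 or -2),
    cap the total at -3, and map it onto the default starting grade
    of High for cross-sectional studies."""
    total = max(risk_of_bias + indirectness + inconsistency + imprecision, -3)
    return {0: "High", -1: "Moderate", -2: "Low", -3: "Very Low"}[total]
```

For example, serious risk of bias and serious imprecision (−1 each) give a total of −2 and an overall rating of Low.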

Table 3. Overall quality of outcome evidence in modified GRADE.


4.5.4.2. Clinical prediction reviews

A modified GRADE methodology was used for clinical prediction studies, considering risk of bias, indirectness, inconsistency and imprecision.

The quality of evidence for risk prediction studies was evaluated according to the criteria given in Table 4. This table was adapted from the Quality In Prognosis Studies tool (QUIPS)5. If data were meta-analysed, the quality for pooled studies was presented. If the data were not pooled, then a quality rating was presented for each study.

Table 4. Description of quality elements for prospective studies (adapted from the QUIPS tool).


The risk of bias rating was assigned per study for each combination of risk factor and outcome. When studies were pooled, the overall risk of bias for all studies covering a specific risk factor or outcome was determined by a weighted mean of the ratings across the studies (no risk = 0; serious risk = −1 and very serious risk = −2). The weighting depended on the weighting used in the meta-analysis, as in intervention reviews. Where a meta-analysis had not been conducted, a simple average was used.
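The weighted-mean rule can be sketched as follows (illustrative; in practice the weights came from the meta-analysis):

```python
def pooled_risk_of_bias(ratings, weights=None):
    """Weighted mean of per-study risk-of-bias scores
    (no risk = 0, serious = -1, very serious = -2). Weights are the
    meta-analysis weights; when data were not pooled, a simple
    (unweighted) average is used."""
    if weights is None:
        weights = [1.0] * len(ratings)
    return sum(r * w for r, w in zip(ratings, weights)) / sum(weights)
```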

4.5.4.2.1. Indirectness

Indirectness refers to the extent to which the populations, risk factors and outcome measures are dissimilar to those defined in the inclusion criteria for the reviews, as explained for intervention reviews. As for risk of bias, each outcome had its indirectness assessed within each study first. For each paper, if there were no sources of indirectness, indirectness was given a rating of 0. If there was indirectness in just 1 source (for example, in terms of population), indirectness was given a ‘serious’ rating of −1, but if there was indirectness in 2 or more sources (for example, in terms of population and risk factor), the indirectness was given a ‘very serious’ rating of −2. A weighted average score was then calculated across all studies contributing to the outcome, by taking into account the weights in the meta-analysis.

4.5.4.2.2. Inconsistency

Inconsistency refers to an unexplained heterogeneity of results for an outcome across different studies. When heterogeneity existed within an outcome (chi-squared p<0.1, or I2>50%), but no plausible explanation could be found, the quality of evidence for that outcome was downgraded. Inconsistency for that outcome was given a ‘serious’ score of −1 if the I2 was 50–74% and a ‘very serious’ score of −2 if the I2 was 75% or more.
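For illustration, I² can be derived from Cochran's Q, and the downgrading rule applied to it, as follows (a sketch; the function names are ours):

```python
def i_squared(q_statistic, n_studies):
    """Higgins' I^2 (%): the proportion of variability across studies
    attributable to heterogeneity rather than chance, computed from
    Cochran's Q. Floored at 0 when Q is below its degrees of freedom."""
    if q_statistic <= 0:
        return 0.0
    df = n_studies - 1
    return max(0.0, (q_statistic - df) / q_statistic) * 100

def inconsistency_score(i2):
    """'Serious' (-1) if I^2 is 50-74%, 'very serious' (-2) if 75% or more."""
    if i2 >= 75:
        return -2
    if i2 >= 50:
        return -1
    return 0
```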

If inconsistency could be explained based on pre-specified subgroup analysis (that is, each subgroup had an I2<50%), the committee considered this and considered whether to make separate recommendations on new outcomes based on the subgroups defined by the assumed explanatory factors. In such a situation, the quality of evidence was not downgraded for those emergent outcomes.

4.5.4.2.3. Imprecision

The criteria applied for imprecision were based on the confidence intervals around the estimate of association between the risk factor/predictor and the outcome (condition of interest). The decision to downgrade was discussed with the committee and was based on the width of the confidence intervals and how certain the committee was in drawing a conclusion (that is, how certain it was that there is no association, a positive association, or a negative (protective) association between the risk factor/predictor and the outcome).

4.5.4.2.4. Overall grading

Quality rating started at High for prospective studies, and each major limitation brought the rating down by 1 increment to a minimum grade of Very Low, as explained for diagnostic reviews above. For clinical prediction reviews, prospective cohort studies with a multivariate analysis are regarded as the gold standard because RCTs are usually inappropriate for these types of review for ethical or pragmatic reasons. Furthermore, if the study looked at more than 1 risk factor of interest then randomisation would be inappropriate, as it can only be applied to 1 of the risk factors.

4.5.5. Clinical evidence statements

Clinical evidence statements are summary statements that are included in each review chapter, and which summarise the key features of the clinical effectiveness evidence presented. The wording of the evidence statements reflects the certainty or uncertainty in the estimate of effect. The evidence statements are presented by outcome and encompass the following key features of the evidence:

  • the number of studies and the number of participants for a particular outcome
  • an indication of the direction of clinical importance (if 1 treatment is beneficial or harmful compared to the other, or whether there is no difference between the 2 tested treatments)
  • a description of the overall quality of the evidence (GRADE overall quality).

4.6. Identifying and analysing evidence of cost effectiveness

The committee is required to make decisions based on the best available evidence of both clinical effectiveness and cost effectiveness. Guideline recommendations should be based on the expected costs of the different options in relation to their expected health benefits (that is, their ‘cost effectiveness’) rather than the total implementation cost. However, the committee will also need to be increasingly confident in the cost effectiveness of a recommendation as the cost of implementation increases. Therefore, the committee may require more robust evidence on the effectiveness and cost effectiveness of any recommendations that are expected to have a substantial impact on resources; any uncertainties must be offset by a compelling argument in favour of the recommendation. The cost impact or savings potential of a recommendation should not be the sole reason for the committee’s decision.11

Health economic evidence was sought relating to the key clinical issues being addressed in the guideline. Health economists:

  • Undertook a systematic review of the published economic literature.
  • Considered options for new cost-effectiveness analysis in priority areas.

4.6.1. Literature review

The health economists identified potentially relevant studies for each review question from the health economic search results by reviewing titles and abstracts. Full papers would then have been obtained, reviewed against inclusion and exclusion criteria and critically appraised where relevant. However, for this guideline, the review of the health economic search results did not identify any potentially relevant studies; therefore, no full papers were ordered or critically appraised. A second health economist reviewed and confirmed this decision.

4.6.1.1. Inclusion and exclusion criteria

Full economic evaluations (studies comparing costs and health consequences of alternative courses of action: cost–utility, cost-effectiveness, cost–benefit and cost–consequences analyses) and comparative costing studies that addressed the review question in the relevant population were considered potentially includable as health economic evidence.

Studies that only reported cost per hospital (not per patient), or only reported average cost effectiveness without disaggregated costs and effects, were excluded. Literature reviews, abstracts, posters, letters, editorials, comment articles, unpublished studies and studies not in English were excluded. Studies published before 2000 and studies from non-OECD countries or the USA were also excluded, on the basis that the applicability of such studies to the present UK NHS context is likely to be too low for them to be helpful for decision-making.

When no relevant health economic studies were found from the economic literature review, relevant UK NHS unit costs related to the compared interventions were presented to the committee to inform the possible economic implications of the recommendations.

4.6.2. Undertaking new health economic analysis

As well as reviewing the published health economic literature for each review question, as described above, the health economists considered undertaking new health economic analysis in selected areas. The committee considered the priority areas for new analysis after the review questions had been formed and the existing health economic evidence had been considered.

Due to the absence of economic and clinical evidence regarding individual review questions, no review questions were selected for original economic analysis.

Instead, a brief cost-impact analysis was conducted of the cost of neurological outpatient appointments (appendix N). This looked at the costs of adult and child neurology appointments and the current total annual cost of such appointments to the NHS.

This information was then used in each evidence review as a comparison point against which any changes to current practice arising from the recommendations could be judged in terms of cost impact. This was particularly relevant for recommendations that might be expected to increase the number of GP referrals to neurology services for assessment, and hence to increase the total cost of neurology services compared with current practice. This is discussed in the ‘Recommendations and link to evidence’ tables in each section of the guideline.

4.6.3. Cost-effectiveness criteria

NICE’s report ‘Social value judgements: principles for the development of NICE guidance’ sets out the principles that committees should consider when judging whether an intervention offers good value for money.12 In general, an intervention was considered to be cost effective (given that the estimate was considered plausible) if either of the following criteria applied:

  • the intervention dominated other relevant strategies (that is, it was both less costly in terms of resource use and more clinically effective compared with all the other relevant alternative strategies), or
  • the intervention cost less than £20,000 per QALY gained compared with the next best strategy.

If the committee recommended an intervention that was estimated to cost more than £20,000 per QALY gained, or did not recommend one that was estimated to cost less than £20,000 per QALY gained, the reasons for this decision are discussed explicitly in the ‘Recommendations and link to evidence’ section of the relevant chapter, with reference to issues regarding the plausibility of the estimate or to the factors set out in ‘Social value judgements: principles for the development of NICE guidance’.12

When QALYs, or life years gained, are not used in the analysis, results are difficult to interpret unless one strategy dominates the others with respect to every relevant health outcome and cost.
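The decision criteria above (an intervention is considered cost effective if it dominates the alternatives, or if it costs less than £20,000 per QALY gained compared with the next best strategy) can be sketched as follows. This is purely illustrative and not a formal NICE tool; the function name is hypothetical, and the ambiguous ‘cheaper but less effective’ quadrant is treated conservatively here.

```python
# Illustrative sketch of the cost-effectiveness criteria described above.
# Inputs are incremental cost (GBP) and incremental QALYs versus the next
# best strategy. A strategy is judged cost effective if it dominates
# (no more costly AND more effective) or if its incremental cost per QALY
# gained (ICER) is below the £20,000 threshold.
THRESHOLD_GBP_PER_QALY = 20_000

def is_cost_effective(delta_cost: float, delta_qalys: float) -> bool:
    if delta_cost <= 0 and delta_qalys > 0:
        return True   # dominates: no more costly and more effective
    if delta_qalys <= 0:
        return False  # no health gain: treated conservatively in this sketch
    icer = delta_cost / delta_qalys  # incremental cost-effectiveness ratio
    return icer < THRESHOLD_GBP_PER_QALY

print(is_cost_effective(-500.0, 0.2))   # True: dominates
print(is_cost_effective(5_000.0, 0.5))  # True: ICER = £10,000/QALY
print(is_cost_effective(15_000.0, 0.5)) # False: ICER = £30,000/QALY
```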

4.6.4. In the absence of health economic evidence

When no relevant published health economic studies were found and a new analysis was not prioritised, the committee made a qualitative judgement about cost effectiveness by considering expected differences in resource use between options and relevant UK NHS unit costs, alongside the results of the review of clinical effectiveness evidence.

The UK NHS costs reported in the guideline are those that were presented to the committee and were correct at the time the recommendations were drafted. They may have changed between drafting and publication; however, we have no reason to believe they have changed substantially.

4.7. Developing recommendations

Over the course of the guideline development process, the committee was presented with:

  • Evidence tables of the clinical evidence reviewed from the literature. All evidence tables are in appendix H.
  • Summaries of clinical and health economic evidence and quality (as presented in chapters 5-7).
  • Forest plots and summary ROC curves (appendix K).
  • A description of the methods and results of the cost-impact analysis undertaken for the guideline (appendix N).

Recommendations were drafted based on the committee’s interpretation of the available evidence, taking into account the balance of benefits, harms and costs between different courses of action. This was completed either formally in an economic model or informally. Firstly, the net clinical benefit over harm (clinical effectiveness) was considered, focusing on the critical outcomes. When this was done informally, the committee took into account the clinical benefits and harms when one intervention was compared with another. The assessment of net clinical benefit was moderated by the importance placed on the outcomes (the committee’s values and preferences) and the confidence the committee had in the evidence (evidence quality). Secondly, the committee assessed whether the net clinical benefit justified any differences in costs between the alternative interventions.

When clinical and health economic evidence was of poor quality, conflicting or absent, the committee drafted recommendations based on its expert opinion. The considerations for making consensus-based recommendations include the balance between potential harms and benefits, the economic costs compared to the economic benefits, current practices, recommendations made in other relevant guidelines, patient preferences and equality issues. The consensus recommendations were agreed through discussions in the committee. The committee also considered whether the uncertainty was sufficient to justify a delay in making a recommendation to await further research, taking into account the potential harm of failing to make a clear recommendation (see section 4.7.1 below).

The committee considered the appropriate ‘strength’ of each recommendation. This takes into account the quality of the evidence but is conceptually different. Some recommendations are ‘strong’ in that the committee believes that the vast majority of healthcare and other professionals and patients would choose a particular intervention (referral in this guideline) if they considered the evidence in the same way that the committee has. This is generally the case if the benefits clearly outweigh the harms for most people and the intervention is likely to be cost effective. However, there is often a closer balance between benefits and harms, and some patients would not choose an intervention whereas others would. This may happen, for example, if some patients are particularly averse to some side effects and others are not. In these circumstances, the recommendation is generally weaker, although it may be possible to make stronger recommendations about specific groups of patients.

The committee focused on the following factors in agreeing the wording of the recommendations:

  • The actions health professionals need to take.
  • The information readers need to know.
  • The strength of the recommendation (for example, the word ‘offer’ was used for strong recommendations and ‘consider’ for weaker recommendations).
  • The involvement of patients (and their carers if needed) in decisions on treatment and care.
  • Consistency with NICE’s standard advice on recommendations about drugs, waiting times and ineffective interventions (see section 9.2 in the 2014 NICE guidelines manual11).

The main considerations specific to each recommendation are outlined in the ‘Recommendations and link to evidence’ sections within each chapter.

4.7.1. Research recommendations

When areas were identified for which good evidence was lacking, the committee considered making recommendations for future research. Decisions about the inclusion of a research recommendation were based on factors such as:

  • the importance to patients or the population
  • national priorities
  • potential impact on the NHS and future NICE guidance
  • ethical and technical feasibility.

Although research questions were debated, in the end, the committee did not make any future research recommendations. This was partly because many of the questions had not been the subject of a literature search (see reasons in section 4), and therefore the committee could not be absolutely certain that questions had not been addressed already. In addition, the practical difficulties of research into presenting symptoms made it, in the committee’s judgement, unlikely that the projects would be commissioned.

4.7.2. Validation process

This guidance is subject to a 6-week public consultation and feedback as part of the quality assurance and peer review of the document. All comments received from registered stakeholders are responded to in turn and posted on the NICE website.

4.7.3. Updating the guideline

Following publication, and in accordance with the NICE guidelines manual, NICE will undertake a review of whether the evidence base has progressed significantly to alter the guideline recommendations and warrant an update.

4.7.4. Disclaimer

Healthcare providers need to use clinical judgement, knowledge and expertise when deciding whether it is appropriate to apply guidelines. The recommendations cited here are a guide and may not be appropriate for use in all situations. The decision to adopt any of the recommendations cited here must be made by practitioners in light of individual patient circumstances, the wishes of the patient, clinical expertise and resources.

The National Guideline Centre disclaims any responsibility for damages arising out of the use or non-use of this guideline and the literature used in support of this guideline.

4.7.5. Funding

The National Guideline Centre was commissioned by the National Institute for Health and Care Excellence to undertake the work on this guideline.