The recommendations are summarised in diagrammatic form in management algorithms 1 and 2. The algorithms are flexible to allow for variation in local test accuracy, fitness and patient preference.
Qualifying Statement (Sequence of Investigations)
The evidence reviewed for the accuracy of diagnosis and staging investigations also served to inform about the best sequence of tests, but there was no evidence of sufficient quality that specifically compared different sequences. Expert opinion was used to make recommendations based on consideration of the clinical scenario, accuracy of the test and safety. The results of the health economic model were also used to make recommendations for mediastinal sampling.
Peripheral lesions
The recommendation to use transthoracic needle aspiration or biopsy was made on the basis of the accuracy of the test but as this comes at a cost of more frequent complications (pneumothorax requiring intervention 3-5%, death 0.1%), alternative approaches were considered. The recommendation to only use this test when treatment planning depended on the result was made partly from expert opinion but also from the evidence for the accuracy of other tests. There was insufficient evidence to make recommendations about newer techniques such as radial endobronchial ultrasound, electromagnetic navigation, fluoroscopy and ultra-thin bronchoscopy. These are time-consuming, have lower sensitivities than transthoracic needle aspiration and are not widely used in the UK.
Central lesions
Recommendations were based on the evidence for accuracy of bronchoscopy and expert opinion.
Mediastinal sampling
Although evidence was available about the accuracy of non-US guided TBNA, EBUS-TBNA, EUS-FNA and surgical sampling techniques, there were no studies that provided adequate evidence to suggest the most effective sequence of tests according to the pre-test probability of mediastinal malignancy. Specifically, studies did not analyse test accuracy by lymph node size. The GDG defined three categories according to appearance on CT scan: i) no enlarged lymph nodes (all <10 mm short axis and hence a peripheral lesion only on CT); ii) one or two discrete lymph nodes 10 to 20 mm short axis; and iii) lymph nodes >20 mm. These categories correspond to low (15%), intermediate (50%) and high (>85%) probabilities of mediastinal malignancy respectively. A fourth category, where the CT scan shows multiple and bulky lymph nodes that display obvious malignant features such as invasion of mediastinal structures, may not require specific further investigation unless the nodes are the only source of diagnostic material. It is recognised that these categories are simplified; for example, it is known that a higher tumour stage in the TNM classification is associated with a greater prevalence of nodal malignancy.
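The three size-based categories can be sketched as a simple lookup. This is an illustrative sketch only, not part of the guideline: the function name `nodal_category` and the dictionary `PRE_TEST_PROBABILITY` are hypothetical, the sketch uses only the largest short-axis diameter, and it ignores the number of nodes and the fourth category (bulky nodes with obvious malignant features).

```python
def nodal_category(max_short_axis_mm: float) -> str:
    """Map the largest nodal short-axis diameter on CT (in mm) to a category.

    Assumption: boundaries follow the text (<10 mm, 10 to 20 mm, >20 mm).
    Node count and obvious malignant features are not considered here.
    """
    if max_short_axis_mm < 10:
        return "low"
    if max_short_axis_mm <= 20:
        return "intermediate"
    return "high"


# Approximate probabilities of mediastinal malignancy quoted in the text.
# The high category is stated as >85%, so 0.85 is a lower bound.
PRE_TEST_PROBABILITY = {"low": 0.15, "intermediate": 0.50, "high": 0.85}

category = nodal_category(14)
print(category, PRE_TEST_PROBABILITY[category])
```

As the text notes, the categories are simplified, so a lookup of this kind would not capture factors such as tumour stage.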
The health economic model showed that for low probability (nodal short axis diameter <10mm) the test that dominated was PET-CT alone; for intermediate probability (1 or more nodes 10-20mm), PET-CT followed by non-US guided TBNA; and for high probability (nodes >20mm), neck US then non-US guided TBNA then PET-CT. For all probabilities of malignancy, but for intermediate probability in particular, there were other sequences that were very close to the most cost-effective. Given the assumptions made in the model, expert opinion was employed to reflect this in the recommendations about the sequence of tests. It should also be noted that the test accuracy measured in studies may not reflect that in general use, and so a degree of flexibility has been incorporated into the management algorithms. Thus, for intermediate probability nodes, US guided or non-US guided needle sampling is recommended to reflect the fact that US guided tests have greater accuracy yet are still well below the cost per QALY threshold of £20,000 compared with the next best strategy. The GDG recognised that these categories were simplified and that they would not represent all patients. For example, it is known that a higher tumour stage in the TNM classification is associated with a greater prevalence of nodal malignancy.
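The comparison against the £20,000 per QALY threshold can be illustrated with a generic incremental cost-effectiveness calculation. This is a minimal sketch using standard health economic definitions, not the GDG's model: the function names are hypothetical and the costs and QALYs in the usage lines are invented for illustration, not taken from the analysis.

```python
from typing import Optional

THRESHOLD_GBP_PER_QALY = 20_000  # threshold quoted in the text


def icer(cost_a: float, qaly_a: float,
         cost_b: float, qaly_b: float) -> Optional[float]:
    """Incremental cost per QALY of strategy A versus strategy B.

    Returns None when the two strategies yield the same QALYs.
    """
    delta_qaly = qaly_a - qaly_b
    if delta_qaly == 0:
        return None
    return (cost_a - cost_b) / delta_qaly


def dominates(cost_a: float, qaly_a: float,
              cost_b: float, qaly_b: float) -> bool:
    """A dominates B if it is no more costly and no less effective,
    and strictly better on at least one of the two."""
    return (cost_a <= cost_b and qaly_a >= qaly_b
            and (cost_a < cost_b or qaly_a > qaly_b))


# Hypothetical figures, for illustration only.
ratio = icer(cost_a=12_000, qaly_a=1.5, cost_b=10_000, qaly_b=1.25)
print(ratio, ratio < THRESHOLD_GBP_PER_QALY)
```

A strategy with a ratio below the threshold would usually be regarded as cost-effective relative to its comparator, which is the sense in which US guided sampling is described above.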
Where mediastinal adenopathy has obvious malignant features such as invasion of mediastinal structures, the probability of malignancy is very high. Expert opinion was used to categorise these within the management pathway for central lesions.
A single randomised trial of endosonographic mediastinal staging (combined EBUS and EUS) versus surgical staging showed that when endosonography is combined with surgical staging, sensitivity for detection of mediastinal malignancy was significantly increased over that of surgical staging alone. In addition, it was found that the sensitivity of endosonographic staging alone was equivalent to that of surgical staging. These findings were used to make the recommendation that combined EBUS and EUS can be used for the initial sampling of mediastinal lymph nodes as an alternative to surgical staging. However, the GDG felt that surgical staging is still indicated where endosonographic assessment is negative if clinical suspicion of mediastinal nodal disease remains high and hence made a recommendation to that effect.
Ultrasound of the neck ± biopsy
No studies that met inclusion criteria were found on US of the neck ± biopsy. Recommendations were therefore based on knowledge of limited case series and expert opinion.
Distant metastases
Recommendations about the place of CT and MRI in symptomatic and asymptomatic individuals with cerebral metastases were made from smaller comparative studies that showed that MRI is superior to CT but that CT will identify cerebral metastases in the majority of affected patients.
Recommendations about the use of PET-CT prior to treatment with curative intent were in part based on evidence that showed that adrenal lesions are readily detected by PET-CT (sensitivity of 94 to 100% and specificity of 80 to 100%). Recommendations concerning detection of bone metastases were based on the evidence review for bone scintigraphy including SPECT and PET-CT. Compared with PET-CT, the sensitivity of scintigraphy is less, though specificity may be better. The evidence for the use of MRI and plain x-ray was not reviewed as part of the 2011 update.
The evidence for the effectiveness of different diagnostic and staging tests for patients with suspected or confirmed NSCLC consisted of ninety-seven studies that ranged in quality from low to high and examined the following diagnostic and staging tests: bronchoscopy (including endobronchial and endoscopic ultrasound and transbronchial biopsy), needle biopsy of the lung (including percutaneous biopsy), radionuclide imaging (PET-CT, NeoSpect, PET), ultrasound-guided biopsy of cervical lymph nodes, other biopsies of metastatic sites (other than lung), pleural biopsy, thoracoscopy (including medical and pleuroscopy), surgical techniques (including VATS, mediastinoscopy/mediastinotomy, frozen section), observation, and MRI/CT of the brain. The ranges of sensitivities and specificities reported by the studies of moderate to high quality for the different diagnostic and staging tests are summarised in .
Sensitivities and specificities of various diagnostic and staging tests for suspected/confirmed lung cancer reported by the moderate-high quality studies.
Health economic evidence
In the 2005 NICE Lung cancer guideline (NICE 2005), the staging of non-small cell lung cancer was prioritised for independent economic modelling. Accurate diagnostic and staging information, particularly of mediastinal disease, helps the clinician decide which patients are suitable for treatment with curative intent; mediastinal lymph-node involvement reduces the chance of surgery being curative. Since 2005 a number of minimally invasive techniques have started to be used in some centres, and whilst PET-CT scanners are now routinely available, a question remains over where best to use them in the diagnostic and staging pathway.
An economic model was developed to assess the cost-effectiveness of PET-CT, TBNA, EBUS, mediastinoscopy and neck ultrasound in 26 clinically relevant sequences, from a UK NHS perspective see . A detailed description of methods and results can be found in appendix 4. Separate analyses were run in three subgroups of patients with non-small cell lung cancer in which the prevalence of nodal and distant metastatic disease was low, intermediate or high. Not all staging strategies were considered by the GDG to be clinically relevant alternatives in each population subgroup; therefore the strategies considered in each analysis differ.
Test sequences considered in each subgroup analysis.
A decision tree approach was taken to model the staging alternatives with an embedded Markov process to model the longer term consequences resulting from treatment. For the purposes of the model, PET-CT only provides information on the presence of metastatic disease. If PET-CT is positive the patient is treated for distant metastasis. If PET-CT is negative the next test in the sequence is performed.
All other tests provide the clinician with information on the presence of nodal disease (defined as N2 or N3). If a test is positive the patient is treated for N2/3 M0 disease. Again, if a test is negative the next test in the sequence is performed.
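The sequential logic described above can be sketched as a short loop. This is an illustrative sketch under stated assumptions, not the model's implementation: the names `Step` and `stage_patient` are hypothetical, test results are supplied by placeholder functions, and the label returned when every test is negative is a neutral placeholder because the text here does not specify that branch.

```python
from typing import Callable, Sequence, Tuple

# Each step is (test name, function returning True if the test is positive).
Step = Tuple[str, Callable[[], bool]]


def stage_patient(sequence: Sequence[Step]) -> str:
    """Walk a test sequence, stopping at the first positive result."""
    for name, is_positive in sequence:
        if is_positive():
            # In the model, PET-CT informs only on metastatic disease.
            if name == "PET-CT":
                return "treat for distant metastasis"
            # All other tests inform on nodal (N2 or N3) disease.
            return "treat for N2/3 M0 disease"
        # Negative result: continue to the next test in the sequence.
    return "no disease detected by sequence"  # placeholder label (assumption)


# Hypothetical sequence: PET-CT negative, then TBNA positive.
example = [("PET-CT", lambda: False), ("TBNA", lambda: True)]
print(stage_patient(example))
```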
The Markov model at the end of the decision tree branch is a simplified version of the natural progression of disease, accounting only for the possibility of death. Different stages of disease progression are not captured. Death can occur in the model as a result of a mediastinoscopy (in 0.5% of cases) or any other cause.
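A two-state (alive or dead) cohort model of this kind can be sketched as follows. This is a minimal illustration, not the guideline's model: the function name, the constant per-cycle death probability, the one-year cycle length, and the absence of discounting and half-cycle correction are all assumptions. Only the 0.5% mediastinoscopy death rate comes from the text.

```python
def expected_life_years(p_death_per_cycle: float, cycles: int,
                        p_procedure_death: float = 0.0) -> float:
    """Expected life years per patient in a simple alive/dead cohort model.

    Assumptions: one-year cycles, a constant probability of death per
    cycle, no discounting and no half-cycle correction.
    """
    # Proportion of the cohort surviving the staging procedure.
    alive = 1.0 - p_procedure_death
    total = 0.0
    for _ in range(cycles):
        total += alive                     # credit one cycle to those alive at its start
        alive *= 1.0 - p_death_per_cycle   # apply death from any other cause
    return total


# Hypothetical per-cycle death probability; 0.5% procedural death from the text.
print(expected_life_years(p_death_per_cycle=0.2, cycles=10,
                          p_procedure_death=0.005))
```

Because disease progression is not represented, survival in such a sketch depends only on the death probabilities assigned to each treatment branch.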
The decision about which treatment to offer patients on the basis of the staging test results was not evaluated in terms of cost-effectiveness (there are no embedded decision nodes in the decision tree). Instead the downstream consequences of the staging tests have been captured, as typified in current clinical practice or best practice as defined by relevant NICE guidance including recommendations within this guideline.
The model was populated with data from different sources considered to provide the best available evidence, as shown in :
Data sources used to populate the model.
Data from the National Lung Cancer Audit were chosen over randomised controlled trial data since they capture the real treatment options offered to patients, given the stage of their disease, thus increasing the external validity of the model results.
Data on test accuracy were not reported for our three subgroups, so the values used were determined by expert opinion from the GDG. Gaps in data on test accuracy (in the three patient subgroups), quality of life and the cost of EBUS were acknowledged and assumptions were made by the GDG. Despite the rich source of data for survival estimates from the National Lung Cancer Audit, we had no information about patients' survival from treatment given as the result of misleading test results (i.e. false positives or false negatives), so assumptions were made about the resulting survival outcomes in these patients.
In accordance with the perspective of this analysis, the only costs considered were those relevant to the UK NHS. Costs were estimated in 2008-9 prices (since this is the price year from the most recent edition of NHS Reference costs, published June 2010). Five categories of cost were considered in the model: the cost of diagnostic tests, the cost of treatment, the cost of treating adverse events, the cost of follow-up and the cost of supportive and palliative care, which was applied to all patients regardless of which (if any) anti-cancer treatment they initially received.
Deterministic sensitivity analysis was conducted on relevant parameters in order to identify variables which contribute most to the uncertainty surrounding the results of the model. The results of the cost-effectiveness analyses show that different sequences of staging tests are likely to be cost-effective in different subgroups of patients, see below.
These results may seem on the surface to be counter-intuitive. Those sequences of tests which lead to more accurate staging information do not lead to overall better outcomes for patients. However, test performance is only a surrogate endpoint, and the results of all three analyses are heavily dependent on assumptions made about downstream treatment decisions. Within the context of the model, strategies resulting in a higher number of false negatives allow a greater proportion of patients with N2/3 disease to be offered surgery and other options for treatment with curative intent. Similarly, if metastatic disease is missed, patients still achieve better outcomes with treatment with curative intent than with no anti-cancer treatment.
The sensitivity analysis performed showed the model was reasonably robust to small changes in the treatment options, the choice of radiotherapy schedules, the price of chemotherapy drugs, the price of diagnostic tests, the death rate from mediastinoscopy, changes in utility values as well as some assumptions about the choice of survival estimates for patients incorrectly staged. Other assumptions about utility values could not be tested without changing the model structure. Test accuracy data was not available for the three subgroups identified as relevant to the decision problem; as such we have relied on the expert opinion of the GDG.
Despite these acknowledged limitations, these three analyses provided the GDG with useful information in their deliberations when preparing recommendations on the best sequence in which to use tests to stage mediastinal disease in different subgroups of patients.