NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

National Guideline Centre (UK). Cirrhosis in Over 16s: Assessment and Management. London: National Institute for Health and Care Excellence (NICE); 2016 Jul. (NICE Guideline, No. 50.)

Cirrhosis in Over 16s: Assessment and Management.

7. Severity risk tools

7.1. Introduction

The natural history of cirrhosis is characterised by a variable period, often of several years, during which the person affected remains well with few if any clinical symptoms and signs. At some stage, often determined by the passage of time but in some instances relating to lifestyle issues or intercurrent illness, complications develop which relate to the development of either portal hypertension or hepatocellular failure, or both. These complications include jaundice, ascites, variceal haemorrhage, or hepatic encephalopathy, and define the transition from compensated to decompensated cirrhosis. The course of cirrhosis varies considerably from person to person, depending on several factors, including the aetiology of the cirrhosis; changes in lifestyle, for example abstinence from alcohol in people with alcohol-related cirrhosis; treatment for the underlying cause of the liver injury, for example antiviral agents for people with chronic hepatitis C infection; and the development of hepatocellular carcinoma (HCC). The development of decompensation is associated with reduction in survival but this is not inevitable.

Since the course of cirrhosis is variable and because it is recognised that clinical evaluation alone does not accurately predict outcome, there is a clear need for a cirrhosis risk assessment tool to assist in the identification of people who are at high risk of liver decompensation before they experience a defining event. Such a risk prediction tool would allow patients with compensated cirrhosis to be optimally managed by providing information on the timing for referral to specialist hepatology services.

Several scoring systems have been developed to either assess the prognosis of cirrhosis or prioritise candidates for transplantation, including the Child-Pugh score167, the model for end-stage liver disease (MELD) score131 and the UK end-stage liver disease (UKELD) score. Other tests such as transient elastography have also been proposed for the assessment of prognosis of cirrhosis. The GDG decided to compare the clinical and cost-effectiveness of these risk assessment tools for predicting the risk of mortality and liver-related morbidity in people with compensated cirrhosis. The use of these tools to prioritise patients on a liver transplant waiting list is not included in this review.

7.2. Review question 1: Which risk assessment tool is the most accurate and cost-effective for predicting the risk of morbidity and mortality in people with compensated cirrhosis?
Review question 2: When (at what severity score on the risk assessment tool) should people with cirrhosis be referred to specialist care?

For full details see review protocol in Appendix C.

Table 39. Characteristics of review question.

7.3. Clinical evidence

We searched for prospective and retrospective cohort studies assessing the accuracy of severity risk tools to predict the risk of mortality and liver-related morbidity in people with compensated cirrhosis. Only data relating to individuals with compensated cirrhosis at baseline were included. This population was specified as the aim was to find the most accurate risk tool for the prediction of mortality or decompensation in people who are currently compensated, but may require referral for specialist care because they are at higher risk of having one of these future events. Ten studies were included in the review.9,61,74,111,112,114,162,174,183,242 Evidence from these is summarised in the clinical evidence profile below. See also the study selection flow chart in Appendix E, sensitivity/specificity forest plots in Appendix K, study evidence tables in Appendix H and exclusion list in Appendix L.

Seven studies looked at the prognostic accuracy of transient elastography at a variety of thresholds61,111,112,114,162,174,242; 3 studies looked at the MELD score9,74,183 and 1 study looked at Child-Pugh.183 The components of these 3 tools can be found in Table 40, Table 41 and Table 42. No studies were identified that looked at the prognostic accuracy of the UKELD score in the prediction of mortality or decompensation in people with compensated cirrhosis. No studies were identified that looked at the prognostic accuracy of a modified risk tool in prediction of mortality or decompensation in people with compensated cirrhosis (modified by addition of one of the following factors to the risk tool: HVPG, Na, delta-MELD, EEG, transient elastography or nutrition).

Table 40. MELD components.
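
As a rough illustration of how the MELD components combine into a single score, the sketch below uses the widely published original (UNOS) form of the equation. The coefficients, the lower bound of 1.0 applied to each input, the creatinine cap of 4.0 mg/dL, the unit conversion factors and the rounding rule are assumptions drawn from common practice and are not taken from this guideline; the function names are hypothetical. The threshold of 12 is the one given in recommendation 15.

```python
import math

# Assumed conversion factors from SI units to mg/dL.
BILIRUBIN_UMOL_PER_MG_DL = 17.1
CREATININE_UMOL_PER_MG_DL = 88.4


def meld_score(bilirubin_umol_l, creatinine_umol_l, inr):
    """Return a MELD score rounded to the nearest whole number.

    Assumption: original UNOS equation, inputs floored at 1.0 and
    creatinine capped at 4.0 mg/dL before taking logarithms.
    """
    bilirubin = max(bilirubin_umol_l / BILIRUBIN_UMOL_PER_MG_DL, 1.0)
    creatinine = min(max(creatinine_umol_l / CREATININE_UMOL_PER_MG_DL, 1.0), 4.0)
    inr = max(inr, 1.0)
    raw = (3.78 * math.log(bilirubin)
           + 11.2 * math.log(inr)
           + 9.57 * math.log(creatinine)
           + 6.43)
    # Round half up (assumed rounding rule).
    return int(math.floor(raw + 0.5))


def at_high_risk(score, threshold=12):
    """True when the score meets the threshold in recommendation 15."""
    return score >= threshold


# Hypothetical blood results: all values in the normal range.
print(meld_score(10, 70, 1.0), at_high_risk(meld_score(10, 70, 1.0)))
```

With every input in the normal range the logarithmic terms are zero and the score is the constant alone, which rounds to 6; this is consistent with the GDG's remark that scores of around 6 to 7 indicate well compensated disease.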

Table 41. Child-Pugh components.
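
For comparison, the Child-Pugh score can be sketched in the same way. The cut-offs below are the commonly cited ones in SI units and are an assumption, as are the three-level grading labels used for ascites and encephalopathy; the function name is hypothetical. The two graded inputs depend on clinical judgement, which is the subjectivity the GDG contrasts with MELD.

```python
def child_pugh(bilirubin_umol_l, albumin_g_l, inr, ascites, encephalopathy):
    """Return (points, class) for the Child-Pugh score.

    ascites and encephalopathy take 'none', 'mild' or 'severe'
    (assumed labels for 1, 2 and 3 points respectively).
    """
    points = 0
    # Bilirubin: <34, 34-50, >50 umol/L (assumed cut-offs).
    points += 1 if bilirubin_umol_l < 34 else 2 if bilirubin_umol_l <= 50 else 3
    # Albumin: >35, 28-35, <28 g/L (assumed cut-offs).
    points += 1 if albumin_g_l > 35 else 2 if albumin_g_l >= 28 else 3
    # INR: <1.7, 1.7-2.3, >2.3 (assumed cut-offs).
    points += 1 if inr < 1.7 else 2 if inr <= 2.3 else 3
    grade = {"none": 1, "mild": 2, "severe": 3}
    points += grade[ascites] + grade[encephalopathy]
    # Class A: 5-6 points, B: 7-9 points, C: 10-15 points.
    child_class = "A" if points <= 6 else "B" if points <= 9 else "C"
    return points, child_class


# Hypothetical person with normal results and no ascites or encephalopathy.
print(child_pugh(20, 40, 1.1, "none", "none"))
```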

Table 42. Transient elastography component.

The outcomes of interest were all-cause mortality and decompensation. The outcome of mortality was only reported as overall mortality and no studies reported liver-related mortality specifically. The decompensating events included under the outcome of decompensation differed slightly between individual studies and this information is summarised in Table 43. Six studies looked at a composite outcome of death or decompensation.9,61,112,114,242 This outcome was analysed separately. One study also looked at a composite outcome of hepatic decompensation, varices development and variceal growth and a single outcome of varices progression.242 Varices development, growth or progression were not considered decompensating events in the review protocol, therefore these two outcomes were analysed separately.

Table 43. Summary of studies included in the review.

Assessing the performance of a risk tool

Evaluating the performance of a prediction model is typically done by examining discrimination and calibration. Discrimination refers to the ability of the prediction model to distinguish between those who do or do not experience the event of interest (decompensation). Calibration concerns how well the predicted risks compare to observed risks. A model is well calibrated if, for every 100 patients given a prediction of p%, the observed number of events is close to p. Discrimination is typically assessed by calculating the area under the receiver operating characteristic curve (ROC AUC or c-statistic), where a value of 0.5 implies the model is no better than flipping a coin. However, the area under the receiver operating characteristic curve has limited usefulness, and is hard to interpret, when deciding whether the model is of any practical use. Calibration is evaluated either by calculating the Hosmer-Lemeshow test statistic, or preferably by plotting predicted risks against observed risks (calibration plot). If there is close agreement, the points of the calibration plot will lie on or around a 45° line, with a slope of approximately 1.0.
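
One way to make the c-statistic concrete is to read it as the proportion of pairs (one person with the event, one without) in which the person with the event has the higher score. The sketch below computes it by direct pair counting; the function name and the small data set are made up for illustration and do not come from any study in this review.

```python
def c_statistic(scores, events):
    """Proportion of event/non-event pairs ranked correctly; ties count half."""
    with_event = [s for s, e in zip(scores, events) if e]
    without_event = [s for s, e in zip(scores, events) if not e]
    if not with_event or not without_event:
        raise ValueError("need at least one person with and one without the event")
    concordant = 0.0
    for s_event in with_event:
        for s_none in without_event:
            if s_event > s_none:
                concordant += 1.0
            elif s_event == s_none:
                concordant += 0.5
    return concordant / (len(with_event) * len(without_event))


# Made-up risk scores and outcomes (1 = event occurred).
scores = [6, 7, 8, 9, 11, 12, 14, 16]
events = [0, 0, 0, 1, 0, 1, 1, 1]
print(c_statistic(scores, events))  # 0.9375 (15 of 16 pairs ranked correctly)
```

A tool that assigns everyone the same score gets 0.5 from this calculation, which is the "flipping a coin" case described above.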

Predictive test accuracy and discrimination

We wished to know how accurate the risk stratification tools are in predicting mortality or decompensation. This means we want to know across a population if:

  • a high risk score in an individual is reflected in a future event (mortality or decompensation);
  • a low risk score in an individual is reflected in freedom from a future event (mortality or decompensation).

This is very similar, in principle, to how we look at diagnostic test accuracy (for diagnosis) and we take an analogous approach here – and use the term ‘predictive test accuracy’. Accordingly, we can use similar methods to determine predictive test accuracy statistics and similar quality assessments to diagnostic test accuracy. There are however some important differences, mainly related to the time dependence of prognosis, including the play of chance (that is, the fact that the event is yet to happen when we measure risk) and this means we have to modify our quality assessment and carry out additional analyses to truly answer these types of question (see below).

By analogy with diagnostic test accuracy, we considered the risk stratification tool as the ‘index test’ and the outcome (observed mortality or decompensation) as the ‘reference standard’. To calculate the sensitivity and specificity we have to define the cut-off threshold for high and low risk. The area under the receiver operating characteristics (ROC) curve, abbreviated to area under the curve (AUC) can also be calculated. The ROC curve is a curve fitted to the set of combinations of sensitivity and (1-specificity), across all possible (theoretical) cut-off points. The AUC gives an overall measure of accuracy of the risk tool across a range of thresholds. An AUC of 1 would indicate a perfect risk tool that can discriminate between people who will and will not have the event.
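
The relationship between a cut-off, the resulting sensitivity and specificity, and the ROC curve can be sketched as follows. The convention that a score at or above the cut-off counts as high risk, the function names and the data are assumptions made for illustration only.

```python
def sensitivity_specificity(scores, events, cutoff):
    """Treat score >= cutoff as high risk; return (sensitivity, specificity)."""
    tp = sum(1 for s, e in zip(scores, events) if e and s >= cutoff)
    fn = sum(1 for s, e in zip(scores, events) if e and s < cutoff)
    tn = sum(1 for s, e in zip(scores, events) if not e and s < cutoff)
    fp = sum(1 for s, e in zip(scores, events) if not e and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def roc_points(scores, events):
    """(1 - specificity, sensitivity) at each observed cut-off, from strictest to laxest."""
    points = [(0.0, 0.0)]
    for cutoff in sorted(set(scores), reverse=True):
        sens, spec = sensitivity_specificity(scores, events, cutoff)
        points.append((1.0 - spec, sens))
    return points


def auc_trapezoid(points):
    """Area under the ROC points using the trapezium rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Made-up risk scores and outcomes (1 = event occurred).
scores = [6, 7, 8, 9, 11, 12, 14, 16]
events = [0, 0, 0, 1, 0, 1, 1, 1]
print(sensitivity_specificity(scores, events, 12))  # (0.75, 1.0)
print(auc_trapezoid(roc_points(scores, events)))    # 0.9375
```

Lowering the cut-off from 12 to 9 in this toy data raises sensitivity to 1.0 but lowers specificity to 0.75, because one person without the event then falls above the cut-off.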

AUC on its own is not a good method of discriminating between risk stratification tools because the statistic is very insensitive even to major changes in the algorithm. We therefore also investigated calibration and reclassification, where reported.

Differences between prognostic tests are best determined by both discrimination and calibration.

Outcomes reported

All the studies reported outcomes relating to the discriminative ability of the prognostic risk tools. For each outcome, ROC AUC values, as reported by the individual studies, are summarised in Table 50, Table 51, Table 52, Table 53, Table 54, Table 55 and Table 56. The GDG agreed on the following criteria for AUC: 90–100% indicates perfect discrimination; 70–89% indicates moderate discrimination; 50–69% indicates poor discrimination and <50% not discriminatory at all. Data other than AUC (for example sensitivity/specificity for certain thresholds, R2, D statistics, Brier score) were also presented if given.
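
The GDG's AUC criteria can be written as a simple lookup. This is a minimal sketch: the function name is hypothetical, AUC is taken on a 0 to 1 scale, and values falling between the stated integer bands (for example 0.895) are assigned to the lower band, which is an assumption.

```python
def gdg_auc_band(auc):
    """Map an AUC on a 0-1 scale to the discrimination band agreed by the GDG."""
    if not 0.0 <= auc <= 1.0:
        raise ValueError("AUC must lie between 0 and 1")
    if auc >= 0.90:
        return "perfect"
    if auc >= 0.70:
        return "moderate"
    if auc >= 0.50:
        return "poor"
    return "not discriminatory"


# AUC values reported in this chapter.
for value in (0.90, 0.79, 0.75, 0.66, 0.59):
    print(value, gdg_auc_band(value))
```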

In addition to identifying the most accurate risk tool, the aim was also to identify a risk threshold at which people with compensated cirrhosis should be referred for specialist care. Coupled sensitivity and specificity values at given cut-off thresholds were reported for each risk tool. This information was used to determine the risk tool (and threshold) with the highest sensitivity without an undue loss of specificity. The ideal threshold would have a high sensitivity, so that people who will have a future event are defined as high risk by the tool and are referred. A high sensitivity would mean fewer false negatives (people who will have a future event, but are defined as low risk because they fall below the chosen threshold on the risk tool and therefore are not referred). Lower thresholds will give a high sensitivity, but at the expense of specificity. A specific risk tool would mean fewer false positives (people who will not have a future event, but are defined as high risk by the tool) and therefore fewer referrals of people who are not at risk.

Unlike discrimination outcomes, only 1 study was identified which reported outcomes related to the calibration of the risk tools.74 This study reported calibration for the MELD score. No studies were identified reporting calibration for the other risk tools.

No study reported reclassification of the risk tools. Reclassification is used to examine the added-value of new risk factors that have been proposed to improve the risk tool. No studies were identified that looked at the prognostic accuracy of a modified risk tool in prediction of mortality or decompensation in people with compensated cirrhosis (modified by addition of one of the following factors to the risk tool: HVPG, Na, delta-MELD, EEG, transient elastography or nutrition).

Table 44. Clinical evidence profile: 90-day mortality.

Table 45. Clinical evidence profile: Composite of death and other clinical events.

Table 46. Clinical evidence profile: Decompensation.

One study 61 reported that transient elastography predicted decompensation with a sensitivity of 20.3% and a specificity of 88.2% in 145 patients; however, the threshold used was not reported.

One study 111 reported the following data for transient elastography predicting decompensation:

Score on risk tool    Risk of event
<13 kPa               0.93, 0.9, 2.31 and 4.02% at 1, 2, 3 and 4 years
13-18 kPa             5.88, 10.54, 132.74 and 23.10% at 1, 2, 3 and 4 years
≥18 kPa               13.38, 23.21, 30.5 and 55.32% at 1, 2, 3 and 4 years

One study 174 reported the following data for transient elastography predicting variceal bleeding and/or ascites:

Score on risk tool    Risk of event
<21.1 kPa             47%
≥21.1 kPa             100%

Table 47. Clinical evidence profile: HCC.

Table 48. Clinical evidence profile: Composite of hepatic decompensation, varices development and variceal growth.

Table 49. Clinical evidence profile: Varices progression.

AUC data

Table 50. Clinical evidence profile: Mortality.

Table 51. Clinical evidence profile: Composite of death and decompensation.

Table 52. Clinical evidence profile: Decompensation.

Table 53. Clinical evidence profile: Decompensation or HCC.

Table 54. Clinical evidence profile: HCC.

Table 55. Clinical evidence profile: Composite of hepatic decompensation, varices development and variceal growth.

Table 56. Clinical evidence profile: varices progression.

Calibration data

Calibration of MELD for 3-month mortality was poor for scores within the lower three quintiles but seemed to be fairly good in the fourth and fifth quintile of each score. The calibration of the scores for 1-year mortality was better but still remained imprecise within the lower quintiles.
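
A quintile-based calibration check of the kind described here can be sketched as below: people are ranked by predicted risk, split into five equal groups, and the mean predicted risk in each group is compared with the observed event rate. The sketch assumes each person already has a predicted probability of the event (a raw score would first need to be mapped to a risk); the function name and data are made up for illustration.

```python
def calibration_by_quintile(predicted_risks, events):
    """Return rows of (quintile, n, mean predicted risk, observed event rate)."""
    pairs = sorted(zip(predicted_risks, events))
    n = len(pairs)
    if n < 5:
        raise ValueError("need at least five people to form quintiles")
    rows = []
    for q in range(5):
        # Integer arithmetic keeps the five groups as equal in size as possible.
        group = pairs[q * n // 5:(q + 1) * n // 5]
        mean_predicted = sum(p for p, _ in group) / len(group)
        observed = sum(e for _, e in group) / len(group)
        rows.append((q + 1, len(group), mean_predicted, observed))
    return rows


# Made-up predicted risks and outcomes (1 = died within the follow-up period).
predicted = [0.05, 0.05, 0.1, 0.1, 0.2, 0.2, 0.4, 0.4, 0.8, 0.8]
outcomes = [0, 0, 0, 0, 0, 1, 0, 1, 1, 1]
for row in calibration_by_quintile(predicted, outcomes):
    print(row)
```

Good calibration corresponds to the last two columns agreeing closely in every quintile; large gaps confined to the lower quintiles would match the pattern reported for MELD.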

7.4. Economic evidence

7.4.1. Published literature

No relevant economic evaluations were identified.

See also the economic article selection flow chart in Appendix F.

7.4.2. Unit costs

See Table 91 in Appendix O.

7.5. Evidence statements

7.5.1. Clinical

  • Low quality evidence from 5 studies (n=631) demonstrated a good AUC value for transient elastography in predicting decompensation (0.79) but Moderate quality evidence from 2 studies (n=380) demonstrated a lower accuracy for predicting a composite outcome of death and/or decompensation (AUC 0.59).
  • Low quality evidence from 1 study (n=429) indicated a good AUC value (0.90) for MELD in predicting 90-day all-cause mortality.
  • High quality evidence from 1 study (n=204) indicated a moderate AUC value (0.75) for MELD in predicting 1-year all-cause mortality.
  • High quality evidence from 1 study (n=204) indicated a moderate AUC value (0.66) for Child-Pugh in predicting 1-year all-cause mortality.
  • Moderate quality evidence from 1 study (n=77) indicated a moderate AUC value (0.59) for MELD in predicting a composite outcome of death and decompensation.
  • There was Moderate to Low quality evidence demonstrating that, with transient elastography, as the threshold increases, sensitivity decreases and specificity increases in predicting decompensation.
  • Moderate quality evidence from 1 study (n=429) showed a high sensitivity and specificity of MELD in predicting 90-day all-cause mortality at a threshold of 16.

7.5.2. Economic

  • No relevant economic evaluations were identified.

7.6. Recommendations and link to evidence

Recommendations
13. Refer people who have, or are at high risk of, complications of cirrhosis to a specialist hepatology centre.

14. Calculate the Model for End-Stage Liver Disease (MELD) score every 6 months for people with compensated cirrhosis.

15. Consider using a MELD score of 12 or more as an indicator that the person is at high risk of complications of cirrhosis.

Relative values of different outcomes
The GDG was interested in the prognostic accuracy of severity risk tools to predict the risk of mortality and liver-related morbidity in people with compensated cirrhosis. GDG members discussed that currently, in their opinion, there are a large number of patients with compensated cirrhosis who are not referred to specialist hepatology services until they have an episode of decompensation.
The GDG aimed to identify a risk tool that would be able to predict both all-cause mortality, and liver-related complications in people with compensated cirrhosis (defined as any of hepatic encephalopathy, hepatocellular carcinoma, ascites, spontaneous bacterial peritonitis, variceal haemorrhage, hepatorenal syndrome and jaundice). The outcome of mortality was only reported as overall mortality and no studies reported liver-related mortality specifically (with the exception of some studies reporting composite outcomes of morbidity and mortality). The GDG felt that the ability to predict those at high risk of a decompensating event should be the priority as this would allow timely prevention and intervention in people with compensated cirrhosis. The GDG agreed that people at risk of decompensation are likely to benefit from specialist hepatologist care.
A population with compensated cirrhosis at baseline was specified, as the aim was to find the most accurate risk tool for the prediction of all-cause mortality or decompensation in people who are currently compensated, but may require referral for specialist care because they are at higher risk of having one of these events in future. The GDG agreed that people with decompensated cirrhosis should have already been referred to specialist services, and the aim here is to intervene before decompensation or death occurs. Therefore, studies of prognostic accuracy in people with decompensated cirrhosis at baseline were not considered in this review.
The GDG focused on the value of the ROC AUC for each risk tool, as reported by the studies. This gives an overall measure of the prognostic accuracy of the tool across a range of cut-off thresholds and was used to identify the most accurate risk prediction tool. In addition to identifying the most accurate tool to predict these future events, the GDG also wanted to recommend a cut-off threshold for referral. This is a trade-off between sensitivity and specificity. A high sensitivity was desirable so that people who are at higher risk are not missed (fewer false negatives), but the GDG did not want to compromise the specificity too much, as patients who are not at risk would be referred unnecessarily (false positives).
Trade-off between clinical benefits and harms
Evidence was identified for the Child-Pugh score, the MELD score and transient elastography. No studies were identified that looked at the risk prediction accuracy of the UKELD score or of a risk tool modified by addition of one of the following factors to the risk tool: HVPG, Na, delta-MELD, EEG, transient elastography or nutrition.
The GDG discussed the evidence for transient elastography. The evidence demonstrated a good AUC value for transient elastography for predicting decompensation (0.79) but a lower accuracy for predicting a composite outcome of death or decompensation (AUC 0.59). There was no AUC evidence for transient elastography for the prediction of mortality alone. The GDG expressed concerns about the use of transient elastography for assessing prognosis in people with cirrhosis. Transient elastography is only a measure of the degree of fibrosis in the liver. The GDG agreed that more fibrosis would not necessarily mean an increased risk of complications, and that transient elastography alone should not be used for prediction of patients at high risk of decompensation. The GDG noted that aetiology plays an important role in the accuracy of transient elastography. It is most useful in hepatitis C but would not be as accurate in predicting risk in people with alcohol-related cirrhosis or obesity. In addition, transient elastography would need to be repeated regularly if it were to be used as a risk prediction tool, which is not in line with current clinical practice.
MELD data were available for the prediction of 1-year mortality with an AUC of 0.75 (95% CI 0.59–0.9). This value was better than the AUC value of the Child-Pugh score for prediction of 1-year mortality (0.66, 95% CI 0.5–0.82). The GDG also discussed other benefits of MELD over the Child-Pugh score; for example, MELD is completely objective (it does not rely on subjective clinical assessment) and all the constituent parts of MELD have independently been associated with mortality and outcome in patients with cirrhosis.
There was also evidence for MELD for the prediction of 3-month mortality with an AUC of 0.9 (95% CI 0.84–0.96). However, the GDG had concerns about the applicability of this study (Finkenstedt 2012 74). The GDG suspected that this study included people with decompensated cirrhosis at baseline, due to the high mortality rate at 3 months and the high cut-off value of 16 used to assess sensitivity and specificity. Unfortunately, this was the only evidence for the sensitivity and specificity of MELD at a defined cut-off threshold.
No AUC evidence was available for MELD or Child-Pugh for the prediction of decompensation. However, evidence was available for MELD for the prediction of a composite outcome of decompensation or death, with an AUC of 0.59 (95% CI 0.47–0.72). This is comparable to the AUC value for the prediction of decompensation or death using transient elastography, but the evidence for MELD was higher quality. As there was no evidence on the prognostic accuracy of Child-Pugh for decompensation and this outcome was considered a priority for this review, the GDG agreed that overall the evidence, along with their other considerations, supports the use of MELD for predicting risk in people with compensated cirrhosis.
The GDG agreed it was important to recommend a cut-off threshold for referral to specialist hepatology services if they were to recommend the use of MELD as a severity risk tool. There is no standard threshold currently used for referral of people at high risk of decompensation to specialist care. The only commonly used threshold is a MELD score ≥16, which is used for assessment for liver transplantation. This cut-off is too high for the current recommendation as the people referred would most likely have decompensated disease. Thus the GDG, based on their clinical knowledge and experience, felt that a threshold of 12 for referral to a specialist service would seem reasonable. MELD scores of around 6–7 indicate the presence of well compensated disease while a MELD score between 8 and 11 would not indicate an immediate risk of decompensation. A MELD score of 12 or above would mean the person would have an abnormal level for at least one of the measured variables and the GDG agreed this would put them at risk of decompensation. The GDG also made the point that a MELD score below 12 in no way precluded referral to a specialist service if there were concerns about the wellbeing of a patient with cirrhosis.
All the data identified were AUC or sensitivity and specificity values: that is, the discriminative ability of the risk tools. Data on calibration were only available from 1 study for MELD scores. The study concluded that calibration of MELD for 3-month mortality was poor for scores within the lower 3 quintiles but seemed to be fairly good in the fourth and fifth quintile of each score. The calibration of the scores for 1-year mortality was better but still remained imprecise within the lower quintiles. However, this evidence was from the Finkenstedt 2012 study74 which, as discussed above, was in a population with an unclear status in relation to their functional hepatic reserve at baseline. The GDG did not feel this evidence could be considered due to the limitations of the study.
Trade-off between net clinical effects and costs
No relevant published economic evidence was identified.
The GDG reviewed the unit costs of transient elastography, MELD and Child-Pugh score. The costs of calculating MELD and Child-Pugh are low as they only require the results of inexpensive blood tests, which are performed as part of routine clinical assessment. Transient elastography is more expensive; in particular the GDG noted that this would need to be repeated at regular intervals, leading to substantial costs over time.
Although it is current practice to conduct the blood tests on a regular basis, it is not routine practice to automatically calculate MELD or Child-Pugh scores from them. The GDG considered the possibility of having laboratories routinely calculate and present the MELD score, in the same way as eGFR is routinely calculated from the serum creatinine as a measure of renal function. The GDG concluded that the routine report of the MELD score would be beneficial in people with confirmed cirrhosis, but might mislead clinicians to suspect liver decompensation inappropriately in people without liver disease but with raised creatinine and bilirubin due to other causes.
The GDG also considered the costs and benefits of consultation with a specialist at a hepatology centre. For people at high risk of developing decompensation, the GDG agreed that there would be considerable benefit in specialist assessment, and such a referral would very likely be cost-effective. However, if too low a threshold were used and people at low risk were also referred for specialist assessment this would lead to increased costs with little benefit, and this strategy would be unlikely to be cost-effective. Therefore, the GDG recommended that all people with cirrhosis with a MELD score of 12 or more should be referred to a specialist hepatology centre; those with a lower score should not be referred routinely but they could have other reasons to justify referral to a specialist centre apart from the MELD score.
Quality of evidence
The AUC data for transient elastography for prediction of decompensation or for predicting a composite measure of death or decompensation were of Low quality, adding to the GDG's uncertainty about the use of transient elastography as a prognostic risk tool in people with cirrhosis. There was no AUC evidence for transient elastography for the prediction of mortality.
Both High and Low quality evidence was available for the MELD AUC data for the prediction of 1-year and 3-month mortality, respectively. The GDG noted this High quality evidence for the MELD score for prediction of 1-year mortality, which showed a good AUC value as described above. There was also High quality evidence for the Child-Pugh AUC score for the prediction of 1-year mortality. However, as described above, the MELD score had a higher AUC than Child-Pugh for prediction of 1-year mortality (AUC 0.75, 95% CI 0.59–0.9; and 0.66, 95% CI 0.5–0.82, respectively, both High quality). No AUC evidence was available for MELD for the prediction of decompensation. However, the evidence for the MELD AUC data for the prediction of a composite outcome of decompensation or death was Moderate quality.
Overall, the GDG agreed the AUC evidence for the MELD score was better quality evidence. However, data were only available from 1 study for the sensitivity and specificity of MELD at a particular cut-off threshold. This was Moderate quality evidence due to the indirect population. The GDG did not consider the population to be appropriate to the review question and recommendations, as it suspected people had decompensated disease at baseline. Therefore, the GDG used clinical judgement and experience to determine the optimal cut-off threshold for referral to specialist hepatology services. There was no evidence for the sensitivity and specificity of Child-Pugh for the prediction of mortality or decompensation.
The GDG noted that some of the papers reviewed included decompensating events, which were not included in our protocol (for example, deterioration of liver function to a lower Child-Pugh class). This evidence was included, but downgraded for indirectness. The GDG focused on the protocol-specified outcomes of decompensation and mortality for decision-making.
Other considerations
The GDG discussed that the population in question are most likely to be cared for in secondary care or in shared primary care, and the recommendation is meant for those people who should be referred for specialist hepatology care, in either a secondary or tertiary centre. The GDG unanimously agreed that anyone with cirrhosis who has had a decompensating event prior to being referred to a specialist hepatology centre should be referred immediately. The GDG also noted that there can be considerable improvement in liver function with treatment of the underlying cause of liver disease, for example, abstinence from alcohol or treatment for hepatitis C infection. This should be taken into account when considering referral to specialist services. This in turn could influence the frequency of assessment which should be designed not only to detect deterioration, but also improvement.
The GDG were aware of a number of papers assessing MELD which were not included in the review. It was noted that these papers had been excluded as they predicted future risk in people who were decompensated at baseline. These studies are more abundant than studies in people with compensated cirrhosis, however they are not relevant for the recommendations on referral to specialist care.
In general, the GDG considered that MELD is a robust prognostic marker in people with compensated cirrhosis. The GDG felt that MELD is easy to calculate using the results of blood tests undertaken as part of standard practice, but noted that it would be useful if laboratories were encouraged to generate a MELD score automatically on liver blood tests, which could be used easily by clinicians. The GDG agreed that this recommendation is largely aimed at secondary care clinicians, as people with a diagnosis of cirrhosis are routinely seen in secondary care. The GDG discussed the limitations of using MELD in primary care. Unless someone has a definitive diagnosis of cirrhosis, the MELD score can be misleading as abnormal results can occur in other conditions such as kidney disease.
The GDG acknowledged that although no evidence was identified on the UKELD score, this is routinely used in many specialised centres. However, no further research on this risk prediction tool was recommended since the GDG was aware of an ongoing project led by NHS Blood & Transplant that aims to replace UKELD and modify the system of organ allocation in the UK.
Copyright © National Institute for Health and Care Excellence 2016.
Bookshelf ID: NBK385216