NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

National Guideline Centre (UK). Non-Alcoholic Fatty Liver Disease: Assessment and Management. London: National Institute for Health and Care Excellence (NICE); 2016 Jul. (NICE Guideline, No. 49.)

Non-Alcoholic Fatty Liver Disease: Assessment and Management.

6. Diagnosis of NAFLD

6.1. Introduction

Historically, the presence of NAFLD has been suspected in those presenting with abnormal liver blood tests or evidence of fatty changes on ultrasound. However, the full spectrum of NAFLD (from simple steatosis to steatohepatitis, cirrhosis and liver-related morbidity) can also be present with normal liver tests.

Early detection of NAFLD may be useful to identify those with potentially silent progressive fatty liver disease. Diagnostic practice varies widely and includes clinical, biochemical and radiographic tests. Currently, liver biopsy remains the gold standard for NAFLD diagnosis but it is impractical as a diagnostic tool because it is invasive and expensive. As such, there is a need for highly sensitive and specific diagnostic tests that can be commonly used by clinicians in primary, secondary and tertiary care.

The aim of this review is to objectively evaluate existing invasive and non-invasive tests to accurately diagnose NAFLD in adults, young people and children. The outcome will facilitate development of a practical diagnostic pathway.

Hepatic steatosis at 5% or more is the accepted histological definition of grade 1 steatosis86; steatosis at less than 5% is considered normal. Steatosis at 30% is the accepted lower limit where steatosis can be detected reliably by ultrasound (currently the most commonly used diagnostic test for fatty liver).

In addition to the imaging techniques assessed by these guidelines, fatty liver can also be detected using computed tomography (CT). Fat has a lower attenuation than water using X-ray based techniques; this makes the liver appear darker on images and, by measuring the radiodensity, fat can be quantified (in Hounsfield units). Through its widespread diagnostic use, CT has become the largest source of radiation to many populations, and with the high prevalence of fatty liver in the West there would be the potential to significantly add to this radiation burden. Given that fatty liver usually has a benign clinical course and that there are alternative imaging techniques without radiation, it was decided that it would be inappropriate to potentially recommend a technique that would significantly add to the population radiation dose. On this basis, CT was not formally evaluated as a technique for the detection or quantification of fat within the liver; this is consistent with the principles of the European Committee of Radiation Risk.68

6.2. Review question: What is (are) the appropriate investigation(s) for diagnosing NAFLD in adults, young people and children?

For full details see review protocol in Appendix C.

Table 18. Characteristics of review question.

6.3. Clinical evidence

Thirty-eight studies were included in the review.19,24,28,29,34,35,37,49,53,69,77,78,87,90,95,97,105,110-112,115,117,135,137,142,162-164,170,187,197,201,202,205-208,218,221 Evidence from these is summarised in the clinical evidence profiles below (Table 20, Table 21, Table 22 and Table 23). See also the study selection flow chart in Appendix E, sensitivity/specificity plots and diagnostic meta-analysis plots in Appendix K, study evidence tables in Appendix H and exclusion list in Appendix M.

Table 20. Clinical evidence profile (SENSITIVITY and SPECIFICITY): Diagnostic tests for steatosis ≥5%.

Table 21. Clinical evidence profile (AUC): Diagnostic tests for steatosis ≥5%.

Table 22. Clinical evidence profile (SENSITIVITY and SPECIFICITY): Diagnostic tests for steatosis ≥30%.

Table 23. Clinical evidence profile (AUC): Diagnostic tests for steatosis ≥30%.

Papers reported diagnostic accuracy for a variety of tests: controlled attenuation parameter (CAP), fatty liver index (FLI), magnetic resonance imaging (MRI), magnetic resonance spectroscopy (MRS), NAFLD liver fat score, SteatoTest and ultrasound, using multiple different thresholds. No papers relevant to the review protocol were identified for alanine transaminase (ALT), aspartate aminotransferase (AST) or gamma-glutamyl transferase (GGT). Many papers used the NAFLD activity score (NAS)86 to grade steatosis on biopsy by assessing the percentage of hepatocytes containing lipid droplets: S0, less than 5%; S1, 5-33%; S2, 34-66%; S3, greater than 66%. However, this was not universally used across all papers. Details of individual steatosis grading systems are available in Table 19.
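The NAS steatosis bands quoted above can be written as a small lookup. This is only an illustrative sketch: the function name is hypothetical, and because the text gives whole-percentage bands, the handling of fractional values between 33% and 34% (or 66% and 67%) is an assumption.

```python
def steatosis_grade(percent_hepatocytes_with_fat: float) -> str:
    """Map the percentage of hepatocytes containing lipid droplets to S0-S3.

    Bands follow the text: S0 <5%, S1 5-33%, S2 34-66%, S3 >66%.
    Assumption: fractional values above 33 fall into S2 and above 66 into S3.
    """
    if not 0 <= percent_hepatocytes_with_fat <= 100:
        raise ValueError("percentage must be between 0 and 100")
    if percent_hepatocytes_with_fat < 5:
        return "S0"
    if percent_hepatocytes_with_fat <= 33:
        return "S1"
    if percent_hepatocytes_with_fat <= 66:
        return "S2"
    return "S3"
```

Under this mapping, the ≥5% diagnostic target corresponds to grades S1 to S3, and the pooled "approximately a third" target corresponds roughly to S2 and S3.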

Table 19. Summary of studies included in the review.

Diagnosing steatosis ≥5%

Twenty-five papers investigated tests for diagnosing steatosis ≥5%. Five papers presented evidence for controlled attenuation parameter (CAP),29,53,111,117,170 2 papers for the Fatty Liver Index (FLI),19,49 9 papers for different types of MRI,28,97,110,115,137,164,187,201,218 4 papers for MRS,97,201,202,218 1 for the NAFLD liver fat score,49 1 for SteatoTest90 and 8 for ultrasound.34,37,77,95,201,205,207,208

Diagnosing steatosis ≥30%

Twenty-seven papers investigated tests for diagnosing steatosis ≥30%, ≥33% or ≥34%. These papers are pooled under the heading of diagnosing steatosis ≥30% (as the GDG agreed that these 3 different cut-offs all captured the concept of fat affecting approximately a third of the liver). Nine papers presented evidence for controlled attenuation parameter (CAP),29,35,53,105,117,162,163,170,206 2 papers for the Fatty Liver Index (FLI),35,49 4 papers for different types of MRI,97,110,137,187 3 papers for MRS,87,97,197 1 for the NAFLD liver fat score,49 2 for the SteatoTest,35,90 and 10 for ultrasound.69,77,78,95,112,135,142,205,207,221

6.4. Economic evidence

6.4.1. Published evidence

No relevant economic evaluations were identified.

See also the economic article selection flow chart in Appendix F.

6.4.2. Unit costs

See Table 61 in Appendix N.

6.4.3. New cost-effectiveness analysis

Original cost-effectiveness modelling was undertaken for this question using the NGC liver disease pathway model developed for this guideline. A summary is included here. An evidence statement summarising the results of the analysis can be found below. The full analysis can be found in Appendix N.

6.4.3.1. Aim and structure

The aim of the health economic modelling for this question was to determine the most cost-effective diagnostic test to detect 5% steatosis and whom to test (according to specific risk factors). The model was also used to examine the cost-effectiveness of the various retest frequencies for every risk factor group. For these purposes a lifetime health state transition (Markov) model was constructed, following the NICE reference case,124 which depicted the patient pathway of liver disease from the development of early steatosis to liver transplant.

The diagnostic strategies compared were:

  • CAP at 200–249
  • fatty liver index at 60
  • MRI PDFF at 6.87
  • MRS at 0–5
  • liver fat score at 0.16
  • SteatoTest at 0.38
  • ultrasound
  • liver biopsy
  • no test – treat all
  • no test – no treatment.

The population was adults with suspected NAFLD, categorised into the following subgroups:

  • Obese (BMI≥30)
  • Wide waist circumference (≥102cm for men, ≥88cm for women)
  • Diabetes (glycaemia≥110mg/dl)
  • Low HDL (<40mg/dl men, <50mg/dl women)
  • High triglycerides (≥150mg/dl)
  • Metabolic syndrome (NCEP criteria)

The model used diagnostic accuracy data from studies identified in the clinical review in this chapter. Test costs were obtained from published literature and GDG sources. Health states costs were constructed under GDG guidance specifically for the purposes of the model. Utilities and transition probabilities were mostly obtained from published literature and through extrapolations from other liver diseases where there was a lack of evidence. The model was built probabilistically to take account of the uncertainty around input parameter point estimates.
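To illustrate what a health state transition (Markov) cohort model does mechanically, the following is a deliberately tiny, deterministic sketch. It is not the NGC liver disease pathway model: the three states, every transition probability, utility and cost, and the 3.5% discount rate are hypothetical placeholders chosen only to show the bookkeeping.

```python
# Hypothetical three-state cohort model; all numbers are illustrative only.
STATES = ["steatosis", "cirrhosis", "dead"]

# TRANSITIONS[i][j] = assumed annual probability of moving from state i to j.
TRANSITIONS = [
    [0.95, 0.03, 0.02],
    [0.00, 0.90, 0.10],
    [0.00, 0.00, 1.00],  # death is absorbing
]
UTILITY = [0.85, 0.60, 0.00]        # assumed quality-of-life weight per state
ANNUAL_COST = [100.0, 2000.0, 0.0]  # assumed cost (£) per year in each state
DISCOUNT_RATE = 0.035               # assumed annual discount rate


def run_cohort(cycles: int):
    """Return (discounted QALYs, discounted cost, final state distribution)."""
    cohort = [1.0, 0.0, 0.0]  # everyone starts in the first state
    total_qalys = 0.0
    total_cost = 0.0
    for year in range(cycles):
        discount = 1.0 / (1.0 + DISCOUNT_RATE) ** year
        total_qalys += discount * sum(p * u for p, u in zip(cohort, UTILITY))
        total_cost += discount * sum(p * c for p, c in zip(cohort, ANNUAL_COST))
        # Advance one cycle: new share in j = sum over i of share_i * P(i -> j).
        cohort = [
            sum(cohort[i] * TRANSITIONS[i][j] for i in range(len(STATES)))
            for j in range(len(STATES))
        ]
    return total_qalys, total_cost, cohort
```

A probabilistic version, as described in the text, would repeat such a run many times with inputs drawn from distributions rather than fixed point estimates.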

Cost-effectiveness was defined by the value of the net monetary benefit (NMB) attributed to every test. The decision rule applied is that the comparator with the highest NMB is the most cost-effective option at the specified cost-effectiveness threshold of £20,000 per QALY gained.
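The decision rule can be sketched in a few lines, assuming the standard definition of net monetary benefit (mean QALYs multiplied by the threshold, minus mean cost). The function names and the three example strategies are hypothetical and are not taken from the guideline model.

```python
THRESHOLD_GBP_PER_QALY = 20_000  # cost-effectiveness threshold stated in the text


def net_monetary_benefit(mean_qalys, mean_cost, threshold=THRESHOLD_GBP_PER_QALY):
    """NMB = QALYs x threshold - cost (standard definition, assumed here)."""
    return mean_qalys * threshold - mean_cost


def most_cost_effective(strategies, threshold=THRESHOLD_GBP_PER_QALY):
    """Return the name of the strategy with the highest NMB.

    `strategies` maps a name to a (mean QALYs, mean cost in £) pair.
    """
    return max(
        strategies,
        key=lambda name: net_monetary_benefit(*strategies[name], threshold=threshold),
    )


# Hypothetical inputs, for illustration only.
example = {
    "test A": (15.40, 7_400.0),
    "test B": (15.37, 6_500.0),
    "no test": (15.18, 3_900.0),
}
best = most_cost_effective(example)
```

In this made-up example the cheaper test wins despite slightly fewer QALYs, because the extra 0.03 QALYs of test A is worth £600 at the threshold but costs £900 more.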

6.4.3.2. Results

Testing for 5% steatosis was considered cost-effective at a cost-effectiveness threshold of £20,000 per QALY gained for all retest frequencies. Irrespective of the risk factor examined, the 6-year retest frequency delivered the highest NMB for FLI (the first ranking test).

In almost all combinations of risk factor and retest frequency the testing strategies had the following rankings (these figures apply specifically to people with type 2 diabetes at a 5-year retest frequency).

Test                      Mean cost (£)   Mean QALYs   NMB (£) at £20,000/QALY   Rank
CAP at 200–249            7,427           15.40        300,665                   6
FLI at 60                 6,540           15.37        300,900                   1
MRI PDFF at 6.87          6,617           15.37        300,767                   4
MRS at 0–5                7,140           15.40        300,792                   3
Ultrasound                6,659           15.37        300,807                   2
LFS at 0.16               6,391           15.36        300,748                   5
SteatoTest at 0.38        7,378           15.40        300,658                   7
Liver biopsy              8,012           15.41        300,111                   9
No test – treat all       7,780           15.41        300,513                   8
No test – no treatment    3,902           15.18        299,781                   10

Among the 8 diagnostic tests compared, FLI ranked first due to the best combination of test unit costs and diagnostic accuracy. Ultrasound ranked second having lower sensitivity (64% against FLI's 76%) and noticeably higher test unit costs. MRS closely followed ultrasound with a slightly lower NMB. MRI and LFS ranked fourth and fifth across all tests having the next best combinations of diagnostic accuracy and unit cost. Most of these tests had similarly wide 95% confidence intervals ranging from first to eighth. Although there was a small difference in the NMB values between some of the strategies, FLI was around £90 ahead of the second ranking test. When the starting age of the model was increased from 45 years to 50, 55 and 58 years, the cost-effectiveness of testing compared to no testing reduced, with FLI having an ICER of £17,514 per QALY gained in the type 2 diabetes cohort at a starting age of 58 years.
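As a quick consistency check, the ranking can be reproduced by sorting the NMB figures reported in the results table. The values below are copied from that table; the variable names are arbitrary. Recomputing NMB from the rounded QALY and cost columns would not match exactly, presumably because the published QALYs are rounded to 2 decimal places.

```python
# NMB (£) at £20,000 per QALY, as reported in the results table.
reported_nmb = {
    "CAP at 200-249": 300_665,
    "FLI at 60": 300_900,
    "MRI PDFF at 6.87": 300_767,
    "MRS at 0-5": 300_792,
    "Ultrasound": 300_807,
    "LFS at 0.16": 300_748,
    "SteatoTest at 0.38": 300_658,
    "Liver biopsy": 300_111,
    "No test - treat all": 300_513,
    "No test - no treatment": 299_781,
}

# Highest NMB first: position 0 is rank 1.
ranking = sorted(reported_nmb, key=reported_nmb.get, reverse=True)

# Gap between the first and second ranked strategies.
lead = reported_nmb[ranking[0]] - reported_nmb[ranking[1]]  # 93, i.e. "around £90"
```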

Testing for NAFLD was cost-effective compared to no testing at all retest frequencies. Irrespective of the risk factor examined, the 6-year retest frequency delivered the highest NMB for FLI, though the difference in NMB at different frequencies was small and within the margin of error.

In the deterministic sensitivity analysis FLI remained the first ranking test in most of the examined scenarios. In the multiway deterministic analysis FLI remained first when parallel changes were applied on the liver-related mortality, the other-cause mortality and the liver disease progression. No testing ranked first in the scenario when the starting age was set at 58 years and the benefit of lifestyle modification intervention was also removed.

6.5. Evidence statements

6.5.1. Clinical

Diagnosing steatosis ≥5%

  • No evidence was identified to determine the diagnostic accuracy of ALT, AST or GGT as separate tests.
  • Low quality evidence from 2 studies which could not be pooled showed sensitivities of 91% (79-98) and 87% (75-95); and specificities of 52% (39-64) and 77% (68-85) for CAP using thresholds between 200-249. Very low quality evidence from a diagnostic meta-analysis of 3 studies (n=440) showed a pooled sensitivity of 76% (47-94) and a pooled specificity of 87% (65-97) for CAP used at a threshold between 250-300. The median AUC from these 5 studies was 88 with a range across study confidence intervals of 67 to 97.
  • Low quality evidence from 2 studies (n=574) showed sensitivities of 81% (75-86) and 76% (71-81) and specificities of 49% (36-63) and 87% (60-98) for FLI using thresholds of 79 and 60. The AUCs from these 2 studies were 67 and 83 with confidence intervals ranging from 59 to 91.
  • Low quality evidence from 2 studies (n=221) showed MRI-DE used at thresholds of 4.0 or 11.08 had sensitivities of 77% (64-87) and 86% (57-98), and specificities of 87% (79-93) and 78% (64-89). Very low quality evidence from 2 studies (n=109) showed that calculating the fat fraction using MRI with thresholds of 3.42 and 1.5 had sensitivities of 100% (78-100) and 90% (70-99), and specificities of 77% (63-88) and 91% (71-99). Moderate quality evidence from 1 study (n=40) looking at the fat water ratio of < 0 on MRI found sensitivity of 97% (84-100) and specificity of 86% (42-100). Very low quality evidence from 3 studies (n=340; including the only identified study in children and young people) using MRI PDFF at thresholds of 6.87 and 6.4 found sensitivities of 68% (60-75), 87% (70-96) and 86% (76-92) and specificities of 96% (79-100), 98% (88-100) and 83% (36-100). The lowest of these accuracy readings came from the study in children and young people. Low quality evidence from 1 study (n=36) using %RSID on MRI at a threshold of -0.74 found sensitivity of 87% (66-97) and specificity of 69% (39-91). Low quality evidence from 1 study (n=60) on MRI-TE at 5.35 found sensitivity of 93% (66-100) and specificity of 96% (85-99). The median AUC from 8 of the 9 studies was 93 with a range across study confidence intervals of 82 to 100.
  • Low quality evidence from a diagnostic meta-analysis of 3 studies (n=265) showed a pooled sensitivity of 86% (63-98) and a pooled specificity of 82% (59-95) for MRS at a threshold range within 0-5. Low quality evidence from 1 study (n=38) using a higher threshold of 5.7 found a sensitivity of 85% (62-97) and a specificity of 94% (70-100). The median AUC from these 4 studies was 91 with a range across study confidence intervals of 78 to 97.
  • Very low quality evidence from 1 study (n=324) showed a sensitivity of 65% (59-70) and a specificity of 87% (60-98) for the NAFLD-LFS at a threshold of 0.16. The AUC from this study was 80 (69-88).
  • Very low quality evidence from 1 study (n=288) showed a sensitivity of 87% (82-91) and a specificity of 50% (33-67) for SteatoTest at a threshold of 0.38. This study did not report AUC data.
  • Very low quality evidence from a diagnostic meta-analysis of 6 studies (n=4836) showed a pooled sensitivity of 64% (48-78) and a pooled specificity of 87% (76-94) for ultrasound. Very low quality evidence from 2 studies (n=286) using the hepatorenal contrast ratio of 4.0 or 1.49 found sensitivities of 82% (74-89) and 100% (92-100) and specificities of 63% (50-74) and 91% (81-97). The median AUC from 4 of these studies that reported AUC data was 77 with a range across study confidence intervals of 69 to 99.

Diagnosing steatosis ≥30%

  • No evidence was identified to determine the diagnostic accuracy of ALT, AST or GGT as separate tests.
  • Low quality evidence from 2 studies (n=786) using a CAP threshold between 200-249 found sensitivities of 87% (78-94) and 83% (63-95) and specificities of 74% (70-78) and 78% (66-87). Very low quality evidence from a diagnostic meta-analysis of 6 studies (n=782) showed a pooled sensitivity of 82% (68-92) and a pooled specificity of 83% (71-91) for CAP at a threshold of 250-299. Low quality evidence from 1 study (n=112) using a higher threshold of 311 found a sensitivity of 58% (39-75) and a specificity of 94% (86-98). The median AUC from these eight studies was 86 with a range across study confidence intervals of 69 to 100.
  • Very low quality evidence from 2 studies (n=436) showed sensitivities of 27% (13-46) and 59% (52-66) and specificities of 96% (89-99) and 69% (61-77) for FLI at thresholds of 93.9 and 82. The AUCs from these 2 studies were 65 and 71 with confidence intervals ranging from 59 to 83.
  • Very low quality evidence from 1 study (n=161) showed sensitivity of 91% (59-100) and specificity of 94% (89-97) for MRI-DE with a threshold of 6.5. Very low quality evidence from 2 studies (n=166) looking at MRI PDFF with thresholds of 11.08 and 22.1 showed sensitivities of 88% (47-100) and 64% (48-78) and specificities of 88% (78-95) and 96% (85-99). Very low quality evidence from 1 study (n=36) looking at %RSID on MRI at a threshold 19.22 found a sensitivity of 78% (40-97) and a specificity of 100% (87-100). The mean AUC from 3 studies which reported AUC data was 95 (85-100).
  • Very low quality evidence from 3 studies (n=231) looking at MRS that could not be pooled due to high variation in thresholds (2.7, 7.7 and 10.2) showed sensitivities of 100% (74-100), 73% (39-94) and 100% (66-100) and specificities of 87% (66-97), 79% (72-86) and 92% (75-99). The AUCs from 2 of the 3 studies were 91 and 98 with confidence intervals ranging from 85 to 100.
  • Very low quality evidence from 1 study (n=324) showed a sensitivity of 78% (72-84) and a specificity of 59% (51-68) for the NAFLD-LFS at 0.16 threshold. The AUC from this study was 72 (66-77).
  • Very low quality evidence from 2 studies (n=400) showed sensitivities of 9% (2-24) and 42% (33-50) and specificities of 42% (33-50) and 79% (72-85) for SteatoTest at thresholds of 0.94 and 0.69. The AUCs from these 2 studies were 70 and 73 with confidence intervals ranging from 61 to 84.
  • Very low quality evidence from a diagnostic meta-analysis of 9 studies (n=5554) showed a pooled sensitivity of 79% (59-91) and a pooled specificity of 85% (77-92) for ultrasound when no threshold was specified. Low quality evidence from 1 study (n=175) on hepatorenal contrast using ultrasound with a threshold of 7 showed a sensitivity of 86% (67-96) and a specificity of 85% (78-90). Only 1 study reported AUC and this was 93 (88-97).
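For readers less familiar with the accuracy measures used in these statements, sensitivity and specificity are derived from a 2×2 table of index test results against the reference standard (liver biopsy in this review). The sketch below uses standard definitions; the function names are illustrative and any counts passed to them would come from an individual study, not from this chapter.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Proportion of people with the condition whom the test correctly detects."""
    diseased = true_positives + false_negatives
    if diseased == 0:
        raise ValueError("no reference-standard positives")
    return true_positives / diseased


def specificity(true_negatives: int, false_positives: int) -> float:
    """Proportion of people without the condition whom the test correctly clears."""
    non_diseased = true_negatives + false_positives
    if non_diseased == 0:
        raise ValueError("no reference-standard negatives")
    return true_negatives / non_diseased
```

For example, hypothetical counts of 64 true positives, 36 false negatives, 87 true negatives and 13 false positives give a sensitivity of 64% and a specificity of 87%.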

6.5.2. Economic

  • One original cost-utility analysis that compared 10 different diagnostic strategies to detect NAFLD found that FLI ranked first compared to the following diagnostic strategies at a retest frequency of 6 years, using relevant thresholds for each test, with reference to a cost-effectiveness threshold of £20,000 per QALY gained:
    • ultrasound
    • NAFLD liver fat score
    • MRI PDFF
    • MRS
    • SteatoTest
    • CAP
    • no test – no treatment
    • liver biopsy
    • no test – treat all.

This analysis was assessed as directly applicable with minor limitations.

6.6. Recommendations and link to evidence

Recommendations
2. Take an alcohol history to rule out alcohol-related liver disease. See also NICE's cirrhosis guideline.

3. Do not use routine liver blood tests to rule out NAFLD.

4. Offer a liver ultrasound to test children and young people for NAFLD if they:

  • have type 2 diabetes or metabolic syndrome and
  • do not misuse alcohol.

5. Refer children with suspected NAFLD to a relevant paediatric specialist in hepatology in tertiary care.

6. Diagnose children and young people with NAFLD if:

  • ultrasound shows they have fatty liver and
  • other suspected causes of fatty liver have been ruled out.

7. Offer liver ultrasound to retest children and young people for NAFLD every 3 years if they:

  • have a normal ultrasound and
  • have type 2 diabetes or metabolic syndrome and
  • do not misuse alcohol.
Research recommendations

2. Which non-invasive tests are most accurate and cost-effective in identifying non-alcoholic fatty liver disease (NAFLD) in adults with risk factors, type 2 diabetes and metabolic syndrome?

3. Which non-invasive tests most accurately diagnose NAFLD and advanced liver fibrosis in children and young people?

4. Which non-invasive tests most accurately identify non-alcoholic steatohepatitis (NASH)?

Relative values of different diagnostic measures and outcomes
The GDG evaluated the evidence for both diagnostic tests assessing for at least 5% steatosis as well as those assessing for at least 30% steatosis. The threshold of greater than or equal to 5% steatosis was selected because at least 5% of hepatocytes containing fat on a liver biopsy sample is the conventional histological diagnostic criterion for hepatic steatosis. Greater than or equal to 30% steatosis was selected as it is broadly accepted that this is the threshold at which hepatic steatosis may generally be observed on ultrasonography, the conventional means by which fatty liver has typically been identified. The GDG observed that, whilst certain chronic liver pathologies may occur in a patchy fashion throughout the organ and therefore may potentially be missed on a liver biopsy (for example, regenerative cirrhotic nodules), people with NAFLD tend to have hepatic steatosis distributed reasonably evenly throughout the organ, meaning that liver biopsy is still widely accepted as the diagnostic ‘gold standard’ for NAFLD.
No studies were identified that used liver blood test measurements alone as a diagnostic test. However, the GDG noted that liver enzyme measurements form part of 3 of the diagnostic tests under evaluation in this review (FLI, NAFLD-LFS and SteatoTest). Evidence was identified and reviewed for all 3 of these tests.
The GDG acknowledged that the invasiveness of liver biopsy as the reference standard may contribute to lower numbers of people who may appear otherwise healthy being recruited to studies testing for NAFLD. This will also have an effect on the numbers that test negative for NAFLD and the specificity of the index tests.
Trade-off between clinical benefits and harms
The GDG noted that only 1 of the identified studies (which looked at the effectiveness of MRI) had been performed in children or young people. The likely explanation for this is that very few of the diagnostic techniques under investigation in this review have been validated within cohorts of children and young people with NAFLD. In addition, there is a high threshold to carry out liver biopsy to diagnose this condition in children and young people due to its invasiveness. Therefore, in this age group, this procedure is mainly carried out if there is diagnostic doubt or if there is concern about more advanced disease. It is consequently unlikely that studies matching the review protocol with liver biopsy as the reference standard would be widely conducted in this younger population.
Members of the GDG noted that, at present, the imaging tests included within this review may be difficult to access within primary care, with even ultrasound not always being easily accessible. Furthermore, the GDG expressed some concerns about the interpretation of results of these imaging tests; for example, the GDG noted a wide range of practice in the means by which fatty liver is identified by clinicians performing ultrasound, as there is no universally accepted definition on what exactly constitutes a diagnosis of steatosis on ultrasound.
The GDG also expressed concerns about certain practicalities regarding non-imaging diagnostic tests. For example, NAFLD-LFS includes measurement of fasting insulin and this is not a test typically performed routinely within primary care.
The GDG concluded that the identified studies provided evidence for all 4 of the imaging techniques under review (CAP, MRI, MRS and abdominal ultrasound) as being sufficiently effective tests for detecting both greater than or equal to 5% and greater than or equal to 30% steatosis in adults. This justified their inclusion within cost-effectiveness modelling. The opinion of the GDG was that MRI and MRS appeared to be the most accurate imaging techniques for diagnosing NAFLD in adults (for example, MRI and MRS were the only diagnostic tests under review with evidence for an AUC greater than or equal to 90% at both greater than or equal to 5% and greater than or equal to 30% steatosis), with MRI and MRS appearing to be of similar efficacy to each other. However, the GDG noted that MRS is still largely a research tool.
The GDG also reviewed the evidence for non-imaging based diagnostic tests for NAFLD in adults. Of these, it appeared that the FLI test was the most effective. However, the GDG noted that different studies showed variable specificity for the FLI test in diagnosing greater than or equal to 5% steatosis and that FLI appeared to demonstrate very high specificity but more limited sensitivity in diagnosing greater than or equal to 30% steatosis. On balance, however, it was agreed that the FLI test should be included within cost modelling. A threshold of FLI greater than 60 was felt the most appropriate one by the GDG as this was the score that generated maximum sensitivity for the diagnosis of hepatic steatosis in the reviewed literature.49
The GDG felt that SteatoTest and NAFLD-LFS appeared to be much less effective tests for diagnosing NAFLD in adults (noting the low specificity of SteatoTest and limited sensitivity of NAFLD-LFS in detecting greater than or equal to 5% steatosis, along with the poor sensitivity of SteatoTest and low specificity of NAFLD-LFS in detecting greater than or equal to 30% steatosis). Nevertheless, despite these concerns, the GDG concluded that the SteatoTest and NAFLD-LFS tests still, overall, had sufficient sensitivity and specificity to merit inclusion within the cost modelling.
The results of the base case scenario of the original economic model demonstrated that FLI was the most cost-effective test to use to diagnose NAFLD in adults who had type 2 diabetes or metabolic syndrome. Ultrasound was the next most cost-effective option after FLI, with all non-invasive testing strategies being close in terms of cost-effectiveness. However, concerns were discussed regarding the confidence intervals for the specificity of the FLI and it was agreed that, as a targeted case finding recommendation, it would not be appropriate to make a recommendation that could potentially misdiagnose a large number of people. The GDG discussed the disconnect between making a recommendation detailing in whom to suspect NAFLD (Chapter 5 on risk factors) but then not being able to recommend a specific test to confirm or disconfirm suspicion of NAFLD. However, given the concerns about large numbers of false positive misdiagnoses that could occur if the FLI performed at its lowest confidence interval (60%), a recommendation for a specific test to investigate NAFLD could not be made. The GDG were anxious to note that this potentially left primary care practitioners with no guidance on how to progress with patients they suspect have NAFLD, other than if fatty liver is discovered on incidental findings (e.g. ultrasound) when investigating for other health issues. In order to highlight the importance of finding this group of people who may now consequently not present until they have advanced liver disease, missing the opportunity for disease management, the GDG made high-priority research recommendations to identify the most accurate non-invasive tests to diagnose NAFLD in adults.
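The scale of the false positive concern can be illustrated with simple arithmetic. In the sketch below the function name and the 50% prevalence are hypothetical; the specificities of 87% and 60% are the point estimate and lower confidence limit reported in the evidence statements for FLI at a threshold of 60, which appears to be the 60% figure referred to above.

```python
def expected_false_positives(n_tested: int, prevalence: float, specificity: float) -> float:
    """Expected number of people without the condition who test positive."""
    if not (0 <= prevalence <= 1 and 0 <= specificity <= 1):
        raise ValueError("prevalence and specificity must be between 0 and 1")
    n_without_condition = n_tested * (1 - prevalence)
    return n_without_condition * (1 - specificity)


# Hypothetical cohort: 1,000 people tested, assumed prevalence of 50%.
at_point_estimate = expected_false_positives(1000, 0.5, 0.87)  # about 65 people
at_lower_limit = expected_false_positives(1000, 0.5, 0.60)     # about 200 people
```

Under these assumed inputs, moving from the point estimate to the lower confidence limit roughly triples the number of people who would be wrongly labelled as having NAFLD.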
The GDG discussed whether a recommendation was still warranted for children and young people. They expressed concern that FLI was not validated in children and young people and, as waist circumference (1 of the 3 components of the FLI) is not a reliable predictor of NAFLD in children and young people, agreed that it was not appropriate to extrapolate the evidence from the adult population for this particular test. Ultrasound is the next most cost-effective option and the GDG agreed it was widely accepted as an appropriate diagnostic tool for children and young people as there was no clinical reason to believe that the performance would differ in a younger population. As the prevalence of type 2 diabetes and metabolic syndrome, and hence NAFLD, is considerably lower in children and young people, it was agreed that ultrasound could be considered as a test for NAFLD in those who have type 2 diabetes or metabolic syndrome and do not misuse alcohol. It was agreed that as the health benefits of identifying NAFLD in children, where present, would extend over a longer time-horizon than adults, this test should be offered to children in which NAFLD is suspected. However, due to the uncertain evidence base, it was agreed that a research recommendation was also warranted for this population.
The GDG was informed by evidence from the review in Chapter 5 on risk factors for NAFLD, and by the economic model, about the frequency of retesting for presence of NAFLD in those who had a negative test result. This review suggested that it may be appropriate to consider retesting adults with NAFLD and the aforementioned risk factors every 6 years. The GDG considered whether this retesting frequency would be appropriate in children and young people. However, there was concern that children and young people are rapidly developing and experiencing hormonal changes which may affect their risk of developing NAFLD. Furthermore, type and volume of food intake and type and frequency of physical activity undertaken change immensely in younger people over short periods of time. For these reasons, the GDG agreed that a recommendation for retesting every 3 years in children and young people was warranted, based on expert opinion, so as not to miss the development of NAFLD.
Trade-off between net clinical effects and costs
No relevant published economic evaluations were identified.
Original cost-utility analysis was conducted for this guideline to address the questions in this review and also Chapters 5, 7 and 8.
This analysis found that FLI was the most cost-effective of the 8 diagnostic tests and 2 non-testing strategies being compared for adults, followed by ultrasound, MRS and MRI. The GDG noted that the cost-effectiveness of all the tests was similar, as the overall difference in future health (in QALYs) for a person following the addition of testing with 1 or other of these different tests was small. A probabilistic sensitivity analysis showed that 5 of the 8 tests (along with not testing and not treating anyone for NAFLD) could be the preferred strategy within the bounds of 95% confidence. The probabilistic sensitivity analysis conducted showed FLI was the preferred option in 34% of simulations for metabolic syndrome with 6 years retesting, with MRI being preferred in 26% and no testing being most cost-effective in 15%.
The model showed that the difference in the cost-effectiveness of different retesting intervals was small, but 6-yearly retesting was favoured for the base case (ICERs for people with type 2 diabetes: £13,538 per QALY gained for 6-yearly retesting compared to 7-yearly retesting, but £66,451 per QALY gained for 5-yearly testing compared to 6-yearly testing; with similar figures for metabolic syndrome).
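The retesting comparison rests on incremental cost-effectiveness ratios. A minimal sketch follows, assuming the standard definition (extra cost divided by extra QALYs for one strategy over the next-best alternative); the function names are illustrative, and the two ICER values checked at the end are the type 2 diabetes figures quoted above.

```python
THRESHOLD_GBP_PER_QALY = 20_000  # threshold used throughout this chapter


def icer(delta_cost: float, delta_qalys: float) -> float:
    """Incremental cost per QALY gained for one strategy over its comparator."""
    if delta_qalys <= 0:
        raise ValueError("ICER is only meaningful here for a positive QALY gain")
    return delta_cost / delta_qalys


def is_cost_effective(icer_value: float, threshold: float = THRESHOLD_GBP_PER_QALY) -> bool:
    """True when the extra QALYs are bought at or below the threshold price."""
    return icer_value <= threshold


# ICERs reported in the text for people with type 2 diabetes.
six_vs_seven_yearly = is_cost_effective(13_538)   # True: moving to 6-yearly is worthwhile
five_vs_six_yearly = is_cost_effective(66_451)    # False: moving to 5-yearly is not
```

This is why 6-yearly retesting is favoured: stepping from 7-yearly to 6-yearly retesting falls below the threshold, whereas the further step to 5-yearly retesting does not.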
The GDG noted that the cost of conducting FLI is very small, and the difference in costs between strategies largely consists of the treatments given and additional further tests undertaken by people after they are diagnosed with NAFLD. The average health benefit per person is also small. However, the model does assume in the no testing strategy that, not only are people not tested at the start, but they will not be tested for NAFLD, advanced fibrosis or cirrhosis at any later stage, and they will not need to receive any treatment (and so incur any cost) unless and until they reach the stage of symptomatic decompensated cirrhosis. This is likely to be a cautious assumption, diminishing the effectiveness of testing compared to no testing.
The GDG also considered the practical feasibility of offering alternative diagnostic tests. The GDG noted that FLI could be easily conducted by GPs in primary care, without referring individuals to secondary care for initial testing. In contrast, ultrasound is not routinely accessible in most primary care practices, and so a recommendation to use ultrasound for diagnosis would require the referral of a very large number of people to secondary services. Since such services do not currently have spare ultrasound capacity, this would require a large upfront increase in ultrasound equipment and personnel in order to fulfil such a recommendation. MRI has the same disadvantages, but more so, as an upfront increase in capacity would be even more expensive. In addition, it is likely that some people would either be unwilling to be referred or would not take up their appointment in secondary care, and so a strategy where consultation and testing can be conducted in a GP surgery within a single visit is likely to maximise the number of people taking up the offer of diagnostic testing. Therefore, not only is FLI the most cost-effective of the diagnostic tests, it would also be the easiest to practically implement.
The GDG explored the robustness of the results of the original economic analysis by conducting extensive sensitivity analyses.
The base case assumes a very modest benefit from a lifestyle modification programme (see Chapter 13), involving a benefit to quality of life during the year of the intervention, but no lasting benefit. The effect of not offering lifestyle modification was modelled, and testing was still preferred to no testing. It was noted that a similar programme would probably already be offered to patients with the underlying condition and no NAFLD. However, the GDG considered that the current national uptake of such advice is poor, and that people are more likely to take up a lifestyle modification intervention, and to benefit more from it, if they are also diagnosed as having NAFLD.
In the base case, it was assumed that people offered a test would not require an additional GP appointment since this could be conducted during a routine appointment as part of the management of their underlying condition (such as type 2 diabetes). However, this was tested by adding in the cost of an additional appointment in each testing cycle for people receiving diagnostic tests to take into account the extra time needed to discuss the purpose, benefits and limitations of NAFLD testing with the patient. This made a small difference to overall costs but did not alter the order of ranking of the strategies.
The parameter of NAFLD prevalence in every risk factor group was also discussed, since its reported values varied depending on the source. To avoid using contradictory prevalence data, a single source, which reported data for all risk factors, was selected. It was noted that, although the true values of this parameter could indeed be lower or higher, the effect of lowering or increasing the prevalence on the results was small to moderate.
The GDG also noted the relatively high proportion of deaths of people in the model from liver-related causes. The GDG believed this is consistent with recent studies following up the causes of death of people with NAFLD,13,44,232 but was aware that this is a higher death rate than is typically reported in national mortality statistics. The GDG believed that this is likely to be due to systematic underreporting of liver problems as a cause of death, combined with a higher rate of death from liver-related causes in people with type 2 diabetes than in the general population (2.5 times the average rate36). Additionally, deaths from liver-related causes continue to rise whilst deaths from other causes (notably cardiovascular disease) continue to fall, and this model predicts that deaths from liver-related causes will continue to rise in the future. The GDG was also aware that people with NAFLD typically have a higher risk of death from cardiovascular causes than the general population (HR 1.55 according to 1 recent study44). As the rates of other (not liver-related) death in this model were taken from the general (age-related) population, these may underestimate the risk of other-cause death. To test the possible impact of these factors, sensitivity analyses were undertaken decreasing liver-related mortality rates; increasing other-cause mortality rates; decreasing the progression of people from each stage of NAFLD or fibrosis to the next, thus lowering the number of people with cirrhosis and so dying from cirrhosis; and combinations of these. Only when the transition from advanced fibrosis to cirrhosis was decreased by half did no testing become cost-effective compared to testing.
Finally, the GDG noted that the age for starting the model was set at 45 in the base case, in line with the average age of people receiving diagnostic tests in the studies included in the clinical review of accuracy of the diagnostic tests. However, this may be lower than would be expected for people with metabolic syndrome or type 2 diabetes. The NICE Type 2 diabetes guideline NG28 used an average age at diagnosis of type 2 diabetes of 57.8 years. If the starting age is increased in the economic model, testing (using FLI) becomes decreasingly cost-effective compared to no testing. The ICER for FLI compared to no testing at 58 years is £17,514 per QALY gained. If additional variations are made to the model favouring no testing (such as increasing the number of GP appointments or removing the effect of lifestyle modification), this ICER increases further, making FLI less cost-effective. In considering these analyses, the GDG noted that testing is cost-effective at a threshold of £20,000 per QALY under the base case conditions it pre-specified, but not under all sensitivity analyses; in particular, not if the average age of people when first tested were to be increased to 58 or higher while also removing the lifestyle modification intervention from the patient pathway. As a result, the GDG could not be certain whether a strategy of testing everyone with type 2 diabetes or metabolic syndrome for NAFLD would be cost-effective for those specific populations. Therefore, due to the variation in the cost-effectiveness results for all tests under these scenarios and the pre-specified uncertainty in the underlying evidence base (FLI diagnostic accuracy, including uncertainty in the specificity), testing for NAFLD was not recommended.
No economic analysis was conducted relating to children and young people under 18 years due to a lack of data on the diagnostic accuracy of tests for NAFLD in under 18s. However, the GDG noted that children and young people have a longer potential life ahead of them, and hence successful treatment due to early identification would be expected to lead to a greater potential future benefit in terms of QALYs gained compared to adults. Combined with the considerations noted above of potential faster progression of liver disease in this group, the GDG considered that this was likely to make it cost-effective to retest children and young people at risk of NAFLD more frequently than for adults.
Quality of evidence
Many of the included studies enrolled large numbers of people with NAFLD, although the quality of the assessed evidence was all low or very low by GRADE criteria. Much of the evidence was assessed as being at serious or very serious risk of bias due to issues with patient selection, unclear reporting on whether the index test results were interpreted without knowledge of biopsy findings, lack of pre-specified thresholds for the index tests and unclear timing between index test and biopsy. The evidence was further downgraded for imprecision around the effects and heterogeneity of results.
In addition, the GDG observed that the identified studies used a range of thresholds to maximise diagnostic accuracy within their study populations, with the thresholds rarely pre-specified. This may reflect that some of the diagnostic methods being assessed are relatively new technologies, which operators are still learning to use and for which normal ranges have not yet fully been defined. The following ranges of thresholds were used for each test:
  • When diagnosing steatosis greater than or equal to 5%: CAP: 219–289 dB/m (median 250); FLI: 60–79; MRI: 0–5.7% (median 3.42); MRS: 1.8–4.73 (median 2.6); NAFLD-LFS: 0.16; SteatoTest: 0.38; ultrasound: unclear threshold interpretations.
  • When diagnosing steatosis greater than or equal to 30%: CAP: 230–311 dB/m (medians 285 and 288); FLI: 82–93.9; MRI: 6.5%; MRS: 2.7–10.2 (median 7.7); NAFLD-LFS: 0.16; SteatoTest: 0.69–0.94; ultrasound: unclear threshold interpretations.
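The dependence of diagnostic accuracy on the chosen threshold, and hence the concern about thresholds that were not pre-specified, can be illustrated with a short sketch. The function name is hypothetical, and the CAP readings and biopsy results below are invented purely for illustration; they are not data from any included study. Only the two thresholds (250 and 289 dB/m) come from the range listed above.

```python
def sensitivity_specificity(values, has_steatosis, threshold):
    """Sensitivity and specificity of 'value >= threshold' against a reference standard."""
    pairs = list(zip(values, has_steatosis))
    tp = sum(1 for v, d in pairs if d and v >= threshold)
    fn = sum(1 for v, d in pairs if d and v < threshold)
    tn = sum(1 for v, d in pairs if not d and v < threshold)
    fp = sum(1 for v, d in pairs if not d and v >= threshold)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one person with and one without the condition")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical CAP readings (dB/m) and biopsy-confirmed steatosis status.
cap_readings = [210, 230, 255, 270, 295, 310]
biopsy_positive = [False, False, True, False, True, True]

at_250 = sensitivity_specificity(cap_readings, biopsy_positive, threshold=250)
at_289 = sensitivity_specificity(cap_readings, biopsy_positive, threshold=289)
```

In this invented sample the lower threshold detects every case but misclassifies one person without steatosis, while the higher threshold does the reverse. A threshold chosen after seeing the data can therefore make a test appear more accurate than it would be in a new population.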
The original economic analysis developed for this guideline was of high quality, being directly applicable and with minor limitations.
Other considerations
Although there are many causes for steatosis and consequently a potentially wide differential diagnosis in practice, the principal differential is between primary, metabolic syndrome-related NAFLD and alcohol-related liver disease. Discriminating between these relies upon a detailed history and seeking corroboration from family members where available to ensure that any history of occult excessive alcohol consumption is excluded. Ethanol consumption below a threshold determined by the prevailing low risk drinking advice in a country (possibly in line with WHO descriptions of hazardous drinking levels) is adopted to sustain a diagnosis of NAFLD. It was noted that this guidance, issued by the Chief Medical Officer, has recently changed for the UK. For people with abnormal liver blood tests and either suspected or confirmed fatty liver, alternative causes must be excluded with a detailed drug history and laboratory tests for chronic viral hepatitis (HBsAg and HCV serology), autoimmune liver disease (ANA, AMA, SMA, LKM1 antibodies, immunoglobulins) and other treatable metabolic diseases (haemochromatosis, Wilson's disease, coeliac disease, alpha-1 antitrypsin deficiency). The GDG agreed that clinicians using these guidelines would be expected to apply their clinical discretion regarding the appropriate degree of further investigation for possible alternative causes for the finding of fatty liver, based on the specific clinical scenario, before confirming a diagnosis of NAFLD.
The GDG discussed the role of liver blood tests in assessing for the presence of NAFLD. Specifically, the GDG noted that even though NAFLD is a very common cause of abnormal liver blood tests, published data have demonstrated that the majority of people with NAFLD (more than 70%) in fact have normal serum liver enzyme levels.20 The GDG's shared experience was that many clinicians have the misperception that normal liver blood tests are incompatible with a person having NAFLD. The GDG agreed it was important to emphasise in the recommendations that clinicians should not rely on liver blood tests to rule out NAFLD.
The GDG noted that, although alcohol-related liver disease is considered to be rare in children and young people, the risks of alcohol intake should still be considered in a paediatric population and should be part of the clinical consultation; therefore, hazardous drinking should be taken into account when diagnosing NAFLD.

Research recommendation
The GDG made 3 high-priority research recommendations: one to identify the most accurate non-invasive tests to diagnose NAFLD in adults, a separate one for children and young people, and a further one for the diagnosis of advanced fibrosis. See Appendix Q for further details.
It was highlighted that research contributing to the evidence base for the diagnosis of NAFLD in adults could make that evidence base sufficiently robust to inform future updates of this guideline, and would be a high priority for addressing the current gap in the pathway for adults.
Copyright © National Institute for Health and Care Excellence 2016.
Bookshelf ID: NBK384715
