Comparison of acupuncture with other physical treatments for pain caused by osteoarthritis of the knee: a network meta-analysis

Hugh MacPherson; Andrew Vickers; Martin Bland; David Torgerson; Mark Corbett; Eldon Spackman; Pedro Saramago; Beth Woods; Helen Weatherly; Mark Sculpher; Andrea Manca; Stewart Richmond; Ann Hopton; Janet Eldred; Ian Watt

MacPherson H, Vickers A, Bland M, et al. Acupuncture for chronic pain and depression in primary care: a programme of research. Southampton (UK): NIHR Journals Library; 2017 Jan. (Programme Grants for Applied Research, No. 5.3.)

Chapter 3: Comparison of acupuncture with other physical treatments for pain caused by osteoarthritis of the knee: a network meta-analysis

Background

Osteoarthritis is a degenerative condition involving the progressive wearing down of (joint) bone and cartilage, normally resulting in pain, stiffness and functional disability. These symptoms usually worsen according to how much the affected joint is used. In adults aged ≥ 45 years, the knee represents the most common site of peripheral joint pain and the prevalence of painful, disabling knee osteoarthritis in people aged > 55 years is 10%.¹⁵⁸ Risk factors for knee osteoarthritis include age, sex, obesity, bone density, genetic factors and injury.

Diagnosis is usually made using clinical features of knee osteoarthritis, by radiological assessment of the knee or by a combination of the two. Radiographic features – the severity of which is commonly summarised using the Kellgren and Lawrence score¹⁵⁹ – have been significantly associated with knee pain.¹⁶⁰

The WOMAC index is a self-administered disability status measure for knee (or hip) osteoarthritis.¹¹⁵ Its individual components assess pain, stiffness and function, with the summed scores producing an overall measure of disability (WOMAC index). As a standardised and comprehensive assessment of disability and its components, the WOMAC index increases transparency and comparability within clinical research.

The treatment of knee osteoarthritis should be tailored according to knee risk factors (obesity, adverse mechanical factors, physical activity), general risk factors (age, comorbidity, polypharmacy), level of pain intensity and disability, sign of inflammation, and location and degree of structural damage.¹⁶¹ The main objective of a GP treating a patient with knee osteoarthritis is normally alleviation of pain; failure to control pain may result in reduced mobility and daily activities, leading to a reduction in quality of life.¹⁶¹ The more sedentary lifestyle that might follow may, in turn, exacerbate the symptoms of knee osteoarthritis through lack of exercise and joint movement, and weight gain.

In clinical practice, treatment often begins with analgesia [paracetamol and/or topical non-steroidal anti-inflammatory drugs (NSAIDs)] and, when these are ineffective, a cyclooxygenase-2 inhibitor is recommended. GP advice about exercise and weight loss, which NICE guidelines⁸² recommend as part of core therapy, is often given in addition to (rather than instead of) analgesic drugs. The regular and long-term use of pharmacological agents such as NSAIDs for pain may be associated with side effects, such as gastrointestinal bleeding, without necessarily resulting in worthwhile pain reduction.¹⁶² A UK review of qualitative studies of medicine taking¹⁶³ revealed considerable reluctance to take drugs and a preference to take as little as possible; many knee osteoarthritis patients want non-pharmacological treatments for pain relief.¹⁶⁴ The use of physical (i.e. non-pharmacological) treatments such as acupuncture is therefore likely to be attractive for patients seeking alternatives, particularly for a condition such as osteoarthritis of the knee for which there is currently no cure.

In patients for whom insufficient pain relief has been provided by the core interventions mentioned above (as recommended by NICE), coupled with paracetamol and/or topical NSAIDs, GPs may consider a range of physical treatments as the next step in the treatment pathway. The NICE guidelines⁸² list muscle-strengthening and aerobic exercise, manual therapy, TENS, braces and insoles, weight loss, and heat and cooling treatments as being among such alternatives, but acupuncture was not recommended.

Other non-pharmacological interventions used for osteoarthritis of the knee, but which would not be considered as alternatives to acupuncture, include surgery, an intervention that would be considered at a later stage in the treatment pathway. Similarly, structured psychosocial/educational interventions are generally considered for a different group of patients, that is, when pain-reducing therapies have failed and the emphasis is on a need for pain-coping skills, rather than pain reduction.¹⁶⁵

Many reviews have been undertaken of the varying types of physical therapies for osteoarthritis of the knee, but evaluation of a single therapy for a single condition provides only a limited basis for decision-making. Few randomised trials have directly compared physical therapies and no review has attempted to address the question of how effective such treatments are relative to each other using statistical methods. The focus of interest within our study was on acupuncture, as this review was funded as part of this programme of projects on acupuncture and chronic pain, and because of the uncertainty within the NICE decision-making process with regard to the level of evidence on acupuncture for osteoarthritis relative to other physical treatments.⁸⁴ The purpose of this systematic review, therefore, was to comprehensively synthesise both the direct and the indirect evidence – using mixed-treatment comparison methods in a network meta-analysis – to compare the effectiveness of different physical therapies used for the alleviation of knee pain caused by osteoarthritis.

In a separate substudy within this project we summarise the reporting methods of the WOMAC pain subscale and the WOMAC index from the trials identified for the main study, and make recommendations to improve reporting in future studies.

Methods

Using the methods recommended by the Centre for Reviews and Dissemination (CRD)¹⁶⁶ and the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement,¹⁶⁷ the systematic review and network meta-analysis was first conducted in 2010. A report based on this study is available on the CRD website.¹⁶⁸ This chapter reports an update of this systematic review and network meta-analysis conducted in 2013 and published in Osteoarthritis and Cartilage.¹⁶⁹

Literature search

We searched 17 electronic databases from inception to June 2013. The strategy combined relevant free-text terms, synonyms and subject headings relating to osteoarthritis of the knee and named physical treatments. A search filter was used to limit retrieval of studies to RCTs. No language or date restrictions were applied.

The base search strategy developed in MEDLINE was translated to run on the databases listed in Appendix 3. Adaptations to the search strategy were necessary for certain databases: Manual, Alternative and Natural Therapy Index System (MANTIS), PASCAL (database of the Institut de l’Information Scientifique et Technique), Inside Conferences, Physiotherapy Evidence Database (PEDro), CAMbase (Complementary and Alternative Medicine), Latin American and Caribbean Health Sciences Literature (LILACS) and ClinicalTrials.gov. Supplementary internet searches of websites relating to osteoarthritis were undertaken to locate any additional studies not found from the database searches. The bibliographies of all relevant reviews and guidelines were checked for further potentially relevant studies. The base MEDLINE search strategy can be found in Appendix 3.

Study selection and definitions of interventions and outcome

All abstracts were screened by two reviewers independently, followed by all relevant full papers. Disagreements were resolved by discussion or, when necessary, by a third reviewer. We included RCTs in adults with osteoarthritis of the knee (in which the mean age of the population was ≥ 55 years) that assessed pain as an outcome. Studies with mixed populations (e.g. including both patients with osteoarthritis of the knee and those with osteoarthritis of the hip) that presented results by site of osteoarthritis were eligible for inclusion. Trials of acute knee pain or trials in which the mean age of the population was < 55 years were excluded.

We included treatment with the following: acupuncture, balneotherapy, braces, aerobic exercise, muscle-strengthening exercise, heat treatment, ice/cooling treatment, insoles, interferential therapy, laser/light therapy, manual therapy, neuromuscular electrical stimulation (NMES), pulsed electrical stimulation (PES), pulsed electromagnetic fields (PEMFs), static magnets, t’ai chi, TENS and weight loss. We aimed not to be restrictive with regard to selecting the types of intervention within these categories. However, exercise interventions that were predominantly home based and unsupervised were excluded as being too similar to standard care. Trials evaluating surgery or medication were also excluded, as were studies evaluating the combination of two or more physical treatments and studies comparing only different regimens/durations/modalities of the same type of intervention.

When considering the electrotherapy interventions, we classed studies using ‘pulsed short-wave’ interventions as being PES. Although interferential therapy works in a similar way to TENS, we classed it as a distinct intervention. Similarly, NMES was considered separately from TENS as it is commonly used to elicit muscle contraction, as opposed to TENS, which stimulates nerves with the aim of blocking pain inputs to the brain. PEMFs (in which an electric current is generated in the treated area by means of a magnetic field) were also classed separately.

We classified adjunctive components to the main intervention into five categories, based on what was reported in the trials: (1) treatment as usual, (2) treatment as usual plus specified home exercise or education, (3) treatment as usual plus specified (trial-specific) analgesics, (4) no medication and (5) no medication plus specified home exercise or education. Using this coding we explored the impact of different adjunctive components on the main interventions and on variations in standard care.

Eligible comparators included any form of standard or usual care, including waiting-list control (which could incorporate one or more of analgesics, education and exercise advice), all of which we classed as being ‘standard care’. Placebo interventions, no intervention and sham acupuncture were also eligible. Because of evidence suggesting that sham acupuncture is more active than an inert ‘placebo’, it was treated as a separate comparator.¹⁴⁵ Commonly, sufficient details about standard care or usual treatment were not reported.

Data extraction

Using a standardised data extraction form created using EPPI-Reviewer software (version 4.0; Evidence for Policy and Practice Information and Co-ordinating Centre, University of London, London, UK), data were extracted on population characteristics [population type, method of diagnosis, age, sex, weight, body mass index (BMI) and Kellgren and Lawrence score], intervention parameters and study quality. Data on pain assessment at baseline, at the end of treatment and at all subsequent time points were extracted onto a Microsoft Excel® 2010 spreadsheet (Microsoft Corporation, Redmond, WA, USA). Data extraction was performed by one reviewer and independently checked by a second reviewer. Any disagreements were resolved by discussion or by a third reviewer when necessary. Data from non-English-language papers were extracted by one reviewer together with a native speaker. Multiple publications of the same study were extracted as one study, using all of the information available.

Assessment of trial quality

Trial quality was assessed using 14 questions adapted from a checklist used in a previous review by researchers at the CRD.¹⁷⁰ Based on the number of criteria satisfied, studies were then graded as excellent, good, satisfactory or poor. To be of satisfactory quality studies had to report the number of randomised participants; have groups with comparable baseline characteristics for important variables, such as pain; adequately report eligibility criteria; clearly report on losses to follow-up; report data for the intention-to-treat population; and use an appropriate placebo (if relevant). Poor-quality trials were those that failed to satisfy one or more of the criteria required for satisfactory study quality. Beyond questions relating to the above grading, other questions in the assessment covered methods of randomisation and allocation concealment, level of blinding, use of a power calculation and level of losses to follow-up. A further quality assessment was conducted using the Cochrane risk-of-bias tool.¹⁷¹ Quality assessments were performed by one reviewer and independently checked by a second reviewer. Any disagreements were resolved by discussion or by a third reviewer when necessary.

Outcomes and data transformations

It was anticipated that pain – our primary outcome – would be measured using a variety of measures, for example a VAS, Likert scale, WOMAC pain subscale and Arthritis Impact Measurement Scale (AIMS), with all scales accepted.

The WOMAC is a widely used, self-administered health status measure that assesses the dimensions of pain, stiffness and physical function in patients with osteoarthritis of the hip or knee. It is available in five-point Likert, 11-point numerical rating and 100-mm VAS formats. Under each dimension there are a number of questions designed to assess the clinical severity of the disease (five questions for pain, two questions for stiffness and 17 questions for physical function). The patient’s response to each question produces a score, with the scores summed to derive an aggregate score for each dimension. There are three subscale scores (pain, stiffness and physical function) and a total score (WOMAC index), which reflects disability overall.

The WOMAC pain score range has been reported in various ways: a VAS 0–10 scale (commonly reported across a 0–50 range), a VAS 0–100 scale (commonly reported as a 0–500 range) or a Likert scale (commonly reported as a 0–20 range). The overall WOMAC score (index) is determined by summing the scores across the three dimensions and the score range includes the following: a VAS 0–10 scale (commonly reported as a 0–240 range), a VAS 0–100 scale (commonly reported as a 0–2400 range) and a 0–4 Likert scale (commonly reported as a 0–96 range). A number of transformations and modifications are reported in the literature.
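The score ranges quoted above follow directly from the number of items in each dimension multiplied by the maximum score per item. The short sketch below reproduces that arithmetic; the function and dictionary names are illustrative only, and the linear rescaling to 0–100 is shown as one possible transformation rather than the one used by any particular trial.

```python
# Item counts per WOMAC dimension, as stated in the text above.
ITEMS = {"pain": 5, "stiffness": 2, "function": 17}

# Maximum score per item for each response format described above.
ITEM_MAX = {"likert_0_4": 4, "vas_0_10": 10, "vas_0_100": 100}


def score_range(dimensions, response_format):
    """Return (min, max) of the summed score over the given dimensions."""
    n_items = sum(ITEMS[d] for d in dimensions)
    return 0, n_items * ITEM_MAX[response_format]


def rescale_to_100(score, dimensions, response_format):
    """Linearly rescale a summed score to a 0-100 range (illustrative)."""
    _, top = score_range(dimensions, response_format)
    return 100.0 * score / top


if __name__ == "__main__":
    all_dims = ["pain", "stiffness", "function"]
    for fmt in ITEM_MAX:
        print(fmt, score_range(["pain"], fmt), score_range(all_dims, fmt))
```

Because the same subscale can legitimately span 0–20, 0–50 or 0–500, a reported mean is uninterpretable unless the response format and score range are stated alongside it.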

The preferred measure of pain was the WOMAC pain scale (using either a VAS or a Likert scale). Another pain scale was included in the analysis when a trial did not measure the WOMAC pain scale, with prioritisation of pain scales made on a clinical, or prevalence, basis. The secondary outcome was the WOMAC index. Studies that did not report a pain outcome were excluded from the review. Outcome data were extracted for different time points: baseline, end of treatment and any follow-up time point.

As a variety of pain scales were used, Hedges’ g SMDs between treatment groups were calculated for the meta-analyses (studies reporting medians could not be analysed). Different doses/regimens of the same type of treatment within a study were pooled. Final values were used in the analysis to maximise the evidence available and to avoid the need to make assumptions about within-patient correlation between baseline and final values, which the use of change from baseline data would have necessitated. For trials reporting change from baseline but not final values, we calculated final values provided baseline data were reported along with variance estimates (e.g. SDs). When the number of patients included in a trial’s analysis was not reported, but the number of patients randomised was, we estimated the number of analysed patients by multiplying the number of patients randomised by the average proportion of patients included in an analysis across trials. SEs or 95% CIs were used to derive SDs when they were not reported. When this was not possible, trials that used the same or a similar scale as that used in the trial with missing SDs were identified and their SDs pooled, with this imputed estimate being used. We present results as SMDs, as well as SMDs converted to the WOMAC VAS 0–100 pain scale, to provide more clinically meaningful results.
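The transformations described above can be sketched as follows. This is a minimal illustration using textbook formulae, not the review's own code: the variance expression for Hedges' g is one common large-sample approximation, the 1.96 multiplier assumes a normal-based 95% CI, and all function names are hypothetical.

```python
import math


def sd_from_se(se, n):
    """Recover an SD from the standard error of a mean."""
    return se * math.sqrt(n)


def sd_from_ci(lower, upper, n, z=1.96):
    """Recover an SD from a 95% CI of a mean (normal approximation assumed)."""
    return sd_from_se((upper - lower) / (2 * z), n)


def hedges_g(m1, sd1, n1, m2, sd2, n2):
    """Return (g, se) for the difference in final values between two arms."""
    df = n1 + n2 - 2
    pooled_sd = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df)
    d = (m1 - m2) / pooled_sd          # Cohen's d
    j = 1 - 3 / (4 * df - 1)           # small-sample correction factor
    var_d = (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))
    return j * d, j * math.sqrt(var_d)


if __name__ == "__main__":
    # Hypothetical arms: treatment mean 40 (SD 10), control mean 50 (SD 10).
    print(hedges_g(40, 10, 20, 50, 10, 20))
```

Expressing every trial as a unit-free SMD is what allows VAS, Likert and WOMAC results to enter the same analysis.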

Evidence synthesis

A network meta-analysis draws on both direct (treatments compared in the same trial) and indirect evidence (different treatments studied in separate trials, but compared when they share the use of a common comparator treatment). The summary treatment effect from each study is utilised, so the benefit of randomisation in each study is retained. To conduct a meta-analysis of trials, study characteristics must be similar within a comparison. For indirect and direct evidence to be consistent, study characteristics must be similar across comparisons.⁸⁵,⁸⁶,¹⁷²–¹⁷⁵
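The idea of indirect evidence can be illustrated with the simplest possible case: two treatments, A and B, each compared with a common comparator C. The sketch below uses the standard adjusted indirect comparison and an inverse-variance average; it is a simplified frequentist illustration under stated assumptions, not the Bayesian random-effects model fitted in this review, and the function names are hypothetical.

```python
import math


def indirect_estimate(d_ac, se_ac, d_bc, se_bc):
    """Indirect A-vs-B effect from A-vs-C and B-vs-C (common comparator C)."""
    d_ab = d_ac - d_bc
    se_ab = math.sqrt(se_ac**2 + se_bc**2)  # variances add across trials
    return d_ab, se_ab


def combine(d_direct, se_direct, d_indirect, se_indirect):
    """Inverse-variance weighted average of direct and indirect estimates."""
    w_dir = 1 / se_direct**2
    w_ind = 1 / se_indirect**2
    d = (w_dir * d_direct + w_ind * d_indirect) / (w_dir + w_ind)
    return d, math.sqrt(1 / (w_dir + w_ind))


if __name__ == "__main__":
    # Hypothetical SMDs versus standard care (C).
    print(indirect_estimate(-0.8, 0.3, -0.3, 0.4))
```

The indirect estimate is always less precise than either of its source comparisons, which is why combining it with direct evidence is only sensible when the two are consistent.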

We planned analyses for three different time points to assess both the immediacy and the durability of effects: (1) end of treatment, which was our primary time point, as defined in the individual studies; (2) 3 months from the start of treatment, which was the time point closest to 3 months from the start of treatment (excluding outcomes recorded at < 4 weeks from the start of treatment); and (3) 3 months after the end of treatment, which was the time point closest to 3 months, but between 8 and 16 weeks, from the end of treatment. However, there was a paucity of long- and medium-term data across the trials, and for the 3 months after the end-of-treatment time point no connected network incorporating acupuncture existed and this time point was evaluated by only 21 trials. Data for the 3 months from the start of treatment analysis were very similar to data for the end-of-treatment analysis because for around two-thirds of trials the two time points were the same. Furthermore, in most studies, the primary time point specified by investigators was the end of treatment and this time point produced the largest network, incorporating more interventions, and studies, than the other time points. We therefore report the results for the end-of-treatment time point.
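The time-point rules above amount to a small selection procedure applied to each trial's assessment schedule. The sketch below shows one way to express it; treating 3 months as 13 weeks is an assumption made here for illustration, and the function names are hypothetical.

```python
THREE_MONTHS_WEEKS = 13.0  # assumption: 3 months taken as 13 weeks


def closest_from_start(weeks_from_start, target=THREE_MONTHS_WEEKS):
    """Assessment closest to 3 months from start, ignoring those at < 4 weeks."""
    eligible = [w for w in weeks_from_start if w >= 4]
    if not eligible:
        return None
    return min(eligible, key=lambda w: abs(w - target))


def closest_after_end(weeks_after_end, target=THREE_MONTHS_WEEKS):
    """Assessment closest to 3 months after end, restricted to 8-16 weeks."""
    eligible = [w for w in weeks_after_end if 8 <= w <= 16]
    if not eligible:
        return None
    return min(eligible, key=lambda w: abs(w - target))


if __name__ == "__main__":
    print(closest_from_start([2, 6, 12, 24]))
    print(closest_after_end([4, 10, 26]))
```

A trial with no eligible assessment returns `None` and so contributes nothing to that time point, which is how a network can shrink or become disconnected at later follow-up.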

To evaluate the impact of study quality on the results, two sets of analyses were performed: one included all studies regardless of quality (labelled ‘any quality’) and one was a primary sensitivity analysis including only studies of satisfactory, or better, quality (labelled ‘better quality’). Studies with atypical populations, interventions or results were excluded in a second sensitivity analysis. These studies were identified from pairwise meta-analyses conducted (in RevMan 5.0; The Cochrane Collaboration, The Nordic Cochrane Centre, Copenhagen, Denmark) using outcomes recorded at the end of treatment only. These studies were not intended as a comprehensive stand-alone synthesis, but as a means of informing and complementing the network meta-analysis. In particular, they were used to investigate the within-intervention clinical and statistical heterogeneity. We assessed for possible publication bias using a funnel plot when enough studies within individual treatments were available. This was deemed to be appropriate only for muscle-strengthening exercise and the funnel plot provided no evidence to suggest publication bias.¹⁶⁸

Analyses were conducted using WinBUGS software (version 1.4; MRC Biostatistics Unit, Cambridge, UK), which uses Markov chain Monte Carlo (MCMC) simulation to estimate model parameters and follows a Bayesian approach in which prior probabilities are specified for parameters (these were specified to be vague throughout the analysis). The treatment difference was assumed to be normally distributed and a random-effects network meta-analysis model was selected as clinical and methodological heterogeneity within the treatment definitions appeared likely.¹⁷⁶ A common between-study variance was modelled so that a between-study variance could be applied even to comparisons with few data points.

Convergence of the MCMC chains was assessed by observing the history of the traces of the starting values for selected priors, the Brooks–Gelman–Rubin statistic and posterior distributions.¹⁷⁷ The first 10,000 iterations were discarded and then a further 50,000 iterations were conducted on which parameter estimates were based. The model fit was evaluated using the residual deviance, with this being approximately equal to the number of data points if the fit was good.¹⁷⁴,¹⁷⁸–¹⁸⁰
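To make the convergence check concrete, the sketch below discards a burn-in period and computes the classic Gelman–Rubin potential scale reduction factor from several chains. This variance-based form is shown for illustration only; the diagnostic produced by WinBUGS may be computed differently, and the function names are hypothetical.

```python
import statistics


def discard_burn_in(chain, burn_in):
    """Drop the first `burn_in` iterations of a chain."""
    return chain[burn_in:]


def gelman_rubin(chains):
    """Potential scale reduction factor for two or more equal-length chains.

    Values close to 1 suggest the chains are sampling the same distribution.
    Assumes the within-chain variance is non-zero.
    """
    n = len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    within = statistics.fmean([statistics.variance(c) for c in chains])
    between = n * statistics.variance(chain_means)
    pooled_var = (n - 1) / n * within + between / n
    return (pooled_var / within) ** 0.5


if __name__ == "__main__":
    # Toy chains: 60,000 iterations each, first 10,000 discarded.
    chain_a = discard_burn_in([i % 2 for i in range(60000)], 10000)
    chain_b = discard_burn_in([(i + 1) % 2 for i in range(60000)], 10000)
    print(len(chain_a), gelman_rubin([chain_a, chain_b]))
```

With the settings reported above, each retained chain would hold the 50,000 post-burn-in iterations on which the parameter estimates were based.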

Inconsistency between the treatment effect estimates derived separately from direct and indirect evidence was assessed for many of the comparisons distributed across the networks using the node-splitting method, in which the p-value is calculated as 2 × min(prob, 1 – prob), where ‘prob’ is the probability that the direct estimate is higher than the indirect estimate.¹⁷²,¹⁷⁹
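The node-splitting p-value can be computed directly from posterior draws of the direct and indirect estimates. The sketch below assumes that paired draws from the same MCMC iterations are available as two lists; that data layout and the function name are assumptions made for illustration.

```python
def node_split_p_value(direct_draws, indirect_draws):
    """Two-sided p-value: 2 * min(prob, 1 - prob).

    `prob` is the proportion of iterations in which the direct estimate
    exceeds the indirect estimate.
    """
    if len(direct_draws) != len(indirect_draws) or not direct_draws:
        raise ValueError("draws must be non-empty and of equal length")
    higher = sum(d > i for d, i in zip(direct_draws, indirect_draws))
    prob = higher / len(direct_draws)
    return 2 * min(prob, 1 - prob)


if __name__ == "__main__":
    # Hypothetical draws: direct exceeds indirect in 2 of 4 iterations.
    print(node_split_p_value([1, 2, 3, 4], [0, 3, 2, 5]))
```

A small p-value arises when the direct estimate is almost always above, or almost always below, the indirect one, signalling that the two sources of evidence disagree.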

Uncertainty in all estimates is presented using the upper and lower limits of the 95% credible intervals (CrIs) of these estimates. These credible limits describe the boundaries within which it is believed that there is a 95% chance that the true value lies. The median rank of each intervention and the 95% CrIs of the rank are presented to summarise the uncertainty across all of the treatment effect estimates.¹⁸¹
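Both summaries can be read off the posterior draws. The sketch below takes percentile-based 95% limits and ranks the treatments within each iteration; it assumes that a lower (more negative) SMD means less pain and therefore a better rank, and it uses a simple linear-interpolation percentile, so limits may differ slightly from those reported by other software. All names are hypothetical.

```python
import statistics


def percentile(sorted_values, q):
    """Linear-interpolation percentile for q between 0 and 1."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])


def credible_interval(samples, level=0.95):
    """Equal-tailed credible interval from posterior draws."""
    s = sorted(samples)
    tail = (1 - level) / 2
    return percentile(s, tail), percentile(s, 1 - tail)


def rank_summary(draws):
    """Median rank and 95% CrI of the rank for each treatment.

    `draws` maps treatment name -> list of posterior SMD draws (equal length).
    Assumption: lower SMD is better, so rank 1 is the most effective.
    """
    names = list(draws)
    n_iter = len(draws[names[0]])
    ranks = {name: [] for name in names}
    for i in range(n_iter):
        ordered = sorted(names, key=lambda name: draws[name][i])
        for position, name in enumerate(ordered, start=1):
            ranks[name].append(position)
    return {
        name: (statistics.median(r), credible_interval(r))
        for name, r in ranks.items()
    }
```

Ranking within each iteration, rather than ranking the posterior means, is what carries the uncertainty in the effect estimates through to the ranks.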

To present more clinically meaningful network meta-analysis results, we present both SMDs and the SMDs converted to the WOMAC VAS 0–100 pain scale (although it is acknowledged that back-transformation can be of limited value in heterogeneous populations).¹⁸² A pooled SD for the WOMAC VAS 0–100 pain scale was calculated from all of the arms of the six trials in the analysis that utilised this scale. The SMDs were then multiplied by this pooled SD (16.49) to produce a difference in WOMAC VAS 0–100 score.
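The back-conversion is a single multiplication, sketched below with the pooled SD of 16.49 reported above. The text does not state how the SDs from the individual arms were pooled, so the degrees-of-freedom-weighted formula in `pooled_sd` is an assumption shown for illustration, and the function names are hypothetical.

```python
import math

POOLED_SD_WOMAC_VAS_0_100 = 16.49  # pooled SD reported in the text


def pooled_sd(arms):
    """Pool SDs across arms given (sd, n) pairs.

    Assumption: weights are n - 1 per arm; the source does not specify this.
    """
    numerator = sum((n - 1) * sd**2 for sd, n in arms)
    denominator = sum(n - 1 for _, n in arms)
    return math.sqrt(numerator / denominator)


def smd_to_womac(smd, sd=POOLED_SD_WOMAC_VAS_0_100):
    """Convert an SMD to a difference on the WOMAC VAS 0-100 pain scale."""
    return smd * sd


if __name__ == "__main__":
    # An SMD of -0.5 corresponds to about -8.2 points on the 0-100 scale.
    print(smd_to_womac(-0.5))
```

Read in reverse, a difference of –9.7 points on this scale corresponds to an SMD of roughly –0.59 (9.7/16.49).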

In the substudy within this study that explored reporting of the WOMAC pain subscale and WOMAC index, further details were extracted for those trials that utilised the WOMAC: scale used (Likert/VAS 0–10, VAS 0–100/NRS); whether the WOMAC pain subscale or the WOMAC index was used; and whether any modifications were reported. In the light of inconsistencies and lack of clarity identified during the review, the WOMAC outcome details were re-examined by a third reviewer (NFW) and further information was extracted as necessary to address the following four questions:

Was it clear that all assessments had been conducted?
Was the score range clear?
Were details reported on how the final score had been calculated (sum, average or transformation to 0–100 scale)?
Were baseline scores reported (and approximate baseline score)?

In addition, the scale used and the score range that could be deduced from the information provided in each paper was recorded and the ease of identification of these was categorised as clearly stated in the paper (stated), required assumptions to be made (assumed) or unclear. All information that could support any assumptions, including baseline score (or, when not reported, follow-up scores), was also recorded. Further details of this substudy’s methods are reported elsewhere.¹⁸³

Results

In total, 3820 references were retrieved from searches, of which 156 trials (detailed in Appendix 3) including 18 distinct interventions and four comparators met the inclusion criteria. Four of 10 foreign-language papers that appeared eligible based on their English abstracts could not be translated and so had to be excluded. Thirty-eight trials reported data in ways that meant they could not be incorporated in the network meta-analyses. One study was found to have been retracted and was subsequently removed from all analyses. A study flow diagram is presented in Figure 4.

FIGURE 4

Study flow chart.

Study characteristics

An overview of all eligible studies is presented in Table 15, regardless of whether or not they reported data suitable for network meta-analysis. The mean treatment duration (and timing of the end-of-treatment assessment) varied widely, from just a single session (TENS) to 69 weeks (weight-loss interventions), although the majority of interventions were administered over a 2- to 6-week period. Most studies were classified as having recruited a general knee osteoarthritis population, although weight-loss trials (as expected) recruited only overweight or obese participants. The mean BMI in some studies recruiting a general population fell into the overweight or obese classification, although most studies did not report BMI.

TABLE 15

Summary of characteristics of eligible trials included in the review

Around three-quarters of the studies (110/152) were classed as ‘poor-quality’ studies. Of the remainder, 33 studies were classed as ‘satisfactory’ and nine studies were classed as ‘good’, which together were classed as ‘better quality’. Only 12 trials were considered to be at low risk of bias in the network meta-analysis. Trial quality was commonly compromised by a lack of adequate blinding and small sample sizes, which limited the effectiveness of randomisation, resulting in baseline imbalances. Quality assessment data are presented in Appendix 3. Study quality did vary by intervention, making the evidence base more robust in some areas than in others. No evidence was found for publication bias (only assessable for muscle-strengthening exercise). Individual study characteristics of all studies included in the systematic review can be found in Appendix 3.

Network meta-analysis

Suitable data for the end-of-treatment analyses were reported in 114 trials (9709 patients) (detailed in Appendix 3). This includes data from the 22 new studies identified from the search update conducted in 2013 and nine studies that had been excluded from the original review analyses but which were now included by calculating final values from the change from baseline data. In our original analyses (based on searches up to 2010) there was no indication that the majority of the adjunctive components of the experimental interventions were associated with a treatment effect difference.¹⁶⁹ The one exception was that standard care incorporating active analgesia was more effective than standard care with ‘treatment as usual’ (with or without home exercise/education). However, as analgesic adjuncts were used in only eight trials, and most studies were classified as using the ‘treatment as usual’ adjunct, with little adjunct detail defined, the focus of this study was on comparing the interventions categorised without adjuncts. The resulting network for any-quality studies, with analysis at the end of treatment and interventions categorised without adjuncts, is illustrated in Figure 5.

FIGURE 5

Network diagram for the end-of-treatment analysis of any-quality studies (with interventions categorised without adjuncts). In this figure, each solid arrow indicates that there is a data point for that comparison entered into the analysis.

The interventions drawn from the any-quality trials were compared with standard care and acupuncture (Tables 16 and 17, respectively), with caterpillar plots shown in Figures 6 and 7, respectively, and interventions ordered by treatment effect. Across all comparisons, inconsistency at a p-value of < 0.05 was identified only for the two comparisons involving PES. Eight physical treatments had a statistically significant mean beneficial effect compared with standard care, namely interferential therapy, acupuncture, TENS, PES, balneotherapy, aerobic exercise, sham acupuncture and muscle-strengthening exercise (see Table 16 and Figure 6). When acting as a comparator, acupuncture was statistically significantly better at reducing pain than sham acupuncture, muscle-strengthening exercise, weight loss, PEMF, placebo, insoles, NMES and no intervention (see Table 17 and Figure 7).

TABLE 16

Results of network meta-analyses for comparisons with standard care: studies of any quality

TABLE 17

Results of network meta-analyses for comparisons with acupuncture: studies of any quality

FIGURE 6

Standardised mean differences of each treatment compared with standard care for the analysis including studies of any quality. Numbers in parentheses indicate the numbers of studies. ACU, acupuncture; AE EX, aerobic exercise; BAL, balneotherapy.

FIGURE 7

Standardised mean differences of each treatment compared with acupuncture for the analysis including studies of any quality. Numbers in parentheses indicate the numbers of studies. AE EX, aerobic exercise; BAL, balneotherapy; BRA, braces; HEA, heat treatment.

Effect sizes for each intervention are presented in terms of both SMDs and the WOMAC VAS 0–100 pain scale. To help evaluate these conversions, one study reported the minimal clinically important change as –15 mm (on a VAS 0–100 scale and derived from a prior Delphi exercise)¹⁸⁴ and the minimal perceptible clinical improvement (MPCI, the smallest change detectable by the patient) as –9.7 mm (on a WOMAC VAS 0–100 scale).¹⁸⁵ Another study estimated the minimal clinically important improvement (MCII), although only for pain on movement, as –19.9 mm on a VAS 0–100 scale; this figure varied by baseline pain score, with patients with less pain having a smaller MCII (10.8 mm) and patients with severe pain having a larger MCII (36.6 mm).¹⁸⁶

When analysing only the better-quality studies (see Appendix 3) in the primary sensitivity analysis, 35 trials were included, with nine types of intervention and 3499 patients. One study was identified as causing inconsistency in the main analysis (a small study of muscle-strengthening exercise vs. PES) and was therefore excluded.¹⁸⁷ The network is illustrated in Figure 8, in which the analysis is at the end of treatment and interventions are categorised without adjuncts. Uncertainty around the true between-study variance increased because of the reduction in the number of studies per comparison, as well as loops in the network. Most studies were of acupuncture (n = 11) or muscle-strengthening exercise (n = 9), with some interventions represented by few studies.

FIGURE 8

Network diagram for the end-of-treatment analysis of better-quality studies (with interventions categorised without adjuncts).

When compared with standard care, there was a statistically significant reduction in pain for acupuncture, balneotherapy, sham acupuncture and muscle-strengthening exercise (Table 18 and Figure 9). When acupuncture was the comparator, it was statistically significantly better at a 95% level of credibility than sham acupuncture, muscle-strengthening exercise, weight loss, aerobic exercise and no intervention (Table 19 and Figure 10).

TABLE 18

Results of network meta-analyses for comparisons with standard care: studies of better quality

FIGURE 9

Standardised mean differences of each treatment compared with standard care for the analysis including better-quality studies. Numbers in parentheses indicate the numbers of studies. ACU, acupuncture; AE EX, aerobic exercise; BAL, balneotherapy.

TABLE 19

Results of network meta-analyses for comparisons with acupuncture: studies of better quality

FIGURE 10

Standardised mean differences of each treatment compared with acupuncture for the analysis including better-quality studies. Numbers in parentheses indicate the numbers of studies. AE EX, aerobic exercise; BAL, balneotherapy.

In terms of ranking, a probability statistic calculated from the treatment effect distributions showed that acupuncture and balneotherapy were the two interventions with the highest rank (Table 20). Because of overlapping CrIs for sham acupuncture, muscle-strengthening exercise and t’ai chi, there is some uncertainty around these rankings.

TABLE 20

Ranking of interventions (using only better-quality studies)
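The ranking probabilities referred to above can be illustrated with a minimal sketch. It assumes that posterior draws of each treatment's effect against a common reference are available and that a lower (more negative) SMD means less pain. The function name, treatment labels and simulated draws are hypothetical and are not taken from the analysis reported here.

```python
import random


def rank_probabilities(draws, lower_is_better=True):
    """Estimate P(treatment t has rank r) from posterior draws.

    draws: list of draws, each a list holding one effect value per treatment.
    Returns a matrix probs[t][r], where r = 0 is the best rank.
    """
    n_treat = len(draws[0])
    counts = [[0] * n_treat for _ in range(n_treat)]
    for draw in draws:
        # Order treatments from best to worst within this single draw.
        order = sorted(range(n_treat), key=lambda t: draw[t],
                       reverse=not lower_is_better)
        for rank, treatment in enumerate(order):
            counts[treatment][rank] += 1
    n_draws = len(draws)
    return [[c / n_draws for c in row] for row in counts]


# Synthetic draws for three hypothetical treatments (illustration only).
rng = random.Random(0)
labels = ["treatment A", "treatment B", "treatment C"]
means = [-0.9, -0.5, 0.0]   # assumed posterior means (SMD vs. reference)
sds = [0.2, 0.3, 0.1]       # assumed posterior standard deviations
draws = [[rng.gauss(m, s) for m, s in zip(means, sds)] for _ in range(10000)]

probs = rank_probabilities(draws)
for label, row in zip(labels, probs):
    print(label, [round(p, 3) for p in row])
```

When the effect distributions of two treatments overlap, as in this synthetic example, their probabilities are spread across neighbouring ranks, which is the kind of ranking uncertainty described above.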

In a secondary sensitivity analysis, several trials were excluded based on population or intervention differences, or on extreme data;¹³⁶,¹⁸⁸–¹⁹³ the results were not sensitive to these changes, although the model fit improved, as reported elsewhere.¹⁶⁹ No network link could be made with the placebo-controlled studies in the analysis of better-quality studies. We therefore conducted a separate network meta-analysis for these studies. Both interferential therapy and heat treatment were statistically significantly more effective than placebo, but laser therapy, PES and insoles were not; these data are also reported elsewhere.¹⁶⁹

In the substudy on the reporting of the WOMAC pain subscale and WOMAC index, the former was reported in 60 (45%) trials and the latter in 31 (23%) trials. Reporting of the exact method used in administering the WOMAC pain subscale scoring was poor in many cases. Overall, only 15 (25%) trials reported unambiguously both the scale and the score range for their use of the WOMAC pain subscale. Only four (13%) trials reported unambiguously both the scale and the score range for their use of the WOMAC index. Further details of the results of this substudy are reported elsewhere.¹⁸³

Discussion

Principal findings

In the comprehensive network meta-analysis that we report here we compared all physical treatments for osteoarthritis of the knee with each other within a coherent framework. This analysis provides the first estimate of the relative effect of these treatments, which can be viewed as essential for decision-makers when comparing treatment effects. By providing a basis for synthesising all of the available evidence in a consistent framework, a network meta-analysis obviates the need to make decisions based on subjective inferences from disconnected data.

Compared with standard care, eight of the 22 interventions that we evaluated produced a statistically significant reduction in pain: interferential therapy, acupuncture, TENS, PES, balneotherapy, aerobic exercise, sham acupuncture and muscle-strengthening exercise. Of these eight, only two interventions were represented by more than three trials in the sensitivity analysis of better-quality studies: acupuncture (11 trials) and muscle-strengthening exercise (nine trials), with acupuncture having statistically significantly better outcomes. Acupuncture and balneotherapy (only one trial) were the two interventions with the highest rank, although there is some uncertainty around this. For the better-quality placebo-controlled studies, interferential therapy (one trial) showed a strong effect compared with placebo.

Strengths and limitations

Numerous systematic reviews, some summarised in a review of reviews,¹⁹⁴ have evaluated the interventions (or classes of interventions) included in this review. However, our analysis represents the use of the most practical methods currently available to compare a large number of different types of treatment, enabling a fair comparison to be made of competing physical treatments (including acupuncture) with each other.

A network meta-analysis requires an assumption of exchangeability between the trials in the same way as is required for a standard meta-analysis. With regard to concerns that might arise from within- or between-intervention heterogeneity, we sought to minimise these by using an age restriction as part of our inclusion criteria and by excluding interventions consisting of more than one physical treatment. We found that patient characteristics appeared to be broadly comparable across interventions. Inevitably, there will be some clinical heterogeneity in a wide-ranging study such as this, but as far as it was possible to tell, given the wide variation of scales used, baseline pain did not appear to vary systematically between interventions.

We used a random-effects model to incorporate heterogeneity and we evaluated levels of inconsistency and model fit. We also conducted sensitivity analyses excluding trials causing heterogeneity. Although heterogeneity is accounted for in our results with the CrIs, it is possible that unknown confounding factors may be affecting the results of indirect comparisons. With regard to trials of placebo interventions, the majority used electrical or electromagnetic interventions and so it is not unreasonable to assume that the placebo effects were similar (as the interventions were similar). In our review the trials, covering a diverse range of interventions, were all assessed using the same quality assessment tools, which enabled equivalence in the comparisons and better interpretation of the evidence base for each intervention.
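For readers unfamiliar with the approach, a generic textbook formulation of a random-effects network meta-analysis for two-arm trials is sketched below. It is not necessarily the exact model fitted in this study, and the notation is an assumption introduced for illustration: y is the observed SMD of treatment k relative to treatment b in trial i, v is its sampling variance, δ is the trial-specific true effect, d is the pooled relative effect, τ² is the between-study variance and treatment 1 is the reference (e.g. standard care). Correlations arising from multi-arm trials are omitted.

```latex
\begin{align}
y_{i,bk} &\sim \mathcal{N}\!\left(\delta_{i,bk},\; v_{i,bk}\right) \\
\delta_{i,bk} &\sim \mathcal{N}\!\left(d_{bk},\; \tau^{2}\right) \\
d_{bk} &= d_{1k} - d_{1b}
\end{align}
```

The third line is the consistency assumption that allows indirect comparisons. When there are few studies per comparison, τ² is estimated imprecisely, and when there are few closed loops, there is little information with which to check consistency.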

Our sensitivity analysis of the better-quality studies resulted in fewer trials per comparison and fewer network loops. This led to greater uncertainty about the true heterogeneity and about the differences between the direct and the indirect evidence. The uncertainty associated with inconsistency may not be fully captured in the results because fewer loops in relation to the size of the network meant that there were fewer data to quantify inconsistency.

We were not able to include all of the studies in our analyses because of the variable reporting of pain results. Moreover, our analyses focused only on the end-of-treatment data and these were available mostly for short-term time periods. Of the trials that investigated effectiveness over medium- or long-term time periods, only a few provided the data required for our analyses. However, a comparison of the maximum effect of interventions is not without merit, given that the treatments under consideration are not intended to be cures and that any treatment effect might be expected to attenuate over time.

It is important that our results are evaluated in context. Most of the studies in our review were rated as being of poor quality. Many of the better-quality studies were pragmatic trials in which blinding of patients was not possible, that is, most studies are likely to have been subject to some form of bias. Methodological limitations of this kind are often inherent and unavoidable in clinical trials of physical treatments. For the trials in which patients were not blinded and treatments were compared with standard care, the overall treatment effect is likely to incorporate non-specific (placebo) effects. We assumed that such non-specific effects were similar across all interventions, but variation may in fact be present. However, there were also limitations that could have been avoided by triallists using better methodology and reporting practices. For example, in our substudy on WOMAC reporting,¹⁸³ we found poor reporting of both the WOMAC pain subscale and the WOMAC index, which in turn resulted in significant uncertainty in the interpretation of the results of individual trials and limited their contribution to our evidence synthesis.

Comparison with the wider literature

In light of our results, it is worth considering what might be the true (or specific) effect of acupuncture. In a Cochrane review,¹⁴² a statistically significant, clinically relevant, short-term improvement in pain was reported (acupuncture vs. waiting list control: SMD –0.96, 95% CI –1.19 to –0.72), a similar finding to what we have reported. A similar effect to ours was also observed in the comparison of acupuncture with sham acupuncture (SMD –0.35, 95% CI –0.55 to –0.15). It is worth noting that the largest study⁷⁰ in this Cochrane analysis, which showed no statistically significant difference between acupuncture and sham acupuncture, was one of two trials that used an intensive sham needling technique, which may have had physiological effects. Also, our analysis included a recent large trial¹²⁵ that used what appeared to be a very active sham. Therefore, the inclusion of trials with sham controls that might be more active than an inert placebo control could lead to pooled results that underestimate the short-term effect of acupuncture. It is of interest to note that the effect size of acupuncture compared with sham acupuncture is of the same order as that seen for NSAIDs compared with placebo (SMD 0.32, 95% CI 0.24 to 0.39), a difference that has also been described as being too small to be clinically significant.¹⁹⁵
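As a rough aid to comparing the precision of the effect sizes quoted above, the standard error behind each estimate can be back-calculated from its 95% CI. This is a minimal sketch that assumes the intervals are symmetric normal-approximation intervals; the function names are hypothetical and the only inputs are the SMDs and CIs cited in the text.

```python
from statistics import NormalDist


def se_from_ci(lower, upper, level=0.95):
    """Back-calculate a standard error from a symmetric normal-theory CI."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return (upper - lower) / (2 * z)


def summarize(smd, lower, upper):
    """Return the implied SE, z statistic and whether the CI excludes zero."""
    se = se_from_ci(lower, upper)
    return {
        "se": se,
        "z": smd / se,
        "excludes_zero": lower > 0 or upper < 0,
    }


# Estimates quoted in the text (point estimate, lower limit, upper limit).
quoted = {
    "acupuncture vs. waiting list": (-0.96, -1.19, -0.72),
    "acupuncture vs. sham acupuncture": (-0.35, -0.55, -0.15),
    "NSAIDs vs. placebo": (0.32, 0.24, 0.39),
}

for label, (smd, lower, upper) in quoted.items():
    result = summarize(smd, lower, upper)
    print(label, round(result["se"], 3), result["excludes_zero"])
```

The sign convention differs between sources (a negative SMD favours acupuncture in the Cochrane comparisons, whereas the NSAID estimate is reported as positive), so it is the magnitudes that should be compared.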

An IPD meta-analysis that included an evaluation of acupuncture for patients with knee osteoarthritis was recently reported¹⁹⁶ (see Chapter 2). All included studies were deemed to be of high quality because the allocation concealment methods were assessed to be unambiguously adequate. This study also found acupuncture to be more effective than sham acupuncture, with a smaller effect size than when acupuncture was compared with no-acupuncture (usual care) controls. These findings indicate that non-specific effects provide a partial contribution to the pain-alleviating effects of acupuncture. Non-specific effects will also be contributing to the effectiveness of other (non-acupuncture) interventions in our network meta-analysis. When interventions were not controlled by a placebo or a relevant sham, commonly when blinding was not possible, the contribution of non-specific effects to the overall effect cannot be estimated. However, given that there are inherent problems with identifying non-specific effects in interventions involving physical treatments, it is reasonable to assume that fair comparisons between treatments have been made.

There is some evidence to suggest that larger treatment effects are associated with sham acupuncture than with pharmacological or other physical placebos.¹⁴⁵,¹⁹⁷ However, one of two contrasting factors may impact on the effect of sham acupuncture in a given trial: either there is inadequate patient blinding because of using unsuitable shams or there is the use of physiologically active shams. The former may lead to an overestimation of the true effect of acupuncture, whereas the latter may lead to an underestimation of the true effect. In the trials that we have reviewed in this study we found that important details about sham acupuncture (e.g. depth of insertion) were sometimes poorly reported or were not reported at all. As with the variations in styles of acupuncture, so too were there variations in the types of sham acupuncture, both contributing to the possibility of clinical heterogeneity.

Implications for clinical practice

Five guidelines⁸²,¹⁶¹,¹⁹⁸–²⁰⁰ have evaluated treatment effects on key outcomes of knee osteoarthritis (including pain, function and disability). Only the Osteoarthritis Research Society International (OARSI) guideline²⁰⁰ is unequivocal in its recommendation to offer acupuncture for knee osteoarthritis. The American College of Rheumatology (ACR)¹⁹⁸ conditionally recommended acupuncture but only for patients with moderate to severe pain who are unable or unwilling to undergo total knee arthroplasty. The American Academy of Orthopaedic Surgeons (AAOS)¹⁹⁹ found the acupuncture evidence to be inconclusive and the European League Against Rheumatism (EULAR)¹⁶¹ and NICE⁸² did not recommend the use of acupuncture. Our analyses of the better-quality studies suggest that acupuncture should be considered as one of the short-term physical treatment options for relieving pain caused by osteoarthritis of the knee.

Guidance from all organisations recommended treatment with muscle-strengthening and aerobic exercise, education, weight loss (if required) and, when necessary, paracetamol and/or topical NSAIDs; when these are ineffective, a choice of one or more options from a range of pharmacological and non-pharmacological treatments is sometimes recommended, including TENS, thermal (heat/cooling) treatments, insoles and braces. Some of our results on effectiveness do not concur with existing guidance on the (non-acupuncture) physical treatments: our evidence differs from the EULAR guidelines¹⁶¹ with regard to insoles, braces and weight loss; from the NICE guidelines⁸² with regard to TENS, insoles, braces, weight loss, manual therapy and heat or cooling treatment; from the ACR guidelines¹⁹⁸ with regard to weight loss, insoles, thermal agents and t’ai chi; from the AAOS guidelines¹⁹⁹ with regard to weight loss; and from the OARSI guidelines²⁰⁰ with regard to insoles, braces, heat or cooling treatment, TENS and weight loss. Our analyses found little evidence (of significant differences from standard care, let alone clinically relevant differences) to support such guidance with respect to treating pain, other than for TENS, for which the evidence was of poor quality and likely to be unreliable. It should be remembered, however, that our review was focused on pain outcomes rather than on function, disability or cost-effectiveness.

The clinical relevance of improvements in knee pain scores has been quantified in several ways. In this context, our better-quality trial results appear to indicate that acupuncture produces both a MPCI¹⁸⁵ and quite possibly a minimum clinically important change,¹⁸⁴,¹⁸⁵ but may yield a MCII only for patients with low levels of pain.¹⁸⁶ A MPCI remains a possibility for muscle-strengthening exercise (with evidence from nine trials). Our better-quality results suggest that few physical treatments are likely to have a clinically relevant pain-relieving effect. The exceptions were balneotherapy, interferential therapy and heat treatment, for which we found evidence of effectiveness compared with standard care. However, the results for these three interventions were informed by single small studies and so a cautious interpretation is warranted.

When interpreting effectiveness results, other factors to consider beyond effectiveness are acceptability, safety, rapidity and durability of benefit, convenience, cost and likelihood of patient adherence to treatment.²⁰¹ Given the diverse range of interventions that we studied, these factors will clearly differ between interventions, as well as in relation to pharmacological and other treatments.

Recommendations for future research

To comprehensively assess the value of many of these interventions, larger RCTs, with risk of bias reduced and with longer treatment periods, are needed. Given the stronger evidence on acupuncture and muscle-strengthening exercise in the better-quality trials, there is a need in future studies to determine the optimum timing and parameters of treatment. Ideally, trials should examine the effectiveness of retreatment following treatment cessation (to evaluate durability and attenuation effects), which would match the way that these physical treatments are often delivered in practice.

In the substudy on standards of reporting of WOMAC scales,¹⁸³ we found that in general the reporting of methods and results in RCTs using the WOMAC assessment tool lacked clarity. Poor reporting of WOMAC scales limits the interpretation of trial results and their useability for evidence synthesis. Given that the various versions of WOMAC available are clearly defined and have all been validated, full descriptions by researchers are needed. Adherence to the standard WOMAC scoring system should be encouraged. As an absolute minimum, the type of WOMAC used and the score range must be reported. Clear reporting is important and should not be sacrificed to reduce word count.

Conclusion

The evidence available for our network meta-analyses, in which physical interventions for osteoarthritis of the knee were compared equally with each other within a coherent framework, suggests that the evidence of effectiveness for most interventions is weak. However, when comparing all interventions, whether based on the any-quality or the better-quality trials, acupuncture can be considered as one of the more effective physical treatments for alleviating pain in the short term. Despite the large evidence base found, the methodological limitations associated with many of the trials indicate that high-quality trials of many of the physical treatments are still required.

Copyright © Queen’s Printer and Controller of HMSO 2017. This work was produced by MacPherson et al. under the terms of a commissioning contract issued by the Secretary of State for Health. This issue may be freely reproduced for the purposes of private research and study and extracts (or indeed, the full report) may be included in professional journals provided that suitable acknowledgement is made and the reproduction is not associated with any form of advertising. Applications for commercial reproduction should be addressed to: NIHR Journals Library, National Institute for Health Research, Evaluation, Trials and Studies Coordinating Centre, Alpha House, University of Southampton Science Park, Southampton SO16 7NS, UK.

Included under terms of UK Non-commercial Government License.

Bookshelf ID: NBK409499
