Rapid Evidence Review: Measures for Patients with Chronic Musculoskeletal Pain

Elizabeth S. Goldsmith; Maureen Murdoch; Brent Taylor; Nancy Greer; Roderick MacDonald; Lauren G. McKenzie; Christina Rosebush

Authors

Investigators: Elizabeth S. Goldsmith, MD, Maureen Murdoch, MD, MPH, Brent Taylor, PhD, Nancy Greer, PhD, Roderick MacDonald, MS, Lauren G. McKenzie, MPH, and Christina Rosebush, MPH.

Washington (DC): Department of Veterans Affairs (US); 2017 Aug.

Preface

The VA Evidence-based Synthesis Program (ESP) was established in 2007 to provide timely and accurate syntheses of targeted healthcare topics of particular importance to clinicians, managers, and policymakers as they work to improve the health and healthcare of Veterans. QUERI provides funding for four ESP Centers, and each Center has an active University affiliation. Center Directors are recognized leaders in the field of evidence synthesis with close ties to the AHRQ Evidence-based Practice Centers. The ESP is governed by a Steering Committee comprised of participants from VHA Policy, Program, and Operations Offices, VISN leadership, field-based investigators, and others as designated appropriate by QUERI/HSR&D.

The ESP Centers generate evidence syntheses on important clinical practice topics. These reports help:

Develop clinical policies informed by evidence;
Implement effective services to improve patient outcomes and to support VA clinical practice guidelines and performance measures; and
Set the direction for future research to address gaps in clinical knowledge.

The ESP disseminates these reports throughout VA and in the published literature; some evidence syntheses have informed the clinical guidelines of large professional organizations.

The ESP Coordinating Center (ESP CC), located in Portland, Oregon, was created in 2009 to expand the capacity of QUERI/HSR&D and is charged with oversight of national ESP program operations, program development and evaluation, and dissemination efforts. The ESP CC establishes standard operating procedures for the production of evidence synthesis reports; facilitates a national topic nomination, prioritization, and selection process; manages the research portfolio of each Center; facilitates editorial review processes; ensures methodological consistency and quality of products; produces “rapid response evidence briefs” at the request of VHA senior leadership; collaborates with HSR&D Center for Information Dissemination and Education Resources (CIDER) to develop a national dissemination strategy for all ESP products; and interfaces with stakeholders to effectively engage the program.

Comments on this evidence report are welcome and can be sent to Nicole Floyd, ESP CC Program Manager, at Nicole.Floyd@va.gov.

Abstract

Objective

Developing successful interventions for chronic musculoskeletal pain requires valid, responsive, and reliable outcome measures. By request of the 2016 State of the Art Conference on nonpharmacological approaches to chronic musculoskeletal pain, the Minneapolis VA Evidence-based Synthesis Program completed a rapid evidence review. We addressed a key question regarding psychometric properties of selected self-report pain measures to assist in adoption of these measures as core outcomes in clinical trials and other research of nonpharmacological approaches to chronic musculoskeletal pain.

Methods

With input from operational partners, we identified 17 English-language candidate measures. All measures assessed pain severity or intensity or pain-related functional impairment. Our primary outcome was the measure’s minimally important difference (MID); secondary outcomes included the measure’s reliability, validity, and responsiveness to change. We searched MEDLINE (Ovid) from January 2000 to January 2017 for English language publications. We also searched reference lists of relevant studies and systematic reviews and websites specific to pain measures of interest, with no publication date restrictions for these searches. We included studies that 1) evaluated at least one of the 17 pain measures; 2) included adults with chronic musculoskeletal pain of at least 3 months duration or adults with musculoskeletal pain described as “chronic” by the study authors; and 3) reported on at least one of the 4 psychometric outcomes listed above. We excluded 1) studies that used non-English language versions of the pain measures; 2) studies of acute musculoskeletal pain or studies of musculoskeletal conditions often associated with chronic pain that did not specify the presence or duration of their participants’ pain; 3) intervention trials, unless the trial also assessed the psychometric properties of their measures and noted this in the abstract; and 4) studies of patients with rheumatoid arthritis, orofacial pain other than temporomandibular disorder, or headache. Abstracts and full text of articles meeting inclusion criteria were reviewed by trained staff, who extracted study/population characteristics and psychometric outcomes. Results were qualitatively synthesized. Our protocol was registered in PROSPERO (CRD42017056610).

Results

Of 1,635 abstracts identified, 318 articles underwent full-text review, and 43 met inclusion criteria. Six of the 43 studies included Veteran populations. Eight studies provided MID estimates for 8 of the 17 measures. MIDs for individual measures differed considerably based on study design and analysis methods. Four measures – Oswestry Disability Index (ODI), Roland-Morris Disability Questionnaire (RMDQ), Numeric Rating Scale (NRS), and Visual Analogue Scale (VAS) – had data reported on all 4 psychometric outcomes. However, the NRS and VAS, both single-item measures, were often modified across different studies; results from one study might therefore not apply to others using a different version. MIDs, responsiveness, and validity were reported for the Brief Pain Inventory (BPI), Graded Chronic Pain Scale (GCPS), PEG, and Short Form 36 Bodily Pain Scale (SF-36 BPS). Responsiveness, validity, and test-retest reliability estimates were reported for the McGill Pain Questionnaire (MPQ), PROMIS Pain Interference (PROMIS-PI), West Haven-Yale Multidimensional Pain Inventory (WHYMPI), and Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC).

Conclusions

Among the multi-item pain measures we assessed, the ODI, RMDQ, and SF-36 BPS had the most complete psychometric evidence within chronic musculoskeletal pain populations. Several additional measures had at least some evidence for psychometric reliability, validity, and responsiveness. Research into pain measurement would be considerably strengthened if future investigators use consistent definitions of chronic musculoskeletal pain, standardized methods for assessing psychometric outcomes, and comprehensive descriptions of their patient populations.

Impacts

Findings from this review can inform recommendations on specific core outcome measures for clinical research on chronic musculoskeletal pain interventions. Further methods research is needed to validate patient-reported pain outcome measures in populations with chronic musculoskeletal pain and develop a framework for determining outcome measurement selection that incorporates feasibility and applicability.

Evidence Report

Introduction

Chronic musculoskeletal pain is a major source of disability and morbidity for Veterans in the US, affecting approximately 60% of Veterans with chronic health conditions in Veterans Health Administration (VHA) primary care.¹ Management of chronic musculoskeletal pain remains challenging, and groups ranging from pain expert coalitions to the National Institutes of Health and the Institute of Medicine have recently called for more focused and strategic pain therapy research.²,³ As these groups note, successful development and testing of interventions to improve chronic musculoskeletal pain depend on the use of valid, reliable, and responsive measures of pain and pain-related outcome domains.

Pain-related measures span multiple physical, emotional, and social domains that are affected by chronic musculoskeletal pain. To guide development and use of these measures, experts and stakeholders have formed such initiatives as Outcome Measures in Rheumatology (OMERACT), the Analgesic, Anesthetic, and Addiction Clinical Trial Translations, Innovations, Opportunities, and Networks (ACTTION) public-private partnership with the United States Food and Drug Administration (FDA), and the associated Initiative on Methods, Measurement, and Pain Assessment in Clinical Trials (IMMPACT). These groups have published several reviews and compiled recommendations suggesting that pain outcome studies measure multiple domains via multiple modes of assessment.⁴–⁸

Such expert groups have identified both pain intensity or severity (hereafter “severity”) and pain-related impairment of physical function (hereafter “functional impairment”) as key domains for study, as these reflect both pain symptoms and pain’s impact on people’s daily lives.⁴,⁶ Functional impairment in particular has been identified as a priority concern for patients,⁹ and is an increasingly common primary outcome domain alongside pain severity. Self-report measures remain the gold-standard mode of assessing core pain outcomes, as they reflect the subjective pain experience, and as existing observer- and laboratory-based pain measures do not consistently reflect clinically meaningful changes in key pain domains.⁴,⁵,¹⁰

Researchers who wish to select appropriate self-report pain outcome measures for these key domains still face challenging evidence limitations. There is particular need for measures appropriate for non-pharmacological interventions. While available measures have been developed and adapted for multiple pain conditions and bodily locations, and have been studied in populations with a wide range of demographic traits, existing psychometric property and feasibility evidence is difficult to locate and compare across measures. Additionally, a consensus on ideal measures has not yet been achieved.

Therefore, it would be advantageous to have a core set of measures across intervention studies. This would make it easier to synthesize, disseminate, and provide recommendations to the VHA about the effectiveness and harms of different interventions. Even if evidence does not clearly demonstrate a single best measure or core set, identification of existing evidence would be informative.

As such, the 2016 State of the Art (SOTA) Conference on non-pharmacological approaches to chronic musculoskeletal pain management recognized the potential value of adopting a core set of measures and recommended that VA Health Services Research and Development (HSR&D) convene a small group of researchers to develop a short set of core outcome measures for prospective pain research. The set of measures should cover 2 core patient-reported outcomes: pain intensity and pain-related functioning. The group plans to consider many factors in selecting the core measures, choosing from among measures that have demonstrated suitable psychometric properties in the target population. The group requested a rapid evidence review to describe and compare the key psychometric qualities of commonly used measures, particularly those that might be suitable for clinical trials of nonpharmacological approaches to chronic pain management. These qualities would not be the only criterion for selecting core measures, but could serve as a basic requirement of measures considered candidates for wide implementation.

In conjunction with the topic nominators, we identified the population of interest, pain measures to be reviewed, study inclusion and exclusion criteria, and primary and secondary outcomes, and developed a protocol (registered in PROSPERO: CRD42017056610).

Key Question

We addressed the following key question:

What specific self-report measures of pain (intensity, severity) and pain-related functional impairment (activity limitations, participation, physical functioning, social role functioning, pain impact, pain interference, pain-related disability) have sufficient information on psychometric properties (eg, minimally important differences, validity, responsiveness, reliability) to consider their adoption for use as core outcome measures in prospective observational research and clinical trials of nonpharmacological approaches to care for persons (including Veterans) with chronic (≥ 3 months) musculoskeletal pain (eg, low back pain, osteoarthritis, and non-traumatic joint pain)?

Included Pain Measures

Our review focused on the following measures of pain intensity/severity, pain-related interference, or pain global change for persons with chronic musculoskeletal pain (as identified by the Operational Partners for the review and the SOTA Planning Committee):

Brief Pain Inventory (BPI)
Defense & Veterans Pain Rating Scale (DVPRS)
Graded Chronic Pain Scale (GCPS)
Hip Osteoarthritis Outcomes Scale (HOOS)
Knee Osteoarthritis Outcomes Scale (KOOS)
McGill Pain Questionnaire (MPQ)
Multidimensional Pain Inventory (MPI, WHYMPI)
Numeric Rating Scale (NRS)
Oswestry Disability Index (ODI)
Patient Global Impression of Change (PGIC)
PEG (assesses [P] pain intensity, [E] enjoyment of life, and [G] general activity)
Patient-Reported Outcomes Measurement Information System - Pain Interference (PROMIS-PI)
Roland-Morris Disability Questionnaire (RMDQ)
SF-36 Bodily Pain Scale (SF-36 BPS)
Visual Analogue Scale (VAS)
Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC)
Wong Faces Scale

Methods

We searched MEDLINE (Ovid) for English-language articles published from 2000 to January 2017. Our search strategy, developed with input from a medical librarian, included Medical Subject Heading (MeSH) terms for Pain Measurement and specific locations/types of pain (eg, Low Back, Shoulder, Chronic) along with title and abstract words. The search was designed to include all study designs, including systematic reviews. The full search strategy is presented in Supplemental Content, Table 1. At the request of peer reviewers, we repeated the search with MeSH and title/abstract terms for fibromyalgia.

We used Google Scholar, the National Center for Biotechnology Information (NCBI), and PubMed to search for Web sites associated with each pain measure and publications not retrieved by our MEDLINE search. Additional articles were obtained by reviewing reference lists of relevant systematic reviews identified in our MEDLINE search and reference lists of included studies. We also reviewed studies suggested by content experts. For these sources, there were no limits on publication date.

Study Selection

Abstracts of studies identified in our MEDLINE search were reviewed by a single investigator or research associate. The full text of potentially eligible articles from the abstract review and all articles identified from reference list searching or suggested by content experts were reviewed by 2 investigators or research associates.

At the abstract and full-text review levels, we included studies that:

1) Evaluated pain measures in adults with chronic musculoskeletal pain of at least 3 months duration (or pain described as “chronic pain” by the study authors); if the study included multiple types of pain, at least 75% of the population must have had chronic musculoskeletal pain unless results were reported separately for the chronic musculoskeletal pain group,

2) Reported on self-reported measures of pain or pain-related functioning (17 measures as determined by Operational Partners and SOTA Planning Committee),

3) Reported outcomes of interest: minimally important difference (MID) (primary outcome), test-retest reliability, validity, feasibility (ie, number of items, public domain vs proprietary, self-report vs interviewer-administered), responsiveness, and generalizability.

Our exclusion criteria were as follows:

1) Studies that specified that they used non-English-language versions of the pain measures,

2) Studies of patients with chronic musculoskeletal conditions commonly associated with pain (eg, osteoarthritis) that did not specify that enrolled patients had chronic musculoskeletal pain,

3) Trials of interventions for pain that did not note assessment of psychometric properties in the abstract,

4) Studies of patients with rheumatoid arthritis, orofacial pain (other than temporomandibular joint pain – a musculoskeletal condition), or headache.
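The inclusion and exclusion criteria above amount to a screening rule, which can be sketched in code. This is a minimal illustration rather than the review team's actual procedure: the `Study` record, its field names, and the `is_eligible` function are hypothetical, and each criterion is reduced to a single flag that a human reviewer would have to judge.

```python
from dataclasses import dataclass

# The 17 candidate measures listed in this review.
CANDIDATE_MEASURES = {
    "BPI", "DVPRS", "GCPS", "HOOS", "KOOS", "MPQ", "MPI", "NRS", "ODI",
    "PGIC", "PEG", "PROMIS-PI", "RMDQ", "SF-36 BPS", "VAS", "WOMAC",
    "Wong Faces",
}

# Outcomes of interest named in inclusion criterion 3.
OUTCOMES_OF_INTEREST = {
    "MID", "test-retest reliability", "validity", "feasibility",
    "responsiveness", "generalizability",
}


@dataclass
class Study:
    """Hypothetical screening record; field names are illustrative only."""
    measures: set                  # pain measures the study evaluated
    outcomes: set                  # psychometric outcomes it reported
    chronic_msk_fraction: float    # share with chronic musculoskeletal pain (0-1)
    msk_results_separate: bool = False         # results reported separately for that group
    english_version: bool = True               # exclusion 1 applies if False
    chronic_pain_specified: bool = True        # exclusion 2 applies if False
    trial_without_psychometrics: bool = False  # exclusion 3
    excluded_condition: bool = False           # exclusion 4 (eg, headache)


def is_eligible(s: Study) -> bool:
    # Inclusion 1: at least 75% chronic musculoskeletal pain, or separate results.
    population_ok = s.chronic_msk_fraction >= 0.75 or s.msk_results_separate
    included = (
        population_ok
        and bool(s.measures & CANDIDATE_MEASURES)     # inclusion 2
        and bool(s.outcomes & OUTCOMES_OF_INTEREST)   # inclusion 3
    )
    excluded = (
        not s.english_version
        or not s.chronic_pain_specified
        or s.trial_without_psychometrics
        or s.excluded_condition
    )
    return included and not excluded
```

For example, a mixed-pain study with 60% chronic musculoskeletal pain participants would fail the rule unless it reported results separately for that group.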

Data Abstraction

From each eligible study, we abstracted the following:

1) Study/population characteristics: location of study, funding source, pain measures evaluated, time period of assessment (eg, reporting pain over past week, past month, etc), mode of administration, setting, chronic pain condition, study inclusion/exclusion criteria, baseline pain characteristics, sample size, age, gender, and race/ethnicity,

2) Outcomes: MID, reliability, validity, responsiveness, and other psychometric properties.

Quality Assessment

We included only studies that discussed psychometric properties of the pain measures. Trials that used the measures but did not comment on how well the measures performed were not included.

Data Synthesis

We narratively summarized included studies by pain measure to provide an overview of the populations and pain conditions for which the psychometric properties of the measure have been evaluated. We narratively summarized outcomes by psychometric properties. We focused on MID, responsiveness, validity, and test-retest reliability and highlighted comparative effectiveness when reported.

Rating the Body of Evidence

We did not rate the overall body of evidence.

Peer Review

A draft version of this report was reviewed by content experts and clinical leadership, and the report was modified in response to reviewers’ input. Reviewer comments and our responses are presented in Supplemental Content, Table 2.

Results

Literature Flow

After removing duplicate citations, we reviewed 1,635 abstracts and excluded 1,317. Of 318 articles reviewed at the full text level, 275 were excluded (Figure 1). Over 60% were excluded because they did not report outcomes of interest. Other reasons for exclusion were not including a pain measure of interest, using a non-English version of the pain measure, and not defining the study population as having chronic musculoskeletal pain.

Figure 1

Literature Flow Chart.
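The counts reported for the literature flow can be checked with simple arithmetic. The sketch below uses only the numbers stated in the text; the variable names are illustrative.

```python
# Counts stated in the Literature Flow paragraph.
abstracts_reviewed = 1635
abstracts_excluded = 1317
full_text_excluded = 275

# Articles advancing to full-text review, then to inclusion.
full_text_reviewed = abstracts_reviewed - abstracts_excluded
studies_included = full_text_reviewed - full_text_excluded

print(full_text_reviewed, studies_included)  # prints: 318 43
```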

Overview of Pain Measures and Included Studies

Table 1 below summarizes the characteristics of the pain measures included in the review. Additional information about each pain measure is included in Supplemental Content, Table 3.

Table 1

Overview of Pain Measures.

Overview of Included Studies

We included 43 studies: 23 from the US,¹⁷,²⁰,²⁷–⁴⁷ 3 from Canada,⁴⁸–⁵⁰ one from South America,⁵¹ 5 from Australia,⁵²–⁵⁶ and 11 from Europe.⁵⁷–⁶⁷ Of the US studies, 4 enrolled exclusively Veterans¹⁷,³⁵,³⁹,⁴⁴ and 2 enrolled both Veterans and non-Veterans.²⁰,³⁷ Study characteristics are presented in Table 2 with additional detail in Supplemental Content, Table 4.

Table 2

Overview of Included Studies.

Study enrollments ranged from 30⁶² to 998,⁴³ with 29 enrolling more than 100 and 3 enrolling more than 500.²⁹,³⁴,⁴³ The most common chronic musculoskeletal pain condition was low back pain (LBP), with 16 studies enrolling only LBP patients.²⁸,²⁹,³³,³⁴,³⁶,³⁹,⁴¹,⁴⁵,⁵²,⁵⁵,⁵⁷–⁵⁹,⁶¹,⁶³,⁶⁶ Another 13 studies included patients with any chronic musculoskeletal pain.¹⁷,²⁷,³⁰,³⁵,³⁸,⁴³,⁴⁴,⁴⁷,⁴⁹,⁵¹,⁵³,⁶²,⁶⁵ One study reported that 62% of participants were over age 50 years.²⁸ In the remaining 40 studies that reported mean age, values ranged from 32 years⁶⁷ to 80 years.³³ The mean age was less than 50 years in 18 studies, 50 to 59 years in 15 studies, and 60 years and older in 7 studies. In the studies that enrolled exclusively US Veterans, the percentage of women ranged from 8% to 19%. In the remaining studies, 5 studies enrolled fewer than 50% women,³²,⁴²,⁵²,⁶²,⁶⁴ 29 enrolled 50% or more, and 5 did not report the percentage of women enrolled. Race/ethnicity was reported in 18 of the studies, all but one from the US. The percentage of white enrollees was 75% or higher for 11 of the 18 studies.

No studies meeting eligibility criteria evaluated psychometric properties of the DVPRS or KOOS. DVPRS studies intermixed patients with chronic and acute pain and either had fewer than 75% of patients with chronic pain⁶⁸ or did not specify the percentage with chronic pain.¹² Studies of the KOOS used non-English versions.¹⁵,⁶⁹

Characteristics of Included Studies for Each Pain Measure

Brief Pain Inventory (BPI)

The BPI is a Likert-type scale (range 0–10) originally designed to measure cancer pain intensity and pain interference.¹¹ Pain intensity is measured by 4 items: current pain, and pain at its least, worst, and average over a time of interest (often the past 24 hours or week). Pain interference is measured for 7 domains: physical functioning, work, mood, walking, social activity, relations with others, and sleep. Scores for each BPI measure range from 0 “no pain/interference” to 10 “pain as bad as you can imagine/complete interference.”

We included 6 studies that evaluated the BPI’s psychometric properties (details of study and participant characteristics are provided in Supplemental Content, Table 4).²⁰,³⁵–³⁷,⁴⁴,⁵³ One study⁵³ assessed only the BPI’s pain severity subscale.

The BPI was administered by interview in 3 studies²⁰,³⁵,³⁷ and by self-report in another 3 studies.³⁶,⁴⁴,⁵³

Defense and Veterans Pain Rating Scale (DVPRS)

The DVPRS was developed to provide a standardized pain screening and assessment tool for the Department of Defense and VHA health systems.¹² It includes numeric rating scales for one question about pain intensity and 4 questions about pain interference. The numeric scale for pain intensity ranges from 0 to 10 and is enhanced with descriptors for each of the 11 levels, color-coded bars using traffic light colors where green indicates mild pain and red indicates severe pain, and facial expressions. The pain interference questions address activity, sleep, mood, and stress. We found no studies meeting eligibility criteria for the DVPRS.

Graded Chronic Pain Scale (GCPS)

The Graded Chronic Pain Scale (GCPS), also known as the Chronic Pain Grade Questionnaire (CPG), is an interview or self-administered measure used to assess pain intensity and interference related to disability.¹³ It was designed in 1992 for use with chronic pain conditions including musculoskeletal and low back pain. Pain intensity is measured on an 11-point Likert scale from 0 to 10, anchored by “no pain” (0) and “pain as bad as can be” (10). The disability score is based on the number of days of disability and a numeric rating of pain disability.

We included 3 studies that evaluated psychometric properties of the GCPS (details of study and participant characteristics are provided in Supplemental Content, Table 4).²⁰,³⁶,³⁷ Each of the studies assessed both the severity and disability components.

Hip Osteoarthritis Outcomes Scale (HOOS)

The Hip Osteoarthritis Outcomes Scale (HOOS) was developed in 2002 as an extension of the WOMAC scale for hip disability among people with or without osteoarthritis.¹⁴ The self-administered HOOS evaluates pain intensity and interference related to physical functioning. The HOOS consists of 5 subscales: pain, symptoms, daily living limitations, sport and recreation limitations, and hip-related quality of life. The HOOS uses a 5-point Likert-type scale with anchors of “no problems” (0) to “extreme problems” (4).

One study of the HOOS was included in our review (details of the study and participant characteristics are provided in Supplemental Content, Table 4).⁶⁴ We report outcomes from the pain and activities of daily living limitations subscales.

Knee Osteoarthritis Outcomes Scale (KOOS)

The KOOS is an extension of the WOMAC scale. It was designed to assess patient-relevant outcomes following a knee injury or post-traumatic osteoarthritis.¹⁵ Responses to the 42 items are on a 5-point scale ranging from “none” or “never” to “extreme” or “always.” The 42 items are grouped into pain intensity, symptoms, activities of daily living, sport and recreation, and quality of life subcategories. We found no studies meeting eligibility criteria for the KOOS.

McGill Pain Questionnaire (MPQ)

The MPQ measures general chronic pain using 78 items in 20 subscales. It is used to evaluate pain intensity.¹⁶ Respondents are asked to respond to sensory, affective, and evaluative word descriptors of their pain. Responses are used to create a Pain Rating Index (PRI) and/or a Total Number of Words Chosen score. There is also a single item, the Present Pain Intensity (PPI), with pain rated from 0 to 5. Two revised forms of the MPQ exist: the short-form MPQ (SF-MPQ) and a revised and extended short-form MPQ (SF-MPQ-2).

We included 4 studies that assessed the psychometric properties of the MPQ (details of study and participant characteristics are provided in Supplemental Content, Table 4).¹⁷,³⁹,⁴⁸,⁵⁹ Each of the studies assessed pain intensity using the Present Pain Intensity¹⁷,⁵⁹; Total Number of Words Chosen⁵⁹; Total Pain Rating Index¹⁷; total score over the continuous, intermittent, neuropathic, and affective domains³⁹; or Adjective Checklist.⁴⁸ The MPQ was self-administered in all of the studies. One study administered a short-form version³⁹; the others used the original version.

Multidimensional Pain Inventory (MPI/WHYMPI)

The 52-item MPI, also known as the West Haven-Yale Multidimensional Pain Inventory (WHYMPI), was designed to measure chronic pain, including lower back pain and temporomandibular disorders.¹⁷ It uses a Likert-type scale of 0 to 6 to measure pain intensity and pain interference. Pain interference is measured for daily activities including vocational, social, and familial functioning.

We included 4 studies that evaluated properties of the MPI (details of study and participant characteristics are provided in Supplemental Content, Table 4).¹⁷,³⁹,⁴⁷,⁴⁹ Each of the studies assessed both pain intensity and pain interference.

Numeric Rating Scale (NRS) for Pain

Numerical Rating Scales (NRS) were developed to measure pain intensity for general chronic pain conditions.¹⁶ The NRS studied for this report was typically an 11-point Likert-type scale ranging from 0 (no pain) to 10 (severe pain), with subcategories of mild (1-3) and moderate (4-6). This self-administered NRS can be written or verbal. Of the 11 included studies, 2 administered the NRS by mail or by phone.³²,⁵⁰

We included 11 studies for psychometric properties of the NRS (details of study and participant characteristics are provided in Supplemental Content, Table 4).³²,³⁸,⁴²,⁴⁶,⁵⁰,⁵¹,⁵⁴,⁵⁶,⁵⁸,⁶³,⁶⁶ All of the studies used the NRS to assess pain intensity. One study also assessed pain “bothersomeness” – a measure of interference or functional impairment.⁵⁶

The timeframe over which patients were asked to rate pain intensity differed across the studies: some asked patients to rate their “current pain,”³⁸,⁵⁰ their average pain over the last 24 hours,⁵⁴,⁵⁶ their pain on the day prior to their study visit,⁵¹ their pain in the past week,⁴⁶,⁵⁰ or their pain in the past month.⁵⁰ Several did not specify or report a timeframe,³²,⁵⁸,⁶⁶ and one study asked patients to rate their pain intensity before and after a hand grip test.⁴²

Oswestry Disability Index (ODI)

The ODI was developed to assess disability from acute and chronic lower back pain.¹⁸ It measures a combination of pain intensity and interference, referred to as disability, using a Likert-type scale. Item scores range from 0 “no pain/interference/disability” to 5 “worst scenario of pain/interference/disability.” The ODI includes 10 items, one for pain or need for pain medications and 9 for interference in daily activities.

Ten studies that evaluated the psychometric properties of the ODI met our criteria for inclusion in this review (details of study and participant characteristics are provided in Supplemental Content, Table 4).²⁷,³³,⁴¹,⁴⁷–⁴⁹,⁵⁷,⁵⁹,⁶¹,⁶³ All studies reported using self-administered questionnaires, with one administered through the mail.³³

Patient Global Impression of Change (PGIC)

The Patient Global Impression of Change (PGIC) is a Likert-type scale used to assess the respondent’s overall impression of change in pain, often following an intervention.¹⁹ Two studies that reported on the PGIC were included in our review (details of study and participant characteristics are provided in Supplemental Content, Table 4).²⁰,⁶⁵ In one study, pain and function were assessed.⁶⁵ The other study used the scores on the PGIC to categorize whether pain intensity was improved, unchanged, or worse over a 6-month period.²⁰

PEG

The PEG is a 3-item pain questionnaire designed to quickly assess chronic pain in primary care settings. Respondents are asked about pain intensity (P), interference with enjoyment of life (E), and interference with general activity (G) in the past week. Each item is assessed on a Likert-type scale 0-10, and individual item scores are averaged. Questions on the PEG are derived from the longer, more comprehensive BPI.
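The PEG scoring rule described above (three 0-10 items, averaged) can be sketched as follows. The function name and the range check are illustrative assumptions, not part of any official PEG scoring software.

```python
def peg_score(pain: int, enjoyment: int, general_activity: int) -> float:
    """Average the three PEG items, each rated on a 0-10 scale."""
    items = (pain, enjoyment, general_activity)
    for value in items:
        if not 0 <= value <= 10:
            raise ValueError("each PEG item must be between 0 and 10")
    return sum(items) / len(items)


# Example: pain 6, enjoyment interference 4, activity interference 5.
print(peg_score(6, 4, 5))  # prints: 5.0
```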

Three studies included in our review evaluated the psychometric properties of the PEG (details of study and participant characteristics are provided in Supplemental Content, Table 4).²⁰,³⁵,³⁷ In all 3 studies, the PEG was administered by an interviewer.

Patient-Reported Outcomes Measurement Information System - Pain Interference (PROMIS-PI)

The PROMIS-PI was developed in 2004 and is used for general chronic pain conditions to examine interference related to physical functioning.²¹ The PROMIS-PI uses a 5-point Likert-type scale ranging from 1 (not at all) to 5 (very much). It can be self-administered, interview-administered, or administered through a proxy.

Five studies were included that examined the psychometric properties of the PROMIS-PI (details of study and participant characteristics are provided in Supplemental Content, Table 4).²⁸,³⁰,³¹,³⁵,⁴⁰

Roland Morris Disability Questionnaire (RMDQ)

The Roland Morris Disability Questionnaire (RMDQ) was developed in 1983 to evaluate disability and physical functioning interference from low back pain.²² The RMDQ is self-administered and has 24 items; total scores range from 0 (no disability) to 24 (severe disability). Since its origination, 11-item, 12-item, and 18-item versions have been developed.
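As a rough illustration of how an RMDQ total arises, the sketch below assumes the commonly described scoring in which each statement the respondent endorses counts as one point, so the 24-item version yields a total from 0 to 24. The function name and the version check are illustrative; the shortened versions are those named in the text.

```python
from typing import Sequence

# Item counts for the original RMDQ and the shortened versions named above.
RMDQ_VERSIONS = {11, 12, 18, 24}


def rmdq_score(endorsed: Sequence[bool]) -> int:
    """Count endorsed statements; assumes one point per endorsed item."""
    if len(endorsed) not in RMDQ_VERSIONS:
        raise ValueError("expected 11, 12, 18, or 24 item responses")
    return sum(1 for item in endorsed if item)


# Example: a respondent endorses 9 of the 24 statements.
responses = [True] * 9 + [False] * 15
print(rmdq_score(responses))  # prints: 9
```

Because totals from versions with different item counts are on different ranges, scores from the 24-item and shortened versions are not directly comparable without rescaling.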

Psychometric properties were assessed in 9 studies (details of study and participant characteristics are provided in Supplemental Content, Table 4).²⁰,²⁹,³⁶,³⁷,⁴³,⁴⁴,⁵²,⁵⁵,⁶³ All but 2 studies²⁹,⁵² administered the 24-item version of the RMDQ. Three studies assessed multiple versions.²⁹,⁴³,⁵⁵

SF-36 Bodily Pain Scale (SF-36 BPS)

The SF-36 Bodily Pain Scale (SF-36 BPS) uses 2 items to assess pain intensity and interference in daily activities over the past 4 weeks.²³ The Bodily Pain Scale is one of 8 scaled scores in the SF-36, a measure of overall health status.

We included 10 studies that evaluated the psychometric properties of the SF-36 BPS in our review (details of study and participant characteristics are provided in Supplemental Content, Table 4).²⁰,³¹,³³,³⁵–³⁷,⁴⁷,⁵⁴,⁵⁶,⁶⁴ Four studies asked participants to complete the SF-36 in its entirety and reported results specific to the bodily pain scale.³¹,³³,⁴⁷,⁶⁴ One study used only the pain intensity question from the bodily pain scale.⁵⁴

One study used an interviewer-administered SF-36.³⁵ The remaining studies used a self-administered questionnaire (SAQ). One of these studies specified that the questionnaire was mailed.³³

Visual Analogue Scale (VAS) for Pain

Development of the Visual Analogue Scale (VAS) dates to 1952. It is used to measure pain intensity and interference related to disability.²⁴ The VAS is composed of an incrementally measured vertical line anchored with 2 opposing descriptors, such as “no pain” and “pain as bad as can be” when measuring pain intensity. The participant then places a perpendicular line at the point that best describes their pain. A ruler is then used to indicate the score.

Ten studies were included that assessed the psychometric properties of the Visual Analogue Scale (details of study and participant characteristics are provided in Supplemental Content, Table 4).³⁴^,⁴¹^,⁴²^,⁴⁵^,⁵¹^,⁵⁷^,⁶⁰^–⁶²^,⁶⁷ One study did not specify whether the VAS was used to assess pain severity or interference.⁵⁷ In the other 9 studies, pain intensity was assessed.

Patients were asked to rate pain during the week after physical activity in one study.⁶⁰ One study asked patients to rate their pain in the last week.⁴⁵ A third study required patients to keep a VAS log of their pain for 14 days.⁶⁷ Another study asked participants to rate their pain level on the previous day.⁵¹ One study asked patients to rate their change in pain from baseline (3 month study period)³⁴ while another asked patients to rate pain prior to surgery and at 2-year follow-up.⁴¹ One study asked patients to rate pain before and after performing grip exercises.⁴² One study assessed present pain.⁶¹ Two studies did not specify a timeframe.⁵⁷^,⁶²

Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC)

The WOMAC was developed in 1982 for assessing pain severity and function in individuals with knee and hip pain associated with osteoarthritis.²⁵ Another domain, stiffness, is not addressed in this review. The index includes 24 items and can be self-administered or completed by interview. Different response formats have been used including a 5-point Likert scale, 11-point numerical rating scale, and a 100-mm visual analog scale.

We included 5 studies of the WOMAC (details of study and participant characteristics are provided in Supplemental Content, Table 4).³¹^,⁴⁶^,⁵⁰^,⁶⁰^,⁶⁴ One study assessed pain severity⁴⁶ while 4 studies used the WOMAC to assess both pain and function.

In all studies, the WOMAC was self-administered. One used a postal survey.⁵⁰ Two studies specified that participants were asked to recall pain over the past 48 hours.³¹^,⁴⁶ The others did not specify a timeframe.

Wong Faces Scale/Wong-Baker Faces Scale

The Wong Faces Scale (also known as the Wong-Baker Faces Scale) is an interview-administered, 6-point Likert-type scale ranging from 0 to 5 with corresponding faces.²⁶ Higher numbers represent greater pain. It was originally developed in 1985 to assess general pain intensity among children.²⁶

We included one study that measured the psychometric properties of the Wong-Baker Faces Scale (details of study and participant characteristics are provided in Supplemental Content, Table 4).⁵¹

Outcomes

Table 3 provides an overview of included pain measures and studies reporting each outcome. Of the measures that include assessment of both pain severity and pain interference, we found the greatest reporting of psychometric properties for the BPI, GCPS, MPI/WHYMPI, PEG, SF-36 BPS, and WOMAC. Of the measures that primarily assessed pain severity, we found the greatest reporting of psychometric properties for the NRS and VAS followed by the MPQ. Of the measures of pain interference, we found the greatest reporting of psychometric properties for the ODI, PROMIS-PI, and RMDQ. There was little or no reporting of psychometric properties for the DVPRS, HOOS, KOOS, PGIC, or Wong Faces Scale. Detailed psychometric data are reported in Supplemental Content, Table 5 and summarized below.

Table 3

Summary of Results: Studies Assessing Psychometric Properties of Self-Report Measures of Pain Severity (S) and Functional Interference (I) in Chronic Musculoskeletal Pain Populations.

Minimally Important Difference

We identified 8 studies that estimated MIDs of 8 separate pain measures: BPI, GCPS, NRS, ODI, PEG, RMDQ, SF-36 BPS, and VAS (Table 3, Supplemental Content, Tables 5 and 6).³³^,³⁷^,⁴¹^,⁵²^,⁵⁸^,⁶³^,⁶⁶^,⁶⁷ Six of the 8 measures assess pain intensity and interference/function (BPI, GCPS, PEG, SF-36 BPS, ODI, VAS), one (RMDQ) assesses interference/function, and one (NRS) focuses on intensity. Several methods for estimating MIDs were reported, including both distribution-based and anchor-based approaches. Distribution-based methods involve estimation of MID based on the distribution of the observed scores. Anchor-based methods use an external indicator (eg, patient rating of change) to put patients into positive change, no change, and negative change groups.⁷⁰ For each pain measure, MID estimates differed considerably depending on the estimation method used, the type of pain being studied, and the interval between evaluations. We broadly describe this outcome as minimally important difference, but note where studies describe the outcome differently.

Three studies calculated MID values for more than one pain measure.³⁷^,⁴¹^,⁶³ One US study (n=427) estimated minimal clinically important change (MCIC) for BPI, GCPS (labeled the Chronic Pain Grade [CPG] in this study), PEG, RMDQ, and SF-36 BPS over 12 months.³⁷ A distribution-based standard error of measurement (SEM) was used to estimate MCIC. The SEM was then used to categorize patients as better, the same, or worse for each measure. “Better” indicated that the score improved at least one SEM from baseline and “worse” indicated that the score worsened at least one SEM from baseline. Kappa statistics for agreement between the one-SEM classifications and classifications based on an anchor of the patient’s global rating were fair. The measures with the best agreement were the BPI (Kappas = 0.29 and 0.34 for trial and cohort data, respectively), the GCPS intensity (Kappas = 0.35 and 0.27), and the PEG (Kappas = 0.33 and 0.23).
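The one-SEM categorization described above can be illustrated with a minimal sketch. It assumes the commonly used formula SEM = SD × √(1 − reliability); the study's exact SEM computation is not given in this section, and the function names and numeric values below are hypothetical and chosen only for illustration.

```python
import math


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Assumed distribution-based formula: SEM = SD * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


def classify_change(baseline: float, follow_up: float, sem: float,
                    higher_is_worse: bool = True) -> str:
    """Label a patient 'better', 'same', or 'worse' using a one-SEM threshold."""
    change = follow_up - baseline
    # Express change so that positive values always mean improvement.
    improvement = -change if higher_is_worse else change
    if improvement >= sem:
        return "better"
    if improvement <= -sem:
        return "worse"
    return "same"


# Hypothetical values for illustration only (not taken from any reviewed study).
sem = standard_error_of_measurement(sd=2.0, reliability=0.75)
labels = [classify_change(6.0, follow_up, sem) for follow_up in (4.5, 6.5, 7.0)]
```

Classifications produced this way could then be cross-tabulated against an anchor such as a patient global rating to compute an agreement statistic like kappa.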

Another retrospective cohort study from the US estimated minimum clinically important differences (MCID) based on 4 anchor-based approaches for 47 participants undergoing surgical treatment for pseudoarthrosis-related back pain.⁴¹ MCIDs were calculated for the ODI and VAS 2 years postoperatively. The anchors were 1) patient-rated global assessment with choices of “worse,” “unchanged,” “slightly better,” or “markedly better” and 2) patient rating of satisfaction with the results of their surgery (yes indicating responders, no indicating non-responders). The 4 MCID approaches included 1) average change (average change score seen in the group defined to be responders); 2) minimum detectable change (MDC) (equal to the upper value of the 95% confidence interval for average change score seen in the cohort defined to be non-responders); 3) change difference (difference of the average change score for responders and non-responders); and 4) ROC approach (the change value that provides the greatest sensitivity and/or specificity for a positive response). For the ODI, the calculated MCIDs differed by the approach used and ranged from 2.0 points for MDC up to 8.3 points for change difference. Less variation was seen for the VAS, where MCIDs ranged from 2.0 to 3.2 points.
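The 4 MCID approaches can be sketched as follows. This is an illustration under stated assumptions, not the study's implementation: change scores are coded so that larger values mean more improvement, the 95% confidence interval uses a normal approximation (z = 1.96), and the ROC cutoff is chosen by maximizing sensitivity plus specificity (Youden's index), which is only one way to read "greatest sensitivity and/or specificity." All function names and data are hypothetical.

```python
import math
from statistics import mean, stdev


def average_change(responders):
    """Approach 1: mean change score among responders."""
    return mean(responders)


def minimum_detectable_change(non_responders, z=1.96):
    """Approach 2: upper bound of the 95% CI for mean change in non-responders
    (normal approximation; an assumption of this sketch)."""
    standard_error = stdev(non_responders) / math.sqrt(len(non_responders))
    return mean(non_responders) + z * standard_error


def change_difference(responders, non_responders):
    """Approach 3: difference in mean change between responders and non-responders."""
    return mean(responders) - mean(non_responders)


def roc_cutoff(responders, non_responders):
    """Approach 4: change value maximizing sensitivity + specificity (assumed criterion)."""
    best_cutoff, best_score = None, -1.0
    for cutoff in sorted(set(responders) | set(non_responders)):
        sensitivity = sum(c >= cutoff for c in responders) / len(responders)
        specificity = sum(c < cutoff for c in non_responders) / len(non_responders)
        if sensitivity + specificity > best_score:
            best_cutoff, best_score = cutoff, sensitivity + specificity
    return best_cutoff


# Hypothetical improvement scores for illustration only.
responders = [10, 12, 8, 14]
non_responders = [2, 0, 4, 2]
estimates = {
    "average_change": average_change(responders),
    "mdc": minimum_detectable_change(non_responders),
    "change_difference": change_difference(responders, non_responders),
    "roc": roc_cutoff(responders, non_responders),
}
```

Even on this toy data the 4 approaches return different values, which mirrors the spread of estimates reported for the ODI.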

One small study from the UK (n=48) estimated MCID for the ODI, NRS, and RMDQ after a 5-week class of exercise and education among patients with low back pain.⁶³ The PGIC was used to categorize patients into groups of “improved,” “unchanged,” and “deteriorated.” An anchor-based ROC approach yielded MCID estimates of 4 points for the NRS and RMDQ and 8 points for the ODI.

Test-retest Reliability

Test-retest reliability, the extent to which a measure achieves the same result on 2 or more occasions when the condition is stable, was reported in 10 studies (Table 3, Supplemental Content, Table 5).¹⁷^,³⁰^,³³^,⁴²^,⁴⁸^,⁵⁰^–⁵²^,⁶¹^,⁶² Several studies reported test-retest reliability for multiple pain measures. However, measure and timeframe comparisons differed across studies, making comparative evaluation of test-retest reliability difficult.

Test-retest reliabilities, assessed with Pearson correlations or intraclass correlations, were 0.90 or higher in many studies.³³^,⁴²^,⁵⁰^,⁵¹ Pain measures evaluated in these studies included the Faces Scale, VAS, NRS, ODI, and WOMAC. There were few reports of test-retest reliabilities less than 0.80. One study evaluated test-retest reliability of the RMDQ (ICC=0.68) with approximately 3 months between assessments in patients who reported no change in work status.⁵² Another study reported test-retest reliability of the PROMIS-PI for assessments at baseline and 3 months (ICC=0.58).³⁰
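Because the included studies used either Pearson or intraclass correlations, a short sketch may help show why the two can differ. It assumes a one-way random-effects ICC for two occasions, often written ICC(1,1); the reviewed studies may have used other ICC forms, and the data below are hypothetical.

```python
import math
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between test and retest scores."""
    mean_x, mean_y = mean(x), mean(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread = math.sqrt(sum((a - mean_x) ** 2 for a in x) *
                       sum((b - mean_y) ** 2 for b in y))
    return covariance / spread


def icc_oneway(test, retest):
    """One-way random-effects ICC for two occasions (assumed form)."""
    n, k = len(test), 2
    subject_means = [(a + b) / 2 for a, b in zip(test, retest)]
    grand_mean = mean(subject_means)
    ms_between = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    ms_within = sum((a - m) ** 2 + (b - m) ** 2
                    for a, b, m in zip(test, retest, subject_means)) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical scores: every patient scores exactly 2 points higher at retest.
test_scores = [2, 4, 6]
retest_scores = [4, 6, 8]
r = pearson_r(test_scores, retest_scores)
icc = icc_oneway(test_scores, retest_scores)
```

In this toy example the Pearson correlation is perfect because the rank order and spacing are preserved, while the ICC is lower because it also penalizes the systematic shift between occasions. This is one reason reliabilities reported with different statistics are hard to compare directly.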

Inter-rater Reliability

None of the included studies reported inter-rater reliability (ie, agreement between raters).

Internal Consistency

The extent to which items in a measure are correlated and thus can be said to be measuring the same construct (ie, internal consistency) was reported in 8 studies.¹⁷^,²⁰^,³⁶^,³⁹^,⁴⁰^,⁵⁰^,⁵³^,⁵⁹ In 7 studies, Cronbach’s alpha was calculated; one calculated Spearman correlation coefficients.⁵³

In studies reporting Cronbach’s alpha, results were generally greater than 0.70, indicating good to excellent internal consistency. Pain measures evaluated include the BPI, GCPS, MPQ, ODI, PEG, PROMIS-PI, RMDQ, SF-36 BPS, WHYMPI, and WOMAC. In the study reporting Spearman correlation coefficients between elements of the BPI, values ranged from 0.38 to 0.84.⁵³
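For readers less familiar with Cronbach’s alpha, the following minimal sketch shows the standard computation: alpha = k/(k − 1) × (1 − sum of item variances / variance of total scores), where k is the number of items. The function name and the response data are hypothetical and do not come from any reviewed study.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    item_count = len(responses[0])
    if item_count < 2:
        raise ValueError("Cronbach's alpha requires at least 2 items")
    # Sample variance of each item across respondents.
    item_variances = [variance([row[i] for row in responses])
                      for i in range(item_count)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in responses])
    return (item_count / (item_count - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical 2-item measure answered by 3 respondents.
alpha = cronbach_alpha([[1, 2], [2, 1], [3, 3]])
```

Alpha approaches 1 when items rise and fall together across respondents and falls toward 0 when items are unrelated.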

Concurrent and/or Criterion Validity

Concurrent validity is the extent to which scores on one measure relate to scores on another measure of the same or a similar construct, while criterion validity is the extent to which a measure corresponds to a gold standard or another measure. Nineteen studies reported concurrent/criterion validity.¹⁷^,²⁰^,²⁹^,³¹^,³³^,³⁶^,³⁹^,⁴²^–⁴⁴^,⁴⁷^,⁴⁹^,⁵⁰^,⁵⁴^,⁵⁷^–⁵⁹^,⁶¹^,⁶⁴ Pain measures assessed for concurrent/criterion validity include the BPI, GCPS, HOOS, MPQ, NRS, ODI, PEG, PROMIS-PI, RMDQ, SF-36 BPS, VAS, WHYMPI/MPI, and WOMAC. Table 3 provides an overview of studies reporting this outcome; more details are presented in Supplemental Content, Table 5.

Reported correlations indicate fair to excellent concurrent and criterion validity across pain measures. Four studies provided results from multiple comparisons.²⁰^,³¹^,³⁶^,⁴⁷ Krebs et al reported correlations between the PEG and other measures ranging from 0.60 (RMDQ) to 0.89 (BPI Interference component) with similar values for correlations of the BPI Severity and BPI Interference components with other measures.²⁰ Wittink et al computed R² values; values above 0.4 were considered high overlap between measures.⁴⁷ Observed R² values ranged from 0.37 to 0.58 among the MPI pain severity and interference components, the ODI, and the SF-36 BPS. Correlations between the PROMIS-PI and the SF-36 BPS (−0.73) and the WOMAC pain subscale (0.47) were reported in one study.³¹ Keller et al reported correlations between the BPI, SF-36 BPS, RMDQ, and GCPS, with values ranging from 0.47 to 0.81.³⁶

One study reported intercorrelations (Kendall’s tau) between components of the ODI and behavioral assessments of the components.⁵⁹ The correlation of the ODI Lifting Subscale with observed lifting was −0.38. The correlation of the ODI Walking Subscale with observed walking was −0.54. The correlation of the ODI Sitting Subscale with observed sitting was −0.40.

Two studies assessed correlations between different versions of the RMDQ.²⁹^,⁴³ In one study, Computer Adaptive Test versions with 5, 7, 9, and 11 items were evaluated with respect to a 23-item version of the RMDQ. The correlations were 0.93, 0.95, 0.97, and 0.98 for the 5-, 7-, 9-, and 11-item versions, respectively.²⁹ In the other study intercorrelations were reported for the 24-, 18-, and 11-item versions, with all values greater than 0.95.⁴³

Discriminant Validity

Discriminant validity is the ability of a measure to discriminate between groups. Four studies reported discriminant validity (Table 3, Supplemental Content, Table 5).³⁰^,³³^,³⁸^,³⁹ One study evaluated the ability of the MPQ Short Form to discriminate between number of pain diagnoses and between none/mild, moderate, and severe pain as determined with the MPI Pain Severity component.³⁹ No significant difference in total MPQ Short Form score was observed between study participants with one or with 2-3 pain diagnoses. However, scores were significantly higher in the group with 4 or more diagnoses. MPQ Short Form scores were significantly different across the 3 pain severity levels.

Krebs et al evaluated the accuracy of the NRS for predicting level of pain that interferes with function (defined in the study as BPI of 5 or higher) and level of pain that motivates a physician visit.³⁸ For both outcomes, the area under the ROC curve was 0.75 to 0.78 (indicating “fair” accuracy), and NRS scores of 4 and above increased the probability of interference with function or a physician visit, as indicated by likelihood ratios substantially greater than 1.0.

A third study reported that ODI scores differed significantly (P<.001) between groups with and without 1) high pain severity and high functional limitations and 2) chronic pain and high functional limitation.³³ Another study reported that PROMIS-PI scores differed significantly (P<.001) between those seeking worker’s compensation or not and those who had a fall in the past 3 months or not.³⁰

Responsiveness

We identified 22 studies that reported responsiveness, the ability of a measure to detect change in an outcome over time, in 14 of the 17 pain measures of interest. Details of study and population characteristics are provided in Table 3, Supplemental Content, Tables 5 and 7.²⁰^,²⁷^,²⁸^,³⁰^,³²^,³⁵^–³⁷^,⁴²^,⁴⁴^,⁴⁶^–⁴⁸^,⁵²^,⁵³^,⁵⁵^–⁵⁷^,⁶⁰^,⁶³^–⁶⁵

Two common approaches to estimating responsiveness are external and internal. Internal responsiveness reflects the ability of a measure to change over a pre-specified time interval. External responsiveness relies on an anchor or external standard which is considered independent of the pain measure (eg, patient global rating of change) to assess the agreement between change in the measure and change in the external standard. Responsiveness was calculated by a variety of metrics across studies including standardized response means (SRM) and standardized effect sizes (SES). The SRM is an effect size measure of within-group change, calculated as the mean change in scores from time 1 to time 2 divided by the standard deviation of the change scores. The studies also reported SES, an effect size measure of between-group change, calculated as the difference between the mean change scores of 2 independent groups divided by the pooled standard deviation of change. The magnitude of effect for the SRM and SES is interpreted using the guidelines suggested by Cohen (0.2 is considered small and 0.8 or greater is considered large).⁷¹ Several studies also used area under the curve (AUC) values estimated from ROC analyses to assess how well a measure discriminates between patients who improved and those who did not. A value of 0.5 indicates discrimination no better than chance, and a value of 1.0 indicates perfect discrimination. Thirteen studies estimated external (“anchored”) responsiveness.²⁰^,²⁸^,³⁰^,³²^,³⁵^–³⁷^,⁵²^,⁵³^,⁵⁵^–⁵⁷^,⁶³
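The responsiveness metrics described above can be sketched in a few lines. This is an illustration only: the SES is computed here as the difference between two groups' mean change scores divided by the pooled standard deviation of change (one reading of the definition given above), the AUC is computed as the probability that a randomly chosen improved patient has a larger change score than a randomly chosen non-improved patient (ties count one half), and all names and data are hypothetical.

```python
import math
from statistics import mean, stdev, variance


def standardized_response_mean(time1, time2):
    """SRM: mean change divided by the standard deviation of change scores."""
    changes = [after - before for before, after in zip(time1, time2)]
    return mean(changes) / stdev(changes)


def standardized_effect_size(changes_a, changes_b):
    """SES (as assumed here): difference in mean change / pooled SD of change."""
    n_a, n_b = len(changes_a), len(changes_b)
    pooled_variance = ((n_a - 1) * variance(changes_a) +
                       (n_b - 1) * variance(changes_b)) / (n_a + n_b - 2)
    return (mean(changes_a) - mean(changes_b)) / math.sqrt(pooled_variance)


def auc(improved, not_improved):
    """AUC via pairwise comparison of change scores; ties count as 0.5."""
    wins = 0.0
    for a in improved:
        for b in not_improved:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(improved) * len(not_improved))


# Hypothetical data for illustration only.
srm = standardized_response_mean([6, 7, 5, 8], [4, 4, 4, 5])
ses = standardized_effect_size([3, 4, 5], [1, 2, 3])
area = auc([3, 4, 5], [1, 2, 3])
```

The SRM is negative in this toy example because scores decreased from time 1 to time 2; for measures where lower scores indicate less pain, its absolute value would be compared against Cohen's thresholds.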

Comparative Studies

Six studies compared external responsiveness across multiple pain measures (Supplemental Content, Table 7).³⁵^–³⁸^,⁵⁶^,⁶³ Studies that determined responsiveness based on AUC values are summarized in Table 4. The remaining 2 studies calculated SRMs for responsiveness for the BPI,²⁰^,³⁶ PEG,²⁰ GCPS,³⁶ and SF-36 BPS.³⁶

Table 4

Comparative External Responsiveness Based on AUC Values for Detecting Any Improvement.

Seven studies reported internal responsiveness for multiple pain measures (Supplemental Content, Tables 5 and 7).⁴²^,⁴⁶^–⁴⁸^,⁶⁰^,⁶⁴ Measures evaluated included the HOOS,⁶⁴ MPI,⁴⁷ MPQ,⁴⁸ NRS,⁴⁶ ODI,⁴⁷^,⁴⁸ SF-36 BPS,⁴⁷ VAS,⁶⁰ and WOMAC.⁴⁶^,⁶⁰^,⁶⁴

Measure-specific

Eleven studies reported responsiveness for one pain measure only (Supplemental Content, Tables 5 and 7).²⁷^,²⁸^,³⁰^,³²^,⁴⁴^,⁵²^,⁵³^,⁵⁵^,⁵⁷^,⁶⁰^,⁶⁵ Responsiveness varied within individual measures and across populations, time intervals, and calculation methods. Pain measures included the BPI,⁴⁴^,⁵³ NRS,³² ODI,²⁷^,⁵⁷ PGIC,⁶⁵ PROMIS-PI,²⁸^,³⁰ and RMDQ.⁵²^,⁵⁵

Feasibility

Number of Items

Among the 17 pain measures reviewed, the number of items used to assess pain ranged from 1 (NRS, PGIC, VAS, Wong-Baker Faces Scale) to 78 (MPQ). The 4 single-item measures assessed different dimensions of pain including pain intensity (Faces), pain intensity and/or interference (NRS and VAS), and changes in pain (PGIC). The phrasing of questions used to elicit pain scores was not consistent across the studies included in this review, and therefore in some cases it was not clear if multiple dimensions of pain were being assessed. These single-item measures also varied in how they were administered, as both the VAS and Faces involve visual cues.

Other low-item measures include the SF-36 BPS (2 items), the PEG (3 items), the DVPRS (5 items), and the GCPS (7 items). While still brief, these measures have the advantage of measuring both pain intensity and pain interference related to function. Mid-item measures include the ODI (10 items), the BPI (17 items), the WOMAC (24 items), and the RMDQ (24 items). The ODI includes one item related to pain severity (need for analgesic medications) and the RMDQ specifically measures disability related to pain.

The pain measures with the most items are the HOOS (40 items), KOOS (42 items), PROMIS-PI (41 items), MPI (52 items), and MPQ (78 items). Though lengthy, the HOOS and KOOS are the only measures that directly assess pain of the hip and knee, respectively. The PROMIS-PI is specific to pain interference in physical functioning and has 4 short form versions that are commonly used. The MPI queries pain intensity and interference in multiple domains, including social and family functioning. The highest-item measure, the MPQ, presents the patient with a list of adjectives from which to select descriptors for their subjective pain experience rather than asking them to answer questions on a Likert-type scale. In determining which measure provides sufficient items for a given research study, the intended use of the measure and the research setting will largely determine the appropriate choice.

Mode of Administration

Desired mode of administration may also inform the appropriate choice of pain measure for research. Many of the pain measures can be self-administered or administered by an interviewer. Measures such as the KOOS and HOOS have been administered through the mail, while computer-based surveys have also been developed for the WOMAC, SF-36 BPS, PROMIS-PI, ODI, NRS, MPI, and HOOS. Four measures have been assessed for telephone administration, including the WOMAC, SF-36 BPS, ODI, and NRS.

Availability

Pain measures readily available without restrictions on use include the DVPRS, GCPS, HOOS, KOOS, NRS, PGIC, PEG, RMDQ, and VAS. The MPI and MPQ can be obtained freely and directly from the developer. Free use of the ODI is permitted for non-funded academic research and individual clinical practice. Additionally, the PROMIS-PI is freely available after registering with an assessment center and endorsing terms and conditions. Measures that require purchase or permission to use are the BPI, SF-36 BPS, Wong-Baker Faces Scale, and WOMAC.

Summary and Discussion

Key Messages

Among 17 multi-item pain measures assessed, the most complete evidence on psychometric properties in chronic musculoskeletal pain populations was found for the ODI, RMDQ, and SF-36 BPS. Several key psychometric properties were available for the BPI, GCPS, MPI/WHYMPI, MPQ, PEG, PROMIS-PI, and WOMAC. Most of these measures include both pain severity/intensity and functional impairment.
- Of the measures focused primarily on pain severity, we found the greatest reporting of psychometric properties for the NRS and VAS, followed by the MPQ.
- Of the measures of pain interference, we found the greatest reporting of psychometric properties for the ODI, PROMIS-PI, and RMDQ.
- MID assessment methods differed and were often based on statistical rather than patient-noticeable differences.
- Reliability, internal consistency, concurrent or criterion validity, discriminant validity, and responsiveness differed widely but generally were in the fair to excellent range.
- Feasibility, measured by number of items, delivery mode, and public availability, differed widely. The choice of measure may depend on population/condition of interest, research questions and settings, and resources available.
Our review supplements earlier IMMPACT guidance on core outcome measures by providing recent findings on psychometric properties of measures specifically targeted for chronic musculoskeletal pain, using English language versions of measures, and including recently developed measures of pain severity and/or pain interference.
Primary psychometric research on key measures in chronic musculoskeletal pain populations was limited overall. Future research should use consistent chronic musculoskeletal pain definitions, standardized psychometric outcomes assessment, and comprehensive descriptions of patient populations.
Findings from this review can inform recommendations on specific core outcome measures for clinical research on chronic musculoskeletal pain interventions. Researchers’ final choice of measures should consider population characteristics, pain site and type, recall period of interest and intervention length, analytic goals, and study resources.

Discussion

This rapid evidence review identified published research on psychometric properties of English-language versions of 17 key patient-reported pain outcome measures assessed in chronic musculoskeletal pain populations. The ODI, RMDQ, and SF-36 BPS were the most frequently studied multi-item pain measures and had reported data for all 4 main psychometric outcomes of interest: MID, responsiveness, validity, and test-retest reliability. Each of these measures assesses interference; the SF-36 BPS and ODI include a question about pain severity, but no study reported separate outcomes for severity and interference. The BPI, GCPS, and PEG had data on MID, responsiveness, and validity. Each of these measures assesses both pain severity and interference, with all but one study reporting separate results for the 2 subscales of the BPI and GCPS; severity and interference are combined in the PEG. The MPI/WHYMPI, MPQ, PROMIS-PI, and WOMAC had data on responsiveness, validity, and test-retest reliability. The MPQ is a measure of pain severity and the PROMIS-PI is a measure of pain interference. The MPI/WHYMPI and WOMAC include severity and interference subscales. All but one study reported separate results for those subscales.

Findings from our review supplement the work of IMMPACT⁴ and IMMPACT/OMERACT.⁶ The 2005 IMMPACT guidance on core outcome measures for chronic pain clinical trials was based on studies of any chronic pain, including cancer, dental, and neuropathic pain. The literature reviews to support the guidance included studies published through early 2003.⁴ The 2016 IMMPACT/OMERACT guidance on assessment of physical function and participation in chronic pain clinical trials identified patient-reported outcome measures of physical functioning, including 8 addressed in our review, but did not perform detailed assessments of the measures and did not make recommendations for use of specific measures.⁶

While IMMPACT provides recommendations for measures that can be used to assess pain severity and/or pain interference across a broad range of pain types,⁴ there have since been many new studies in the area of chronic musculoskeletal pain. Of 43 studies included in our review, 38 were published from 2003 to January 2017. In addition, new pain measures have been developed, notably the DVPRS, PEG, and PROMIS-PI. Therefore, our report provides updated information and a broader look at psychometric properties of measures for assessment of both pain severity and pain interference for chronic musculoskeletal pain.

Further, our findings are consistent with pain outcome measurement reviews focused on specific pain-related diagnoses or pain measures. Three reviews focused on patient-reported health outcome measures for LBP found the ODI and RMDQ to be the most comprehensively studied both for responsiveness⁷² and for other psychometric properties.⁷³^,⁷⁴ There were few data on psychometric properties of pain severity measures (ie, NRS, VAS, BPI, MPQ) commonly used in RCTs of interventions for LBP.⁷³ Another review of back-specific functional status questionnaires for LBP found the ODI and RMDQ to have been most frequently studied, with good measurement properties in their original forms as retested in multiple settings.⁷⁵ A review of studies that had evaluation of psychometric properties as a main purpose found 2 of our measures of interest, the HOOS and WOMAC, to be adequately assessed for use in patients with hip and groin disability.⁷⁶ A review of 6 studies that used the KOOS to evaluate patients undergoing total knee arthroplasty found acceptable psychometric properties.⁷⁷ None of the studies included in that review were eligible for our review due to language of publication, use of a non-English language version of the KOOS, or inadequate definition of pain duration. A review of 76 studies assessing the measurement properties of the WOMAC, predominantly in patients with hip and/or knee osteoarthritis, found acceptable reliability.⁷⁸ Few studies assessed responsiveness and MID was not an outcome of interest for that review.

For purposes of measure selection, psychometric properties must be considered alongside conceptual and practical concerns.⁷⁹ The ODI and RMDQ were developed for and most often tested in low back pain, and the WOMAC was developed for and most often tested in knee and hip pain. The BPI, GCPS, MPI/WHYMPI, MPQ, PEG, PROMIS-PI, and SF-36 BPS were designed to assess more broadly defined pain, and were tested in populations with varying chronic pain-related diagnoses. Most of these measures assess severity and functional impairment; exceptions are the MPQ (severity only) and the RMDQ and PROMIS-PI (interference only). Researchers’ choice of measures should take into account pain site, pain type, recall period of interest and length of intervention (with respect to measure responsiveness data), analytic goals (with respect to measure range and scale), and study resources (with respect to measure feasibility, including available time and mode of administration).

Versions of the NRS and the VAS were also frequently studied with respect to the 4 key psychometric outcomes of interest. However, NRS and VAS are single-item response measures, and the associated questions to which study participants responded varied with respect to phrasing, recall periods, and score ranges. For the NRS and VAS, our evidence review was thus less a review of psychometric research on 2 clearly defined pain measures and more a cataloging of multiple single-item numeric rating-based or visual analog scale-based approaches to assessing primarily pain severity.

Challenges in Assessment of Psychometric Properties

Minimally Important Difference (MID)

The range of assessment methods reflects variation in current MID-related research (Supplemental Content, Table 6). Assessments of minimum clinically important difference (MCID) for a patient-reported outcome measure should ideally involve anchoring the measure to an indicator of meaningful patient-reported change in a clinical outcome.⁷⁰^,⁸⁰^,⁸¹ While some MID estimates reported here constitute MCIDs anchored to patient-reported clinical improvement via adaptations of the Patient Global Impression of Change (PGIC),³⁷^,⁴¹^,⁵⁸^,⁶³^,⁶⁶ others are purely estimates of statistical minimum detectable change (MDC) based on study population distribution characteristics³³^,⁵²^,⁶⁷ without reference to clinical import of that change. Comparing anchor-based MCID findings with distribution-based MDC findings can be useful in MID estimation, as this allows researchers to consider both an external benchmark of clinical change and a measure of change detectable despite variation.⁵⁸^,⁷⁰^,⁸⁰ Reviewed studies, however, contained relatively few estimates via any method, precluding comparison and generalization of measure-specific MIDs. MIDs for patient-reported measures are likely to vary based on the constructs assessed by each measure, as well as by patient population, study design, and baseline measure value. It is possible that widespread application of a 30% change from baseline as an MID, originally assessed using an NRS for pain severity¹⁹ and ultimately recommended for a range of patient-reported pain outcome measures,⁸² has discouraged measure-specific MID development. Further research could explore whether the broadly adopted figure of a 30% change from baseline is empirically generalizable across patient-reported outcome measures in chronic musculoskeletal pain studies and populations. Consensus is needed on optimal approaches to developing and reporting MID for patient-reported measures in chronic musculoskeletal pain.

Validity

There is no gold standard comparator for assessment of pain measure validity in the domains assessed. Most included studies’ methods of assessing concurrent/criterion validity involved finding correlations between a measure of interest and another measure or subscale of interest. Perhaps unsurprisingly, therefore, our review identified a self-referential network of patient-reported outcome measures validated against one another. Other assessments arguably relevant to construct validity, such as relationships of self-reported pain-related functioning measures to objective physical performance measures, were less commonly identified, consistent with the state of current physical function research in pain.⁶ Estimates of measure validity are difficult to compare within or across measures in this review. Future research could further investigate the network of validity comparisons between measures of interest, to clarify underlying assumptions that support the validity of these measures and to identify gaps requiring conceptual research.

Responsiveness

Responsiveness findings in reviewed studies are challenging to compare both within and across measures (Supplemental Content, Table 7). Some methods of comparing pain measure changes within clinical trials of pain interventions cannot separate the effectiveness of a pain intervention from the responsiveness of the pain measure used to assess it. Few methods recognize the inherent challenge that short-term fluctuations in pain, which commonly occur in chronic musculoskeletal pain conditions, pose to the capacity of pre-post assessments to track pain trajectory over time. Further, included pain measures have a wide range of recall periods (from 24 hours for the RMDQ to 4 weeks for the SF-36 BPS), and reviewed studies have a range of time periods between assessment points. Clinical researchers interested in comparing measures’ responsiveness should consider available psychometric evidence in the context of their own work, including the recall period of interest, the expected amount and timeframe of change in the pain domains they plan to assess, and their desired study design (eg, pre-post assessment vs longitudinal repeated-measures assessment).

Test-retest Reliability

Interpreting test-retest reliability estimates has conceptual challenges similar to those of responsiveness: it can be difficult to separate undesirable variability in a measure from variability that reflects actual fluctuations in subjective pain constructs, and it can thus be difficult to determine the optimal test-retest reliability interval for a given measure. A short-term fluctuation in a measure may not indicate a lack of test-retest reliability, and may in fact be evidence of responsiveness to true changes in pain course. As with responsiveness, we recommend that researchers interested in specific measures’ reliability consider reliability-related timeframes and design features in the context of their own work.

Limitations and Future Research

Limitations of a Rapid Evidence Review

Rapid evidence review development requires streamlining the scope of literature search and eligibility criteria, and language and date restrictions are among current best practice recommendations.⁸³–⁸⁵ Our review was limited to studies that assessed measures or published results in English. However, this decision was also influenced and supported by evidence on the limited generalizability of self-report measures’ psychometric properties derived in languages other than that of the intended population,⁸⁶,⁸⁷ and highlights the need for linguistic and cultural validation of pain measures. With respect to search strategy, our primary abstract search was limited to dates from 2000 onward. We complemented this, however, by hand-searching reference lists of included studies and relevant reviews, searching websites of each specific pain measure, and querying experts for supplementary suggestions. We included identified eligible articles regardless of date, though we acknowledge that we may have missed relevant publications. Our criteria may have excluded some studies of psychometric properties of measures developed and validated prior to the popularization of specifying chronicity and duration of pain. Researchers considering such pain measures will need to consider the relevance of past psychometric work in the context of current conceptual pain research, and of their planned studies’ objectives and target populations. We excluded studies that enrolled patients with chronic musculoskeletal conditions commonly associated with pain but did not specify that enrolled patients had chronic pain (eg, radiologically defined osteoarthritis). This exclusion was a decision based on scope, and we also believe it is scientifically justifiable: it is not clear whether individuals in these studies had chronic pain, and some of these studies specifically noted that patients either did not have pain or had acute or subacute pain.
We also excluded trials of interventions for pain that did not note assessment of psychometric properties in the abstract. Our focus was on primary psychometric research on the pain measures of interest, and accordingly our search required psychometric properties to be mentioned in the abstract. It is possible that this search approach did not identify some psychometric assessments embedded in studies that used the measures of interest as primary clinical outcomes. However, we believe it is unlikely that this decision excluded a large body of relevant information, and we took steps to address this concern within the scope of a rapid review. For example, our search of included studies from other similar evidence reviews and our query of specific measures’ websites did not identify additional trials that assessed psychometric properties without describing them in the abstract.

Chronic Musculoskeletal Pain Definition and Reporting

Chronic musculoskeletal pain definition and reporting differed widely across reviewed studies. The required duration for pain to be considered “chronic” was inconsistent, and was not always reported. Pain type (eg, musculoskeletal), primary diagnostic cause (eg, osteoarthritis), and primary bodily site(s) (eg, low back) were inconsistently reported. In some studies, pain-related diagnoses or bodily pain sites were reported without reference to the existence, duration, or chronicity of pain (eg, radiologically defined osteoarthritis); these studies did not meet inclusion criteria for this review. We also found inconsistent reporting of pain-relevant participant characteristics such as pain duration at baseline, baseline level of relevant pain domains, current use of pharmacological and/or non-pharmacological treatments, and co-existing physical or mental health conditions. Such differences in chronic musculoskeletal pain definition and reporting reflect active discussion in current pain research: when and how duration affects key pain qualities, when and how causal diagnoses and bodily site affect key pain qualities, and when and how intermittent pain differs meaningfully from chronic continuous pain.¹⁰,⁸⁸ These conceptual uncertainties underlie the wide range of approaches to defining target populations for pain studies. Additional work is needed to define target populations for psychometric research on measures for use in chronic musculoskeletal pain, as well as standards for reporting of chronic musculoskeletal pain duration, relevant diagnoses, and pain sites.

Study Populations

Most studies were conducted in populations with over 50% women and mean ages 40-59. Most studies did not report race or ethnicity; of those that did, all included more than 50% white participants, and most included more than 75% white participants. No studies reported outcomes stratified by sex or gender, age range, or race/ethnicity. Generalizability of psychometric findings is thus limited by both demographic underreporting and population homogeneity. Given substantial evidence of the influence of age and psychosocial factors on individuals’ experiences and reporting of both pain-related functional impairment and pain severity,⁸⁷,⁸⁹–⁹¹ there is a need for consensus on key study population demographic and clinical characteristics, more consistent reporting of these population characteristics within studies, and further research on how measures’ psychometric properties generalize or change across age ranges and psychosocial categories.

Applicability to VHA Research

Our findings are highly applicable to research on chronic musculoskeletal pain in the VA population. Four studies enrolled only Veterans¹⁷,³⁵,³⁹,⁴⁴ and 2 included Veterans.²⁰,³⁷ These studies evaluated psychometric properties of several of the pain measures that overall had substantial evidence, including the BPI, MPI/WHYMPI, MPQ, PEG, PROMIS-PI, RMDQ, and SF-36 BPS.

The chronic musculoskeletal pain conditions studied are representative of conditions seen in a Veteran population, with measurement of back, knee, and hip pain most common. Mean ages of study participants ranged from 32 to 80 years. However, studies other than those of Veterans included a large percentage of women, and studies reporting race/ethnicity, most of them from the US, enrolled a high percentage of white individuals. Additional methods work is needed in broader populations, along with more consistent and complete demographic reporting.

Conclusions

Among multi-item pain measures assessed, the most complete evidence on psychometric properties of interest within chronic musculoskeletal pain populations was found for the ODI, RMDQ and SF-36 BPS, while several additional measures (BPI, GCPS, MPI/WHYMPI, MPQ, PEG, PROMIS-PI, and WOMAC) also had evidence for several of the key psychometric properties. Most of these measures include both pain severity/intensity and functional impairment. In addition to evidence on psychometric properties, choice of pain outcome measures for a specific research study must consider both conceptual elements (eg, pain domains of interest, pain sites and diagnoses, time course, and population characteristics) and practical concerns (eg, burden to complete, mode of assessment, cost). Limitations of current chronic musculoskeletal pain measurement research relate to variations in (1) definition and reporting of chronic musculoskeletal pain and pain-related diagnoses, (2) methods of assessing psychometric outcomes, and (3) reporting on demographics of patient populations. Findings from this review can inform recommendations on specific core outcome measures for clinical research on chronic musculoskeletal pain interventions. Further methods research is needed to validate patient-reported pain outcome measures in populations with chronic musculoskeletal pain and develop a framework for determining outcome measurement selection that incorporates feasibility and applicability.

References

1.: Butchart A, Kerr EA, Heisler M, Piette JD, Krein SL. Experience and management of chronic pain among patients with other complex chronic conditions. Clin J Pain. 2009;25(4):293–298. [PMC free article: PMC2709743] [PubMed: 19590477]

2.: Gereau RW, 4th, Sluka KA, Maixner W, et al A pain research agenda for the 21st century. J Pain. 2014;15(12):1203–1214. [PMC free article: PMC4664454] [PubMed: 25419990]

3.: Department of Health and Human Services. National pain strategy: a comprehensive population health-level strategy for pain. 2015. Available at: https://iprcc.nih.gov/docs/HHSNational_Pain_Strategy.pdf. Accessed 1 August 2017.

4.: Dworkin RH, Turk DC, Farrar JT, et al Core outcome measures for chronic pain clinical trials: IMMPACT recommendations. Pain. 2005;113(1-2):9–19. [PubMed: 15621359]

5.: Dworkin RH, Turk DC, McDermott MP, et al Interpreting the clinical importance of group differences in chronic pain clinical trials: IMMPACT recommendations. Pain. 2009;146(3):238–244. [PubMed: 19836888]

6.: Taylor AM, Phillips K, Patel KV, et al Assessment of physical function and participation in chronic pain clinical trials: IMMPACT/OMERACT recommendations. Pain. 2016;157(9):1836–1850. [PMC free article: PMC7453823] [PubMed: 27058676]

7.: Turk DC, Dworkin RH, Burke LB, et al Developing patient-reported outcome measures for pain clinical trials: IMMPACT recommendations. Pain. 2006;125(3):208–215. [PubMed: 17069973]

8.: Turk DC, Dworkin RH, McDermott MP, et al Analyzing multiple endpoints in clinical trials of pain treatments: IMMPACT recommendations. Initiative on Methods, Measurement, and Pain Assessment in Clinical Trials. Pain. 2008;139(3):485–493. [PubMed: 18706763]

9.: Turk DC, Dworkin RH, Revicki D, et al Identifying important outcome domains for chronic pain clinical trials: an IMMPACT survey of people with pain. Pain. 2008;137(2):276–285. [PubMed: 17937976]

10.: Younger J, McCue R, Mackey S. Pain outcomes: a brief review of instruments and techniques. Curr Pain Headache Rep. 2009;13(1):39–43. [PMC free article: PMC2891384] [PubMed: 19126370]

11.: Cleeland CS, Ryan KM. Pain assessment: global use of the Brief Pain Inventory. Ann Acad Med Singapore. 1994;23(2):129–138. [PubMed: 8080219]

12.: Buckenmaier CC, 3rd, Galloway KT, Polomano RC, McDuffie M, Kwon N, Gallagher RM. Preliminary validation of the Defense and Veterans Pain Rating Scale (DVPRS) in a military population. Pain Med. 2013;14(1):110–123. [PubMed: 23137169]

13.: Von Korff M, Ormel J, Keefe FJ, Dworkin SF. Grading the severity of chronic pain. Pain. 1992;50(2):133–149. [PubMed: 1408309]

14.: Klassbo M, Larsson E, Mannevik E. Hip disability and osteoarthritis outcome score: An extension of the Western Ontario and McMaster Universities Osteoarthritis Index. Scand J Rheumatol. 2003;32:46–51. [PubMed: 12635946]

15.: Roos EM, Lohmander LS. The Knee injury and Osteoarthritis Outcome Score (KOOS): from joint injury to osteoarthritis. Health Qual Life Outcomes. 2003;1:64. [PMC free article: PMC280702] [PubMed: 14613558]

16.: McCaffery M, Beebe A. Pain: Clinical Manual for Nursing Practice. St. Louis, MO: Mosby; 1989.

17.: Kerns RD, Turk DC, Rudy TE. The West Haven-Yale Multidimensional Pain Inventory (WHYMPI). Pain. 1985;23(4):345–356. [PubMed: 4088697]

18.: Smeets R, Koke A, Lin CW, Ferreira M, Demoulin C. Measures of function in low back pain/disorders: Low Back Pain Rating Scale (LBPRS), Oswestry Disability Index (ODI), Progressive Isoinertial Lifting Evaluation (PILE), Quebec Back Pain Disability Scale (QBPDS), and Roland-Morris Disability Questionnaire (RDQ). Arthritis Care Res. 2011;63 Suppl 11:S158–173. [PubMed: 22588742]

19.: Farrar JT, Young JP, Jr., LaMoreaux L, Werth JL, Poole RM. Clinical importance of changes in chronic pain intensity measured on an 11-point numerical pain rating scale. Pain. 2001;94(2):149–158. [PubMed: 11690728]

20.: Krebs EE, Lorenz KA, Bair MJ, et al Development and initial validation of the PEG, a three-item scale assessing pain intensity and interference. J Gen Intern Med. 2009;24(6):733–738. [PMC free article: PMC2686775] [PubMed: 19418100]

21.: Pain Interference: A brief guide to the PROMIS Pain Interference instruments. 2015. Available at: https://www.assessmentcenter.net/documents/PROMIS%20Pain%20Interference%20Scoring%20Manual.pdf. Accessed 1 August 2017.

22.: Roland MO, Morris RW. A study of the natural history of back pain. Part 1: Development of a reliable and sensitive measure of disability in low back pain. Spine. 1983;8:141–144. [PubMed: 6222486]

23.: Ware JE, Jr., Gandek B. Overview of the SF-36 Health Survey and the International Quality of Life Assessment (IQOLA) Project. J Clin Epidemiol. 1998;51(11):903–912. [PubMed: 9817107]

24.: Wewers ME, Lowe NK. A critical review of visual analogue scales in the measurement of clinical phenomena. Res Nurs Health. 1990;13(4):227–236. [PubMed: 2197679]

25.: American College of Rheumatology. Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC). Available at: http://www.rheumatology.org/I-Am-A/Rheumatologist/Research/Clinician-Researchers/Western-Ontario-McMaster-Universities-Osteoarthritis-Index-WOMAC. Accessed 1 August 2017.

26.: Wong-Baker FACES® History. Available at: http://wongbakerfaces.org/us/wong-baker-faces-history/. Accessed 1 August 2017.

27.: Anagnostis C, Gatchel RJ, Mayer TG. The pain disability questionnaire: a new psychometrically sound measure for chronic musculoskeletal disorders. Spine. 2004;29(20):2290–2302. [PubMed: 15480144]

28.: Askew RL, Cook KF, Revicki DA, Cella D, Amtmann D. Evidence from diverse clinical populations supported clinical validity of PROMIS pain interference and pain behavior. J Clin Epidemiol. 2016;73:103–111. [PMC free article: PMC4957699] [PubMed: 26931296]

29.: Cook KF, Choi SW, Crane PK, Deyo RA, Johnson KL, Amtmann D. Letting the CAT out of the bag: comparing computer adaptive tests and an 11-item short form of the Roland-Morris Disability Questionnaire. Spine. 2008;33(12):1378–1383. [PMC free article: PMC2671199] [PubMed: 18496352]

30.: Deyo RA, Katrina R, Buckley DI, et al Performance of a Patient Reported Outcomes Measurement Information System (PROMIS) Short Form in older adults with chronic musculoskeletal pain. Pain Med. 2016;17(2):314–324. [PMC free article: PMC6281027] [PubMed: 26814279]

31.: Driban JB, Morgan N, Price LL, Cook KF, Wang C. Patient-Reported Outcomes Measurement Information System (PROMIS) instruments among individuals with symptomatic knee osteoarthritis: a cross-sectional study of floor/ceiling effects and construct validity. BMC Musculoskelet Disord. 2015;16:253. [PMC free article: PMC4570513] [PubMed: 26369412]

32.: Godil SS, Parker SL, Zuckerman SL, Mendenhall SK, McGirt MJ. Accurately measuring the quality and effectiveness of cervical spine surgery in registry efforts: determining the most valid and responsive instruments. Spine J. 2015;15(6):1203–1209. [PubMed: 24076442]

33.: Hicks GE, Manal TJ. Psychometric properties of commonly used low back disability questionnaires: are they useful for older adults with low back pain? Pain Med. 2009;10(1):85–94. [PMC free article: PMC5323267] [PubMed: 19222773]

34.: Jensen MP, Schnitzer TJ, Wang H, Smugar SS, Peloso PM, Gammaitoni A. Sensitivity of single-domain versus multiple-domain outcome measures to identify responders in chronic low-back pain: pooled analysis of 2 placebo-controlled trials of etoricoxib. Clin J Pain. 2012;28(1):1–7. [PubMed: 21705875]

35.: Kean J, Monahan PO, Kroenke K, et al Comparative responsiveness of the PROMIS Pain Interference Short Forms, Brief Pain Inventory, PEG, and SF-36 Bodily Pain Subscale. Med Care. 2016;54(4):414–421. [PMC free article: PMC4792763] [PubMed: 26807536]

36.: Keller S, Bann CM, Dodd SL, Schein J, Mendoza TR, Cleeland CS. Validity of the Brief Pain Inventory for use in documenting the outcomes of patients with noncancer pain. Clin J Pain. 2004;20:309–318. [PubMed: 15322437]

37.: Krebs EE, Bair MJ, Damush TM, Tu W, Wu J, Kroenke K. Comparative responsiveness of pain outcome measures among primary care patients with musculoskeletal pain. Med Care. 2010;48(11):1007–1014. [PMC free article: PMC4876043] [PubMed: 20856144]

38.: Krebs EE, Carey TS, Weinberger M. Accuracy of the pain numeric rating scale as a screening test in primary care. J Gen Intern Med. 2007;22(10):1453–1458. [PMC free article: PMC2305860] [PubMed: 17668269]

39.: Lovejoy TI, Turk DC, Morasco BJ. Evaluation of the psychometric properties of the revised short-form McGill Pain Questionnaire. J Pain. 2012;13(12):1250–1257. [PMC free article: PMC3513374] [PubMed: 23182230]

40.: Merriwether EN, Rakel BA, Zimmerman MB, et al Reliability and construct validity of the Patient-Reported Outcomes Measurement Information System (PROMIS) Instruments in women with fibromyalgia. Pain Med. 2016. [PMC free article: PMC6279305] [PubMed: 27561310]

41.: Parker SL, Adogwa O, Mendenhall SK, et al Determination of minimum clinically important difference (MCID) in pain, disability, and quality of life after revision fusion for symptomatic pseudoarthrosis. Spine J. 2012;12(12):1122–1128. [PubMed: 23158968]

42.: Sindhu BS, Shechtman O, Tuckey L. Validity, reliability, and responsiveness of a digital version of the visual analog scale. J Hand Ther. 2011;24(4):356–363; quiz 364. [PubMed: 21820864]

43.: Stroud MW, McKnight PE, Jensen MP. Assessment of self-reported physical activity in patients with chronic pain: development of an abbreviated Roland-Morris disability scale. J Pain. 2004;5(5):257–263. [PubMed: 15219257]

44.: Tan G, Jensen MP, Thornby JI, Shanti BF. Validation of the Brief Pain Inventory for chronic nonmalignant pain. J Pain. 2004;5(2):133–137. [PubMed: 15042521]

45.: Tong HC, Geisser ME, Ignaczak AP. Ability of early response to predict discharge outcomes with physical therapy for chronic low back pain. Pain Pract. 2006;6(3):166–170. [PubMed: 17147593]

46.: Trudeau J, Van Inwegen R, Eaton T, et al Assessment of pain and activity using an electronic pain diary and actigraphy device in a randomized, placebo-controlled crossover trial of celecoxib in osteoarthritis of the knee. Pain Pract. 2015;15(3):247–255. [PubMed: 24494935]

47.: Wittink H, Turk DC, Carr DB, Sukiennik A, Rogers W. Comparison of the redundancy, reliability, and responsiveness to change among SF-36, Oswestry Disability Index, and Multidimensional Pain Inventory. Clin J Pain. 2004;20(3):133–142. [PubMed: 15100588]

48.: Burnham R, Stanford G, Gray L. An assessment of a short composite questionnaire designed for use in an interventional spine pain management setting. PM R. 2012;4(6):413–418; quiz 418. [PubMed: 22732153]

49.: Mikail SF, DuBreuil S, D’eon JL. A Comparative Analysis of Measures Used in the Assessment of Chronic Pain Patients. Psychol Assess. 1993;5(1):117–120.

50.: Pinsker E, Inrig T, Daniels TR, Warmington K, Beaton DE. Reliability and validity of 6 measures of pain, function, and disability for ankle arthroplasty and arthrodesis. Foot Ankle Int. 2015;36(6):617–625. [PubMed: 25652665]

51.: Gallasch CH, Alexandre NM. The measurement of musculoskeletal pain intensity: a comparison of four methods. Rev Gaucha Enferm. 2007;28(2):260–265. [PubMed: 17907648]

52.: Chansirinukor W, Maher CG, Latimer J, Hush J. Comparison of the functional rating index and the 18-item Roland-Morris Disability Questionnaire: responsiveness and reliability. Spine. 2005;30(1):141–145. [PubMed: 15626994]

53.: Chien CW, Bagraith KS, Khan A, Deen M, Strong J. Comparative responsiveness of verbal and numerical rating scales to measure pain intensity in patients with chronic pain. J Pain. 2013;14(12):1653–1662. [PubMed: 24290445]

54.: Kamper SJ, Grootjans SJ, Michaleff ZA, Maher CG, McAuley JH, Sterling M. Measuring pain intensity in patients with neck pain: does it matter how you do it? Pain Pract. 2015;15(2):159–167. [PubMed: 24433369]

55.: Macedo LG, Maher CG, Latimer J, Hancock MJ, Machado LA, McAuley JH. Responsiveness of the 24-, 18- and 11-item versions of the Roland Morris Disability Questionnaire. Eur Spine J. 2011;20(3):458–463. [PMC free article: PMC3048224] [PubMed: 21069545]

56.: Stewart M, Maher CG, Refshauge KM, Bogduk N, Nicholas M. Responsiveness of pain and disability measures for chronic whiplash. Spine. 2007;32(5):580–585. [PubMed: 17334294]

57.: Changulani M, Shaju A. Evaluation of responsiveness of Oswestry low back pain disability index. Arch Orthop Trauma Surg. 2009;129(5):691–694. [PubMed: 18521617]

58.: de Vet HC, Ostelo RW, Terwee CB, et al Minimally important change determined by a visual method integrating an anchor-based and a distribution-based approach. Qual Life Res. 2007;16(1):131–142. [PMC free article: PMC2778628] [PubMed: 17033901]

59.: Fisher K, Johnston M. Validation of the Oswestry Low Back Pain Disability Questionnaire, its sensitivity as a measure of change following treatment and its relationship with other aspects of the chronic pain experience. Physiother Theory Pract. 1997;13:67–80.

60.: Gentelle-Bonnassies S, Le Claire P, Mezieres M, Ayral X, Dougados M. Comparison of the responsiveness of symptomatic outcome measures in knee osteoarthritis. Arthritis Care Res. 2000;13(5):280–285. [PubMed: 14635296]

61.: Gronblad M, Hupli M, Wennerstrand P, et al Intercorrelation and test-retest reliability of the Pain Disability Index (PDI) and the Oswestry Disability Questionnaire (ODQ) and their correlation with pain intensity in low back pain patients. Clin J Pain. 1993;9:189–195. [PubMed: 8219519]

62.: Lund I, Lundeberg T, Sandberg L, Budh CN, Kowalski J, Svensson E. Lack of interchangeability between visual analogue and verbal rating pain scales: a cross sectional description of pain etiology groups. BMC Med Res Methodol. 2005;5:31. [PMC free article: PMC1274324] [PubMed: 16202149]

63.: Maughan EF, Lewis JS. Outcome measures in chronic low back pain. Eur Spine J. 2010;19:1484–1494. [PMC free article: PMC2989277] [PubMed: 20397032]

64.: Nilsdotter AK, Lohmander LS, Klassbo M, Roos EM. Hip disability and osteoarthritis outcome score (HOOS)--validity and responsiveness in total hip replacement. BMC Musculoskelet Disord. 2003;4:10. [PMC free article: PMC161815] [PubMed: 12777182]

65.: Scott W, McCracken LM. Patients’ impression of change following treatment for chronic pain: global, specific, a single dimension, or many? J Pain. 2015;16(6):518–526. [PubMed: 25746196]

66.: van der Roer N, Ostelo RW, Bekkering GE, van Tulder MW, de Vet HC. Minimal clinically important change for pain intensity, functional status, and general health status in patients with nonspecific low back pain. Spine. 2006;31(5):578–582. [PubMed: 16508555]

67.: van Grootel RJ, van der Bilt A, van der Glas HW. Long-term reliable change of pain scores in individual myogenous TMD patients. Eur J Pain. 2007;11(6):635–643. [PubMed: 17118682]

68.: Polomano RC, Galloway KT, Kent ML, et al Psychometric testing of the defense and veterans pain rating scale (DVPRS): a new pain scale for military population. Pain Med. 2016;17:1505–1519. [PubMed: 27272528]

69.: Ornetti P, Parratte S, Gossec L, et al Cross-cultural adaptation and validation of the French version of the Knee injury and Osteoarthritis Outcome Score (KOOS) in knee osteoarthritis patients. Osteoarthritis Cartilage. 2008;16:423–428. [PubMed: 17905602]

70.: Revicki D, Hays RD, Cella D, Sloan J. Recommended methods for determining responsiveness and minimally important differences for patient-reported outcomes. J Clin Epidemiol. 2008;61(2):102–109. [PubMed: 18177782]

71.: Cohen J. Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates; 1988.

72.: Cleland J, Gillani R, Bienen EJ, Sadosky A. Assessing dimensionality and responsiveness of outcomes measures for patients with low back pain. Pain Pract. 2011;11(1):57–69. [PubMed: 20602714]

73.: Chapman JR, Norvell DC, Hermsmeyer JT, et al Evaluating common outcomes for measuring treatment success for chronic low back pain. Spine. 2011;36(21 Suppl):S54–68. [PubMed: 21952190]

74.: Rocchi MB, Sisti D, Benedetti P, Valentini M, Bellagamba S, Federici A. Critical comparison of nine different self-administered questionnaires for the evaluation of disability caused by low back pain. Eura Medicophys. 2005;41(4):275–281. [PubMed: 16474281]

75.: Grotle M, Brox J, Vollestad N. Functional status and disability questionnaires: what do they assess?: a systematic review of back-specific outcome questionnaires. Spine. 2005;30(1):130–140. [PubMed: 15626993]

76.: Thorborg K, Roos EM, Bartels EM, Petersen J, Holmich P. Validity, reliability and responsiveness of patient-reported outcome questionnaires when assessing hip and groin disability: a systematic review. Br J Sports Med. 2010;44:1186–1196. [PubMed: 19666629]

77.: Peer MA, Lane J. The Knee Injury and Osteoarthritis Outcome Score (KOOS): a review of its psychometric properties in people undergoing total knee arthroplasty. J Orthop Sports Phys Ther. 2013;43(1):20–28. [PubMed: 23221356]

78.: Gandek B. Measurement properties of the Western Ontario and McMaster Universities Osteoarthritis Index: a systematic review. Arthritis Care Res. 2015;67(2):216–229. [PubMed: 25048451]

79.: Mokkink LB, Prinsen CA, Bouter LM, Vet HC, Terwee CB. The COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) and how to select an outcome measurement instrument. Braz J Phys Ther. 2016;20(2):105–113. [PMC free article: PMC4900032] [PubMed: 26786084]

80.: Crosby RD, Kolotkin RL, Williams GR. Defining clinically meaningful change in health-related quality of life. J Clin Epidemiol. 2003;56(5):395–407. [PubMed: 12812812]

81.: Turner D, Schunemann HJ, Griffith LE, et al The minimal detectable change cannot reliably replace the minimal important difference. J Clin Epidemiol. 2010;63(1):28–36. [PubMed: 19800198]

82.: Dworkin RH, Turk DC, Wyrwich KW, et al Interpreting the clinical importance of treatment outcomes in chronic pain clinical trials: IMMPACT recommendations. J Pain. 2008;9(2):105–121. [PubMed: 18055266]

83.: Haby MM, Chapman E, Clark R, Barreto J, Reveiz L, Lavis JN. What are the best methodologies for rapid reviews of the research evidence for evidence-informed decision making in health policy and practice: a rapid review. Health Res Policy Syst. 2016;14(1):83. [PMC free article: PMC5123411] [PubMed: 27884208]

84.: Tricco AC, Antony J, Zarin W, et al A scoping review of rapid review methods. BMC Medicine. 2015;13:224. [PMC free article: PMC4574114] [PubMed: 26377409]

85.: Tricco AC, Zarin W, Antony J, et al An international survey and modified Delphi approach revealed numerous rapid review methods. J Clin Epidemiol. 2016;70:61–67. [PubMed: 26327490]

86.: Beaton D, BscOt M, Bombardier C, et al Guidelines for the process of cross-cultural adaptation of self-report measures. Spine. 2000;25(24):3186–3191. [PubMed: 11124735]

87.: Booker SS, Herr K. The state-of-“cultural validity” of self-report pain assessment tools in diverse older adults. Pain Med. 2014;16(2):232–239. [PubMed: 25219949]

88.: Von Korff M. Assessment of chronic pain in epidemiological and health services research. New York: Guilford Publications; 2011.

89.: Fillingim RB, King CD, Ribeiro-Dasilva MC, Rahim-Williams B, Riley JL, 3rd. Sex, gender, and pain: a review of recent clinical and experimental findings. J Pain. 2009;10(5):447–485. [PMC free article: PMC2677686] [PubMed: 19411059]

90.: Kroenke K, Spitzer RL. Gender differences in the reporting of physical and somatoform symptoms. Psychosom Med. 1998;60(2):150–155. [PubMed: 9560862]

91.: Tait RC, Chibnall JT. Racial/ethnic disparities in the assessment and treatment of pain: psychosocial perspectives. Am Psychol. 2014;69(2):131–141. [PubMed: 24547799]

Supplemental Table 1

Search Strategy.

Supplemental Table 2

Peer Review Comments/Author Responses.

Supplemental Table 3

Characteristics of Included Pain Measurement Scales.

Supplemental Table 4

Study Characteristics.

Supplemental Table 5

Outcomes Reported.

Supplemental Table 6

Summary of Minimally Important Difference Outcomes.

Supplemental Table 7

Summary of Responsiveness Outcomes.

References

1.: Cleeland CS, Ryan KM. Pain assessment: global use of the Brief Pain Inventory. Ann Acad Med Singapore. 1994;23(2):129–138. [PubMed: 8080219]

2.: Buckenmaier CC, 3rd, Galloway KT, Polomano RC, McDuffie M, Kwon N, Gallagher RM. Preliminary validation of the Defense and Veterans Pain Rating Scale (DVPRS) in a military population. Pain Med. 2013;14(1):110–123. [PubMed: 23137169]

3.: Hawker GA, Mian S, Kendzerska T, French M. Measures of adult pain: Visual Analog Scale for Pain (VAS Pain), Numeric Rating Scale for Pain (NRS Pain), McGill Pain Questionnaire (MPQ), Short-Form McGill Pain Questionnaire (SF-MPQ), Chronic Pain Grade Scale (CPGS), Short Form-36 Bodily Pain Scale (SF-36 BPS), and Measure of Intermittent and Constant Osteoarthritis Pain (ICOAP). Arthritis Care Res. 2011;63 Suppl 11:S240–252. [PubMed: 22588748]

4.: Von Korff M, Ormel J, Keefe FJ, Dworkin SF. Grading the severity of chronic pain. Pain. 1992;50(2):133–149. [PubMed: 1408309]

5.: Klassbo M, Larsson E, Mannevik E. Hip disability and osteoarthritis outcome score: An extension of the Western Ontario and McMaster Universities Osteoarthritis Index. Scand J Rheumatol. 2003;32:46–51. [PubMed: 12635946]

6.: Roos EM, Lohmander LS. The Knee injury and Osteoarthritis Outcome Score (KOOS): from joint injury to osteoarthritis. Health Qual Life Outcomes. 2003;1:64. [PMC free article: PMC280702] [PubMed: 14613558]

7.: Burckhardt CS, Jones KD. Adult Measures of Pain: The McGill Pain Questionnaire (MPQ), Rheumatoid Arthritis Pain Scale (RAPS), Short-Form McGill Pain Questionnaire (SFMPQ), Verbal Descriptive Scale (VDS), Visual Analog Scale (VAS), and West Haven-Yale Multidisciplinary Pain Inventory (WHYMPI). Arthritis Rheum. 2003;49(5S):S96–S104.

8.: McCaffery M, Beebe A. Pain: Clinical Manual for Nursing Practice. St. Louis, MO: Mosby; 1989.

9.: Kerns RD, Turk DC, Rudy TE. The West Haven-Yale Multidimensional Pain Inventory (WHYMPI). Pain. 1985;23(4):345–356. [PubMed: 4088697]

10.: Smeets R, Koke A, Lin CW, Ferreira M, Demoulin C. Measures of function in low back pain/disorders: Low Back Pain Rating Scale (LBPRS), Oswestry Disability Index (ODI), Progressive Isoinertial Lifting Evaluation (PILE), Quebec Back Pain Disability Scale (QBPDS), and Roland-Morris Disability Questionnaire (RDQ). Arthritis Care Res. 2011;63 Suppl 11:S158–173. [PubMed: 22588742]

11.: Farrar JT, Young JP, Jr., LaMoreaux L, Werth JL, Poole RM. Clinical importance of changes in chronic pain intensity measured on an 11-point numerical pain rating scale. Pain. 2001;94(2):149–158. [PubMed: 11690728]

12.: Krebs EE, Lorenz KA, Bair MJ, et al Development and initial validation of the PEG, a three-item scale assessing pain intensity and interference. J Gen Intern Med. 2009;24(6):733–738. [PMC free article: PMC2686775] [PubMed: 19418100]

13.: Pain Interference: A brief guide to the PROMIS Pain Interference instruments. 2015. Available at: https://www.assessmentcenter.net/documents/PROMIS%20Pain%20Interference%20Scoring%20Manual.pdf. Accessed 1 August 2017.

14.: Cella D, Riley W, Stone A, et al The Patient-Reported Outcomes Measurement Information System (PROMIS) developed and tested its first wave of adult self-reported health outcome item banks: 2005-2008. J Clin Epidemiol. 2010;63(11):1179–1194. [PMC free article: PMC2965562] [PubMed: 20685078]

15.: Roland MO, Morris RW. A study of the natural history of back pain. Part 1: Development of a reliable and sensitive measure of disability in low back pain. Spine. 1983;8:141–144. [PubMed: 6222486]

16.: Ware JE, Jr., Gandek B. Overview of the SF-36 Health Survey and the International Quality of Life Assessment (IQOLA) project. J Clin Epidemiol. 1998;51(11):903–912. [PubMed: 9817107]

17.: Wewers ME, Lowe NK. A critical review of visual analogue scales in the measurement of clinical phenomena. Res Nurs Health. 1990;13(4):227–236. [PubMed: 2197679]

18.: American College of Rheumatology. Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC). Available at: http://www.rheumatology.org/I-Am-A/Rheumatologist/Research/Clinician-Researchers/Western-Ontario-McMaster-Universities-Osteoarthritis-Index-WOMAC, Accessed 1 August 2017.

19.: Wong-Baker FACES® History. Available at: http://wongbakerfaces.org/us/wong-baker-faces-history/, Accessed 1 August 2017.

20.: Anagnostis C, Gatchel RJ, Mayer TG. The pain disability questionnaire: a new psychometrically sound measure for chronic musculoskeletal disorders. Spine. 2004;29(20):2290–2302. [PubMed: 15480144]

21.: Askew RL, Cook KF, Revicki DA, Cella D, Amtmann D. Evidence from diverse clinical populations supported clinical validity of PROMIS pain interference and pain behavior. J Clin Epidemiol. 2016;73:103–111. [PMC free article: PMC4957699] [PubMed: 26931296]

22.: Burnham R, Stanford G, Gray L. An assessment of a short composite questionnaire designed for use in an interventional spine pain management setting. PM R. 2012;4(6):413–418. [PubMed: 22732153]

23.: Changulani M, Shaju A. Evaluation of responsiveness of Oswestry low back pain disability index. Arch Orthop Trauma Surg. 2009;129(5):691–694. [PubMed: 18521617]

24.: Chansirinukor W, Maher CG, Latimer J, Hush J. Comparison of the functional rating index and the 18-item Roland-Morris Disability Questionnaire: responsiveness and reliability. Spine. 2005;30(1):141–145. [PubMed: 15626994]

25.: Chien CW, Bagraith KS, Khan A, Deen M, Strong J. Comparative responsiveness of verbal and numerical rating scales to measure pain intensity in patients with chronic pain. J Pain. 2013;14(12):1653–1662. [PubMed: 24290445]

26.: Cook KF, Choi SW, Crane PK, Deyo RA, Johnson KL, Amtmann D. Letting the CAT out of the bag: comparing computer adaptive tests and an 11-item short form of the Roland-Morris Disability Questionnaire. Spine. 2008;33(12):1378–1383. [PMC free article: PMC2671199] [PubMed: 18496352]

27.: de Vet HC, Ostelo RW, Terwee CB, et al. Minimally important change determined by a visual method integrating an anchor-based and a distribution-based approach. Qual Life Res. 2007;16(1):131–142. [PMC free article: PMC2778628] [PubMed: 17033901]

28.: Deyo RA, Ramsey K, Buckley DI, et al. Performance of a Patient Reported Outcomes Measurement Information System (PROMIS) Short Form in older adults with chronic musculoskeletal pain. Pain Med. 2016;17(2):314–324. [PMC free article: PMC6281027] [PubMed: 26814279]

29.: Driban JB, Morgan N, Price LL, Cook KF, Wang C. Patient-Reported Outcomes Measurement Information System (PROMIS) instruments among individuals with symptomatic knee osteoarthritis: a cross-sectional study of floor/ceiling effects and construct validity. BMC Musculoskelet Disord. 2015;16:253. [PMC free article: PMC4570513] [PubMed: 26369412]

30.: Fisher K, Johnston M. Validation of the Oswestry Low Back Pain Disability Questionnaire, its sensitivity as a measure of change following treatment and its relationship with other aspects of the chronic pain experience. Physiother Theory Pract. 1997;13:67–80.

31.: Gallasch CH, Alexandre NM. The measurement of musculoskeletal pain intensity: a comparison of four methods. Rev Gaucha Enferm. 2007;28(2):260–265. [PubMed: 17907648]

32.: Gentelle-Bonnassies S, Le Claire P, Mezieres M, Ayral X, Dougados M. Comparison of the responsiveness of symptomatic outcome measures in knee osteoarthritis. Arthritis Care Res. 2000;13(5):280–285. [PubMed: 14635296]

33.: Godil SS, Parker SL, Zuckerman SL, Mendenhall SK, McGirt MJ. Accurately measuring the quality and effectiveness of cervical spine surgery in registry efforts: determining the most valid and responsive instruments. Spine J. 2015;15(6):1203–1209. [PubMed: 24076442]

34.: Gronblad M, Hupli M, Wennerstrand P, et al. Intercorrelation and test-retest reliability of the Pain Disability Index (PDI) and the Oswestry Disability Questionnaire (ODQ) and their correlation with pain intensity in low back pain patients. Clin J Pain. 1993;9:189–195. [PubMed: 8219519]

35.: Hicks GE, Manal TJ. Psychometric properties of commonly used low back disability questionnaires: are they useful for older adults with low back pain? Pain Med. 2009;10(1):85–94. [PMC free article: PMC5323267] [PubMed: 19222773]

36.: Jensen MP, Schnitzer TJ, Wang H, Smugar SS, Peloso PM, Gammaitoni A. Sensitivity of single-domain versus multiple-domain outcome measures to identify responders in chronic low-back pain: pooled analysis of 2 placebo-controlled trials of etoricoxib. Clin J Pain. 2012;28(1):1–7. [PubMed: 21705875]

37.: Kamper SJ, Grootjans SJ, Michaleff ZA, Maher CG, McAuley JH, Sterling M. Measuring pain intensity in patients with neck pain: does it matter how you do it? Pain Pract. 2015;15(2):159–167. [PubMed: 24433369]

38.: Kean J, Monahan PO, Kroenke K, et al Comparative responsiveness of the PROMIS Pain Interference Short Forms, Brief Pain Inventory, PEG, and SF-36 Bodily Pain Subscale. Med Care. 2016;54(4):414–421. [PMC free article: PMC4792763] [PubMed: 26807536]

39.: Keller S, Bann CM, Dodd SL, Schein J, Mendoza TR, Cleeland CS. Validity of the Brief Pain Inventory for use in documenting the outcomes of patients with noncancer pain. Clin J Pain. 2004;20:309–318. [PubMed: 15322437]

40.: Krebs EE, Bair MJ, Damush TM, Tu W, Wu J, Kroenke K. Comparative responsiveness of pain outcome measures among primary care patients with musculoskeletal pain. Med Care. 2010;48(11):1007–1014. [PMC free article: PMC4876043] [PubMed: 20856144]

41.: Krebs EE, Carey TS, Weinberger M. Accuracy of the pain numeric rating scale as a screening test in primary care. J Gen Intern Med. 2007;22(10):1453–1458. [PMC free article: PMC2305860] [PubMed: 17668269]

42.: Lovejoy TI, Turk DC, Morasco BJ. Evaluation of the psychometric properties of the revised short-form McGill Pain Questionnaire. J Pain. 2012;13(12):1250–1257. [PMC free article: PMC3513374] [PubMed: 23182230]

43.: Lund I, Lundeberg T, Sandberg L, Budh CN, Kowalski J, Svensson E. Lack of interchangeability between visual analogue and verbal rating pain scales: a cross sectional description of pain etiology groups. BMC Med Res Methodol. 2005;5:31. [PMC free article: PMC1274324] [PubMed: 16202149]

44.: Macedo LG, Maher CG, Latimer J, Hancock MJ, Machado LA, McAuley JH. Responsiveness of the 24-, 18- and 11-item versions of the Roland Morris Disability Questionnaire. Eur Spine J. 2011;20(3):458–463. [PMC free article: PMC3048224] [PubMed: 21069545]

45.: Maughan EF, Lewis JS. Outcome measures in chronic low back pain. Eur Spine J. 2010;19:1484–1494. [PMC free article: PMC2989277] [PubMed: 20397032]

46.: Merriwether EN, Rakel BA, Zimmerman MB, et al. Reliability and construct validity of the Patient-Reported Outcomes Measurement Information System (PROMIS) Instruments in women with fibromyalgia. Pain Med. 2016. [PMC free article: PMC6279305] [PubMed: 27561310]

47.: Mikail SF, DuBreuil S, D’eon JL. A Comparative Analysis of Measures Used in the Assessment of Chronic Pain Patients. Psychol Assess. 1993;5(1):117–120.

48.: Nilsdotter AK, Lohmander LS, Klassbo M, Roos EM. Hip disability and osteoarthritis outcome score (HOOS)--validity and responsiveness in total hip replacement. BMC Musculoskelet Disord. 2003;4:10. [PMC free article: PMC161815] [PubMed: 12777182]

49.: Parker SL, Adogwa O, Mendenhall SK, et al. Determination of minimum clinically important difference (MCID) in pain, disability, and quality of life after revision fusion for symptomatic pseudoarthrosis. Spine J. 2012;12(12):1122–1128. [PubMed: 23158968]

50.: Pinsker E, Inrig T, Daniels TR, Warmington K, Beaton DE. Reliability and validity of 6 measures of pain, function, and disability for ankle arthroplasty and arthrodesis. Foot Ankle Int. 2015;36(6):617–625. [PubMed: 25652665]

51.: Scott W, McCracken LM. Patients’ impression of change following treatment for chronic pain: global, specific, a single dimension, or many? J Pain. 2015;16(6):518–526. [PubMed: 25746196]

52.: Sindhu BS, Shechtman O, Tuckey L. Validity, reliability, and responsiveness of a digital version of the visual analog scale. J Hand Ther. 2011;24(4):356–363. [PubMed: 21820864]

53.: Stewart M, Maher CG, Refshauge KM, Bogduk N, Nicholas M. Responsiveness of pain and disability measures for chronic whiplash. Spine. 2007;32(5):580–585. [PubMed: 17334294]

54.: Stroud MW, McKnight PE, Jensen MP. Assessment of self-reported physical activity in patients with chronic pain: development of an abbreviated Roland-Morris disability scale. J Pain. 2004;5(5):257–263. [PubMed: 15219257]

55.: Tan G, Jensen MP, Thornby JI, Shanti BF. Validation of the Brief Pain Inventory for chronic nonmalignant pain. J Pain. 2004;5(2):133–137. [PubMed: 15042521]

56.: Tong HC, Geisser ME, Ignaczak AP. Ability of early response to predict discharge outcomes with physical therapy for chronic low back pain. Pain Pract. 2006;6(3):166–170. [PubMed: 17147593]

57.: Trudeau J, Van Inwegen R, Eaton T, et al. Assessment of pain and activity using an electronic pain diary and actigraphy device in a randomized, placebo-controlled crossover trial of celecoxib in osteoarthritis of the knee. Pain Pract. 2015;15(3):247–255. [PubMed: 24494935]

58.: van der Roer N, Ostelo RW, Bekkering GE, van Tulder MW, de Vet HC. Minimal clinically important change for pain intensity, functional status, and general health status in patients with nonspecific low back pain. Spine. 2006;31(5):578–582. [PubMed: 16508555]

59.: van Grootel RJ, van der Bilt A, van der Glas HW. Long-term reliable change of pain scores in individual myogenous TMD patients. Eur J Pain. 2007;11(6):635–643. [PubMed: 17118682]

60.: Wittink H, Turk DC, Carr DB, Sukiennik A, Rogers W. Comparison of the redundancy, reliability, and responsiveness to change among SF-36, Oswestry Disability Index, and Multidimensional Pain Inventory. Clin J Pain. 2004;20(3):133–142. [PubMed: 15100588]

Prepared for: Department of Veterans Affairs, Veterans Health Administration, Quality Enhancement Research Initiative, Health Services Research & Development Service, Washington, DC 20420. Prepared by: Evidence-based Synthesis Program (ESP) Center, Minneapolis VA Health Care System, Minneapolis, MN, Timothy J. Wilt, MD, MPH, Director, Nancy Greer, PhD, Program Manager

Suggested citation:

Goldsmith ES, Murdoch M, Taylor B, Greer N, MacDonald R, McKenzie LG, Rosebush C, Wilt TJ. Rapid Evidence Review: Measures for Patients with Chronic Musculoskeletal Pain. VA ESP Project #09-009; 2017.

This report is based on research conducted by the Evidence-based Synthesis Program (ESP) Center located at the Minneapolis VA Health Care System, Minneapolis, MN, funded by the Department of Veterans Affairs, Veterans Health Administration, Office of Research and Development, Quality Enhancement Research Initiative. The findings and conclusions in this document are those of the author(s) who are responsible for its contents; the findings and conclusions do not necessarily represent the views of the Department of Veterans Affairs or the United States government. Therefore, no statement in this article should be construed as an official position of the Department of Veterans Affairs. No investigators have any affiliations or financial involvement (eg, employment, consultancies, honoraria, stock ownership or options, expert testimony, grants or patents received or pending, or royalties) that conflict with material presented in the report.

Created: August 2017.

Bookshelf ID: NBK525003; PMID: 30183221

Figure 1. Literature Flow Chart

Table 1. Overview of Pain Measures

Pain Measure	Development Pain Type				Pain Domain		Length	Restrictions on Use
Pain Measure	General	LBP	Knee/hip	Other	Severity/Intensity	Function/Interference	Number of Items	Restrictions on Use
BPI¹¹				X	X	X	11	Yes
DVPRS¹²	X				X	X	5	No
GCPS¹³	X				X	X	7	No
HOOS¹⁴			X		X	X	40	No
KOOS¹⁵			X		X	X	42	No
MPQ¹⁶	X				X		78^a	No
MPI/WHYMPI¹⁷	X	X			X	X	52	No
NRS for Pain¹⁶	X				X	X	1	No
ODI¹⁸		X			X	X	10	Yes
PGIC¹⁹	?				?	?	1	No
PEG²⁰	X				X	X	3	No
PROMIS-PI²¹	X					X	41	Yes
RMDQ²²		X				X	24	No
SF-36 BPS²³	X				X	X	2	Yes
VAS for Pain²⁴				X	X	X	1	No
WOMAC²⁵			X		X	X	24	Yes
Wong Faces Scale²⁶	X			X	X		1	Yes

: ?=not identified

: S=pain severity; I=pain interference

a: Pain Rating Index (PRI) based on 78 pain descriptors; Present Pain Intensity (PPI) based on 6 additional items

Table 2. Overview of Included Studies

Author Year	Pain Measures Evaluated	Study Characteristics
Author Year	Pain Measures Evaluated	Sample Size	Pain Condition	Mean Age (years)	Women (%)	Race/Ethnicity (%)
Anagnostis 2004²⁷	ODI	230	CDMD	43	53	White: 60 African American/Black: 29 Hispanic: 11 Other: 0.1
Askew 2016²⁸	PROMIS-PI	218	LBP	NR	56	White: 84 African American/Black: 4 Other: 12
Burnham 2012⁴⁸	MPQ, ODI	60	Spine	60	67	NR
Changulani 2009⁵⁷	ODI, VAS	107	LBP	58	58	NR
Chansirinukor 2005⁵²	RMDQ	143	LBP	38	26	NR
Chien 2013⁵³	BPI	254	General MSP	51	50	NR
Cook 2008²⁹	RMDQ	875	LBP	47	NR	White: 85 African American/Black: 9 Other: 6
de Vet 2007⁵⁸	NRS	438	LBP	NR	NR	NR
Deyo 2016³⁰	PROMIS-PI-SF	198	General MSP	67	62	White: 92 Hispanic: 4 Other: 4
Driban 2015³¹	PROMIS-PI, SF-36 BPS, WOMAC	204	Knee (OA)	60	70	White: 53 African American/Black: 36 Other: 12
Fisher 1997⁵⁹	MPQ, ODI	54	LBP	41	63	NR
Gallasch 2007⁵¹	Wong Faces, NRS, VAS	32	General MSP	51	NR	NR
Gentelle-Bonnassies 2000⁶⁰	VAS, WOMAC	80	Knee (OA)	62	70	NR
Godil 2015³²	NRS	88	Neck and arm	52	44	NR
Gronblad 1993⁶¹	ODQ, VAS	94	LBP	43	51	NR
Hicks 2009³³	ODI, SF-36 BPS	107	LBP	80	72	White: 100
Jensen 2012³⁴	VAS	639	LBP	52	62	White: 90 African American/Black: 5 Other: 5
Kamper 2015⁵⁴	NRS, SF-36 BPS	280	Whiplash	44	65	NR
Kean 2016^a³⁵	BPI, PEG, PROMIS-PI-SF, SF-36 BPS	244	MSP	55	17	White: 77 African American/Black: 19 Other: 4
Keller 2004³⁶	BPI, GCPS, SF-36 BPS, RMDQ	131	LBP	46	NR	NR
Kerns 1985^a¹⁷	MPI (WHYMPI), MPQ	120	Chronic MSP	51	19	NR
Krebs 2010^b³⁷	BPI, GCPS, PEG, RMDQ, SF-36 BPS	427	Back, hip, knee	59	53	White: 58 African American/Black: 38 Other: 4
Krebs 2009^b²⁰	BPI, GCPS, PEG, PGIC, RMDQ, SF-36 BPS	500	Back, hip, knee	59	52	White: 58 African American/Black: 38 Other: 4
Krebs 2007³⁸	NRS	275	General MSP	59	59	White: 70 African American/Black: 24 Other: 6
Lovejoy 2012^a³⁹	MPI, MPQ-2-SF, MPQ	186	LBP, neck, joint	54	8	White: 75 Other: 15
Lund 2005⁶²	VAS	30	MSP	43	43	NR
Macedo 2011⁵⁵	RMDQ	461	LBP	53	61	NR
Maughan 2010⁶³	NRS, ODI, RMDQ	48	LBP	52	67	NR
Merriwether 2016⁴⁰	PROMIS-PI	106	Fibromyalgia	49	100	White: 96 Other: 4
Mikail 1993⁴⁹	MPI, ODI	315	General MSP	44	53	NR
Nilsdotter 2003⁶⁴	HOOS, WOMAC, SF-36 BPS	62	Hip (OA)	73	45	NR
Parker 2012⁴¹	ODI, VAS	47	LBP	55	64	NR
Pinsker 2015⁵⁰	NRS, WOMAC	142	Ankle	61	54	NR
Scott 2015⁶⁵	PGIC	476	Not specified	46	67	White: 72 African American/Black: 17 Other: 11
Sindhu 2011⁴²	NRS, VAS	33	Elbow, forearm, hand	39	48	NR
Stewart 2007⁵⁶	NRS, SF-36 BPS	132	Whiplash	43	67	NR
Stroud 2004⁴³	RMDQ	998	Not specified	44	57	White: 84 African American/Black: 3 Native American: 4 Other: 9
Tan 2004^a⁴⁴	BPI, RMDQ	440	Not specified	55	8	White: 72 African American/Black: 21 Other: 7
Tong 2006⁴⁵	VAS	52	LBP	41	62	White: 88 African American/Black: 3 Other: 9
Trudeau 2015⁴⁶	WOMAC, NRS	47	Knee (OA)	NR	NR	NR
van der Roer 2006⁶⁶	NRS	138	LBP	44	59	NR
van Grootel 2007⁶⁷	VAS	118	TMD	32	93	NR
Wittink 2004⁴⁷	MPI, ODI, SF-36 BPS	87	Chronic pain	47	67	White: 79 Other: 21

a: Enrolled exclusively US Veterans

b: Enrolled US Veterans and non-Veterans; results not stratified by Veteran status

: Abbreviations: CDMD=chronic disabling musculoskeletal disorders; LBP=low back pain; MSP=musculoskeletal pain; NR=not reported; OA=osteoarthritis; TMD=temporomandibular disorder

Table 3. Summary of Results: Studies Assessing Psychometric Properties of Self-Report Measures of Pain Severity (S) and Functional Interference (I) in Chronic Musculoskeletal Pain Populations

Pain Measure	Number of studies	Total Participants	MID	Responsiveness	Concurrent validity	Discriminant validity	Test-retest reliability
Brief Pain Inventory (BPI)	6	1,996	Krebs 2010^b³⁷ (S,I)	Chien 2013⁵³ (S) Kean 2016^a³⁵ (S,I) Keller 2004³⁶ (S, I) Krebs 2010^b³⁷ (S,I) Krebs 2009^b²⁰ (S,I) Tan 2004^a⁴⁴ (S,I)	Keller 2004³⁶ (S,I) Krebs 2009^b²⁰ (S,I) Tan 2004^a⁴⁴ (S,I)	-	-
Defense and Veterans Pain Rating Scale (DVPRS)	0	-	-	-	-	-	-
Graded Chronic Pain Scale (GCPS)	3	1,058	Krebs 2010^b³⁷ (S,I)	Keller 2004³⁶ (S,I) Krebs 2010^b³⁷ (S,I)	Keller 2004³⁶ (S, I) Krebs 2009^b²⁰ (S,I)	-	-
Hip Osteoarthritis Outcomes Scale (HOOS)	1	62	-	Nilsdotter 2003⁶⁴ (S,I)	Nilsdotter 2003⁶⁴ (S,I)		-
Knee Osteoarthritis Scale (KOOS)	0	-	-	-	-	-	-
McGill Pain Questionnaire (MPQ)	3	366	-	Burnham 2012⁴⁸ (S)	Kerns 1985^a¹⁷ (S) Lovejoy 2012^a³⁹ (S)	Lovejoy 2012^a³⁹ (S)	Burnham 2012⁴⁸ (S)
Multidimensional Pain Inventory (MPI/WHYMPI)	4	708	-	Wittink 2004⁴⁷ (S,I)	Kerns 1985^a¹⁷ (S,I) Lovejoy 2012^a³⁹ (S,I) Mikail 1993⁴⁹ (S,I) Wittink 2004⁴⁷ (S,I)		Kerns 1985^a¹⁷ (S,I)
Numerical Rating Scale (NRS) for Pain	11	1,653	De Vet 2007⁵⁸ (S) Maughan 2010⁶³ (S) Van der Roer 2006⁶⁶ (S)	Godil 2015³² (S) Maughan 2010⁶³ (S) Sindhu 2011⁴² (S) Stewart 2007⁵⁶ (S, I) Trudeau 2015⁴⁶ (S)	De Vet 2007⁵⁸ (S) Kamper 2015⁵⁴ (S) Pinsker 2015⁵⁰ (S) Sindhu 2011⁴² (S)	Krebs 2007³⁸ (S)	Gallasch 2007⁵¹ (S) Sindhu 2011⁴² (S)
Oswestry Disability Index (ODI)	10	1,149	Hicks 2009³³ (I) Maughan 2010⁶³ (I) Parker 2012⁴¹ (I)	Anagnostis 2004²⁷ (I) Burnham 2012⁴⁸ (I) Changulani 2009⁵⁷ (I) Maughan 2010⁶³ (I) Wittink 2004⁴⁷ (I)	Changulani 2009⁵⁷ (I) Fisher 1997⁵⁹ (I) Gronblad 1993⁶¹ (I) Hicks 2009³³ (I) Mikail 1993⁴⁹ (I) Wittink 2004⁴⁷ (I)	Hicks 2009³³ (I)	Burnham 2012⁴⁸ (I) Gronblad 1993⁶¹ (I) Hicks 2009³³ (I)
Patient Global Impression of Change (PGIC)	1	476	-	Scott 2015⁶⁵ (S,I)		-	-
PEG	3	1,171	Krebs 2010^b³⁷ (S,I)	Kean 2016^a³⁵ (S,I) Krebs 2010^b³⁷ (S,I) Krebs 2009^b²⁰ (S,I)	Krebs 2009^b²⁰ (S,I)	-	-
Patient-reported Outcomes Measurement Information System-Pain Interference (PROMIS-PI)	4	864	-	Askew 2016²⁸ (I) Deyo 2016³⁰ (I) Kean 2016^a³⁵ (I)	Driban 2015³¹ (I)	Deyo 2016³⁰ (I)	Deyo 2016³⁰ (I)
Roland-Morris Disability Questionnaire (RMDQ)	9	4,023	Chansirinukor 2005⁵² (I) Krebs 2010^b³⁷ (I) Maughan 2010⁶³ (I)	Chansirinukor 2005⁵² (I) Krebs 2010^b³⁷ (I) Macedo 2011⁵⁵ (I) Maughan 2010⁶³ (I)	Cook 2008²⁹ (I) Keller 2004³⁶ (I) Krebs 2009^b²⁰ (I) Stroud 2004⁴³ (I) Tan 2004^a⁴⁴ (I)	-	Chansirinukor 2005⁵² (I)
SF-36 Bodily Pain Scale (SF-36 BPS)	10	2,174	Krebs 2010^b³⁷ (S,I)	Kean 2016^a³⁵ (S,I) Keller 2004³⁶ (S, I) Krebs 2010^b³⁷ (S,I) Stewart 2007⁵⁶ (S,I) Wittink 2004⁴⁷ (S,I)	Driban 2015³¹ (S,I) Hicks 2009³³ (S,I) Kamper 2015⁵⁴ (S) Keller 2004³⁶ (S,I) Krebs 2009^b²⁰ (S,I) Nilsdotter 2003⁶⁴ (S,I) Wittink 2004⁴⁷ (S,I)
Visual Analogue Scale (VAS) for Pain	8	541	Parker 2012⁴¹ (S) Van Grootel 2007⁶⁷ (S)	Gentelle-Bonnassies 2000⁶⁰ (S) Sindhu 2011⁴² (S)	Changulani 2009⁵⁷ (NR) Gronblad 1993⁶¹ (S) Sindhu 2011⁴² (S)	-	Gallasch 2007⁵¹ (S) Lund 2005⁶² (S) Sindhu 2011⁴² (S)
Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC)	5	535	-	Gentelle-Bonnassies 2000⁶⁰ (S,I) Nilsdotter 2003⁶⁴ (S,I) Trudeau 2015⁴⁶ (S)	Driban 2015³¹ (S,I) Pinsker 2015⁵⁰ (S,I)	-	Pinsker 2015⁵⁰ (S,I)
Wong Faces Scale/ Wong-Baker Face Scale	1	32	-	-	-	-	Gallasch 2007⁵¹ (S)

a: Enrolled exclusively US Veterans

b: Enrolled US Veterans and non-Veterans; results not stratified by Veteran status

: MID=minimally important difference
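The "Total Participants" column can be cross-checked against the sample sizes in Table 2. The following is a minimal sketch, assuming the totals are simple sums over the studies listed for each measure; the function and variable names are illustrative and not part of the report. For the three measures shown, the sums reproduce the tabulated totals.

```python
# Sample sizes copied from Table 2 (Overview of Included Studies).
SAMPLE_SIZES = {
    "Kean 2016": 244,
    "Keller 2004": 131,
    "Krebs 2009": 500,
    "Krebs 2010": 427,
    "Nilsdotter 2003": 62,
}

# Studies listed for each measure in Table 3 (any column).
STUDIES_BY_MEASURE = {
    "PEG": ["Kean 2016", "Krebs 2010", "Krebs 2009"],
    "GCPS": ["Keller 2004", "Krebs 2010", "Krebs 2009"],
    "HOOS": ["Nilsdotter 2003"],
}


def total_participants(measure):
    """Sum sample sizes over the distinct studies that evaluated a measure.

    Assumption: each study is counted once, even if it appears in several
    psychometric columns of Table 3.
    """
    return sum(SAMPLE_SIZES[study] for study in set(STUDIES_BY_MEASURE[measure]))


for name in STUDIES_BY_MEASURE:
    print(name, total_participants(name))
```

The same tally can be repeated for any other row by adding its studies and sample sizes to the two dictionaries.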

Table 4. Comparative External Responsiveness Based on Area Under the Curve (AUC) Values for Detecting Any Improvement

Study (sample size)	Pain Measures
Study (sample size)	BPI (total)	PEG	SF-36 BPS	PROMIS	RMDQ	CPG	NRS	ODI
Kean 2016³⁵ (n=244)	0.73	0.71	0.68	Range 0.56 to 0.61^a	-
Krebs 2010³⁷ RCT (n=205)	0.81	0.78	0.72	-	0.81	Range 0.75 to 0.78^b
Krebs 2010³⁷ Cohort (n=222)	0.78	0.73	0.68	-	0.70	Range 0.65 to 0.75^b
Maughan 2010⁶³ (n=48)					0.64		0.50	0.67
Stewart 2007⁵⁶ (n=134)			0.73				Range 0.68 to 0.70^c

: AUC=Area Under the Curve (Values range 0.5 (the same as chance) to 1.0 (perfect discrimination) and are interpreted as the probability of a measure correctly discriminating between participants who have improved and those who have not improved). BPI= Brief Pain Inventory; CPG= Chronic Pain Grade Questionnaire; ODI=Oswestry Disability Index; PROMIS=Patient-Reported Outcomes Measurement Information System; RCT=randomized controlled trial; SF-36 BPS=Short Form (36) Bodily Pain Scale

a: Three versions were administered: 1) PROMIS-29 Profile Pain Interference Short Form (4-item), AUC=0.56; 2) PROMIS-57 Profile Pain Interference Short Form (8-item), AUC=0.57; 3) PROMIS Pain Interference Short Form 6b (6-item), AUC=0.61.

b: Pain intensity and disability

c: Two measures administered, pain intensity and pain bothersomeness
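The probability interpretation of the AUC given in the table note can be illustrated with a short calculation. This is a minimal sketch, not the method used in the included studies: it assumes each participant has a change score on the pain measure (larger means more improvement) and an external classification as improved or not improved, and it computes the AUC by comparing every improved participant with every non-improved participant. The data are hypothetical.

```python
def auc(improved_scores, not_improved_scores):
    """Probability that a randomly chosen improved participant has a larger
    change score than a randomly chosen non-improved participant.

    Ties count as half a win, so a measure with no discriminating ability
    scores 0.5 and a perfectly discriminating measure scores 1.0.
    """
    wins = 0.0
    for x in improved_scores:
        for y in not_improved_scores:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(improved_scores) * len(not_improved_scores))


# Hypothetical change scores, for illustration only.
improved = [3, 2, 4, 1]
not_improved = [0, 1, 2, -1]
print(auc(improved, not_improved))
```

On this reading, the value of 0.50 reported for the NRS in Maughan 2010 corresponds to discrimination no better than chance.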

Supplemental Table 1. Search Strategy

1	exp Low Back Pain/ or exp Shoulder Pain/ or exp Back Pain/ or exp Musculoskeletal Pain/ or exp Chronic Pain/ or exp Neck Pain/
2	(pain and (musculoskeletal or (low adj back) or neck or shoulder or hip or knee or joint)).mp.
3	osteoarthritis.mp. or exp Osteoarthritis/
4	1 or 2 or 3
5	exp Pain Measurement/mt
6	(pain adj5 (questionnaires or assess$ or measur$ or scale$ or inventor$ or rating$ or tool$)).mp.
7	(BPI or PEG or SF-36 or PROMIS or McGill or DVPRS or Roland-Morris or WOMAC or Oswestry or KOOS or HOOS or (Faces adj Scale)).mp.
8	5 or 6 or 7
9	(pain adj (severity or intensity or function$ or limit$ or activit$ or impact$ or interfer$ or disabilit$)).mp.
10	(valid$ or reliab$ or feasib$ or generalizab$ or respons$ or implement$).mp.
11	4 and 8 and 9 and 10
12	limit 11 to (english language and humans and yr="2000 -Current")
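The way the numbered search lines combine can be sketched with set operations: "or" corresponds to a union of result sets and "and" to an intersection. The record IDs below are hypothetical and serve only to show the logic of lines 4, 8, and 11; they are not actual search results.

```python
# Hypothetical record IDs returned by each individual search line.
results = {
    1: {101, 102, 103},
    2: {103, 104},
    3: {105},
    5: {101, 104, 106},
    6: {102, 105},
    7: {103, 107},
    9: {101, 103, 104, 105},
    10: {101, 103, 105, 108},
}

# Line 4: any pain-condition term (1 or 2 or 3).
results[4] = results[1] | results[2] | results[3]
# Line 8: any pain-measurement term (5 or 6 or 7).
results[8] = results[5] | results[6] | results[7]
# Line 11: records matching all four concept blocks (4 and 8 and 9 and 10).
results[11] = results[4] & results[8] & results[9] & results[10]

print(sorted(results[11]))
```

Only records present in every concept block survive to line 11, which is why each added "and" term narrows the search.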

Supplemental Table 2. Peer Review Comments/Author Responses

Question Text	Comment	Author Response
Are the objectives, scope, and methods for this review clearly described?	Yes	Thank you
	Yes	Thank you
	Yes	Thank you
	Yes	Thank you
	Yes	Thank you
Is there any indication of bias in our synthesis of the evidence?	No	Thank you
	No	Thank you
	No	Thank you
	No	Thank you
	No	Thank you
Are there any published or unpublished studies that we may have overlooked?	Yes - Please see my major comment below.	Please see our response to these major comments below.
	No	Thank you
	Yes - I have some concerns about the time period examined, as detailed below.	Please see our response below.
	Yes - I am concerned that some studies may have been missed. For example, re: the PROMIS-PI scale please verify that the following studies were screened and excluded. Amtmann, D. A., Cook, K. F., Jensen, M. P., Chen, W-H., Choi, S. W., Revicki, D., Cella, D., Rothrock, N., Keefe, F., Callahan, L, Lai, J-S. (2010). Development of a PROMIS item bank to measure pain interference. Pain, 150(1), 173-82. Amtmann, D., Kim, J., Chung, H., Askew, R. L, Park, R., & Cook, K. F. (2016). Minimally important differences for Patient Reported Outcomes Measurement Information System pain interference for individuals with back pain. Journal of Pain Research, 9, 251-255. Askew, R. L, Kim, J., Chung, H., Cook, K. F., Johnson, K. L, & Amtmann, D. (2013). Development of a crosswalk for pain interference measured by the BPI and PROMIS pain interference short form. Quality of Life Research, 10.1007/s11136-013-0398-5. Broderick, J. E., Schneider, S., Junghaenel, D. U., Schwartz, J. E., & Stone, A. A. (2013). Validity and reliability of Patient-Reported Outcomes Measurement Information System instruments in osteoarthritis. Arthritis Care and Research, 5(10), 1625-1633. Merriwether, E. N., Rakel, B. A., Zimmerman, M. B., Dailey, D. L, Vance, C. G., Darghosian, L, … Sluka, K. A. (2016). Reliability and construct validity of the Patient-Reported Outcomes Measurement Information System (PROMIS) instruments in women with fibromyalgia. Pain Medicine. doi:10.1093/pm/pnw187 Papuga, M. O., Mesfin, A., Molinari, R., & Rubery, P. T. (2016). Correlation of PROMIS physical function and pain CAT instruments with Oswestry Disability Index and Neck Disability Index in spine patients. Spine. doi:10.1097/BRS.0000000000001518 Also, I am concerned that the exclusion criteria may have resulted in exclusion of relevant studies (see comments below). 
For example, many studies investigating the psychometric properties of the pain scales have been published using non-English language versions of the scales of interest. It is not clear to me the rationale for excluding such studies, as this excludes a huge chunk of the literature on this topic. If the authors were concerned that findings could be affected by use of translated versions of a scale, it would be easy to assess whether that is the case.	The suggested references were reviewed for eligibility and did not meet inclusion criteria. Amtmann (2010): the study population did not meet the requirement that >75% of participants have chronic musculoskeletal pain. Amtmann (2016): the study population did not meet the requirement that >75% of participants have chronic musculoskeletal pain. Askew: the study population consisted of multiple sclerosis patients (not musculoskeletal pain). Broderick: the study population consisted of patients who self-reported a physician diagnosis of osteoarthritis. It is unclear whether such diagnoses were radiologically or clinically defined. Details were not provided on presence or duration of pain. Merriwether: we agree with a reviewer’s suggestion to include fibromyalgia; this study is included in the final report. Papuga: the duration of pain associated with conditions of the spine in this population was not reported. We excluded results from non-English language versions of scales. We added information to support this decision in the Limitations section (page 31). We disagree that “it would be easy to assess” whether findings could be affected by use of translated versions of a scale, as psychometric properties are affected by a number of factors other than linguistic and cultural variation, and isolating the influence of language variation would not be a straightforward process.
	No	Thank you.
Additional suggestions or comments can be provided below. If applicable, please indicate the page and line numbers from the draft report.	I have one major concern and a number of smaller comments. Major comment: I am confused about the inclusion/exclusion criteria related to chronic pain conditions. Exclusion criteria include “studies of patients with chronic conditions typically associated with pain unless the study specified that the patients had CMP (eg, osteoarthritis).” Does this mean that a study conducted in an osteoarthritis population would be excluded unless the authors specified that patients had “CMP”? If so, this seems to contradict the key question, which indicates that the population of interest has “chronic (≥ 3 months) musculoskeletal pain (eg, low back pain, osteoarthritis, and non-traumatic joint pain).” The pain field suffers from a lack of consensus on terminology (e.g., “chronic pain” vs “persistent pain”) and pain diagnosis categories, so substantial heterogeneity in descriptions of clinical populations is to be expected. “Chronic musculoskeletal pain” is not a specific entity, just an umbrella term used to capture a group of patients with chronic painful conditions, such as those with low back pain and osteoarthritis. Excluding studies of chronic pain measures conducted in patients with chronic back pain and osteoarthritis that do not describe patients as specifically having “chronic musculoskeletal pain” does not seem to make sense. It’s possible that I’m just misunderstanding the exclusion criterion. It would be helpful to have a table of excluded studies along with the reasons for exclusion so readers of the report can better understand how criteria were applied. Without this information, I am wondering why the following studies were not included: Keller S, Bann CM, Dodd SL, Schein J, Mendoza TR, Cleeland CS. Validity of the brief pain inventory for use in documenting the outcomes of patients with noncancer pain. Clin J Pain. 
2004 Sep-Oct;20(5):309-18. Elliott AM, Smith BH, Smith WC, Chambers WA. Changes in chronic pain severity over time: the Chronic Pain Grade as a valid measure. Pain. 2000 Dec 1;88(3):303-8. Holm I, Friis A, Storheim K, Brox JI. Measuring self-reported functional status and pain in patients with chronic low back pain by postal questionnaires: a reliability study. Spine (Phila Pa 1976). 2003 Apr 15;28(8):828-33. Other comments: I would not use “CMP” in the text because I think it’s generally preferable to avoid unnecessary use of idiosyncratic abbreviations. (I have no objection to use in the tables, where space is limited). It would be helpful to provide a bit more descriptive information about each of the included measures. Many of these measures are currently described incorrectly as using “Likert” type items. Numeric rating items, such as those in the BPI and PEG, are not the same as Likert-type items. Reporting of pain medication use is commented on for most of the measures and in the table, but it is not clear to me how this information is relevant to this report. Given the purpose, it would have made more sense to describe patients’ use of non-pharmacological therapies. (Please note: I am not suggesting adding info about reporting of non-pharm therapies. Rather, I think the text about reporting pain medications, such as “studies failed to report if patients were using pain medication,” could be eliminated because it’s not a relevant limitation for these types of studies.) For measures with no included studies (e.g., DVPRS, KOOS), it would be helpful to provide information about why studies were excluded. Table 1: What is meant by “yes” and “request” in the public domain column? Some measures are copyrighted but available without charge. Some measures require payment. The relevant information here is the requirement for payment. If “public domain” is meant to mean the scale is available without charge, then the information is incorrect for several of the scales. 
I am confident that the GCPS and PROMIS measures are both available and free. BPI is not free. The SF-36 is copyrighted and some versions are available free while others are only available with payment. I don’t know about many of the other measures. Page 5: Authors may wish to add publication prior to 2000 to the list of exclusion criteria, provide a brief justification (e.g., rapid review, focus on measures in current wide use) and briefly describe how they handled situations where the original measure paper was published prior to 2000. Page 9: The BPI items are 0-10 numeric rating scales, not Likert scales. Page 12: Numeric rating scale is a generic term not specific to pain numeric rating scales. Page 17: Shouldn’t the van Grootel study been excluded because it is a study of patients with orofacial pain? Page 23: It would be helpful to comment on the fact that pain normally varies in intensity and would not be expected to remain static over days, weeks, or months. For most of the measures assessed, test-retest reliability doesn’t make conceptual sense. Page 29: The BPI scales have a total of 11 items (not 17). Page 29: Shouldn’t the Brazilian study (Gallasch 2007) be excluded because measures were not administered in English? Page 30: I’m confused overall by the wording in this section. What does “all other scales we reviewed can be self-administered” mean? Does this mean they can’t be administered any other way or that the preceding scales can’t be self-administered? If so, why? Also, there is at least one error: the PEG is not designed to be administered by an interviewer; like the BPI from which it originated, it can be self-administered or administered by an interviewer. I would guess that almost all of the scales are commonly administered both by self-complete questionnaire and by interview. I suspect very few have been subject to rigorous evaluation of whether they perform similarly when self-administered or telephone/in-person interviewer-administered. 
Either way, I think it is outside of the scope of the review to determine all of the validated modes of administration for each scale. It would be helpful to know which scales require specific tools or modes for administration (e.g., computerized adaptive testing, visual aids). Page 30: There are several errors in the availability section. GCPS and PROMIS are freely available without charge. I think the MPI is too. Also, availability of SF-36 and its pain subscale is complex. The original version is available from RAND for free, but there is a revised copyrighted commercial version that requires payment to use.	We clarified the study inclusion criteria. The requested scope of this rapid review was to assess the psychometric properties of specified scale scores in individuals with chronic musculoskeletal pain (defined as at least 3 months duration). We were generous in our inclusion criteria. We did not specifically require that the phrase “chronic musculoskeletal pain” be used. We included studies if the authors reported that participants had pain of at least 3 months duration or the authors described the participants as having chronic pain associated with a musculoskeletal condition (eg, osteoarthritis) even if duration was not described. We disagree that articles should have been included if they evaluated pain measures in patients with chronic conditions often associated with pain. Such individuals do not necessarily have chronic pain. From a clinical perspective, many patients with radiologically defined osteoarthritis do not have pain or only have acute or subacute pain and thus would fall outside the scope of this review. Nonetheless, we recognize that some may wish to extrapolate less reliable findings from individuals with acute or subacute pain or those with osteoarthritis without pain. We provide a discussion of findings from results of systematic reviews that assessed pain scale scores from studies that included these populations. 
In the process of full text review, we exclude an article if it meets any one of our exclusion criteria - we do not document all the reasons it was not eligible. Therefore a table of excluded studies would not provide the level of requested detail. The suggested references were reviewed for eligibility. Keller is now included in the final report. Elliott did not meet all inclusion criteria. The study was designed to assess the Chronic Pain Grade as a measure useful in prospective studies of the general population and included patients with any chronic pain, including pain due to angina, arthritis, back pain, injury, women’s problems, and unknown sources. Results were not stratified by pain type. Holm did not meet all inclusion criteria (not English version of scale of interest). The study assessed the Norwegian version of the ODI. We removed the CMP abbreviation. Thank you for the suggestion. We reviewed Table 1 (table of measures) and Supplemental Content Table 3 and updated them to reflect correct descriptions of the scales. Scales, including the BPI and PEG, have been corrected to indicate they are numeric rating scales. Of note, there are discrepancies among various sources of information we reviewed about the scales. We thought that “use of pain medication” might provide an important descriptor of the study population but there are limitations, as the reviewer noted. We have deleted this information from the text and tables. We appreciate the reviewer’s point about the value of information on why no studies of the DVPRS and KOOS met inclusion criteria. We have added this information to the report (page 10). As noted above, we updated Table 1. We replaced the “domain” column with “Restrictions on Use.” The BPI and SF-36 have restrictions and may require payment. Some scales may also be obtained directly from the original author. 
Although our literature search was limited to 2000 to the present, we included studies prior to 2000 that were identified in hand-searching of reference lists of eligible studies and systematic reviews as well as websites of individual pain scales. This is noted in the Methods section (page 6). We updated Supplemental Content Table 3 to show the BPI as a set of numeric rating scales. We reviewed Table 1 and Supplemental Content Table 3 and added “for pain” to the titles for the NRS and VAS. We included TMJ references, as this type of orofacial pain is musculoskeletal in nature. We clarified this in the report (page 7). We agree that pain varies in intensity and is not expected to remain static over specific periods of time. We comment further on the limited conceptual value and applicability of test-retest reliability in the discussion (page 31). We updated Table 1 and Supplemental Content Table 3 to show the BPI as an 11-item scale. We reviewed Gallasch 2007. For non-US studies, we included the study if the authors did not specify the language used for the measures, and Gallasch does not provide this information. We modified our exclusion criteria to reflect this. The section on “Mode of Administration” has been revised. We appreciate the reviewer’s attention to the information about availability of measures. We updated Table 1 and Supplemental Content Table 3 as well as the text on Availability. We chose to indicate that there are restrictions on the SF-36.
	This report will serve an important role in informing a Pain Measurement WG deliberations regarding optimal measures for use in clinical and research settings. The process for establishing the parameters of the review, the enactment of a high fidelity review process, and an exceptionally clear report are important strengths. At the same time, the narrow scope of the review and the narrow parameters for identifying published articles on this topic will likely limit the value of the report. In particular, the decision to “exclude trials of interventions for pain, unless assessment of psychometric properties of interest was noted in the abstract” seems to be a “fatal flaw.” It seems intuitive that most published clinical trials would not report psychometric properties of the key measures in the Abstract, since this would not likely be the major focus of the report. It would also seem intuitive that published reports of the psychometric properties of the measures should have been considered, as well as systematic reviews of the psychometric properties of these measures. These decisions likely greatly limited the data pool upon which measures’ quality could be evaluated. Other comments: Why only Medline? Results seem to be based on a binary determination about whether specific psychometric properties were reported in published studies, rather than the strength of the psychometric properties. Results are reported for CMP generally, without regard to specific CMP conditions. The failure to examine non-English language versions of the measures will greatly limit the value of the report beyond its use in informing VHA policy. Use of quality assessment (COSMIN) is a strength. Some of the data reported in Table 1 could be considered misleading. The pain severity and pain interference scales of the WHYMPI total only 12 items. The Pain Rating Index of the MPQ that assesses pain severity has only five response options. There are questions about the accuracy of Table 2. 
Kerns (1985) reported on the concurrent and criterion-related validity of the measure. Page 24 reports on internal consistency, but it reports on a range of alphas for the several scales of the WHYMPI, not just the pain severity and interference scales. Also on page 24, the section on concurrent and criterion-related validity does include a reference to Kerns et al (1985) but this publication is not listed among those that were reviewed earlier in the same section and the fact that these psychometric properties were evaluated is not acknowledged in Table 2. I’m not clear how MID or responsiveness is operationalized. It seems intuitive that if an RCT included one of these measures as an outcome, and if there is evidence of significant change over time, it should be considered as evidence or responsiveness. Similarly, if the study included a prespecified MID and included a “responder analysis” then this should be included.	Thank you. We disagree that the scope and parameters are too narrow. Our report was based on decisions made jointly with our partners and within the parameters of a Rapid Review and the Topic Nomination. We previously described our concerns about including findings from studies of non-English language versions of scales and recommendations for providing some information related to these studies. We also disagree that our decision to “exclude trials of interventions for pain, unless assessment of psychometric properties of interest was noted in the abstract” is a “fatal flaw.” We reviewed search and triage strategies of relevant systematic reviews. It is extremely likely that these authors used similar strategies. For example, the systematic review by Gandek (2015) evaluating the WOMAC excluded nearly 2000 articles at the abstract level because they did not meet inclusion criteria (without further elaboration). 
Included articles described psychometric properties in the abstract and none of the included articles identified in prior systematic reviews or scale score websites reported information in the body of the manuscript without also describing it in the abstract. A review of a subset of excluded articles confirmed these findings and supports our rationale. We considered MEDLINE to be the most pertinent database. Rapid Reviews typically utilize a single electronic search engine that is likely to capture the most relevant information in an expedient fashion. As noted in the Methods, we searched beyond MEDLINE to identify relevant evidence. Evaluating the quality of the statistical and other methodological approaches to psychometric assessments, and therefore the quality of their findings, was not set out as one of our goals for this systematic review. We report the CMP conditions present within the population for each study, and thus for the psychometric properties of interest assessed in that study. We attempted to synthesize results across CMP conditions as per the understood goals of this report. We have commented further on patterns in the CMP conditions of populations within which the most frequently studied measures were assessed. We have previously described our rationale for excluding non-English language versions of the measures. We modified the Methods section. Although the COSMIN checklist is an appropriate tool for the quality of studies of measurement properties, on further examination, the checklist (beyond identifying the appropriate measurement properties to evaluate) is extensive and not feasible to use in a Rapid Review. We reviewed Table 1 (table of measures) and Supplementary Content Table 3 to include the number of pain severity and number of pain interference items in each scale (as appropriate based on purpose of scale). Thank you for noting this. We cross-checked all tables and text for accuracy. 
We appreciate the reviewer’s point about the role of change over time in a measure used in an RCT. We also recognize that in such a situation, some forms of responsiveness assessment cannot separate the effect of an intervention from the ability of a measure to assess that effect. We considered the variety of approaches to responsiveness in assessed studies, and comment on this in the discussion (page 31). We have also attempted to clarify our approach to MID assessment, which focuses on whether studies developed an estimate of a minimum clinically important difference and/or minimum detectable change specific to a given measure. This question of primary MID development is distinct from questions about whether studies used, for example, prespecified MIDs and responder analyses as part of their approach to an RCT.
	This is overall a very well done review. The methods are clearly described and the conclusions are appropriate for the findings. I have one main concern, which is the time period examined, starting in 2000. I think this would be an ample look-back period. However, some of these measures are a bit older, and the early psychometric data may have been published before 2000. Even though the included articles from 2000 forward were scanned for references other relevant articles, this strategy could still miss relevant articles for measures that did not have any newer articles meeting inclusion criteria. If it is decided that the review will not search for articles prior to 2000, I think this limitation should be very clear and prominent, and the report should make it clear that the lack of finding psychometric data does not necessarily mean it isn’t there in earlier years, or that this finding means the measure is not recommended. I have some familiarity with the KOOS and was very surprised to see that there were no relevant articles, since there certainly are papers on its psychometric properties in the literature. I am not sure if the lack of finding is because these manuscripts were published before 2000. Some additional minor comments: - Page 2, Lines 18-19: This seems to be somewhat in conflict with the first sentence of the paragraph - Page 26, lines 32-34: This does not seem to be a complete sentence.	Thank you. As noted in responses above, we have taken multiple other steps to enhance the literature retrieval process. We appreciate the point about the inherent limitation in any date restrictions, and have commented on this in the Limitations section (page 31). The reviewer does not provide article references that we may have missed. We have reviewed and included (or excluded) all suggested references provided by peer reviewers. 
Several articles reporting on the KOOS were included in our full text review (eg, Roos 2003, Ornetti 2008) but were not eligible because they used non-English versions of the KOOS. We revised the Conclusions paragraph in the Abstract. We corrected this sentence.
	This manuscript gives an overview of the properties of a number of pain-related measures in persons with chronic pain. Though it does provide some summarization of the literature, I have some concerns about the methods as well as the conclusions. My main concern is that relevant studies were likely to have been excluded due to the way the inclusion/exclusion criteria were specified, failure to assess the quality of included studies, and unclear synthesis methods. I also feel that the conclusions--which are basically “we can’t make any conclusions” are rather superficial when there do appear to be some measures that are supported by more evidence/testing/validation than others. It is not clear to me why studies that used non-English language versions of the scales were excluded. This is a big chunk of the literature. As the main conclusion is that there isn’t enough evidence to know the properties of the scales, it is problematic to focus exclusively on English language versions of the scales when there is a lot of other data available. I don’t understand why studies of patients with chronic conditions typically associated with pain were excluded unless the study specified that the patient had CMP. Why else would these studies be using/assessing pain-related outcome measures? It isn’t clear to me what conditions were included. The “Key Question” section says “e.g. LBP, OA, and non-traumatic joint pain.” What about things like chronic neck pain, fibromyalgia, tension HA’s, shoulder pain, etc. I guess there may be some debate about whether FM and tension HA’s are “musculoskeletal” but I would generally consider them in that category. Also RA and the inflammatory arthritis conditions seem to have been excluded. If the focus is specifically on musculoskeletal pain that should be specified in the title--right now it talks about measures for pain in general. The methods indicate that studies were excluded unless they specifically note that duration of pain was >3 months. 
But OA is almost by definition a chronic condition so I don’t think that studies should have been excluded if they didn’t specify duration of symptoms. It is not clear why quality of studies was not assessed. This is not the same as the checklist on measurement properties that is mentioned in the Methods, which mainly seems to be about what kinds of properties should be evaluated to determine whether a measure is valid or not, not about internal validity of the studies themselves. The methods say that quality was not assessed because they did a “qualitative synthesis of findings” but I don’t see that as a valid reason--we assess quality all the time when we do quantitative syntheses. I don’t see how we are to assess the validity of the studies without some quality assessment. The Data Synthesis/Rating the Body of Evidence sections really don’t describe any methods. Saying that the methods were “qualitative” is not sufficient--we do qualitative syntheses all the time and account for the same kinds of things (quality, inconsistency, directness, precision, etc) as we do for quantitative syntheses. I get very little sense of whether the findings are reliable--the results mostly read like a laundry list of results. Also, there are no pre-specified criteria for interpreting the findings--e.g. what would be considered decent test-retest reliability, responsiveness, etc? What would be necessary to establish a MID? I think the conclusions are too quick to basically say that they can’t support any of the measures. There are clearly some measures that have been validated/tested more than others. The conclusions should do a more nuanced job of highlighting those measures that are supported by better evidence. It should be noted that the PEG scale is derived from the BPI (takes three items from the BPI). There is also a lot of other overlap between scales--e.g. 
the NRS or a VAS is included in a number of outcome measures and similar items regarding function have been incorporated into a number of scales. I think that the overlap between scales warrants some discussion. If several items or scales has been validated how much additional validation is required when they are incorporated into another scale?	Thank you for the feedback. We have modified the conclusion statements and address specific comments below. As noted above, the review team decided that non-English language versions of the scales of interest could potentially produce different results due to variations in interpretation of descriptors of subjective ratings for pain intensity and interference. We provide references and descriptions that highlight the limitations in extrapolating findings from non-English language versions though we recognize that our stakeholders and other researchers/clinicians/policy makers may wish to make decisions based on studies of more broadly defined populations and study settings, in particular information from studies: 1) using non-English language versions of scales; 2) evaluating patients with musculoskeletal conditions often associated with chronic pain but not specifying the presence or duration of pain; 3) pain related to conditions outside of chronic musculoskeletal disorders (eg, headache, cancer). As noted above, we clarified the study inclusion criteria. It was not sufficient for studies to include participants with a condition that is potentially associated with pain. This decision was based on the understanding that not all patients with a diagnosed condition such as radiologically defined osteoarthritis experience chronic pain. We agree that fibromyalgia should have been an included condition and thank the reviewer for noting this. We reviewed our excluded studies and, as described in the Methods, did a separate search for studies of patients with fibromyalgia. 
We excluded rheumatoid arthritis (an inflammatory condition), headache, and orofacial pain (with the exception of temporomandibular pain, a musculoskeletal condition). We modified the title to include musculoskeletal pain. We included studies with duration of pain ≥3 months or pain described as “chronic” by study authors. We did not automatically include studies on osteoarthritis as it is possible to have the condition without chronic pain. For example, some excluded studies included participants with radiologically defined osteoarthritis per a series of imaging reviews, which does not necessarily address the presence or chronicity of pain. We modified the Quality Assessment section of the report. We established inclusion criteria that would focus our review on studies that appropriately evaluated the psychometric properties of the pain measures. We did not go further into evaluating the quality of the articles since it is very difficult to evaluate the wide range of statistical approaches to assessing multiple psychometric attributes. A study that is good for some aspects could be poor for others. The extensive list of criteria set forth in the COSMIN checklist speaks to this difficulty in evaluating a large list of measures for multiple quality criteria and clinical contexts. We modified these sections of the report. Regarding criteria for interpreting findings, we describe some of the difficulties with establishing such across-the-board criteria in the Discussion. We provide particular attention to methods of developing and interpreting MID (the primary outcome, as approved by Topic Nominators). As previously noted, evaluating the quality of the statistical and other methodological approaches to psychometric assessments, and therefore the quality of their findings, was not set out as one of our goals in this systematic review. 
We appreciate the reviewer’s point, and have attempted to highlight measures that have been more frequently assessed with respect to psychometric properties of interest. Thank you for the clarification. We changed the wording on Supplemental Content Table 3 from “based on” to “derived from.” We appreciate the reviewer’s point, and have commented further on the conceptual overlap between some scales. We also comment on the wide variation in content among NRS-based and VAS-based approaches. The question of how much additional validation is required when items of an existing scale are adapted and incorporated into a new scale could inspire interesting debate within the field of psychometric methodology. To our knowledge, there are no concrete criteria addressing this question that we could operationalize in this review.
	Boonstra, Anne M. et al. “Cut-Off Points for Mild, Moderate, and Severe Pain on the Numeric Rating Scale for Pain in Patients with Chronic Musculoskeletal Pain: Variability and Influence of Sex and Catastrophizing.” Frontiers in Psychology 7 (2016): 1466. PMC. Web. 8 May 2017.	The suggested reference was reviewed for eligibility and did not meet all inclusion criteria (English language versions of scales). The scales were administered in Dutch.

Supplemental Table 3. Characteristics of Included Pain Measurement Scales

Scale Reference	Measure Properties				Feasibility
Scale Reference	Year Developed	Developed for Specific Conditions (write in)	Pain Severity/Intensity or Functioning/Interference	Scoring (write in)	Number of Items	Scale Description	Restrictions on use: Yes, No	Reading Level (write in)
Brief Pain Inventory (BPI)¹	1983	Cancer pain	Pain intensity; Interference (physical functioning, work, mood, walking, social activity, relations with others, and sleep)	11-point numeric rating scale of 0-10, corresponding to: 0=no pain, no interference, to 10=pain as bad as you can imagine, complete interference. Diagram also provided for respondents to indicate where pain is felt. Mean of pain intensity and interference scores indexed separately	11 total (4 severity, 7 interference)	Range: none=0, mild=1-3, moderate=4-6, severe=7-10 Direction: Higher indicates worse	Scale available for purchase with price dependent on use	NR
Defense & Veterans Pain Rating Scale²	2010	Pain among military and Veterans. Designed to enhance NRS with visual cues and word descriptors to anchor pain	Pain intensity; Interference (general activity, sleep, mood, stress)	11-point numeric rating scale of 0-10, corresponding to: 0=no pain, no interference, no affect, to 10=pain as bad as can be, completely interferes, completely affects	5 total (1 severity, 4 interference)	Range: Green (0-4)=mild pain or interference; Yellow (5-6)=moderate pain or interference; Red (7-10)=severe pain or interference Direction: Higher indicates worse	Free use of the scale is permitted without revisions or alterations	9th grade reading level
Graded Chronic Pain Scale³,⁴	1992	Chronic pain conditions including musculoskeletal pain and LBP	Pain intensity; Interference (disability)	11-point Likert-type scale of 0-10, corresponding to: 0=no pain to 10=pain as bad as can be. Mean intensity ratings multiplied by 10 calculated for 2 subscales ranging from 0-100 and 1 subscale ranging from 0-3 points. Subscale scores for pain intensity and disability are combined to calculate chronic pain grade. Patients are then divided into 5 hierarchical categories: grade 0 (no pain) and 5 (high disability and severely limiting)	7 total	Range: Pain intensity=0-100; Disability score=0-100 (0-3 points); Disability points (points from disability days + disability score)=0-6. Disability days: 0-6 days=0 points, 7-14 days=1 point, 15-30 days=2 points, 31+ days=3 points. Disability score: 0-29=0 points, 30-49=1 point, 50-69=2 points, 70+=3 points Direction: Higher indicates worse	Free version of scale is available from original reference or directly from author	Basic
Hip Osteoarthritis Outcomes Scale (HOOS)⁵	2002	Hip disability with or without OA. Extension of WOMAC scale	Pain intensity; Interference (physical functioning)	5-point Likert-type scale of 0-4, corresponding to: 0=no problems to 4=extreme problems. All subscale scores transformed to a 0-100 scale with zero representing no hip problems and 100 representing extreme hip problems	40 total with 5 subscales: pain, symptoms, daily living limitations, sport and recreation limitations, hip-related quality of life	Range: 0-100 Direction: Higher indicates worse	Free version of scale available online	NR
Knee Osteoarthritis Outcomes Scale (KOOS)⁶	1998	Knee injury or OA. Extension of the WOMAC pain scale	Pain intensity; Interference (physical functioning)	5-point Likert-type scale of 0-4, corresponding to: 0=no problems to 4=extreme problems. All subscales scored as sum of items answered. Scores are then transformed to a 0-100 scale with zero representing extreme knee problems and 100 representing no knee problems	42 total with 5 subscales: pain (9 items), symptoms (7 items), daily living limitations (17 items), sport and recreation limitations (5 items), knee-related quality of life (4 items)	Range: 0-100 Direction: Higher indicates better	Free version of scale available online	NR
McGill Pain Questionnaire⁷,⁸	1970	General chronic pain	Pain intensity and quality in multiple domains (eg, sensory, affective, evaluative)	Three classes of rank order-type words and a 5-point numeric rating scale. MPQ scored by counting number of words selected to obtain a Number of Words Chosen score (0-20). PRI scores (0-78) based on rank order of words in each subclass. Rank scores are summed in each subclass as well as overall. Normative scores range from 24-50% of maximum score	78 total with 20 subscales (PRI): sensory=42, affective=14, evaluative=5, miscellaneous=17; 6 additional items (5 point score range) for the present pain intensity scale (PPI)	Range: Number of Words Chosen=0-20; PRI=0-78; PPI=0-6 Direction: Higher indicates worse *No established cut points	Free version of scale available from author	Words may be defined by administrator
Multidimensional Pain Inventory⁷,⁹ (also known as the West Haven-Yale Multidimensional Pain Inventory [WHYMPI])	1985	Chronic pain including LBP and temporomandibular disorders	Pain intensity; Interference (daily activities including vocational, social, and familial functioning)	7-point numeric rating scale of 0-6, corresponding to: 0=none, not at all, extremely low, never, to 6=extreme, very intense, very often. Subscale scores are derived from the sum of individual terms in the subscale divided by the number of items in the subscale. To calculate total score divide by the number of items	52 total with 3 parts: interference=9, support=3, pain severity=3, life-control=2, affective distress=3, negative responses=4, solicitous responses=6, distracting responses=4, household chores=5, outdoor work=5, activities away from home=4, social activities=4	Range: Pain experience=(0-120); Significant others’ responses to communication of pain=(0-84); Participation in common daily activities=(0-108) Direction: Higher indicates worse	Free version of scale available from author	Words may be defined by administrator
Numeric Rating Scale for Pain (NRS)³,⁸	NR	General chronic pain	Pain intensity; Pain interference	11-point numeric rating scale of 0-10, corresponding to: 0=none to 10=severe. Horizontal line commonly used	1 total	Range: none=0, mild=1-3, moderate=4-6, severe=7-10 Direction: Higher indicates worse	Free version of scale available online	Basic
Oswestry Disability Index/Oswestry Low Back Pain Disability Questionnaire (ODI/ODQ)¹⁰	1980	Disability from acute and chronic LBP	Pain intensity (need for pain medications); Interference (physical functioning, disability)	6-point ordinal scale of 0-5, corresponding to: 0=no pain, no interference/disability, to 5=worst scenario of pain, interference/disability. Scoring for each item increases from 0-5. Missing values omitted. Sum of scores divided by total possible scores to obtain percentage	10 total with 2 possible subscales: pain or need for pain medication=1 item, interference on daily activities=9 items	Range: Minimal disability=0-20; Moderate disability=20-40; Severe disability=40-60; Housebound=60-80; Bedbound=80-100 Direction: Higher indicates worse (disability)	Free use of scale permitted for non-funded academic research and individual clinical practice	NR
Patient Global Impression of Change¹¹	NR	NR	NR	7-point categorical scale of 1-7, which corresponds to: 1=no change in condition to 7=a great deal better	1 total	Range: 1-7 Direction: Higher indicates better	Free version of scale available online	N/A
PEG¹²	2008	Chronic pain in primary care Derived from the BPI	Pain intensity Interference (physical functioning)	11-point numeric rating scale of 0-10, corresponding to: 0=no pain, no interference, to 10=pain as bad as you can imagine, completely interferes The PEG is scored by averaging the 3 individual item scores	3 total	Range: 0-10 Direction: Higher indicates worse	Free use of the scale is permitted	NR
PROMIS Pain Interference (PROMIS-PI)¹³,¹⁴	2004	General chronic pain conditions	Interference (physical functioning)	5-point numeric rating scale of 1-5, corresponding to: 1=not at all, never, to 5=very much, always, every few hours. Sum response scores for questions that were answered. Multiply sum by total number of items in form then divide by number of items answered	41 total; 4a, 6a, 6b, and 8a item short version SFs often used	Range: 4a=4-20; 6a=6-30; 6b=6-30; 8a=8-40 Direction: Higher indicates worse	Free use of the scale is permitted after registration with assessment center and endorsing terms and conditions of use	NR
Roland Morris Disability Questionnaire (RMDQ)¹⁵	1983	Disability from LBP	Interference (physical functioning, disability)	1 point for each item completed	24 total	Range: 0=no disability to 24=severe disability Direction: Higher indicates worse (disability)	Free version of scale available and in the public domain	Basic
SF-36 Bodily Pain Scale (SF-36 BPS)³,¹⁶	1996	Overall health status in ages ≥14	Pain intensity; Interference (daily activities)	6-point pain severity rating where 1=none and 6=very severe; 5-point pain interference rating where 1=not at all and 5=extremely. Responses transformed to a 0-100 point scale.	2 total	Range: 0-100 Direction: Higher indicates more favorable health state.	Scale available for purchase with price dependent on use	NR
Visual Analogue Scale for Pain (VAS)⁷^,¹⁷	1952	Rheumatic diseases	Pain intensity Interference (disability)	One vertical line (usually 10cm or 100 mm) in length anchored with verbal descriptors of “no pain” to “pain as bad as it could be”. Perpendicular lines placed at point that best indicates pain. Metric ruler placed along line to indicate score in mm or cm	1 total	Range: 0-10 cm or 0-100 mm Direction: Hiqher indicates worse Scores below 4 cm or 20 mm considered desirable for chronic pain management	Free version of scale available and in the public domain	NR
Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC)¹⁸	1982	OA (knee and hip)	Pain intensity Interference (physical functioning)	5-point Likert-type scale of 0-4, corresponding to: 0=none to 4=extreme 100mm Visual Analog version uses anchors of no pain/ stiffness/ difficulty and extreme pain/ stiffness/ difficulty	24 total with 3 subscales pain=5, interference (functioning)=17, stiffness=2	Range: Pain 0-20 Function 0-68 Stiffness 0-8 Direction: Hiqher indicates worse	Scale available for purchase with price dependent on use	NR
Wong Faces Scale¹⁹	1985	Pain among children	Pain intensity	6-point numeric rating scale of 0-10 (increasing by 2), which corresponding to: 0=no pain to 10=hurts worst Person chooses the face that best describes their pain	1 total	Range: No pain=0 hurts little bit=2 hurts little more=4 hurts even more=6 hurts whole lot=8 hurts worst=10 Direction: Hiqher indicates worse pain	Scale available for purchase with price dependent on use	NR

Abbreviations: OA=osteoarthritis; NR=not reported; PPI=present pain intensity; PRI=pain rating index; LBP=low back pain; CAT=computer adaptive test; N/A=not applicable
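
The scoring rules given in the table for the PEG (mean of the 3 item scores) and for PROMIS-PI short forms (sum of answered items, multiplied by the number of items in the form and divided by the number answered) can be illustrated with a minimal sketch. The function names below are hypothetical and are not part of any official scoring tool; the range checks simply reflect the 0-10 and 1-5 response scales listed in the table. Rules not stated in the table (for example, a minimum number of answered items, rounding, or conversion of PROMIS raw scores to T-scores) are deliberately left out.

```python
from typing import Optional, Sequence


def peg_score(pain: float, enjoyment: float, general_activity: float) -> float:
    """Return the PEG score: the mean of the three items, each rated 0-10."""
    items = (pain, enjoyment, general_activity)
    for value in items:
        if not 0 <= value <= 10:
            raise ValueError("PEG items are rated on a 0-10 scale")
    return sum(items) / len(items)


def promis_pi_prorated_raw(responses: Sequence[Optional[int]]) -> float:
    """Return the prorated raw sum for one PROMIS-PI short form.

    `responses` holds one entry per item in the form; None marks an
    unanswered item. The sum of answered items is multiplied by the
    number of items in the form and divided by the number answered.
    """
    answered = [r for r in responses if r is not None]
    if not answered:
        raise ValueError("at least one item must be answered")
    for value in answered:
        if not 1 <= value <= 5:
            raise ValueError("PROMIS-PI items are rated on a 1-5 scale")
    return sum(answered) * len(responses) / len(answered)


if __name__ == "__main__":
    # PEG: items 6, 7, and 5 average to 6.0
    print(peg_score(6, 7, 5))
    # 4-item form with one item skipped: (3 + 4 + 5) * 4 / 3 = 16.0
    print(promis_pi_prorated_raw([3, 4, None, 5]))
```

With all items answered, the prorated sum equals the plain sum, so a fully completed 4-item form falls in the 4-20 range and an 8-item form in the 8-40 range shown in the table.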

Supplemental Table 4. Study Characteristics

Study (name and year)/Location/Funding	Scale of Interest/Others	Mode of Administration	Setting^a	Condition/Study Inclusion/Exclusion Criteria	Baseline Pain Characteristics	Demographics
Anagnostis 2004²⁰ Location: United States Funding: Government	ODI	SAQ, written	Community treatment clinic	Condition: CDMD (current) Inclusion: Enrolled in chronic pain management course; ≥ 4 months partial or total disability since work related injury; ≥ 1 injury related to spine or extremities, failed response to primary or secondary non-operative care or surgery; severe functional limitations; English or Spanish speaking Exclusion: NR	Baseline pain score(s): NR Average intensity: NR	N=230 Age (mean, SD): 43.3, δ 9.4 Women (%): 53 Race/Ethnicity (%): White: 59.7 African-Amer./Black: 29.2 Hispanic: 11 Other: 0.1
Askew 2016²¹ Location: United States Funding: Government	PROMIS-PI	SAQ, written	Spine center, local clinics	Condition: LBP Inclusion: Receiving, or about to receive, a spinal injection Exclusion: NR	Baseline pain score(s): NRS= 78% scored ≥8, range 0-10 Average intensity: NR	N=218 Age: 62% ≥ 50 years Women (%): 56 Race/Ethnicity (%): White: 84 African-Amer./Black: 4.1 Hispanic: 1.3 Other: 10.6
Burnham 2012²² Location: Canada Funding: NS	MPQ ODI	SAQ, written	Chronic pain management clinic	Condition: Spine pain Inclusion: Attending a chronic pain management clinic; received a lumbopelvic spinal intervention Exclusion: NR	Baseline pain score(s): NR Average intensity: NR	N=60 Age (mean, SD): 60, δ 12.4 Women (%): 67 Race/Ethnicity (%):NR
Changulani 2009²³ Location: United Kingdom Funding: NS	ODI VAS	SAQ, written	Outpatient clinic	Condition: LBP Inclusion: Undergoing caudal epidural steroid injections for lumbosacral radicular pain with symptoms persisting for more than 4 weeks; unrelieved by analgesia and physiotherapy Exclusion: NR	Baseline pain score(s): ODI Spinal stenosis= 48 (δ15); Disc prolapse=50 (δ16); Spondylolisthesis=41 (δ15) Average intensity: NR Type of pain (%): Spinal stenosis=59 Disc prolapse=36 Spondylolisthesis=5	N=107 Age (mean): 58 Women (%): 58 Race/Ethnicity (%): NR
Chansirinukor 2005²⁴ Location: Australia Funding: None	RMDQ	SAQ, written	Physical therapy clinic	Condition: LBP Inclusion: Work-related pain, at least 2 complete Functional Rating Indexes and RMDQs Exclusion: NR	Baseline pain score(s): RMDQ= 57.2 (δ23.7) Average intensity: NR Type of pain (%): LBP=78.3	N=143 Age (mean, SD): 37.9, δ 9.8 Women (%): 26.4 Race/Ethnicity (%):NR
Chien 2013²⁵ Location: Australia Funding: Academic	BPI	SAQ, written	Pain clinic	Condition: General musculoskeletal pain Inclusion: Age ≥18 years; nonmalignant pain Exclusion: Cancer-related pain	Baseline pain score(s): BPI (S) =6.0 (δ 1.6); BPI (I) =5.9 (δ 1.9) Average intensity: Moderate	N=254 Age (mean): 51 Women (%): 50 Race/Ethnicity (%): NR
Cook 2008²⁶ Location: United States Funding: Government	RMDQ (24-, modified, 18-, 12-, and 11-item)	SAQ, written and CATs	NR	Condition: LBP Inclusion: Study 1 (Discogenic study) participants had 1- or 2-level disc degeneration Study 2 (Seattle Lumbar Imaging Project) participants randomly assigned to rapid magnetic resonance imaging or standard radiographs Exclusion: NR	Baseline pain score(s): NR Average intensity: NR	*Data combined from 2 studies N=875 Age (mean, range): 47, 18-93 Women (%):NR Race/Ethnicity (%): White: 85 African-Amer./Black: 9 Hispanic: 3 Asian: 2 Other: 1
de Vet 2007²⁷ Location: Netherlands Funding: NS	NRS	SAQ, written	Physiotherapy clinics	Condition: LBP Inclusion: Referred for physiotherapy Exclusion: NR	Baseline pain score(s): NR Average intensity: NR	N=438 Age (mean, range): NR Women (%): NR Race/Ethnicity (%): NR
Deyo 2016²⁸ Location: United States Funding: Government	PROMIS-PI SF	SAQ, Telephone interview	Primary care clinics	Condition: General musculoskeletal pain Inclusion: Age ≥55 years; ≥ 2 visits for musculoskeletal pain; moderate pain (≥ 5 points on 10-point pain scale); no opioid use for ≥ 1 month; telephone access; no cognitive impairments Exclusion: Adverse reaction to opioids; life expectancy <2 years	Baseline pain score(s): NR Average intensity: ≥5, 10-point scale Type of pain (%): Back=30.8 Neck=7.5 Joint=14.1 Arthritis=15.6 Other=31.8	N=198 Age (mean, SD): 66.5, δ 8.2 Women (%): 62.1 Race/Ethnicity (%): White: 92.3 Hispanic: 3.6 Other: 4.1
Driban 2015²⁹ Location: United States Funding: Government	PROMIS-PI SF-36 BPS WOMAC	SAQ, written	University hospital	Condition: Pain from OA of the knee Inclusion: Participation in RCT (comparison of Tai Chi and physical therapy); age ≥40 years; WOMAC pain subscale score (100 mm visual analog scales) >40 on at least 1 out of 5 questions; fulfillment of the American College of Rheumatology criteria for knee osteoarthritis; radiographic evidence of knee osteoarthritis; confirmation of knee pain, discomfort, or disability by clinical examination Exclusion: Experience with physical therapy in past year, Tai Chi training/ use of alternative medicine; serious medical conditions limiting ability to participate, intraarticular steroid injections or replacement surgery on the affected knee in the last 3 months; or a Mini-Mental examination score <24	Baseline pain score(s): PROMIS-PI= 58 (δ 7.0); SF-36 BPS= 47.5 (δ 18.6); WOMAC= 254 (δ 98.6) Average intensity: NR	N=204 Age (mean): 60.2 Women (%): 70 Race/Ethnicity (%): White: 52.7 African-Amer./Black: 35.5 Other: 11.8
Fisher 1997³⁰ Location: United Kingdom Funding: NS	MPQ ODI	SAQ, written	Clinical Psychology Department, outpatient clinic	Condition: LBP Inclusion: Undergoing, or about to undergo, a back pain rehabilitation program Exclusion: NR	Baseline pain score(s): ODI= 54.5 (δ12.3); MPQ =2.8 (δ1.1) Average intensity: NR Type of pain (%): Back=87 Leg or neck=13	N=54 Age (mean, range): 41, 20-62 Women (%): 63 Race/Ethnicity (%): NR
Gallasch 2007³¹ Location: Brazil Funding: Government	Faces NRS VAS	SAQ, written	University health center	Condition: General musculoskeletal pain Inclusion: Physiotherapy treatment due to musculoskeletal symptoms, age 18 to 70 years; education no more than middle school level Exclusion: Illiterate	Baseline pain score(s): NR Average intensity: NR Type of pain (%): OA=19 Tendonitis=16 Back=13	N=32 Age (mean, range): 51, 33-69 Women (%): NR Race/Ethnicity (%): NR
Gentelle-Bonnassies 2000³² Location: France Funding: NS	VAS WOMAC	SAQ, written and mail	Hospital	Condition: Pain from OA of the knee Inclusion: OA fulfilling the criteria of the American College of Rheumatology; primary or secondary OA (osteonecrosis; chondro-calcinosis); involvement of the medial tibiofemoral; the lateral tibiofemoral, or the patellofemoral compartment of the knee joint; active disease (pain and disability) justifying joint lavage Exclusion: Serious chronic disease; intra-articular procedures (arthroscopy or surgery) performed ≤ 2 years or osteotomy performed ≤ 3 years; prescription of intra-articular injections ≤ 1 month before entry	Baseline pain score(s): VAS=57 (δ22) (pain after activity) Average intensity: NR	N=80 Age (mean, SD): 62, δ 12 Women (%): 70 Race/Ethnicity (%): NR
Godil 2015³³ Location: United States Funding: Industry	NRS - neck pain NRS - arm pain	Telephone interview	Medical center	Condition: Neck and radicular arm pain Inclusion: Age 18-70 years; undergoing anterior cervical discectomy and fusion for neck and radicular arm pain; radiological evidence of cervical nerve root impingement from herniated disc or osteophyte Exclusion: Myelopathic symptoms; previous cervical spine surgery	Baseline pain score(s): NRS-neck pain=6.3 (δ2.6); NRS-arm pain= 5.5 (δ3) Average intensity: NR	N=88 Age (mean, SD): 52.3, δ 10.7 Women (%): 44 Race/Ethnicity (%): NR
Gronblad 1993³⁴ Location: Finland Funding: Foundation	ODQ VAS	SAQ	Tertiary care center	Condition: LBP Inclusion: With or without radiation to legs Exclusion: Pain due to underlying disease, psychiatric disease requiring continuous medication	Baseline pain score(s): NR Average intensity: Among subset VAS=54.1 (δ19.48)	N=94 Age (mean): 42.7 Women (%): 51 Race/Ethnicity (%): NR N=20 (re-test) Age (mean, SD): 42.3 Women (%): 55 Race/Ethnicity (%): NR
Hicks 2009³⁵ Location: USA Funding: Government	ODI SF-36 BPS	SAQ, mail	Retirement communities	Condition: LBP requiring activity modification Inclusion: Age ≥ 62 years, living independently Exclusion: NR	Baseline pain score(s): ODI=29.4 (δ16.6) Average intensity: NR	N=107 (validity) Age (mean): 80 Women (%): 72 Race/Ethnicity (%): White: 100 N=56 (re-test) Age (mean, range): 79 Women (%): 71 Race/Ethnicity (%): White: 100
Jensen 2012³⁶ Location: United States Funding: Industry	VAS	SAQ, written	Clinic	Condition: LBP Inclusion: Participation in an RCT (comparison of Etoricoxib and placebo); age 18-75 years; pain for majority previous month; taking NSAID or acetaminophen for 24 of previous 30 days; pain met Quebec Task Force criteria for spinal disorders (class 1 or 2); no surgery for LBP in past 6 months; no symptomatic depression or drug/alcohol abuse in past 5 years; no opioids > 4 days/month; no corticosteroid injections within 3 months Exclusion: NR	Baseline pain score(s): VAS= 76.7; RMDQ= 14.7 Average intensity: NR	N=639 Age (mean): 52.4 Women (%): 61.5 Race/Ethnicity (%): White: 90.1 African-Amer./Black: 5.1 Asian: 0.6 Other: 4.2
Kamper 2015³⁷ Location: Australia Funding: Government, Industry	NRS-24 hours NRS-week SF-36 BPS	SAQ (See Stewart 2007)	Clinic (See Stewart 2007)	Condition: Whiplash associated disorders (neck pain) Inclusion: Participation in RCT (exercise therapy for chronic whiplash); pain from car accident; age 18-65 years; English speaking Exclusion: Cervical fractures or dislocations; serious spinal pathology; serious psychiatric illness	Baseline pain score(s): NR Average intensity: NR	N=280 Age (mean): 43.5 Women (%): 65 Race/Ethnicity (%): NR
Kean 2016³⁸ Location: United States (Enrolled Veterans) Funding: Government	BPI PEG PROMIS-PI SF-36 BPS	IAQ	Primary care clinic	Condition: Musculoskeletal pain Inclusion: Participation in RCT (effectiveness of collaborative telecare management for moderate to severe and persistent musculoskeletal pain); Veteran; age 18-65 years; receiving care at a VAMC; persistent pain despite trying ≥ 1 analgesic medication; other non-musculoskeletal pain; English speaker; pain of moderate severity Exclusion: Inflammatory arthritis; pending pain-related disability claim; cognitive impairment; psychoses; actively suicidal; current illicit drug use; life expectancy < 12 months	Baseline pain score(s): BPI= 5.3 (δ 1.8); SF-36 BPS= 34.8 (δ 16.8); PROMIS-PI=22.1 (δ 8.8) Average intensity: Moderate	N=244 Age (mean, range): 55, 28-65 Women (%): 17 Race/Ethnicity (%): White: 77 African-Amer./Black: 19 Other: 4
Keller 2004³⁹ Location: United States Funding: Government	BPI-SF GCPS SF-36 BP RMDQ	SAQ	Primary care clinic	Condition: LBP Inclusion: Age 18-80 years, not permanently disabled, at least 8th grade reading level, prescribed change of therapy requiring follow-up visit Exclusion: NR	Baseline pain score(s): NR Average intensity: NR	N=131 Age (mean): 46.5 Women (%): NR Race/Ethnicity (%): NR
Kerns 1985⁹ Location: United States (Enrolled Veterans) Funding: Government	WHYMPI (MPI) MPQ	SAQ	Pain clinic	Condition: Chronic pain Inclusion: consecutive referrals to pain management program at 2 VAMCs Exclusion: NR	Baseline pain score(s): NR Average intensity: NR Type of pain (%): Back=36	N=120 (test-retest reliability for n=60 from one site) Age (mean, SD): 51, δ 14.5 Women (%): 18.5 Race/Ethnicity (%): NR
Krebs 2010⁴⁰ Location: United States (Enrolled Veterans) Funding: Government	BPI GCPS PEG RMDQ SF-36 BPS	SAQ, written See SCAMP papers^b	See SCAMP papers	Condition: Back, hip, or knee pain Inclusion: Participation in SCAMP study^b; Veteran; primary care patients, receiving care at a VAMC; persistent pain of at least moderate severity (BPI≥5) (see SCAMP study papers) Exclusion: NR	Baseline pain score(s): BPI (S)= 5.7; BPI (I)= 5.8; PEG=6.0; GCPS= 68.3; RMDQ=14.8; SF-36 BPS=35.3 Average intensity: NR Type of pain (%): Back=55 Hip or knee=45	N=427 Age (mean): 59 Women (%): 53.4 Race/Ethnicity (%): White: 58 African-Amer./Black: 38 Other: 4
Krebs 2009¹² Location: United States (Enrolled Veterans) Funding: Government	BPI GCPS PEG PGIC RMDQ SF-36 BPS	SAQ, written See SCAMP papers^b	University and VA affiliated clinics	Condition: Back, hip, or knee pain Inclusion: Participation in SCAMP study^b; Veteran; primary care patients, receiving care at a VAMC; persistent pain of at least moderate severity (BPI≥5) (see SCAMP study papers) Exclusion: NR	Baseline pain score(s): (Mean) NRS = 6.1 (δ1.9), 0-10 scale Average intensity: NR	N=500 Age (mean): 59 Women (%): 52 Race/Ethnicity (%): White: 58 African-Amer./Black: 38 Other: 4
Krebs 2007⁴¹ Location: United States Funding: Foundation, Government	NRS	IAQ	Hospital	Condition: General musculoskeletal pain (including extremities, back, and neck) Inclusion: Adults presenting to general medicine clinic for a return visit Exclusion: Non-English speaking; patients chosen by physicians	Baseline pain score(s): NRS=6.0 (among those with NRS ≥1), 0-10 scale Average intensity: NR Type of pain (%): Lower extremity= 21 Back or neck= 18 Upper extremity= 8 No pain= 28	N=275 Age (mean): 59 Women (%): 59 Race/Ethnicity (%): White: 70 African-Amer./Black: 24 Other: 6
Lovejoy 2012⁴² Location: United States (Enrolled Veterans) Funding: Government, Industry	MPQ-2 SF MPQ MPI	SAQ, written	Unclear	Condition: LBP, neck or joint pain Inclusion: Veterans; age ≥18 years; English speaking; ≥1 pain diagnosis in medical record; reported current symptoms of (or receiving treatment for) chronic pain, previous tests for hepatitis C Exclusion: Age >70 years; current unstable psychiatric disorder; pending litigation or disability compensation for pain; advanced liver disease	Baseline pain score: MPQ-2 SF= 3.22 (δ2.36), range 0-9.82 Average intensity: NR Type of pain (%): Neck or joint=76 Back=59	N=186 Age (mean, SD): 54.4, δ 7.7 Women (%): 7.5 Race/Ethnicity (%): White: 75.3 Other: 14.7
Lund 2005⁴³ Location: Sweden Funding: NS	VAS	SAQ, written	Unclear	Condition: Idiopathic musculoskeletal pain Inclusion: Recruited from rehabilitation medicine clinic; previously classified as chronic/idiopathic pain by physician Exclusion: NR	Baseline pain score(s): (Median) VAS=59, range 12-96 Average intensity: NR	N=30 Age (mean, SD): 42.8, δ 10.6 Women (%): 43 Race/Ethnicity (%): NR
Macedo 2011⁴⁴ Location: Australia Funding: Government	RMDQ (24-, 18-^c,^d, and 11-item)	SAQ, written	Unclear	Condition: LBP Inclusion: Patients with or without leg pain, age 18-80 years Exclusion: Previous spinal surgery; specific pathology; contraindication to exercise; insufficient English ability to complete questionnaires	Baseline pain score(s): (Mean) RMDQ 24=12.8 (δ 5.1); 18^c= 10.6 (δ 4.6); 18^d = 10.8 (δ 4.4); 11=7.3 (δ 2.9) Average intensity: NR	N=461 Age (mean, SD): 52.5, δ 14 Women (%): 61 Race/Ethnicity (%): NR
Maughan 2010⁴⁵ Location: United Kingdom Funding: NS	NRS ODI-2 RMDQ	SAQ, written	Pain management back class	Condition: LBP Inclusion: Age ≥ 18 years, not undergoing treatment for pain, sufficient level of spoken and written English Exclusion: Spinal surgery in past 12 months, unstable neurological symptoms, pregnancy	Baseline pain score(s): NRS= 5 (δ2.6), RMDQ= 11 (δ6.1), ODI= 29 (δ20) Average intensity: NR	N=48 Age (mean): 52 Women (%): 67 Race/Ethnicity (%): NR
Merriwether 2016⁴⁶ Location: United States Funding: Government	PROMIS-PI SF	SAQ	University outpatient clinics	Condition: Fibromyalgia Inclusion: Women, ages 20-67 years, English speaking, stable medical treatment regime Exclusion: Prior transcutaneous electrical nerve stimulation use in last 5 years, pain intensity less than 4 out of 10 on the NRS	Baseline pain score: NR Average intensity: NR	N=106 Age (mean): 49.1 Women (%): 100 Race/Ethnicity (%): White: 96 Other: 4
Mikai 1993⁴⁷ Location: Canada Funding: Foundation	MPI ODI	SAQ	Pain clinic	Condition: Chronic pain Inclusion: Patients seen at Chronic Pain clinic; diagnosis of chronic pain by physiatrist, psychologist, and physiotherapist Exclusion: NR	Baseline pain score: NR Average intensity: NR Type of pain (%): Neck=6, Back=43, Extremities=18, Multiple=25, Other=8	N=315 Age (mean): 43.5 Women (%): 53 Race/Ethnicity (%): NR
Nilsdotter 2003⁴⁸ Location: Sweden Funding: NS	HOOS SF-36 BPS WOMAC	SAQ, written	Clinic	Condition: OA of the hip Inclusion: Assigned total hip replacement; completed follow-up Exclusion: NR	Baseline pain score(s): NR Average intensity: NR	N=62 Age (mean, range): 72.8, 53-85 Women (%): 45 Race/Ethnicity (%): NR
Parker 2012⁴⁹ Location: United States Funding: NS	ODI VAS-back pain	SAQ, written	University medical center	Condition: Symptomatic pseudoarthrosis, mechanical LBP Inclusion: Patients undergoing revision-instrumented fusion; age 18-70 years; prior lumbar instrumented fusion; failed to complete at least 3 months of non-operative care Exclusion: Extra-spinal cause of back pain; trauma, infection, or neoplasm; previous lumbar revision surgery for other causes	Baseline pain score(s): (Mean) VAS-back pain=7.3mm (δ 0.8mm); ODI= 59.4% (δ 10.8%) Average intensity: NR	N=47 Age (mean, SD): 54.5, δ 10.5 Women (%): 64 Race/Ethnicity (%): NR
Pinsker 2015⁵⁰ Location: Canada Funding: Foundation	NRS WOMAC	SAQ, mail	Patient's home	Condition: Ankle arthroplasty or arthrodesis Inclusion: Age ≥ 18 years; able to complete survey in English; end-stage ankle arthritis (pre- or post-operative); surgical patients ≥ 6 months post-surgery Exclusion: NR	Baseline pain score(s): NRS pre-op=6.6, range 2-10; NRS post-op= 4.0, range 0-10; WOMAC (overall)=51.4, range 0-95.2 Average intensity: NR Type of pain (%): Arthroplasty=60 Arthrodesis=10 Pre-operative=30	N=142 N=124 (test/retest) Age (mean, range): 61.2, 22-92 Women (%): 54 Race/Ethnicity (%): NR
Scott 2015⁵¹ Location: United Kingdom Funding: International Association	PGIC	SAQ, written	Pain treatment center	Condition: Chronic pain Inclusion: Significant levels of distress and disability Exclusion: Incomplete data	Baseline pain score(s): NR Average intensity: NR Type of Pain (%): Low back/spine=43.8 Upper shoulder=7.80 Lower limbs=13.30 Other=35.1	N=476 Age (mean, SD): 46.2, δ 11.2 Women (%): 66.8 Race/Ethnicity (%): White: 71.9 African-Amer./Black: 16.6 Asian: 7.1 Other: 4.4
Sindhu 2011⁵² Location: United States Funding: NS	NRS VAS-digital VAS	VAS, written and digital NRS, verbal	Hand therapy clinics	Condition: Unilateral musculoskeletal disorder or injury to elbow, forearm, or hand Inclusion: Age 18-65 years; recruited from hand therapy clinics Exclusion: Verbally reported pain intensity > 7 (1-10); unable to perform grip test	Baseline pain score(s): NR Average intensity: (Mean) NRS=<2; VAS=<2mm	N=33 Age (mean, SD): 39, δ 12.3 Women (%): 48 Race/Ethnicity (%): NR
Stewart 2007⁵³ Location: Australia Funding: Government	NRS SF-36 BPS	SAQ, written Baseline; follow up after completion of 6-week treatment period	New-South Wales community, physiotherapy clinics	Condition: Whiplash associated disorders (neck pain) Inclusion: Patients enrolled in RCT (effects of exercise and advice to exercise alone); Motor Accident Authority claimants seeking medical care for whiplash associated disorder (Grades I to III) within 1 month of accident; reported at least “mild” disability compared to pre-injury; significant pain or disability indicated by score of at least 20% on NRS scales or Patient-Specific Functional Scale Exclusion: Previous neck surgery, nerve root compromise, current physical therapy neck treatment	Baseline pain score(s): NR Average intensity: NR	N= 132 Age (mean, SD): 43, δ 14.7 Women (%): 67 Race/Ethnicity (%): NR
Stroud 2004⁵⁴ Location: United States Funding: NS	RMDQ (24-, 18-, and 11-item)	SAQ, written	University pain treatment center	Condition: Chronic pain Inclusion: Available RMDQ scale data Exclusion: NR	Baseline pain score(s): NR Average intensity: NR Type of pain (%): Lower back=36.2 Lower extremities=14.1 Head=12.5 Shoulder and arms=9.8 Upper back=4.8 Other=22.6	N=993 Age (mean, SD): 43.5, δ 12.6 Women (%): 57 Race/Ethnicity (%): White: 84.4 African-Amer./Black: 3 Asian: 2.3 Hispanic: 3.9 Native American: 3.7 Other: 2.7
Tan 2004⁵⁵ Location: United States (Enrolled Veterans) Funding: NS	BPI RMDQ	SAQ, written Baseline; follow up assessments on subsequent visits	VA chronic pain center	Condition: Chronic pain Inclusion: Completed BPI before initial visit; referred to chronic pain center Exclusion: NR	Baseline pain score(s): NR Average intensity: NR Type of pain (%): Multiple sites (including back)=50 Back only=28	N=440 Age (mean, range): 54.9, 21-85 Women (%): 8.2 Race/Ethnicity (%): White: 72.3 African-Amer./Black: 21.2 Other: 6.5
Tong 2006⁵⁶ Location: United States Funding: Academic institution	VAS	SAQ, written Baseline; follow up 2nd, 3rd, and 4th visits	University spine care center	Condition: LBP Inclusion: Referred for physical therapy at University spine care facility Exclusion: NR	Baseline pain score(s): (Mean) VAS=5.2mm (δ 2.1mm) Average intensity: NR	N=52 Age (mean, SD): 41.1, δ 12.6 Women (%): 61.5 Race/Ethnicity (%): White: 88 African-Amer./Black: 3 Asian: 3 Other: 6
Trudeau 2015⁵⁷ Location: United States Funding: Industry	NRS-Now NRS-24 hours NRS-1 week WOMAC-48 hours	SAQ, written and digital NRS-Now reported 4 times daily, NRS-24 hours reported daily, NRS-1 week reported weekly, WOMAC-48 hours reported every 48 hours	Unclear	Condition: Pain from OA of the knee Inclusion: Age ≥ 21 years; diagnoses of functional classes 1-3 of knee OA; pain intensity on NRS ≤ 6; able to withdraw from OA medications; ≤ 10 on Hospital Anxiety and Depression Scale Exclusion: History of major depressive disorders not controlled with medication	Baseline pain score(s): Treatment-placebo: NRS-24 =3.7 (δ1.22); WOMAC-48=8.6 (δ2.69); NRS-1 wk =4.2 (δ1.61) Placebo-treatment: NRS-24 =3.9 (δ1.32); WOMAC-48=8.7 (δ2.97); NRS-1 wk =4.4 (δ1.33) Average intensity: NR	N=47 Age (mean, SD): NR Women (%): NR Race/Ethnicity (%): NR
van der Roer 2006⁵⁸ Location: Netherlands Funding: NS	NRS	SAQ, written Baseline; follow up 6, 12, 26, and 52 weeks	Physiotherapy clinics	Condition: LBP Inclusion: Participants in an RCT (comparison of physiotherapy strategies); referred to physiotherapy treatment by physician for new episode of pain Exclusion: Pregnant; unable to give consent	Baseline pain score(s): NRS (by GPE category) Improved= 6.0 (δ 2.1); Unchanged= 6.4 (δ1.8) Average intensity: NR	N=138 Age (mean, SD): 44.0, δ 13.4 Women (%): 58.7 Race/Ethnicity (%): NR
van Grootel 2007⁵⁹ Location: Netherlands Funding: International academic research institute	VAS	SAQ, written Baseline (pre-treatment); follow up post-treatment	University medical center	Condition: Myogenous temporomandibular disorders (TMD) Inclusion: Pain and tenderness of the mastication muscles; restricted mandibular opening of 3 months duration or longer; age 18-65 years Exclusion: Clinical and/or radiographic evidence of organic TMJ changes; recent TMD treatment (<1 year); other pain treatment; evidence of serious psychopathology (psychotherapy and/or psychomedication, recent dramatic life events)	Baseline pain score(s): (Mean) VAS=40mm (δ 22.3mm) Average intensity: NR	N=118 Age (mean, range): 31.6, 18-65 Women (%): 93 Race/Ethnicity (%): NR
Wittink 2004⁶⁰ Location: United States Funding: Government, Industry	MPI ODI SF-36 BPS	SAQ, written Baseline; follow up after 3 visits	Medical center pain program	Condition: Chronic pain Inclusion: More than 3 visits to medical center; referred to pain program	Baseline pain score(s): NR Average intensity: NR Type of Pain (%): Back=52.9 Neck=21.8 Myofascial=19.5	N=87 Age (mean): 46.9 Women (%): 66.5 Race/Ethnicity (%): White: 79.3 Other: 20.7

δ=standard deviation

Abbreviations: (I)=interference; (S)=severity; BPI=Brief Pain Inventory; CAT=computer adaptive testing (subsequent questions depend on previous response); CDMD=chronic disabling musculoskeletal disorders; CPG=Chronic Pain Grade Questionnaire; IAQ=interview administered questionnaire; LBP=low back pain; MPQ=McGill Pain Questionnaire; MPI=Multidimensional Pain Inventory; MS=multiple sclerosis; NRS=numeric rating scale; NR=not reported; NS=not specified; OA=osteoarthritis; ODI=Oswestry Disability Index (also known as Oswestry Low Back Pain Disability Questionnaire); PEG=items assess average pain intensity (P), interference with enjoyment of life (E), and interference with general activity (G); PGA=patient global assessment; PGART=patient-rated global assessment of response to therapy; PRGC=patient-reported Global Change; PROMIS-PI=Patient-Reported Outcomes Measurement Information System-Pain Interference; SAQ=self-administered questionnaire; SCAMP=Stepped Care for Affective Disorders and Musculoskeletal Pain; SF-36 BPS=Medical Outcomes Study Short Form-36 Bodily Pain Scale; SF=short form; RCT=randomized controlled trial; RMDQ=Roland-Morris Disability Questionnaire; VAMC=Veterans Affairs Medical Center; WHYMPI=West Haven-Yale Multidimensional Pain Inventory (see also MPI); WOMAC=Western Ontario and McMaster Universities Osteoarthritis Index

a: Primary care, pain clinic, etcetera

b: SCAMP study included RCT of combined depression medication and pain self-management vs usual care in patients with depression of at least moderate severity and observational study in patients with absence of clinical depression; responsive results analyzed separately

c: William and Myers Version

d: Stratford and Binkley Version

Supplemental Table 5. Outcomes Reported

Author Year/Scale (range)/Mean time between surveys	OUTCOMES REPORTED
Author Year/Scale (range)/Mean time between surveys	Minimally Important Difference (Describe)	Test-Retest Reliability	Inter-Rater Reliability	Concurrent and/or Criterion Validity	Discriminant and/or Predictive Validity	Responsiveness (Describe)	Other Outcomes
Anagnostis 2004²⁰ Scale(s): ODI (0-100) Time: Varied (upon completion of program)						Responsiveness assessed by effect size P<.001 ES=0.95 Mean pre-post treatment change = 14.8, δ 15.6
Askew 2016²¹ Scale(s): PROMIS-PI (0-66) Time: 12 weeks						Responsiveness assessed using SRMs for PROMIS-PI scores SRM scores ≥0.30 indicated responsiveness Change by “general health” anchor Better=−0.94, δ 7.96 Same=−0.58, δ 7.97 Worse=−0.47, δ 7.18 Change by “pain” anchor Better=−1.09, δ 7.43 Same=−0.26, δ 6.33 Worse=0.44, δ 4.95
Burnham 2012²² Scale(s): MPQ (short form) (pain 0-6), ODI (0-100) Time: 2 weeks (corticosteroid injection); 6-8 weeks (radiofrequency neurotomy or TransDiscal Biacuplasty)		Pearson’s correlation coefficient between mean change 1 month before intervention and day of intervention MPQ r=0.88 (95%CI 0.72, 0.95) ODI r=0.89 (95%CI 0.75, 0.95)				Pre-post treatment responsiveness ratios (RR) (significant RR values >1.96) MPQ RR=1.9 ODI RR=2.3
Changulani 2009²³ Scale(s): ODI (0-100%), VAS (domain not reported) Time: 6 weeks				Pearson’s correlation coefficient between mean change in ODI scores and mean change in VAS scores r=0.44 (P<.05)		Based on: ES=1.05 Measured by SRM=0.84
Chansirinukor 2005²⁴ Scale(s): RM-18 (shortened version of RMDQ) Time: 12.1 weeks (±0.9)	Minimal detectable change (MDC) for RM-18: 7.5 points	Assessed in subset of patients whose work status had not changed from baseline to follow-up visit ICC=0.68 (95% CI 0.52, 0.79)				RM-18 correlated with change in work status using Spearman’s ρ (0.30; Z=123, P=.02) AUC=0.69 (95% CI 0.60, 0.78) *1=perfect discrimination ES=0.44 (0.37-0.51) SES=0.38 (0.32-0.44) SRM=0.44 (0.37-0.51); paired t=5.25
Chien 2013²⁵ Scale(s): BPI pain intensity items* (4), each item scored 0-10 Time: 10 days *NOTE: 4 BPI items (current pain, worst pain [past 24 hr], least pain [past 24 hr], average pain) used to compute composite average pain						SRMs all participants (improved/unimproved) Current pain: 0.36 (0.89/−0.03) Worst pain: 0.37 (0.63/0.14) Least pain: 0.17 (0.50/−0.03) Average pain: 0.40 (0.53/0.28) Composite average pain: 0.42 (0.81/0.10) ROC Current pain: 0.75 Worst pain: 0.66 Least pain: 0.65 Average pain: 0.61 Composite average pain: 0.71	Internal consistency (Spearman correlation) -moderate/high correlations between BPI composite average pain score and component items: ρ=0.71-0.84, P<.01 -small/moderate correlations between BPI pain items: ρ=0.38-0.65, P<.01
Cook 2008²⁶ Scale(s): RMDQ-23- (0-23);11- (0-11); 5- (0-5) Time: Single administration of scale *Data from 2 previously published studies				Correlations between each CAT condition and scores based on RMDQ 23 ranged from 0.93 (5-item) to 0.98 (11-item) Standard error of measurement-based CAT scores correlated 0.95 with RM-MODirt scores
de Vet 2007²⁷ Scale(s): NRS (pain intensity) (0-10) Time: 12 weeks	Anchor-based (with global perceived effect) MIC, for chronic pain subjects (n=135): 1) 95% cut-off limit 4.7 points 2) ROC cut-off 3.5 points Change on NRS 1) 0.5 sensitivity 95%, specificity 37% 2) 1.5 sensitivity 89%, specificity 59% 3) 2.5 sensitivity 81%, specificity 78% 4) 3.5 sensitivity 69%, specificity 89% 5) 4.5 sensitivity 53%, specificity 94%			Changes in PI-NRS scores and the global perceived effect categories (Spearman correlation): ρ=0.6
Deyo 2016²⁸ Scale(s): PROMIS-PI SF (4-20) Time: 12 weeks		At 3 months: – Patients that rated pain “about the same” ICC=0.58 (0.44, 0.71) – Patients that rated pain as “changed ± 1 point” ICC=0.67 (0.56, 0.77)			PROMIS-PI scores in those a) seeking worker’s compensation (65.0) or not (59.8) (P<.001) b) who had a fall in past 3 months (62.7) or not (59.7) (P<.001)	Change of pain (much less to much worse) at 3 months compared to baseline Pain Interference: ES range: −1.03 (much less) to 0.71 (much worse) SRM range: −1.07 (much less) to 0.74 (much worse)
Driban 2015²⁹ Scale(s): PROMIS-PI SF (41-78.3), SF-36 BPS (0-100), WOMAC (pain 0-500), WOMAC (function 0-1700) Time: baseline data *Secondary analysis of previously published RCT				Spearman’s correlation coefficient (95%CI) PROMIS-PI/SF-36 BPS: ρ=−0.73 (−0.79, 0.65) PROMIS-PI/WOMAC Pain: ρ=0.47 (0.35, 0.57) PROMIS-PI/WOMAC Function: ρ=0.42 (estimated from Figure 3, confidence interval not provided)
Fisher 1997³⁰ Scale(s): ODI (0-100), MPQ (pain 0-6) Time: 15 weeks				Criterion Validity (Kendall’s tau, all P<.01) a) ODI Lifting Subscale with behavioral assessment of lifting: τ=0.38 b) ODI Walking Subscale with behavioral assessment of walking: τ=0.54 c) ODI Sitting Subscale with behavioral assessment of sitting: τ=−0.40 Sensitivity/Specificity of ODI Subscales a) Lifting: 81%/52% b) Walking: 76%/96% c) Sitting: 72%/69%			Internal Consistency Cronbach’s alpha ODI=0.76 Effect size for post-treatment change ODI=0.6 MPQ a) Total Number of Words Chosen=0.5 b) Present Pain Inventory=NR but reported to be not significant
Gallasch 2007³¹ Scale(s): Wong Faces, VAS (0-10), NRS (0-10) All scales: pain on previous day Time: Same day (pre- and post-physiotherapy)		Before and after physiotherapy session ICC: Faces=0.96 VAS=0.97 NRS=0.99					Rated easiest to understand: 1) Faces scale 38.7% 2) NRS 32.3% Easiest to fill out: 1) NRS 37.5% 2) verbal rating scale 32.2% Most difficult to understand VAS 58% Most difficult to fill out: VAS 67.8%
Gentelle-Bonnassies 2000³² Scale(s): VAS (pain, 0-100 mm), WOMAC (pain and function 0-100) Time: 6 months						Based on SRM (95%CI) VAS, Pain, ITT (n=80) Month 1: −0.40 (−0.64, −0.16) Month 3: −0.13 (−0.35, 0.10) Month 6: −0.25 (−0.48, −0.02) WOMAC, Pain, ITT (N=80) Month 1: −0.39 (−0.60, −0.18) Month 3: −0.28 (−0.53, −0.02) Month 6: −0.30 (−0.55, −0.06) WOMAC, Function, ITT (N=80) Month 1: −0.37 (−0.64, −0.10) Month 3: −0.15 (−0.39, 0.09) Month 6: −0.09 (−0.33, 0.14)
Godil 2015³³ Scale(s): NRS-neck pain, NRS-arm pain (intensity, 0-10) Time: 12 months						Based on SRM Responders: NRS-neck pain=0.95 NRS-arm pain=0.97 Non-responders: NRS-neck pain=0.49 NRS-arm pain=0.38 SRMs in patients reporting meaningful improvement (responders) versus non-responders (greater difference = more responsive scale) Mean change: NRS-neck pain=0.46 NRS-arm-pain=0.59 Based on ROC (AUC) curve NRS-neck pain: AUC=0.69 (poor discriminator) NRS-arm-pain: AUC=0.74 (valid discriminator)
Gronblad 1993³⁴ Scale(s): ODQ (0-50), VAS (present pain intensity, 0-100) Time: Subset after 1 week		Subset chosen, n=20, 1 week interval ODQ ICC=0.83		Pearson’s correlation ODQ/VAS r=0.62
Hicks 2009³⁵ Scale(s): ODI (0-100), SF-36 BPS (0-100) Time: Mean 11 days	Standard error of measurement = 4.57 (using data from participants classified as stable) Minimum detectable change ODI: 10.7 points 14.5% scored below 10.7 0% scored above 89.3	ODI subset of patients with stable LBP status from baseline to follow-up (mean 11 days) ICC 0.92 (95% CI 0.86, 0.95)		“Convergent” ODI/ SF-36 BPS: r=−0.69 (−0.78, −0.60) (P<.0001)	ODI scores significantly different (P<.0001) between groups with and without 1) high pain severity/high functional limitation and 2) chronic pain/high functional limitation (n=107)
Jensen 2012³⁶ Scale(s): VAS (pain intensity, 0-100mm) Time: 12 weeks post-randomization *Data obtained from 2 previously published RCTs							Discriminating active tx from placebo VAS ≥20mm: OR 1.94 (1.37, 2.75) VAS ≥30%: OR 1.97 (1.41, 2.77) VAS ≥50%: OR 2.46 (1.72, 3.50) Agreement between response criteria (kappa) a) 20 mm improvement with 30% improvement: k=.90 b) 20 mm improvement with 50% improvement: k=.51 c) 30% improvement with 50% improvement: k=.58
Kamper 2015³⁷ Scale(s): SF-36 BPS (pain only, 0-100), NRS-24hr and NRS-Wk (pain intensity, 0-10) Time: 3, 6, and 12 months *Secondary analysis of data from 3 clinical studies; studies 2 and 3 were chronic pain cohorts				Pearson’s correlation coefficient Study 2:* NRS-24/SF-36 BPS Baseline: r=0.37 3 months: r=0.71 12 months: r=0.68 Study 3:* NRS-24/SF-36 BPS Baseline: r=0.40 3 months: r=0.65 6 months: r=0.66 12 months: r=0.65 NRS-Wk/SF-36 BPS Baseline: r=0.46 3 months: r=0.64 6 months: r=0.72 12 months: r=0.70 NRS-24/NRS-Wk Baseline: r=0.72 3 months: r=0.87 6 months: r=0.90 12 months: r=0.93
Kean 2016³⁸ Scale(s): PROMIS-PI SF (6b) (6 items, total score 6-30), BPI (4 item severity [S], 7 item interference [I], and 11 item total, each item scored 0-10), PEG (severity and interference combined, each item scored 0-10), SF-36 BPS (0-100) Time: 3 months						Responsiveness to intervention (SCOPE trial) (Cohen’s d) BPI-S: 0.37 BPI-I: 0.33 BPI total: 0.38 PEG: 0.35 SF-36 BPS: -0.24 PROMIS-PI-SF: 0.21 AUC (SE) for detecting any improvement BPI-S= 0.73 (0.03) BPI-I= 0.68 (0.04) BPI total= 0.73 (0.03) PEG= 0.71 (0.03) SF-36 BPS= 0.68 (0.04) PROMIS-PI-SF= 0.61 (0.04) AUC (SE) for detecting moderate improvement BPI-S= 0.74 (0.04) BPI-I= 0.69 (0.04) BPI total= 0.74 (0.04) PEG= 0.72 (0.04) SF-36 BPS= 0.64 (0.05) PROMIS-PI-SF= 0.66 (0.04) SRMs significantly different (P<.05) between those who report being better vs stayed the same and those who report being worse vs stayed the same for all BPI scales, PEG, SF-36 BPS, and PROMIS-PI-SF
Keller 2004³⁹ Scale(s): BPI-SF (15 items, 4 severity, 7 interference, 4 other), GCPS (7 items, 3 intensity, 4 disability), SF-36 BPS, RMDQ-24 Time: First follow-up visit Note: all results for low back pain group only				Pearson’s r correlations BPI severity and GCPS intensity=0.60 GCPS disability=0.49 RMDQ=0.57 SF-36 BPS=0.61 BPI interference and GCPS intensity=0.64 GCPS disability=0.69 RMDQ=0.64 SF-36 BPS=0.64 SF-36 BPS and GCPS intensity=0.47 GCPS disability=0.45 RMDQ=0.53		Standardized Response Means among improved patients BPI severity=−1.09 BPI interference=−1.13 GCPS intensity=−0.47 GCPS disability=−0.47 SF-36 BPS=0.69	Internal consistency (Cronbach’s alpha) BPI severity=0.82 BPI interference=0.93 GCPS intensity=0.65 GCPS disability=0.94 RMDQ=0.92 SF-36 BPS=0.84
Kerns 1985⁹ Scale: WHYMPI (pain severity and pain interference), MPQ (Present Pain Intensity, Total Pain Rating Index) Time: 2 weeks		WHYMPI scales Pain severity: r=0.82 Pain interference: r=0.86		Correlation with a factor related to severity and interference WHYMPI Pain Severity: 0.81 WHYMPI Pain Interference: 0.70 MPQ Total Pain Rating Index: 0.47 MPQ Present Pain Intensity: 0.44			Internal consistency (Cronbach’s alpha) WHYMPI Pain severity=0.72 Pain interference=0.90
Krebs 2010⁴⁰ Scale(s): BPI (4 item severity [S], 7 item interference [I], and 11 item total, each item scored 0-10), PEG (severity and interference combined, each item scored 0-10), GCPS (3 item intensity [S], 3 item disability [D], 0-10), RMDQ-24 (0-24), SF-36 BPS (0-100) Time: 12 months	Kappa for agreement between one-SEM and global rating classification Observational cohort BPI-S=0.31 BPI-I=0.20 BPI total=0.34 PEG=0.23 GCPS-S=0.27 GCPS-D=0.14 RMDQ-24=0.18 SF-36 BPS=0.27 RCT group BPI-S=0.32 BPI-I=0.24 BPI total=0.29 PEG=0.33 GCPS-S=0.35 GCPS-D=0.27 RMDQ-24=0.36 SF-36 BPS=0.19					AUC for responsiveness (detecting moderate improvement) Observational cohort BPI-S=0.81 (0.04) BPI-I=0.67 (0.05) BPI total=0.76 (0.04) PEG=0.70 (0.05) GCPS-S=0.73 (0.06) GCPS-D=0.66 (0.06) RMDQ-24=0.70 (0.05) SF-36 BPS=0.70 (0.05) RCT group BPI-S=0.85 (0.04) BPI-I=0.77 (0.05) BPI total=0.81 (0.04) PEG=0.79 (0.04) GCPS-S=0.82 (0.04) GCPS-D=0.76 (0.04) RMDQ-24=0.85 (0.04) SF-36 BPS=0.77 (0.04) Observational cohort Mean SRMs differed significantly between “worse” and “same” groups and between “better” and “same” groups for each measure RCT group Mean SRMs differed significantly between “better” and “same” groups for each measure; values did not differ significantly between “worse” and “same” groups
Krebs 2009¹² Scale(s): BPI (4 item severity [S] and 7 item interference [I]; each item scored 0-10), PEG (severity and interference combined, each item scored 0-10), SF-36 BPS (0-100), GCPS (3 item intensity [S], 3 item disability [D], transformed to 0-100 scores), PGIC (1 item, change in pain, 1-7), RMDQ-24 (0-24) Time: 6 months *Brief Pain Inventory (BPI), Graded Chronic Pain (GCP), RMDQ, and SF-36 BPS administered at baseline; BPI, GCP, and patient global rating of change administered at 6 months				Validity (Pearson’s r) PEG/BPI-S: r=0.69 PEG/BPI-I: r=0.89 PEG/GCPS-S: r=0.64 PEG/GCPS-D: r=0.67 PEG/RMDQ-24: r=0.60 PEG/SF-36 BPS: r=−0.61 BPI-S/BPI-I: r=0.58 BPI-S/GCPS-S: r=0.82 BPI-S/GCPS-D: r=0.47 BPI-S/RMDQ-24: r=0.41 BPI-S/SF-36 BPS: r=−0.46 BPI-I/GCPS-S: r=0.62 BPI-I/GCPS-D: r=0.71 BPI-I/RMDQ-24: r=0.70 BPI-I/SF-36 BPS: r=−0.65		Proportion of pain improvement after 6 months according to PGIC (31.4%) and GCPS (29.5%) - “similar” Improved group (based on PGIC) had mean improvement on PEG of 3 points (δ 2.5) and GCPS of 2.6 points (δ 2.7) - “similar” SRM among improved patients according to PGIC similar for PEG (1.20, 95%CI 0.96, 1.44), BPI-S (1.04, 95%CI 0.80, 1.28), and BPI-I (1.13, 95%CI 0.89, 1.37) For all measures of improvement ES and SRM were consistent with large effects	Reliability (internal consistency) – PEG: 0.73 Construct validity “good” PEG: r=0.60-0.89
Krebs 2007⁴¹ Scale(s): NRS (current pain intensity, 0-10) Time: Single administration of scale					Accuracy of NRS-predicting 1) pain that interferes with function (BPI≥5): a) AUC=0.76 b) likelihood ratios: i) NRS=0: 0.39 (0.29, 0.53) ii) NRS=1-3: 0.99 (0.38, 2.60) iii) NRS=4-6: 2.67 (1.56, 4.57) iv) NRS=7-10: 5.60 (3.06, 10.26) 2) pain that motivates a visit: a) AUC=0.78 b) likelihood ratios: i) NRS=0: 0.35 (0.26, 0.48) ii) NRS=1-3: 2.00 (0.78, 5.13) iii) NRS=4-6: 3.06 (1.75, 5.37) iv) NRS=7-10: 6.04 (3.18, 11.48)
Lovejoy 2012⁴² Scale(s): MPQ-2 SF (22 pain descriptors [6 continuous, 6 intermittent, 6 neuropathic, and 4 affective] each rated 0-10 and total pain score), MPI (severity [S] and interference [I] scales) Time: Single administration of scale				Bivariate correlations using Pearson’s r MPQ-2 SF/ MPI-S: r=0.72 MPQ-2 SF/ MPI-I: r=0.66	MPQ-2-SF discriminant validity (mean) vs a) 1 pain diagnosis 2.44 (δ2.14)* b) 2-3 pain diagnoses 2.97 (δ2.13)* c) ≥4 pain diagnoses 3.81 (δ2.36) *P<.01 vs c) And vs MPI-S a) None/Mild (score 0-2) 1.16 (δ1.69)** b) Moderate (score 2-4) 3.08 (δ1.68) c) Severe (score >4) 5.55 (δ2.00) **All different (P<.01)		Internal consistency reliability (Cronbach’s α) MPQ-2-SF Total score: α=0.96
Lund 2005⁴³ Scale(s): VAS (pain intensity, 0-100) Time: Same day		Same day agreement: 20%
Macedo 2011⁴⁴ Scale(s): RMDQ-24 (0-24), RMDQ-18^a (0-18), RMDQ-18^b (0-18), RMDQ-11 (0-11) Time: 8-12 months						Internal responsiveness assessed using ES (84%CI): RMDQ-24: 0.67 (0.63-0.71) RMDQ-18^a: 0.75 (0.71-0.79) RMDQ-18^b: 0.78 (0.73-0.82) RMDQ-11: 0.65 (0.61-0.69) External responsiveness assessed using AUC values for patients classified as improved and not improved (based on GPE scale): RMDQ-24=0.78 (0.76-0.81) RMDQ-18^a=0.78 (0.75-0.81) RMDQ-18^b=0.78 (0.75-0.81) RMDQ-11=0.75 (0.72-0.78)
Maughan 2010⁴⁵ Scale(s): NRS (intensity, 0-10), ODI-2 (0-50), RMDQ-24 (0-24) Time: 5 weeks	MCID (ROC approach) RMDQ-24: 3.5 ODI-2: 7.5 NRS: 4.0					AUC RMDQ-24: 0.64 ODI-2: 0.67 NRS: 0.5
Merriwether 2016⁴⁶ Scale(s): PROMIS-PI-SF (6b) Time: 2^nd visit							Internal Consistency Cronbach’s alpha 0.90
Mikail 1993⁴⁷ Scale(s): MPI (interference and pain severity), ODI Time: Same day				MPI Interference correlated with: ODI: 0.66 MPI Pain Severity: 0.55
Nilsdotter 2003⁴⁸ Scale(s): HOOS (40 items, 10 pain, 5 symptoms, 17 activity limitations [ADL], 4 sport/recreation function, 4 hip-related quality of life, 0-100), WOMAC LK 3.0 (pain, function, 0-20), SF-36 BPS (0-100) Time: 6 months				Spearman’s correlation HOOS (pain)/SF-36 BPS: ρ=0.61 HOOS (ADL)/SF-36 BPS; ρ=0.62		Responsiveness calculated as SRM after 6 months HOOS(pain)=2.11 WOMAC (pain)=1.83 HOOS (ADL)=1.70 WOMAC (function)=1.70
Parker 2012⁴⁹ Scale(s): VAS-back pain (severity, 0-10), ODI (0-100) Time: 2 years	MCID thresholds (4 anchor-based approaches): 1) Mean change approach: VAS-back pain=3.2 ODI=8.2 2) Minimum detectable change (95% CI) approach: VAS-back pain=2.2 ODI=2.0 3) Change difference approach: VAS-back pain=2.0 ODI=8.3 4) Receiver operating characteristic curve approach: VAS-back pain=3.0, AUC=0.71 ODI=4.0, AUC=0.90
Pinsker 2015⁵⁰ Scale(s): WOMAC (pain, physical function, 0-20), NRS (pain, 0-10) Time: NS		Mean of 15.5 days (range 4-35) between ICC WOMAC: Overall=0.90 Pain=0.90 Function=0.89 NOTE: limited to individuals who completed retest survey and reported condition to be stable on global change question		Spearman’s rank correlations NRS pain/WOMAC Overall=0.78 Pain=0.78 Function=0.73			Internal Consistency (Cronbach’s α) WOMAC Overall=0.97 Pain=0.91 Function=0.96
Scott 2015⁵¹ Scale(s): PGIC (pain, physical function, 1-7) Time: Single administration of scale						Effect sizes computed from pre- to post-treatment (Cohen’s d) Pain=0.56 Physical Function=0.56
Sindhu 2011⁵² Scale(s): VAS-D (digital, pain level, 1-10), VAS-P (paper, pain level, 1-10), NRS-V (verbal, pain level, 1-10) Time: Administered twice on one visit, before and after grip tests (5-10 minutes apart) Up to 4 grip tests performed (1-minute apart)		ICC (pre-grip): VAS-P 0.96 VAS-D 0.96 NRS-V 1.00		Concurrent validity measured by Pearson’s r: Pre-grip VAS-D/VAS-P=0.97 NRS-V/VAS-P=0.84 NRS-V/VAS-D=0.84 Post-grip VAS-D/VAS-P=0.95 NRS-V/VAS-P=0.93 NRS-V/VAS-D=0.93		Mean score change between pre- and post-grip pain levels: VAS-P=0.40 VAS-D=0.48 NRS-V=0.54 Effect size coefficient (change score average/SD of pre-grip pain score): VAS-P=0.29 VAS-D=0.32 NRS-V=0.37 ANOVA on change scores showed no significant difference in responsiveness among scales: F=1.36, P<=.25
Stewart 2007⁵³ Scale(s): NRS (pain intensity [I], bothersomeness [B], 1-10), SF-36 BPS (0-100) Time: 6 weeks						Internal responsiveness ES (84% CI): NRS-I=0.75 (0.61, 0.89) NRS-B=1.17 (1.02, 1.31) SF-36 BPS=0.49 (0.36, 0.61) Subpopulation* NRS-I=1.03 (0.88, 1.18) NRS-B=1.40 (1.24, 1.56) SF-36 BPS=0.72 (0.58, 0.86) SRM (84% CI): NRS-I=0.64 (0.52, 0.77) NRS-B=0.98 (0.86, 1.10) SF-36 BPS=0.48 (0.35, 0.60) Subpopulation* NRS-I=0.96 (0.82, 1.10) NRS-B=1.20 (1.06, 1.34) SF-36 BPS=0.71 (0.57, 0.85) *Subpopulation (n=101): participants who improved on GPE scale External responsiveness Pearson’s r for change score and AUC: NRS-I=0.49 (0.68) NRS-B=0.47 (0.70) SF-36 BPS=0.41 (0.73)
Stroud 2004⁵⁴ Scale(s): RMDQ-24 (0-24), RMDQ-18 (0-18), RMDQ-11 (0-11) Time: Single administration of scale				Intercorrelations among RMDQ 24-, 18-, and 11-item scales P<.01 24/18=0.98 24/11=0.93 18/11=0.95
Tan 2004⁵⁵ Scale(s): BPI (intensity [S], interference [I], 0-10), RMDQ-24 (0-24) Time: Varied (upon follow up visits)				Concurrent validity (Pearson’s r) BPI-I/RMDQ-24 r=0.57 BPI-S/RMDQ-24 r=0.40		Significant improvement with treatment confirms responsiveness of BPI intensity (S) and interference scales (I) (P<.001) Mean change (Visit 1 to Visit 3): BPI-S 0.93, t=5.33 (P<.001) BPI-I 0.96, t=4.66 (P<.001)
Tong 2006⁵⁶ Scale(s): VAS (pain intensity, 0-100mm) Time: Administered at 2^nd, 3^rd, 4^th, and final visits *Patients usually seen 2x/week for physical therapy							Spearman’s rank order correlation (r): early responses at second (r=0.32, P=.02), third (r=0.34, P=.01), and fourth visits (r=0.62, P<.001) significantly correlated with discharge change in pain Discriminant analysis: early responses (2^nd-4^th visits) correctly predicted 80.4% of discharge outcomes (P<.001) defined by 30% improvement vs no improvement
Trudeau 2015⁵⁷ Scale(s): NRS-24 hr (pain intensity, 0-10), NRS-1 wk (pain intensity, 0-10), WOMAC (pain 0-20) Time: 4 × daily, 24 hours, 48 hours, 1 week						Differences between treatment and placebo were measured using SES NRS-24hr=0.33, P=.02 WOMAC-48hr=0.54, P=.001 NRS-1 wk=0.38, P=.01
Van der Roer 2006⁵⁸ Scale(s): NRS (pain intensity, 0-10) Time: 6, 12, 26, and 52 weeks	Chronic pain subgroup results Minimal Clinically Important Change (MCIC) with NRS using 3 methods: 1) Δ=3.7 (δ 2.1) 2) Minimal detectable change (95% CI)=4.5 (3.4-6.7) 3) Optimal cutoff point (sensitivity; specificity): 2.5 (77; 82) NRS sensitivity analysis showing range of MCIC results for lowest tertile baseline scores and highest tertile baseline scores: Low scores: 1.5-3.3 High scores: 4.5-5.5
van Grootel 2007⁵⁹ Scale(s): VAS (pain intensity, 0-100mm) Time intervals: 1, 7 and 13 days (diary/SDD); 2-18 months (questionnaire/CID)	Smallest detectable difference (SDD) determined by calculating difference between duplicate VAS scores for each subject SDD=49mm (for 13 days – longest interval)
Wittink 2004⁶⁰ Scale(s): SF-36 BPS (0-100), MPI (pain [S] 0-120; interference [I] 0-108), ODI (0-100) Time: After 3 visits				Overlap of the instruments measured using R² values (≥0.4 is high overlap) MPI-S/ODI=0.43 MPI-I/ODI=0.43 SF-36 BPS/MPI-S=0.58 SF-36 BPS/ODI=0.37		Responsiveness to change determined by ES from baseline to posttreatment (ES of <0.4 is small, >0.5 moderate, and >0.8 large) MPI-S=−0.41 MPI-I=−0.42 ODI=−0.39 SF-36 BPS=0.44

: δ=standard deviation; τ=Kendall’s Tau; ρ=Spearman’s rho; r=Pearson’s r

: ADL=activities of daily life; AUC=area under curve; BP=bodily pain; BPI=Brief Pain Inventory; CAT=computer adaptive testing; CDMD=chronic disabling musculoskeletal disorders; D=disability; ES=effect size; GCPS=Graded Chronic Pain Scale; GPE=global perceived effect; CPG=Chronic Pain Grade Questionnaire; I=Interference; ICC=Intraclass correlation coefficients; KOOS=Knee Osteoarthritis Outcome Score; MDC=minimal detectable change; MPQ=McGill Pain Questionnaire; NRS=numeric rating scale; ODI=Oswestry Disability Index (also known as Oswestry Low Back Pain Disability Questionnaire); PEG=items assess average pain intensity (P), interference with enjoyment of life (E), and interference with general activity (G); PF=physical functioning; PGA=patient global assessment; PI=pain interference; PROMIS-PI=Patient-Reported Outcomes Measurement Information System-Pain Interference; RMDQ=Roland Morris Disability Questionnaire; ROC=receiver operating characteristic curve; S=severity/intensity; SCOPE=Stepped Care to Optimize Pain Care Effectiveness; SE=standard error; SES=standardized effect size; SF-36 BPS=Medical Outcomes Study short form-36 Bodily Pain Scale; SF-MPQ-2=Short Form McGill Pain Questionnaire; SRM=standardized response mean (SRM value 0.2-0.5 = small change, 0.5-0.8 = moderate, and >0.8 = large)

a: Williams and Myers version

b: Stratford and Binkley version
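
The SRM interpretation bands given in the abbreviation note above (0.2-0.5 small, 0.5-0.8 moderate, >0.8 large) can be applied mechanically to the tabulated values. The following is a minimal sketch, not part of the original review. It assumes that the sign is ignored (several scales report improvement as a negative change), that a value exactly at 0.5 counts as moderate, and that anything under 0.2 is labeled "below small"; the function name `srm_magnitude` and that label are hypothetical choices.

```python
def srm_magnitude(srm: float) -> str:
    """Label an SRM using the bands in the table footnote.

    Assumptions (not stated in the source): the sign is ignored, a value of
    exactly 0.5 is treated as moderate, and values under 0.2 get the
    hypothetical label "below small".
    """
    size = abs(srm)
    if size > 0.8:
        return "large"
    if size >= 0.5:
        return "moderate"
    if size >= 0.2:
        return "small"
    return "below small"


# SRM values copied from the table rows above.
examples = [
    ("HOOS pain, 6 months (Nilsdotter 2003)", 2.11),
    ("VAS pain, month 3 (Gentelle-Bonnassies 2000)", -0.13),
    ("NRS-neck pain, non-responders (Godil 2015)", 0.49),
    ("BPI severity, improved patients (Keller 2004)", -1.09),
]
for label, value in examples:
    print(f"{label}: SRM {value:+.2f} -> {srm_magnitude(value)}")
```

Because the bands in the footnote share their boundary values, a different boundary convention would change the label only for SRMs of exactly 0.5 or 0.8.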

Supplemental Table 6. Summary of Minimally Important Difference Outcomes

Study (ref)/ mode of administration/ (version)	n Condition of pain Time interval	MID equivalent	Approach(es) used to estimate MID equivalent
Studies Estimating MID for More Than One Scale
Parker 2012⁴⁹ SAQ, on-site	47 Pseudoarthrosis (revision fusion patients) 2 years	Oswestry Disability Index (range 0-100) Average change approach 8.2 points Minimum detectable change 2.0 points Change difference approach 8.3 points ROC approach 4.0 points VAS (range 0-10) Average change approach 3.2 points Minimum detectable change 2.2 points Change difference approach 2.0 points ROC approach 3.0 points	Distribution and anchor-based Four approaches to MCID: 1) Average change approach: the average change score seen in the group defined by anchor to be responders 2) Minimum detectable change: the upper value of the 95% confidence interval for average change score seen in the cohort defined by anchor to be non-responders 3) Change difference approach: difference of the average change score for anchor-determined responders and non-responders 4) ROC approach: the change value that provides the greatest sensitivity and/or specificity for an anchor-determined positive response Two anchors produced the same responder/non-responder split: 1) SF-36 Health Transition Index, adapted: Patient rating of health before vs after surgery (markedly better or slightly better vs unchanged or worse) 2) Satisfied with results of surgery (yes vs no)
Krebs 2010⁴⁰ SAQ, on-site Randomized trial	205 Back, hip, or knee 12 months	SEM BPI (range 0-10) BPI-S: 0.7 BPI-I: 0.7 BPI total: 0.6 PEG (range 0-10): 1.8 CPG intensity (range 0-100): 9.0 CPG disability (range 0-100): 8.7 RMDQ (range 0-24): 1.0 SF-36 BPS (range 0-100): 9.8	Distribution and anchor-based minimal clinically important change (MCIC) Distribution: Change classified by one-SEM criteria as follows: better = score improved ≥1 SEM from baseline, same = score change <1 SEM from baseline, and worse = score worsened ≥1 SEM. Anchor: Patient-reported retrospective global rating of change (better, about the same, worse). Agreement between anchor and SEM was then examined via weighted kappa statistics.
Krebs 2010⁴⁰/ SAQ, on-site Cohort study	222 Back, hip, or knee 12 months	SEM BPI (range 0-10) BPI-S: 0.8 BPI-I: 0.8 BPI-total: 0.7 PEG (range 0-10): 1.9 CPG intensity (range 0-100): 9.9 CPG disability (range 0-100): 10.3 RMDQ (range 0-24): 1.2 SF-36 BPS (range 0-100): 11.8
Maughan 2010⁴⁵ SAQ, on-site	63 (48)^a Back 5 weeks	Oswestry Disability Index (range 0-100) Minimum detectable change 16.7 points ROC approach 7.5 points RMDQ (range 0-24) Minimum detectable change 4.9 points ROC approach 3.5 points Numeric Rating Scale (range 0-10) Minimum detectable change 2.4 points ROC approach 4 points	Distribution and anchor-based minimal clinically important difference (MCID) Distribution: Minimal detectable change approach, estimated by 1.96 × square root of 2 × SEM test-retest. Anchor: Patient-reported global impression of change (much improved/completely better, unchanged, worse than ever). ROC analysis assessed the ability to distinguish patients who had and had not changed according to patient-reported global impression of change
Single Studies by Pain Scale
Numeric Rating Scale for pain intensity (range 0 to 10)
de Vet 2007²⁷ SAQ, on-site [see van der Roer, same study population]	135 (chronic) Lower back 12 weeks	ROC approach 3.5 points 95% limit cut-off approach 4.7 points	Distribution and anchor-based: Minimally important change (MIC) Distribution: distribution of the change in scores was plotted on anchor-based axes and 2 cut-points were applied, ROC and 95% limit Anchor: global perceived effect (completely recovered, much improved, slightly improved, no change, slightly worse, much worse). These were then clustered into 1) importantly improved, 2) not importantly changed, and 3) importantly deteriorated
van der Roer 2006⁵⁸ SAQ, on-site	138 (chronic) Lower back 12 weeks	Minimal detectable change approach 4.5 points Mean change approach 3.7 points Optimal cutoff point approach 2.5 points	Distribution and anchor-based: Minimal Clinically Important Change (MCIC) 1) Minimal detectable change approach: estimated by 1.96 × square root of 2 × SEM test-retest. 2) Mean change approach: mean change score of all patients who “improved” based on the GPE 3) Optimal cutoff point approach: point that yields the lowest overall misclassification, based on ROC curve. Anchor: global perceived effect (completely recovered, much improved, slightly improved, no change, slightly worse, much worse). These were then clustered into 1) improved, 2) unchanged, and 3) deteriorated
Oswestry Disability Index (range 0 to 100 points)
Hicks 2009³⁵ SAQ, mail (modified)	107 (56)^a Lower back 11 days	10.7 points	Distribution: minimum detectable change (MDC) SEM determined from participants classified as stable. The SEM was then used to calculate the 90% CI and then multiplied by the square root of 2, which resulted in an estimate of MID
Roland Morris Disability Questionnaire (range 0-24 points)
Chansirinukor 2005²⁴ SAQ, on-site (18-item)	143 Lower back 3 months	MDC_95% 7.5 points	Distribution: minimal detectable change (MDC) 95% CI of the MDC was estimated by ± square root of 2 × SEM_test-retest × 1.96
Visual Analog Scale (range 0 to 100 mm)
van Grootel 2007⁵⁹ SAQ, on-site	118 (95-109)^a Temporomandibular disorders 2 weeks	49 mm	Distribution: smallest detectable difference (SDD) Estimated by the standard deviation of the difference values × 1.96

: BPI=Brief Pain Inventory; BPS=Bodily Pain Scale; CPG=Chronic Pain Grade Questionnaire (also known as the Graded Chronic Pain Questionnaire); LBP=low back pain; NR=not reported; RMDQ=Roland Morris Disability Questionnaire; ROC=receiver operating characteristic curve; SAQ=self-administered questionnaire; SEM=standard error of measurement; VAS=visual analog scale

a: Post-treatment, no further details
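
Two of the distribution-based rules described in the table above reduce to short formulas: the minimal detectable change, estimated as 1.96 × square root of 2 × SEM (Maughan 2010, van der Roer 2006, Chansirinukor 2005), and the one-SEM classification of change used by Krebs 2010. The sketch below is illustrative only and is not taken from any of the cited studies. The function names are hypothetical, the patient scores in the usage lines are invented, and the z-multiplier of 1.645 for a 90% interval is an assumption about how the Hicks 2009 description would be computed.

```python
import math


def minimal_detectable_change(sem: float, z: float = 1.96) -> float:
    """MDC = z * sqrt(2) * SEM; z = 1.96 corresponds to a 95% interval."""
    return z * math.sqrt(2) * sem


def classify_one_sem(baseline: float, follow_up: float, sem: float,
                     lower_is_better: bool = True) -> str:
    """One-SEM rule: 'better' or 'worse' when the score moves by at least
    one SEM in that direction, otherwise 'same'."""
    change = follow_up - baseline
    improvement = -change if lower_is_better else change
    if improvement >= sem:
        return "better"
    if improvement <= -sem:
        return "worse"
    return "same"


# Hicks 2009 reports SEM = 4.57 for the ODI. With an assumed 90% multiplier
# of 1.645 the formula gives about 10.6, close to the reported 10.7 points.
print(round(minimal_detectable_change(4.57, z=1.645), 2))

# Invented PEG scores, using the SEM of 1.8 reported for the RCT group.
print(classify_one_sem(baseline=6.0, follow_up=4.0, sem=1.8))
print(classify_one_sem(baseline=6.0, follow_up=5.0, sem=1.8))
```

The `lower_is_better` flag is needed because most scales in the table score worse pain as higher, whereas the SF-36 Bodily Pain Scale runs in the opposite direction.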

Supplemental Table 7. Summary of Responsiveness Outcomes

Study (ref)/ Mode of administration (version)	N Condition of Pain Time interval	Responsiveness Results	Approach(es) used to estimate Responsiveness
Comparative Studies
Kean 2016³⁸/ Interview, on-site	250 (244)^a Musculoskeletal (moderate) 3 months	AUC, any improvement BPI-S 0.73; BPI-I 0.68; BPI total 0.73 PEG 0.71 PROMIS-29-Profile PI 0.56; PROMIS-57-Profile PI 0.57; PROMIS-PI Short form 6b 0.61 SF-36 Bodily Pain 0.68 SRMs BPI-S: Worse −0.47; Same 0.13; Better 0.71 BPI-I: Worse 0.03; Same 0.38; Better 0.94 BPI total: Worse −0.22; Same 0.31; Better 0.93 PEG: Worse −0.14; Same 0.25; Better 0.86 PROMIS-29 Profile PI: Worse −0.11; Same 0.29; Better 0.33 PROMIS-57 Profile PI: Worse −0.16; Same 0.30; Better 0.37 PROMIS-PI Short form 6b: Worse −0.02; Same 0.27; Better 0.51 SF-36 Bodily Pain: Worse 0.17; Same −0.38; Better −0.71 SES (ES) BPI-S: 0.38 (Cohen’s d 0.37) BPI-I: 0.37 (Cohen’s d 0.33) BPI total: 0.42 (Cohen’s d 0.38) PEG: 0.37 (Cohen’s d 0.35) PROMIS-29 Profile PI: SES 0.17 (Cohen’s d 0.14) PROMIS-57 Profile PI: SES 0.24 SES 0.42 (Cohen’s d 0.38) PROMIS-PI Short form 6b: SES 0.28 (Cohen’s d 0.21) SF-36 Bodily Pain: −0.25 (Cohen’s d −0.24)	Based on SRM, SES and ES (0.2 is small, 0.5 is medium, and 0.8 is large) and ROC/AUC (0.5 is the same as chance to 1.0 is perfect discrimination). Anchored by patient-reported global change (much better, moderately better, a little better, no change, a little worse, moderately worse, and much worse)
Trudeau 2015⁵⁷ SAQ, on-site	47 Knee OA 1 week	SES NRS, 1 week: 0.38 NRS, 24 hours: 0.33 WOMAC-pain, 48 hours: 0.54	Based on SES of differences in pain scores between treatment and placebo
Burnham 2012²² SAQ, on-site	67 Lower back 2-8 weeks	Responsiveness ratios Oswestry Disability Index 2.3; MPQ 1.9	Based on responsiveness ratio (RR). The RR evaluates intervention-related change over time while considering the between-subject variability in within-subject changes in stable subjects. Significant RR values should be >1.96
Sindhu 2011⁵² SAQ, on-site (paper and digital)	33 Arm/hand Pre-post gripping	ES VAS-paper 0.29; VAS-digital 0.32 NRS 0.37	Based on ES of change scores between pre- and post-gripping pain levels
Krebs 2010⁴⁰ SAQ, on-site	427 Back, hip, or knee 12 months	AUC, any improvement BPI-S-cohort 0.83; BPI-S-RCT 0.81 BPI-I-cohort 0.70; BPI-I-RCT 0.78 BPI total-cohort 0.78; BPI total-RCT 0.81 PEG-cohort 0.73; PEG-RCT 0.78 CPG intensity-cohort 0.75; CPG intensity-RCT 0.78 CPG disability-cohort 0.65; CPG disability-RCT 0.75 RMDQ-cohort 0.70; RMDQ-RCT 0.81 SF-36 Bodily Pain-cohort 0.68; SF-36 Bodily Pain-RCT 0.72 SRMs BPI-S-cohort: Worse 0.75; Same 0.08; Better −1.07 BPI-S-RCT: Worse 0.29; Same −0.02; Better −0.99 BPI-I-cohort: Worse 0.43; Same −0.09; Better −0.69 BPI-I-RCT: Worse 0.06; Same −0.50; Better −1.06 BPI total-cohort: Worse 0.63; Same −0.04; Better −0.99 BPI total-RCT: Worse 0.15; Same −0.42; Better −1.15 PEG-cohort: Worse 0.35; Same −0.13; Better −0.83 PEG-RCT: Worse −0.05; Same −0.49; Better −1.14 CPG intensity-cohort: Worse 0.60; Same 0.07; Better −0.68 CPG intensity-RCT: Worse 0.56; Same −0.03; Better −0.73 CPG disability-cohort: Worse 0.37; Same −0.03; Better −0.57 CPG disability-RCT: Worse 0.14; Same −0.25; Better −0.94 RMDQ-cohort: Worse 0.57; Same −0.03; Better −0.67 RMDQ-RCT: Worse 0.35; Same −0.29; Better −1.09 SF-36 Bodily Pain-cohort: Worse −0.58; Same 0.17; Better 0.67 SF-36 Bodily Pain-RCT: Worse −0.17; Same 0.31; Better 0.76	Based on SRM and ROC/AUC. Anchored by global rating of change at 12 months (worse, same, or better)
Maughan 2010⁴⁵ SAQ, on-site	63 (48)^a Back 5 weeks	AUC Oswestry Disability Index 0.67 RMDQ-24 0.64 NRS 0.5	Based on ROC/AUC. Anchored by patient-reported global impression of change (much improved /completely better, unchanged, worse than ever)
Krebs 2009¹² SAQ, on-site	210 Back, hip, or knee 6 months	SRM (ES) Global rating of change PEG Improved 1.20 (1.29); Unchanged 0.29 (0.26); Worse −0.06 (−0.06) BPI-severity Improved 1.04 BPI-interference Improved 1.13 Chronic Pain Grade questionnaire PEG Decreased by ≥1 level 0.99 (1.51); Baseline = follow-up 0.29 (0.25); Increased by ≥1 level 0.04 (0.05)	Based on SRM. Anchored by global rating of change (improved, unchanged, worse) and Chronic Pain Grade questionnaire grade (pain grade decreased by ≥1 level, pain grade at baseline = pain grade at follow-up, pain grade increased by ≥1 level) at 6 months
Stewart 2007⁵³ SAQ, on-site	134 Chronic whiplash 6 weeks	AUC (Pearson’s r) NRS- pain intensity 0.68 (0.49) NRS- pain bothersomeness 0.70 (0.47) SF-36 Bodily Pain 0.73 (0.41) SRMs (ES) NRS- pain intensity Total cohort 0.64 (0.75); Improved 0.96 (1.03) NRS- pain bothersomeness Total cohort 0.98 (1.17); Improved 1.20 (1.40) SF-36 Bodily Pain Total cohort 0.48 (0.49); Improved 0.71 (0.72)	Based on SRMs, ES, ROC/AUC and Pearson’s r. Anchored by global perceived effect scored on an 11-point numerical rating scale (−5 = vastly worse, 0 = unchanged, 5 = completely recovered) at 6 weeks
Keller 2004³⁹ SAQ	131 LBP First follow-up visit	SRMs among improved patients BPI severity=−1.09 BPI interference=−1.13 GCPS intensity=−0.47 GCPS disability=−0.47 SF-36 BPS=0.69	Based on SRM
Wittink 2004⁶⁰ SAQ, on-site	87 Mostly back and neck NR^b	ES MPI-S: −0.41; MPI-I: −0.42 Oswestry Disability Index −0.39 SF-36 Bodily Pain 0.44	Based on ES of differences between the baseline visit and post-treatment
Nilsdotter 2003⁴⁸ SAQ, on-site	62 OA, hip 6 months	SRMs HOOS pain: All patients 2.11; age ≤66 years 2.60; age >66 years 1.97 HOOS ADL: All patients 1.70; age ≤66 years 2.51; age >66 years 1.52 WOMAC pain: All patients 1.83; age ≤66 years 2.37; age >66 years 1.68 WOMAC function: All patients 1.70; age ≤66 years 2.51; age >66 years 1.52	Based on SRM
Gentelle-Bonnassies 2000³² SAQ, on-site and mailed	80 Knee OA 6 months	SRMs Intent-to-treat VAS Month 1: −0.40; Month 3: −0.13; Month 6: −0.25 WOMAC Pain Month 1: −0.39; Month 3: −0.28; Month 6: −0.30 WOMAC Function Month 1: −0.37; Month 3: −0.15; Month 6: −0.09	Based on SRM
Single Studies by Pain Scale
Brief Pain Inventory
Chien 2013²⁵ SAQ, on-site	254 Chronic 10 days	AUC 0.71 BPI composite average SRMs BPI composite average All subjects 0.42; Improved 0.81; Unimproved 0.10	Based on SRM and ROC/AUC. Anchored by patient-reported rating of pain improvement “Would you say that your pain has improved as a result of your treatment?” (strongly disagree, disagree, neutral, agree, and strongly agree)
Tan 2004⁵⁵ SAQ, on-site	440 Chronic NR^b	BPI-S, mean change P <.01 at all visits BPI-I, mean change P <.001 between visits 1 and 2, and visits 1 and 3. NS for visits 2 and 3.	Was assessed by using paired t tests to compare changes in the BPI scale scores across a span of 3 visits.
Numeric Rating Scale (range 0 to 10)
Godil 2015³³ Interview, off-site (neck and arm versions)	88 Neck and radicular arm 1 year	AUC NRS-neck pain 0.69; NRS-arm-pain 0.74 SRMs Responders, NRS-neck pain 0.95; NRS-arm-pain 0.97 Non-responders, NRS-neck pain 0.49; NRS-arm-pain 0.38	Based on SRM and ROC/AUC. Anchored by “meaningful improvement versus not” (taken as the “gold standard” or the external criterion)
Oswestry Disability Index
Anagnostis 2004²⁰ Unclear, on-site	230 Chronic disabled musculoskeletal disorder NR^b	ES 0.95	Based on ES through comparison of pre- and post-treatment scores using paired t tests
Changulani 2009²³ SAQ, on-site	107 Lower back 6 weeks	SRM (ES) 0.84 (1.05)	Based on SRM and ES. Anchored by reported change in symptoms (much better, better, same, worse, much worse)
Patient Global Impression of Change
Scott 2015⁵¹ unclear, on-site	476 Back, upper body, other	ES Pain: 0.56 Physical function: 0.56	Based on within subject ES of differences between pre- and post-treatment means
PROMIS PI
Askew 2016²¹ SAQ, on-site	218 (175)^a Lower back 3 months	SRMs Better −1.09; Same −0.26; Worse 0.44	Based on SRM. SRM ≥ 0.30 indicated responsiveness. Anchored by self-reported magnitude of changes (better, same, or worse) in overall pain scores
Deyo 2016²⁸ Interview, written survey	198 Musculoskeletal 3 months	SRM (ES) Pain interference Much better −1.07 (−1.03); Slightly better −0.29 (−0.28); Same −0.08 (−0.08); Slightly worse 0.18 (0.17); Much worse 0.74 (0.71)	Based on SRM and ES. Anchored by patient-reported global change (much better, slightly better, same, slightly worse, and much worse)
Roland Morris Disability Questionnaire
Macedo 2011⁴⁴ SAQ, on-site (24, 18-item^{Williams and Myers (WM)}, 18-item^{Stratford and Binkley (SB)}, 11-item)	461 Lower back Up to 1 year	AUC (cut-off of ≥3 global perceived effect units) 24-item 0.78; 18-item^WM 0.78; 18-item^SB 0.78; 11-item 0.75 ES 24-item 0.67; 18-item^WM 0.75; 18-item^SB 0.78; 11-item 0.65 GRI 24-item 1.55; 18-item^WM 1.49; 18-item^SB 1.52; 11-item 1.30	Based on ES, Guyatt’s responsiveness index (GRI, calculated by dividing the mean change of patients who have improved by the standard deviation of change of patients reporting no improvement) and ROC/AUC. Anchored by global perceived effect (cut-off of 3 units was used to identify patients that improved and did not improve)
Chansirinukor 2005²⁴ SAQ, on-site (18-item)	143 Lower back 3 months	AUC 0.69 SRM (ES) 0.44 (0.44) SES 0.38	Based on SRM, SES, ES, and ROC/AUC Anchored by work status (working preinjury duties, full time; working preinjury duties, part-time or working other duties, full-time; working other duties, part time; and not working)

: ES=effect size; HOOS=Hip Disability and Osteoarthritis Outcome Score; LBP=low back pain; MCID=minimum clinically important difference; MPQ=McGill Pain Questionnaire; NR=not reported; ROC=receiver operating characteristic curve (AUC area under the curve); SAQ=self-administered questionnaire; SEM=standard error of measurement; SES=standardized effect sizes; SRM=standardized response mean; VAS= visual analog scale; WOMAC=Western Ontario and McMaster Universities Osteoarthritis Index

a: Available at follow-up

b: Post-treatment, no further details
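
The responsiveness indices reported in this table differ mainly in the denominator applied to the mean change score. The sketch below is a minimal illustration under stated assumptions, not the method of any single cited study: ES is taken as mean change divided by the SD of baseline scores (the definition given for Sindhu 2011), SRM as mean change divided by the SD of the change scores (the conventional definition), and GRI as defined in the Macedo 2011 row. All function names are hypothetical and the scores are invented.

```python
from statistics import mean, stdev


def effect_size(baseline, follow_up):
    """ES: mean change divided by the SD of baseline scores."""
    changes = [f - b for b, f in zip(baseline, follow_up)]
    return mean(changes) / stdev(baseline)


def standardized_response_mean(baseline, follow_up):
    """SRM: mean change divided by the SD of the change scores."""
    changes = [f - b for b, f in zip(baseline, follow_up)]
    return mean(changes) / stdev(changes)


def guyatt_responsiveness_index(improved_changes, stable_changes):
    """GRI: mean change among improved patients divided by the SD of
    change among patients reporting no improvement."""
    return mean(improved_changes) / stdev(stable_changes)


# Invented 0-10 pain scores for four patients (lower is better).
baseline = [6, 7, 5, 8]
follow_up = [4, 6, 2, 5]
print(round(effect_size(baseline, follow_up), 2))
print(round(standardized_response_mean(baseline, follow_up), 2))

# Invented change scores for improved and stable groups.
print(guyatt_responsiveness_index([-3, -4, -2], [0, 1, -1]))
```

On a scale where lower scores are better, all three indices come out negative for improvement, which matches the sign convention in several rows above (for example, Deyo 2016 and Askew 2016).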