
Watson JM, Crosby H, Dale VM, et al.; AESOPS trial team. AESOPS: a randomised controlled trial of the clinical effectiveness and cost-effectiveness of opportunistic screening and stepped care interventions for older hazardous alcohol users in primary care. Southampton (UK): NIHR Journals Library; 2013 Jun. (Health Technology Assessment, No. 17.25.)


Chapter 6 Fidelity process rating

Treatment fidelity plays a crucial role in considering the inferences drawn from effectiveness studies.66 It provides a means of evaluating whether or not therapists delivered the interventions as described in the session protocols and demonstrates that interventions were distinguishable from one another.96 In other words, it reports on the internal validity of the study. It also assesses the quality of such delivery, i.e. it measures practitioner skill. This is particularly important as treatment adherence is not always related to therapist competence.97 A therapist can adhere to a session protocol but deliver the components in a poor or unacceptable manner, such as asking questions at inappropriate times and adopting a cold and judgemental demeanour. Fidelity checks can therefore identify differences in therapist competence and enable potential treatment effects to be accurately attributed.

Methods

Development of the rating scale

The AESOPS PRS (Appendix 10) was adapted from the validated UKATT PRS66 and was designed to rate the delivery of all three trial interventions, namely the minimal intervention, the 20 minutes of BCC (step 1) and MET (step 2). Content and style items from the validated UKATT PRS, including those that rated the delivery of MET, were used as the basis for adapting the scale. These items were examined to ensure that they covered all of the treatment components specified in the session protocols. At this point an item was added to rate the number of open questions asked by the practice/research nurse.

Items described behaviours that were referred to in each of the session protocols and were therefore relevant to each intervention. Style items were largely based on a motivational interviewing approach. In order to distinguish interventions delivered in this style, two items denoting behaviours that were inconsistent with a motivational interviewing approach were included in the pilot phase. These included item 15, the extent to which the practice/research nurse provided unsolicited advice to the patient, and item 17, the number of closed questions asked by the practice/research nurse within the intervention.

The rating scale

The PRS is an 18-item scale, divided into four sections. The first section contains four items relating to overall session management. The middle two sections include eight items measuring specific tasks and five items measuring therapist style. The last section, listed as a single item, contains a session content/activity checklist.

All but three items were rated on two 5-point scales. The first scale provided a frequency rating that showed the extent to which an item was present. The second scale gave a quality rating and showed how well the practice/research nurse performed the behaviour; this scale was rated only if the item received a frequency rating. The frequency ratings ranged from 0 (‘not at all’), indicating that the item never explicitly occurred, to 4 (‘extensively’), signifying that the item was performed numerous times during the intervention. Intermediate points were labelled ‘a little’, ‘somewhat’ and ‘considerably’. On the quality scale, a rating of 0 (‘very poor’) showed that the item was performed in an unacceptable manner, and a rating of 4 (‘very well’) indicated that the therapist had demonstrated a high level of skill and expertise. Intermediate labels were ‘poor’, ‘good enough’ and ‘well’.

Global ratings were given for three of the items; two were associated with session management (‘session structure’ and ‘consistency of problem focus’) and one with therapist style (‘empathy’). The remaining items consisted of frequency counts of specific behaviours with corresponding quality ratings. Each point on the frequency scale related to a predefined number of behaviour counts. For example, a frequency rating of 2 (‘somewhat’) indicated that the item behaviour occurred either once and in some detail or three or more times but briefly. Quality scores also had corresponding definitions; for instance, 0 (‘very poor’) indicated that the practice/research nurse performed the behaviour within each item in an unacceptable manner. Where appropriate, an average quality score was given for each of the item behaviours. For example, if a practice/research nurse attempted to elicit optimism three times within the intervention and received quality scores of 2, 3 and 4, the overall quality rating given for that behaviour would be 3 (‘well’).
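The averaging rule just described can be illustrated with a minimal sketch (hypothetical code, not the trial's actual rating software) that averages the per-occurrence quality scores and maps the result onto the 0-4 labels used in the manual:

```python
# Illustrative sketch of the PRS quality-averaging rule described above.
# The function name and structure are assumptions for illustration only.

QUALITY_LABELS = ["very poor", "poor", "good enough", "well", "very well"]

def overall_quality(scores):
    """Average the per-occurrence quality scores (0-4) and round to the
    nearest scale point, as in the elicit-optimism example."""
    if not scores:
        return None  # quality is rated only if the behaviour occurred
    avg = round(sum(scores) / len(scores))
    return avg, QUALITY_LABELS[avg]

# The worked example from the text: quality scores of 2, 3 and 4
rating, label = overall_quality([2, 3, 4])
# rating == 3, label == "well"
```

This reproduces the worked example above: scores of 2, 3 and 4 average to an overall rating of 3 (‘well’).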

Two items carried a frequency rating only: item 15 (‘unsolicited advice’) and item 17 (‘closed questions’). Given that these behaviours were inconsistent with a motivational interviewing style, it followed that a rating of how well therapists performed these items was not needed. The final item, a session content/activity checklist, asked for a yes/no answer to illustrate whether or not the following content had occurred within the intervention: review AUDIT score, obtain an account of drinking, give correct advice/information, set a target, and make a drinking plan. The checklist also included a tick box question to indicate whether the recording was good or poor.

The rating manual

The rating manual was similarly adapted from that used in the UKATT study.66 General guidelines were issued for the process of rating, such as rating practice/research nurse behaviours, distinguishing between frequency and quality scores, and avoiding sources of rater bias. Item definitions with guidelines for making higher or lower ratings were provided. These were illustrated with examples of practice/research nurse dialogue and differentiated from closely related items. Explanatory notes were included regarding the rating of session content.

Rater training and supervision

An independent rater was trained to use the adapted scale and rating manual. Supervised practice ratings were held at weekly intervals reviewing a total of 17 recordings split evenly between the trial interventions. Recordings were simultaneously rated and the scores discussed with reference to the manual and rater notes. This ensured rater consistency and prevented rater drift. Familiarity with the manual and rating scale was essential. Independent practice was carried out whereby item definitions were read each time they were scored. Recordings used during rater training were not used in the study. Regular supervision continued after training to discuss independently rated recordings. Selected recordings were rated by the independent rater and the supervisor for the purposes of calibration.

The process of rating

Following guidelines outlined in the rating manual, raters listened to the interventions and scored item behaviours. Where appropriate, frequency counts were given a corresponding quality rating. Item definitions, as specified in the manual, were referred to throughout the process in order to prevent rater drift. Raters had the option to pause the recording or consult the manual without stopping. Brief notes were made during the session to help substantiate assigned scores. These were particularly useful for discussing ratings during supervision. At the end of the session, appropriate global ratings and overall frequency and quality ratings were given. Each session was timed to ascertain duration. All scores were entered into the Statistical Package for the Social Sciences (SPSS) version 19 (SPSS Inc., Chicago, IL, USA) for analysis.

Sampling

One hundred and sixty sessions of brief advice (minimal) and BCC (step 1) were selected for independent process rating (Figure 9). Only these two treatments were rated, as there were not enough MET sessions (step 2) to enable meaningful results to be obtained. The sample was stratified by site, practice/research nurse and treatment. Replacement sampling was used for eight inaudible recordings. In total, 79 sessions of brief advice and 81 sessions of BCC were rated. Nineteen per cent of these were double rated (i.e. scored by both the independent rater and supervisor): 11 sessions of brief advice and 20 of BCC.
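The sampling procedure described above can be sketched as follows; the field names, stratum key and audibility check are illustrative assumptions, not the trial's actual code:

```python
# Hypothetical sketch of the sampling step described above: sessions are
# grouped into strata by site, practice/research nurse and treatment, a
# fixed number is drawn per stratum, and inaudible recordings are
# replaced by redrawing from the same stratum.
import random
from collections import defaultdict

def stratified_sample(sessions, per_stratum, is_audible, seed=0):
    rng = random.Random(seed)
    strata = defaultdict(list)
    for s in sessions:
        strata[(s["site"], s["nurse"], s["treatment"])].append(s)
    chosen = []
    for pool in strata.values():
        rng.shuffle(pool)
        # Walking the shuffled pool while skipping inaudible recordings
        # is equivalent to replacement sampling within the stratum.
        audible = [s for s in pool if is_audible(s)]
        chosen.extend(audible[:per_stratum])
    return chosen
```

Drawing within each stratum keeps the selected sample balanced across sites, nurses and treatments even when individual recordings have to be replaced.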

FIGURE 9. Flow diagram of process rating procedure.

Analyses

The PRS consisted of four sections: session management, specific tasks, practice/research nurse style and session content. Summaries of the scores for each of the treatment sessions are displayed in Tables 28 to 32.

TABLE 28 Session management

TABLE 29 Specific tasks

TABLE 30 Practice/research nurse style

TABLE 31 Session content

TABLE 32 Summary measures

Four summary measures were calculated; these were used in the analyses in addition to the time taken to complete the sessions. The summary measures were analysed using mixed models, with practice/research nurse fitted as a random effect. There was a significant difference in session duration between the 5- and 20-minute sessions: the average duration of the 5-minute session was 422.57 seconds (approximately 7 minutes) and that of the 20-minute session was 1174.78 seconds (approximately 20 minutes).
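A minimal sketch of this kind of analysis, assuming Python with pandas and statsmodels rather than the SPSS actually used, and with simulated data: a random-intercept mixed model with practice/research nurse as the random effect.

```python
# Illustrative mixed-model sketch (simulated data, not the trial's):
# score ~ session length as a fixed effect, nurse as a random intercept.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
nurses = np.repeat(np.arange(8), 20)        # 8 nurses, 20 sessions each
minutes20 = np.tile([0, 1], 80)             # 0 = 5-minute, 1 = 20-minute session
nurse_effect = rng.normal(0, 1, 8)[nurses]  # random intercept per nurse
score = 10 + 5 * minutes20 + nurse_effect + rng.normal(0, 1, 160)
df = pd.DataFrame({"score": score, "nurse": nurses, "minutes20": minutes20})

# Fixed effect of session length, nurse as the grouping (random) factor
result = smf.mixedlm("score ~ minutes20", df, groups=df["nurse"]).fit()
print(result.params["minutes20"])  # estimated 20- vs 5-minute difference
```

Fitting the nurse as a random effect, as the text describes, accounts for the fact that sessions delivered by the same nurse are not independent observations.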

The 20-minute sessions had significantly higher task frequency and task quality scores. The 20-minute sessions also had significantly higher style frequency and style quality scores. The results can be seen in Table 33.

TABLE 33 Summary measures analyses

When comparing the session content, there were significant differences between the two sessions on only two measures, ‘obtaining a drinking account’ and ‘setting a target’; both of these were more likely to be performed in the 20-minute sessions. The results of all of the analyses can be seen in Table 34.

TABLE 34 Session content analyses

The analysis of the summary measures was repeated, this time including a variable to represent specialist practitioners. When comparing the session rating scores for specialist and non-specialist practitioners, no significant differences were found. The full results can be seen in Table 35.

TABLE 35 Comparison of specialist and non-specialist practitioners' rating scores

Reliability of ratings

A sample of the recordings was rated by two raters. Inter-rater reliability of the individual frequency items of the summary scores was examined using the ICC two-way mixed-effects model (case 3).67 For the four summary scores, the average of the two raters’ summary scores was plotted against the difference in their summary scores69 to make pairwise comparisons between raters (Figure 10).

FIGURE 10. Bland–Altman plots.

The ICCs for the summary measures ranged from 0.64 to 0.81 (Table 36), which indicates acceptable levels of agreement.67
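The case 3 ICC referred to above is the two-way mixed-effects, single-measure form, often written ICC(3,1); a minimal pure-Python sketch under that assumption, with illustrative data rather than the trial's ratings:

```python
# Minimal sketch of the single-measure, two-way mixed-effects ICC
# ("case 3" above, often written ICC(3,1)). Illustrative data only.

def icc_3_1(ratings):
    """ratings: one row per session, one column per rater."""
    n = len(ratings)       # number of sessions (targets)
    k = len(ratings[0])    # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    # Mean squares from the two-way (session x rater) decomposition
    msr = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    mse = sum(
        (ratings[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n) for j in range(k)
    ) / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse)

# Two raters in perfect agreement give an ICC of 1
print(icc_3_1([[1, 1], [2, 2], [3, 3], [4, 4]]))  # 1.0
```

Because rater (column) variance is removed in the residual term, ICC(3,1) measures consistency of rank ordering between the two raters rather than absolute agreement.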

TABLE 36 Intraclass correlation coefficient analyses of the individual frequency items of the summary scores

The Bland–Altman plots for each of the summary measures compare the two raters (Figure 10). A positive difference indicates that the second rater scored higher than the first rater; a negative difference indicates that the second rater scored lower.
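The quantities plotted in such a comparison can be sketched as follows (illustrative data; the 1.96 × SD limits of agreement are the conventional Bland–Altman choice, assumed here rather than stated in the text):

```python
# Sketch of the quantities behind a Bland-Altman comparison: each
# double-rated session contributes the mean of the two raters' summary
# scores and their difference (second rater minus first).
from statistics import mean, stdev

def bland_altman(rater1, rater2):
    diffs = [b - a for a, b in zip(rater1, rater2)]   # positive: rater 2 higher
    means = [(a + b) / 2 for a, b in zip(rater1, rater2)]
    bias = mean(diffs)                                 # mean difference
    lo, hi = bias - 1.96 * stdev(diffs), bias + 1.96 * stdev(diffs)
    return means, diffs, bias, (lo, hi)
```

Plotting `means` on the x-axis against `diffs` on the y-axis, with horizontal lines at the bias and the limits of agreement, produces plots of the kind shown in Figure 10.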

Summary

The scale identified significant differences between the 5- and 20-minute interventions, indicating that the two types of session were distinct. Validation of the rating showed an acceptable level of agreement between the raters. There were no significant differences in the rating scores between practice/research nurses with different levels of experience (specialist vs non-specialist practitioners).

Copyright © Queen's Printer and Controller of HMSO 2013. This work was produced by Watson et al. under the terms of a commissioning contract issued by the Secretary of State for Health. This issue may be freely reproduced for the purposes of private research and study and extracts (or indeed, the full report) may be included in professional journals provided that suitable acknowledgement is made and the reproduction is not associated with any form of advertising. Applications for commercial reproduction should be addressed to: NIHR Journals Library, National Institute for Health Research, Evaluation, Trials and Studies Coordinating Centre, Alpha House, University of Southampton Science Park, Southampton SO16 7NS, UK.

Included under terms of UK Non-commercial Government License.

Bookshelf ID: NBK260687
