Viswanathan M, Berkman ND. Development of the RTI Item Bank on Risk of Bias and Precision of Observational Studies [Internet]. Rockville (MD): Agency for Healthcare Research and Quality (US); 2011 Sep.


Appendix A. AC1 Statistic

AC1 was originally introduced by Gwet (2001). The interpretation of AC1 is similar to that of generalized kappa (Fleiss, 1971), which is used to assess interrater reliability when there are multiple raters. Gwet (2002) demonstrated that kappa is sensitive to trait prevalence and to raters' classification probabilities (i.e., marginal probabilities), limitations that AC1 overcomes, making it a more robust measure of interrater reliability. The formulas used to compute AC1 are shown below. The first formula shows that AC1 differs from generalized kappa in how the chance correction is computed (i.e., how $p_e$ is computed). In addition, the computation is unweighted; thus the ordering of the response categories is not taken into account. Our computation of AC1 was conducted using the SAS macro provided by Blood and Spratt (2007) (http://mcrc.hitchcock.org/SASMacros/Agreement/AC1AC2.TXT).

\[ AC_1 = \frac{p_a - p_e^{\gamma}}{1 - p_e^{\gamma}}, \]

where $p_a$ is the overall agreement probability, whether by chance or not, and $p_e^{\gamma}$ is the chance-agreement probability. Their computation formulas are as follows:

\[ p_a = \frac{1}{n}\sum_{i=1}^{n}\left\{\sum_{q=1}^{Q}\frac{r_{iq}(r_{iq}-1)}{r(r-1)}\right\}, \qquad p_e^{\gamma} = \frac{1}{Q-1}\sum_{q=1}^{Q}\pi_q(1-\pi_q), \qquad \pi_q = \frac{1}{n}\sum_{i=1}^{n}\frac{r_{iq}}{r}, \]

where $n$ is the number of studies rated, $Q$ is the number of categories in the rating scale,

$r_{iq}$ is the number of raters who classified the $i$th study into the $q$th category,

$r$ is the total number of raters, and

$\pi_q$ is the probability that a rater classifies a study into category $q$, computed as shown in the last formula above.
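
To make the computation concrete, the following is a minimal Python sketch of the unweighted AC1 formulas above. It is not the SAS macro by Blood and Spratt (2007) cited in the text; the function name gwet_ac1 and the input layout (an n-by-Q matrix whose entry [i, q] holds $r_{iq}$, with a constant number of raters $r$ per study) are assumptions made for illustration.

import numpy as np

def gwet_ac1(counts):
    """Unweighted Gwet AC1 from an n-by-Q matrix of rater counts per study.

    counts[i, q] = r_iq, the number of raters who placed study i in category q.
    Every row is assumed to sum to r, the (constant) total number of raters.
    """
    counts = np.asarray(counts, dtype=float)
    n, Q = counts.shape                      # n studies, Q rating categories
    r = counts[0].sum()                      # raters per study (assumed constant)

    # Overall agreement probability p_a (first formula above).
    p_a = ((counts * (counts - 1)).sum(axis=1) / (r * (r - 1))).mean()

    # Category propensities pi_q and chance agreement p_e^gamma
    # (third and second formulas above).
    pi = (counts / r).mean(axis=0)
    p_e = (pi * (1 - pi)).sum() / (Q - 1)

    return (p_a - p_e) / (1 - p_e)

# Hypothetical example: 4 studies, 3 raters, a 3-category rating scale.
counts = np.array([[3, 0, 0],
                   [2, 1, 0],
                   [0, 3, 0],
                   [0, 1, 2]])
print(round(gwet_ac1(counts), 3))            # prints 0.515

For these hypothetical counts, $p_a = 2/3$ and $p_e^{\gamma} = 0.3125$, giving AC1 $\approx$ 0.52; the chance correction pulls the coefficient below the raw agreement.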

References

  1. Blood E, Spratt KF. Disagreement on agreement: Two alternative agreement coefficients. Paper 186-2007. SAS Global Forum; 2007. pp. 1–12.
  2. Fleiss JL. Measuring nominal scale agreement among many raters. Psychological Bulletin. 1971;76:378–382.
  3. Gwet K. Handbook of Inter-Rater Reliability: How to Estimate the Level of Agreement Between Two or Multiple Raters. Gaithersburg, MD: STATAXIS Publishing Company; 2001.
  4. Gwet K. Inter-Rater Reliability: Dependency on Trait Prevalence and Marginal Homogeneity. Statistical Methods for Inter-Rater Reliability Assessment. 2002;2:1–9.
