Measuring Correlation, Association, Reliability and Validity

# Part II: The Kappa Statistic to Measure Agreement

Given that the McNemar chi-square statistic calculated in the previous chapter was not significant, the question becomes: “If the difference between a participant’s outcome scores on the two tests is not statistically significant, does that necessarily mean that the scores are in agreement?”

Because the Kappa statistic is a measure of agreement, we can test this notion by applying Kappa to the fourfold, or 2 x 2, table. In contrast to the McNemar chi-square, which uses the data in the off-diagonal cells (cell “b” and cell “c”), the Kappa computations focus on the main diagonal running from upper left to lower right (cell “a” and cell “d”) and examine whether the counts along this diagonal differ significantly from what would be expected by chance. If there is no agreement beyond chance, the observed proportion of individuals classified the same way (high on both or low on both) by the lab and field tests should be close to the proportion expected from the row and column totals alone.

As with the McNemar chi-square, the Kappa statistic uses the row and column proportions of the 2 x 2 table. The computations for Kappa are as follows:

1. COMPUTE ROW PROPORTIONS

Row 1 Proportion:   p1. = (a+b) ÷ N;  p1. = (23+12) ÷ 86 = 0.41

Row 2 Proportion:   p2. = (c+d) ÷ N;  p2. = (19+32) ÷ 86 = 0.59

Column 1 Proportion: p.1 = (a+c) ÷ N; p.1 = (23+19) ÷ 86 = 0.49

Column 2 Proportion: p.2 = (b+d) ÷ N; p.2 = (12+32) ÷ 86 = 0.51
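For readers who want to check these proportions by machine, here is a minimal Python sketch (Python is used purely for illustration; the chapter's own tool is SAS). The function name `marginal_proportions` is hypothetical, and the layout assumes cells a and b in row 1 and cells c and d in row 2, as in the formulas above.

```python
def marginal_proportions(a, b, c, d):
    """Row and column proportions of a 2 x 2 table with
    cells a, b in row 1 and c, d in row 2."""
    n = a + b + c + d
    p1_row = (a + b) / n  # p1.
    p2_row = (c + d) / n  # p2.
    p1_col = (a + c) / n  # p.1
    p2_col = (b + d) / n  # p.2
    return p1_row, p2_row, p1_col, p2_col


props = marginal_proportions(23, 12, 19, 32)
print([round(x, 2) for x in props])  # [0.41, 0.59, 0.49, 0.51]
```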

2. COMPUTE THE $\pi$ TERMS

OBSERVED: $({\pi}) \textit{obs}$

$({\pi}) \textit{obs}$: the observed term of the main diagonal elements

$({\pi}) \textit{obs}$ = ((cell a) ÷ N) + ((cell d) ÷ N);

$({\pi}) \textit{obs}$ = ((23÷86) + (32 ÷ 86));

$({\pi}) \textit{obs}$ = (.27+.37);

$({\pi}) \textit{obs}$ =  0.64

EXPECTED: $({\pi}) \textit{exp}$

$({\pi}) \textit{exp}$:  the expected term of the main diagonal elements

$({\pi}) \textit{exp}$  = ((p1. * p.1) + (p2. * p.2));

$({\pi}) \textit{exp}$  = ((0.41 * 0.49) + (0.59 * 0.51));

$({\pi}) \textit{exp}$  = (0.20 + 0.30);

$({\pi}) \textit{exp}$  = (0.50);
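The two π terms can be sketched the same way. This hypothetical `agreement_terms` helper works from the raw counts rather than from rounded proportions, so its results differ from the hand values only by rounding.

```python
def agreement_terms(a, b, c, d):
    """Observed and chance-expected proportions on the main diagonal."""
    n = a + b + c + d
    pi_obs = a / n + d / n                      # cells a and d
    p1_row, p2_row = (a + b) / n, (c + d) / n
    p1_col, p2_col = (a + c) / n, (b + d) / n
    pi_exp = p1_row * p1_col + p2_row * p2_col  # expected by chance
    return pi_obs, pi_exp


pi_obs, pi_exp = agreement_terms(23, 12, 19, 32)
print(round(pi_obs, 2), round(pi_exp, 2))  # 0.64 0.5
```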

3. COMPUTE KAPPA $({\kappa})$

Kappa = ($({\pi}) \textit{obs}$ – $({\pi}) \textit{exp}$) ÷ (1 – $({\pi}) \textit{exp}$)

Kappa = ((0.64 – 0.50) ÷ (1- 0.50))

Kappa = (0.14  ÷ 0.50)

Kappa = 0.28
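Once the π terms are known, Kappa is a single expression. In this sketch (the name `kappa_from_terms` is illustrative), the first call reuses the rounded hand values, and the second uses unrounded proportions, which gives 0.2759, the value shown in the SAS output at the end of this section.

```python
def kappa_from_terms(pi_obs, pi_exp):
    """Cohen's Kappa: agreement beyond chance, scaled by its maximum."""
    return (pi_obs - pi_exp) / (1 - pi_exp)


# Rounded values from the hand calculation
print(round(kappa_from_terms(0.64, 0.50), 2))  # 0.28
# Unrounded: 55/86 on the diagonal; expected = sum of row total x column total / N**2
print(round(kappa_from_terms(55 / 86, (35 * 42 + 51 * 44) / 86**2), 4))  # 0.2759
```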

The computed Kappa value is κ = 0.28. Our next task is to determine whether this value reflects genuine agreement or agreement that could have occurred by chance. To evaluate this Kappa statistic, we therefore need to determine whether the computed value is significantly different from 0.

We can do this by first computing the standard error of the Kappa statistic, then using this value to compute the z statistic for Kappa, and finally comparing that z value to the standard normal curve. Recall that 95% of the area under the normal curve lies between -1.96 and +1.96. Therefore, if our Zκ score is between -1.96 and +1.96, we fail to reject the null hypothesis that κ = 0.

To compute the standard error of our computed Kappa under the null hypothesis H0: κ = 0, we use the following procedure:

1. COMPUTE THE SUM OF PROPORTIONS

p1. = 0.41; p.1 = 0.49; p2. = 0.59; p.2 = 0.51

sumP = (p1. * p.1 * (p1. + p.1)) + (p2. * p.2 * (p2. + p.2));
sumP = (0.41 * 0.49 * (0.41 + 0.49)) + (0.59 * 0.51 * (0.59 + 0.51));
sumP = (0.20 * (0.90)) + (0.30 * (1.10));
sumP = (0.18) + (0.33);

sumP = (0.51);
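The sumP term is easy to mistype by hand, so a small function can serve as a check. `sum_p` is a hypothetical name; its arguments are the marginal proportions p1., p.1, p2. and p.2.

```python
def sum_p(p1_row, p1_col, p2_row, p2_col):
    """sumP = p1.*p.1*(p1. + p.1) + p2.*p.2*(p2. + p.2)."""
    return (p1_row * p1_col * (p1_row + p1_col)
            + p2_row * p2_col * (p2_row + p2_col))


print(round(sum_p(0.41, 0.49, 0.59, 0.51), 2))  # 0.51
```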

2. COMPUTE THE STANDARD ERROR

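std error = $\dfrac{1}{(1 - ({\pi}) \textit{exp})\sqrt{N}} \times \sqrt{({\pi}) \textit{exp} + (({\pi}) \textit{exp})^{2} - \textit{sumP}}$

std error = $\dfrac{1}{(1 - 0.50)\sqrt{86}} \times \sqrt{0.50 + 0.25 - 0.51}$

std error = $\dfrac{1}{(0.50)(9.27)} \times \sqrt{0.24}$

std error = 0.216 * 0.49

std error ≈ 0.106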
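Under H0: κ = 0, the standard error of Kappa is $\sqrt{({\pi}) \textit{exp} + (({\pi}) \textit{exp})^{2} - \textit{sumP}}$ divided by $(1 - ({\pi}) \textit{exp})\sqrt{N}$. A minimal sketch, using the hypothetical name `kappa_se_null`, plugs in the rounded hand values:

```python
import math


def kappa_se_null(pi_exp, sum_p, n):
    """Standard error of Kappa under H0: kappa = 0."""
    return math.sqrt(pi_exp + pi_exp**2 - sum_p) / ((1 - pi_exp) * math.sqrt(n))


print(round(kappa_se_null(0.50, 0.51, 86), 3))  # 0.106
```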

Use the following formula to compute Zκ, the z score for Kappa under the null hypothesis H0: κ = 0:

zKappa = (kappa ÷ std error)

zKappa = (0.28 ÷ 0.106)

zKappa = 2.64

Because 2.64 is greater than 1.96, zKappa falls within the region of rejection for the null hypothesis H0: κ = 0. We can therefore conclude that the agreement between the lab and field tests is greater than would be expected by chance.
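The decision rule amounts to a two-sided comparison with 1.96. In this sketch, `kappa_z_test` is a hypothetical helper, and the 5% critical value is passed as a default argument. With unrounded values (κ ≈ 0.2759, standard error ≈ 0.1064), z is about 2.59 rather than 2.64, which leads to the same conclusion.

```python
def kappa_z_test(kappa, se_null, z_crit=1.96):
    """Two-sided z test of H0: kappa = 0 at the 5% level."""
    z = kappa / se_null
    return z, abs(z) > z_crit


z, reject = kappa_z_test(0.28, 0.106)
print(round(z, 2), reject)  # 2.64 True
```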

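Finally, we can also test whether our Kappa estimate differs significantly from 0 by computing a 95% confidence interval for the Kappa statistic. This interval uses a different standard error from the one above: the earlier standard error assumes κ = 0, whereas the one below is evaluated at the observed value of κ.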

1. COMPUTE THE STANDARD ERROR AND 95% CONFIDENCE INTERVAL

Use the following measurement terms taken from the McNemar Chi-square table:

p1. = 0.41; p.1 = 0.49; p2. = 0.59; p.2 = 0.51

p11 = 23/86 = 0.27; p12 = 12/86 = 0.14; p21 = 19/86 = 0.22; p22 = 32/86 = 0.37

Aterm = (p11*(1-(p1. + p.1)*(1-kappa))**2 + p22*(1-(p2. + p.2)*(1-kappa))**2);

Aterm = (0.27*(1-(0.41 + 0.49)*(1-0.28))**2 +  0.37*(1-(0.59 + 0.51)*(1-0.28))**2);

Aterm = (0.27*(1-(0.9)*(1-0.28))**2 + 0.37*(1-(1.10)*(1-0.28))**2);

Aterm = (0.27*(1-(0.9)*(0.72))**2 + 0.37*(1-(1.10)*(0.72))**2);

Aterm = (0.27*(1-(0.648))**2 + 0.37*(1-(0.792))**2);

Aterm = (0.27*(0.352)**2 + 0.37*(0.208)**2);

Aterm = (0.27*(0.124) + 0.37*(0.043));

Aterm = (0.033 + 0.016);

Aterm = (0.049);
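The A term collects the two diagonal cells. The sketch below (the name `a_term` is hypothetical) takes the rounded proportions listed above and κ = 0.28.

```python
def a_term(p11, p22, p1_row, p1_col, p2_row, p2_col, kappa):
    """A term: sum over diagonal cells of p_ii * (1 - (p_i. + p_.i)(1 - kappa))**2."""
    return (p11 * (1 - (p1_row + p1_col) * (1 - kappa)) ** 2
            + p22 * (1 - (p2_row + p2_col) * (1 - kappa)) ** 2)


print(round(a_term(0.27, 0.37, 0.41, 0.49, 0.59, 0.51, kappa=0.28), 3))  # 0.049
```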

Bterm=((p12*(p.1 + p2.)**2 + p21*(p.2 + p1.)**2)*(1-kappa)**2);

Bterm=((0.14*(0.49 + 0.59)**2 + 0.22*(0.51 + 0.41)**2)*(1-0.28)**2);

Bterm=((0.14*(1.08)**2 + 0.22*(0.92)**2)*(0.72)**2);

Bterm=((0.14*(1.17) + 0.22*(0.84))*(0.52));

Bterm=((0.16 + 0.185)*(0.52));

Bterm=(0.179);
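The B term uses the off-diagonal cells. This hypothetical `b_term` function takes the same rounded inputs; it returns about 0.181 rather than 0.179 because the hand calculation rounds the intermediate squares to two decimals.

```python
def b_term(p12, p21, p1_row, p1_col, p2_row, p2_col, kappa):
    """B term: (p12*(p.1 + p2.)**2 + p21*(p.2 + p1.)**2) * (1 - kappa)**2."""
    return ((p12 * (p1_col + p2_row) ** 2 + p21 * (p2_col + p1_row) ** 2)
            * (1 - kappa) ** 2)


print(round(b_term(0.14, 0.22, 0.41, 0.49, 0.59, 0.51, kappa=0.28), 3))  # 0.181
```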

Cterm=((kappa – $({\pi}) \textit{exp}$*(1-kappa))**2);

Cterm=((0.28 – 0.50*(0.72))**2);

Cterm=((0.28 – 0.36)**2);

Cterm=(0.0064);
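The C term depends only on κ and $({\pi}) \textit{exp}$. In this sketch the function name `c_term` is illustrative.

```python
def c_term(kappa, pi_exp):
    """C term: (kappa - pi_exp * (1 - kappa))**2."""
    return (kappa - pi_exp * (1 - kappa)) ** 2


print(round(c_term(0.28, 0.50), 4))  # 0.0064
```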

A + B – C = (Aterm + Bterm – Cterm);
A + B – C = (0.049 + 0.179 – 0.0064);

A + B – C = 0.222

Compute the standard error used in the computation of the confidence interval:

stderr = $\dfrac{\sqrt{A + B - C}}{(1 - ({\pi}) \textit{exp})\sqrt{N}} = \dfrac{\sqrt{0.222}}{(1 - 0.50)\sqrt{86}} = \dfrac{0.471}{4.64} \approx 0.102$

ci95LL = (kappa – 1.96*(stderr));

ci95LL = (0.28 – 1.96 * 0.102);

ci95LL = (0.28 – 0.200);

ci95LL = (0.080);

ci95UL = (kappa + 1.96*(stderr));

ci95UL = (0.28 + 1.96 * 0.102);

ci95UL = (0.28 + 0.200);

ci95UL = (0.480);

Because both limits of the 95% confidence interval are above 0, the interval does not include 0, and we can conclude that the Kappa value is significantly different from 0.
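The large-sample standard error used for the interval is $\sqrt{A + B - C}$ divided by $(1 - ({\pi}) \textit{exp})\sqrt{N}$; note that C is subtracted. A minimal sketch, with hypothetical helper names `kappa_ase` and `kappa_ci95`, plugs in the rounded hand values A ≈ 0.049, B ≈ 0.179 and C ≈ 0.0064 and builds the interval κ ± 1.96 × SE:

```python
import math


def kappa_ase(a_term, b_term, c_term, pi_exp, n):
    """Large-sample SE of Kappa (not under H0): sqrt(A + B - C) / ((1 - pi_exp) * sqrt(n))."""
    return math.sqrt(a_term + b_term - c_term) / ((1 - pi_exp) * math.sqrt(n))


def kappa_ci95(kappa, se, z=1.96):
    """Wald-type interval: kappa plus or minus z * se."""
    return kappa - z * se, kappa + z * se


se = kappa_ase(0.049, 0.179, 0.0064, pi_exp=0.50, n=86)
lower, upper = kappa_ci95(0.28, se)
print(round(se, 3), round(lower, 2), round(upper, 2))  # 0.102 0.08 0.48
```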

The SAS program that produces Kappa for the 2 x 2 table is the same one used for the McNemar chi-square, with a = 23, b = 12, c = 19, d = 32. Because the data were entered as cell counts rather than as individual raw records, the WEIGHT statement names the variable (OUTCOME) that holds each cell count. The essential option is /AGREE, which produces the Kappa measure of agreement.

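```sas
DATA KAPPA2X2;              /* data set name is assumed */
  INPUT ROW COL OUTCOME;    /* 1/2 coding of ROW and COL is assumed */
  DATALINES;
1 1 23
1 2 12
2 1 19
2 2 32
;
PROC FREQ DATA=KAPPA2X2;
  TABLES ROW*COL / AGREE;   /* AGREE requests the Kappa statistic */
  WEIGHT OUTCOME;           /* each record counts as OUTCOME observations */
RUN;
```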

Statistics for Table of ROW BY COL

| Simple Kappa Coefficient | |
|---|---|
| Kappa | 0.2759 |
| ASE | 0.1024 |
| 95% Lower Conf Limit | 0.0752 |
| 95% Upper Conf Limit | 0.4767 |
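As a cross-check on the SAS table, the sketch below recomputes Kappa, its asymptotic standard error and the 95% limits from the four cell counts without intermediate rounding, using the A, B and C terms with C subtracted. The function `kappa_summary` is hypothetical and is not part of SAS; its printed values agree with the SAS output above to four decimals.

```python
import math


def kappa_summary(a, b, c, d, z=1.96):
    """Simple Kappa, its asymptotic standard error and a 95% Wald interval
    for a 2 x 2 table laid out as [[a, b], [c, d]]."""
    n = a + b + c + d
    p = [[a / n, b / n], [c / n, d / n]]          # cell proportions p_ij
    row = [p[0][0] + p[0][1], p[1][0] + p[1][1]]  # p1., p2.
    col = [p[0][0] + p[1][0], p[0][1] + p[1][1]]  # p.1, p.2

    pi_obs = p[0][0] + p[1][1]
    pi_exp = row[0] * col[0] + row[1] * col[1]
    kappa = (pi_obs - pi_exp) / (1 - pi_exp)

    # A: diagonal cells, p_ii * (1 - (p_i. + p_.i) * (1 - kappa))**2
    a_term = sum(p[i][i] * (1 - (row[i] + col[i]) * (1 - kappa)) ** 2
                 for i in range(2))
    # B: off-diagonal cells, (1 - kappa)**2 * sum of p_ij * (p_.i + p_j.)**2
    b_term = (1 - kappa) ** 2 * sum(p[i][j] * (col[i] + row[j]) ** 2
                                    for i in range(2) for j in range(2)
                                    if i != j)
    # C is subtracted, not added
    c_term = (kappa - pi_exp * (1 - kappa)) ** 2

    ase = math.sqrt(a_term + b_term - c_term) / ((1 - pi_exp) * math.sqrt(n))
    return kappa, ase, kappa - z * ase, kappa + z * ase


print([round(v, 4) for v in kappa_summary(23, 12, 19, 32)])
# [0.2759, 0.1024, 0.0752, 0.4767]
```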