Goodness of Fit and Related Chi-Square Tests

# Part 2: Calculating Fisher’s Exact Statistic

In the chi-square test in the previous chapter, we measured the association between breastfeeding duration and the mother’s completed level of education.  This is not a causal model but a measure of association that lets us evaluate the relationship between two variables. We began with the null hypothesis that there was no association between the two variables. After running the chi-square test and finding that the chi-square statistic we calculated exceeded the critical value, we rejected the null hypothesis and concluded that there appears to be a relationship between the level of maternal education and breastfeeding duration.

Our decision to reject the null hypothesis was based on comparing the chi-square statistic that we calculated to a critical value at the 95% confidence level (a significance level of 0.05). We could, however, be more precise and compute the exact probability of the table we observed by using Fisher’s Exact test.

The formula for Fisher’s Exact test, where a, b, c, and d are the four cell counts of the 2 x 2 table and n = a + b + c + d is the total sample size, is:

$p = {(a+b)! (c+d)! (a+c)! (b+d)!\over (n! \times a! \times b! \times c! \times d!)}$

The term n! refers to “n factorial”, the product n x (n-1) x (n-2) x (n-3) x … x 1.  For example, 6! is 6 x (6-1) x (6-2) x (6-3) x (6-4) x (6-5), or in simpler terms 6 x 5 x 4 x 3 x 2 x 1 = 720.
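As a quick check of this definition, the short SAS sketch below compares the built-in FACT function (the same function used in the program later in this chapter) with an explicit running product. The variable names BUILTIN, PRODUCT, and I are illustrative only.

```sas
DATA _NULL_;
  /* Built-in factorial function */
  BUILTIN = FACT(6);

  /* Same quantity as an explicit running product 1 x 2 x ... x 6 */
  PRODUCT = 1;
  DO I = 1 TO 6;
    PRODUCT = PRODUCT * I;
  END;

  /* Write both values to the SAS log for comparison */
  PUT BUILTIN= PRODUCT=;
RUN;
```

Both values written to the log should equal 720.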

Applying Fisher’s Exact test to our scenario, we compute the exact probability using the cell counts a=43, b=27, c=21, and d=34.

Note that factorials of numbers as large as those represented by a, b, c, and d are impractical to compute by hand. We can instead carry out the computation with the following SAS code.  This approach also demonstrates that the SAS programming language can perform complex computations without requiring an a priori dataset.

SAS Code to compute Fisher’s Exact Test from known values

OPTIONS PAGESIZE=55 LINESIZE=120 CENTER DATE;
DATA FACT;
X1=FACT(70); X2=FACT(55); X3=FACT(64); X4=FACT(61);
Y1=FACT(125); Y2=FACT(43); Y3=FACT(27); Y4=FACT(21); Y5=FACT(34);

/* REDUCE THESE FACTORIALS TO COMPUTE FISHER’S EXACT.
IMPROVE THE EFFICIENCY OF THE REDUCTION BY MATCHING LARGEST
NUMERATORS AND DENOMINATORS TO CANCEL NUMBERS WITHIN THE SEQUENCE */

REDUCE1=(X1/Y1);  REDUCE2=(X2/Y2);  REDUCE3=(X3/Y5);  REDUCE4=(X4/Y3);
REDUCE5=(1/Y4);

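/* FULL PRECISION IS KEPT SO THAT THE PRINTED VALUE MATCHES THE RESULTS BELOW.
ROUNDING TO 0.001 WOULD PRINT 0.005 INSTEAD */
OUTCOME=REDUCE1*REDUCE2*REDUCE3*REDUCE4*REDUCE5;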

PROC PRINT DATA=FACT;
VAR X1 X2 X3 X4 Y1 Y2 Y3 Y4 Y5 REDUCE1 REDUCE2 REDUCE3 REDUCE4 REDUCE5 OUTCOME;
RUN;

The SAS code above produces the following results.

| Variable | Value |
|---|---|
| X1 | 1.1979E100 |
| X2 | 1.2696E73 |
| X3 | 1.2689E89 |
| X4 | 5.0758E83 |
| Y1 | 1.8827E209 |
| Y2 | 6.0415E52 |
| Y3 | 1.0889E28 |
| Y4 | 5.1091E19 |
| Y5 | 2.9523E38 |
| REDUCE1 | 6.3625E-110 |
| REDUCE2 | 2.1015E20 |
| REDUCE3 | 4.2979E50 |
| REDUCE4 | 4.6615E55 |
| REDUCE5 | 1.9573E-20 |
| OUTCOME | .005243165 |

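The OUTCOME value shown in the table above is the exact probability of the observed table, which when rounded is p = 0.005. Strictly, the p-value reported for Fisher’s Exact test adds to this the probabilities of all other tables with the same row and column totals that are at least as extreme as the one observed.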
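The factorial approach works here because 125! still fits in a double-precision number, but double-precision arithmetic overflows a little beyond 170!, so the same program would fail for larger samples. One common workaround, sketched below as an illustration rather than as part of the original program, is to work on the log scale with the SAS LGAMMA function, using the identity LGAMMA(k+1) = log(k!). The dataset name LOGFACT and the variable names LOGNUM, LOGDEN, and P_TABLE are assumptions made for this example.

```sas
DATA LOGFACT;
  /* Cell counts from the breastfeeding example */
  A = 43; B = 27; C = 21; D = 34;
  N = A + B + C + D;

  /* LGAMMA(K+1) equals LOG(K!), so products of factorials become sums */
  LOGNUM = LGAMMA(A+B+1) + LGAMMA(C+D+1) + LGAMMA(A+C+1) + LGAMMA(B+D+1);
  LOGDEN = LGAMMA(N+1) + LGAMMA(A+1) + LGAMMA(B+1) + LGAMMA(C+1) + LGAMMA(D+1);

  /* Back-transform only at the end, when the result is a probability */
  P_TABLE = EXP(LOGNUM - LOGDEN);
RUN;

PROC PRINT DATA=LOGFACT;
  VAR A B C D N P_TABLE;
  FORMAT P_TABLE 11.9;
RUN;
```

P_TABLE should agree with the OUTCOME value above up to floating-point rounding, since both evaluate the same formula.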

We could also compute the Fisher’s Exact value by hand using the same formula with the cell values a=43, b=27, c=21, and d=34.

$p = {(a+b)! (c+d)! (a+c)! (b+d)!\over (n! \times a! \times b! \times c! \times d!)}$

$p = {(43+27)! (21+34)! (43+21)! (27+34)!\over (125! \times 43! \times 27! \times 21! \times 34!)}$

$p = {(70)! (55)! (64)! (61)!\over (125! \times 43! \times 27! \times 21! \times 34!)}$

$p = {0.005243165}$
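In practice, the same test can be requested directly from PROC FREQ. The sketch below assumes the four cell counts are entered as one record per cell. The dataset name BF and the variable name COUNT are illustrative, ROW and COL follow the table labels used in this chapter’s SAS output, and the assignment of row and column codes 1 and 2 to cells a, b, c, and d is an assumption.

```sas
DATA BF;
  /* One record per cell: row code, column code, cell count */
  INPUT ROW COL COUNT;
  DATALINES;
1 1 43
1 2 27
2 1 21
2 2 34
;
RUN;

PROC FREQ DATA=BF;
  /* COUNT holds cell frequencies rather than individual observations */
  WEIGHT COUNT;
  /* CHISQ requests the chi-square statistics; FISHER requests the exact test */
  TABLES ROW*COL / CHISQ FISHER;
RUN;
```

In the Fisher’s Exact Test portion of the output, the line labeled Table Probability (P) corresponds to the formula above, while the two-sided line sums the probabilities of all tables with the same margins that are no more probable than the observed one.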

# Part 3: Calculating Associations in 2 x 2 tables with the Phi Coefficient

In addition to computing the exact probability for statistical comparison, we can also determine the strength of the association between the two variables by computing the phi coefficient. The phi coefficient provides an estimate of association in a 2 x 2 table.  If there is no association between the rows and columns, then phi is 0. The maximum value of phi is 1, which indicates a perfect association. It is also common for a chi-square result with a very low probability (a statistically significant result) to be accompanied by a low phi coefficient, because statistical significance depends on the sample size as well as on the strength of the association.

The formula for the phi coefficient is:   ${\phi^2} = {\chi^2 \over n}$, so that ${\phi} = {\sqrt {\chi^2 \over n}}$

For the chi-square example shown in the previous chapter, which uses the same data as the Fisher’s Exact test calculation above, the phi coefficient is reported in the output that was generated by our SAS program.

Statistics for Table of ROW by COL from the original chi-square

| Statistic | DF | Value | Prob |
|---|---|---|---|
| Chi-Square | 1 | 6.6617 | 0.0099 |
| Likelihood Ratio Chi-Square | 1 | 6.7197 | 0.0095 |
| Continuity Adj. Chi-Square | 1 | 5.7638 | 0.0164 |
| Mantel-Haenszel Chi-Square | 1 | 6.6084 | 0.0101 |
| Phi Coefficient | | 0.2309 | |
| Contingency Coefficient | | 0.2249 | |
| Cramer’s V | | 0.2309 | |

The phi coefficient was reported as 0.2309, which agrees to two decimal places with the value we can compute by hand from the formula shown here:

${\phi} = {\sqrt {\chi^2 \over n}} = {\sqrt {6.66 \over 125}} \approx 0.23$
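
As a cross-check, phi for a 2 x 2 table can also be computed directly from the cell counts, without first computing the chi-square statistic. This identity is standard but is not part of the derivation above, so it is offered only as a supplementary check using the same values a=43, b=27, c=21, and d=34:

$\phi = {ad - bc \over \sqrt{(a+b)(c+d)(a+c)(b+d)}} = {(43)(34) - (27)(21) \over \sqrt{(70)(55)(64)(61)}} = {895 \over \sqrt{15030400}} \approx 0.2309$

The result matches the value in the SAS output to four decimal places.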

The phi coefficient reported here demonstrates that, while the chi-square result was significant and thereby indicated a statistically significant association, the strength of that association is low at 0.23.