SES GRPS  Frequency  Percent  Cumulative Frequency  Cumulative Percent
HIGH SES  165  8.05  165  8.05 
MODERATE SES  283  13.80  448  21.85 
LOW SES  622  30.34  1070  52.20 
VERY LOW SES  980  47.80  2050  100.00 
Goodness of Fit and Related Chi-Square Tests
15 Introducing the Goodness of Fit Chi-Square
So you are asking yourself, “goodness of fitting what to what?”
The chi-square (pronounced “kie” square) is an extremely useful nonparametric statistical technique that allows a researcher to compare responses from a sample to the expected responses in a hypothetical distribution of responses for a population. Hence the name goodness of fit test.
The chi-square goodness of fit test can be used to evaluate data at all variable levels, but because the currency of this test is count data, it is most often used to analyze nominal and ordinal data.
The chi-square test evaluates data in the form of counts or frequencies, as in the number of responses within a given category, the number of people who responded a given way to a specific question, or the number of cases across outcome categories.
The goodness of fit chi-square for one sample with four categories
In the following example, we consider the goodness of fit chi-square with four response categories. In this problem, we are studying a cohort of cancer patients to determine if cancer was more likely to be diagnosed in patients who are in a low-income category, based on socioeconomic status (SES) quartiles. We begin by establishing that the expected distribution of cancer patients within the community is equal across the four income categories, so that in any community 25% of our population are in the highest SES category, 25% are in the moderate SES category, 25% are in the low SES category, and 25% are in the very low SES category.
However, in the observed data set for our sample of cancer patients, we recorded the following distribution of patients.
Proportional Distribution of Sample Across Socioeconomic Categories
SES Category  Expected Proportion  Observed Number of Patients
Highest SES  25%  165 patients
Moderate SES  25%  283 patients
Lower SES  25%  622 patients
Very Low SES  25%  980 patients
Observed data are from the community sample of cancer patients collected over a 10-year period in a community with an average population of greater than 1 million households.
The null hypothesis for this study is stated in an unbiased way so that each SES quartile is expected to have an equal percentage of households with cancer patients. Here the term f_{k} refers to the frequency, or number of patients, within the quartile indicated by the subscript k. Since we have four groups representing four quartiles, k ranges from 1 to 4.
H_{0}: f_{1} = f_{2} = f_{3} = f_{4}
Since we have a total sample size of N = 2050, each of the four SES quartiles is expected to have a frequency (an expected number of patients) equal to 2050 ÷ 4 = 512.5 individuals.
The chi-square formula to test the null hypothesis is:
\chi^{2} = \sum_{k=1}^{4} (o_{k} - e_{k})^{2} / e_{k}
The equation measures how closely an observed set of responses (the “o” for “observed”) matches an expected set of responses (the “e” for “expected”).
So then how do we calculate the items that we use in the chi-square equation?
The observed frequencies are simply taken from the data recording sheet, but the expected frequencies are computed from the following formula:
e_{k} = N ÷ (number of categories) = 2050 ÷ 4 = 512.5
Another way to view the computation of the expected frequencies is to consider the null hypothesis which stated that:
H_{0}: f_{1}= f_{2}= f_{3}= f_{4}
and multiply the total frequency by the probability associated with each category, as in the following computations.
2050 x 0.25 = 512.5
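The same arithmetic can be checked in a few lines of code. The following is a minimal Python sketch, used here only for illustration and not part of the SAS analysis in this chapter. The helper name `expected_frequencies` is hypothetical, and the equal probabilities of 0.25 are simply the null hypothesis stated above.

```python
def expected_frequencies(total, probabilities):
    """Return the expected count per category: total x category probability."""
    # The category probabilities must describe a complete distribution.
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("category probabilities must sum to 1")
    return [total * p for p in probabilities]


# Observed counts in the order High, Moderate, Low, Very Low SES.
observed = [165, 283, 622, 980]
n_total = sum(observed)               # total sample size N
null_probs = [0.25] * len(observed)   # equal proportions under H0

expected = expected_frequencies(n_total, null_probs)
print(n_total, expected)
```

Passing the probabilities as a list, rather than dividing N by the number of categories, keeps the sketch usable if the hypothesized proportions were ever unequal.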
The chi-square is then used to compute whether or not the observed distribution fits a hypothetical or expected distribution. This can be accomplished by setting up the following table:
Response Category  Observed Frequency  Expected Frequency  (Obs – Exp)^{2} ÷ Exp
1: High SES  165  512.5  235.62
2: Moderate SES  283  512.5  102.77
3: Low SES  622  512.5  23.40
4: Very Low SES  980  512.5  426.45
Chi-square (sum of final column) = 788.24
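The table can be reproduced by applying (Obs – Exp)^{2} ÷ Exp to each category and summing the results. The following is a minimal Python sketch of that calculation, assuming equal expected frequencies as in the null hypothesis; the variable names are illustrative only and do not come from the SAS program in this chapter.

```python
# Observed counts in the order High, Moderate, Low, Very Low SES.
labels = ["High SES", "Moderate SES", "Low SES", "Very Low SES"]
observed = [165, 283, 622, 980]

# Under H0 every category has the same expected frequency, N / k.
expected = [sum(observed) / len(observed)] * len(observed)

# Each category's contribution to the chi-square statistic.
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_square = sum(contributions)

for label, o, e, c in zip(labels, observed, expected, contributions):
    print(f"{label:<13} {o:>5} {e:>7.1f} {c:>8.2f}")
print(f"chi-square = {chi_square:.2f}")
```

Keeping the per-category contributions in a list makes it easy to see which categories depart most from the expected frequency.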
In this calculation for a one-sample scenario with 4 outcome categories, we see that the chi-square statistic is 788.24. So what does this mean?
To evaluate the meaning of the value we calculated for the chi-square, we need to review the decision rule for the chi-square statistic, shown here.
Chi-Square decision rule (one-sample chi-square test): The computed score is referred to as the chi-square observed. After computing the chi-square observed value, determine the chi-square critical score from a table of chi-square values. The critical value is determined by computing the degrees of freedom for the response set. As an illustration, the chi-square critical score below represents what we should expect to observe for a distribution with five responses. The computation of the degrees of freedom is:
degrees of freedom = k possible responses − 1
degrees of freedom = 5 − 1
degrees of freedom = 4
and the chi-square critical value for degrees of freedom of 4 at p<0.05 = 9.49
If the chi-square observed value is GREATER THAN the chi-square critical value (9.49 in this illustration), we must reject the null hypothesis and state that the distribution of responses across the response categories IS NOT EQUAL. A large chi-square value, that is, a value that exceeds the chi-square critical value, indicates that the outcome is unlikely to have occurred by chance.
The chi-square statistic is computed as 788.24.
We therefore compare the chi-square observed value of 788.24 against the chi-square critical value, which is based on the chosen probability level and the degrees of freedom. In the k = 4 chi-square, the degrees of freedom are: degrees of freedom = k possible responses − 1, so that given k = 4, the degrees of freedom are 4 − 1 = 3, and at p<0.05 the chi-square critical value is 7.815. Therefore, since our chi-square observed value of 788.24 exceeds the chi-square critical value (7.815), we reject the null hypothesis and state that cancer patients are not equally distributed across the SES categories. Given the numbers we observed, we can state that in this sample the number of cancer patients in the very low SES group was significantly greater than the number of cancer patients in the high socioeconomic category.
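The decision rule can be written out as a short procedure: compute the degrees of freedom, look up the critical value, and compare. The following is a minimal Python sketch under stated assumptions. The function name `goodness_of_fit_decision` and the small lookup table are hypothetical, and the table holds only the two p<0.05 critical values quoted in this chapter, so a fuller chi-square table would be needed for other degrees of freedom.

```python
# Critical values at p < 0.05, keyed by degrees of freedom.
# Only the two values quoted in this chapter are included (an assumption
# of this sketch); consult a chi-square table for other cases.
CRITICAL_05 = {3: 7.815, 4: 9.49}


def goodness_of_fit_decision(chi_square_observed, k):
    """Apply the one-sample decision rule for k response categories."""
    df = k - 1                      # degrees of freedom = k - 1
    critical = CRITICAL_05[df]      # KeyError if df is not in the table
    reject = chi_square_observed > critical
    return df, critical, reject


df, critical, reject = goodness_of_fit_decision(788.24, k=4)
print(df, critical, "reject H0" if reject else "do not reject H0")
```

A lookup table mirrors the way the rule is applied by hand; a statistics library could supply critical values for any degrees of freedom instead.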
The following is the SAS code used to analyze the data in the scenario above.
PROC FORMAT;
VALUE SLICE 1='HIGH SES' 2='MODERATE SES' 3='LOW SES' 4='VERY LOW SES';
RUN;
/* DEFINE THE AXIS CHARACTERISTICS (GLOBAL STATEMENTS, USED ONLY BY SAS/GRAPH CHARTS) */
AXIS1 LABEL=("SES CATEGORIES")
VALUE=(JUSTIFY=CENTER);
AXIS2 LABEL=(ANGLE=90 "ACTUAL NUMBER OF PATIENTS")
ORDER=(0 TO 1000 BY 100)
MINOR=(N=3);
AXIS3 LABEL=(ANGLE=90 "SES CATEGORIES");
AXIS4 LABEL=("ACTUAL NUMBER OF PATIENTS") ;
/* READ ONE ROW PER SES GROUP: THE GROUP CODE AND THE NUMBER OF PATIENTS */
DATA GFIT_1;
INPUT SESGRP N_PATNTS;
DATALINES;
1 165
2 283
3 622
4 980
;
RUN;
/* THE WEIGHT STATEMENT TREATS N_PATNTS AS THE COUNT OF CASES IN EACH ROW */
/* CHISQ ON A ONE-WAY TABLE REQUESTS THE TEST FOR EQUAL PROPORTIONS */
PROC FREQ DATA=GFIT_1 ORDER=DATA;
TABLES SESGRP/CHISQ CL CELLCHI2;
WEIGHT N_PATNTS;
FORMAT SESGRP SLICE. ;
TITLE 'FREQUENCY DISTRIBUTION FOR PROPORTION OF PATIENTS IN EACH SES GROUP';
TITLE2 'ONE SAMPLE GOODNESS OF FIT EXAMPLE FOR K=4';
RUN;
The output for the chi-square computation is shown here:
The FREQUENCY procedure output, including the chi-square statistic to evaluate the null hypothesis H_{0}: f_{1} = f_{2} = f_{3} = f_{4}.
Chi-Square Test for Equal Proportions
Chi-Square  788.2400
DF  3
Pr > ChiSq  <.0001
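The Pr > ChiSq entry is the probability of a chi-square value at least as large as the one observed when the null hypothesis is true. For 3 degrees of freedom this upper-tail probability has a closed form, so it can be checked without a table. The following is a minimal Python sketch using only the standard library; the function name `chi2_sf_df3` is hypothetical, and the formula applies to 3 degrees of freedom only.

```python
import math


def chi2_sf_df3(x):
    """Upper-tail probability P(X > x) for a chi-square variable with 3 df.

    Closed form valid for 3 degrees of freedom only:
    erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2)
    """
    if x <= 0:
        return 1.0
    return (math.erfc(math.sqrt(x / 2))
            + math.sqrt(2 * x / math.pi) * math.exp(-x / 2))


p_at_critical = chi2_sf_df3(7.815)   # tail area at the p < 0.05 critical value
p_observed = chi2_sf_df3(788.24)     # tail area at the observed statistic

print(f"P(X > 7.815)  = {p_at_critical:.4f}")
print(f"P(X > 788.24) = {p_observed:.3e}")
print("below 0.0001:", p_observed < 0.0001)
```

Evaluating the function at the critical value is a quick sanity check, since the tail area there should be close to 0.05.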
The SAS code to produce the pie chart is as follows:
PROC FORMAT;
VALUE SLICE 1='HIGH SES' 2='MODERATE SES' 3='LOW SES' 4='VERY LOW SES';
RUN;
/* SUMVAR= WITH TYPE=SUM SIZES EACH SLICE BY THE SUM OF N_PATNTS FOR THAT SES GROUP */
PROC GCHART DATA=GFIT_1;
PIE3D SESGRP/SUMVAR=N_PATNTS TYPE=SUM DISCRETE PERCENT=INSIDE
COUTLINE=RED WOUTLINE=1 FILL=SOLID SLICE=ARROW CLOCKWISE
NOLEGEND NOHEADING VALUE=NONE;
FORMAT SESGRP SLICE. ;
TITLE1 'PIE CHART FOR PROPORTION OF PATIENTS IN EACH SES GROUP';
PATTERN1 COLOR = LIGHTBLUE;
RUN;
QUIT;
Figure: Pie chart for proportion of patients in each SES group
Webulator Form 1:
The following is a Goodness of Fit Webulator for k = 4 responses. In the table above we used the values for socioeconomic status:
HIGH SES  165
MODERATE SES  283
LOW SES  622
VERY LOW SES  980
Enter these data into the webulator below for each of your four options and then click the button labelled compute expected frequencies. This will produce the sum of the four values that you entered and compute the expected frequency for the values in the table.
The important value from this Webulator is the computed chi-square score. The computed score is referred to as the chi-square observed. After computing the chi-square observed value, determine the chi-square critical score from a table of chi-square values. The chi-square critical score represents what we should expect to observe for the distribution with “k” responses. The critical value is determined by computing the “degrees of freedom” for our response set.
The computation of the degrees of freedom is: degrees of freedom = “k” possible responses − 1
degrees of freedom = 4 − 1 = 3
and the “chi-square critical value” for degrees of freedom of “3” at p<0.05 = 7.815
If the “chi-square observed value” is greater than the “chi-square critical value” of 7.815, we must reject the null hypothesis and state that the distribution of responses across the response categories IS NOT EQUAL.