Parametric Statistics

# 31 Research Design Applications with PROC GLM

Learner Outcomes

After reading this chapter you should be able to:

- Compute the significance of the difference between three or more sample means using PROC GLM for the one-way analysis of variance test
- Compute the significance of the association between an outcome and one or several predictors using PROC GLM as a linear regression model
- Compute the post hoc comparison between sample means when the F statistic is significant using post hoc analysis procedures (in either ANOVA applications or linear regression applications)

**INTRODUCTION TO GENERAL LINEAR MODELS IN SAS**

A univariate general linear model is a statistical model in which a single dependent variable is modeled in relation to a set of predictor variables. The predictors can be categorical independent variables with multiple levels, continuous variables, or a combination of the two. In research designs where the dependent variable is a continuous score and the independent variables are categorical, the researcher can use either the analysis of variance or a general linear model.

In SAS, the F statistic can be computed with either PROC ANOVA, described previously, or PROC GLM. Both procedures support similar follow-up analyses that establish not only the significance of the main effects but also characteristics of the distribution, such as normality and equality of variance. However, PROC ANOVA has limitations that make PROC GLM the more appropriate choice in several situations. For example, PROC GLM is preferable to PROC ANOVA when the comparison groups are unbalanced, when categorical and continuous predictors are combined as in an analysis of covariance, and when the dependent measure is evaluated using complex interactions as in nested designs.
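As a rough illustration of the analysis of covariance case, the sketch below combines a categorical and a continuous predictor in one MODEL statement. The dataset name `bpstudy` and the covariate `age` are hypothetical names chosen for illustration; they are not part of this chapter's data.

```sas
/* Hypothetical analysis of covariance sketch.
   bpstudy and age are illustrative names, not data from this chapter. */
proc glm data=bpstudy;
   class grp;                          /* categorical predictor goes in CLASS */
   model sysbp = grp age / solution;   /* age is continuous, so it is not listed in CLASS */
   lsmeans grp;                        /* group means adjusted for the covariate */
run;
quit;
```

PROC ANOVA does not accept a continuous predictor of this kind, which is one reason PROC GLM is the usual choice for such designs.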

In this chapter, we will explore the SAS application of the PROC GLM procedures to evaluate the F statistic represented by the statement: F = variance between samples divided by the variance within samples. Next, we will explore the relationship between the outcome and predictor variables based on the concept that the dependent variable = independent variable ± error, which we can represent algebraically as: [latex]Y_{ij} = \beta_{0} \pm \beta_{i}X_{i} + \epsilon[/latex]

Extending this General Linear Model (GLM) approach, we will introduce the General Linear **Mixed** Model, which we will analyze with **PROC MIXED**. The mixed model adds the parameter [latex]U_{i}[/latex], which represents the random effect, to the General Linear Model equation: [latex]Y_{ij} = \beta_{0} \pm \beta_{i}X_{i} \pm U_{i} + \epsilon[/latex]
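To make the role of the random effect more concrete, here is a minimal PROC MIXED sketch with a random intercept for each participant. The dataset `bplong` and the variable `subject` are hypothetical and assume repeated blood pressure readings per participant, which the example in this chapter does not have.

```sas
/* Hypothetical random-intercept model.
   bplong and subject are illustrative names, not data from this chapter. */
proc mixed data=bplong;
   class grp subject;
   model sysbp = grp / solution;          /* fixed effects: the beta terms */
   random intercept / subject=subject;    /* random effect: the U term */
run;
```

The MODEL statement carries the fixed part of the equation, while the RANDOM statement adds the participant-level term.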

**Applying PROC GLM to evaluate a one-way ANOVA design.**

The following describes a 12-week experiment in which researchers were interested in the effects of coffee consumption on resting systolic blood pressure in a sample of healthy male participants. The study participants were randomly selected from the total sample of volunteers and randomly allocated to three groups of 20. Each morning of the 12-week program, between 6 and 8 am, Group 1 consumed a total of 2000 ml of coffee, Group 2 consumed a total of 2000 ml of de-caffeinated coffee, and Group 3 consumed a total of 2000 ml of hot water with no additive. Resting systolic blood pressure was measured on day 84 and recorded in the following table; this day-84 measure is the dependent variable. The raw data and SAS code are shown below:

| Group 1 – caffeinated coffee: Systolic Blood Pressure (mmHg) | Group 2 – de-caffeinated coffee: Systolic Blood Pressure (mmHg) | Group 3 – Placebo: Systolic Blood Pressure (mmHg) |
|---|---|---|
| 134 | 115 | 125 |
| 152 | 114 | 126 |
| 161 | 119 | 128 |
| 139 | 115 | 122 |
| 149 | 114 | 126 |
| 158 | 113 | 117 |
| 167 | 115 | 113 |
| 151 | 111 | 116 |
| 148 | 123 | 114 |
| 144 | 110 | 115 |
| 124 | 115 | 129 |
| 122 | 116 | 116 |
| 121 | 113 | 118 |
| 129 | 119 | 112 |
| 129 | 111 | 116 |
| 128 | 112 | 127 |
| 127 | 110 | 123 |
| 131 | 115 | 126 |
| 128 | 111 | 124 |
| 124 | 114 | 125 |

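    data glm1;
       title 'GLM analysis of Systolic Blood Pressure Data';
       /* Each data line holds one reading per group: Group 1, Group 2, Group 3 */
       input bp1-bp3;
       array bp{3} bp1-bp3;
       do grp = 1 to 3;
          id + 1;             /* running participant number, 1 to 60 */
          sysbp = bp{grp};
          output;             /* write one observation per participant */
       end;
       drop bp1-bp3;
    datalines;
    134 115 125
    152 114 126
    161 119 128
    139 115 122
    149 114 126
    158 113 117
    167 115 113
    151 111 116
    148 123 114
    144 110 115
    124 115 129
    122 116 116
    121 113 118
    129 119 112
    129 111 116
    128 112 127
    127 110 123
    131 115 126
    128 111 124
    124 114 125
    ;
    run;

    proc sort data=glm1;
       by id;
    run;

    proc glm data=glm1;
       class grp;
       model sysbp = grp;
    run;
    quit;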
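Before interpreting the model, it can help to confirm that the data were read as intended. A minimal check, assuming the dataset `glm1` is meant to hold one row per participant with the variables `grp` and `sysbp`, might look like this:

```sas
/* Descriptive check of the data set before modelling */
proc means data=glm1 n mean std min max maxdec=2;
   class grp;     /* one summary row per group */
   var sysbp;
run;
```

Each group should show N = 20, and the group means should match the least squares means reported further on in the output (138.30, 114.25 and 120.90). If they do not, the INPUT statement is the first place to look.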

The output from this SAS Program is explained below.

**GLM analysis of Systolic Blood Pressure Data using Systolic Blood Pressure (SYSBP) as the Dependent Variable**

| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| Model | 2 | 6169.23333 | 3084.61667 | 37.57 | <.0001 |
| Error | 57 | 4679.75000 | 82.10088 | | |
| Corrected Total | 59 | 10848.98333 | | | |

| R-Square | Coeff Var | Root MSE | sysbp Mean |
|---|---|---|---|
| 0.568646 | 7.278849 | 9.060953 | 124.4833 |

| Source | DF | Type I SS | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| grp | 2 | 6169.233333 | 3084.616667 | 37.57 | <.0001 |

| Source | DF | Type III SS | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| grp | 2 | 6169.233333 | 3084.616667 | 37.57 | <.0001 |
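The summary statistics in this output all follow from the sums of squares and degrees of freedom. The DATA step below is a sketch of that arithmetic; the numbers are typed in from the tables above, and the variable names are arbitrary.

```sas
/* Recompute the ANOVA summary statistics from the printed sums of squares */
data _null_;
   ss_model   = 6169.23333;   df_model = 2;
   ss_error   = 4679.75000;   df_error = 57;
   grand_mean = 124.4833;

   ms_model  = ss_model / df_model;                      /* variance between samples */
   ms_error  = ss_error / df_error;                      /* variance within samples  */
   f_value   = ms_model / ms_error;
   p_value   = 1 - probf(f_value, df_model, df_error);   /* upper-tail probability of F */
   r_square  = ss_model / (ss_model + ss_error);
   root_mse  = sqrt(ms_error);
   coeff_var = 100 * root_mse / grand_mean;

   put f_value= 8.2 r_square= 8.6 root_mse= 9.6 coeff_var= 9.6;
   put p_value= pvalue6.4;
run;
```

The log should show values that agree with the output: F of 37.57, R-Square of about 0.5686, Root MSE of about 9.06 and a coefficient of variation of about 7.28.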

The comparison of means across groups was analyzed using the SAS code **lsmeans grp / adjust=scheffe;** as shown here.

**GLM analysis of Systolic Blood Pressure Data**

**The GLM Procedure using Least Squares Means Adjustment for Multiple Comparisons: Scheffe**

| grp | sysbp LSMEAN | LSMEAN Number |
|---|---|---|
| 1 | 138.300000 | 1 |
| 2 | 114.250000 | 2 |
| 3 | 120.900000 | 3 |

Least Squares Means for effect grp, Pr > |t| for H0: LSMean(i)=LSMean(j). Dependent Variable: sysbp

| i/j | 1 | 2 | 3 |
|---|---|---|---|
| 1 | | <.0001 | <.0001 |
| 2 | <.0001 | | 0.0763 |
| 3 | <.0001 | 0.0763 | |
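The LSMEANS statement has to appear inside a PROC GLM step, after the MODEL statement. A complete step that should produce the least squares means and the Scheffe-adjusted probability matrix above, assuming the dataset is named `glm1`, could be written as follows. The `pdiff` option is spelled out here for clarity.

```sas
proc glm data=glm1;
   class grp;
   model sysbp = grp;
   lsmeans grp / pdiff adjust=scheffe;   /* pairwise p-values with Scheffe adjustment */
run;
quit;
```

Reading the matrix, group 1 differs from both other groups, while groups 2 and 3 do not differ at the 0.05 level after adjustment (p = 0.0763).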

**means grp / hovtest welch t tukey scheffe;**

**GLM analysis of Systolic Blood Pressure Data - Main Effects Analysis**

Levene's Test for Homogeneity of sysbp Variance: ANOVA of Squared Deviations from Group Means

| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| grp | 2 | 406309 | 203155 | 15.59 | <.0001 |
| Error | 57 | 742833 | 13032.2 | | |

Welch's ANOVA for sysbp

| Source | DF | F Value | Pr > F |
|---|---|---|---|
| grp | 2.0000 | 33.35 | <.0001 |
| Error | 32.1316 | | |

**GLM analysis of Systolic Blood Pressure Data with the Post Hoc t Tests (LSD) for sysbp**

**Note:** This test controls the Type I comparison-wise error rate, not the experiment-wise error rate.

| Alpha | 0.05 |
|---|---|
| Error Degrees of Freedom | 57 |
| Error Mean Square | 82.10088 |
| Critical Value of t | 2.00247 |
| Least Significant Difference | 5.7377 |

Means with the same letter are not significantly different.

| t Grouping | Mean | N | grp |
|---|---|---|---|
| A | 138.300 | 20 | 1 |
| B | 120.900 | 20 | 3 |
| C | 114.250 | 20 | 2 |

**GLM analysis of Systolic Blood Pressure Data with the Tukey's Studentized Range (HSD) Test for sysbp**

**Note:** This test controls the Type I experiment-wise error rate, but it generally has a higher Type II error rate than REGWQ.

| Alpha | 0.05 |
|---|---|
| Error Degrees of Freedom | 57 |
| Error Mean Square | 82.10088 |
| Critical Value of Studentized Range | 3.40311 |
| Minimum Significant Difference | 6.895 |

Means with the same letter are not significantly different.

| Tukey Grouping | Mean | N | grp |
|---|---|---|---|
| A | 138.300 | 20 | 1 |
| B | 120.900 | 20 | 3 |
| B | 114.250 | 20 | 2 |

**GLM analysis of Systolic Blood Pressure Data with the Scheffe's Test for sysbp**

**Note:** This test controls the Type I experiment-wise error rate.

| Alpha | 0.05 |
|---|---|
| Error Degrees of Freedom | 57 |
| Error Mean Square | 82.10088 |
| Critical Value of F | 3.15884 |
| Minimum Significant Difference | 7.202 |

Means with the same letter are not significantly different.

| Scheffe Grouping | Mean | N | grp |
|---|---|---|---|
| A | 138.300 | 20 | 1 |
| B | 120.900 | 20 | 3 |
| B | 114.250 | 20 | 2 |
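The three post hoc tests differ only in the critical value applied to the same error mean square. The sketch below recomputes the three thresholds with standard SAS quantile functions; the inputs are copied from the tables above and the variable names are arbitrary.

```sas
/* Minimum differences required for significance under LSD, Tukey and Scheffe */
data _null_;
   alpha = 0.05;  mse = 82.10088;  df_error = 57;  k = 3;  n = 20;

   se_diff = sqrt(2 * mse / n);     /* standard error of a difference between two means */

   t_crit  = tinv(1 - alpha/2, df_error);                  /* LSD */
   lsd     = t_crit * se_diff;

   q_crit  = probmc('RANGE', ., 1 - alpha, df_error, k);   /* Tukey studentized range */
   hsd     = q_crit * sqrt(mse / n);

   f_crit  = finv(1 - alpha, k - 1, df_error);             /* Scheffe */
   scheffe = sqrt((k - 1) * f_crit) * se_diff;

   put t_crit= lsd= / q_crit= hsd= / f_crit= scheffe=;
run;
```

The results should agree with the printed thresholds of 5.7377, 6.895 and 7.202. The difference between groups 3 and 2 is 120.90 - 114.25 = 6.65, which exceeds the LSD threshold but not the Tukey or Scheffe thresholds. That is why the LSD test places the two groups under different letters while the other two tests give them the same letter.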

If we rerun the analysis with the CLASS statement removed, PROC GLM treats grp as a numeric predictor and fits a simple linear regression, so the output reports the coefficients (intercept and slope) for the independent variable.

**proc glm data=glm1;
model sysbp = grp;
run;**

| Parameter | Estimate | Standard Error | t Value | Pr > \|t\| |
|---|---|---|---|---|
| Intercept | 141.8833333 | 3.96644269 | 35.77 | <.0001 |
| grp | -8.7000000 | 1.83610618 | -4.74 | <.0001 |
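These coefficients describe a straight line through the group codes 1, 2 and 3, so they should be read with some care: the codes are labels, not measured quantities. The sketch below compares the fitted values from the line with the observed group means, and then shows one alternative way to request coefficients while keeping grp as a classification variable. The dataset name `glm1` is assumed.

```sas
/* Fitted values from the regression that treats grp as a number */
data _null_;
   intercept = 141.8833333;
   slope     = -8.7;
   array group_mean{3} _temporary_ (138.30 114.25 120.90);
   do grp = 1 to 3;
      predicted = intercept + slope * grp;
      observed  = group_mean{grp};
      put grp= predicted= 8.2 observed= 8.2;
   end;
run;

/* Alternative: keep CLASS and ask for parameter estimates */
proc glm data=glm1;
   class grp;
   model sysbp = grp / solution;
run;
quit;
```

The fitted values are 133.18, 124.48 and 115.78, which do not match the observed means well because the line forces equal steps between groups. With CLASS and the SOLUTION option, the estimates are expressed relative to the last group, so the intercept should equal the group 3 mean (120.90) and the grp coefficients should be 17.40 for group 1 and -6.65 for group 2.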

**Adding A Second Grouping Factor To a GLM Model**

Consider the analysis from the PROC ANOVA computations in Chapter 9, where we were interested in evaluating the effects of introducing a one-hour activity break into the workday, believing that such an opportunity could reduce the resting heart rates of the participants and thereby lead to a healthier workforce.

You will recall that the research design began with 66 participants who were randomly selected from a sample of employees within the company and randomly allocated to one of three treatment groups. In the following analysis, we used PROC GLM and the post hoc procedure LSMEANS to examine the interaction cell by cell, comparing the individual cell means across the treatment levels (walking versus dancing versus book reading) for each level of sex (males versus females).

**proc glm data=anova2x3;
title 'Using PROC GLM to determine interaction effect';
class sex group;
model hrchange = sex group sex*group;
lsmeans sex*group / pdiff;
run;**

The results from the LSMEANS analysis are shown here.

**Using PROC GLM to determine interaction effect**

**The GLM Procedure: Least Squares Means**

| sex | group | hrchange LSMEAN | LSMEAN Number |
|---|---|---|---|
| F | 1 | -4.5454545 | 1 |
| F | 2 | -10.3181818 | 2 |
| F | 3 | 5.8181818 | 3 |
| M | 1 | -4.2727273 | 4 |
| M | 2 | -2.0000000 | 5 |
| M | 3 | 6.5454545 | 6 |

Least Squares Means for effect sex*group, Pr > |t| for H0: LSMean(i)=LSMean(j). Dependent Variable: hrchange

| i/j | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| 1 | | <.0001 | <.0001 | 0.8183 | 0.0336 | <.0001 |
| 2 | <.0001 | | <.0001 | <.0001 | <.0001 | <.0001 |
| 3 | <.0001 | <.0001 | | <.0001 | <.0001 | 0.5404 |
| 4 | 0.8183 | <.0001 | <.0001 | | 0.0573 | <.0001 |
| 5 | 0.0336 | <.0001 | <.0001 | 0.0573 | | <.0001 |
| 6 | <.0001 | <.0001 | 0.5404 | <.0001 | <.0001 | |

Notice that the matrix gives the probability level for each pairwise comparison between cell means. Most comparisons were significantly different; only three showed a probability level of p > 0.05: cells 1 and 4 (p = 0.8183), cells 3 and 6 (p = 0.5404), and cells 4 and 5 (p = 0.0573). These results support the notion that being physically active, whether it be dancing or walking as planned exercise, has a positive effect on reducing resting heart rates, and more so for females than males.
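One possible follow-up, which is not part of the original analysis, is to test the treatment effect separately within each sex and to adjust the pairwise probabilities for multiple comparisons. The sketch below assumes the same dataset `anova2x3` and uses the SLICE= and ADJUST= options of the LSMEANS statement.

```sas
proc glm data=anova2x3;
   class sex group;
   model hrchange = sex group sex*group;
   /* slice=sex tests the group effect within each sex;
      adjust=tukey adjusts the pairwise p-values */
   lsmeans sex*group / slice=sex pdiff adjust=tukey;
run;
quit;
```

With 15 pairwise comparisons among six cells, adjusted probabilities will generally be larger than the unadjusted ones, so borderline results such as the comparison of cells 1 and 5 may no longer reach the 0.05 level.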