Research Design Applications with PROC GLM

William J. Montelpare; Emily Read; Teri McComber; Alyson Mahar; Krista Ritchie

Learner Outcomes

After reading this chapter you should be able to:

Compute the significance of the difference between three or more sample means using PROC GLM for the one-way analysis of variance test
Compute the significance of the association between an outcome and one or several predictors using PROC GLM as a linear regression model
Compute the post hoc comparison between sample means when the F statistic is significant using post hoc analysis procedures (in either ANOVA applications or linear regression applications)

INTRODUCTION TO GENERAL LINEAR MODELS IN SAS

A univariate general linear model is defined as a statistical model in which a dependent variable is modeled in relation to a set of predictor variables. The predictor variables can be categorical independent variables with multiple levels, or they can be a continuous variable, or the predictor variables can be a combination of categorical and continuous independent variables. In the application of statistical processing for research designs, where the dependent variable is a continuous scaled score, and the independent variables are categorically scored, the researcher can use either the analysis of variance or a general linear model.

In SAS, the F statistic can be computed with either the PROC ANOVA procedure described previously or the PROC GLM procedure. Both offer similar post-analytic processes to establish not only the significance of the main effects but also characteristics of the distribution, such as measures of normality and equality of variance. However, PROC ANOVA has limitations that make PROC GLM the more appropriate choice in some designs. For example, PROC GLM is preferable to PROC ANOVA when the comparison groups are unbalanced, when categorical and continuous predictors are combined, as in an analysis of covariance, and when the dependent measure is evaluated using complex interactions, as in nested designs.
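
As a minimal sketch of the analysis-of-covariance case mentioned above, the following PROC GLM step combines a categorical predictor with a continuous covariate. The data set name (trial) and the variable names (treatment, baseline, outcome) are hypothetical and are not part of this chapter's data.

```sas
/* Hypothetical data set and variables, for illustration only */
proc glm data=trial;
   class treatment;                      /* categorical predictor */
   model outcome = treatment baseline;   /* baseline is not on CLASS, so it is a continuous covariate */
   lsmeans treatment / pdiff;            /* covariate-adjusted group means with pairwise p-values */
run;
quit;
```

Because baseline does not appear on the CLASS statement, it enters the model as a continuous covariate, which is the kind of mixed predictor set described in the paragraph above.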

In this chapter, we will explore the SAS application of the PROC GLM procedures to evaluate the F statistic represented by the statement: F = variance between samples divided by the variance within samples. Next, we will explore the relationship between the outcome and predictor variables based on the concept that the dependent variable = independent variable ± error, which we can represent algebraically as: [latex]Y_{ij} = \beta_{0} \pm \beta_{i}X_{i} + \epsilon[/latex]

Extending this General Linear Model (GLM) approach, we will introduce the General Linear Mixed Model, which we will analyze with the PROC MIXED application. The mixed model adds the parameter [latex]U_{i}[/latex], which represents the random effect, to the General Linear Model equation: [latex]Y_{ij} = \beta_{0} \pm \beta_{i}X_{i} \pm U_{i} + \epsilon[/latex]
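
To illustrate where the random effect enters, here is a minimal PROC MIXED sketch. It assumes a hypothetical data set (bp_long) in which each participant (subject) contributes several sysbp measurements; these names are assumptions for illustration and do not refer to data used in this chapter.

```sas
/* Hypothetical data set and variables, for illustration only */
proc mixed data=bp_long;
   class subject grp;
   model sysbp = grp / solution;          /* fixed effects, with coefficient estimates printed */
   random intercept / subject=subject;    /* random intercept for each participant */
run;
```

The RANDOM statement is what distinguishes this step from a PROC GLM step with the same MODEL statement.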

Applying PROC GLM to evaluate a one-way ANOVA design.

The following describes a 12-week experiment in which researchers were interested in the effects of coffee consumption on resting systolic blood pressure in a sample of healthy male participants. The study participants were randomly selected from the total sample of volunteers and randomly allocated to three groups of 20 individuals each. Every morning of the 12-week program, between the hours of 6 and 8 am, Group 1 was asked to consume a total of 2000 ml of coffee, Group 2 a total of 2000 ml of de-caffeinated coffee, and Group 3 a total of 2000 ml of hot water with no additive. Resting systolic blood pressure was measured on day 84 and recorded in the following table; this day-84 measurement is the dependent variable. The raw data and SAS code are shown below:

Group 1 – caffeinated coffee Systolic Blood Pressure (mmHg)	Group 2 – de-caffeinated coffee Systolic Blood Pressure (mmHg)	Group 3 – Placebo Systolic Blood Pressure (mmHg)
134	115	125
152	114	126
161	119	128
139	115	122
149	114	126
158	113	117
167	115	113
151	111	116
148	123	114
144	110	115
124	115	129
122	116	116
121	113	118
129	119	112
129	111	116
128	112	127
127	110	123
131	115	126
128	111	124
124	114	125

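options pagesize=55 linesize=120 center date;
data glm1;
title 'GLM analysis of Systolic Blood Pressure Data';
/* Each data line holds one score for each of groups 1, 2 and 3. */
/* The loop reads the three scores and writes one observation per participant. */
do grp = 1 to 3;
   input sysbp @;   /* trailing @ holds the line so the next score can be read */
   id + 1;          /* running participant number, 1 to 60 */
   output;
end;
datalines;
134 115 125
152 114 126
161 119 128
139 115 122
149 114 126
158 113 117
167 115 113
151 111 116
148 123 114
144 110 115
124 115 129
122 116 116
121 113 118
129 119 112
129 111 116
128 112 127
127 110 123
131 115 126
128 111 124
124 114 125
;
run;
proc sort data=glm1; by id;
run;
proc glm data=glm1;
class grp; model sysbp = grp;
run;
quit;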

The output from this SAS Program is explained below.

GLM analysis of Systolic Blood Pressure Data using Systolic Blood Pressure (SYSBP) as the Dependent Variable

Source	DF	Sum of Squares	Mean Square	F Value	Pr > F
Model	2	6169.23333	3084.61667	37.57	<.0001
Error	57	4679.75000	82.10088
Corrected Total	59	10848.98333

R-Square	Coeff Var	Root MSE	sysbp Mean
0.568646	7.278849	9.060953	124.4833

Source	DF	Type I SS	Mean Square	F Value	Pr > F
grp	2	6169.233333	3084.616667	37.57	<.0001

Source	DF	Type III SS	Mean Square	F Value	Pr > F
grp	2	6169.233333	3084.616667	37.57	<.0001
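
The entries in the output tables above are linked by simple arithmetic, which can be checked by hand. As a worked example using the printed values (small discrepancies are due to rounding):

[latex]MS_{Model} = \frac{SS_{Model}}{DF_{Model}} = \frac{6169.23333}{2} = 3084.61667[/latex]

[latex]MS_{Error} = \frac{SS_{Error}}{DF_{Error}} = \frac{4679.75000}{57} = 82.10088[/latex]

[latex]F = \frac{MS_{Model}}{MS_{Error}} = \frac{3084.61667}{82.10088} \approx 37.57[/latex]

[latex]R^{2} = \frac{SS_{Model}}{SS_{Total}} = \frac{6169.23333}{10848.98333} \approx 0.5686[/latex]

[latex]\text{Root MSE} = \sqrt{82.10088} \approx 9.061, \quad \text{Coeff Var} = 100 \times \frac{9.060953}{124.4833} \approx 7.279[/latex]

Because grp is the only effect in the model, its Type I and Type III sums of squares are both equal to the Model sum of squares.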

The group means were then compared by adding the statement lsmeans grp / adjust=scheffe; to the PROC GLM step. The results are shown here.
GLM analysis of Systolic Blood Pressure Data
The GLM Procedure using Least Squares Means Adjustment for Multiple Comparisons: Scheffe

grp	sysbp LSMEAN	LSMEAN Number
1	138.300000	1
2	114.250000	2
3	120.900000	3

Least Squares Means for effect grp
Pr > |t| for H0: LSMean(i)=LSMean(j)
Dependent Variable: sysbp
i/j	1	2	3
1		<.0001	<.0001
2	<.0001		0.0763
3	<.0001	0.0763

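Homogeneity of variance and the post hoc comparisons of the group means were requested by adding the following MEANS statement to the PROC GLM step. HOVTEST requests Levene's test, WELCH requests Welch's ANOVA, and TUKEY and SCHEFFE request the corresponding multiple comparison tests. The LSD t test table shown below additionally requires the T (or LSD) option.

means grp /hovtest welch tukey scheffe;

In the output that follows, Levene's test is significant (p < .0001), indicating that the group variances are not equal. Welch's ANOVA, which does not assume equal variances, still shows a significant group effect (F = 33.35, p < .0001).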
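
Placed in context, the LSMEANS and MEANS statements belong inside the PROC GLM step, after the MODEL statement. The following is a sketch of one step that requests the analyses discussed in this section. It assumes the data set glm1 with the variables grp and sysbp; the T option is an assumption added here because the output includes an LSD t test table, and it does not appear in the statement quoted in the text.

```sas
/* Sketch: one PROC GLM step for the one-way design in this section. */
/* Assumes data set glm1 with variables grp and sysbp.               */
proc glm data=glm1;
   class grp;
   model sysbp = grp;
   lsmeans grp / adjust=scheffe;                /* LS-means with Scheffe-adjusted p-values */
   means grp / hovtest welch t tukey scheffe;   /* Levene, Welch, LSD (T is an assumption), Tukey, Scheffe */
run;
quit;
```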

GLM analysis of Systolic Blood Pressure Data- Main Effects Analysis

Levene’s Test for Homogeneity of sysbp Variance ANOVA of Squared Deviations from Group Means
Source	DF	Sum of Squares	Mean Square	F Value	Pr > F
grp	2	406309	203155	15.59	<.0001
Error	57	742833	13032.2

Welch’s ANOVA for sysbp
Source	DF	F Value	Pr > F
grp	2.0000	33.35	<.0001
Error	32.1316

GLM analysis of Systolic Blood Pressure Data with the Post Hoc t Tests (LSD) for sysbp

Note: This test controls the Type I comparison wise error rate, not the experiment wise error rate.

Alpha	0.05
Error Degrees of Freedom	57
Error Mean Square	82.10088
Critical Value of t	2.00247
Least Significant Difference	5.7377

Means with the same letter are not significantly different.
t Grouping	Mean	N	grp
A	138.300	20	1
B	120.900	20	3
C	114.250	20	2
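
With equal group sizes, the least significant difference reported above follows from the critical t value and the error mean square. As a worked check, with n = 20 per group:

[latex]LSD = t_{crit}\sqrt{\frac{2 \, MS_{Error}}{n}} = 2.00247\sqrt{\frac{2(82.10088)}{20}} \approx 5.738[/latex]

The smallest gap between two group means is 120.900 - 114.250 = 6.650, which exceeds 5.738, so all three groups receive different letters under this test.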

GLM analysis of Systolic Blood Pressure Data with the Tukey’s Studentized Range (HSD) Test for sysbp

Note: This test controls the Type I experiment-wise error rate, but it generally has a higher Type II error rate than REGWQ.

Alpha	0.05
Error Degrees of Freedom	57
Error Mean Square	82.10088
Critical Value of Studentized Range	3.40311
Minimum Significant Difference	6.895

Means with the same letter are not significantly different.
Tukey Grouping	Mean	N	grp
A	138.300	20	1
B	120.900	20	3
B	114.250	20	2

GLM analysis of Systolic Blood Pressure Data with the Scheffe’s Test for sysbp

Note: This test controls the Type I experiment-wise error rate.

Alpha	0.05
Error Degrees of Freedom	57
Error Mean Square	82.10088
Critical Value of F	3.15884
Minimum Significant Difference	7.202

Means with the same letter are not significantly different.
Scheffe Grouping	Mean	N	grp
A	138.300	20	1
B	120.900	20	3
B	114.250	20	2
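
The minimum significant differences for the Tukey and Scheffe tests can be checked in the same way from the printed critical values, with n = 20 per group and k = 3 groups:

[latex]MSD_{Tukey} = q_{crit}\sqrt{\frac{MS_{Error}}{n}} = 3.40311\sqrt{\frac{82.10088}{20}} \approx 6.895[/latex]

[latex]MSD_{Scheffe} = \sqrt{(k-1)F_{crit}}\sqrt{\frac{2 \, MS_{Error}}{n}} = \sqrt{2(3.15884)}\sqrt{\frac{2(82.10088)}{20}} \approx 7.202[/latex]

The difference between groups 3 and 2 (120.900 - 114.250 = 6.650) is smaller than both thresholds, which is why these two groups share the letter B under the Tukey and Scheffe tests even though the LSD test separated them.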

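If we rerun the analysis with the class statement removed, SAS treats grp as a numeric predictor (coded 1, 2, 3) rather than as a categorical factor, and fits a simple linear regression of sysbp on the group code. The output then reports the intercept and slope of that regression.

proc glm data=glm1;
model sysbp = grp;
run;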

Parameter	Estimate	Standard Error	t Value	Pr > |t|
Intercept	141.8833333	3.96644269	35.77	<.0001
grp	-8.7000000	1.83610618	-4.74	<.0001
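
These two estimates can be recovered from the group means reported earlier. As a worked check, for three equally sized groups coded 1, 2 and 3, the least squares slope reduces to half the difference between the means of groups 3 and 1, and the intercept follows from the grand mean and the mean group code of 2:

[latex]b_{1} = \frac{\bar{Y}_{3} - \bar{Y}_{1}}{2} = \frac{120.90 - 138.30}{2} = -8.70[/latex]

[latex]b_{0} = \bar{Y} - b_{1}\bar{X} = 124.4833 - (-8.70)(2) = 141.8833[/latex]

The slope therefore describes a linear trend across the numeric group codes, so it depends on how the groups happen to be numbered. To keep grp categorical and still print coefficients, one option is to retain the class statement and add the SOLUTION option, as in model sysbp = grp / solution;.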

Adding A Second Grouping Factor To a GLM Model

Consider the analysis from the PROC ANOVA computations in Chapter 9, where we evaluated the effects of introducing a one-hour activity break into the workday, believing that such an opportunity could reduce the resting heart rates of the participants and thereby lead to a healthier workforce.

You will recall that the research design began with 66 participants who were randomly selected from a sample of employees within the company and randomly allocated to one of three treatment groups. In the following analysis, we used PROC GLM and the post hoc procedure LSMEANS to evaluate the sex-by-group interaction by comparing the individual cell means across the treatment levels (walking versus dancing versus book reading) for each level of sex (males versus females).

proc glm data=anova2x3;
title 'Using PROC GLM to determine interaction effect';
class sex group;
model hrchange = sex group sex*group;
lsmeans sex*group / diff;
run;

The results from the LSMEANS analysis are shown here.

Using PROC GLM to determine interaction effect

The GLM Procedure: Least Squares Means

sex	group	hrchange LSMEAN	LSMEAN Number
F	1	-4.5454545	1
F	2	-10.3181818	2
F	3	5.8181818	3
M	1	-4.2727273	4
M	2	-2.0000000	5
M	3	6.5454545	6

Least Squares Means for effect sex*group
Pr > |t| for H0: LSMean(i)=LSMean(j)
Dependent Variable: hrchange
i/j	1	2	3	4	5	6
1		<.0001	<.0001	0.8183	0.0336	<.0001
2	<.0001		<.0001	<.0001	<.0001	<.0001
3	<.0001	<.0001		<.0001	<.0001	0.5404
4	0.8183	<.0001	<.0001		0.0573	<.0001
5	0.0336	<.0001	<.0001	0.0573		<.0001
6	<.0001	<.0001	0.5404	<.0001	<.0001

Notice that the matrix gives the probability level for each pairwise comparison between cell means. Since most comparisons were significantly different, it is simpler to note the comparisons with a probability level of p > 0.05: cell 1 versus cell 4 (p = 0.8183), cell 3 versus cell 6 (p = 0.5404), and cell 4 versus cell 5 (p = 0.0573). These results support the notion that being physically active, whether it be dancing or walking as planned exercise, has a positive effect on reducing resting heart rates, and more so for females than males.
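
The matrix above contains 15 pairwise comparisons among the six cells and appears to report unadjusted p-values, since no ADJUST= option was given. As a sketch of two possible follow-up requests, the step below tests the group effect separately within each sex and repeats the cell comparisons with a Tukey adjustment. It assumes the same data set (anova2x3) and variables as the step above; it is offered as an illustration and is not part of the original analysis.

```sas
/* Sketch: simple-effects tests and adjusted cell comparisons. */
/* Assumes data set anova2x3 with variables sex, group and hrchange. */
proc glm data=anova2x3;
   class sex group;
   model hrchange = sex group sex*group;
   lsmeans sex*group / slice=sex;            /* tests the group effect within each level of sex */
   lsmeans sex*group / pdiff adjust=tukey;   /* pairwise cell comparisons with Tukey-adjusted p-values */
run;
quit;
```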

License

Applied Statistics in Healthcare Research Copyright © 2020 by William J. Montelpare, Ph.D., Emily Read, Ph.D., Teri McComber, Alyson Mahar, Ph.D., and Krista Ritchie, Ph.D. is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, except where otherwise noted.
