Parametric Statistics

32 Research Design Applications with PROC GLM

Learner Outcomes

After reading this chapter you should be able to:

  • Compute the significance of the difference between three or more sample means using PROC GLM for the one-way analysis of variance test
  • Compute the significance of the association between an outcome and one or several predictors using PROC GLM as a linear regression model
  • Compute the post hoc comparison between sample means when the F statistic is significant using post hoc analysis procedures (in either ANOVA applications or linear regression applications)

INTRODUCTION TO GENERAL LINEAR MODELS IN SAS

A univariate general linear model is defined as a statistical model in which a dependent variable is modeled in relation to a set of predictor variables. The predictor variables can be categorical independent variables with multiple levels, or they can be a continuous variable, or the predictor variables can be a combination of categorical and continuous independent variables. In the application of statistical processing for research designs, where the dependent variable is a continuous scaled score, and the independent variables are categorically scored, the researcher can use either the analysis of variance or a general linear model.

In SAS, the F statistic can be computed with either the PROC ANOVA procedure described previously or with the PROC GLM procedure. Both offer similar post-analytic processes to establish not only the significance of the main effects but also the characteristics of the distribution, such as measures of normality and equality of variance. However, there are limitations to the application of PROC ANOVA which suggest that the use of PROC GLM is more appropriate. For example, the PROC GLM procedure is preferable to PROC ANOVA when using unbalanced comparison groups, when combining categorical and continuous predictors as in an analysis of covariance, and when attempting to evaluate the dependent measure using complex interactions as in nested designs.

In this chapter, we will explore the SAS application of the PROC GLM procedures to evaluate the F statistic represented by the statement: F = variance between samples divided by the variance within samples. Next, we will explore the relationship between the outcome and predictor variables based on the concept that the dependent variable = independent variable ± error, which we can represent algebraically as: [latex]Y_{ij} = \beta_{0} \pm \beta_{i}X_{i} + \epsilon[/latex]

Extending from this General Linear Model (GLM) approach, we will introduce the General Linear Mixed Model, which we will analyze with the PROC MIXED application. This model adds the parameter [latex]U_{i}[/latex], which represents the random effect in the model, to the General Linear Model equation: [latex]Y_{ij} = \beta_{0} \pm \beta_{i}X_{i} \pm U_{i} + \epsilon[/latex]

Applying PROC GLM to evaluate a one-way ANOVA design.

The following describes a 12-week experiment in which researchers were interested in the effects of coffee consumption on resting systolic blood pressure for a sample of healthy male participants. The study participants were randomly selected from the total sample of volunteers and randomly allocated into three groups of 20 individuals each. Each group was asked to consume a total of 2000 ml of an assigned beverage each morning of the 12-week program between the hours of 6 and 8 am: Group 1 consumed coffee, Group 2 consumed de-caffeinated coffee, and Group 3 consumed hot water with no additive. Resting systolic blood pressure measures were taken on day 84 and recorded in the following table. The dependent variable was therefore the resting systolic blood pressure on day 84. The raw data and SAS code are shown below:

Systolic blood pressure (mmHg) on day 84. Group 1 = caffeinated coffee, Group 2 = de-caffeinated coffee, Group 3 = placebo.

Group1 Group2 Group3
134 115 125
152 114 126
161 119 128
139 115 122
149 114 126
158 113 117
167 115 113
151 111 116
148 123 114
144 110 115
124 115 129
122 116 116
121 113 118
129 119 112
129 111 116
128 112 127
127 110 123
131 115 126
128 111 124
124 114 125
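
options pagesize=55 linesize=120 center date;
title 'GLM analysis of Systolic Blood Pressure Data';
data glm1;
  /* each data line holds one reading per group, in the order group 1, 2, 3 */
  input bp1 bp2 bp3;
  array bp{3} bp1-bp3;
  do grp = 1 to 3;
    id + 1;            /* running participant id, 1 to 60 */
    sysbp = bp{grp};
    output;            /* write one row per participant */
  end;
  keep id grp sysbp;
datalines;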
134 115 125
152 114 126
161 119 128
139 115 122
149 114 126
158 113 117
167 115 113
151 111 116
148 123 114
144 110 115
124 115 129
122 116 116
121 113 118
129 119 112
129 111 116
128 112 127
127 110 123
131 115 126
128 111 124
124 114 125
;
proc sort data=glm1; by id; run;
proc glm data=glm1;
  class grp;
  model sysbp = grp;
run;

The output from this SAS Program is explained below.

GLM analysis of Systolic Blood Pressure Data using Systolic Blood Pressure (SYSBP) as the Dependent Variable

Source DF Sum of Squares Mean Square F Value Pr > F
Model 2 6169.23333 3084.61667 37.57 <.0001
Error 57 4679.75000 82.10088
Corrected Total 59 10848.98333
R-Square Coeff Var Root MSE sysbp Mean
0.568646 7.278849 9.060953 124.4833
Source DF Type I SS Mean Square F Value Pr > F
grp 2 6169.233333 3084.616667 37.57 <.0001
Source DF Type III SS Mean Square F Value Pr > F
grp 2 6169.233333 3084.616667 37.57 <.0001
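
As a rough check on how the summary statistics in this output relate to one another, the arithmetic below recomputes them from the sums of squares and degrees of freedom reported above. The subscripts model, error, and total are labels chosen here for illustration, not SAS names.

[latex]MS_{model} = \frac{6169.23333}{2} = 3084.61667 \qquad MS_{error} = \frac{4679.75000}{57} = 82.10088[/latex]

[latex]F = \frac{MS_{model}}{MS_{error}} = \frac{3084.61667}{82.10088} \approx 37.57[/latex]

[latex]R^{2} = \frac{SS_{model}}{SS_{total}} = \frac{6169.23333}{10848.98333} \approx 0.5686[/latex]

[latex]\text{Root MSE} = \sqrt{82.10088} \approx 9.061 \qquad \text{Coeff Var} = 100 \times \frac{9.060953}{124.4833} \approx 7.279[/latex]

Because grp is the only term in the model, the Type I and Type III sums of squares for grp equal the Model sum of squares.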

The comparison of means across groups was analyzed by adding the statement lsmeans grp / adjust=scheffe; to the PROC GLM step, which produced the following output.
GLM analysis of Systolic Blood Pressure Data
The GLM Procedure using Least Squares Means Adjustment for Multiple Comparisons: Scheffe

grp sysbp LSMEAN LSMEAN Number
1 138.300000 1
2 114.250000 2
3 120.900000 3
Least Squares Means for effect grp
Pr > |t| for H0: LSMean(i)=LSMean(j)
Dependent Variable: sysbp
i/j    1         2         3
1                <.0001    <.0001
2      <.0001              0.0763
3      <.0001    0.0763
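
For readers who want to see where the lsmeans statement sits, the following is a minimal sketch of one way the full step might be written. It assumes the data set glm1 holds one row per participant with the variables grp and sysbp; the explicit pdiff option and the closing quit statement are additions made here for clarity.

```sas
proc glm data=glm1;
  class grp;                               /* treat grp as a categorical factor */
  model sysbp = grp;                       /* one-way ANOVA model */
  lsmeans grp / pdiff adjust=scheffe;      /* pairwise p-values with Scheffe adjustment */
run;
quit;                                      /* end the interactive PROC GLM session */
```

In the matrix above, the row and column numbers refer to the LSMEAN Number column, so the value 0.0763 is the adjusted p-value for the comparison of group 2 with group 3.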

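The homogeneity of variance tests and the post hoc comparisons of the group means shown below were requested by adding the following statement to the PROC GLM step. The t option requests the LSD t tests that appear in the output.

means grp / hovtest welch t tukey scheffe;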

GLM analysis of Systolic Blood Pressure Data- Main Effects Analysis

Levene’s Test for Homogeneity of sysbp Variance
ANOVA of Squared Deviations from Group Means
Source DF Sum of Squares Mean Square F Value Pr > F
grp 2 406309 203155 15.59 <.0001
Error 57 742833 13032.2
Welch’s ANOVA for sysbp
Source DF F Value Pr > F
grp 2.0000 33.35 <.0001
Error 32.1316

 GLM analysis of Systolic Blood Pressure Data with the Post Hoc t Tests (LSD) for sysbp

Note: This test controls the Type I comparison-wise error rate, not the experiment-wise error rate.

Alpha 0.05
Error Degrees of Freedom 57
Error Mean Square 82.10088
Critical Value of t 2.00247
Least Significant Difference 5.7377
Means with the same letter are not significantly different.
t Grouping Mean N grp
A 138.300 20 1
B 120.900 20 3
C 114.250 20 2

 GLM analysis of Systolic Blood Pressure Data with the Tukey’s Studentized Range (HSD) Test for sysbp

 Note: This test controls the Type I experiment-wise error rate, but it generally has a higher Type II error rate than REGWQ.

Alpha 0.05
Error Degrees of Freedom 57
Error Mean Square 82.10088
Critical Value of Studentized Range 3.40311
Minimum Significant Difference 6.895
Means with the same letter are not significantly different.
Tukey Grouping Mean N grp
A 138.300 20 1
B 120.900 20 3
B 114.250 20 2

GLM analysis of Systolic Blood Pressure Data with the Scheffe’s Test for sysbp

Note: This test controls the Type I experiment-wise error rate.

Alpha 0.05
Error Degrees of Freedom 57
Error Mean Square 82.10088
Critical Value of F 3.15884
Minimum Significant Difference 7.202
Means with the same letter are not significantly different.
Scheffe Grouping Mean N grp
A 138.300 20 1
B 120.900 20 3
B 114.250 20 2
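
The three minimum differences reported above can be reproduced by hand. The sketch below assumes the usual equal-sample-size formulas with n = 20 per group and k = 3 groups, and uses only the critical values and the error mean square printed in the output.

[latex]LSD = t \sqrt{\frac{2\,MS_{error}}{n}} = 2.00247 \sqrt{\frac{2(82.10088)}{20}} \approx 5.7377[/latex]

[latex]MSD_{Tukey} = q \sqrt{\frac{MS_{error}}{n}} = 3.40311 \sqrt{\frac{82.10088}{20}} \approx 6.895[/latex]

[latex]MSD_{Scheffe} = \sqrt{(k-1)F} \sqrt{\frac{2\,MS_{error}}{n}} = \sqrt{2(3.15884)} \sqrt{\frac{2(82.10088)}{20}} \approx 7.202[/latex]

The difference between the group 3 and group 2 means is 120.900 - 114.250 = 6.650. This exceeds the LSD of 5.7377 but falls below the Tukey and Scheffe minimum differences, which is why the LSD test places groups 3 and 2 in separate groupings (B and C) while the two experiment-wise procedures give both the letter B.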

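If we rerun the analysis with the class statement removed, PROC GLM treats grp as a numeric predictor (coded 1, 2, 3) and fits a simple linear regression, which generates the intercept and slope coefficients for the independent variable.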

proc glm data=glm1;
  model sysbp = grp;
run;

Parameter Estimate Standard Error t Value Pr > |t|
Intercept 141.8833333 3.96644269 35.77 <.0001
grp -8.7000000 1.83610618 -4.74 <.0001
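
To see what these coefficients imply, the fitted line can be evaluated at each group code. The arithmetic below uses only the estimates above and the group means reported earlier; the hat notation for the predicted value is added here for illustration.

[latex]\hat{Y} = 141.8833 - 8.7000 \times grp[/latex]

[latex]grp = 1: \hat{Y} = 133.18 \qquad grp = 2: \hat{Y} = 124.48 \qquad grp = 3: \hat{Y} = 115.78[/latex]

The observed group means were 138.30, 114.25, and 120.90. The straight line therefore describes an average drop of 8.7 mmHg per unit increase in the group code, but it cannot reproduce the pattern in which group 2 has the lowest mean. Because the codes 1, 2, and 3 are labels rather than amounts, the version of the model that includes the class statement is generally the more suitable one for comparing these groups.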

Adding a Second Grouping Factor to a GLM Model

Consider the analysis from the PROC ANOVA computations in Chapter 9, where we were interested in evaluating the effects of introducing a one-hour activity break into the workday, believing that such an opportunity could reduce the resting heart rates of the participants and thereby lead to a healthier workforce.

You will recall that the research design began with 66 participants who were randomly selected from a sample of employees within the company and randomly allocated to one of three treatment groups. In the following analysis, we used PROC GLM and the post hoc procedure LSMEANS to evaluate the interaction by comparing the individual cell means across the treatment levels (walking versus dancing versus book reading) for each level of sex (males versus females).

proc glm data=anova2x3;
  title 'Using PROC GLM to determine interaction effect';
  class sex group;
  model hrchange = sex group sex*group;
  /* pdiff prints the matrix of p-values for all pairwise cell comparisons */
  lsmeans sex*group / pdiff;
run;

The results from the LSMEANS analysis are shown here.

Using PROC GLM to determine interaction effect

The GLM Procedure: Least Squares Means

sex group hrchange LSMEAN LSMEAN Number
F 1 -4.5454545 1
F 2 -10.3181818 2
F 3 5.8181818 3
M 1 -4.2727273 4
M 2 -2.0000000 5
M 3 6.5454545 6
Least Squares Means for effect sex*group
Pr > |t| for H0: LSMean(i)=LSMean(j)
Dependent Variable: hrchange
i/j    1         2         3         4         5         6
1                <.0001    <.0001    0.8183    0.0336    <.0001
2      <.0001              <.0001    <.0001    <.0001    <.0001
3      <.0001    <.0001              <.0001    <.0001    0.5404
4      0.8183    <.0001    <.0001              0.0573    <.0001
5      0.0336    <.0001    <.0001    0.0573              <.0001
6      <.0001    <.0001    0.5404    <.0001    <.0001

Notice that the matrix indicates the probability level at which the pairwise comparisons between cell means are different. Since most comparisons were significantly different, only the comparisons that showed a probability level of p > 0.05 are highlighted in red: cell 1 versus cell 4 (p = 0.8183), cell 3 versus cell 6 (p = 0.5404), and cell 4 versus cell 5 (p = 0.0573). These results support the notion that being physically active, whether it be dancing or walking as planned exercise, has a positive effect on reducing resting heart rates, and more so for females than males.
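
Because the matrix above reports unadjusted p-values for 15 pairwise comparisons, a reader may want a more focused or more conservative follow-up. The sketch below shows two options that PROC GLM provides on the lsmeans statement; it assumes the same data set (anova2x3) and variable names used above, and it is offered as an illustration rather than as the analysis that produced the results in this chapter.

```sas
proc glm data=anova2x3;
  class sex group;
  model hrchange = sex group sex*group;
  /* slice=sex requests an F test of the group effect within each level of sex */
  lsmeans sex*group / slice=sex;
  /* adjust=tukey applies a multiple-comparison adjustment to the pairwise p-values */
  lsmeans sex*group / pdiff adjust=tukey;
run;
quit;
```

The sliced tests address the question posed in the text directly, namely whether the treatment levels differ for each sex, while the adjusted comparisons control the experiment-wise error rate across all cell pairs.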

License

Applied Statistics in Healthcare Research Copyright © 2020 by William J. Montelpare, Ph.D., Emily Read, Ph.D., Teri McComber, Alyson Mahar, Ph.D., and Krista Ritchie, Ph.D. is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, except where otherwise noted.
