Parametric Statistics

# 33 Logistic Regression Analysis using PROC LOGISTIC

Learner Outcomes

After reading this chapter you should be able to:

• Define and compute a logit
• Define and describe simple logistic regression
• Create a SAS program to compute the outcome for a logistic regression application
• Describe the use of logistic regression in evaluating the null hypothesis
• Identify the critical components in the output generated from a logistic regression application
• Determine if your study design is multi-level (hierarchical design) and how to use PROC GLIMMIX to account for this design in your analyses

Introduction to Logistic Regression

Consider the application of logistic regression to be analogous to the computation of ordinary least squares (OLS) regression, which we studied previously using the PROC REG and PROC GLM applications. The difference is that in those general linear model applications the dependent variable was continuous. In logistic regression, by contrast, we are interested in evaluating a dependent variable that is binary, with outcome values limited to two possibilities (e.g., 0 or 1).

Most often we apply the logistic regression approach when the dependent variable is binary or dichotomous. We can call this approach binary logistic regression. The dependent variable can take on one of two outcome values like yes or no, 0 or 1, success or failure.

However, we can also use logistic regression to analyze data when the dependent variable has more than two categories, which we call multinomial logistic regression. In multinomial logistic regression the dependent variable is categorical: it takes a discrete value from more than two possible responses, as is the case in a multiple response categorical scale. The outcome measure can be a subjective value produced by a respondent, or it can result from arranging participants into specific groups or categories.

Figure 15.1 presents an example of a binary logistic regression model in which the dependent variable has one of two outcome values: cancer positive or cancer negative, and an exposure variable: exposure to tobacco smoke, as shown here.

Figure 15.1 Example of a binary logistic regression model

Consider the following data set for the model shown above.

Table of cancer outcomes related to smoking (disease status: lung cancer)

| Smoker status | Cancer positive | Cancer negative | Total |
|---|---|---|---|
| Smoker | 13 | 20 | 33 |
| Non-smoker | 6 | 41 | 47 |
| Total | 19 | 61 | 80 |

Table 15.1 Distribution of data for the nominal dependent variable cancer and the independent variable smoking status.
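The counts in Table 15.1 can be entered into SAS as one record per cell and cross-tabulated to confirm that they match the table. The following is a minimal sketch, not the chapter's own program: the data set name `smoke_cancer`, the variable names `smoker`, `cancer`, and `count`, and the 1/0 coding are all assumptions made for illustration.

```sas
/* Hypothetical names and coding; counts come from Table 15.1.   */
/* smoker: 1 = smoker, 0 = non-smoker                             */
/* cancer: 1 = cancer positive, 0 = cancer negative               */
data smoke_cancer;
    input smoker cancer count;
    datalines;
1 1 13
1 0 20
0 1 6
0 0 41
;
run;

/* WEIGHT tells PROC FREQ that each record stands for COUNT subjects. */
/* ORDER=DATA keeps rows and columns in the order they were entered.  */
proc freq data=smoke_cancer order=data;
    weight count;
    tables smoker*cancer / nopercent nocol;
run;
```

Entering cell counts with a weight variable avoids typing 80 individual records while giving the same cross-tabulation.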

In studying linear regression analyses we discussed the computation of the coefficients that are used to adjust the x variables (the independent measure(s); here the measure is smoking status) as they influence the y variable (the dependent measure). That is, we used simple linear regression and the PROC REG procedure to produce a slope score (the regression coefficient or parameter estimate), which acts on $x$ to produce the outcome ($y$). However, as we stated previously, in simple linear regression the dependent or outcome variable can take on any value from the real number line.

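In logistic regression the measure of interest is a binary value (one of two possible outcomes). Rather than modeling the binary value directly, we model the probability $p$ of the outcome of interest, which ranges from 0 to 1. That probability is converted mathematically to a value that we call a logit: the natural logarithm of the odds, which can take any value on the real number line. The logit is computed as follows.

$$\text{logit}(p) = \ln\left(\frac{p}{1-p}\right)$$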
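As a worked illustration, the logit can be computed by hand from the counts in Table 15.1. This assumes the standard definition of the logit as the natural logarithm of the odds, where $p$ is the proportion of cancer-positive cases within each smoking group.

$$
\begin{aligned}
\text{Smokers:}\quad p &= \tfrac{13}{33}, & \frac{p}{1-p} &= \tfrac{13}{20} = 0.65, & \ln(0.65) &\approx -0.431 \\
\text{Non-smokers:}\quad p &= \tfrac{6}{47}, & \frac{p}{1-p} &= \tfrac{6}{41} \approx 0.146, & \ln\left(\tfrac{6}{41}\right) &\approx -1.922
\end{aligned}
$$

The difference between the two logits, $-0.431 - (-1.922) \approx 1.491$, equals the natural logarithm of the odds ratio, $\ln\left(\tfrac{13 \times 41}{20 \times 6}\right) = \ln(4.44) \approx 1.491$.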

The logit is used in the logistic regression procedure where the logit represents the dependent variable and is forecast by a linear combination of the predictor variables. 
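For the smoking example, the linear combination has a single predictor, so the model can be written as logit(p) = b0 + b1 × smoker. The sketch below shows one way this might be fitted with PROC LOGISTIC. The data set name `smoke_cancer`, the variable names, and the 1/0 coding are assumptions made for illustration, and the DATA step is included so that the sketch runs on its own.

```sas
/* Hypothetical names and coding; counts come from Table 15.1.   */
/* smoker: 1 = smoker, 0 = non-smoker                             */
/* cancer: 1 = cancer positive, 0 = cancer negative               */
data smoke_cancer;
    input smoker cancer count;
    datalines;
1 1 13
1 0 20
0 1 6
0 0 41
;
run;

proc logistic data=smoke_cancer;
    /* FREQ treats each record as COUNT identical observations. */
    freq count;
    /* EVENT='1' models the probability that cancer = 1.        */
    /* smoker is entered as a numeric 0/1 predictor (no CLASS). */
    model cancer(event='1') = smoker;
run;
```

With this 0/1 coding, the intercept is the logit for non-smokers and the slope for `smoker` is the log of the odds ratio, so the estimates should be close to ln(6/41) ≈ -1.92 and ln((13 × 41)/(20 × 6)) ≈ 1.49. Without the `event='1'` option, PROC LOGISTIC by default models the lower ordered response value (here, cancer = 0), which reverses the signs of the estimates.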