{"id":996,"date":"2020-06-02T12:36:02","date_gmt":"2020-06-02T16:36:02","guid":{"rendered":"http:\/\/pressbooks.library.upei.ca\/montelpare\/?post_type=chapter&#038;p=996"},"modified":"2020-08-24T14:19:48","modified_gmt":"2020-08-24T18:19:48","slug":"survival-analysis","status":"publish","type":"chapter","link":"https:\/\/pressbooks.library.upei.ca\/montelpare\/chapter\/survival-analysis\/","title":{"raw":"Survival Analysis","rendered":"Survival Analysis"},"content":{"raw":"<h2 class=\"ABodyCopy\"><span lang=\"EN-US\">Essential Background in Survival Analysis<\/span><\/h2>\r\n<p class=\"ABodyCopy\"><span lang=\"EN-US\">Survival analysis can be considered in its simplest form as a method to analyze longitudinal data for a cohort, or for a comparison of cohorts with a specific interest in the proportion of individuals that reached or exceeded a definite point on a time scale. <\/span><\/p>\r\nIn survival analysis, the demarcation point for the event of interest on a time scale is referred to in a variety of ways but is dependent upon the perspective of the researcher.\u00a0 For example, if the researcher is interested in the application of survival analysis to estimate mortality as a result of a given treatment regimen then the demarcation point may be used to count the number of individuals that died within the interval up to a specific time, versus the number of individuals that lived beyond the selected time (i.e. survived).\u00a0 However, given the intention of the research, the mathematics of survival analysis need not be limited to only counting deaths (or survival), rather, the approaches of survival analyses may be thought of as a set of mathematical functions that enable statistical techniques which can be applied to the evaluation of any selected event at a specific period of time. 
Several methods can be used to perform survival analysis; in this chapter, the focus will be on the application of SAS for survival analysis using life tables, the calculation of the log-rank test, and the application of the Cox Proportional Hazards Model.\r\n<h2>Important Functions Used in Survival Analysis<\/h2>\r\nThe progression of information about functions used in the computation of survival analyses is presented in Figure 19.1. In the following section, we will review the important concepts of the probability density function for a discrete random variable and a continuous random variable, the cumulative distribution function, the survival function, and the hazard function.\r\n<h3 style=\"text-align: center\"><strong>The flow of function processing in survival analysis<\/strong><\/h3>\r\n<img src=\"http:\/\/pressbooks.library.upei.ca\/montelpare\/wp-content\/uploads\/sites\/49\/2020\/06\/surv1-300x98.png\" alt=\"\" class=\"wp-image-1006 aligncenter\" width=\"508\" height=\"166\" \/>\r\n\r\nThere are several ways to demonstrate survival analysis, but we will begin here by reviewing the basic terminology and the elements of the different functions used in the calculation of survival analysis so that we can measure the risk of an event happening at a specific period of time.\r\n\r\nThe probability density function describes the probability of an outcome, or a combination of outcomes, occurring within a known outcome space \u2013 such as an interval.\r\n\r\nThe probability density function (pdf) can refer to the probability value associated with either a discrete random variable or a continuous random variable.\u00a0 When the pdf refers to a discrete random variable, it is also referred to as the probability mass function (pmf) for a positive discrete random variable. 
In this case, we define a positive discrete random variable as a variable that holds numbers from the whole number line, meaning that the scores are whole numbers (ranging from 0 to + \u221e) and may resemble (0,1,2,3, \u2026, \u221e) without decimal values.\r\n<h6 style=\"text-align: center\">Probability Density Function (pdf) Related to Tossing a Single die<\/h6>\r\n<div align=\"center\">\r\n<table style=\"width: 451px\">\r\n<thead>\r\n<tr class=\"shaded\">\r\n<td style=\"text-align: center;width: 263.283px\">Possible outcome expressed as [latex]P(X = x)[\/latex]<\/td>\r\n<td style=\"text-align: center;width: 159.283px\">The probability associated with the outcome<\/td>\r\n<\/tr>\r\n<\/thead>\r\n<tbody>\r\n<tr class=\"border\">\r\n<td style=\"width: 263.283px;text-align: center\">[latex]P(X = 1)[\/latex]<\/td>\r\n<td style=\"width: 159.283px;text-align: center\">1\/6<\/td>\r\n<\/tr>\r\n<tr class=\"border\">\r\n<td style=\"width: 263.283px;text-align: center\">[latex]P(X = 2)[\/latex]<\/td>\r\n<td style=\"width: 159.283px;text-align: center\">1\/6<\/td>\r\n<\/tr>\r\n<tr class=\"border\">\r\n<td style=\"width: 263.283px;text-align: center\">[latex]P(X = 3)[\/latex]<\/td>\r\n<td style=\"width: 159.283px;text-align: center\">1\/6<\/td>\r\n<\/tr>\r\n<tr class=\"border\">\r\n<td style=\"width: 263.283px;text-align: center\">[latex]P(X = 4)[\/latex]<\/td>\r\n<td style=\"width: 159.283px;text-align: center\">1\/6<\/td>\r\n<\/tr>\r\n<tr class=\"border\">\r\n<td style=\"width: 263.283px;text-align: center\">[latex]P(X = 5)[\/latex]<\/td>\r\n<td style=\"width: 159.283px;text-align: center\">1\/6<\/td>\r\n<\/tr>\r\n<tr class=\"border\">\r\n<td style=\"width: 263.283px;text-align: center\">[latex]P(X = 6)[\/latex]<\/td>\r\n<td style=\"width: 159.283px;text-align: center\">1\/6<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/div>\r\nA graph of the frequency distribution for these data would produce a platykurtic (flat) distribution profile since each outcome value has a 
frequency of 1.\r\n\r\nHowever, we could create a graph to demonstrate the cumulative probabilities of the discrete random variable (X) over the outcomes 1 to 6; that is, we would consider the cumulative probabilities from P(X\u22641) through P(X\u22646).\r\n\r\nThe Cumulative Distribution Function, commonly referred to as the c.d.f. and written as F(x)=P(X\u2264x), represents the set of values associated with the probabilities of the random variable (X) occurring equal to or less than a given value (x) in an outcome space.\r\n\r\nIn the example of the toss of a fair six-sided die, the outcome space is based only on the discrete numbers 1 through 6, as shown in the following outcome chart.\r\n<h6>Cumulative Distribution Function (c.d.f.) Related to Tossing a Single die<\/h6>\r\n<table style=\"height: 121px\">\r\n<thead>\r\n<tr class=\"shaded\" style=\"height: 31px\">\r\n<td style=\"text-align: center;height: 31px;width: 285.95px\">Possible outcome expressed as [latex]P(X \\le x)[\/latex]<\/td>\r\n<td style=\"text-align: center;height: 31px;width: 215.617px\">Probability associated with the outcome<\/td>\r\n<\/tr>\r\n<\/thead>\r\n<tbody>\r\n<tr class=\"border\" style=\"height: 15px\">\r\n<td style=\"height: 15px;width: 285.95px;text-align: center\">[latex]P(X \\le 1)[\/latex]<\/td>\r\n<td style=\"height: 15px;width: 215.617px;text-align: center\">1\/6 = 0.17<\/td>\r\n<\/tr>\r\n<tr class=\"border\" style=\"height: 15px\">\r\n<td style=\"height: 15px;width: 285.95px;text-align: center\">[latex]P(X \\le 2)[\/latex]<\/td>\r\n<td style=\"height: 15px;width: 215.617px;text-align: center\">2\/6 = 0.33<\/td>\r\n<\/tr>\r\n<tr class=\"border\" style=\"height: 15px\">\r\n<td style=\"height: 15px;width: 285.95px;text-align: center\">[latex]P(X \\le 3)[\/latex]<\/td>\r\n<td style=\"height: 15px;width: 215.617px;text-align: center\">3\/6 = 0.50<\/td>\r\n<\/tr>\r\n<tr class=\"border\" style=\"height: 15px\">\r\n<td style=\"height: 15px;width: 
285.95px;text-align: center\">[latex]P(X \\le 4)[\/latex]<\/td>\r\n<td style=\"height: 15px;width: 215.617px;text-align: center\">4\/6 = 0.67<\/td>\r\n<\/tr>\r\n<tr class=\"border\" style=\"height: 15px\">\r\n<td style=\"height: 15px;width: 285.95px;text-align: center\">[latex]P(X \\le 5)[\/latex]<\/td>\r\n<td style=\"height: 15px;width: 215.617px;text-align: center\">5\/6 = 0.83<\/td>\r\n<\/tr>\r\n<tr class=\"border\" style=\"height: 15px\">\r\n<td style=\"height: 15px;width: 285.95px;text-align: center\">[latex]P(X \\le 6)[\/latex]<\/td>\r\n<td style=\"height: 15px;width: 215.617px;text-align: center\">6\/6 = 1.00<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\nWhile the example presented here describes the c.d.f. for discrete random variable outcomes (and their associated probabilities based on the probability mass function (pmf) or probability density function (pdf)), the c.d.f. is also relevant for continuous variable values and the pdf is based on the outcomes (<em>X<\/em>) in an interval (<em>a<\/em>, <em>b<\/em>) represented by <em>P<\/em>(<em>a<\/em>\u00a0&lt;\u00a0<em>X<\/em>\u00a0&lt;\u00a0<em>b<\/em>), where all numbers from the real number line are eligible within the interval of the distribution, typically ranging from 0 to 1.\r\n\r\nIf the data for the c.d.f. were attributed to a continuous random variable such as time, then the graph of the set of probabilities for all possible outcomes of the c.d.f. is presented as a positive <em>S-shaped<\/em> curve ranging from 0 to 1, as shown in the figure below.\r\n<h6 style=\"text-align: center\"><img src=\"http:\/\/pressbooks.library.upei.ca\/montelpare\/wp-content\/uploads\/sites\/49\/2020\/06\/survG1-300x222.png\" alt=\"\" class=\"aligncenter wp-image-1011\" width=\"467\" height=\"345\" \/><\/h6>\r\n<h6 style=\"text-align: center\">Schematic of a c.d.f. 
for a Continuous Random Variable<\/h6>\r\n<div class=\"textbox textbox--examples\"><header class=\"textbox__header\">\r\n<p class=\"textbox__title\">The SAS code to generate this image was written by Wicklin (2011)<a href=\"#_ftn1\">[1]<\/a> and was processed unedited in SAS Studio, as shown here.<\/p>\r\n\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n\r\ndata cdf;\r\n\r\ndo x = -3 to 3 by 0.1;\r\n\r\ny = cdf(\"Normal\", x);\r\n\r\noutput; end;\r\n\r\nx0 = 0;\r\n\r\ncdf0 = cdf(\"Normal\", x0);\r\n\r\noutput;\r\n\r\nx0 = 1.645; cdf0 = cdf(\"Normal\", x0); output;\r\n\r\nrun;\r\n\r\nods graphics \/ height=500;\r\n\r\nproc sgplot data=cdf noautolegend;\r\n\r\ntitle \"Normal Cumulative Probability\";\r\n\r\nseries x=x y=y;\r\n\r\nscatter x=x0 y=cdf0;\r\n\r\nvector x=x0 y=cdf0 \/xorigin=x0 yorigin=0 noarrowheads lineattrs=(color=gray);\r\n\r\nvector x=x0 y=cdf0 \/xorigin=-3 yorigin=cdf0 noarrowheads lineattrs=(color=gray);\r\n\r\nxaxis grid label=\"x\";\r\n\r\nyaxis grid label=\"Normal CDF\" values=(0 to 1 by 0.05);\r\n\r\nrefline 0 1\/ axis=y;\r\n\r\nrun;\r\n\r\n<\/div>\r\n<\/div>\r\nThe c.d.f. is an important step in the computation of the survival analysis because it is part of the computation of the survival function. In a time-relevant model, as is typical in a biostatistics application, the cumulative distribution function can be represented as [latex]F(t)=P(T \\le{t})[\/latex], where [latex]{T}[\/latex] is the random variable representing a measured time and [latex]{t}[\/latex] is the value of the intended time at the event.\r\n\r\nThe survival function [latex]S{(t)}[\/latex] gives the probability that the event of interest, be it a failure, death, or a specified incident, has not yet occurred by time [latex]{t}[\/latex]. The survival function begins at 1, the point where an individual enters the dataset, and ends at 0, the point where data monitoring stops, usually because the event of interest has occurred.\r\n\r\nIn simple terms, the Survival Function is the complement of the c.d.f. 
and is computed as [latex]S{(t)}= 1- F(t)\\textit{, where t &gt;0}[\/latex]. More important, the survival function is the denominator in the computation of the Hazard Function, which is a main element in one approach to the computation of the survival analysis. The survival function can show the probability of surviving up to a designated event, based on units of time.\r\n<div class=\"textbox textbox--exercises\"><header class=\"textbox__header\">\r\n<p class=\"textbox__title\">For example, consider the following data set in which a measure of time to an event is recorded.<\/p>\r\n\r\n<\/header>\r\n<div class=\"textbox__content\">The cutoff time is set at 48\u00a0 <strong><em>(totally arbitrary units)<\/em> <\/strong>so that any value above 48 is assigned a censor score of 1 and any value less than 48 is a value of 0.<\/div>\r\n<\/div>\r\nTable depicting number of individuals that exceeded the time to event\r\n<table class=\" aligncenter\" style=\"height: 211px\">\r\n<thead>\r\n<tr class=\"shaded\" style=\"height: 61px\">\r\n<td style=\"text-align: center;height: 61px;width: 130px\">Patient ID<\/td>\r\n<td style=\"text-align: center;height: 61px;width: 171px\">Time to Event: The measure of the length of time to the event happening<\/td>\r\n<td style=\"text-align: center;height: 61px;width: 185px\">Event Counter variable (0=event has not happened, 1=event has happened)<\/td>\r\n<\/tr>\r\n<\/thead>\r\n<tbody>\r\n<tr class=\"border\" style=\"height: 15px\">\r\n<td style=\"height: 15px;width: 130px;text-align: center\">01<\/td>\r\n<td style=\"height: 15px;width: 171px;text-align: center\">40<\/td>\r\n<td style=\"height: 15px;width: 185px;text-align: center\">0<\/td>\r\n<\/tr>\r\n<tr class=\"border\" style=\"height: 15px\">\r\n<td style=\"height: 15px;width: 130px;text-align: center\">02<\/td>\r\n<td style=\"height: 15px;width: 171px;text-align: center\">38<\/td>\r\n<td style=\"height: 15px;width: 185px;text-align: center\">0<\/td>\r\n<\/tr>\r\n<tr class=\"border\" 
style=\"height: 15px\">\r\n<td style=\"height: 15px;width: 130px;text-align: center\">03<\/td>\r\n<td style=\"height: 15px;width: 171px;text-align: center\">54<\/td>\r\n<td style=\"height: 15px;width: 185px;text-align: center\">1<\/td>\r\n<\/tr>\r\n<tr class=\"border\" style=\"height: 15px\">\r\n<td style=\"height: 15px;width: 130px;text-align: center\">04<\/td>\r\n<td style=\"height: 15px;width: 171px;text-align: center\">56<\/td>\r\n<td style=\"height: 15px;width: 185px;text-align: center\">1<\/td>\r\n<\/tr>\r\n<tr class=\"border\" style=\"height: 15px\">\r\n<td style=\"height: 15px;width: 130px;text-align: center\">05<\/td>\r\n<td style=\"height: 15px;width: 171px;text-align: center\">28<\/td>\r\n<td style=\"height: 15px;width: 185px;text-align: center\">0<\/td>\r\n<\/tr>\r\n<tr class=\"border\" style=\"height: 15px\">\r\n<td style=\"height: 15px;width: 130px;text-align: center\">06<\/td>\r\n<td style=\"height: 15px;width: 171px;text-align: center\">36<\/td>\r\n<td style=\"height: 15px;width: 185px;text-align: center\">0<\/td>\r\n<\/tr>\r\n<tr class=\"border\" style=\"height: 15px\">\r\n<td style=\"height: 15px;width: 130px;text-align: center\">07<\/td>\r\n<td style=\"height: 15px;width: 171px;text-align: center\">42<\/td>\r\n<td style=\"height: 15px;width: 185px;text-align: center\">0<\/td>\r\n<\/tr>\r\n<tr class=\"border\" style=\"height: 15px\">\r\n<td style=\"height: 15px;width: 130px;text-align: center\">08<\/td>\r\n<td style=\"height: 15px;width: 171px;text-align: center\">51<\/td>\r\n<td style=\"height: 15px;width: 185px;text-align: center\">1<\/td>\r\n<\/tr>\r\n<tr class=\"border\" style=\"height: 15px\">\r\n<td style=\"height: 15px;width: 130px;text-align: center\">09<\/td>\r\n<td style=\"height: 15px;width: 171px;text-align: center\">45<\/td>\r\n<td style=\"height: 15px;width: 185px;text-align: center\">0<\/td>\r\n<\/tr>\r\n<tr class=\"border\" style=\"height: 15px\">\r\n<td style=\"height: 15px;width: 130px;text-align: center\">10<\/td>\r\n<td 
style=\"height: 15px;width: 171px;text-align: center\">49<\/td>\r\n<td style=\"height: 15px;width: 185px;text-align: center\">1<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<div class=\"textbox textbox--examples\"><header class=\"textbox__header\">\r\n<p class=\"textbox__title\">The data are processed with the following SAS code<a href=\"#_ftn2\">[2]<\/a> to produce a graph of the survival curve shown below.<\/p>\r\n\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n\r\ntitle 'program to show a survival curve';\r\n\r\ndata survcurv;\r\n\r\ninput id t_event censor;\r\n\r\ndatalines;\r\n\r\n01 40 0\r\n\r\n02 38 0\r\n\r\n03 54 1\r\n\r\n04 56 1\r\n\r\n05 28 0\r\n\r\n06 36 0\r\n\r\n07 42 0\r\n\r\n08 51 1\r\n\r\n09 45 0\r\n\r\n10 49 1\r\n\r\n;\r\n\r\nproc lifetest data=survcurv(where=(censor=1)) method=lt\r\n\r\nintervals=(45 to 60 by 1) plots=survival; time t_event*censor(0); run;\r\n\r\n<\/div>\r\n<\/div>\r\nThe results of this analysis include the table of the survival estimates and the survival curve below \u2013 note that the failure point was set at 48. The curve shows the probability of surviving to 48 and then beyond 48.\r\n\r\nNotice that the entire group begins at probability = 1 and ends at probability = 0.\r\n\r\nFigure of SAS representation of the survival function for n=10 with censoring at x=48.\r\n\r\n<img src=\"http:\/\/pressbooks.library.upei.ca\/montelpare\/wp-content\/uploads\/sites\/49\/2020\/06\/surv2-300x215.png\" alt=\"\" class=\"aligncenter wp-image-1020\" width=\"500\" height=\"359\" \/>\r\n\r\nThe Hazard Function is determined by the ratio of the probability density function (pdf) to the survival function [latex]S{(t)}[\/latex] and can be written as: [latex]\\lambda(t) = {f(t) \\over{S(t)}}[\/latex], where [latex]f(t)[\/latex] denotes the pdf.\r\n\r\nThe following explanation may help to describe the elements of the <strong>hazard function<\/strong> in greater detail. 
In this annotated formula, the hazard function is shown to represent the instantaneous risk of an event such as death or failure occurring within a short interval beginning at time [latex]{(t)}[\/latex], given survival up to that time.\r\n<h3 style=\"text-align: center\">[latex]\\lambda(t) = {\\lim\\limits_{\\Delta{t}\\to {0}}} {P(t \\le{T} \\lt{t} + \\Delta{t} \\mid{T} \\ge{t})\\over{\\Delta{t}}}[\/latex]<\/h3>\r\n<img src=\"http:\/\/pressbooks.library.upei.ca\/montelpare\/wp-content\/uploads\/sites\/49\/2020\/06\/surv3-300x172.png\" alt=\"\" class=\"aligncenter wp-image-1033\" width=\"629\" height=\"361\" \/>\r\n<h6 style=\"text-align: center\">Annotated image of the Hazard Function Equation<\/h6>\r\n<ul>\r\n \t<li>The hazard function [latex]\\lambda{(t)}[\/latex] measures the rate of a specific event with respect to time [latex]{(t)}[\/latex]<\/li>\r\n \t<li>The hazard function [latex]\\lambda{(t)}[\/latex] is based on the probability that the observed event, occurring at time [latex]{T}[\/latex], will happen within the interval beginning at time point [latex]{(t)}[\/latex] and ranging to the end of the interval [latex]{(t + \\Delta{t})}[\/latex], so that we say [latex]{(t \\le T \\lt t + \\Delta{t})}[\/latex], given that the event has not occurred before time [latex]{(t)}[\/latex]<\/li>\r\n \t<li>Since the hazard function [latex]\\lambda{(t)}[\/latex] is not a probability estimate but a rate (a conditional probability divided by the width of the interval), the hazard function [latex]\\lambda{(t)}[\/latex] can exceed 1.<\/li>\r\n<\/ul>\r\nThe following table shows the output from a <em>life table<\/em> approach to evaluating the set of data that were used in the SAS program above to produce the survival function.\u00a0 The <em>hazard function<\/em> is included in the tabled output when the method=LT option is specified in the proc lifetest statement. 
An abbreviated form of the table is shown here.\r\n<div align=\"center\">\r\n<table>\r\n<tbody>\r\n<tr>\r\n<td>Life Table Survival Estimates<\/td>\r\n<\/tr>\r\n<tr>\r\n<td>Interval\r\n\r\n(sum of failed)<\/td>\r\n<td>Number failed after censoring<\/td>\r\n<td>PDF<\/td>\r\n<td>Hazard<\/td>\r\n<\/tr>\r\n<tr>\r\n<td>47-48<\/td>\r\n<td>(0)<\/td>\r\n<td>0<\/td>\r\n<td>0<\/td>\r\n<\/tr>\r\n<tr>\r\n<td>49-50<\/td>\r\n<td>(1)<\/td>\r\n<td>0.25<\/td>\r\n<td>0.29<\/td>\r\n<\/tr>\r\n<tr>\r\n<td>51-52<\/td>\r\n<td>(1)<\/td>\r\n<td>0.25<\/td>\r\n<td>0.40<\/td>\r\n<\/tr>\r\n<tr>\r\n<td>54-55<\/td>\r\n<td>(1)<\/td>\r\n<td>0.25<\/td>\r\n<td>0.67<\/td>\r\n<\/tr>\r\n<tr>\r\n<td>56-57<\/td>\r\n<td>(1)<\/td>\r\n<td>0.25<\/td>\r\n<td>2.00<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/div>\r\n&nbsp;\r\n<div class=\"textbox textbox--examples\"><header class=\"textbox__header\">\r\n<p class=\"textbox__title\">Recall from the program listed above, that the important SAS code to produce the hazard function using the proc lifetest \u00a0procedure is:<\/p>\r\n\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<pre>proc lifetest data=survcurv(where=(censor=1)) method=lt\r\nintervals=(45 to 60 by 1) plots=survival;\r\n time t_event*censor(0); \r\nrun;<\/pre>\r\n<\/div>\r\n<\/div>\r\n<h2>Censoring Data<\/h2>\r\nIn the computation of survival analyses, not all participants will fail (or die) at the demarcation point set by the researcher.\u00a0 As shown in the data set analyzed above, the demarcation point for the event of interest was set at an arbitrary value of 48 and therefore 4 individuals extended beyond the value 48.\r\n\r\nIn a survival analysis, where the time to an event is noted, any cases that \u201csurvive\u201d beyond the point stated will be considered censored.\u00a0 Censoring does not mean that the participants are dropped from the analysis.\u00a0 Rather, when censored, the individuals that have not demonstrated the event of interest prior to the pre-designated demarcation 
point are not counted as part of the group measured with the event of interest (i.e. dying, failing).\r\n\r\nWhen we plot the survival curves for a cohort in SAS, we can specify the censoring point and thereby produce survival probability curves that represent either the cases \u2013 those individuals that have demonstrated the event of interest by the end of the interval measured \u2013 or the non-cases \u2013 those individuals that have not demonstrated the event of interest by the end of the interval measured. In the following example, survival probability curves are used to demonstrate the influence of censoring and the Kaplan-Meier estimates used to develop the survival probability curves.\r\n\r\n<hr \/>\r\n\r\n<h2>Annotated SAS application for a Survival Analysis<\/h2>\r\nAs noted, survival analysis is a time-based evaluation. That is, in survival analysis, we are interested in evaluating the time point at which an event occurs within a cohort. Survival analysis helps researchers evaluate the proportion of individuals that reach a demarcation point at a given time<em>, and therefore the number of individuals within a cohort that extend beyond an event (a time point of interest). <\/em>\r\n\r\nIn the following scenario, we will use a random number generator to create a SAS dataset and simulate the scenario of the ZIKA Virus at the Summer Olympics (2016). Next, we will apply the different tools of the SAS Survival Analysis suite to evaluate the data set, with examples that include a comparison of outcomes across athlete cohorts.\r\n<h3><strong>Background:<\/strong><\/h3>\r\nIn August 2016, Brazil hosted the Olympic Summer Games. However, several athletes decided to boycott the games because of the risk of exposure to the ZIKA virus. 
ZIKA is a virus that can be transmitted through the bite of an infected Aedes mosquito.\u00a0 The ZIKA virus is extremely dangerous for young women as it can reside in the blood for up to 3 months and, if the woman becomes pregnant, the virus can have negative consequences for the developing fetus. In particular, the ZIKA virus has been implicated in the development of microcephaly in newborn children.\r\n<h4><strong>Generating the dataset with a random number generator:<\/strong><\/h4>\r\nIn this example, we will use a series of random number generating commands to create a data set with four variables and 100 cases.\r\n\r\nThe three discrete variables are sex, sport, and case, coded as follows: sex (1=m, 2=f), sport (1=golf, 2=equestrian, 3=swimming, 4=gymnastics, 5=track\u00a0 &amp; field), and case (1=yes, 0=no).\r\n\r\nA continuous variable, labelled days, will represent the number of days prior to the individual contracting the ZIKA virus from the Aedes mosquito.\r\n<div class=\"textbox textbox--examples\"><header class=\"textbox__header\">\r\n<p class=\"textbox__title\">The program to generate the simulated SAS data set is shown here<\/p>\r\n\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n\r\noptions pagesize=60 linesize=80 center date;\r\n\r\nLIBNAME sample '\/home\/Username\/your directory\/';\r\n\r\nproc format; value sexfmt\u00a0 1 ='male' 2 ='female';\r\n\r\nvalue sprtfmt\u00a0 1 ='golf' 2 ='equestrian' 3 ='swimming' 4 ='gymnastics' 5 ='track &amp; field';\r\n\r\nvalue casefmt 1='present' 0='absent';\r\n\r\ndata sample.zika;\r\n\r\n\/* create 3 new variables set as score1 score2 score3 *\/\r\n\r\narray scores score1-score3;\r\n\r\n\/* set 100 cases per variable *\/\r\n\r\ndo k=1 to 100;\r\n\r\n\/* set days to 100 days of exposure *\/\r\n\r\ndays=ranuni(13)*100; days=round(days, 0.02);\r\n\r\n\/* Loop through each variable to establish 100 randomly generated scores *\/\r\n\r\ndo i=1 to 3;\r\n\r\ncall 
streaminit(23);\r\n\r\nscores(i)=RAND(\"normal\")*1000000000000;\r\n\r\nscores(i)=ROUND(scores(i));\r\n\r\nscores(i)=1+ABS((mod(scores(i),150)));\r\n\r\n\/*\u00a0 the variable sex will relate to score1, we can create a filter to establish the binary score for sex based on the randomly generated output *\/\r\n\r\nif score1 &gt; 55 then sex = 2;\r\n\r\nif score1 &gt;2 and score1&lt;56 then sex = 1;\r\n\r\n\/* the variable sport type will relate to score2, we can create a filter to establish the determination of an athletes sport based on the randomly generated output *\/\r\n\r\nif score2 &gt;90 then sport = 5;\r\n\r\nif score2 &gt;80 and score2&lt;91 then sport = 4;\r\n\r\nif score2 &gt;60 and score2&lt;81 then sport = 3;\r\n\r\nif score2 &gt;30 and score2&lt;61 then sport = 2;\r\n\r\nif score2 &gt;5 and score2&lt;31 then sport=1;\r\n\r\n\/* the determination of a case will relate to score3, we can create a filter to establish the determination of a case based on the randomly generated output *\/\r\n\r\nif score3 &gt; 48 then case = 1;else case = 0;\r\n\r\n\/* a case=1 is a case present, and a case=0 is a case absent *\/\r\n\r\nif days&lt;=15 then daygrp=1;\r\n\r\nif days&gt;15 and days&lt;=30 then daygrp=2;\r\n\r\nif days&gt;30 and days&lt;=45 then daygrp=3;\r\n\r\nif days&gt;45 and days&lt;=60 then daygrp=4;\r\n\r\nif days&gt;60 and days&lt;=75 then daygrp=5;\r\n\r\nif days&gt;75 and days&lt;=90 then daygrp=6;\r\n\r\nif days&gt;90 and days&lt;=105 then daygrp=7;\r\n\r\nif days&gt;105 and days&lt;=120 then daygrp=8;\r\n\r\nif days&gt;120 and days&lt;=135 then daygrp=9;\r\n\r\nif days&gt;135 then daygrp=10;\r\n\r\n\/* create an interaction term for sex and sport to be used later in the Cox regression analysis *\/\r\n\r\nsex_sport=sex*sport;\r\n\r\nend; output; end;\r\n\r\n<\/div>\r\n<\/div>\r\n<h2>Describing the Output<\/h2>\r\n<h4><strong>Part 1: Descriptive Statistics<\/strong><\/h4>\r\nPrior to computing the survival analysis, descriptive statistics are produced 
for each of the four variables generated by the computer simulation. Initially a grouping variable called <strong><em>daygrp<\/em><\/strong> was created to summarize the continuous variable (counting number of days) into a discrete variable for use in later presentations.\r\n<div class=\"textbox textbox--examples\"><header class=\"textbox__header\">\r\n<p class=\"textbox__title\">Next, the data were sorted and the proc freq command was applied.<\/p>\r\n\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n\r\nproc sort data=sample.zika; by sex;\r\n\r\nproc freq; tables sex sport daygrp case;\r\n\r\nformat\u00a0 case casefmt. ;\r\n\r\n<\/div>\r\n<\/div>\r\n<strong>The FREQ Procedure<\/strong>\r\n<div align=\"center\">\r\n<table>\r\n<thead>\r\n<tr>\r\n<td><strong>sex<\/strong><\/td>\r\n<td><strong>Frequency<\/strong><\/td>\r\n<td><strong>Percent<\/strong><\/td>\r\n<td><strong>Cumulative\r\nFrequency<\/strong><\/td>\r\n<td><strong>Cumulative\r\nPercent<\/strong><\/td>\r\n<\/tr>\r\n<\/thead>\r\n<tbody>\r\n<tr>\r\n<td><strong>1<\/strong><\/td>\r\n<td>29<\/td>\r\n<td>29.00<\/td>\r\n<td>29<\/td>\r\n<td>29.00<\/td>\r\n<\/tr>\r\n<tr>\r\n<td><strong>2<\/strong><\/td>\r\n<td>71<\/td>\r\n<td>71.00<\/td>\r\n<td>100<\/td>\r\n<td>100.00<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/div>\r\n<div align=\"center\">\r\n<table>\r\n<thead>\r\n<tr>\r\n<td><strong>case<\/strong><\/td>\r\n<td><strong>Frequency<\/strong><\/td>\r\n<td><strong>Percent<\/strong><\/td>\r\n<td><strong>Cumulative\r\nFrequency<\/strong><\/td>\r\n<td><strong>Cumulative\r\nPercent<\/strong><\/td>\r\n<\/tr>\r\n<\/thead>\r\n<tbody>\r\n<tr>\r\n<td><strong>absent<\/strong><\/td>\r\n<td>30<\/td>\r\n<td>30.00<\/td>\r\n<td>30<\/td>\r\n<td>30.00<\/td>\r\n<\/tr>\r\n<tr>\r\n<td><strong>present<\/strong><\/td>\r\n<td>70<\/td>\r\n<td>70.00<\/td>\r\n<td>100<\/td>\r\n<td>100.00<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/div>\r\n<div 
align=\"center\">\r\n<table>\r\n<thead>\r\n<tr>\r\n<td><strong>sport<\/strong><\/td>\r\n<td><strong>Frequency<\/strong><\/td>\r\n<td><strong>Percent<\/strong><\/td>\r\n<td><strong>Cumulative\r\nFrequency<\/strong><\/td>\r\n<td><strong>Cumulative\r\nPercent<\/strong><\/td>\r\n<\/tr>\r\n<\/thead>\r\n<tbody>\r\n<tr>\r\n<td><strong>1<\/strong><\/td>\r\n<td>23<\/td>\r\n<td>23.00<\/td>\r\n<td>23<\/td>\r\n<td>23.00<\/td>\r\n<\/tr>\r\n<tr>\r\n<td><strong>2<\/strong><\/td>\r\n<td>22<\/td>\r\n<td>22.00<\/td>\r\n<td>45<\/td>\r\n<td>45.00<\/td>\r\n<\/tr>\r\n<tr>\r\n<td><strong>3<\/strong><\/td>\r\n<td>17<\/td>\r\n<td>17.00<\/td>\r\n<td>62<\/td>\r\n<td>62.00<\/td>\r\n<\/tr>\r\n<tr>\r\n<td><strong>4<\/strong><\/td>\r\n<td>24<\/td>\r\n<td>24.00<\/td>\r\n<td>86<\/td>\r\n<td>86.00<\/td>\r\n<\/tr>\r\n<tr>\r\n<td><strong>5<\/strong><\/td>\r\n<td>14<\/td>\r\n<td>14.00<\/td>\r\n<td>100<\/td>\r\n<td>100.00<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/div>\r\n<div align=\"center\">\r\n<table>\r\n<thead>\r\n<tr>\r\n<td><strong>daygrp<\/strong><\/td>\r\n<td><strong>Frequency<\/strong><\/td>\r\n<td><strong>Percent<\/strong><\/td>\r\n<td><strong>Cumulative\r\nFrequency<\/strong><\/td>\r\n<td><strong>Cumulative\r\nPercent<\/strong><\/td>\r\n<\/tr>\r\n<\/thead>\r\n<tbody>\r\n<tr>\r\n<td><strong>1<\/strong><\/td>\r\n<td>1<\/td>\r\n<td>1.00<\/td>\r\n<td>1<\/td>\r\n<td>1.00<\/td>\r\n<\/tr>\r\n<tr>\r\n<td><strong>2<\/strong><\/td>\r\n<td>5<\/td>\r\n<td>5.00<\/td>\r\n<td>6<\/td>\r\n<td>6.00<\/td>\r\n<\/tr>\r\n<tr>\r\n<td><strong>3<\/strong><\/td>\r\n<td>7<\/td>\r\n<td>7.00<\/td>\r\n<td>13<\/td>\r\n<td>13.00<\/td>\r\n<\/tr>\r\n<tr>\r\n<td><strong>4<\/strong><\/td>\r\n<td>5<\/td>\r\n<td>5.00<\/td>\r\n<td>18<\/td>\r\n<td>18.00<\/td>\r\n<\/tr>\r\n<tr>\r\n<td><strong>5<\/strong><\/td>\r\n<td>9<\/td>\r\n<td>9.00<\/td>\r\n<td>27<\/td>\r\n<td>27.00<\/td>\r\n<\/tr>\r\n<tr>\r\n<td><strong>6<\/strong><\/td>\r\n<td>6<\/td>\r\n<td>6.00<\/td>\r\n<td>33<\/td>\r\n<td>33.00<\/td>\r\n<\/tr>\r\n<tr>\r\n<td><strong>
7<\/strong><\/td>\r\n<td>4<\/td>\r\n<td>4.00<\/td>\r\n<td>37<\/td>\r\n<td>37.00<\/td>\r\n<\/tr>\r\n<tr>\r\n<td><strong>8<\/strong><\/td>\r\n<td>13<\/td>\r\n<td>13.00<\/td>\r\n<td>50<\/td>\r\n<td>50.00<\/td>\r\n<\/tr>\r\n<tr>\r\n<td><strong>9<\/strong><\/td>\r\n<td>7<\/td>\r\n<td>7.00<\/td>\r\n<td>57<\/td>\r\n<td>57.00<\/td>\r\n<\/tr>\r\n<tr>\r\n<td><strong>10<\/strong><\/td>\r\n<td>43<\/td>\r\n<td>43.00<\/td>\r\n<td>100<\/td>\r\n<td>100.00<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/div>\r\n<div class=\"textbox textbox--examples\"><header class=\"textbox__header\">\r\n<p class=\"textbox__title\">The demarcation point for a case was set at a value of 100 for the random variable days from the array:<\/p>\r\n\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n\r\ndo i=1 to 2;\r\n\r\ncall streaminit(23);\r\n\r\nscores(i)=RAND(\"normal\")*1000000000000;\r\n\r\nscores(i)=ROUND(scores(i));\r\n\r\nscores(i)=1+ABS((mod(scores(i),150)));\r\n\r\n<\/div>\r\n<\/div>\r\nThe variable <strong><em>days<\/em><\/strong> was given a range of 1 to 150 and 100 days was used as a demarcation point to censor individuals as non-cases.\r\n<div class=\"textbox\"><strong><span style=\"color: #0000ff\">if days &lt; 101 then case = 1;<\/span><\/strong>\r\n<strong><span style=\"color: #0000ff\">if days&gt;100 then case = 0;<\/span><\/strong><\/div>\r\nThe labelling of individuals in this way was used to generate a random assignment of the individual as a case (1) or as a non-case (0). 
The proc univariate procedure was used to present descriptive statistics for individuals that were considered cases (where=(case=1)) and for individuals that were censored (where=(case=0)).\r\n<div class=\"textbox\"><strong><span style=\"color: #0000ff\">proc univariate data=sample.zika(where=(case=1)) cibasic;<\/span><\/strong>\r\n<strong><span style=\"color: #0000ff\">var days;<\/span><\/strong>\r\n<strong><span style=\"color: #0000ff\">histogram days\/normal;<\/span><\/strong>\r\n<strong><span style=\"color: #0000ff\">title 'Survivor function for zika virus plot of pdf';<\/span><\/strong>\r\n<strong><span style=\"color: #0000ff\">label days ='days to infection';<\/span><\/strong><\/div>\r\nThe results from the random number generator produced a mean of 60.49 days among cases, for a sample of 70 individuals. These data also produced a 95% confidence interval for the mean of <em>60.49 \u00b1 6.45<\/em> which ranged from <em>54.04 to 66.94.<\/em>\r\n\r\n<strong>The UNIVARIATE Procedure -- <\/strong><strong>Variable: days (days since exposure)<\/strong>\r\n<div align=\"center\">\r\n<table>\r\n<thead>\r\n<tr>\r\n<td><strong>Moments<\/strong><\/td>\r\n<\/tr>\r\n<\/thead>\r\n<tbody>\r\n<tr>\r\n<td><strong>N<\/strong><\/td>\r\n<td>70<\/td>\r\n<td><strong>Sum Weights<\/strong><\/td>\r\n<td>70<\/td>\r\n<\/tr>\r\n<tr>\r\n<td><strong>Mean<\/strong><\/td>\r\n<td>60.4874286<\/td>\r\n<td><strong>Sum Observations<\/strong><\/td>\r\n<td>4234.12<\/td>\r\n<\/tr>\r\n<tr>\r\n<td><strong>Std Deviation<\/strong><\/td>\r\n<td>27.0551932<\/td>\r\n<td><strong>Variance<\/strong><\/td>\r\n<td>731.98348<\/td>\r\n<\/tr>\r\n<tr>\r\n<td><strong>Skewness<\/strong><\/td>\r\n<td>-0.2644189<\/td>\r\n<td><strong>Kurtosis<\/strong><\/td>\r\n<td>-1.1242295<\/td>\r\n<\/tr>\r\n<tr>\r\n<td><strong>Uncorrected SS<\/strong><\/td>\r\n<td>306617.891<\/td>\r\n<td><strong>Corrected SS<\/strong><\/td>\r\n<td>50506.8601<\/td>\r\n<\/tr>\r\n<tr>\r\n<td><strong>Coeff 
Variation<\/strong><\/td>\r\n<td>44.7286219<\/td>\r\n<td><strong>Std Error Mean<\/strong><\/td>\r\n<td>3.2337141<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/div>\r\n<div align=\"center\">\r\n<table>\r\n<thead>\r\n<tr>\r\n<td><strong>Basic Confidence Limits Assuming Normality<\/strong><\/td>\r\n<\/tr>\r\n<tr>\r\n<td><strong>Parameter<\/strong><\/td>\r\n<td><strong>Estimate<\/strong><\/td>\r\n<td><strong>95%\u00a0Confidence\u00a0Limits<\/strong><\/td>\r\n<\/tr>\r\n<\/thead>\r\n<tbody>\r\n<tr>\r\n<td><strong>Mean<\/strong><\/td>\r\n<td>60.48743<\/td>\r\n<td>54.03635<\/td>\r\n<td>66.93851<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/div>\r\nThe following code produced a set of percentiles from the data set for cases. These data show the percentage of the group being affected by a certain day. For example, 25% of the group were affected within 39.4 days of the start of the games. By day 96 some 90% of the cohort were infected with the Zika Virus. Note, these are not real data but were generated with a random number generator.\r\n<div class=\"textbox\"><strong><span style=\"color: #0000ff\">output out=Pctls pctlpts\u00a0 = 25 40 50 60 75 90<\/span><\/strong>\r\n<strong><span style=\"color: #0000ff\">pctlpre\u00a0 = days_<\/span><\/strong>\r\n<strong><span style=\"color: #0000ff\">pctlname = pct25 pct40 pct50 pct60 pct75 pct90;<\/span><\/strong>\r\n<strong><span style=\"color: #0000ff\">proc print data= Pctls;<\/span><\/strong>\r\n<strong><span style=\"color: #0000ff\">run;<\/span><\/strong><\/div>\r\n<strong>Percentiles for days<\/strong>\r\n<div 
align=\"center\">\r\n<table>\r\n<thead>\r\n<tr>\r\n<td><strong>Obs<\/strong><\/td>\r\n<td><strong>days_pct25<\/strong><\/td>\r\n<td><strong>days_pct40<\/strong><\/td>\r\n<td><strong>days_pct50<\/strong><\/td>\r\n<td><strong>days_pct60<\/strong><\/td>\r\n<td><strong>days_pct75<\/strong><\/td>\r\n<td><strong>days_pct90<\/strong><\/td>\r\n<\/tr>\r\n<\/thead>\r\n<tbody>\r\n<tr>\r\n<td><strong>1<\/strong><\/td>\r\n<td>39.4<\/td>\r\n<td>51.84<\/td>\r\n<td>66.34<\/td>\r\n<td>72.44<\/td>\r\n<td>84.08<\/td>\r\n<td>96.44<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/div>\r\n<div class=\"textbox\">\r\n\r\n<strong><span style=\"color: #0000ff\">proc univariate data=sample.zika(where=(case=0)) cibasic;<\/span><\/strong>\r\n<strong><span style=\"color: #0000ff\">var days;<\/span><\/strong>\r\n<strong><span style=\"color: #0000ff\">histogram days;<\/span><\/strong>\r\n<strong><span style=\"color: #0000ff\">title 'Survivor function for zika virus plot of pdf ';<\/span><\/strong>\r\n<strong><span style=\"color: #0000ff\">label days ='days to infection';<\/span><\/strong>\r\n\r\n<\/div>\r\nThe results from the random number generator produced a mean of 127.50 days among the non-cases, for a sample of 30 individuals. 
These data also produced a 95% confidence interval for the mean of <em>127.50 \u00b1 5.36<\/em> which ranged from <em>122.15 to 132.86.<\/em>\r\n\r\n<strong>The UNIVARIATE Procedure --\u00a0 <\/strong><strong>Variable: days (days since exposure)<\/strong>\r\n<div align=\"center\">\r\n<table>\r\n<thead>\r\n<tr>\r\n<td><strong>Moments<\/strong><\/td>\r\n<\/tr>\r\n<\/thead>\r\n<tbody>\r\n<tr>\r\n<td><strong>N<\/strong><\/td>\r\n<td>30<\/td>\r\n<td><strong>Sum Weights<\/strong><\/td>\r\n<td>30<\/td>\r\n<\/tr>\r\n<tr>\r\n<td><strong>Mean<\/strong><\/td>\r\n<td>127.503333<\/td>\r\n<td><strong>Sum Observations<\/strong><\/td>\r\n<td>3825.1<\/td>\r\n<\/tr>\r\n<tr>\r\n<td><strong>Std Deviation<\/strong><\/td>\r\n<td>14.3410047<\/td>\r\n<td><strong>Variance<\/strong><\/td>\r\n<td>205.664416<\/td>\r\n<\/tr>\r\n<tr>\r\n<td><strong>Skewness<\/strong><\/td>\r\n<td>-0.3026793<\/td>\r\n<td><strong>Kurtosis<\/strong><\/td>\r\n<td>-1.0739046<\/td>\r\n<\/tr>\r\n<tr>\r\n<td><strong>Uncorrected SS<\/strong><\/td>\r\n<td>493677.268<\/td>\r\n<td><strong>Corrected SS<\/strong><\/td>\r\n<td>5964.26807<\/td>\r\n<\/tr>\r\n<tr>\r\n<td><strong>Coeff Variation<\/strong><\/td>\r\n<td>11.2475528<\/td>\r\n<td><strong>Std Error Mean<\/strong><\/td>\r\n<td>2.61829726<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/div>\r\n<div align=\"center\">\r\n<table>\r\n<thead>\r\n<tr>\r\n<td><strong>Basic Confidence Limits Assuming Normality<\/strong><\/td>\r\n<\/tr>\r\n<tr>\r\n<td><strong>Parameter<\/strong><\/td>\r\n<td><strong>Estimate<\/strong><\/td>\r\n<td><strong>95%\u00a0Confidence\u00a0Limits<\/strong><\/td>\r\n<\/tr>\r\n<\/thead>\r\n<tbody>\r\n<tr>\r\n<td><strong>Mean<\/strong><\/td>\r\n<td>127.50333<\/td>\r\n<td>122.14831<\/td>\r\n<td>132.85835<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/div>\r\nThe following code produced a set of percentiles from the data set for non-cases. 
As in the example above, these data show the percentage of the group whose recorded number of days fell at or below a given value. All individuals in this data set were censored because they had passed the 100-day demarcation point without being infected. This is the reason that individuals in a survival analysis are not dropped from the study but rather censored: even though an individual exceeded the demarcation time, they continued to be at risk for the event of interest.\r\n<div class=\"textbox\"><strong><span style=\"color: #0000ff\">output out=Pctls pctlpts\u00a0 = 25 30 40 50 60 75 80 90 100<\/span><\/strong>\r\n<strong><span style=\"color: #0000ff\">pctlpre\u00a0 = days_<\/span><\/strong>\r\n<strong><span style=\"color: #0000ff\">pctlname = pct25 pct30 pct40 pct50 pct60 pct75 pct80 pct90 pct100;<\/span><\/strong>\r\n<strong><span style=\"color: #0000ff\">proc print data= Pctls;<\/span><\/strong>\r\n<strong><span style=\"color: #0000ff\">run;<\/span><\/strong><\/div>\r\n<strong>Percentiles for days<\/strong>\r\n<div align=\"center\">\r\n<table>\r\n<thead>\r\n<tr>\r\n<td><strong>Obs<\/strong><\/td>\r\n<td><strong>days_pct25<\/strong><\/td>\r\n<td><strong>days_pct40<\/strong><\/td>\r\n<td><strong>days_pct50<\/strong><\/td>\r\n<td><strong>days_pct60<\/strong><\/td>\r\n<td><strong>days_pct75<\/strong><\/td>\r\n<td><strong>days_pct90<\/strong><\/td>\r\n<\/tr>\r\n<\/thead>\r\n<tbody>\r\n<tr>\r\n<td><strong>1<\/strong><\/td>\r\n<td>115.18<\/td>\r\n<td>125.24<\/td>\r\n<td>129.69<\/td>\r\n<td>133.47<\/td>\r\n<td>139<\/td>\r\n<td>146.24<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/div>\r\nIn each of the proc univariate statements there was a call for a histogram to illustrate the distribution of the data for the variable days. The histograms of days for each of the cohorts (cases versus non-cases) are shown in Figure 19.4 below. 
Notice that in each distribution the number of days shows a slight negative skewness with more observations falling after the mean number of days.\r\n\r\n&nbsp;\r\n\r\nFigure 19.4 Comparison of the distribution of days in each cohort\r\n\r\n&nbsp;\r\n<h4>Part 2: Creating Life Tables<\/h4>\r\nThe survival analysis applications using METHOD=LIFE in the PROC LIFETEST procedure are presented in this section:\r\n\r\n&nbsp;\r\n\r\nIn this first stage of survival processing we can observe the influence of censoring the data. Recall that initially the data are censored at 100 days. Censoring was accomplished by creating the variable days, described above, and then combining it with the binary variable case. If an individual had a days score of 100 or less then they were assigned to the cohort of cases. Conversely, if the individual had a days score exceeding 100 then they were censored and assigned to the non-cases cohort.\r\n\r\n&nbsp;\r\n\r\nThe SAS code to compute the survival curve for the entire data set is given here:\r\n\r\n&nbsp;\r\n\r\nproc sort data=sample.zika; by case;\r\n\r\nPROC LIFETEST METHOD=LIFE plots=(s) data=sample.zika notable;\r\n\r\ntime days ;\r\n\r\nformat\u00a0 case casefmt. ;\r\n\r\ntitle 'Survivor function for zika virus - implicit right censoring of cases';\r\n\r\nlabel days ='days to infection';\r\n\r\n&nbsp;\r\n\r\nThis SAS code produced the image shown in Figure 19.5, below, which is the survival probability curve for the entire sample of N=100 individuals monitored over 150 days. Notice that there is an inflection point in the curve at 100 days. 
This inflection point corresponds to the censoring limit of 100 days and is shown more explicitly in Figure 19.6 where we change the command time days; to the command: time days * case(0);\r\n\r\n&nbsp;\r\n\r\nFigure 19.5 Life Table Survival curve for all individuals in the data set\r\n\r\n&nbsp;\r\n\r\nFigure 19.6 Life Table Survival curve with explicit right censoring at 100 days\r\n\r\n&nbsp;\r\n\r\nFigure 19.6 above shows the survival probability for each event among the cases and holds the non-cases constant at a probability level of 0.3. Further, when we include the censoring criteria using the command:\u00a0 time days * case(0); a summary table indicating the number of individuals that fail prior to the demarcation point (100 days) and the number of individuals that exceed the demarcation point is also included, as shown here.\r\n\r\n&nbsp;\r\n<div align=\"center\">\r\n<table>\r\n<thead>\r\n<tr>\r\n<td><strong>Summary of the Number of Censored and Uncensored Values<\/strong><\/td>\r\n<\/tr>\r\n<tr>\r\n<td><strong>Total<\/strong><\/td>\r\n<td><strong>Failed<\/strong><\/td>\r\n<td><strong>Censored<\/strong><\/td>\r\n<td><strong>Percent Censored<\/strong><\/td>\r\n<\/tr>\r\n<\/thead>\r\n<tbody>\r\n<tr>\r\n<td>100<\/td>\r\n<td>70<\/td>\r\n<td>30<\/td>\r\n<td>30.00<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/div>\r\n&nbsp;\r\n\r\nNext we include a command to show the differences in time to event with a grouping variable. Here we use the strata command to group the data by sex, while maintaining the influence of censoring at 100 days.\r\n\r\nPROC LIFETEST METHOD=LIFE plots=(s) data=sample.zika notable;\r\n\r\ntime days * case(0) ;\r\n\r\nstrata sex;\r\n\r\nformat case casefmt. sex sexfmt. 
;\r\n\r\n&nbsp;\r\n\r\nThe code produces a summary table of the number of males and females that failed or exceeded the demarcation point of 100 days and a graph of the survival probability curves for males and females.\r\n\r\n<strong>\u00a0<\/strong>\r\n<div align=\"center\">\r\n<table>\r\n<thead>\r\n<tr>\r\n<td><strong>Summary of the Number of Censored and Uncensored Values<\/strong><\/td>\r\n<\/tr>\r\n<tr>\r\n<td><strong>Stratum<\/strong><\/td>\r\n<td><strong>sex<\/strong><\/td>\r\n<td><strong>Total<\/strong><\/td>\r\n<td><strong>Failed<\/strong><\/td>\r\n<td><strong>Censored<\/strong><\/td>\r\n<td><strong>Percent Censored<\/strong><\/td>\r\n<\/tr>\r\n<\/thead>\r\n<tbody>\r\n<tr>\r\n<td><strong>1<\/strong><\/td>\r\n<td>female<\/td>\r\n<td>71<\/td>\r\n<td>52<\/td>\r\n<td>19<\/td>\r\n<td>26.76<\/td>\r\n<\/tr>\r\n<tr>\r\n<td><strong>2<\/strong><\/td>\r\n<td>male<\/td>\r\n<td>29<\/td>\r\n<td>18<\/td>\r\n<td>11<\/td>\r\n<td>37.93<\/td>\r\n<\/tr>\r\n<tr>\r\n<td><strong>Total<\/strong><\/td>\r\n<td><\/td>\r\n<td>100<\/td>\r\n<td>70<\/td>\r\n<td>30<\/td>\r\n<td>30.00<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/div>\r\n&nbsp;\r\n\r\nFigure 19.7 Life Table Survival Curves With Explicit Right Censoring at 100 Days for Males and Females\r\n\r\n&nbsp;\r\n\r\nIn this next analysis we separate the data using the command strata sport; while maintaining the right censoring of the data at 100 days. 
As shown in the approach used to separate the data by sex, this code produces a summary table of the number of individuals in each of the sport groups that failed or exceeded the demarcation point of 100 days as well as a graph of the survival probability curves for each sport.\r\n\r\n&nbsp;\r\n<div align=\"center\">\r\n<table>\r\n<thead>\r\n<tr>\r\n<td><strong>Summary of the Number of Censored and Uncensored Values<\/strong><\/td>\r\n<\/tr>\r\n<tr>\r\n<td><strong>Stratum<\/strong><\/td>\r\n<td><strong>sport<\/strong><\/td>\r\n<td><strong>Total<\/strong><\/td>\r\n<td><strong>Failed<\/strong><\/td>\r\n<td><strong>Censored<\/strong><\/td>\r\n<td><strong>Percent\r\nCensored<\/strong><\/td>\r\n<\/tr>\r\n<\/thead>\r\n<tbody>\r\n<tr>\r\n<td><strong>1<\/strong><\/td>\r\n<td>equestrian<\/td>\r\n<td>22<\/td>\r\n<td>15<\/td>\r\n<td>7<\/td>\r\n<td>31.82<\/td>\r\n<\/tr>\r\n<tr>\r\n<td><strong>2<\/strong><\/td>\r\n<td>golf<\/td>\r\n<td>23<\/td>\r\n<td>19<\/td>\r\n<td>4<\/td>\r\n<td>17.39<\/td>\r\n<\/tr>\r\n<tr>\r\n<td><strong>3<\/strong><\/td>\r\n<td>gymnastics<\/td>\r\n<td>24<\/td>\r\n<td>14<\/td>\r\n<td>10<\/td>\r\n<td>41.67<\/td>\r\n<\/tr>\r\n<tr>\r\n<td><strong>4<\/strong><\/td>\r\n<td>swimming<\/td>\r\n<td>17<\/td>\r\n<td>13<\/td>\r\n<td>4<\/td>\r\n<td>23.53<\/td>\r\n<\/tr>\r\n<tr>\r\n<td><strong>5<\/strong><\/td>\r\n<td>track &amp; field<\/td>\r\n<td>14<\/td>\r\n<td>9<\/td>\r\n<td>5<\/td>\r\n<td>35.71<\/td>\r\n<\/tr>\r\n<tr>\r\n<td><strong>Total<\/strong><\/td>\r\n<td><\/td>\r\n<td>100<\/td>\r\n<td>70<\/td>\r\n<td>30<\/td>\r\n<td>30.00<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/div>\r\n&nbsp;\r\n\r\n&nbsp;\r\n\r\nFigure 19.8 Life Table Survival Curves With Explicit Right Censoring at 100 Days for Sport Groups\r\n\r\n&nbsp;\r\n<h4>Part 3: The Kaplan-Meier Approach<\/h4>\r\n<h4><\/h4>\r\nThe Kaplan-Meier approach to survival analysis differs slightly from the applications using METHOD=LIFE in the PROC LIFETEST procedure.\u00a0 When we use the METHOD=KM in the PROC 
LIFETEST procedure we generate a series of survival probability estimates referred to as the Kaplan-Meier estimates (hereafter referred to as the KM estimates), and corresponding survival probability curves for the KM estimates.\r\n\r\n&nbsp;\r\n\r\nIn the KM estimates, a value is given for the change in survival probability each time an individual becomes a case, up to the demarcation point of 100 days. This approach is more precise in reporting the time at event and does not summarize the data across an interval as is done with the METHOD=LIFE in the PROC LIFETEST procedure.\r\n\r\n&nbsp;\r\n\r\nThe output from the METHOD=LIFE and the METHOD=KM approaches is compared in the following tables, up to the first 12 cases that became infected.\u00a0 Notice that the METHOD=LIFE approach summarizes the estimates within a set of intervals, while the METHOD=KM approach provides a separate survival probability estimate at each observed event time within the cohort of interest.\r\n\r\n&nbsp;\r\n\r\nTable 19.5 Survivor function for Zika virus using METHOD = LIFE in Proc Lifetest\r\n\r\n<strong>\u00a0<\/strong>\r\n<table>\r\n<tbody>\r\n<tr>\r\n<td><strong>Days Interval<\/strong><\/td>\r\n<td><strong>Abbreviated table showing results for The LIFETEST Procedure<\/strong><\/td>\r\n<\/tr>\r\n<tr>\r\n<td><strong>Lower interval<\/strong><\/td>\r\n<td><strong>Upper interval<\/strong><\/td>\r\n<td><strong>Number failed<\/strong><\/td>\r\n<td><strong>Number censored<\/strong><\/td>\r\n<td><strong>Effective sample size<\/strong><\/td>\r\n<td><strong>Conditional probability of failure<\/strong><\/td>\r\n<td><strong>Conditional probability of failure Standard error<\/strong><\/td>\r\n<td><strong>Survival<\/strong><\/td>\r\n<td><strong>Failure<\/strong><\/td>\r\n<td><strong>Survival Standard 
error<\/strong><\/td>\r\n<\/tr>\r\n<tr>\r\n<td>0<\/td>\r\n<td>20<\/td>\r\n<td>6<\/td>\r\n<td>0<\/td>\r\n<td>100.0<\/td>\r\n<td>0.0600<\/td>\r\n<td>0.0237<\/td>\r\n<td>1.0000<\/td>\r\n<td>0<\/td>\r\n<td>0<\/td>\r\n<\/tr>\r\n<tr>\r\n<td>20<\/td>\r\n<td>40<\/td>\r\n<td>12<\/td>\r\n<td>0<\/td>\r\n<td>94.0<\/td>\r\n<td>0.1277<\/td>\r\n<td>0.0344<\/td>\r\n<td>0.9400<\/td>\r\n<td>0.06<\/td>\r\n<td>0.023<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n&nbsp;\r\n\r\nWhen we use the METHOD=KM approach in the PROC LIFETEST procedure the following estimates are generated. Note these estimates only refer to the first 12 cases designated as infected within the original data set of n=100 cases.\r\n\r\n&nbsp;\r\n\r\nTable 19.6 Survivor function for Zika virus using METHOD = KM in Proc Lifetest\r\n\r\n&nbsp;\r\n\r\n<strong>Abbreviated table showing results for The LIFETEST Procedure<\/strong>\r\n\r\n<strong>\u00a0<\/strong>\r\n<div align=\"center\">\r\n<table>\r\n<thead>\r\n<tr>\r\n<td><strong>Days<\/strong><\/td>\r\n<td><strong>Survival<\/strong><\/td>\r\n<td><strong>Failure<\/strong><\/td>\r\n<td><strong>Survival Standard 
Error<\/strong><\/td>\r\n<td><strong>Number\r\nFailed<\/strong><\/td>\r\n<td><strong>Number\r\nRemaining<\/strong><\/td>\r\n<\/tr>\r\n<\/thead>\r\n<tbody>\r\n<tr>\r\n<td><strong>0.000<\/strong><\/td>\r\n<td>1.0000<\/td>\r\n<td>0<\/td>\r\n<td>0<\/td>\r\n<td>0<\/td>\r\n<td>100<\/td>\r\n<\/tr>\r\n<tr>\r\n<td><strong>8.800<\/strong><\/td>\r\n<td>0.9900<\/td>\r\n<td>0.0100<\/td>\r\n<td>0.00995<\/td>\r\n<td>1<\/td>\r\n<td>99<\/td>\r\n<\/tr>\r\n<tr>\r\n<td><strong>11.780<\/strong><\/td>\r\n<td>0.9800<\/td>\r\n<td>0.0200<\/td>\r\n<td>0.0140<\/td>\r\n<td>2<\/td>\r\n<td>98<\/td>\r\n<\/tr>\r\n<tr>\r\n<td><strong>12.540<\/strong><\/td>\r\n<td>0.9700<\/td>\r\n<td>0.0300<\/td>\r\n<td>0.0171<\/td>\r\n<td>3<\/td>\r\n<td>97<\/td>\r\n<\/tr>\r\n<tr>\r\n<td><strong>12.800<\/strong><\/td>\r\n<td>0.9600<\/td>\r\n<td>0.0400<\/td>\r\n<td>0.0196<\/td>\r\n<td>4<\/td>\r\n<td>96<\/td>\r\n<\/tr>\r\n<tr>\r\n<td><strong>14.120<\/strong><\/td>\r\n<td>0.9500<\/td>\r\n<td>0.0500<\/td>\r\n<td>0.0218<\/td>\r\n<td>5<\/td>\r\n<td>95<\/td>\r\n<\/tr>\r\n<tr>\r\n<td><strong>15.860<\/strong><\/td>\r\n<td>0.9400<\/td>\r\n<td>0.0600<\/td>\r\n<td>0.0237<\/td>\r\n<td>6<\/td>\r\n<td>94<\/td>\r\n<\/tr>\r\n<tr>\r\n<td><strong>21.240<\/strong><\/td>\r\n<td>0.9300<\/td>\r\n<td>0.0700<\/td>\r\n<td>0.0255<\/td>\r\n<td>7<\/td>\r\n<td>93<\/td>\r\n<\/tr>\r\n<tr>\r\n<td><strong>22.560<\/strong><\/td>\r\n<td>0.9200<\/td>\r\n<td>0.0800<\/td>\r\n<td>0.0271<\/td>\r\n<td>8<\/td>\r\n<td>92<\/td>\r\n<\/tr>\r\n<tr>\r\n<td><strong>24.720<\/strong><\/td>\r\n<td>0.9100<\/td>\r\n<td>0.0900<\/td>\r\n<td>0.0286<\/td>\r\n<td>9<\/td>\r\n<td>91<\/td>\r\n<\/tr>\r\n<tr>\r\n<td><strong>27.280<\/strong><\/td>\r\n<td>0.9000<\/td>\r\n<td>0.1000<\/td>\r\n<td>0.0300<\/td>\r\n<td>10<\/td>\r\n<td>90<\/td>\r\n<\/tr>\r\n<tr>\r\n<td><strong>27.780<\/strong><\/td>\r\n<td>.<\/td>\r\n<td>.<\/td>\r\n<td>.<\/td>\r\n<td>11<\/td>\r\n<td>89<\/td>\r\n<\/tr>\r\n<tr>\r\n<td><strong>27.780<\/strong><\/td>\r\n<td>0.8800<\/td>\r\n<td>0.1200<\/td>\r\n<td>0.0325<\/td
>\r\n<td>12<\/td>\r\n<td>88<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/div>\r\n&nbsp;\r\n\r\nThe difference in the two methods is further exemplified in the comparison of the two survival curves shown in Figure 19.9. The survival curve for the METHOD=LIFE approach is a summary curve while the survival curve for the METHOD=KM approach shows more precise estimates of failures (individuals reporting infection) over the entire time interval. In both curves the data are right censored at days=100, and as such no survival probabilities are reported for individuals that have not become a case as of the 100 days demarcation point, in the data set.\r\n\r\n&nbsp;\r\n<table>\r\n<tbody>\r\n<tr>\r\n<td><\/td>\r\n<td><\/td>\r\n<\/tr>\r\n<tr>\r\n<td>Survival probability curve using Method=LIFE in SAS Proc lifetest<\/td>\r\n<td>Survival probability curve using Method=KM in SAS Proc lifetest<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n&nbsp;\r\n\r\nFigure 19.9 Comparison of Survival Curves With Explicit Right Censoring for life-table analysis versus Kaplan-Meier estimation\r\n\r\n&nbsp;\r\n<h4>Part 4: Comparing Kaplan-Meier Survival Estimates with Log Rank and Wilcoxon Tests<\/h4>\r\n&nbsp;\r\n\r\nIn the PROC LIFETEST procedure we can evaluate the difference between survival probability curves by computing two non-parametric tests: i) the Log Rank Test and ii) the Wilcoxon test. The tests are computed with the PROC LIFETEST procedure when including the strata command, as shown here:\r\n\r\n&nbsp;\r\n\r\nPROC LIFETEST plots=(s) data=sample.zika2 ;\r\n\r\ntime days * case(0);\r\n\r\nstrata sex;\r\n\r\nformat case casefmt. sex sexfmt. 
;\r\n\r\ntitle 'Kaplan Meier Estimates with log rank and Wilcoxon tests';\r\n\r\nlabel days ='days to infection';\r\n\r\n&nbsp;\r\n\r\nThe strata command separates the computation of survival probabilities by different subgroups of the variable used in the strata command.\u00a0 In our Zika data set, survival probabilities are estimated for the males and females in the observed sample.\u00a0 The graphical illustration of the survival probability curves is shown in Figure 19.10 below and the statistical comparison of the survival curves is shown in the following two tests.\r\n\r\n&nbsp;\r\n\r\nThe Log-Rank test and the Wilcoxon test are two non-parametric tests that enable users to compare the survival probability curves based on Kaplan-Meier Survival Estimates for each subgroup within designated strata. The results for the comparison of the Survival Probability Curves for males versus females are shown here.\r\n\r\nTable 19.7 Test to evaluate the survival curves\r\n\r\n&nbsp;\r\n<div align=\"center\">\r\n<table>\r\n<tbody>\r\n<tr>\r\n<td><strong>Test<\/strong><\/td>\r\n<td>Chi-Square<\/td>\r\n<td>DF<\/td>\r\n<td>Pr &gt; Chi square<\/td>\r\n<\/tr>\r\n<tr>\r\n<td><strong>Log-Rank<\/strong><\/td>\r\n<td>2.8240<\/td>\r\n<td>1<\/td>\r\n<td>0.0929<\/td>\r\n<\/tr>\r\n<tr>\r\n<td><strong>Wilcoxon<\/strong><\/td>\r\n<td>4.2191<\/td>\r\n<td>1<\/td>\r\n<td>0.0400<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/div>\r\n&nbsp;\r\n\r\nThe p values indicate that the difference in survival curves for males versus females was significant at the 0.05 level for the Wilcoxon test (p = 0.0400), but not for the log-rank test (p = 0.0929).\u00a0 The overall conclusion from these tests is that there is some evidence that the curves for the two survival probabilities were different, although the two tests do not agree. 
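The p values in Table 19.7 can be reproduced from the reported chi-square statistics. The following is a minimal Python sketch rather than SAS output; it assumes only that each statistic follows a chi-square distribution with 1 degree of freedom, and the function name chi2_sf_1df is a hypothetical helper.

```python
import math

def chi2_sf_1df(x):
    '''Upper-tail probability of a chi-square statistic with 1 degree of freedom.

    For 1 df, P(X > x) equals P(|Z| > sqrt(x)) for a standard normal Z,
    which can be written with the complementary error function.
    '''
    return math.erfc(math.sqrt(x / 2))

# Chi-square statistics taken from Table 19.7
p_logrank = chi2_sf_1df(2.8240)
p_wilcoxon = chi2_sf_1df(4.2191)

print(round(p_logrank, 4), round(p_wilcoxon, 4))
```

Only the Wilcoxon statistic falls below the conventional 0.05 threshold, which is the disagreement between the two tests described in the text.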
However, it should be noted that the Log-Rank test is the more powerful of the two tests when the proportional hazards assumption holds, that is, when the ratio of the hazard rates in the two groups is constant at each time point.\u00a0 This does not require the hazard itself to be constant over time; it means that the likelihood of being infected (i.e. becoming a case) in one group relative to the other stays the same across all time points<a href=\"#_ftn3\">[3]<\/a>.\r\n\r\n&nbsp;\r\n\r\nFigure 19.10 illustrates the survival probability curves for males versus females in our Zika dataset. These curves are based on the product-limit estimates (aka Kaplan-Meier estimates) for the survival probability series within each level of the strata. Notice that the two survival curves cross early in the recording. This cross over of KM curves helps to explain why the two tests give different p values.\u00a0 The Wilcoxon test gives more weight to early event times, so a stronger Wilcoxon outcome is likely to occur when one of the comparison groups has a higher risk of the event (becoming a case) earlier in the recording. The log-rank test weights all event times equally, so it is relatively more sensitive to differences in the risk of being infected (i.e. failing, dying, becoming a case) that appear later in the recording.\r\n\r\n&nbsp;\r\n\r\nFigure 19.10 Comparison of Survival Curves With Explicit Right Censoring for Kaplan-Meier estimation of males versus females\r\n\r\n&nbsp;\r\n\r\n<strong>\u00a0<\/strong>\r\n<h4>Part 5: Computing the Cox Proportional Hazard Regression Analysis<\/h4>\r\nThe data in a survival analysis can be used in a special type of regression procedure known as the proportional hazard model. 
This approach to using regression modeling was developed by Cox<a href=\"#_ftn4\">[4]<\/a> and builds on the regression approaches that we have discussed earlier in this text.\r\n\r\nIn linear regression we can create equations in which a predictor variable, or set of predictor variables, is used to explain the variance in an outcome variable (the dependent variable), as shown in the following simple linear regression and multiple regression equations.\r\n\r\nA simple straight-line or linear regression equation:\r\n<p style=\"text-align: center\"><em>y<\/em> = <em>\u03b2<\/em><sub>1<\/sub><em>x<\/em> + <em>\u03b2<\/em><sub>0<\/sub><\/p>\r\nwhere: <em>y<\/em> is the dependent variable, <em>\u03b2<\/em><sub>1<\/sub> is the slope element by which we adjust the predictor (<em>x<\/em>) variable, <em>x<\/em> is the independent or predictor variable, and <em>\u03b2<\/em><sub>0<\/sub> is the <em>y<\/em> \u2013 intercept (i.e. the point where the response graph crosses the vertical axis).\r\n\r\nThe simple linear regression equation in its most basic form helps us to understand the relationship between two variables, one designated as the dependent variable (<em>y<\/em>) and the other designated as the predictor variable (<em>x<\/em>). Together these variables help us to predict or explain an outcome, while adjusting for the variance between the two measures.\r\n\r\nA multiple regression equation:\r\n<p style=\"text-align: center\"><em>y<\/em> = \u03a3 <em>\u03b2<\/em><sub><em>i<\/em><\/sub><em>x<\/em><sub><em>i<\/em><\/sub> + <em>\u03b2<\/em><sub>0<\/sub><\/p>\r\nwhere: <em>y<\/em> is the dependent variable, <em>\u03b2<\/em><sub><em>i<\/em><\/sub> is the slope element by which we adjust the predictor (<em>x<\/em><sub><em>i<\/em><\/sub>) variable, <em>x<\/em><sub><em>i<\/em><\/sub> is the independent or predictor variable, and <em>\u03b2<\/em><sub>0<\/sub> is the <em>y<\/em> \u2013 intercept. In this equation, the subscript <em>(i)<\/em> is a counter for each of the predictor variables used in the equation.\r\n\r\n&nbsp;\r\n\r\nThe multiple linear regression equation is an expansion of the simple linear regression, and under a univariate model has one dependent (<em>y<\/em>) variable but two or more predictor (<em>x<\/em>) variables. Again, the regression procedure helps us to predict or explain the outcome <em>y<\/em> while adjusting for the variance in the predictor (<em>x<\/em>) variables. 
In multiple regression we can determine the slope of a predictor variable \u2013 the coefficient by which the variable is multiplied, while holding all other variables in the model constant. In this way we are able to determine the significance of each variable in the equation with respect to all of the variables in the equation.\r\n\r\n&nbsp;\r\n\r\nIn the Cox proportional hazard regression, also referred to as the Cox regression, the concepts of simple and multiple regression equations are the same; however, the dependent variable is comprised not of a single scalar score, but rather of the hazard function representing the relationship between survival probability and time to an event.\r\n\r\n&nbsp;\r\n\r\nAs stated earlier, the hazard function describes the rate at which an event occurs at a given time, or within a given interval of time, among individuals who have not yet experienced the event.\u00a0 The hazard function is a rate rather than a probability; therefore its value can exceed 1.\r\n\r\n&nbsp;\r\n\r\nIn the computation of the Cox regression we develop a statistical regression model comprised of a dependent variable which consists of a hazard function and a set of independent variables which consist of predictors of the dependent variable. The Cox model itself does not require the baseline hazard to follow a particular distribution, but a time based distribution that is often used to describe time to event data is the Weibull distribution. The Weibull distribution is familiar to the field of engineering because it is helpful in describing reliability and failure of a measured device over time.\u00a0 The applicable characteristic of the Weibull distribution for survival analysis is that it provides a mathematical foundation for failure rate throughout the lifetime of a measurement period. Depending on its shape parameter, the Weibull failure rate can decrease, remain constant, or increase with time; in the case illustrated here the failure rate is shown to decrease with time, reaching a plateau that is relatively constant<a href=\"#_ftn5\">[5]<\/a>. The Weibull distribution fits applications for survival analysis since higher failure rates (i.e. 
time to an event) occur more often prior to the censoring demarcation point as shown in Figure 19.11.\r\n\r\n&nbsp;\r\n\r\nFigure 19.11 Schematic of a Weibull distribution\r\n\r\n&nbsp;\r\n\r\nAs in the application of simple and multiple linear regression procedures, in the application of the Cox regression the user can establish regression coefficients for each of the predictors of the dependent variable to determine the magnitude and direction of the predictor acting on the dependent variable.\r\n\r\n&nbsp;\r\n\r\nIn our Zika virus example, we use Cox regression to determine the risk of infection based on the ratio of the probability density function and survival probabilities for time to infection as the dependent variable, and individual\u2019s sex and sport as predictor variables.\r\n\r\n&nbsp;\r\n\r\nIn other words, using the simulated dataset for the Olympic athletes and Cox regression we can evaluate the likelihood of being infected with Zika virus based on whether the individual was male or female, and the type of Olympic sport in which they were participating.\r\n\r\n&nbsp;\r\n\r\nIn the following sample code we use the proc phreg; procedure to produce output for the Cox Proportional Hazard Function. However, it is good practice to explain the overall model that we are testing. 
Here our hazard function is based on the number of days to infection, and the covariates are sex and sport type, along with the interaction of sex by sport type.\r\n\r\n&nbsp;\r\n\r\nproc phreg plots=survival;\r\n\r\nclass sex sport;\r\n\r\nmodel days*case(0) = sex sport sex_sport;\r\n\r\ntitle 'Cox Proportional Hazard Analysis for Zika Virus by sex and sport';\r\n\r\nlabel days ='days to infection';\r\n\r\n&nbsp;\r\n\r\nThe output shown below provides a graphic image of the survival curve and associated tables representing the statistical analyses.\r\n\r\n<img src=\"http:\/\/pressbooks.library.upei.ca\/montelpare\/wp-content\/uploads\/sites\/49\/2020\/06\/surv4-300x221.png\" alt=\"\" class=\"aligncenter wp-image-1044\" width=\"381\" height=\"281\" \/>\r\n<p style=\"text-align: center\">Plot of the survival probability curve from proc phreg<\/p>\r\nThe summary table of the number of individuals that exceeded the censoring demarcation point is presented in Table 19.8 below. The results indicate that 30 of the 100 simulated individuals (30%) were censored.\r\n<p style=\"text-align: center\"><strong>Table of Proportion of Censored Observations from the Survival Curves<\/strong><\/p>\r\n\r\n<div align=\"center\">\r\n<table>\r\n<thead>\r\n<tr>\r\n<td><strong>Summary of the Number of Event and Censored Values<\/strong><\/td>\r\n<\/tr>\r\n<tr>\r\n<td><strong>Total<\/strong><\/td>\r\n<td><strong>Event<\/strong><\/td>\r\n<td><strong>Censored<\/strong><\/td>\r\n<td><strong>Percent\r\nCensored<\/strong><\/td>\r\n<\/tr>\r\n<\/thead>\r\n<tbody>\r\n<tr>\r\n<td>100<\/td>\r\n<td>70<\/td>\r\n<td>30<\/td>\r\n<td>30.00<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/div>\r\nNext, the model fit statistics are presented, followed by the test of the null hypothesis that the coefficients of the predictor variables are equal to 0. The model fit statistics are most often used when comparing more than one model, in which case we evaluate the AIC criterion and select the lowest value as suggesting a more appropriate fitting model. 
In the example shown here, this output is less relevant as we only have one model to consider. The column labelled <strong><em>With Covariates<\/em><\/strong> is still worth examining: adding the predictor variables lowered the -2 LOG L value (from 578.185 to 569.236), whereas the AIC and SBC, which penalize each added parameter, increased (to 581.236 and 594.727, respectively). For each criterion, lower values are considered to represent a better fit.\r\n<p style=\"text-align: center\">Table of Model Fit Statistics for the Application of the Cox PHREG<\/p>\r\n\r\n<div align=\"center\">\r\n<table>\r\n<thead>\r\n<tr>\r\n<td><strong>Model Fit Statistics<\/strong><\/td>\r\n<\/tr>\r\n<tr>\r\n<td><strong>Criterion<\/strong><\/td>\r\n<td><strong>Without\r\nCovariates<\/strong><\/td>\r\n<td><strong>With\r\nCovariates<\/strong><\/td>\r\n<\/tr>\r\n<\/thead>\r\n<tbody>\r\n<tr>\r\n<td><strong>-2 LOG L<\/strong><\/td>\r\n<td>578.185<\/td>\r\n<td>569.236<\/td>\r\n<\/tr>\r\n<tr>\r\n<td><strong>AIC<\/strong><\/td>\r\n<td>578.185<\/td>\r\n<td>581.236<\/td>\r\n<\/tr>\r\n<tr>\r\n<td><strong>SBC<\/strong><\/td>\r\n<td>578.185<\/td>\r\n<td>594.727<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/div>\r\nThe main outputs for us to consider from the application of the <span style=\"color: #0000ff\"><strong>proc phreg;<\/strong> <\/span>procedure for this example are the tables of tests for the Global Null Hypothesis: Beta=0 and the Analysis of Maximum Likelihood Estimates, shown below. 
The test of the Global Null Hypothesis: Beta=0 evaluates the null hypothesis that all regression coefficients equal 0, that is, that the predictor variables have no effect on the calculated value of the hazard function.\r\n<p style=\"text-align: center\">Table of Tests of Beta=0 for the Application of the Cox PHREG<\/p>\r\n\r\n<div align=\"center\">\r\n<table>\r\n<thead>\r\n<tr>\r\n<td><strong>Testing Global Null Hypothesis: BETA=0<\/strong><\/td>\r\n<\/tr>\r\n<tr>\r\n<td><strong>Test<\/strong><\/td>\r\n<td><strong>Chi-Square<\/strong><\/td>\r\n<td><strong>DF<\/strong><\/td>\r\n<td><strong>Pr\u00a0&gt;\u00a0ChiSq<\/strong><\/td>\r\n<\/tr>\r\n<\/thead>\r\n<tbody>\r\n<tr>\r\n<td><strong>Likelihood Ratio<\/strong><\/td>\r\n<td>8.9485<\/td>\r\n<td>6<\/td>\r\n<td>0.1765<\/td>\r\n<\/tr>\r\n<tr>\r\n<td><strong>Score<\/strong><\/td>\r\n<td>9.8384<\/td>\r\n<td>6<\/td>\r\n<td>0.1316<\/td>\r\n<\/tr>\r\n<tr>\r\n<td><strong>Wald<\/strong><\/td>\r\n<td>9.3857<\/td>\r\n<td>6<\/td>\r\n<td>0.1530<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/div>\r\nThe table above presents the results of three tests of the Global Null Hypothesis: Beta=0: i) the likelihood ratio test, ii) the Score test, and iii) the Wald test.\u00a0 Notice that the p values for the three Chi-square tests are similar and that none falls below the conventional 0.05 level, so we cannot reject the null hypothesis that the regression coefficients equal 0.\r\n\r\nSince the predictor variables included in the example were discrete class variables (no continuous covariates were included in the model), we also included the class sex sport; statement in the proc phreg; procedure. The output generated a table of the Type 3 tests (also referred to as Joint tests) to determine whether the effect of each categorical variable was significantly different from 0. 
The results of the Wald Chi-square statistic indicate that there was no significant effect of any of the categorical variables on the computed hazard function for the days to infection from the Zika virus.\r\n<p style=\"text-align: center\">Table of Type 3 Tests from Proc PHREG<\/p>\r\n\r\n<div align=\"center\">\r\n<table>\r\n<thead>\r\n<tr>\r\n<td><strong>Type 3 Tests<\/strong><\/td>\r\n<\/tr>\r\n<tr>\r\n<td><strong>Effect<\/strong><\/td>\r\n<td><strong>DF<\/strong><\/td>\r\n<td><strong>Wald Chi-Square<\/strong><\/td>\r\n<td><strong>Pr\u00a0&gt;\u00a0ChiSq<\/strong><\/td>\r\n<\/tr>\r\n<\/thead>\r\n<tbody>\r\n<tr>\r\n<td><strong>sex<\/strong><\/td>\r\n<td>1<\/td>\r\n<td>0.5229<\/td>\r\n<td>0.4696<\/td>\r\n<\/tr>\r\n<tr>\r\n<td><strong>sport<\/strong><\/td>\r\n<td>4<\/td>\r\n<td>1.3533<\/td>\r\n<td>0.8523<\/td>\r\n<\/tr>\r\n<tr>\r\n<td><strong>sex_sport<\/strong><\/td>\r\n<td>1<\/td>\r\n<td>0.0311<\/td>\r\n<td>0.8601<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/div>\r\n<p style=\"text-align: left\">The maximum likelihood estimates produced by the SAS <span style=\"color: #0000ff\"><em><strong>proc phreg<\/strong><\/em><\/span> enable us to provide the parameter estimates that correspond to the predictor variables included in the regression equation.\u00a0 The underlying algebraic regression equation<a href=\"#_ftn6\">[6]<\/a> for the Cox Proportional Hazard Model is given as:<\/p>\r\n<p style=\"text-align: center\">[latex]h(t) = h_0 (t)exp(x\\beta_{x})[\/latex]<\/p>\r\nTherefore, the parameter estimates refer to the coefficients for each predictor variable in the equation.\r\n<p style=\"text-align: center\">Maximum Likelihood Estimates from PROC PHREG<\/p>\r\n\r\n<div 
align=\"center\">\r\n<table>\r\n<thead>\r\n<tr>\r\n<td><strong>Parameter<\/strong><\/td>\r\n<td><\/td>\r\n<td><strong>DF<\/strong><\/td>\r\n<td><strong>Parameter\r\nEstimate<\/strong><\/td>\r\n<td><strong>Standard\r\nError<\/strong><\/td>\r\n<td><strong>Chi-Square<\/strong><\/td>\r\n<td><strong>Pr\u00a0&gt;\u00a0ChiSq<\/strong><\/td>\r\n<td><strong>Hazard\r\nRatio<\/strong><\/td>\r\n<td><strong>Label<\/strong><\/td>\r\n<\/tr>\r\n<\/thead>\r\n<tbody>\r\n<tr>\r\n<td><strong>sex<\/strong><\/td>\r\n<td><strong>1<\/strong><\/td>\r\n<td>1<\/td>\r\n<td>-0.49114<\/td>\r\n<td>0.67917<\/td>\r\n<td>0.5229<\/td>\r\n<td>0.4696<\/td>\r\n<td>0.612<\/td>\r\n<td>sex 1<\/td>\r\n<\/tr>\r\n<tr>\r\n<td><strong>sport<\/strong><\/td>\r\n<td><strong>1<\/strong><\/td>\r\n<td>1<\/td>\r\n<td>0.47592<\/td>\r\n<td>1.51352<\/td>\r\n<td>0.0989<\/td>\r\n<td>0.7532<\/td>\r\n<td>1.609<\/td>\r\n<td>sport 1<\/td>\r\n<\/tr>\r\n<tr>\r\n<td><strong>sport<\/strong><\/td>\r\n<td><strong>2<\/strong><\/td>\r\n<td>1<\/td>\r\n<td>0.00554<\/td>\r\n<td>1.11315<\/td>\r\n<td>0.0000<\/td>\r\n<td>0.9960<\/td>\r\n<td>1.006<\/td>\r\n<td>sport 2<\/td>\r\n<\/tr>\r\n<tr>\r\n<td><strong>sport<\/strong><\/td>\r\n<td><strong>3<\/strong><\/td>\r\n<td>1<\/td>\r\n<td>0.13256<\/td>\r\n<td>0.80301<\/td>\r\n<td>0.0273<\/td>\r\n<td>0.8689<\/td>\r\n<td>1.142<\/td>\r\n<td>sport 3<\/td>\r\n<\/tr>\r\n<tr>\r\n<td><strong>sport<\/strong><\/td>\r\n<td><strong>4<\/strong><\/td>\r\n<td>1<\/td>\r\n<td>-0.13920<\/td>\r\n<td>0.52854<\/td>\r\n<td>0.0694<\/td>\r\n<td>0.7923<\/td>\r\n<td>0.870<\/td>\r\n<td>sport 4<\/td>\r\n<\/tr>\r\n<tr>\r\n<td><strong>sex_sport<\/strong><\/td>\r\n<td><\/td>\r\n<td>1<\/td>\r\n<td>-0.03689<\/td>\r\n<td>0.20926<\/td>\r\n<td>0.0311<\/td>\r\n<td>0.8601<\/td>\r\n<td>0.964<\/td>\r\n<td><\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/div>\r\nThe results presented in the table above indicate that none of the predictor variables produced a significant parameter estimate, therefore we can conclude that the days to 
infection did not differ significantly by sex or by the sport in which the athlete participated.\r\n\r\n<hr \/>\r\n\r\n<div>\r\n<div>\r\n\r\n<a href=\"#_ftnref1\">[1]<\/a> Wicklin, R. (2011) <a href=\"http:\/\/blogs.sas.com\/content\/iml\/2011\/10\/19\/four-essential-functions-for-statistical-programmers.html\">http:\/\/blogs.sas.com\/content\/iml\/2011\/10\/19\/four-essential-functions-for-statistical-programmers.html<\/a>\r\n\r\n<a href=\"#_ftnref2\">[2]<\/a> Introduction to Survival Analysis in SAS. UCLA: Statistical Consulting Group. From <a href=\"http:\/\/www.ats.ucla.edu\/stat\/sas\/seminars\/sas_survival\/\">http:\/\/www.ats.ucla.edu\/stat\/sas\/seminars\/sas_survival\/<\/a> (accessed Feb 20, 2017)\r\n\r\n<\/div>\r\n<div>\r\n\r\n<a href=\"#_ftnref3\">[3]<\/a> Bewick, V., Cheek, L., Ball, J., Statistics review 12: Survival analysis, Critical Care 2004, 8:389-394.\r\n\r\n<\/div>\r\n<div>\r\n\r\n<a href=\"#_ftnref4\">[4]<\/a> The Cox Proportional Hazard regression is based on Sir David Cox\u2019s 1972 paper: Regression Models and Life-Tables, J. R. Stat. Soc. B, 34:187\u2013220.\r\n\r\n<\/div>\r\n<div>\r\n\r\n<a href=\"#_ftnref5\">[5]<\/a> The weibull.com reliability engineering resource website is a service of ReliaSoft Corporation.\r\nCopyright \u00a9 1992 -\u00a02017 ReliaSoft Corporation. 
All Rights Reserved.\r\n\r\n<\/div>\r\n<div>\r\n\r\n<a href=\"#_ftnref6\">[6]<\/a> Introduction to Survival Analysis in SAS. UCLA: Statistical Consulting Group. From <a href=\"http:\/\/www.ats.ucla.edu\/stat\/sas\/seminars\/sas_survival\/\">http:\/\/www.ats.ucla.edu\/stat\/sas\/seminars\/sas_survival\/<\/a> (accessed Feb 20, 2017)\r\n\r\n<\/div>\r\n<\/div>","rendered":"<h2 class=\"ABodyCopy\"><span lang=\"EN-US\">Essential Background in Survival Analysis<\/span><\/h2>\n<p class=\"ABodyCopy\"><span lang=\"EN-US\">Survival analysis can be considered in its simplest form as a method to analyze longitudinal data for a cohort, or for a comparison of cohorts with a specific interest in the proportion of individuals that reached or exceeded a definite point on a time scale. <\/span><\/p>\n<p>In survival analysis, the demarcation point for the event of interest on a time scale is referred to in a variety of ways but is dependent upon the perspective of the researcher.\u00a0 For example, if the researcher is interested in the application of survival analysis to estimate mortality as a result of a given treatment regimen then the demarcation point may be used to count the number of individuals that died within the interval up to a specific time, versus the number of individuals that lived beyond the selected time (i.e. survived).\u00a0 However, given the intention of the research, the mathematics of survival analysis need not be limited to only counting deaths (or survival), rather, the approaches of survival analyses may be thought of as a set of mathematical functions that enable statistical techniques which can be applied to the evaluation of any selected event at a specific period of time. 
Hence, there are several methods that can be used to perform survival analysis, however, in this chapter, the focus will be on the application of SAS for survival analysis using life tables, the calculation of the log-rank test, and the application of the Cox Proportional Hazard Model.<\/p>\n<h2>Important Functions Used in Survival Analysis<\/h2>\n<p>The progression of information about functions used in the computation of survival analyses is presented in Figure 19.1. In the following section, we will review the important concepts of the probability density function for a random discrete variable and a random continuous variable, the cumulative distribution function, the survival function, and the hazard function.<\/p>\n<h3 style=\"text-align: center\"><strong>The flow of function processing in survival analysis<\/strong><\/h3>\n<p><img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/pressbooks.library.upei.ca\/montelpare\/wp-content\/uploads\/sites\/49\/2020\/06\/surv1-300x98.png\" alt=\"\" class=\"wp-image-1006 aligncenter\" width=\"508\" height=\"166\" srcset=\"https:\/\/pressbooks.library.upei.ca\/montelpare\/wp-content\/uploads\/sites\/49\/2020\/06\/surv1-300x98.png 300w, https:\/\/pressbooks.library.upei.ca\/montelpare\/wp-content\/uploads\/sites\/49\/2020\/06\/surv1-768x252.png 768w, https:\/\/pressbooks.library.upei.ca\/montelpare\/wp-content\/uploads\/sites\/49\/2020\/06\/surv1-1024x336.png 1024w, https:\/\/pressbooks.library.upei.ca\/montelpare\/wp-content\/uploads\/sites\/49\/2020\/06\/surv1-65x21.png 65w, https:\/\/pressbooks.library.upei.ca\/montelpare\/wp-content\/uploads\/sites\/49\/2020\/06\/surv1-225x74.png 225w, https:\/\/pressbooks.library.upei.ca\/montelpare\/wp-content\/uploads\/sites\/49\/2020\/06\/surv1-350x115.png 350w, https:\/\/pressbooks.library.upei.ca\/montelpare\/wp-content\/uploads\/sites\/49\/2020\/06\/surv1.png 1600w\" sizes=\"auto, (max-width: 508px) 100vw, 508px\" \/><\/p>\n<p>There are several ways to demonstrate survival 
analysis, but we will begin here by reviewing the basic terminology and the elements of the different functions used in the calculation of survival analysis so that we can measure the risk of an event happening at a specific period of time.<\/p>\n<p>The probability density function represents a value that describes the probability of an outcome or a combination of outcomes occurring within a known outcome space \u2013 such as an interval.<\/p>\n<p>The probability density function (pdf) can refer to either the associated probability value from a discrete random variable or from a continuous random variable.\u00a0 When the pdf refers to a discrete random variable then it is also referred to as the probability mass function (pmf) for a positive discrete random variable. In this case, we define a positive discrete random variable as a variable that holds numbers from the whole number line, meaning that the scores are whole numbers (ranging from 0 to + \u221e) and may resemble (0,1,2,3, \u2026, \u221e) without decimal values.<\/p>\n<h6 style=\"text-align: center\">Probability Density Function (pdf) Related to Tossing a Single die<\/h6>\n<div style=\"margin: auto;\">\n<table style=\"width: 451px\">\n<thead>\n<tr class=\"shaded\">\n<td style=\"text-align: center;width: 263.283px\">Possible outcome expressed as [latex]P(X = x)[\/latex]<\/td>\n<td style=\"text-align: center;width: 159.283px\">The probability associated with the outcome<\/td>\n<\/tr>\n<\/thead>\n<tbody>\n<tr class=\"border\">\n<td style=\"width: 263.283px;text-align: center\">[latex]P(X = 1)[\/latex]<\/td>\n<td style=\"width: 159.283px;text-align: center\">1\/6<\/td>\n<\/tr>\n<tr class=\"border\">\n<td style=\"width: 263.283px;text-align: center\">[latex]P(X = 2)[\/latex]<\/td>\n<td style=\"width: 159.283px;text-align: center\">1\/6<\/td>\n<\/tr>\n<tr class=\"border\">\n<td style=\"width: 263.283px;text-align: center\">[latex]P(X = 3)[\/latex]<\/td>\n<td style=\"width: 159.283px;text-align: 
center\">1\/6<\/td>\n<\/tr>\n<tr class=\"border\">\n<td style=\"width: 263.283px;text-align: center\">[latex]P(X = 4)[\/latex]<\/td>\n<td style=\"width: 159.283px;text-align: center\">1\/6<\/td>\n<\/tr>\n<tr class=\"border\">\n<td style=\"width: 263.283px;text-align: center\">[latex]P(X = 5)[\/latex]<\/td>\n<td style=\"width: 159.283px;text-align: center\">1\/6<\/td>\n<\/tr>\n<tr class=\"border\">\n<td style=\"width: 263.283px;text-align: center\">[latex]P(X = 6)[\/latex]<\/td>\n<td style=\"width: 159.283px;text-align: center\">1\/6<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<p>A graph of the frequency distribution for these data would produce a platykurtic (flat) distribution profile since each outcome value has a frequency of 1.<\/p>\n<p>However, we could create a graph to demonstrate the cumulative outcomes for the probabilities of the random discrete variable (X) ranging from 1 to 6; which would be to consider the discrete outcome ranging as follows: \u03a1(X=1) \u2264 \u03a1(X=6).<\/p>\n<p>The Cumulative Distribution Function commonly referred to as the c.d.f. 
and written as F(x)=P(X\u2264x)\u00a0 represents the set of values associated with the probabilities of the random variable (X) occurring equal to or less than a given value (x) in an outcome space.<\/p>\n<p>In the example of the toss of a fair six-sided die, the outcome space is based only on the discrete numbers 1 through 6, as shown in the following outcome chart.<\/p>\n<h6>Cumulative Distribution Function (c.d.f) Related to Tossing a Single die<\/h6>\n<table style=\"height: 121px\">\n<thead>\n<tr class=\"shaded\" style=\"height: 31px\">\n<td style=\"text-align: center;height: 31px;width: 285.95px\">Possible outcome expressed as [latex]P(X \\le x)[\/latex]<\/td>\n<td style=\"text-align: center;height: 31px;width: 215.617px\">Probability associated with the outcome<\/td>\n<\/tr>\n<\/thead>\n<tbody>\n<tr class=\"border\" style=\"height: 15px\">\n<td style=\"height: 15px;width: 285.95px;text-align: center\">[latex]P(X \\le 1)[\/latex]<\/td>\n<td style=\"height: 15px;width: 215.617px;text-align: center\">1\/6 = 0.17<\/td>\n<\/tr>\n<tr class=\"border\" style=\"height: 15px\">\n<td style=\"height: 15px;width: 285.95px;text-align: center\">[latex]P(X \\le 2)[\/latex]<\/td>\n<td style=\"height: 15px;width: 215.617px;text-align: center\">2\/6 = 0.33<\/td>\n<\/tr>\n<tr class=\"border\" style=\"height: 15px\">\n<td style=\"height: 15px;width: 285.95px;text-align: center\">[latex]P(X \\le 3)[\/latex]<\/td>\n<td style=\"height: 15px;width: 215.617px;text-align: center\">3\/6 = 0.50<\/td>\n<\/tr>\n<tr class=\"border\" style=\"height: 15px\">\n<td style=\"height: 15px;width: 285.95px;text-align: center\">[latex]P(X \\le 4)[\/latex]<\/td>\n<td style=\"height: 15px;width: 215.617px;text-align: center\">4\/6 = 0.67<\/td>\n<\/tr>\n<tr class=\"border\" style=\"height: 15px\">\n<td style=\"height: 15px;width: 285.95px;text-align: center\">[latex]P(X \\le 5)[\/latex]<\/td>\n<td style=\"height: 15px;width: 215.617px;text-align: center\">5\/6 = 0.83<\/td>\n<\/tr>\n<tr class=\"border\" 
style=\"height: 15px\">\n<td style=\"height: 15px;width: 285.95px;text-align: center\">[latex]P(X \\le 6)[\/latex]<\/td>\n<td style=\"height: 15px;width: 215.617px;text-align: center\">6\/6 = 1.00<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>While the example presented here describes the c.d.f. for discrete random variable outcomes (and their associated probabilities based on the probability mass function (pmf) or probability density function (pdf)), the c.d.f. is also relevant for continuous variable values and the pdf is based on the outcomes (<em>X<\/em>) in an interval (<em>a<\/em>, <em>b<\/em>) represented by <em>P<\/em>(<em>a<\/em>\u00a0&lt;\u00a0<em>X<\/em>\u00a0&lt;\u00a0<em>b<\/em>), where all numbers from the real number line are eligible within the interval of the distribution, typically ranging from 0 to 1.<\/p>\n<p>If the data for the c.d.f. were attributed to a continuous random variable such as time, then the graph of the set of probabilities for all possible outcomes of the c.d.f. 
is presented as a positive <em>S-shaped<\/em> curve ranging from 0 to 1, as shown in the figure below.<\/p>\n<h6 style=\"text-align: center\"><img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/pressbooks.library.upei.ca\/montelpare\/wp-content\/uploads\/sites\/49\/2020\/06\/survG1-300x222.png\" alt=\"\" class=\"aligncenter wp-image-1011\" width=\"467\" height=\"345\" srcset=\"https:\/\/pressbooks.library.upei.ca\/montelpare\/wp-content\/uploads\/sites\/49\/2020\/06\/survG1-300x222.png 300w, https:\/\/pressbooks.library.upei.ca\/montelpare\/wp-content\/uploads\/sites\/49\/2020\/06\/survG1-768x568.png 768w, https:\/\/pressbooks.library.upei.ca\/montelpare\/wp-content\/uploads\/sites\/49\/2020\/06\/survG1-1024x757.png 1024w, https:\/\/pressbooks.library.upei.ca\/montelpare\/wp-content\/uploads\/sites\/49\/2020\/06\/survG1-65x48.png 65w, https:\/\/pressbooks.library.upei.ca\/montelpare\/wp-content\/uploads\/sites\/49\/2020\/06\/survG1-225x166.png 225w, https:\/\/pressbooks.library.upei.ca\/montelpare\/wp-content\/uploads\/sites\/49\/2020\/06\/survG1-350x259.png 350w, https:\/\/pressbooks.library.upei.ca\/montelpare\/wp-content\/uploads\/sites\/49\/2020\/06\/survG1.png 1400w\" sizes=\"auto, (max-width: 467px) 100vw, 467px\" \/><\/h6>\n<h6 style=\"text-align: center\">Schematic of a c.d.f. 
 for a Continuous Random Variable<\/h6>\n<div class=\"textbox textbox--examples\">\n<header class=\"textbox__header\">\n<p class=\"textbox__title\">The SAS code to generate this image was written by Wicklin (2011)<a href=\"#_ftn1\">[1]<\/a> and was run unedited in SAS Studio, as shown here.<\/p>\n<\/header>\n<div class=\"textbox__content\">\n<p>data cdf;<\/p>\n<p>do x = -3 to 3 by 0.1;<\/p>\n<p>y = cdf(&quot;Normal&quot;, x);<\/p>\n<p>output; end;<\/p>\n<p>x0 = 0;<\/p>\n<p>cdf0 = cdf(&quot;Normal&quot;, x0);<\/p>\n<p>output;<\/p>\n<p>x0 = 1.645; cdf0 = cdf(&quot;Normal&quot;, x0); output;<\/p>\n<p>run;<\/p>\n<p>ods graphics \/ height=500;<\/p>\n<p>proc sgplot data=cdf noautolegend;<\/p>\n<p>title &quot;Normal Cumulative Probability&quot;;<\/p>\n<p>series x=x y=y;<\/p>\n<p>scatter x=x0 y=cdf0;<\/p>\n<p>vector x=x0 y=cdf0 \/xorigin=x0 yorigin=0 noarrowheads lineattrs=(color=gray);<\/p>\n<p>vector x=x0 y=cdf0 \/xorigin=-3 yorigin=cdf0 noarrowheads lineattrs=(color=gray);<\/p>\n<p>xaxis grid label=&quot;x&quot;;<\/p>\n<p>yaxis grid label=&quot;Normal CDF&quot; values=(0 to 1 by 0.05);<\/p>\n<p>refline 0 1\/ axis=y;<\/p>\n<p>run;<\/p>\n<\/div>\n<\/div>\n<p>The c.d.f. is an important step in the computation of the survival analysis because it is part of the computation of the survival function. In a time-relevant model, as is typical in a biostatistics application, the cumulative distribution function can be represented as [latex]F(t)=P(T \\le{t})[\/latex], where [latex]{T}[\/latex] is the random variable representing a measured time and [latex]{t}[\/latex] is the value of the intended time at the event.<\/p>\n<p>The survival function [latex]S{(t)}[\/latex] provides the probability that the time to an event, be it a failure, death, or a specified incident, extends beyond time [latex]{t}[\/latex]. 
The survival function begins at 1, the point where an individual enters the dataset and ends at 0 the point where data monitoring stops, usually because the event of interest has occurred.<\/p>\n<p>In simple terms, the Survival Function is the complement of the c.d.f. and is computed as [latex]S{(t)}= 1- F(t)\\textit{, where t >0}[\/latex]. More important, the survival function is the denominator in the computation of the Hazard Function, which is a main element in one approach to the computation of the survival analysis. The survival function can show the probability of surviving up to a designated event, based on units of time.<\/p>\n<div class=\"textbox textbox--exercises\">\n<header class=\"textbox__header\">\n<p class=\"textbox__title\">For example, consider the following data set in which a measure of time to an event is recorded.<\/p>\n<\/header>\n<div class=\"textbox__content\">The cutoff time is set at 48\u00a0 <strong><em>(totally arbitrary units)<\/em> <\/strong>so that any value above 48 is assigned a censor score of 1 and any value less than 48 is a value of 0.<\/div>\n<\/div>\n<p>Table depicting number of individuals that exceeded the time to event<\/p>\n<table class=\"aligncenter\" style=\"height: 211px\">\n<thead>\n<tr class=\"shaded\" style=\"height: 61px\">\n<td style=\"text-align: center;height: 61px;width: 130px\">Patient ID<\/td>\n<td style=\"text-align: center;height: 61px;width: 171px\">Time to Event: The measure of the length of time to the event happening<\/td>\n<td style=\"text-align: center;height: 61px;width: 185px\">Event Counter variable (0=event has not happened, 1=event has happened)<\/td>\n<\/tr>\n<\/thead>\n<tbody>\n<tr class=\"border\" style=\"height: 15px\">\n<td style=\"height: 15px;width: 130px;text-align: center\">01<\/td>\n<td style=\"height: 15px;width: 171px;text-align: center\">40<\/td>\n<td style=\"height: 15px;width: 185px;text-align: center\">0<\/td>\n<\/tr>\n<tr class=\"border\" style=\"height: 15px\">\n<td 
style=\"height: 15px;width: 130px;text-align: center\">02<\/td>\n<td style=\"height: 15px;width: 171px;text-align: center\">38<\/td>\n<td style=\"height: 15px;width: 185px;text-align: center\">0<\/td>\n<\/tr>\n<tr class=\"border\" style=\"height: 15px\">\n<td style=\"height: 15px;width: 130px;text-align: center\">03<\/td>\n<td style=\"height: 15px;width: 171px;text-align: center\">54<\/td>\n<td style=\"height: 15px;width: 185px;text-align: center\">1<\/td>\n<\/tr>\n<tr class=\"border\" style=\"height: 15px\">\n<td style=\"height: 15px;width: 130px;text-align: center\">04<\/td>\n<td style=\"height: 15px;width: 171px;text-align: center\">56<\/td>\n<td style=\"height: 15px;width: 185px;text-align: center\">1<\/td>\n<\/tr>\n<tr class=\"border\" style=\"height: 15px\">\n<td style=\"height: 15px;width: 130px;text-align: center\">05<\/td>\n<td style=\"height: 15px;width: 171px;text-align: center\">28<\/td>\n<td style=\"height: 15px;width: 185px;text-align: center\">0<\/td>\n<\/tr>\n<tr class=\"border\" style=\"height: 15px\">\n<td style=\"height: 15px;width: 130px;text-align: center\">06<\/td>\n<td style=\"height: 15px;width: 171px;text-align: center\">36<\/td>\n<td style=\"height: 15px;width: 185px;text-align: center\">0<\/td>\n<\/tr>\n<tr class=\"border\" style=\"height: 15px\">\n<td style=\"height: 15px;width: 130px;text-align: center\">07<\/td>\n<td style=\"height: 15px;width: 171px;text-align: center\">42<\/td>\n<td style=\"height: 15px;width: 185px;text-align: center\">0<\/td>\n<\/tr>\n<tr class=\"border\" style=\"height: 15px\">\n<td style=\"height: 15px;width: 130px;text-align: center\">08<\/td>\n<td style=\"height: 15px;width: 171px;text-align: center\">51<\/td>\n<td style=\"height: 15px;width: 185px;text-align: center\">1<\/td>\n<\/tr>\n<tr class=\"border\" style=\"height: 15px\">\n<td style=\"height: 15px;width: 130px;text-align: center\">09<\/td>\n<td style=\"height: 15px;width: 171px;text-align: center\">45<\/td>\n<td style=\"height: 15px;width: 
185px;text-align: center\">0<\/td>\n<\/tr>\n<tr class=\"border\" style=\"height: 15px\">\n<td style=\"height: 15px;width: 130px;text-align: center\">10<\/td>\n<td style=\"height: 15px;width: 171px;text-align: center\">49<\/td>\n<td style=\"height: 15px;width: 185px;text-align: center\">1<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<div class=\"textbox textbox--examples\">\n<header class=\"textbox__header\">\n<p class=\"textbox__title\">The data are processed with the following SAS code<a href=\"#_ftn2\">[2]<\/a> to produce a graph of the survival curve shown below.<\/p>\n<\/header>\n<div class=\"textbox__content\">\n<p>title &#8216;program to show a survival curve&#8217;;<\/p>\n<p>data survcurv;<\/p>\n<p>input id t_event censor;<\/p>\n<p>datalines;<\/p>\n<p>01 40 0<\/p>\n<p>02 38 0<\/p>\n<p>03 54 1<\/p>\n<p>04 56 1<\/p>\n<p>05 28 0<\/p>\n<p>06 36 0<\/p>\n<p>07 42 0<\/p>\n<p>08 51 1<\/p>\n<p>09 45 0<\/p>\n<p>10 49 1<\/p>\n<p>;<\/p>\n<p>proc lifetest data=survcurv(where=(censor=1)) method=lt<\/p>\n<p>intervals=(45 to 60 by 1) plots=survival; time t_event*censor(0); run;<\/p>\n<\/div>\n<\/div>\n<p>The results of this analysis include the table of the survival estimates and the survival curve below \u2013 note that the failure point was set at 48. 
The curve shows the probability of surviving to 48 and then beyond 48.<\/p>\n<p>Notice that the entire group begins at probability = 1 and ends at probability = 0.<\/p>\n<p>Figure of SAS representation of the survival function for n=10 with censoring at x=48.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/pressbooks.library.upei.ca\/montelpare\/wp-content\/uploads\/sites\/49\/2020\/06\/surv2-300x215.png\" alt=\"\" class=\"aligncenter wp-image-1020\" width=\"500\" height=\"359\" srcset=\"https:\/\/pressbooks.library.upei.ca\/montelpare\/wp-content\/uploads\/sites\/49\/2020\/06\/surv2-300x215.png 300w, https:\/\/pressbooks.library.upei.ca\/montelpare\/wp-content\/uploads\/sites\/49\/2020\/06\/surv2-65x46.png 65w, https:\/\/pressbooks.library.upei.ca\/montelpare\/wp-content\/uploads\/sites\/49\/2020\/06\/surv2-225x161.png 225w, https:\/\/pressbooks.library.upei.ca\/montelpare\/wp-content\/uploads\/sites\/49\/2020\/06\/surv2-350x250.png 350w, https:\/\/pressbooks.library.upei.ca\/montelpare\/wp-content\/uploads\/sites\/49\/2020\/06\/surv2.png 1200w\" sizes=\"auto, (max-width: 500px) 100vw, 500px\" \/><\/p>\n<p>The Hazard Function is determined by the ratio of the probability density function (pdf) to the survival function [latex]S{(t)}[\/latex] and can be written as:[latex]\\lambda = {(p.d.f.) \\over{S(t)}}[\/latex]<\/p>\n<p>The following explanation may help to describe the elements of the <strong>hazard function<\/strong> in greater detail. 
In this annotated formula the hazard function is shown to represent the likelihood of an event such as death or survival occurring within an interval at time [latex]{(t)}[\/latex].<\/p>\n<h3 style=\"text-align: center\">[latex]\\lambda(t) = {\\lim\\limits_{\\Delta{t}\\to {0}}} {P(t \\le{T} \\lt{t} + \\Delta{t} \\mid{T} \\ge{t})\\over{\\Delta{t}}}[\/latex]<\/h3>\n<p><img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/pressbooks.library.upei.ca\/montelpare\/wp-content\/uploads\/sites\/49\/2020\/06\/surv3-300x172.png\" alt=\"\" class=\"aligncenter wp-image-1033\" width=\"629\" height=\"361\" srcset=\"https:\/\/pressbooks.library.upei.ca\/montelpare\/wp-content\/uploads\/sites\/49\/2020\/06\/surv3-300x172.png 300w, https:\/\/pressbooks.library.upei.ca\/montelpare\/wp-content\/uploads\/sites\/49\/2020\/06\/surv3-768x441.png 768w, https:\/\/pressbooks.library.upei.ca\/montelpare\/wp-content\/uploads\/sites\/49\/2020\/06\/surv3-1024x588.png 1024w, https:\/\/pressbooks.library.upei.ca\/montelpare\/wp-content\/uploads\/sites\/49\/2020\/06\/surv3-65x37.png 65w, https:\/\/pressbooks.library.upei.ca\/montelpare\/wp-content\/uploads\/sites\/49\/2020\/06\/surv3-225x129.png 225w, https:\/\/pressbooks.library.upei.ca\/montelpare\/wp-content\/uploads\/sites\/49\/2020\/06\/surv3-350x201.png 350w, https:\/\/pressbooks.library.upei.ca\/montelpare\/wp-content\/uploads\/sites\/49\/2020\/06\/surv3.png 1600w\" sizes=\"auto, (max-width: 629px) 100vw, 629px\" \/><\/p>\n<h6 style=\"text-align: center\">Annotated image of the Hazard Function Equation<\/h6>\n<ul>\n<li>The hazard function [latex]\\lambda{(t)}[\/latex] measures a specific event with respect to time [latex]{(t)}[\/latex]<\/li>\n<li>The hazard function [latex]\\lambda{(t)}[\/latex] is based on the probability that the observed event occurring at time [latex]{T}[\/latex] will happen within the interval beginning at time point [latex]{(t)}[\/latex] and ranging to the end of the interval [latex]{(t + \\Delta{t})}[\/latex], so 
that we say [latex]{(t \\le T \\lt t + \\Delta{t}\u00a0 )}[\/latex]<\/li>\n<li>Since the hazard function [latex]\\lambda{(t)}[\/latex] is not a probability estimate but is a ratio, the hazard function [latex]\\lambda{(t)}[\/latex] can exceed 1.<\/li>\n<\/ul>\n<p>The following table shows the output from a <em>life table<\/em> approach to evaluating the set of data that were used in the SAS program above to produce the survival function.\u00a0 The <em>hazard function<\/em> is included in the tabled output when the method=LT command is included in the proc lifetest procedure. An abbreviated form of the table is shown here.<\/p>\n<div style=\"margin: auto;\">\n<table>\n<tbody>\n<tr>\n<td>Life Table Survival Estimates<\/td>\n<\/tr>\n<tr>\n<td>Interval<\/p>\n<p>(sum of failed)<\/td>\n<td>Number failed after censoring<\/td>\n<td>PDF<\/td>\n<td>Hazard<\/td>\n<\/tr>\n<tr>\n<td>47-48<\/td>\n<td>(0)<\/td>\n<td>0<\/td>\n<td>0<\/td>\n<\/tr>\n<tr>\n<td>49-50<\/td>\n<td>(1)<\/td>\n<td>0.25<\/td>\n<td>0.29<\/td>\n<\/tr>\n<tr>\n<td>51-52<\/td>\n<td>(1)<\/td>\n<td>0.25<\/td>\n<td>0.40<\/td>\n<\/tr>\n<tr>\n<td>54-55<\/td>\n<td>(1)<\/td>\n<td>0.25<\/td>\n<td>0.67<\/td>\n<\/tr>\n<tr>\n<td>56-57<\/td>\n<td>(1)<\/td>\n<td>0.25<\/td>\n<td>2.00<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<p>&nbsp;<\/p>\n<div class=\"textbox textbox--examples\">\n<header class=\"textbox__header\">\n<p class=\"textbox__title\">Recall from the program listed above, that the important SAS code to produce the hazard function using the proc lifetest \u00a0procedure is:<\/p>\n<\/header>\n<div class=\"textbox__content\">\n<pre>proc lifetest data=survcurv(where=(censor=1)) method=lt\r\nintervals=(45 to 60 by 1) plots=survival;\r\n time t_event*censor(0); \r\nrun;<\/pre>\n<\/div>\n<\/div>\n<h2>Censoring Data<\/h2>\n<p>In the computation of survival analyses, not all participants will fail (or die) at the demarcation point set by the researcher.\u00a0 As shown in the data set analyzed above, the demarcation point 
for the event of interest was set at an arbitrary value of 48 and therefore 4 individuals extended beyond the value 48.<\/p>\n<p>In a survival analysis, where the time to an event is noted, any cases that \u201csurvive\u201d beyond the point stated will be considered censored.\u00a0 Censoring does not mean that the participants are dropped from the analysis.\u00a0 Rather, when censored, the individuals that have not demonstrated the event of interest prior to the pre-designated demarcation point are not counted as part of the group measured with the event of interest (i.e. dying, failing).<\/p>\n<p>When we plot the survival curves for a cohort in SAS, we can specify the censoring point and thereby produce survival probability curves that represent either the cases (those individuals that have demonstrated the event of interest by the end of the interval measured) or the non-cases (those individuals that have not demonstrated the event of interest by the end of the interval measured). In the following example, survival probability curves are used to demonstrate the influence of censoring and the Kaplan-Meier estimates used to develop the survival probability curves.<\/p>\n<hr \/>\n<h2>Annotated SAS application for a Survival Analysis<\/h2>\n<p>As noted, survival analysis is a time-based evaluation. That is, in survival analysis, we are interested in evaluating the time point at which an event occurs within a cohort. Survival analysis helps researchers evaluate the proportion of individuals that reach a demarcation point at a given time<em>, and therefore the number of individuals within a cohort that extend beyond an event (a time point of interest). <\/em><\/p>\n<p>In the following scenario, we will use a random number generator to create a SAS dataset and simulate the scenario of the ZIKA Virus at the Summer Olympics (2016). 
Next, we will apply the different tools of the SAS Survival Analysis suite to evaluate the data set, with examples that include a comparison of outcomes across athlete cohorts.<\/p>\n<h3><strong>Background:<\/strong><\/h3>\n<p>In August 2016, Brazil hosted the Olympic Summer Games. However, several athletes decided to boycott the games because of the risk of exposure to the ZIKA virus. ZIKA is a virus that can be transmitted through the bite from an infected Aedes mosquito.\u00a0 The ZIKA virus is extremely dangerous for young women as it can reside in the blood for up to 3 months and if the woman becomes pregnant, the virus can have negative consequences for the developing fetus. In particular, the ZIKA virus has been implicated in the development of microcephaly in newborn children.<\/p>\n<h4><strong>Generating the dataset with a random number generator:<\/strong><\/h4>\n<p>In this example, we will use a series of random number generating commands to create a data set with four variables and 100 observations.<\/p>\n<p>The three discrete variables are sex, sport and case, coded as follows: sex (1=m, 2=f), sport (1=golf, 2=equestrian, 3=swimming, 4=gymnastics, 5=track\u00a0 &amp; field), and case (1=yes, 0=no).<\/p>\n<p>A continuous variable, labelled days, will represent the number of days before the individual contracted the ZIKA virus from the Aedes mosquito.<\/p>\n<div class=\"textbox textbox--examples\">\n<header class=\"textbox__header\">\n<p class=\"textbox__title\">The program to generate the simulated SAS data set is shown here<\/p>\n<\/header>\n<div class=\"textbox__content\">\n<p>options pagesize=60 linesize=80 center date;<\/p>\n<p>LIBNAME sample '\/home\/Username\/your directory\/';<\/p>\n<p>proc format; value sexfmt\u00a0 1 ='male' 2 ='female';<\/p>\n<p>value sprtfmt\u00a0 1 ='golf' 2 ='equestrian' 3 ='swimming' 4 ='gymnastics' 5 ='track &amp; field';<\/p>\n<p>value casefmt 1='present' 0='absent';<\/p>\n<p>data sample.zika;<\/p>\n<p>\/* create 3 new variables set as score1 score2 score3 *\/<\/p>\n<p>array scores score1-score3;<\/p>\n<p>\/* set 100 cases per variable *\/<\/p>\n<p>do k=1 to 100;<\/p>\n<p>\/* set days to 100 days of exposure *\/<\/p>\n<p>days=ranuni(13)*100; days=round(days, 0.02);<\/p>\n<p>\/* Loop through each variable to establish 100 randomly generated scores *\/<\/p>\n<p>do i=1 to 3;<\/p>\n<p>call streaminit(23);<\/p>\n<p>scores(i)=RAND('normal')*1000000000000;<\/p>\n<p>scores(i)=ROUND(scores(i));<\/p>\n<p>scores(i)=1+ABS((mod(scores(i),150)));<\/p>\n<p>\/*\u00a0 the variable sex will relate to score1, we can create a filter to establish the binary score for sex based on the randomly generated output *\/<\/p>\n<p>if score1 &gt; 55 then sex = 2;<\/p>\n<p>if score1 &gt;2 and score1&lt;56 then sex = 1;<\/p>\n<p>\/* the variable sport type will relate to score2, we can create a filter to establish the determination of an athlete's sport based on the randomly generated output *\/<\/p>\n<p>if score2 &gt;90 then sport = 5;<\/p>\n<p>if score2 &gt;80 and score2&lt;91 then sport = 4;<\/p>\n<p>if score2 &gt;60 and score2&lt;81 then sport = 3;<\/p>\n<p>if score2 &gt;30 and score2&lt;61 then sport = 2;<\/p>\n<p>if score2 &gt;5 and score2&lt;31 then sport=1;<\/p>\n<p>\/* the determination of a case will relate to score3, we can create a filter to establish the determination of a case based on the randomly generated output *\/<\/p>\n<p>if score3 &gt; 48 then case = 1;else case = 0;<\/p>\n<p>\/* a case=1 is a case present, and a case=0 is a case absent *\/<\/p>\n<p>if days&lt;=15 then daygrp=1;<\/p>\n<p>if days&gt;15 and days&lt;=30 then daygrp=2;<\/p>\n<p>if days&gt;30 and days&lt;=45 then daygrp=3;<\/p>\n<p>if days&gt;45 and days&lt;=60 then daygrp=4;<\/p>\n<p>if days&gt;60 and days&lt;=75 then daygrp=5;<\/p>\n<p>if days&gt;75 and days&lt;=90 then 
daygrp=6;<\/p>\n<p>if days&gt;90 and days&lt;=105 then daygrp=7;<\/p>\n<p>if days&gt;105 and days&lt;=120 then daygrp=8;<\/p>\n<p>if days&gt;120 and days&lt;=135 then daygrp=9;<\/p>\n<p>if days&gt;135 then daygrp=10;<\/p>\n<p>\/* create an interaction term for sex and sport to be used later in the Cox regression analysis *\/<\/p>\n<p>sex_sport=sex*sport;<\/p>\n<p>end; output; end;<\/p>\n<\/div>\n<\/div>\n<h2>Describing the Output<\/h2>\n<h4><strong>Part 1: Descriptive Statistics<\/strong><\/h4>\n<p>Prior to computing the survival analysis, descriptive statistics are produced for each of the four variables generated by the computer simulation. Initially a grouping variable called <strong><em>daygrp<\/em><\/strong> was created to summarize the continuous variable (counting number of days) into a discrete variable for use in later presentations.<\/p>\n<div class=\"textbox textbox--examples\">\n<header class=\"textbox__header\">\n<p class=\"textbox__title\">Next, the data were sorted and the proc freq command was applied.<\/p>\n<\/header>\n<div class=\"textbox__content\">\n<p>proc sort data=sample.zika; by sex;<\/p>\n<p>proc freq; tables sex sport daygrp case;<\/p>\n<p>format\u00a0 case casefmt. 
;<\/p>\n<\/div>\n<\/div>\n<p><strong>The FREQ Procedure<\/strong><\/p>\n<div style=\"margin: auto;\">\n<table>\n<thead>\n<tr>\n<td><strong>sex<\/strong><\/td>\n<td><strong>Frequency<\/strong><\/td>\n<td><strong>Percent<\/strong><\/td>\n<td><strong>Cumulative<br \/>\nFrequency<\/strong><\/td>\n<td><strong>Cumulative<br \/>\nPercent<\/strong><\/td>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td><strong>1<\/strong><\/td>\n<td>29<\/td>\n<td>29.00<\/td>\n<td>29<\/td>\n<td>29.00<\/td>\n<\/tr>\n<tr>\n<td><strong>2<\/strong><\/td>\n<td>71<\/td>\n<td>71.00<\/td>\n<td>100<\/td>\n<td>100.00<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<div style=\"margin: auto;\">\n<table>\n<thead>\n<tr>\n<td><strong>case<\/strong><\/td>\n<td><strong>Frequency<\/strong><\/td>\n<td><strong>Percent<\/strong><\/td>\n<td><strong>Cumulative<br \/>\nFrequency<\/strong><\/td>\n<td><strong>Cumulative<br \/>\nPercent<\/strong><\/td>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td><strong>absent<\/strong><\/td>\n<td>30<\/td>\n<td>30.00<\/td>\n<td>30<\/td>\n<td>30.00<\/td>\n<\/tr>\n<tr>\n<td><strong>present<\/strong><\/td>\n<td>70<\/td>\n<td>70.00<\/td>\n<td>100<\/td>\n<td>100.00<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<div style=\"margin: auto;\">\n<table>\n<thead>\n<tr>\n<td><strong>sport<\/strong><\/td>\n<td><strong>Frequency<\/strong><\/td>\n<td><strong>Percent<\/strong><\/td>\n<td><strong>Cumulative<br \/>\nFrequency<\/strong><\/td>\n<td><strong>Cumulative<br 
\/>\nPercent<\/strong><\/td>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td><strong>1<\/strong><\/td>\n<td>23<\/td>\n<td>23.00<\/td>\n<td>23<\/td>\n<td>23.00<\/td>\n<\/tr>\n<tr>\n<td><strong>2<\/strong><\/td>\n<td>22<\/td>\n<td>22.00<\/td>\n<td>45<\/td>\n<td>45.00<\/td>\n<\/tr>\n<tr>\n<td><strong>3<\/strong><\/td>\n<td>17<\/td>\n<td>17.00<\/td>\n<td>62<\/td>\n<td>62.00<\/td>\n<\/tr>\n<tr>\n<td><strong>4<\/strong><\/td>\n<td>24<\/td>\n<td>24.00<\/td>\n<td>86<\/td>\n<td>86.00<\/td>\n<\/tr>\n<tr>\n<td><strong>5<\/strong><\/td>\n<td>14<\/td>\n<td>14.00<\/td>\n<td>100<\/td>\n<td>100.00<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<div style=\"margin: auto;\">\n<table>\n<thead>\n<tr>\n<td><strong>daygrp<\/strong><\/td>\n<td><strong>Frequency<\/strong><\/td>\n<td><strong>Percent<\/strong><\/td>\n<td><strong>Cumulative<br \/>\nFrequency<\/strong><\/td>\n<td><strong>Cumulative<br \/>\nPercent<\/strong><\/td>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td><strong>1<\/strong><\/td>\n<td>1<\/td>\n<td>1.00<\/td>\n<td>1<\/td>\n<td>1.00<\/td>\n<\/tr>\n<tr>\n<td><strong>2<\/strong><\/td>\n<td>5<\/td>\n<td>5.00<\/td>\n<td>6<\/td>\n<td>6.00<\/td>\n<\/tr>\n<tr>\n<td><strong>3<\/strong><\/td>\n<td>7<\/td>\n<td>7.00<\/td>\n<td>13<\/td>\n<td>13.00<\/td>\n<\/tr>\n<tr>\n<td><strong>4<\/strong><\/td>\n<td>5<\/td>\n<td>5.00<\/td>\n<td>18<\/td>\n<td>18.00<\/td>\n<\/tr>\n<tr>\n<td><strong>5<\/strong><\/td>\n<td>9<\/td>\n<td>9.00<\/td>\n<td>27<\/td>\n<td>27.00<\/td>\n<\/tr>\n<tr>\n<td><strong>6<\/strong><\/td>\n<td>6<\/td>\n<td>6.00<\/td>\n<td>33<\/td>\n<td>33.00<\/td>\n<\/tr>\n<tr>\n<td><strong>7<\/strong><\/td>\n<td>4<\/td>\n<td>4.00<\/td>\n<td>37<\/td>\n<td>37.00<\/td>\n<\/tr>\n<tr>\n<td><strong>8<\/strong><\/td>\n<td>13<\/td>\n<td>13.00<\/td>\n<td>50<\/td>\n<td>50.00<\/td>\n<\/tr>\n<tr>\n<td><strong>9<\/strong><\/td>\n<td>7<\/td>\n<td>7.00<\/td>\n<td>57<\/td>\n<td>57.00<\/td>\n<\/tr>\n<tr>\n<td><strong>10<\/strong><\/td>\n<td>43<\/td>\n<td>43.00<\/td>\n<td>100<\/td>\n<td>100.00<\/td>\n<\/tr>\n<\/tbody>\n<
\/table>\n<\/div>\n<div class=\"textbox textbox--examples\">\n<header class=\"textbox__header\">\n<p class=\"textbox__title\">The demarcation point for a case was set at a value of 100 for the random variable days from the array:<\/p>\n<\/header>\n<div class=\"textbox__content\">\n<p>do i=1 to 2;<\/p>\n<p>call streaminit(23);<\/p>\n<p>scores(i)=RAND('normal')*1000000000000;<\/p>\n<p>scores(i)=ROUND(scores(i));<\/p>\n<p>scores(i)=1+ABS((mod(scores(i),150)));<\/p>\n<\/div>\n<\/div>\n<p>The variable <strong><em>days<\/em><\/strong> was given a range of 1 to 150 and 100 days was used as a demarcation point to censor individuals as non-cases.<\/p>\n<div class=\"textbox\"><strong><span style=\"color: #0000ff\">if days &lt;= 100 then case = 1;<\/span><\/strong><br \/>\n<strong><span style=\"color: #0000ff\">else case = 0;<\/span><\/strong><\/div>\n<p>Labelling individuals in this way assigned each individual as a case (1) or as a non-case (0). The proc univariate procedure was used to present descriptive statistics for individuals that were considered cases (where=(case=1)) and individuals that were censored (where=(case=0)).<\/p>\n<div class=\"textbox\"><strong><span style=\"color: #0000ff\">proc univariate data=sample.zika(where=(case=1)) cibasic;<\/span><\/strong><br \/>\n<strong><span style=\"color: #0000ff\">var days;<\/span><\/strong><br \/>\n<strong><span style=\"color: #0000ff\">histogram days\/normal;<\/span><\/strong><br \/>\n<strong><span style=\"color: #0000ff\">title 'Survivor function for zika virus plot of pdf';<\/span><\/strong><br \/>\n<strong><span style=\"color: #0000ff\">label days ='days to infection';<\/span><\/strong><\/div>\n<p>The results from the random number generator produced a mean of 60.49 days among cases, for a sample of 70 individuals. 
These data also produced a 95% confidence interval for the mean of <em>60.49 \u00b1 6.45<\/em> which ranged from <em>54.03 to 66.93.<\/em><\/p>\n<p><strong>The UNIVARIATE Procedure &#8212; <\/strong><strong>Variable: days (days since exposure)<\/strong><\/p>\n<div style=\"margin: auto;\">\n<table>\n<thead>\n<tr>\n<td><strong>Moments<\/strong><\/td>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td><strong>N<\/strong><\/td>\n<td>70<\/td>\n<td><strong>Sum Weights<\/strong><\/td>\n<td>70<\/td>\n<\/tr>\n<tr>\n<td><strong>Mean<\/strong><\/td>\n<td>60.4874286<\/td>\n<td><strong>Sum Observations<\/strong><\/td>\n<td>4234.12<\/td>\n<\/tr>\n<tr>\n<td><strong>Std Deviation<\/strong><\/td>\n<td>27.0551932<\/td>\n<td><strong>Variance<\/strong><\/td>\n<td>731.98348<\/td>\n<\/tr>\n<tr>\n<td><strong>Skewness<\/strong><\/td>\n<td>-0.2644189<\/td>\n<td><strong>Kurtosis<\/strong><\/td>\n<td>-1.1242295<\/td>\n<\/tr>\n<tr>\n<td><strong>Uncorrected SS<\/strong><\/td>\n<td>306617.891<\/td>\n<td><strong>Corrected SS<\/strong><\/td>\n<td>50506.8601<\/td>\n<\/tr>\n<tr>\n<td><strong>Coeff Variation<\/strong><\/td>\n<td>44.7286219<\/td>\n<td><strong>Std Error Mean<\/strong><\/td>\n<td>3.2337141<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<div style=\"margin: auto;\">\n<table>\n<thead>\n<tr>\n<td><strong>Basic Confidence Limits Assuming Normality<\/strong><\/td>\n<\/tr>\n<tr>\n<td><strong>Parameter<\/strong><\/td>\n<td><strong>Estimate<\/strong><\/td>\n<td><strong>95%\u00a0Confidence\u00a0Limits<\/strong><\/td>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td><strong>Mean<\/strong><\/td>\n<td>60.48743<\/td>\n<td>54.03635<\/td>\n<td>66.93851<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<p>The following code produced a set of percentiles from the data set for cases. These data show the percentage of the group being affected by a certain day. For example, 25% of the group were affected within 39.4 days of the start of the games. By day 96 some 90% of the cohort were infected with the Zika Virus. 
Note, these are not real data but were generated with a random number generator.<\/p>\n<div class=\"textbox\"><strong><span style=\"color: #0000ff\">output out=Pctls pctlpts\u00a0 = 25 40 50 60 75 90<\/span><\/strong><br \/>\n<strong><span style=\"color: #0000ff\">pctlpre\u00a0 = days_<\/span><\/strong><br \/>\n<strong><span style=\"color: #0000ff\">pctlname = pct25 pct40 pct50 pct60 pct75 pct90;<\/span><\/strong><br \/>\n<strong><span style=\"color: #0000ff\">proc print data= Pctls;<\/span><\/strong><br \/>\n<strong><span style=\"color: #0000ff\">run;<\/span><\/strong><\/div>\n<p><strong>Percentiles for days<\/strong><\/p>\n<div style=\"margin: auto;\">\n<table>\n<thead>\n<tr>\n<td><strong>Obs<\/strong><\/td>\n<td><strong>days_pct25<\/strong><\/td>\n<td><strong>days_pct40<\/strong><\/td>\n<td><strong>days_pct50<\/strong><\/td>\n<td><strong>days_pct60<\/strong><\/td>\n<td><strong>days_pct75<\/strong><\/td>\n<td><strong>days_pct90<\/strong><\/td>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td><strong>1<\/strong><\/td>\n<td>39.4<\/td>\n<td>51.84<\/td>\n<td>66.34<\/td>\n<td>72.44<\/td>\n<td>84.08<\/td>\n<td>96.44<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<div class=\"textbox\">\n<p><strong><span style=\"color: #0000ff\">proc univariate data=sample.zika(where=(case=0)) cibasic;<\/span><\/strong><br \/>\n<strong><span style=\"color: #0000ff\">var days;<\/span><\/strong><br \/>\n<strong><span style=\"color: #0000ff\">histogram days;<\/span><\/strong><br \/>\n<strong><span style=\"color: #0000ff\">title 'Survivor function for zika virus plot of pdf';<\/span><\/strong><br \/>\n<strong><span style=\"color: #0000ff\">label days ='days to infection';<\/span><\/strong><\/p>\n<\/div>\n<p>The results from the random number generator produced a mean of 127.50 days among non-cases, for a sample of 30 individuals. 
These data also produced a 95% confidence interval for the mean of <em>127.50 \u00b1 5.36<\/em> which ranged from <em>122.14 to 132.86.<\/em><\/p>\n<p><em>\u00a0<\/em><strong>The UNIVARIATE Procedure &#8212;\u00a0 <\/strong><strong>Variable: days (days since exposure)<\/strong><\/p>\n<div style=\"margin: auto;\">\n<table>\n<thead>\n<tr>\n<td><strong>Moments<\/strong><\/td>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td><strong>N<\/strong><\/td>\n<td>30<\/td>\n<td><strong>Sum Weights<\/strong><\/td>\n<td>30<\/td>\n<\/tr>\n<tr>\n<td><strong>Mean<\/strong><\/td>\n<td>127.503333<\/td>\n<td><strong>Sum Observations<\/strong><\/td>\n<td>3825.1<\/td>\n<\/tr>\n<tr>\n<td><strong>Std Deviation<\/strong><\/td>\n<td>14.3410047<\/td>\n<td><strong>Variance<\/strong><\/td>\n<td>205.664416<\/td>\n<\/tr>\n<tr>\n<td><strong>Skewness<\/strong><\/td>\n<td>-0.3026793<\/td>\n<td><strong>Kurtosis<\/strong><\/td>\n<td>-1.0739046<\/td>\n<\/tr>\n<tr>\n<td><strong>Uncorrected SS<\/strong><\/td>\n<td>493677.268<\/td>\n<td><strong>Corrected SS<\/strong><\/td>\n<td>5964.26807<\/td>\n<\/tr>\n<tr>\n<td><strong>Coeff Variation<\/strong><\/td>\n<td>11.2475528<\/td>\n<td><strong>Std Error Mean<\/strong><\/td>\n<td>2.61829726<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<div style=\"margin: auto;\">\n<table>\n<thead>\n<tr>\n<td><strong>Basic Confidence Limits Assuming Normality<\/strong><\/td>\n<\/tr>\n<tr>\n<td><strong>Parameter<\/strong><\/td>\n<td><strong>Estimate<\/strong><\/td>\n<td><strong>95%\u00a0Confidence\u00a0Limits<\/strong><\/td>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td><strong>Mean<\/strong><\/td>\n<td>127.50333<\/td>\n<td>122.14831<\/td>\n<td>132.85835<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<p>The following code produced a set of percentiles from the data set for non-cases. As shown in the example above, these data show the percentage of the group being affected by a certain day. 
All individuals in this data set were censored as they had passed the 100 days demarcation point before being infected. This is the reason that individuals in a survival analysis are not dropped from the study but rather censored. The data show that even though an individual exceeded the time to an event, they continued to be at risk for the event of interest.<\/p>\n<div class=\"textbox\"><strong><span style=\"color: #0000ff\">output out=Pctls pctlpts\u00a0 = 25 30 40 50 60 75 80 90 100<\/span><\/strong><br \/>\n<strong><span style=\"color: #0000ff\">pctlpre\u00a0 = days_<\/span><\/strong><br \/>\n<strong><span style=\"color: #0000ff\">pctlname = pct25 pct30 pct40 pct50 pct60 pct75 pct80 pct90 pct100;<\/span><\/strong><br \/>\n<strong><span style=\"color: #0000ff\">proc print data= Pctls;<\/span><\/strong><br \/>\n<strong><span style=\"color: #0000ff\">run;<\/span><\/strong><\/div>\n<p><strong>Percentiles for days<\/strong><\/p>\n<div style=\"margin: auto;\">\n<table>\n<thead>\n<tr>\n<td><strong>Obs<\/strong><\/td>\n<td><strong>days_pct25<\/strong><\/td>\n<td><strong>days_pct40<\/strong><\/td>\n<td><strong>days_pct50<\/strong><\/td>\n<td><strong>days_pct60<\/strong><\/td>\n<td><strong>days_pct75<\/strong><\/td>\n<td><strong>days_pct90<\/strong><\/td>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td><strong>1<\/strong><\/td>\n<td>115.18<\/td>\n<td>125.24<\/td>\n<td>129.69<\/td>\n<td>133.47<\/td>\n<td>139<\/td>\n<td>146.24<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<p>In each of the proc univariate statements there was a call for a histogram to illustrate the distribution of the data for the variable days. The histograms of days for each of the cohorts (cases versus non-cases) are shown in Figure 19.4 below. 
Notice that in each distribution the number of days shows a slight negative skewness, with more observations falling above the mean number of days.<\/p>\n<p>Figure 19.4 Comparison of the distribution of days in each cohort<\/p>\n<h4>Part 2: Creating Life Tables<\/h4>\n<p>The survival analysis applications using METHOD=LIFE in the PROC LIFETEST procedure are presented in this section:<\/p>\n<p>In this first stage of survival processing we can observe the influence of censoring the data. Recall that the data are censored at 100 days. Censoring was accomplished by combining the variable days, described above, with the binary variable case. If an individual had a days score of 100 or less then they were assigned to the cohort of cases. Conversely, if the individual had a days score exceeding 100 then they were censored and assigned to the non-cases cohort.<\/p>\n<p>The SAS code to compute the survival curve for the entire data set is given here:<\/p>\n<p>proc sort data=sample.zika; by case;<\/p>\n<p>PROC LIFETEST METHOD=LIFE plots=(s) data=sample.zika notable;<\/p>\n<p>time days ;<\/p>\n<p>format\u00a0 case casefmt. ;<\/p>\n<p>title 'Survivor function for zika virus - implicit right censoring of cases';<\/p>\n<p>label days ='days to infection';<\/p>\n<p>This SAS code produced the image shown in Figure 19.5, below, which is the survival probability curve for the entire sample of N=100 individuals monitored over 150 days. Notice that there is an inflection point in the curve at 100 days. 
This inflection point corresponds to the censoring limit of 100 days and is shown more explicitly in Figure 19.6 where we change the command time days; to the command: time days * case(0);<\/p>\n<p>&nbsp;<\/p>\n<p>Figure 19.5 Life Table Survival curve for all individuals in the data set<\/p>\n<p>&nbsp;<\/p>\n<p>Figure 19.6 Life Table Survival curve with explicit right censoring at 100 days<\/p>\n<p>&nbsp;<\/p>\n<p>Figure 19.6 above shows the survival probability for each event among the cases and holds the non-cases constant at a probability level of 0.3. Further, when we include the censoring criteria using the command:\u00a0 time days * case(0); a summary table indicating the number of cases that fail prior to the demarcation point (100 days) and the number of cases that exceed the demarcation point is also included, as shown here.<\/p>\n<p>&nbsp;<\/p>\n<div style=\"margin: auto;\">\n<table>\n<thead>\n<tr>\n<td><strong>Summary of the Number of Censored and Uncensored Values<\/strong><\/td>\n<\/tr>\n<tr>\n<td><strong>Total<\/strong><\/td>\n<td><strong>Failed<\/strong><\/td>\n<td><strong>Censored<\/strong><\/td>\n<td><strong>Percent Censored<\/strong><\/td>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>100<\/td>\n<td>70<\/td>\n<td>30<\/td>\n<td>30.00<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<p>&nbsp;<\/p>\n<p>Next we include a command to show the differences in time to event with a grouping variable. Here we use the strata command to group the data by sex, while maintaining the influence of censoring at 100 days.<\/p>\n<p>PROC LIFETEST METHOD=LIFE plots=(s)data=sample.zika notable;<\/p>\n<p>time days * case(0) ;<\/p>\n<p>strata sex;<\/p>\n<p>format case casefmt. sex sexfmt. 
;<\/p>\n<p>&nbsp;<\/p>\n<p>The code produces a summary table of the number of males and females that failed or exceeded the demarcation point of 100 days and a graph of the survival probability curves for male and females.<\/p>\n<p><strong>\u00a0<\/strong><\/p>\n<div style=\"margin: auto;\">\n<table>\n<thead>\n<tr>\n<td><strong>Summary of the Number of Censored and Uncensored Values<\/strong><\/td>\n<\/tr>\n<tr>\n<td><strong>Stratum<\/strong><\/td>\n<td><strong>sex<\/strong><\/td>\n<td><strong>Total<\/strong><\/td>\n<td><strong>Failed<\/strong><\/td>\n<td><strong>Censored<\/strong><\/td>\n<td><strong>Percent Censored<\/strong><\/td>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td><strong>1<\/strong><\/td>\n<td>female<\/td>\n<td>71<\/td>\n<td>52<\/td>\n<td>19<\/td>\n<td>26.76<\/td>\n<\/tr>\n<tr>\n<td><strong>2<\/strong><\/td>\n<td>male<\/td>\n<td>29<\/td>\n<td>18<\/td>\n<td>11<\/td>\n<td>37.93<\/td>\n<\/tr>\n<tr>\n<td><strong>Total<\/strong><\/td>\n<td><\/td>\n<td>100<\/td>\n<td>70<\/td>\n<td>30<\/td>\n<td>30.00<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<p>&nbsp;<\/p>\n<p>Figure 19.7 Life Table Survival Curves With Explicit Right Censoring at 100 Days for Males and Females<\/p>\n<p>&nbsp;<\/p>\n<p>In this next analysis we separate the data using strata=sport, while maintaining the right censoring of the data at 100 days. 
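<\/p>\n<p>The call for this step is not listed in the text. A minimal sketch, assuming it mirrors the sex-stratified call above with only the strata variable and its format changed (sprtfmt is the format defined in the data-generation program), would be:<\/p>\n<pre>\/* hypothetical reconstruction: same options as the sex-stratified run *\/\r\nproc lifetest method=life plots=(s) data=sample.zika notable;\r\n  time days * case(0);   \/* case=0 marks right-censored records *\/\r\n  strata sport;          \/* one survival curve per sport group *\/\r\n  format case casefmt. sport sprtfmt.;\r\nrun;<\/pre>\n<p>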
As shown in the approach used to separate the data by sex, this code produces a summary table of the number of individuals in each of the sport groups that failed or exceeded the demarcation point of 100 days as well as a graph of the survival probability curves for each sport.<\/p>\n<p>&nbsp;<\/p>\n<div style=\"margin: auto;\">\n<table>\n<thead>\n<tr>\n<td><strong>Summary of the Number of Censored and Uncensored Values<\/strong><\/td>\n<\/tr>\n<tr>\n<td><strong>Stratum<\/strong><\/td>\n<td><strong>sport<\/strong><\/td>\n<td><strong>Total<\/strong><\/td>\n<td><strong>Failed<\/strong><\/td>\n<td><strong>Censored<\/strong><\/td>\n<td><strong>Percent<br \/>\nCensored<\/strong><\/td>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td><strong>1<\/strong><\/td>\n<td>equestrian<\/td>\n<td>22<\/td>\n<td>15<\/td>\n<td>7<\/td>\n<td>31.82<\/td>\n<\/tr>\n<tr>\n<td><strong>2<\/strong><\/td>\n<td>golf<\/td>\n<td>23<\/td>\n<td>19<\/td>\n<td>4<\/td>\n<td>17.39<\/td>\n<\/tr>\n<tr>\n<td><strong>3<\/strong><\/td>\n<td>gymnastics<\/td>\n<td>24<\/td>\n<td>14<\/td>\n<td>10<\/td>\n<td>41.67<\/td>\n<\/tr>\n<tr>\n<td><strong>4<\/strong><\/td>\n<td>swimming<\/td>\n<td>17<\/td>\n<td>13<\/td>\n<td>4<\/td>\n<td>23.53<\/td>\n<\/tr>\n<tr>\n<td><strong>5<\/strong><\/td>\n<td>track &amp; field<\/td>\n<td>14<\/td>\n<td>9<\/td>\n<td>5<\/td>\n<td>35.71<\/td>\n<\/tr>\n<tr>\n<td><strong>Total<\/strong><\/td>\n<td><\/td>\n<td>100<\/td>\n<td>70<\/td>\n<td>30<\/td>\n<td>30.00<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>Figure 19.8 Life Table Survival Curves With Explicit Right Censoring at 100 Days for Sport Groups<\/p>\n<p>&nbsp;<\/p>\n<h4>Part 3: The Kaplan-Meier Approach<\/h4>\n<h4><\/h4>\n<p>The Kaplan-Meier approach to survival analysis differs slightly from the applications using METHOD=LIFE in the PROC LIFETEST procedure.\u00a0 When we use the METHOD=KM in the PROC LIFETEST procedure we generate a series of survival probability estimates referred to as the Kaplan-Meier 
estimates (hereafter referred to as the KM estimates), and corresponding survival probability curves for the KM estimates.<\/p>\n<p>The KM estimates give the survival probability at each time an individual becomes a case, up to the demarcation point of 100 days. This approach is more precise in reporting the time at event and does not summarize the data across an interval as is done with the METHOD=LIFE in the PROC LIFETEST procedure.<\/p>\n<p>The output from METHOD=LIFE and METHOD=KM is compared in the following tables, up to the first 12 cases that became infected.\u00a0 Notice that the METHOD=LIFE approach summarizes the estimates within a set of intervals, while the METHOD=KM approach provides a probability value at each individual event time within the cohort of interest.<\/p>\n<p>Table 19.5 Survivor function for Zika virus using METHOD = LIFE in Proc Lifetest<\/p>\n<table>\n<tbody>\n<tr>\n<td colspan=\"2\"><strong>Days Interval<\/strong><\/td>\n<td colspan=\"8\"><strong>Abbreviated table showing results for The LIFETEST Procedure<\/strong><\/td>\n<\/tr>\n<tr>\n<td><strong>Lower interval<\/strong><\/td>\n<td><strong>Upper interval<\/strong><\/td>\n<td><strong>Number failed<\/strong><\/td>\n<td><strong>Number censored<\/strong><\/td>\n<td><strong>Effective sample size<\/strong><\/td>\n<td><strong>Conditional probability of failure<\/strong><\/td>\n<td><strong>Conditional probability of failure Standard error<\/strong><\/td>\n<td><strong>Survival<\/strong><\/td>\n<td><strong>Failure<\/strong><\/td>\n<td><strong>Survival Standard 
error<\/strong><\/td>\n<\/tr>\n<tr>\n<td>0<\/td>\n<td>20<\/td>\n<td>6<\/td>\n<td>0<\/td>\n<td>100.0<\/td>\n<td>0.0600<\/td>\n<td>0.0237<\/td>\n<td>1.0000<\/td>\n<td>0<\/td>\n<td>0<\/td>\n<\/tr>\n<tr>\n<td>20<\/td>\n<td>40<\/td>\n<td>12<\/td>\n<td>0<\/td>\n<td>94.0<\/td>\n<td>0.1277<\/td>\n<td>0.0344<\/td>\n<td>0.9400<\/td>\n<td>0.06<\/td>\n<td>0.023<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>&nbsp;<\/p>\n<p>When we use the METHOD=KM approach in the PROC LIFETEST procedure the following estimates are generated. Note these estimates only refer to the first 12 cases designated as infected within the original data set of n=100 cases.<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>Table 19.6 Survivor function for Zika virus using METHOD = KM in Proc Lifetest<\/p>\n<p>&nbsp;<\/p>\n<p><strong>Abbreviated table showing results for The LIFETEST Procedure<\/strong><\/p>\n<p><strong>\u00a0<\/strong><\/p>\n<div style=\"margin: auto;\">\n<table>\n<thead>\n<tr>\n<td><strong>Days<\/strong><\/td>\n<td><strong>Survival<\/strong><\/td>\n<td><strong>Failure<\/strong><\/td>\n<td><strong>Survival Standard Error<\/strong><\/td>\n<td><strong>Number<br \/>\nFailed<\/strong><\/td>\n<td><strong>Number<br 
\/>\nRemaining<\/strong><\/td>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td><strong>0.000<\/strong><\/td>\n<td>1.0000<\/td>\n<td>0<\/td>\n<td>0<\/td>\n<td>0<\/td>\n<td>100<\/td>\n<\/tr>\n<tr>\n<td><strong>8.800<\/strong><\/td>\n<td>0.9900<\/td>\n<td>0.0100<\/td>\n<td>0.00995<\/td>\n<td>1<\/td>\n<td>99<\/td>\n<\/tr>\n<tr>\n<td><strong>11.780<\/strong><\/td>\n<td>0.9800<\/td>\n<td>0.0200<\/td>\n<td>0.0140<\/td>\n<td>2<\/td>\n<td>98<\/td>\n<\/tr>\n<tr>\n<td><strong>12.540<\/strong><\/td>\n<td>0.9700<\/td>\n<td>0.0300<\/td>\n<td>0.0171<\/td>\n<td>3<\/td>\n<td>97<\/td>\n<\/tr>\n<tr>\n<td><strong>12.800<\/strong><\/td>\n<td>0.9600<\/td>\n<td>0.0400<\/td>\n<td>0.0196<\/td>\n<td>4<\/td>\n<td>96<\/td>\n<\/tr>\n<tr>\n<td><strong>14.120<\/strong><\/td>\n<td>0.9500<\/td>\n<td>0.0500<\/td>\n<td>0.0218<\/td>\n<td>5<\/td>\n<td>95<\/td>\n<\/tr>\n<tr>\n<td><strong>15.860<\/strong><\/td>\n<td>0.9400<\/td>\n<td>0.0600<\/td>\n<td>0.0237<\/td>\n<td>6<\/td>\n<td>94<\/td>\n<\/tr>\n<tr>\n<td><strong>21.240<\/strong><\/td>\n<td>0.9300<\/td>\n<td>0.0700<\/td>\n<td>0.0255<\/td>\n<td>7<\/td>\n<td>93<\/td>\n<\/tr>\n<tr>\n<td><strong>22.560<\/strong><\/td>\n<td>0.9200<\/td>\n<td>0.0800<\/td>\n<td>0.0271<\/td>\n<td>8<\/td>\n<td>92<\/td>\n<\/tr>\n<tr>\n<td><strong>24.720<\/strong><\/td>\n<td>0.9100<\/td>\n<td>0.0900<\/td>\n<td>0.0286<\/td>\n<td>9<\/td>\n<td>91<\/td>\n<\/tr>\n<tr>\n<td><strong>27.280<\/strong><\/td>\n<td>0.9000<\/td>\n<td>0.1000<\/td>\n<td>0.0300<\/td>\n<td>10<\/td>\n<td>90<\/td>\n<\/tr>\n<tr>\n<td><strong>27.780<\/strong><\/td>\n<td>.<\/td>\n<td>.<\/td>\n<td>.<\/td>\n<td>11<\/td>\n<td>89<\/td>\n<\/tr>\n<tr>\n<td><strong>27.780<\/strong><\/td>\n<td>0.8800<\/td>\n<td>0.1200<\/td>\n<td>0.0325<\/td>\n<td>12<\/td>\n<td>88<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<p>&nbsp;<\/p>\n<p>The difference in the two methods is further exemplified in the comparison of the two survival curves shown in Figure 19.9. 
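<\/p>\n<p>The first rows of Tables 19.5 and 19.6 can be reproduced by hand, which may help clarify what each method reports. The following is a worked check under the usual textbook definitions; the symbols [latex]d_j[\/latex] (failures), [latex]n_j[\/latex] (number at risk), and [latex]c_j[\/latex] (number censored) are introduced here only for illustration. For the life table, the effective sample size is [latex]n'_j = n_j - c_j\/2[\/latex] and the conditional probability of failure is [latex]\\hat{q}_j = d_j \/ n'_j[\/latex], so that [latex]\\hat{q}_1 = 6\/100 = 0.0600[\/latex] and [latex]\\hat{q}_2 = 12\/94 \\approx 0.1277[\/latex]. For the Kaplan-Meier estimate, [latex]\\hat{S}(t) = \\prod_{t_j \\le t} (1 - d_j\/n_j)[\/latex], so that [latex]\\hat{S}(8.80) = 99\/100 = 0.9900[\/latex] and [latex]\\hat{S}(11.78) = (99\/100)(98\/99) = 0.9800[\/latex]. Because no observation is censored before day 100, the Greenwood standard error reduces to [latex]\\sqrt{\\hat{S}(1-\\hat{S})\/100}[\/latex], which gives [latex]\\sqrt{(0.99)(0.01)\/100} \\approx 0.00995[\/latex], in agreement with the second row of Table 19.6.<\/p>\n<p>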
The survival curve for the METHOD=LIFE approach is a summary curve while the survival curve for the METHOD=KM approach shows more precise estimates of failures (individuals reporting infection) over the entire time interval. In both curves the data are right censored at days=100, and as such no survival probabilities are reported for individuals that have not become a case as of the 100 days demarcation point, in the data set.<\/p>\n<p>&nbsp;<\/p>\n<table>\n<tbody>\n<tr>\n<td><\/td>\n<td><\/td>\n<\/tr>\n<tr>\n<td>Survival probability curve using Method=LIFE in SAS Proc lifetest<\/td>\n<td>Survival probability curve using Method=KM in SAS Proc lifetest<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>&nbsp;<\/p>\n<p>Figure 19.9 Comparison of Survival Curves With Explicit Right Censoring for life-table analysis versus Kaplan-Meier estimation<\/p>\n<p>&nbsp;<\/p>\n<h4>Part 4: Comparing Kaplan-Meier Survival Estimates with Log Rank and Wilcoxon Tests<\/h4>\n<p>&nbsp;<\/p>\n<p>In the PROC LIFETEST procedure we can evaluate the difference between survival probability curves by computing two non-parametric tests: i) the Log Rank Test and ii) the Wilcoxon test. The tests are computed with the PROC LIFETEST procedure when including the strata command, as shown here:<\/p>\n<p>&nbsp;<\/p>\n<p>PROC LIFETEST plots=(s) data=sample.zika2 ;<\/p>\n<p>time days * case(0);<\/p>\n<p>strata sex;<\/p>\n<p>format case casefmt. sex sexfmt. 
;<\/p>\n<p>title 'Kaplan Meier Estimates with log rank and Wilcoxon tests';<\/p>\n<p>label days ='days to infection';<\/p>\n<p>run;<\/p>\n<p>The strata command separates the computation of survival probabilities by different subgroups of the variable used in the strata command.\u00a0 In our Zika data set, survival probabilities are estimated for the males and females in the observed sample.\u00a0 The graphical illustration of the survival probability curves is shown in Figure 19.10 below and the statistical comparison of the survival curves is shown in the following two tests.<\/p>\n<p>The Log-Rank test and the Wilcoxon test are two non-parametric tests that enable users to compare the survival probability curves based on Kaplan-Meier Survival Estimates for each subgroup within designated strata. The results for the comparison of the Survival Probability Curves for males versus females are shown here.<\/p>\n<p>Table 19.7 Test to evaluate the survival curves<\/p>\n<div style=\"margin: auto;\">\n<table>\n<tbody>\n<tr>\n<td><strong>Test<\/strong><\/td>\n<td>Chi-Square<\/td>\n<td>DF<\/td>\n<td>Pr &gt; Chi square<\/td>\n<\/tr>\n<tr>\n<td><strong>Log-Rank<\/strong><\/td>\n<td>2.8240<\/td>\n<td>1<\/td>\n<td>0.0929<\/td>\n<\/tr>\n<tr>\n<td><strong>Wilcoxon<\/strong><\/td>\n<td>4.2191<\/td>\n<td>1<\/td>\n<td>0.0400<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<p>The p values indicate that the difference in survival curves for males versus females reached statistical significance with the Wilcoxon test (p = 0.0400) but not with the log-rank test (p = 0.0929) when judged against the conventional 0.05 level.\u00a0 Taken together, the two tests suggest that the survival curves may differ, although the evidence is not consistent across the tests. 
However, it should be noted that the Log-Rank test is the more powerful of the two tests when the proportional hazards assumption holds, that is, when the ratio of the hazard rates of the two groups is constant at each time point.\u00a0 This means that the relative likelihood of being infected (i.e. becoming a case) in one group compared with the other is the same across all time points<a href=\"#_ftn3\">[3]<\/a>.<\/p>\n<p>&nbsp;<\/p>\n<p>Figure 19.10 illustrates the survival probability curves for males versus females in our Zika dataset. These curves are based on the product-limit estimates (aka Kaplan-Meier estimates) for the survival probability series within each level of the strata. Notice that the two survival curves cross early in the recording. This early crossover of the KM curves is consistent with the smaller p value obtained with the Wilcoxon analysis.\u00a0 The Wilcoxon test gives more weight to early event times, so a stronger Wilcoxon outcome is likely to occur when one of the comparison groups has a higher risk of the event (becoming a case) earlier in the recording. The log-rank test weights all event times equally, so it is relatively more sensitive to differences in the risk of being infected (i.e. failing, dying, becoming a case) that appear later in the recording.<\/p>\n<p>&nbsp;<\/p>\n<p>Figure 19.10 Comparison of Survival Curves With Explicit Right Censoring for Kaplan-Meier estimation of males versus females<\/p>\n<p>&nbsp;<\/p>\n<p><strong>\u00a0<\/strong><\/p>\n<h4>Part 5: Computing the Cox Proportional Hazard Regression Analysis<\/h4>\n<p>The data in a survival analysis can be used in a special type of regression procedure known as the proportional hazard model. 
This approach to using regression modeling was developed by Cox<a href=\"#_ftn4\">[4]<\/a> and builds on the regression approaches that we have discussed earlier in this text.<\/p>\n<p>In linear regression we can create equations in which a predictor variable, or a set of predictor variables, is used to explain the variance in an outcome variable (the dependent variable), as shown in the following simple linear regression and multiple regression equations.<\/p>\n<p>A simple straight-line or linear regression equation:<\/p>\n<p style=\"text-align: center\">[latex]y = \\beta_0 + \\beta_1 x[\/latex]<\/p>\n<p>where: [latex]y[\/latex] is the dependent variable, [latex]\\beta_1[\/latex] is the slope element by which we adjust the predictor ([latex]x[\/latex]) variable, [latex]x[\/latex] is the independent or predictor variable, and [latex]\\beta_0[\/latex] is the [latex]y[\/latex]-intercept (i.e. the point where the response graph crosses the vertical axis).<\/p>\n<p>The simple linear regression equation in its most basic form helps us to understand the relationship between two variables, one designated as the dependent variable ([latex]y[\/latex]) and the other designated as the independent variable ([latex]x[\/latex]). Together these variables help us to predict or explain an outcome, while adjusting for the variance between the two measures.<\/p>\n<p>A multiple regression equation:<\/p>\n<p style=\"text-align: center\">[latex]y = \\beta_0 + \\sum_{i} \\beta_i x_i[\/latex]<\/p>\n<p>where: [latex]y[\/latex] is the dependent variable, [latex]\\beta_i[\/latex] is the slope element by which we adjust the predictor ([latex]x_i[\/latex]) variable, [latex]x_i[\/latex] is the independent or predictor variable, and [latex]\\beta_0[\/latex] is the [latex]y[\/latex]-intercept. 
In this equation, the subscript <em>(i)<\/em> is a counter for each of the predictor variables used in the equation.<\/p>\n<p>&nbsp;<\/p>\n<p>The multiple linear regression equation is an expansion of the simple linear regression, and under a univariate model has one dependent variable ([latex]y[\/latex]) but two or more predictor variables ([latex]x_i[\/latex]). Again, the regression procedure helps us to predict or explain the outcome [latex]y[\/latex] while adjusting for the variance in the predictor ([latex]x_i[\/latex]) variables. 
In multiple regression we can determine the slope of a predictor variable, that is, the coefficient by which the variable is multiplied, while holding all other variables in the model constant. In this way we are able to determine the significance of each variable in the equation with respect to all of the variables in the equation.<\/p>\n<p>&nbsp;<\/p>\n<p>In the Cox proportional hazard regression, also referred to as the Cox regression, the concepts of simple and multiple regression equations are the same; however, the dependent variable is comprised not of a single scalar score, but rather of the hazard function representing the relationship between survival probability and time to an event.<\/p>\n<p>&nbsp;<\/p>\n<p>As stated earlier, the hazard function provides an estimate of the rate at which an event happens at a given time, or within a given interval of time, among individuals who have not yet experienced the event.\u00a0 The hazard function is a rate rather than a probability; therefore its value can exceed 1. The hazard function thus indicates how quickly events are expected to occur at a given time.<\/p>\n<p>&nbsp;<\/p>\n<p>In the computation of the Cox regression we develop a statistical regression model comprised of a dependent variable which consists of a hazard function and a set of independent variables which consist of predictors of the dependent variable. The Cox model is semi-parametric: it does not require the baseline hazard to follow a particular distribution, although a time-based distribution such as the Weibull distribution is helpful for picturing how a failure rate can change over time. 
The Weibull distribution is familiar to the field of engineering because it is helpful in describing reliability and failure of a measured device over time.\u00a0 The applicable characteristic of the Weibull distribution for survival analysis is that it provides a mathematical foundation for failure rate throughout the lifetime of a measurement period. When the shape parameter of the Weibull distribution is less than 1, the failure rate decreases with time and gradually levels off<a href=\"#_ftn5\">[5]<\/a>. 
The Weibull distribution fits applications for survival analysis since higher failure rates (i.e. more frequent events) tend to occur prior to the censoring demarcation point as shown in Figure 19.11.<\/p>\n<p>&nbsp;<\/p>\n<p>Figure 19.11 Schematic of a Weibull distribution<\/p>\n<p>&nbsp;<\/p>\n<p>As in the application of simple and multiple linear regression procedures, in the application of the Cox regression the user can establish regression coefficients for each of the predictors of the dependent variable to determine the magnitude and direction of the predictor acting on the dependent variable.<\/p>\n<p>&nbsp;<\/p>\n<p>In our Zika virus example, we use Cox regression to determine the risk of infection based on the ratio of the probability density function to the survival function (i.e. the hazard function) for time to infection as the dependent variable, and the individual\u2019s sex and sport as predictor variables.<\/p>\n<p>&nbsp;<\/p>\n<p>In other words, using the simulated dataset for the Olympic athletes and Cox regression we can evaluate the likelihood of being infected with Zika virus based on whether the individual was male or female, and the type of Olympic sport in which they were participating.<\/p>\n<p>&nbsp;<\/p>\n<p>In the following sample code we use the proc phreg; procedure to produce output for the Cox Proportional Hazard Function. However, it is good practice to first explain the overall model that we are testing. 
Here our hazard function is based on the number of days to infection, and the covariates are sex and sport type, along with the interaction of sex by sport type (the variable sex_sport).<\/p>\n<p>&nbsp;<\/p>\n<p>proc phreg plots=survival;<\/p>\n<p>class sex sport;<\/p>\n<p>model days*case(0) = sex sport sex_sport;<\/p>\n<p>title 'Cox Proportional Hazard Analysis for Zika Virus by sex and sport';<\/p>\n<p>label days ='days to infection';<\/p>\n<p>run;<\/p>\n<p>&nbsp;<\/p>\n<p>The output shown below provides a graphic image of the survival curve and associated tables representing the statistical analyses.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/pressbooks.library.upei.ca\/montelpare\/wp-content\/uploads\/sites\/49\/2020\/06\/surv4-300x221.png\" alt=\"\" class=\"aligncenter wp-image-1044\" width=\"381\" height=\"281\" srcset=\"https:\/\/pressbooks.library.upei.ca\/montelpare\/wp-content\/uploads\/sites\/49\/2020\/06\/surv4-300x221.png 300w, https:\/\/pressbooks.library.upei.ca\/montelpare\/wp-content\/uploads\/sites\/49\/2020\/06\/surv4-768x566.png 768w, https:\/\/pressbooks.library.upei.ca\/montelpare\/wp-content\/uploads\/sites\/49\/2020\/06\/surv4-1024x755.png 1024w, https:\/\/pressbooks.library.upei.ca\/montelpare\/wp-content\/uploads\/sites\/49\/2020\/06\/surv4-65x48.png 65w, https:\/\/pressbooks.library.upei.ca\/montelpare\/wp-content\/uploads\/sites\/49\/2020\/06\/surv4-225x166.png 225w, https:\/\/pressbooks.library.upei.ca\/montelpare\/wp-content\/uploads\/sites\/49\/2020\/06\/surv4-350x258.png 350w, https:\/\/pressbooks.library.upei.ca\/montelpare\/wp-content\/uploads\/sites\/49\/2020\/06\/surv4.png 1200w\" sizes=\"auto, (max-width: 381px) 100vw, 381px\" \/><\/p>\n<p style=\"text-align: center\">Plot of the survival probability curve from proc phreg<\/p>\n<p>The summary table of the number of cases that exceeded the censoring demarcation point is presented in Table 19.8 below. 
The results indicate that 30 of the 100 simulated cases (30%) were censored, while the remaining 70 experienced the event.<\/p>\n<p style=\"text-align: center\"><strong>Table of Proportion of Censored Observations from the Survival Curves<\/strong><\/p>\n<div style=\"margin: auto;\">\n<table>\n<thead>\n<tr>\n<td><strong>Summary of the Number of Event and Censored Values<\/strong><\/td>\n<\/tr>\n<tr>\n<td><strong>Total<\/strong><\/td>\n<td><strong>Event<\/strong><\/td>\n<td><strong>Censored<\/strong><\/td>\n<td><strong>Percent<br \/>\nCensored<\/strong><\/td>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>100<\/td>\n<td>70<\/td>\n<td>30<\/td>\n<td>30.00<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<p>Next, the model fit statistics are presented, followed by the test of the null hypothesis that the coefficients of the predictor variables are equal to 0. The model fit statistics are most often used when comparing more than one model, in which case we evaluate the AIC criterion and select the model with the lowest value as the more appropriate fitting model. In the example shown here, this output is less relevant as we only have one model to consider. 
The column representing <strong><em>With Covariates<\/em><\/strong> is important to consider as it indicates whether adding predictor variables to the equation decreases the criterion value, whereby lower values are considered to represent a better fit. In this example, -2 LOG L decreases from 578.185 to 569.236 when the covariates are added, but AIC and SBC, which penalize the number of estimated parameters, both increase.<\/p>\n<p style=\"text-align: center\">Table of Model Fit Statistics for the Application of the Cox PHREG<\/p>\n<div style=\"margin: auto;\">\n<table>\n<thead>\n<tr>\n<td><strong>Model Fit Statistics<\/strong><\/td>\n<\/tr>\n<tr>\n<td><strong>Criterion<\/strong><\/td>\n<td><strong>Without<br \/>\nCovariates<\/strong><\/td>\n<td><strong>With<br \/>\nCovariates<\/strong><\/td>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td><strong>-2 LOG L<\/strong><\/td>\n<td>578.185<\/td>\n<td>569.236<\/td>\n<\/tr>\n<tr>\n<td><strong>AIC<\/strong><\/td>\n<td>578.185<\/td>\n<td>581.236<\/td>\n<\/tr>\n<tr>\n<td><strong>SBC<\/strong><\/td>\n<td>578.185<\/td>\n<td>594.727<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<p>The main outputs for us to consider from the application of the <span style=\"color: #0000ff\"><strong>proc phreg;<\/strong> <\/span>procedure for this example are the tables of tests for the Global Null Hypothesis: Beta=0 and the Analysis of Maximum Likelihood Estimates, shown below. 
The test of the Global Null Hypothesis: Beta=0 evaluates the null hypothesis that the predictor variables do not have an effect on the calculated value of the hazard function.<\/p>\n<p style=\"text-align: center\">Table of Tests of Beta=0 for the Application of the Cox PHREG<\/p>\n<div style=\"margin: auto;\">\n<table>\n<thead>\n<tr>\n<td><strong>Testing Global Null Hypothesis: BETA=0<\/strong><\/td>\n<\/tr>\n<tr>\n<td><strong>Test<\/strong><\/td>\n<td><strong>Chi-Square<\/strong><\/td>\n<td><strong>DF<\/strong><\/td>\n<td><strong>Pr\u00a0&gt;\u00a0ChiSq<\/strong><\/td>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td><strong>Likelihood Ratio<\/strong><\/td>\n<td>8.9485<\/td>\n<td>6<\/td>\n<td>0.1765<\/td>\n<\/tr>\n<tr>\n<td><strong>Score<\/strong><\/td>\n<td>9.8384<\/td>\n<td>6<\/td>\n<td>0.1316<\/td>\n<\/tr>\n<tr>\n<td><strong>Wald<\/strong><\/td>\n<td>9.3857<\/td>\n<td>6<\/td>\n<td>0.1530<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<p>The results presented in the table above for the test of the Global Null Hypothesis: Beta=0 illustrate the results of three tests of the null hypothesis: i) the likelihood ratio test, ii) the Score test, and iii) the Wald test.\u00a0 Notice that the probability estimates for each Chi-square test are similar in that none of the p values supported a significant difference between the coefficients of the predictor variables and 0.<\/p>\n<p>Since the predictor variables included in the example were discrete class variables (no continuous covariates were included in the model), we also included the class sex sport; statement in the proc phreg; procedure. The output generated a table of the Type 3 tests (also referred to as Joint tests) to determine whether the effect of each of the categorical discrete variables was significantly different from 0. 
The results of the Wald Chi-square statistic indicate that there was no significant effect of any of the categorical variables on the computed hazard function for the days to infection from the Zika virus.<\/p>\n<p style=\"text-align: center\">Table of Type 3 Tests from Proc PHREG<\/p>\n<div style=\"margin: auto;\">\n<table>\n<thead>\n<tr>\n<td><strong>Type 3 Tests<\/strong><\/td>\n<\/tr>\n<tr>\n<td><strong>Effect<\/strong><\/td>\n<td><strong>DF<\/strong><\/td>\n<td><strong>Wald Chi-Square<\/strong><\/td>\n<td><strong>Pr\u00a0&gt;\u00a0ChiSq<\/strong><\/td>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td><strong>sex<\/strong><\/td>\n<td>1<\/td>\n<td>0.5229<\/td>\n<td>0.4696<\/td>\n<\/tr>\n<tr>\n<td><strong>sport<\/strong><\/td>\n<td>4<\/td>\n<td>1.3533<\/td>\n<td>0.8523<\/td>\n<\/tr>\n<tr>\n<td><strong>sex_sport<\/strong><\/td>\n<td>1<\/td>\n<td>0.0311<\/td>\n<td>0.8601<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<p style=\"text-align: left\">The maximum likelihood estimates produced by the SAS <span style=\"color: #0000ff\"><em><strong>proc phreg<\/strong><\/em><\/span> enable us to provide the parameter estimates that correspond to the predictor variables included in the regression equation.\u00a0 The underlying algebraic regression equation<a href=\"#_ftn6\">[6]<\/a> for the Cox Proportional Hazard Model is given as:<\/p>\n<p style=\"text-align: center\">[latex]h(t) = h_0 (t)exp(x\\beta_{x})[\/latex]<\/p>\n<p>Therefore, the parameter estimates refer to the coefficients for each predictor variable in the equation.<\/p>\n<p style=\"text-align: center\">Maximum Likelihood Estimates from PROC PHREG<\/p>\n<div style=\"margin: auto;\">\n<table>\n<thead>\n<tr>\n<td><strong>Parameter<\/strong><\/td>\n<td><\/td>\n<td><strong>DF<\/strong><\/td>\n<td><strong>Parameter<br \/>\nEstimate<\/strong><\/td>\n<td><strong>Standard<br \/>\nError<\/strong><\/td>\n<td><strong>Chi-Square<\/strong><\/td>\n<td><strong>Pr\u00a0&gt;\u00a0ChiSq<\/strong><\/td>\n<td><strong>Hazard<br 
\/>\nRatio<\/strong><\/td>\n<td><strong>Label<\/strong><\/td>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td><strong>sex<\/strong><\/td>\n<td><strong>1<\/strong><\/td>\n<td>1<\/td>\n<td>-0.49114<\/td>\n<td>0.67917<\/td>\n<td>0.5229<\/td>\n<td>0.4696<\/td>\n<td>0.612<\/td>\n<td>sex 1<\/td>\n<\/tr>\n<tr>\n<td><strong>sport<\/strong><\/td>\n<td><strong>1<\/strong><\/td>\n<td>1<\/td>\n<td>0.47592<\/td>\n<td>1.51352<\/td>\n<td>0.0989<\/td>\n<td>0.7532<\/td>\n<td>1.609<\/td>\n<td>sport 1<\/td>\n<\/tr>\n<tr>\n<td><strong>sport<\/strong><\/td>\n<td><strong>2<\/strong><\/td>\n<td>1<\/td>\n<td>0.00554<\/td>\n<td>1.11315<\/td>\n<td>0.0000<\/td>\n<td>0.9960<\/td>\n<td>1.006<\/td>\n<td>sport 2<\/td>\n<\/tr>\n<tr>\n<td><strong>sport<\/strong><\/td>\n<td><strong>3<\/strong><\/td>\n<td>1<\/td>\n<td>0.13256<\/td>\n<td>0.80301<\/td>\n<td>0.0273<\/td>\n<td>0.8689<\/td>\n<td>1.142<\/td>\n<td>sport 3<\/td>\n<\/tr>\n<tr>\n<td><strong>sport<\/strong><\/td>\n<td><strong>4<\/strong><\/td>\n<td>1<\/td>\n<td>-0.13920<\/td>\n<td>0.52854<\/td>\n<td>0.0694<\/td>\n<td>0.7923<\/td>\n<td>0.870<\/td>\n<td>sport 4<\/td>\n<\/tr>\n<tr>\n<td><strong>sex_sport<\/strong><\/td>\n<td><\/td>\n<td>1<\/td>\n<td>-0.03689<\/td>\n<td>0.20926<\/td>\n<td>0.0311<\/td>\n<td>0.8601<\/td>\n<td>0.964<\/td>\n<td><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<p>The results presented in the table above indicate that none of the predictor variables produced a significant parameter estimate; therefore we have no evidence that the days to infection differed by sex or by the sport in which the athlete participated.<\/p>\n<hr \/>\n<div>\n<div>\n<p><a href=\"#_ftnref1\">[1]<\/a> Wicklin, R. 
(2011) <a href=\"http:\/\/blogs.sas.com\/content\/iml\/2011\/10\/19\/four-essential-functions-for-statistical-programmers.html\">http:\/\/blogs.sas.com\/content\/iml\/2011\/10\/19\/four-essential-functions-for-statistical-programmers.html<\/a><\/p>\n<p><a href=\"#_ftnref2\">[2]<\/a>\u00a0 Introduction to Survival Analysis in SAS<strong>.<\/strong>UCLA: Statistical Consulting Group.From <a href=\"http:\/\/www.ats.ucla.edu\/stat\/sas\/seminars\/sas_survival\/\">http:\/\/www.ats.ucla.edu\/stat\/sas\/seminars\/sas_survival\/<\/a> (accessed Feb 20, 2017)<\/p>\n<\/div>\n<div>\n<p><a href=\"#_ftnref3\">[3]<\/a> Bewick, V., Cheek, L., Ball, J., Statistics review 12: Survival analysis, Critical Care 2004, 8:389-394.<\/p>\n<\/div>\n<div>\n<p><a href=\"#_ftnref4\">[4]<\/a> The Cox Proportional Hazard regression is based on Sir David Cox's 1972 paper: Regression Models and Life-Tables, J. R. Stat. Soc. B, 34:187\u2013220.<\/p>\n<\/div>\n<div>\n<p><a href=\"#_ftnref5\">[5]<\/a> The weibull.com reliability engineering resource website is a service of ReliaSoft Corporation.<br \/>\nCopyright \u00a9 1992 &#8211;\u00a02017 ReliaSoft Corporation. 
All Rights Reserved.<\/p>\n<\/div>\n<div>\n<p><a href=\"#_ftnref6\">[6]<\/a> Introduction to Survival Analysis in SAS<strong>.<\/strong>UCLA: Statistical Consulting Group.From <a href=\"http:\/\/www.ats.ucla.edu\/stat\/sas\/seminars\/sas_survival\/\">http:\/\/www.ats.ucla.edu\/stat\/sas\/seminars\/sas_survival\/<\/a> (accessed Feb 20, 2017)<\/p>\n<\/div>\n<\/div>\n","protected":false},"author":56,"menu_order":4,"template":"","meta":{"pb_show_title":"on","pb_short_title":"","pb_subtitle":"","pb_authors":[],"pb_section_license":""},"chapter-type":[],"contributor":[],"license":[],"class_list":["post-996","chapter","type-chapter","status-publish","hentry"],"part":982,"_links":{"self":[{"href":"https:\/\/pressbooks.library.upei.ca\/montelpare\/wp-json\/pressbooks\/v2\/chapters\/996","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/pressbooks.library.upei.ca\/montelpare\/wp-json\/pressbooks\/v2\/chapters"}],"about":[{"href":"https:\/\/pressbooks.library.upei.ca\/montelpare\/wp-json\/wp\/v2\/types\/chapter"}],"author":[{"embeddable":true,"href":"https:\/\/pressbooks.library.upei.ca\/montelpare\/wp-json\/wp\/v2\/users\/56"}],"version-history":[{"count":40,"href":"https:\/\/pressbooks.library.upei.ca\/montelpare\/wp-json\/pressbooks\/v2\/chapters\/996\/revisions"}],"predecessor-version":[{"id":1046,"href":"https:\/\/pressbooks.library.upei.ca\/montelpare\/wp-json\/pressbooks\/v2\/chapters\/996\/revisions\/1046"}],"part":[{"href":"https:\/\/pressbooks.library.upei.ca\/montelpare\/wp-json\/pressbooks\/v2\/parts\/982"}],"metadata":[{"href":"https:\/\/pressbooks.library.upei.ca\/montelpare\/wp-json\/pressbooks\/v2\/chapters\/996\/metadata\/"}],"wp:attachment":[{"href":"https:\/\/pressbooks.library.upei.ca\/montelpare\/wp-json\/wp\/v2\/media?parent=996"}],"wp:term":[{"taxonomy":"chapter-type","embeddable":true,"href":"https:\/\/pressbooks.library.upei.ca\/montelpare\/wp-json\/pressbooks\/v2\/chapter-type?post=996"},{"taxonomy":"contributor","embeddable":true,"hr
ef":"https:\/\/pressbooks.library.upei.ca\/montelpare\/wp-json\/wp\/v2\/contributor?post=996"},{"taxonomy":"license","embeddable":true,"href":"https:\/\/pressbooks.library.upei.ca\/montelpare\/wp-json\/wp\/v2\/license?post=996"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}