{"id":371,"date":"2020-04-10T07:34:26","date_gmt":"2020-04-10T11:34:26","guid":{"rendered":"http:\/\/pressbooks.library.upei.ca\/montelpare\/?post_type=chapter&#038;p=371"},"modified":"2020-08-24T14:16:19","modified_gmt":"2020-08-24T18:16:19","slug":"percentiles","status":"publish","type":"chapter","link":"https:\/\/pressbooks.library.upei.ca\/montelpare\/chapter\/percentiles\/","title":{"raw":"Percentiles","rendered":"Percentiles"},"content":{"raw":"<h1>What is a percentile?<\/h1>\r\nThe term \"per cent\" means \"per 100\", and thus a percentile is a score representing a value relative to a base 100 scale.\r\n\r\nThe computation of percentiles is a useful way to evaluate scores within a frequency distribution, i.e., the set of scores and their frequencies.\r\n\r\nA percentile is the value below which a given percentage of scores fall.\r\n\r\nIn other words, if we consider the 60th percentile, then we are suggesting that 60% of the scores in a distribution or set of scores will fall below that particular value.\r\n\r\nPercentiles always refer to a specific position within a frequency distribution.\r\n\r\n<hr \/>\r\n\r\nFormulas to compute percentiles for grouped data:\r\n\r\ni) [latex]{k} = (\\frac{frequency}{N} \\times{100})[\/latex]\r\n\r\nii) [latex]{\\beta} = (\\frac{\\textit{Cumulative frequency for all scores below the category of interest}}{N} \\times{100})[\/latex]\r\n\r\niii) [latex]\\textit{Percentile}={\\beta} + (0.5 \\times{k})[\/latex]\r\n\r\nThe 0.5 is used to add half of the percent of scores within the category in which the score of interest resides, which treats the score as lying at the middle of its category.\r\n\r\nConsider computing the percentile for the <strong>score 71<\/strong> in the frequency distribution shown in Table 14.1.\r\n\r\nTable 14.1\u00a0 Frequency Distribution Output\r\n<table class=\" aligncenter\" style=\"height: 240px\">\r\n<thead>\r\n<tr class=\"shaded\" style=\"height: 30px\">\r\n<td 
style=\"height: 30px;width: 75.85px;text-align: center\">Cell Boundaries<\/td>\r\n<td style=\"height: 30px;width: 31.85px;text-align: center\">Freq (f)<\/td>\r\n<td style=\"height: 30px;width: 255.85px;text-align: center\">[latex]{k} = (\\frac{frequency}{N} \\times{100})[\/latex]<\/td>\r\n<td style=\"height: 30px;width: 50.25px;text-align: center\">Cum. Freq.<\/td>\r\n<td style=\"height: 30px;width: 99.05px;text-align: center\">\u03b2<\/td>\r\n<\/tr>\r\n<\/thead>\r\n<tbody>\r\n<tr style=\"height: 30px\">\r\n<td style=\"text-align: center;height: 30px;width: 75.85px\">58.5-61.5<\/td>\r\n<td style=\"text-align: center;height: 30px;width: 31.85px\">4<\/td>\r\n<td style=\"text-align: center;height: 30px;width: 255.85px\">4\/200 * 100 = 0.02 * 100 = 2<\/td>\r\n<td style=\"text-align: center;height: 30px;width: 50.25px\">4<\/td>\r\n<td style=\"text-align: center;height: 30px;width: 99.05px\">4\/200 * 100 = 2<\/td>\r\n<\/tr>\r\n<tr style=\"height: 30px\">\r\n<td style=\"text-align: center;height: 30px;width: 75.85px\">61.5-64.5<\/td>\r\n<td style=\"text-align: center;height: 30px;width: 31.85px\">12<\/td>\r\n<td style=\"text-align: center;height: 30px;width: 255.85px\">12\/200 * 100 = 0.06 * 100 = 6<\/td>\r\n<td style=\"text-align: center;height: 30px;width: 50.25px\">16<\/td>\r\n<td style=\"text-align: center;height: 30px;width: 99.05px\">16\/200 * 100 = 8<\/td>\r\n<\/tr>\r\n<tr style=\"height: 30px\">\r\n<td style=\"text-align: center;height: 30px;width: 75.85px\">64.5-67.5<\/td>\r\n<td style=\"text-align: center;height: 30px;width: 31.85px\">44<\/td>\r\n<td style=\"text-align: center;height: 30px;width: 255.85px\">44\/200 * 100 = 0.22 * 100 = 22<\/td>\r\n<td style=\"text-align: center;height: 30px;width: 50.25px\">60<\/td>\r\n<td style=\"text-align: center;height: 30px;width: 99.05px\">60\/200 * 100 = 30<\/td>\r\n<\/tr>\r\n<tr style=\"height: 30px\">\r\n<td style=\"text-align: center;height: 30px;width: 75.85px\">67.5-70.5<\/td>\r\n<td style=\"text-align: 
center;height: 30px;width: 31.85px\">64<\/td>\r\n<td style=\"text-align: center;height: 30px;width: 255.85px\">64\/200 * 100 = 0.32 * 100 = 32<\/td>\r\n<td style=\"text-align: center;height: 30px;width: 50.25px\">124<\/td>\r\n<td style=\"text-align: center;height: 30px;width: 99.05px\">124\/200 * 100 = 62<\/td>\r\n<\/tr>\r\n<tr style=\"height: 30px\">\r\n<td style=\"text-align: center;height: 30px;width: 75.85px\">70.5-73.5<\/td>\r\n<td style=\"text-align: center;height: 30px;width: 31.85px\">56<\/td>\r\n<td style=\"text-align: center;height: 30px;width: 255.85px\">56\/200 * 100 = 0.28 * 100 = 28<\/td>\r\n<td style=\"text-align: center;height: 30px;width: 50.25px\">180<\/td>\r\n<td style=\"text-align: center;height: 30px;width: 99.05px\">180\/200 * 100 = 90<\/td>\r\n<\/tr>\r\n<tr style=\"height: 30px\">\r\n<td style=\"text-align: center;height: 30px;width: 75.85px\"><span style=\"color: #000000\">73.5-76.5<\/span><\/td>\r\n<td style=\"text-align: center;height: 30px;width: 31.85px\">16<\/td>\r\n<td style=\"text-align: center;height: 30px;width: 255.85px\">16\/200 * 100 = 0.08 * 100 = 8<\/td>\r\n<td style=\"text-align: center;height: 30px;width: 50.25px\">196<\/td>\r\n<td style=\"text-align: center;height: 30px;width: 99.05px\">196\/200 * 100 = 98<\/td>\r\n<\/tr>\r\n<tr style=\"height: 30px\">\r\n<td style=\"height: 30px;width: 75.85px;text-align: center\">76.5-79.5<\/td>\r\n<td style=\"height: 30px;width: 31.85px;text-align: center\">4<\/td>\r\n<td style=\"height: 30px;width: 255.85px;text-align: center\">4\/200 * 100 = 0.02 * 100 = 2<\/td>\r\n<td style=\"height: 30px;width: 50.25px;text-align: center\">200<\/td>\r\n<td style=\"height: 30px;width: 99.05px;text-align: center\">200\/200 * 100 = 100<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\nThe total sample of scores = 200. We are interested in the specific score with a value of 71. The score 71\u00a0 resides within the category that has cell boundaries 70.5 to 73.5. 
This category has a corresponding frequency of 56, which indicates that there are 56 scores between the lower and upper boundaries of the category from 70.5 to 73.5. We can then enter 56 as the frequency value and 200 as the value of N in the following equation to determine the value of k in our series of percentile equations.\r\n\r\ni) [latex]{k} = (\\frac{frequency}{N} \\times{100})[\/latex]\r\n\r\n[latex]{k} = (\\frac{56}{200} \\times{100}) = 28[\/latex]\r\n\r\nHere we see that in this scenario k = 28, where k represents the percent of scores in the category of interest: 56 of 200 scores is 28% of all scores in our distribution.\r\n\r\nNext we determine the value for [latex]{\\beta}[\/latex] based on the equation\u00a0[latex]{\\beta} = (\\frac{\\textit{Cumulative frequency for all scores below the category of interest}}{N} \\times{100})[\/latex]. The value of [latex]{\\beta}[\/latex] represents the cumulative percent of scores in the data set that fall below the category in which our score of interest resides. In this example the cumulative frequency for all scores below the category of interest is the cumulative frequency of the category that precedes the category in which our score (71) resides. Here the <em>cumulative frequency for all scores below the category of interest<\/em>\u00a0is 124. 
Using the equation for [latex]{\\beta}[\/latex] shown here, we see that the value is 62.\r\n\r\nii) [latex]{\\beta} = (\\frac{124}{200} \\times{100}) = 62[\/latex]\r\n\r\nAfter we have determined k and [latex]{\\beta}[\/latex], we can then work through the steps in equation iii) to determine the percentile for our score of interest.\r\n\r\niii) [latex]\\textit{Percentile}={62} + (0.5 \\times{28})[\/latex]\r\n\r\n[latex]\\textit{Percentile}={62} + (14)[\/latex]\r\n\r\n[latex]\\textit{Percentile}=76^{th} \\textit{ percentile}[\/latex]\r\n\r\nThe outcome indicates that about 76 percent of the scores within this set (distribution) of scores fall below the score of 71.\r\n\r\n<hr \/>\r\n\r\nWorking through the computation of percentiles from a set of scores\r\n\r\nUse <em>the table of frequency distributions for heights of Grade 5 elementary school children<\/em> (Table 14.2) to compute the percentiles for the following values: 123, 136, 138, 149, and 152. For each value, indicate the value of k and the percentile score. 
Fill in the missing data in the following table to obtain a complete data set.\r\n<p style=\"text-align: center\"><em>Table 14.2 Frequency Distribution For Heights Of Grade 5 Elementary School Children.<\/em><\/p>\r\n\r\n<table class=\" aligncenter\" style=\"width: 288px\">\r\n<tbody>\r\n<tr>\r\n<td style=\"width: 83.85px\">Category<\/td>\r\n<td style=\"width: 79.05px\">Frequency<\/td>\r\n<td style=\"width: 86.25px\">Cumulative\r\nFrequency<\/td>\r\n<\/tr>\r\n<tr>\r\n<td style=\"width: 83.85px\">120-122<\/td>\r\n<td style=\"width: 79.05px\">1<\/td>\r\n<td style=\"width: 86.25px\">1<\/td>\r\n<\/tr>\r\n<tr>\r\n<td style=\"width: 83.85px\">123-125<\/td>\r\n<td style=\"width: 79.05px\">3<\/td>\r\n<td style=\"width: 86.25px\">4<\/td>\r\n<\/tr>\r\n<tr>\r\n<td style=\"width: 83.85px\">126-128<\/td>\r\n<td style=\"width: 79.05px\">3<\/td>\r\n<td style=\"width: 86.25px\">7<\/td>\r\n<\/tr>\r\n<tr>\r\n<td style=\"width: 83.85px\">129-131<\/td>\r\n<td style=\"width: 79.05px\">3<\/td>\r\n<td style=\"width: 86.25px\"><\/td>\r\n<\/tr>\r\n<tr>\r\n<td style=\"width: 83.85px\">132-134<\/td>\r\n<td style=\"width: 79.05px\">1<\/td>\r\n<td style=\"width: 86.25px\">11<\/td>\r\n<\/tr>\r\n<tr>\r\n<td style=\"width: 83.85px\">135-137<\/td>\r\n<td style=\"width: 79.05px\"><\/td>\r\n<td style=\"width: 86.25px\">13<\/td>\r\n<\/tr>\r\n<tr>\r\n<td style=\"width: 83.85px\">138-140<\/td>\r\n<td style=\"width: 79.05px\">1<\/td>\r\n<td style=\"width: 86.25px\">14<\/td>\r\n<\/tr>\r\n<tr>\r\n<td style=\"width: 83.85px\">141-143<\/td>\r\n<td style=\"width: 79.05px\">2<\/td>\r\n<td style=\"width: 86.25px\"><\/td>\r\n<\/tr>\r\n<tr>\r\n<td style=\"width: 83.85px\">144-146<\/td>\r\n<td style=\"width: 79.05px\">2<\/td>\r\n<td style=\"width: 86.25px\">18<\/td>\r\n<\/tr>\r\n<tr>\r\n<td style=\"width: 83.85px\">147-149<\/td>\r\n<td style=\"width: 79.05px\">2<\/td>\r\n<td style=\"width: 86.25px\"><\/td>\r\n<\/tr>\r\n<tr>\r\n<td style=\"width: 83.85px\">150-152<\/td>\r\n<td style=\"width: 
79.05px\">3<\/td>\r\n<td style=\"width: 86.25px\"><\/td>\r\n<\/tr>\r\n<tr>\r\n<td style=\"width: 83.85px\">sum of freq=<\/td>\r\n<td style=\"width: 79.05px\"><\/td>\r\n<td style=\"width: 86.25px\"><\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<div>A SAS Application -- The Scenario: ZIKA Virus at the Summer Olympics<\/div>\r\nIn August 2016 Brazil hosted the Olympic Summer Games. However, several athletes decided to boycott the games because of the risk of exposure to the ZIKA virus.\u00a0 ZIKA is a virus that can be transmitted through the bite of an infected Aedes mosquito. \u00a0The ZIKA virus is extremely dangerous for young women because it can reside in the blood for up to 3 months, and if the woman becomes pregnant, the virus can have negative consequences for the developing fetus. In particular, the ZIKA virus has been implicated in the development of microcephaly in newborn children.\r\n\r\nIn this example, we will use a series of random number generating commands to create a data set with four variables and 1000 cases. 
The variables are sex, sport, case, and days, coded as follows: sex (1=m, 2=f),\u00a0 sport (1=golf, 2=equestrian, 3=swimming, 4=gymnastics, 5=track\u00a0 &amp; field),\u00a0 case (1=yes, 2=no), and days, a continuous variable representing the number of days since exposure to ZIKA virus-carrying mosquitoes.\r\n<div class=\"textbox textbox--exercises\"><header class=\"textbox__header\">\r\n<div>A SAS Application -- The Scenario: ZIKA Virus at the Summer Olympics<\/div>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n\r\nPROC FORMAT;\r\nVALUE SEXFMT 1 ='MALE'\u00a0 2 ='FEMALE';\r\nVALUE SPRTFMT 1 ='GOLF'\u00a0 2 ='EQUESTRIAN'\u00a0 3 ='SWIMMING'\r\n4 ='GYMNASTICS'\u00a0 5 ='TRACK &amp; FIELD';\r\nVALUE CASEFMT\u00a0 1='PRESENT'\u00a0 2='ABSENT';\r\n\r\nDATA SASRNG;\r\n\r\n\/* Create 3 new variables labelled SCORE1 SCORE2 SCORE3 *\/\r\n\r\nARRAY SCORES SCORE1-SCORE3;\r\n\r\n\/* Set 1000 cases per variable *\/\r\n\r\nDO K=1 TO 1000;\r\n\r\nDAYS=RANUNI(13)*100;\r\n\r\nDAYS=ROUND(DAYS, 0.02);\r\n\r\n\/* Loop through each variable to establish 1000 randomly generated scores *\/\r\n\r\nDO I=1 TO 3;\r\n\r\nSCORES(I)=RANUNI(I)*1000;\r\n\r\nSCORES(I)=ROUND(SCORES(I));\r\n\r\nSCORES(I)=1+(MOD(SCORES(I),105));\r\n\r\n\/*\u00a0 The variable sex will relate to score1, create a filter to establish the binary score for sex based on the randomly generated output *\/\r\n\r\nIF SCORE1 &gt; 55 THEN SEX = 2;\r\n\r\nIF SCORE1 &gt;2 AND SCORE1&lt;56 THEN SEX = 1;\r\n\r\n\/* Sport Type\u00a0\u00a0 *\/\r\n\r\nIF SCORE2 &gt;90 THEN SPORT = 5;\r\n\r\nIF SCORE2 &gt;80 AND SCORE2&lt;91 THEN SPORT = 4;\r\n\r\nIF SCORE2 &gt;60 AND SCORE2&lt;81 THEN SPORT = 3;\r\n\r\nIF SCORE2 &gt;30 AND SCORE2&lt;61 THEN SPORT = 2;\r\n\r\nIF SCORE2 &gt;5 AND SCORE2&lt;31 THEN SPORT=1;\r\n\r\n\/* Case *\/\r\n\r\nIF SCORE3 &gt; 48 THEN CASE = 1;ELSE CASE = 2;\r\n\r\nEND;\r\n\r\nOUTPUT;\r\n\r\nEND; RUN;\r\n\r\nPROC SORT DATA =SASRNG; BY SEX;\r\n\r\nPROC FREQ; TABLES SEX SPORT CASE 
SEX*CASE;\r\n\r\nFORMAT SEX SEXFMT. SPORT SPRTFMT. CASE CASEFMT. ;\r\n\r\nPROC FREQ; TABLES SPORT*CASE;BY SEX;\r\n\r\nFORMAT SEX SEXFMT. SPORT SPRTFMT. CASE CASEFMT. ;\r\n\r\nPROC UNIVARIATE; VAR DAYS;\r\n\r\nOUTPUT OUT=PCTLS PCTLPTS\u00a0 = 30 60\r\n\r\nPCTLPRE\u00a0 = DAYS_\r\n\r\nPCTLNAME = PCT30 PCT60;\r\n\r\nPROC PRINT DATA= PCTLS;\r\n\r\nRUN;\r\n\r\n<\/div>\r\n<\/div>\r\n<div>\r\n\r\n<span style=\"text-align: initial;font-size: 1em\">In SAS we can compute percentiles by applying PROC UNIVARIATE to a continuous variable. The statements PROC UNIVARIATE; VAR DAYS; produce the following table of percentiles for the variable DAYS.<\/span>\r\n<p style=\"text-align: center\">Table 14.3\u00a0 Frequency Distribution Output Showing Percentiles<\/p>\r\n\r\n<\/div>\r\n<div align=\"center\">\r\n<table class=\" aligncenter\" style=\"height: 195px\">\r\n<thead>\r\n<tr style=\"height: 15px\">\r\n<td style=\"height: 15px;width: 190.25px\"><strong>Level<\/strong><\/td>\r\n<td style=\"height: 15px;width: 72.65px\"><strong>Quantile<\/strong><\/td>\r\n<\/tr>\r\n<\/thead>\r\n<tbody>\r\n<tr style=\"height: 15px\">\r\n<td style=\"height: 15px;width: 190.25px\"><strong>100% Max<\/strong><\/td>\r\n<td style=\"height: 15px;width: 72.65px\">99.94<\/td>\r\n<\/tr>\r\n<tr style=\"height: 15px\">\r\n<td style=\"height: 15px;width: 190.25px\"><strong>99%<\/strong><\/td>\r\n<td style=\"height: 15px;width: 72.65px\">98.66<\/td>\r\n<\/tr>\r\n<tr style=\"height: 15px\">\r\n<td style=\"height: 15px;width: 190.25px\"><strong>95%<\/strong><\/td>\r\n<td style=\"height: 15px;width: 72.65px\">94.34<\/td>\r\n<\/tr>\r\n<tr style=\"height: 15px\">\r\n<td style=\"height: 15px;width: 190.25px\"><strong>90%<\/strong><\/td>\r\n<td style=\"height: 15px;width: 72.65px\">89.61<\/td>\r\n<\/tr>\r\n<tr style=\"height: 15px\">\r\n<td style=\"height: 15px;width: 190.25px\"><strong>75% 
Q3<\/strong><\/td>\r\n<td style=\"height: 15px;width: 72.65px\">73.13<\/td>\r\n<\/tr>\r\n<tr style=\"height: 15px\">\r\n<td style=\"height: 15px;width: 190.25px\"><strong>50% Median<\/strong><\/td>\r\n<td style=\"height: 15px;width: 72.65px\">46.83<\/td>\r\n<\/tr>\r\n<tr style=\"height: 15px\">\r\n<td style=\"height: 15px;width: 190.25px\"><strong>25% Q1<\/strong><\/td>\r\n<td style=\"height: 15px;width: 72.65px\">24.75<\/td>\r\n<\/tr>\r\n<tr style=\"height: 15px\">\r\n<td style=\"height: 15px;width: 190.25px\"><strong>10%<\/strong><\/td>\r\n<td style=\"height: 15px;width: 72.65px\">10.23<\/td>\r\n<\/tr>\r\n<tr style=\"height: 15px\">\r\n<td style=\"height: 15px;width: 190.25px\"><strong>5%<\/strong><\/td>\r\n<td style=\"height: 15px;width: 72.65px\">4.86<\/td>\r\n<\/tr>\r\n<tr style=\"height: 15px\">\r\n<td style=\"height: 15px;width: 190.25px\"><strong>1%<\/strong><\/td>\r\n<td style=\"height: 15px;width: 72.65px\">1.27<\/td>\r\n<\/tr>\r\n<tr style=\"height: 15px\">\r\n<td style=\"height: 15px;width: 190.25px\"><strong>0% Min<\/strong><\/td>\r\n<td style=\"height: 15px;width: 72.65px\">0.02<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/div>\r\nHowever, we can also compute specific percentile values for a continuous variable using the PCTLPTS=, PCTLPRE=, and PCTLNAME= options of the OUTPUT statement.\r\n\r\nTogether these three options help us to identify and label specific percentiles within a data set. To select a specific percentile, such as the 30th percentile, we use PCTLPTS= 30. The option PCTLPRE= provides the prefix for the name of each percentile variable; here we use the prefix DAYS_. The option PCTLNAME= lists the suffix that completes each name. For example, the sequence PCTLPTS= 30, PCTLPRE= DAYS_, and PCTLNAME= PCT30 identifies the 30th percentile within the data set and labels it days_pct30. 
In the following code we compute the 30th and 60th percentiles for the continuous variable DAYS, using these options to identify specific percentiles.\r\n<p style=\"text-align: center\">[table id=15 \/]<\/p>\r\n\r\n<div>\r\n<p style=\"text-align: center\">OUTPUT from the code above:<\/p>\r\n\r\n<\/div>\r\n<div align=\"center\">\r\n<table>\r\n<thead>\r\n<tr>\r\n<td><strong>Obs<\/strong><\/td>\r\n<td><strong>days_pct30<\/strong><\/td>\r\n<td><strong>days_pct60<\/strong><\/td>\r\n<\/tr>\r\n<\/thead>\r\n<tbody>\r\n<tr>\r\n<td><strong>1<\/strong><\/td>\r\n<td>28.64<\/td>\r\n<td>57.08<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/div>\r\nThe <strong>PROC FREQ<\/strong> procedure in SAS enables us to create descriptive tables for the frequency distribution of the categorical variables. For example, we can count the number of females and males in our sample, the number of individuals in each of the sports, and the number of cases of ZIKA in our randomly generated data set of 1000 participants.\r\n<p style=\"text-align: center\"><strong>TABLE 14.5 ZIKA Random Number Generated data for SEX<\/strong><\/p>\r\n\r\n<div align=\"center\">\r\n<table>\r\n<thead>\r\n<tr>\r\n<td><strong>sex<\/strong><\/td>\r\n<td><strong>Frequency<\/strong><\/td>\r\n<td><strong>Percent<\/strong><\/td>\r\n<td><strong>Cumulative\r\nFrequency<\/strong><\/td>\r\n<td><strong>Cumulative\r\nPercent<\/strong><\/td>\r\n<\/tr>\r\n<\/thead>\r\n<tbody>\r\n<tr>\r\n<td><strong>male<\/strong><\/td>\r\n<td>533<\/td>\r\n<td>53.30<\/td>\r\n<td>533<\/td>\r\n<td>53.30<\/td>\r\n<\/tr>\r\n<tr>\r\n<td><strong>female<\/strong><\/td>\r\n<td>467<\/td>\r\n<td>46.70<\/td>\r\n<td>1000<\/td>\r\n<td>100.00<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/div>\r\n<p style=\"text-align: center\"><strong>TABLE 14.6 ZIKA Random Number Generated data for Sports<\/strong><\/p>\r\n\r\n<div 
align=\"center\">\r\n<table>\r\n<thead>\r\n<tr>\r\n<td><strong>sport<\/strong><\/td>\r\n<td><strong>Frequency<\/strong><\/td>\r\n<td><strong>Percent<\/strong><\/td>\r\n<td><strong>Cumulative\r\nFrequency<\/strong><\/td>\r\n<td><strong>Cumulative\r\nPercent<\/strong><\/td>\r\n<\/tr>\r\n<\/thead>\r\n<tbody>\r\n<tr>\r\n<td><strong>golf<\/strong><\/td>\r\n<td>266<\/td>\r\n<td>26.60<\/td>\r\n<td>266<\/td>\r\n<td>26.60<\/td>\r\n<\/tr>\r\n<tr>\r\n<td><strong>equestrian<\/strong><\/td>\r\n<td>286<\/td>\r\n<td>28.60<\/td>\r\n<td>552<\/td>\r\n<td>55.20<\/td>\r\n<\/tr>\r\n<tr>\r\n<td><strong>swimming<\/strong><\/td>\r\n<td>192<\/td>\r\n<td>19.20<\/td>\r\n<td>744<\/td>\r\n<td>74.40<\/td>\r\n<\/tr>\r\n<tr>\r\n<td><strong>gymnastics<\/strong><\/td>\r\n<td>96<\/td>\r\n<td>9.60<\/td>\r\n<td>840<\/td>\r\n<td>84.00<\/td>\r\n<\/tr>\r\n<tr>\r\n<td><strong>track &amp; field<\/strong><\/td>\r\n<td>160<\/td>\r\n<td>16.00<\/td>\r\n<td>1000<\/td>\r\n<td>100.00<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/div>\r\n<p style=\"text-align: center\"><strong>TABLE 14.7 ZIKA Random Number Generated data for Disease Present\/Absent<\/strong><\/p>\r\n\r\n<div align=\"center\">\r\n<table>\r\n<thead>\r\n<tr>\r\n<td><strong>case<\/strong><\/td>\r\n<td><strong>Frequency<\/strong><\/td>\r\n<td><strong>Percent<\/strong><\/td>\r\n<td><strong>Cumulative\r\nFrequency<\/strong><\/td>\r\n<td><strong>Cumulative\r\nPercent<\/strong><\/td>\r\n<\/tr>\r\n<\/thead>\r\n<tbody>\r\n<tr>\r\n<td><strong>present<\/strong><\/td>\r\n<td>505<\/td>\r\n<td>50.50<\/td>\r\n<td>505<\/td>\r\n<td>50.50<\/td>\r\n<\/tr>\r\n<tr>\r\n<td><strong>absent<\/strong><\/td>\r\n<td>495<\/td>\r\n<td>49.50<\/td>\r\n<td>1000<\/td>\r\n<td>100.00<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/div>\r\nThis procedure also enables us to create cross-tabular tables for comparisons of variables.\r\n<p style=\"text-align: center\"><strong>TABLE 14.8 ZIKA Random Number Generated Cross Tabulations<\/strong><\/p>\r\n\r\n<table class=\"grid aligncenter\" 
style=\"border-collapse: collapse;width: 100%;height: 75px\" border=\"0\">\r\n<tbody>\r\n<tr style=\"height: 15px\">\r\n<td style=\"width: 100%;text-align: center;height: 15px\" colspan=\"4\"><strong>Table of Frequencies for case by sex<\/strong><\/td>\r\n<\/tr>\r\n<tr style=\"height: 15px\">\r\n<td style=\"width: 25%;text-align: center;height: 30px\" rowspan=\"2\">SEX<\/td>\r\n<td style=\"width: 75%;text-align: center;height: 15px\" colspan=\"3\">CASES<\/td>\r\n<\/tr>\r\n<tr style=\"height: 15px\">\r\n<td style=\"width: 25%;text-align: center;height: 15px\">Present<\/td>\r\n<td style=\"width: 25%;text-align: center;height: 15px\">Absent<\/td>\r\n<td style=\"width: 25%;text-align: center;height: 15px\">Total<\/td>\r\n<\/tr>\r\n<tr class=\"shaded\" style=\"height: 15px\">\r\n<td style=\"width: 25%;height: 15px;text-align: center\"><strong>Male <\/strong><\/td>\r\n<td style=\"width: 25%;height: 15px;text-align: center;vertical-align: middle\">275<\/td>\r\n<td style=\"width: 25%;height: 15px;text-align: center\">258<\/td>\r\n<td style=\"width: 25%;height: 15px;text-align: center\">533<\/td>\r\n<\/tr>\r\n<tr class=\"shaded\" style=\"height: 15px\">\r\n<td style=\"width: 25%;height: 15px;text-align: center\"><strong>Female <\/strong><\/td>\r\n<td style=\"width: 25%;height: 15px;text-align: center;vertical-align: middle\">230<\/td>\r\n<td style=\"width: 25%;height: 15px;text-align: center\">237<\/td>\r\n<td style=\"width: 25%;height: 15px;text-align: center\">467<\/td>\r\n<\/tr>\r\n<tr>\r\n<td style=\"width: 25%;text-align: center\"><strong>COLUMN TOTALS<\/strong><\/td>\r\n<td style=\"width: 25%;text-align: center;vertical-align: middle\">505<\/td>\r\n<td style=\"width: 25%;text-align: center\">495<\/td>\r\n<td style=\"width: 25%;text-align: center\">1000<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\nAs in most SAS procedures, by including the PROC SORT \u00a0command, we can arrange the processing and subsequent output of the data to control for the categorical 
variable(s). In this example we computed the cross-tabulation of the frequency distribution for the variables SPORT and CASE, controlling for SEX, to separate the output for Males and Females.\r\n\r\nThe table format provides the following data within each cell: frequency, followed by cell percent, followed by row percent, followed by column percent as shown in this example for the sport: golf.\r\n<p style=\"text-align: center\"><strong>TABLE 14.9 ZIKA Random Number Generated Cross Tabulations<\/strong><\/p>\r\n\r\n<table class=\"grid aligncenter\" style=\"border-collapse: collapse;width: 100%;height: 75px\" border=\"0\">\r\n<tbody>\r\n<tr style=\"height: 15px\">\r\n<td style=\"width: 100%;text-align: center;height: 15px\" colspan=\"4\"><strong>Table of Frequencies for case by sports<\/strong><\/td>\r\n<\/tr>\r\n<tr style=\"height: 15px\">\r\n<td style=\"width: 23.9637%;text-align: center;height: 30px\" rowspan=\"2\">SPORT<\/td>\r\n<td style=\"width: 76.0363%;text-align: center;height: 15px\" colspan=\"3\">CASES<\/td>\r\n<\/tr>\r\n<tr style=\"height: 15px\">\r\n<td style=\"width: 26.0363%;text-align: center;height: 15px\">Present<\/td>\r\n<td style=\"width: 25%;text-align: center;height: 15px\">Absent<\/td>\r\n<td style=\"width: 25%;text-align: center;height: 15px\">Total<\/td>\r\n<\/tr>\r\n<tr class=\"shaded\" style=\"height: 15px\">\r\n<td style=\"width: 23.9637%;height: 15px;text-align: center\"><strong>MALE GOLF<\/strong><\/td>\r\n<td style=\"width: 26.0363%;height: 15px;text-align: center;vertical-align: middle\">Cell Freq = 73\r\n\r\nCell Pct = 13.70\r\n\r\nRow Pct = 53.28\r\n\r\nCol Pct = 26.55<\/td>\r\n<td style=\"width: 25%;height: 15px;text-align: center\">Cell Freq = 64\r\n\r\nCell Pct = 12.01\r\n\r\nRow Pct = 46.72\r\n\r\nCol Pct = 24.81<\/td>\r\n<td style=\"width: 25%;height: 15px;text-align: center\">Row Total = 137\r\n\r\nRow Pct = 25.70<\/td>\r\n<\/tr>\r\n<tr class=\"shaded\" style=\"height: 15px\">\r\n<td style=\"width: 23.9637%;height: 
15px;text-align: center\"><strong>FEMALE GOLF<\/strong><\/td>\r\n<td style=\"width: 26.0363%\">Cell Freq = 56\r\n\r\nCell Pct = 11.99\r\n\r\nRow Pct = 43.41\r\n\r\nCol Pct = 24.35<\/td>\r\n<td style=\"width: 25%\">Cell Freq = 73\r\n\r\nCell Pct = 15.63\r\n\r\nRow Pct = 56.59\r\n\r\nCol Pct = 30.80<\/td>\r\n<td style=\"width: 25%;height: 15px;text-align: center\">Row Total =129\r\n\r\nRow Pct = 27.62<\/td>\r\n<\/tr>\r\n<tr>\r\n<td style=\"width: 23.9637%;text-align: center\"><strong>COLUMN TOTALS<\/strong><\/td>\r\n<td style=\"width: 26.0363%;text-align: center;vertical-align: middle\">505<\/td>\r\n<td style=\"width: 25%;text-align: center\">495<\/td>\r\n<td style=\"width: 25%;text-align: center\">1000<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n&nbsp;","rendered":"<h1>What is a percentile?<\/h1>\n<p>The term &#8220;per cent&#8221; refers to &#8220;per 100&#8221;, and thus a percentile is a score representing a value relative to a base 100 scale.<\/p>\n<p>The computation of percentiles is a useful way to evaluate scores within a frequency distribution, ie. 
the set of frequency scores.<\/p>\n<p>A percentile is the value below which a given percentage of scores fall.<\/p>\n<p>In other words, if we consider the 60th percentile, then we are suggesting that 60% of the scores in a distribution or set of scores will fall below that particular value.<\/p>\n<p>Percentiles always refer to a specific position within a frequency distribution.<\/p>\n<hr \/>\n<p>Formulas to compute percentiles for grouped data:<\/p>\n<p>i) [latex]{k} = (\\frac{frequency}{N} \\times{100})[\/latex]<\/p>\n<p>ii) [latex]{\\beta} = (\\frac{\\textit{Cumulative frequency for all scores below the category of interest}}{N} \\times{100})[\/latex]<\/p>\n<p>iii) [latex]\\textit{Percentile}={\\beta} + (0.5 \\times{k})[\/latex]<\/p>\n<p>The 0.5 is used to add half of the percent of scores within the category in which the score of interest resides, which treats the score as lying at the middle of its category.<\/p>\n<p>Consider computing the percentile for the <strong>score 71<\/strong> in the frequency distribution shown in Table 14.1.<\/p>\n<p>Table 14.1\u00a0 Frequency Distribution Output<\/p>\n<table class=\"aligncenter\" style=\"height: 240px\">\n<thead>\n<tr class=\"shaded\" style=\"height: 30px\">\n<td style=\"height: 30px;width: 75.85px;text-align: center\">Cell Boundaries<\/td>\n<td style=\"height: 30px;width: 31.85px;text-align: center\">Freq (f)<\/td>\n<td style=\"height: 30px;width: 255.85px;text-align: center\">[latex]{k} = (\\frac{frequency}{N} \\times{100})[\/latex]<\/td>\n<td style=\"height: 30px;width: 50.25px;text-align: center\">Cum. 
Freq.<\/td>\n<td style=\"height: 30px;width: 99.05px;text-align: center\">\u03b2<\/td>\n<\/tr>\n<\/thead>\n<tbody>\n<tr style=\"height: 30px\">\n<td style=\"text-align: center;height: 30px;width: 75.85px\">58.5-61.5<\/td>\n<td style=\"text-align: center;height: 30px;width: 31.85px\">4<\/td>\n<td style=\"text-align: center;height: 30px;width: 255.85px\">4\/200 * 100 = 0.02 * 100 = 2<\/td>\n<td style=\"text-align: center;height: 30px;width: 50.25px\">4<\/td>\n<td style=\"text-align: center;height: 30px;width: 99.05px\">4\/200 * 100 = 2<\/td>\n<\/tr>\n<tr style=\"height: 30px\">\n<td style=\"text-align: center;height: 30px;width: 75.85px\">61.5-64.5<\/td>\n<td style=\"text-align: center;height: 30px;width: 31.85px\">12<\/td>\n<td style=\"text-align: center;height: 30px;width: 255.85px\">12\/200 * 100 = 0.06 * 100 = 6<\/td>\n<td style=\"text-align: center;height: 30px;width: 50.25px\">16<\/td>\n<td style=\"text-align: center;height: 30px;width: 99.05px\">16\/200 * 100 = 8<\/td>\n<\/tr>\n<tr style=\"height: 30px\">\n<td style=\"text-align: center;height: 30px;width: 75.85px\">64.5-67.5<\/td>\n<td style=\"text-align: center;height: 30px;width: 31.85px\">44<\/td>\n<td style=\"text-align: center;height: 30px;width: 255.85px\">44\/200 * 100 = 0.22 * 100 = 22<\/td>\n<td style=\"text-align: center;height: 30px;width: 50.25px\">60<\/td>\n<td style=\"text-align: center;height: 30px;width: 99.05px\">60\/200 * 100 = 30<\/td>\n<\/tr>\n<tr style=\"height: 30px\">\n<td style=\"text-align: center;height: 30px;width: 75.85px\">67.5-70.5<\/td>\n<td style=\"text-align: center;height: 30px;width: 31.85px\">64<\/td>\n<td style=\"text-align: center;height: 30px;width: 255.85px\">64\/200 * 100 = 0.32 * 100 = 32<\/td>\n<td style=\"text-align: center;height: 30px;width: 50.25px\">124<\/td>\n<td style=\"text-align: center;height: 30px;width: 99.05px\">124\/200 * 100 = 62<\/td>\n<\/tr>\n<tr style=\"height: 30px\">\n<td style=\"text-align: center;height: 30px;width: 
75.85px\">70.5-73.5<\/td>\n<td style=\"text-align: center;height: 30px;width: 31.85px\">56<\/td>\n<td style=\"text-align: center;height: 30px;width: 255.85px\">56\/200 * 100 = 0.28 * 100 = 28<\/td>\n<td style=\"text-align: center;height: 30px;width: 50.25px\">180<\/td>\n<td style=\"text-align: center;height: 30px;width: 99.05px\">180\/200 * 100 = 90<\/td>\n<\/tr>\n<tr style=\"height: 30px\">\n<td style=\"text-align: center;height: 30px;width: 75.85px\"><span style=\"color: #000000\">73.5-76.5<\/span><\/td>\n<td style=\"text-align: center;height: 30px;width: 31.85px\">16<\/td>\n<td style=\"text-align: center;height: 30px;width: 255.85px\">16\/200 * 100 = 0.08 * 100 = 8<\/td>\n<td style=\"text-align: center;height: 30px;width: 50.25px\">196<\/td>\n<td style=\"text-align: center;height: 30px;width: 99.05px\">196\/200 * 100 = 98<\/td>\n<\/tr>\n<tr style=\"height: 30px\">\n<td style=\"height: 30px;width: 75.85px;text-align: center\">76.5-79.5<\/td>\n<td style=\"height: 30px;width: 31.85px;text-align: center\">4<\/td>\n<td style=\"height: 30px;width: 255.85px;text-align: center\">4\/200 * 100 = 0.02 * 100 = 2<\/td>\n<td style=\"height: 30px;width: 50.25px;text-align: center\">200<\/td>\n<td style=\"height: 30px;width: 99.05px;text-align: center\">200\/200 * 100 = 100<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>The total sample of scores = 200. We are interested in the specific score with a value of 71. The score 71\u00a0 resides within the category that has cell boundaries 70.5 to 73.5. This category has a corresponding frequency of 56, which indicates that there are 56 scores between the lower and upper boundaries of the category from 70.5 to 73.5. 
We can then enter 56 as the frequency value and 200 as the value of N in the following equation to determine the value of k in our series of percentile equations.<\/p>\n<p>i) [latex]{k} = (\\frac{frequency}{N} \\times{100})[\/latex]<\/p>\n<p>[latex]{k} = (\\frac{56}{200} \\times{100}) = 28[\/latex]<\/p>\n<p>Here we see that in this scenario k = 28, where k represents the percent of scores in the category of interest: 56 of 200 scores is 28% of all scores in our distribution.<\/p>\n<p>Next we determine the value for [latex]{\\beta}[\/latex] based on the equation\u00a0[latex]{\\beta} = (\\frac{\\textit{Cumulative frequency for all scores below the category of interest}}{N} \\times{100})[\/latex]. The value of [latex]{\\beta}[\/latex] represents the cumulative percent of scores in the data set that fall below the category in which our score of interest resides. In this example the cumulative frequency for all scores below the category of interest is the cumulative frequency of the category that precedes the category in which our score (71) resides. Here the <em>cumulative frequency for all scores below the category of interest<\/em>\u00a0is 124. 
Using the equation to compute [latex]{\\beta}[\/latex] shown here, we see that the value is 62.<\/p>\n<p>ii) [latex]{\\beta} = (\\frac{124}{200} \\times{100}) = 62[\/latex]<\/p>\n<p>After we have determined k and [latex]{\\beta}[\/latex], we can work through the steps in equation iii) to determine the percent of scores falling below our score of interest.<\/p>\n<p>iii) [latex]\\textit{Percentile}={62} + (0.5 \\times{28})[\/latex]<\/p>\n<p>[latex]\\textit{Percentile}={62} + (14)[\/latex]<\/p>\n<p>[latex]\\textit{Percentile}=76^{th} \\textit{percentile}[\/latex]<\/p>\n<p>The outcome indicates that 76 percent of the scores within this set (distribution) of scores fall below the score of 71.<\/p>\n<hr \/>\n<p>Working through the computation of percentiles from a set of scores<\/p>\n<p>Use <em>the table of frequency distributions for heights of Grade 5 elementary school children<\/em> (Table 14.2) to compute the percentiles for the following values: 123, 136, 138, 149, and 152. Report the value of k and the percentile score for each. 
Fill in the missing data in the following table to obtain a complete data set.<\/p>\n<p style=\"text-align: center\"><em>Table 14.2 Frequency Distribution For Heights Of Grade 5 Elementary School Children.<\/em><\/p>\n<table class=\"aligncenter\" style=\"width: 288px\">\n<tbody>\n<tr>\n<td style=\"width: 83.85px\">Category<\/td>\n<td style=\"width: 79.05px\">Frequency<\/td>\n<td style=\"width: 86.25px\">Cumulative<br \/>\nFrequency<\/td>\n<\/tr>\n<tr>\n<td style=\"width: 83.85px\">120-122<\/td>\n<td style=\"width: 79.05px\">1<\/td>\n<td style=\"width: 86.25px\">1<\/td>\n<\/tr>\n<tr>\n<td style=\"width: 83.85px\">123-125<\/td>\n<td style=\"width: 79.05px\">3<\/td>\n<td style=\"width: 86.25px\">4<\/td>\n<\/tr>\n<tr>\n<td style=\"width: 83.85px\">126-128<\/td>\n<td style=\"width: 79.05px\">3<\/td>\n<td style=\"width: 86.25px\">7<\/td>\n<\/tr>\n<tr>\n<td style=\"width: 83.85px\">129-131<\/td>\n<td style=\"width: 79.05px\">3<\/td>\n<td style=\"width: 86.25px\"><\/td>\n<\/tr>\n<tr>\n<td style=\"width: 83.85px\">132-134<\/td>\n<td style=\"width: 79.05px\">1<\/td>\n<td style=\"width: 86.25px\">11<\/td>\n<\/tr>\n<tr>\n<td style=\"width: 83.85px\">135-137<\/td>\n<td style=\"width: 79.05px\"><\/td>\n<td style=\"width: 86.25px\">13<\/td>\n<\/tr>\n<tr>\n<td style=\"width: 83.85px\">138-140<\/td>\n<td style=\"width: 79.05px\">1<\/td>\n<td style=\"width: 86.25px\">14<\/td>\n<\/tr>\n<tr>\n<td style=\"width: 83.85px\">141-143<\/td>\n<td style=\"width: 79.05px\">2<\/td>\n<td style=\"width: 86.25px\"><\/td>\n<\/tr>\n<tr>\n<td style=\"width: 83.85px\">144-146<\/td>\n<td style=\"width: 79.05px\">2<\/td>\n<td style=\"width: 86.25px\">18<\/td>\n<\/tr>\n<tr>\n<td style=\"width: 83.85px\">147-149<\/td>\n<td style=\"width: 79.05px\">2<\/td>\n<td style=\"width: 86.25px\"><\/td>\n<\/tr>\n<tr>\n<td style=\"width: 83.85px\">150-152<\/td>\n<td style=\"width: 79.05px\">3<\/td>\n<td style=\"width: 86.25px\"><\/td>\n<\/tr>\n<tr>\n<td style=\"width: 83.85px\">sum of freq=<\/td>\n<td style=\"width: 
79.05px\"><\/td>\n<td style=\"width: 86.25px\"><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<div>A SAS Application &#8212; The Scenario: ZIKA Virus at the Summer Olympics<\/div>\n<p>In August 2016 Brazil hosted the Olympic Summer Games. However, several athletes decided to boycott the games because of the risk of exposure to the ZIKA virus. ZIKA is a virus that can be transmitted through the bite of an infected Aedes mosquito. The ZIKA virus is extremely dangerous for young women because it can reside in the blood for up to 3 months, and if the woman becomes pregnant, the virus can have negative consequences for the developing fetus. In particular, the ZIKA virus has been implicated in the development of microcephaly in newborn children.<\/p>\n<p>In this example, we will use a series of random number generating commands to create a data set with four variables and 1000 cases. The variables are sex, sport, case, and days, coded as follows: sex (1=m, 2=f), sport (1=golf, 2=equestrian, 3=swimming, 4=gymnastics, 5=track &amp; field), case (1=yes, 2=no), and days, a continuous variable representing the number of days since exposure to ZIKA virus-carrying mosquitoes.<\/p>\n<div class=\"textbox textbox--exercises\">\n<header class=\"textbox__header\">\n<div>A SAS Application &#8212; The Scenario: ZIKA Virus at the Summer Olympics<\/div>\n<\/header>\n<div class=\"textbox__content\">\n<p>PROC FORMAT;<br \/>\nVALUE SEXFMT 1 ='MALE'  2 ='FEMALE';<br \/>\nVALUE SPRTFMT 1 ='GOLF'  2 ='EQUESTRIAN'  3 ='SWIMMING'<br \/>\n4 ='GYMNASTICS'  5 ='TRACK &amp; FIELD';<br \/>\nVALUE CASEFMT  1='PRESENT'  2='ABSENT';<\/p>\n<p>DATA SASRNG;<\/p>\n<p>\/* Create 3 new variables labelled SCORE1 SCORE2 SCORE3 *\/<\/p>\n<p>ARRAY SCORES SCORE1-SCORE3;<\/p>\n<p>\/* Set 1000 cases per variable *\/<\/p>\n<p>DO K=1 TO 
1000;<\/p>\n<p>DAYS=RANUNI(13)*100;<\/p>\n<p>DAYS=ROUND(DAYS, 0.02);<\/p>\n<p>\/* Loop through each variable to establish 1000 randomly generated scores *\/<\/p>\n<p>DO I=1 TO 3;<\/p>\n<p>SCORES(I)=RANUNI(I)*1000;<\/p>\n<p>SCORES(I)=ROUND(SCORES(I));<\/p>\n<p>SCORES(I)=1+(MOD(SCORES(I),105));<\/p>\n<p>\/*\u00a0 The variable sex will relate to score1, create a filter to establish the binary score for sex based on the randomly generated output *\/<\/p>\n<p>IF SCORE1 &gt; 55 THEN SEX = 2;<\/p>\n<p>IF SCORE1 &gt;2 AND SCORE1&lt;56 THEN SEX = 1;<\/p>\n<p>\/* Sport Type\u00a0\u00a0 *\/<\/p>\n<p>IF SCORE2 &gt;90 THEN SPORT = 5;<\/p>\n<p>IF SCORE2 &gt;80 AND SCORE2&lt;91 THEN SPORT = 4;<\/p>\n<p>IF SCORE2 &gt;60 AND SCORE2&lt;81 THEN SPORT = 3;<\/p>\n<p>IF SCORE2 &gt;30 AND SCORE2&lt;61 THEN SPORT = 2;<\/p>\n<p>IF SCORE2 &gt;5 AND SCORE2&lt;31 THEN SPORT=1;<\/p>\n<p>\/* Case *\/<\/p>\n<p>IF SCORE3 &gt; 48 THEN CASE = 1;ELSE CASE = 2;<\/p>\n<p>END;<\/p>\n<p>OUTPUT;<\/p>\n<p>END; RUN;<\/p>\n<p>PROC SORT DATA =SASRNG; BY SEX;<\/p>\n<p>PROC FREQ; TABLES SEX SPORT CASE SEX*CASE;<\/p>\n<p>FORMAT SEX SEXFMT. SPORT SPRTFMT. CASE CASEFMT. ;<\/p>\n<p>PROC FREQ; TABLES SPORT*CASE;BY SEX;<\/p>\n<p>FORMAT SEX SEXFMT. SPORT SPRTFMT. CASE CASEFMT. ;<\/p>\n<p>PROC UNIVARIATE; VAR DAYS;<\/p>\n<p>OUTPUT OUT=PCTLS PCTLPTS\u00a0 = 30 60<\/p>\n<p>PCTLPRE\u00a0 = DAYS_<\/p>\n<p>PCTLNAME = PCT30 PCT60;<\/p>\n<p>PROC PRINT DATA= PCTLS;<\/p>\n<p>RUN;<\/p>\n<\/div>\n<\/div>\n<div>\n<p><span style=\"text-align: initial;font-size: 1em\">In SAS we can compute the specific percentiles using the PROC UNIVARIATE; feature on the continuous variable. 
The statements PROC UNIVARIATE; VAR days; produce the following table<\/span><span style=\"text-align: initial;font-size: 1em\">\u00a0of percentiles for the variable DAYS.<\/span><\/p>\n<p style=\"text-align: center\">Table 14.3\u00a0 Frequency Distribution Output Showing Percentiles<\/p>\n<\/div>\n<div style=\"margin: auto;\">\n<table class=\"aligncenter\" style=\"height: 195px\">\n<thead>\n<tr style=\"height: 15px\">\n<td style=\"height: 15px;width: 190.25px\"><strong>Level<\/strong><\/td>\n<td style=\"height: 15px;width: 72.65px\"><strong>Quantile<\/strong><\/td>\n<\/tr>\n<\/thead>\n<tbody>\n<tr style=\"height: 15px\">\n<td style=\"height: 15px;width: 190.25px\"><strong>100% Max<\/strong><\/td>\n<td style=\"height: 15px;width: 72.65px\">99.94<\/td>\n<\/tr>\n<tr style=\"height: 15px\">\n<td style=\"height: 15px;width: 190.25px\"><strong>99%<\/strong><\/td>\n<td style=\"height: 15px;width: 72.65px\">98.66<\/td>\n<\/tr>\n<tr style=\"height: 15px\">\n<td style=\"height: 15px;width: 190.25px\"><strong>95%<\/strong><\/td>\n<td style=\"height: 15px;width: 72.65px\">94.34<\/td>\n<\/tr>\n<tr style=\"height: 15px\">\n<td style=\"height: 15px;width: 190.25px\"><strong>90%<\/strong><\/td>\n<td style=\"height: 15px;width: 72.65px\">89.61<\/td>\n<\/tr>\n<tr style=\"height: 15px\">\n<td style=\"height: 15px;width: 190.25px\"><strong>75% Q3<\/strong><\/td>\n<td style=\"height: 15px;width: 72.65px\">73.13<\/td>\n<\/tr>\n<tr style=\"height: 15px\">\n<td style=\"height: 15px;width: 190.25px\"><strong>50% Median<\/strong><\/td>\n<td style=\"height: 15px;width: 72.65px\">46.83<\/td>\n<\/tr>\n<tr style=\"height: 15px\">\n<td style=\"height: 15px;width: 190.25px\"><strong>25% Q1<\/strong><\/td>\n<td style=\"height: 15px;width: 72.65px\">24.75<\/td>\n<\/tr>\n<tr style=\"height: 15px\">\n<td style=\"height: 15px;width: 190.25px\"><strong>10%<\/strong><\/td>\n<td style=\"height: 15px;width: 72.65px\">10.23<\/td>\n<\/tr>\n<tr style=\"height: 15px\">\n<td 
style=\"height: 15px;width: 190.25px\"><strong>5%<\/strong><\/td>\n<td style=\"height: 15px;width: 72.65px\">4.86<\/td>\n<\/tr>\n<tr style=\"height: 15px\">\n<td style=\"height: 15px;width: 190.25px\"><strong>1%<\/strong><\/td>\n<td style=\"height: 15px;width: 72.65px\">1.27<\/td>\n<\/tr>\n<tr style=\"height: 15px\">\n<td style=\"height: 15px;width: 190.25px\"><strong>0% Min<\/strong><\/td>\n<td style=\"height: 15px;width: 72.65px\">0.02<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<p>We can also compute specific percentile values for a continuous variable using the PCTLPTS=, PCTLPRE=, and PCTLNAME= options of the OUTPUT statement.<\/p>\n<p>Together these three options identify and label specific percentiles within a data set. PCTLPTS= selects the percentiles to compute; for example, PCTLPTS= 30 requests the 30th percentile. PCTLPRE= supplies the prefix for the name of each percentile variable; here we use the prefix days_. PCTLNAME= supplies the suffix that completes each name. For example, the sequence PCTLPTS= 30, PCTLPRE= DAYS_, and PCTLNAME= pct30 computes the 30th percentile and stores it in a variable named days_pct30. 
The following code computes the 30th and 60th percentiles for the continuous variable DAYS.<\/p>\n<p style=\"text-align: center\">\n<h2 id=\"tablepress-15-name\" class=\"tablepress-table-name tablepress-table-name-id-15\">SAS CODE to produce specific percentiles<\/h2>\n<table id=\"tablepress-15\" class=\"tablepress tablepress-id-15\" aria-labelledby=\"tablepress-15-name\">\n<tbody class=\"row-striping row-hover\">\n<tr class=\"row-1\">\n<td class=\"column-1\">output out=Pctls pctlpts  = 30 60<\/td>\n<\/tr>\n<tr class=\"row-2\">\n<td class=\"column-1\">   pctlpre  = days_<\/td>\n<\/tr>\n<tr class=\"row-3\">\n<td class=\"column-1\">  pctlname = pct30 pct60;<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<div>\n<p style=\"text-align: center\">OUTPUT from the code above:<\/p>\n<\/div>\n<div style=\"margin: auto;\">\n<table>\n<thead>\n<tr>\n<td><strong>Obs<\/strong><\/td>\n<td><strong>days_pct30<\/strong><\/td>\n<td><strong>days_pct60<\/strong><\/td>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td><strong>1<\/strong><\/td>\n<td>28.64<\/td>\n<td>57.08<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<p>The <strong>PROC FREQ<\/strong> procedure in SAS creates descriptive tables for the frequency distribution of categorical variables. 
For example, we can count the females and males in our sample, the individuals in each sport, and the cases of ZIKA in our randomly generated data set of 1000 participants.<\/p>\n<p style=\"text-align: center\"><strong>TABLE 14.5 ZIKA Random Number Generated data for SEX<\/strong><\/p>\n<div style=\"margin: auto;\">\n<table>\n<thead>\n<tr>\n<td><strong>sex<\/strong><\/td>\n<td><strong>Frequency<\/strong><\/td>\n<td><strong>Percent<\/strong><\/td>\n<td><strong>Cumulative<br \/>\nFrequency<\/strong><\/td>\n<td><strong>Cumulative<br \/>\nPercent<\/strong><\/td>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td><strong>male<\/strong><\/td>\n<td>533<\/td>\n<td>53.30<\/td>\n<td>533<\/td>\n<td>53.30<\/td>\n<\/tr>\n<tr>\n<td><strong>female<\/strong><\/td>\n<td>467<\/td>\n<td>46.70<\/td>\n<td>1000<\/td>\n<td>100.00<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<p style=\"text-align: center\"><strong>TABLE 14.6 ZIKA Random Number Generated data for Sports<\/strong><\/p>\n<div style=\"margin: auto;\">\n<table>\n<thead>\n<tr>\n<td><strong>sport<\/strong><\/td>\n<td><strong>Frequency<\/strong><\/td>\n<td><strong>Percent<\/strong><\/td>\n<td><strong>Cumulative<br \/>\nFrequency<\/strong><\/td>\n<td><strong>Cumulative<br \/>\nPercent<\/strong><\/td>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td><strong>golf<\/strong><\/td>\n<td>266<\/td>\n<td>26.60<\/td>\n<td>266<\/td>\n<td>26.60<\/td>\n<\/tr>\n<tr>\n<td><strong>equestrian<\/strong><\/td>\n<td>286<\/td>\n<td>28.60<\/td>\n<td>552<\/td>\n<td>55.20<\/td>\n<\/tr>\n<tr>\n<td><strong>swimming<\/strong><\/td>\n<td>192<\/td>\n<td>19.20<\/td>\n<td>744<\/td>\n<td>74.40<\/td>\n<\/tr>\n<tr>\n<td><strong>gymnastics<\/strong><\/td>\n<td>96<\/td>\n<td>9.60<\/td>\n<td>840<\/td>\n<td>84.00<\/td>\n<\/tr>\n<tr>\n<td><strong>track &amp; 
field<\/strong><\/td>\n<td>160<\/td>\n<td>16.00<\/td>\n<td>1000<\/td>\n<td>100.00<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<p style=\"text-align: center\"><strong>TABLE 14.7 ZIKA Random Number Generated data for Disease Present\/Absent<\/strong><\/p>\n<div style=\"margin: auto;\">\n<table>\n<thead>\n<tr>\n<td><strong>case<\/strong><\/td>\n<td><strong>Frequency<\/strong><\/td>\n<td><strong>Percent<\/strong><\/td>\n<td><strong>Cumulative<br \/>\nFrequency<\/strong><\/td>\n<td><strong>Cumulative<br \/>\nPercent<\/strong><\/td>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td><strong>present<\/strong><\/td>\n<td>505<\/td>\n<td>50.50<\/td>\n<td>505<\/td>\n<td>50.50<\/td>\n<\/tr>\n<tr>\n<td><strong>absent<\/strong><\/td>\n<td>495<\/td>\n<td>49.50<\/td>\n<td>1000<\/td>\n<td>100.00<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<p>This procedure also enables us to create cross-tabular tables for comparisons of variables.<\/p>\n<p style=\"text-align: center\"><strong>TABLE 14.8 ZIKA Random Number Generated Cross Tabulations<\/strong><\/p>\n<table class=\"grid aligncenter\" style=\"border-collapse: collapse;width: 100%;height: 75px\">\n<tbody>\n<tr style=\"height: 15px\">\n<td style=\"width: 100%;text-align: center;height: 15px\" colspan=\"4\"><strong>Table of Frequencies for case by sex<\/strong><\/td>\n<\/tr>\n<tr style=\"height: 15px\">\n<td style=\"width: 25%;text-align: center;height: 30px\" rowspan=\"2\">SEX<\/td>\n<td style=\"width: 75%;text-align: center;height: 15px\" colspan=\"3\">CASES<\/td>\n<\/tr>\n<tr style=\"height: 15px\">\n<td style=\"width: 25%;text-align: center;height: 15px\">Present<\/td>\n<td style=\"width: 25%;text-align: center;height: 15px\">Absent<\/td>\n<td style=\"width: 25%;text-align: center;height: 15px\">Total<\/td>\n<\/tr>\n<tr class=\"shaded\" style=\"height: 15px\">\n<td style=\"width: 25%;height: 15px;text-align: center\"><strong>Male <\/strong><\/td>\n<td style=\"width: 25%;height: 15px;text-align: center;vertical-align: middle\">275<\/td>\n<td 
style=\"width: 25%;height: 15px;text-align: center\">258<\/td>\n<td style=\"width: 25%;height: 15px;text-align: center\">533<\/td>\n<\/tr>\n<tr class=\"shaded\" style=\"height: 15px\">\n<td style=\"width: 25%;height: 15px;text-align: center\"><strong>Female <\/strong><\/td>\n<td style=\"width: 25%;height: 15px;text-align: center;vertical-align: middle\">230<\/td>\n<td style=\"width: 25%;height: 15px;text-align: center\">237<\/td>\n<td style=\"width: 25%;height: 15px;text-align: center\">467<\/td>\n<\/tr>\n<tr>\n<td style=\"width: 25%;text-align: center\"><strong>COLUMN TOTALS<\/strong><\/td>\n<td style=\"width: 25%;text-align: center;vertical-align: middle\">505<\/td>\n<td style=\"width: 25%;text-align: center\">495<\/td>\n<td style=\"width: 25%;text-align: center\">1000<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>As in most SAS procedures, by including the PROC SORT \u00a0command, we can arrange the processing and subsequent output of the data to control for the categorical variable(s). In this example we computed the cross-tabulation of the frequency distribution for the variables SPORT and CASE, controlling for SEX, to separate the output for Males and Females.<\/p>\n<p>The table format provides the following data within each cell: frequency, followed by cell percent, followed by row percent, followed by column percent as shown in this example for the sport: golf.<\/p>\n<p style=\"text-align: center\"><strong>TABLE 14.9 ZIKA Random Number Generated Cross Tabulations<\/strong><\/p>\n<table class=\"grid aligncenter\" style=\"border-collapse: collapse;width: 100%;height: 75px\">\n<tbody>\n<tr style=\"height: 15px\">\n<td style=\"width: 100%;text-align: center;height: 15px\" colspan=\"4\"><strong>Table of Frequencies for case by sports<\/strong><\/td>\n<\/tr>\n<tr style=\"height: 15px\">\n<td style=\"width: 23.9637%;text-align: center;height: 30px\" rowspan=\"2\">SPORT<\/td>\n<td style=\"width: 76.0363%;text-align: center;height: 15px\" 
colspan=\"3\">CASES<\/td>\n<\/tr>\n<tr style=\"height: 15px\">\n<td style=\"width: 26.0363%;text-align: center;height: 15px\">Present<\/td>\n<td style=\"width: 25%;text-align: center;height: 15px\">Absent<\/td>\n<td style=\"width: 25%;text-align: center;height: 15px\">Total<\/td>\n<\/tr>\n<tr class=\"shaded\" style=\"height: 15px\">\n<td style=\"width: 23.9637%;height: 15px;text-align: center\"><strong>MALE GOLF<\/strong><\/td>\n<td style=\"width: 26.0363%;height: 15px;text-align: center;vertical-align: middle\">Cell Freq = 73<br \/>\nCell Pct = 13.70<br \/>\nRow Pct = 53.28<br \/>\nCol Pct = 26.55<\/td>\n<td style=\"width: 25%;height: 15px;text-align: center\">Cell Freq = 64<br \/>\nCell Pct = 12.01<br \/>\nRow Pct = 46.72<br \/>\nCol Pct = 24.81<\/td>\n<td style=\"width: 25%;height: 15px;text-align: center\">Row Total = 137<br \/>\nRow Pct = 25.70<\/td>\n<\/tr>\n<tr class=\"shaded\" style=\"height: 15px\">\n<td style=\"width: 23.9637%;height: 15px;text-align: center\"><strong>FEMALE GOLF<\/strong><\/td>\n<td style=\"width: 26.0363%\">Cell Freq = 56<br \/>\nCell Pct = 11.99<br \/>\nRow Pct = 43.41<br \/>\nCol Pct = 24.35<\/td>\n<td style=\"width: 25%\">Cell Freq = 73<br \/>\nCell Pct = 15.63<br \/>\nRow Pct = 56.59<br \/>\nCol Pct = 30.80<\/td>\n<td style=\"width: 25%;height: 15px;text-align: center\">Row Total = 129<br \/>\nRow Pct = 27.62<\/td>\n<\/tr>\n<tr>\n<td style=\"width: 23.9637%;text-align: center\"><strong>COLUMN TOTALS<\/strong><\/td>\n<td style=\"width: 26.0363%;text-align: center;vertical-align: middle\">505<\/td>\n<td style=\"width: 25%;text-align: center\">495<\/td>\n<td style=\"width: 25%;text-align: 
center\">1000<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>&nbsp;<\/p>\n","protected":false},"author":56,"menu_order":1,"template":"","meta":{"pb_show_title":"on","pb_short_title":"","pb_subtitle":"","pb_authors":[],"pb_section_license":""},"chapter-type":[],"contributor":[],"license":[],"class_list":["post-371","chapter","type-chapter","status-publish","hentry"],"part":34,"_links":{"self":[{"href":"https:\/\/pressbooks.library.upei.ca\/montelpare\/wp-json\/pressbooks\/v2\/chapters\/371","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/pressbooks.library.upei.ca\/montelpare\/wp-json\/pressbooks\/v2\/chapters"}],"about":[{"href":"https:\/\/pressbooks.library.upei.ca\/montelpare\/wp-json\/wp\/v2\/types\/chapter"}],"author":[{"embeddable":true,"href":"https:\/\/pressbooks.library.upei.ca\/montelpare\/wp-json\/wp\/v2\/users\/56"}],"version-history":[{"count":38,"href":"https:\/\/pressbooks.library.upei.ca\/montelpare\/wp-json\/pressbooks\/v2\/chapters\/371\/revisions"}],"predecessor-version":[{"id":1524,"href":"https:\/\/pressbooks.library.upei.ca\/montelpare\/wp-json\/pressbooks\/v2\/chapters\/371\/revisions\/1524"}],"part":[{"href":"https:\/\/pressbooks.library.upei.ca\/montelpare\/wp-json\/pressbooks\/v2\/parts\/34"}],"metadata":[{"href":"https:\/\/pressbooks.library.upei.ca\/montelpare\/wp-json\/pressbooks\/v2\/chapters\/371\/metadata\/"}],"wp:attachment":[{"href":"https:\/\/pressbooks.library.upei.ca\/montelpare\/wp-json\/wp\/v2\/media?parent=371"}],"wp:term":[{"taxonomy":"chapter-type","embeddable":true,"href":"https:\/\/pressbooks.library.upei.ca\/montelpare\/wp-json\/pressbooks\/v2\/chapter-type?post=371"},{"taxonomy":"contributor","embeddable":true,"href":"https:\/\/pressbooks.library.upei.ca\/montelpare\/wp-json\/wp\/v2\/contributor?post=371"},{"taxonomy":"license","embeddable":true,"href":"https:\/\/pressbooks.library.upei.ca\/montelpare\/wp-json\/wp\/v2\/license?post=371"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":t
rue}]}}