Advanced Concepts for Applied Statistics in Healthcare
In applied health research, we deal primarily with humans; therefore, the term sampling refers to how we select individuals to form a sample (small group) from the larger group of all individuals (the population). For example, we might be interested in knowing if a mindfulness intervention reduces stress among emergency room physicians. Although we could try to include every single emergency room physician in the world, that isn’t necessary because we can actually get a very close estimate of the true effects for the population by studying some of the people in the group (i.e., a sample).
The foundation of sampling is based on the following three essential elements: samples, inferences, and the population.
Figure 40.1 The Foundation of Sampling
The term population refers to the complete set of all people being studied.
The sample is the subset of people actually studied; it enables us to describe the larger group (the population) through estimation and inference. The term inference refers to a deduction or conclusion, and is used in research to describe the process of relating information derived from a sample to a population.
One of the most important concepts in sampling is that, because we can rarely evaluate the population directly, the sample is expected to be a true representation of the population from which it was drawn. This latter point is imperative if we are going to make inferences about the population based on estimates from the sample.
Figure xxx The sample is a true representation of the population
Once we have defined our population, we need to decide how many people to include in our research study in order to be confident in our results. There are several ways to compute sample size, and the calculations differ depending on the research design and the analysis that you are planning to do. In quantitative research we can simplify sampling strategies into two basic categories: probabilistic sampling and non-probabilistic sampling. Common probabilistic sampling strategies include simple random sampling, systematic sampling, stratified sampling, and cluster sampling; while these are elementary approaches, several more complex probabilistic strategies are derived from these four basic types. Among non-probabilistic approaches, convenience sampling is the most common, but one may also consider consecutive sampling and judgmental sampling. These approaches provide samples, but the members of the sample are not drawn at random, and therefore not all members of the population had a chance to be selected.
The term probability sampling or probabilistic sampling refers to sampling procedures that are based on random selection from a population or a predefined set. The component of randomness ensures that each unit within the larger group (the population) has an equal chance of being selected. If the researcher uses a true random selection approach, then the process of sample selection will be more likely to reduce the influence of selection bias in the research process. The goal of probability sampling is to end up with a sample that is representative of the population.
Random and independent sampling:
A probability sampling approach that uses random and independent sampling implies that each member of the population has the same chance of being selected for the sample. The term independent means that the selection of any one member of the population in no way influences the selection of any other member. A sample comprised of individuals selected using a random and independent approach enables the researcher to generalize the results to a larger population.
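As a concrete sketch, random and independent selection can be illustrated with Python's standard library; the patient IDs below are hypothetical, not drawn from the text:

```python
import random

# Hypothetical sampling frame of 500 patient IDs (illustrative only).
population = [f"patient_{i:03d}" for i in range(500)]

# random.sample draws without replacement: every member has an equal
# chance of selection, and no draw influences any other draw.
random.seed(42)  # fixed seed so the sketch is reproducible
sample = random.sample(population, k=25)

print(len(sample))       # 25 members selected
print(len(set(sample)))  # 25 -- no member appears twice
```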
In some circumstances the researcher will have a complete list of all of the members of the group from which the sample is to be drawn: for example, the list of all patients within a medical practice, the list of constituents within a voting area, or the list of all members of a club. Systematic sampling is an effective approach for drawing the sample from such a list of the members of the larger group (i.e., the population) because the researcher can select individuals using a method that follows an a priori plan and can be replicated. For example, a researcher may decide to select every other name or every "nth" name on a list. Likewise, using the entire list, the researcher may decide to create a strategy for dividing the list into specific groups to represent the total population.
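A minimal sketch of systematic selection from such a list, assuming a hypothetical frame of 120 names and a sampling interval of 10:

```python
import random

# Hypothetical list of 120 patients in a medical practice (illustrative).
frame = [f"patient_{i:03d}" for i in range(120)]

def systematic_sample(frame, k):
    """Select every k-th member after a random starting point -- an
    a priori, replicable selection plan."""
    start = random.randrange(k)
    return frame[start::k]

random.seed(1)
chosen = systematic_sample(frame, k=10)
print(len(chosen))  # 12: one member from each block of 10 names
```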
Stratified random sample (common in public opinion polls):
The stratified random sample is a probabilistic sampling approach that maintains the elements of randomness and independence but is established within the constraints of a user-defined subgrouping system referred to as strata. In the stratified random sampling approach each stratum is defined by a particular characteristic of interest, for example an age range, a level of household income, or a grade level. The characteristic of interest is fixed within the stratum so that there is only one characteristic contained in any one stratum, as shown in the following example in Table 40.1.
Table 40.1 Example of stratification by income
Here an individual can be selected from one of the three possible income strata. No individual can belong to more than one stratum, as the income level is a unique identifier for an individual.
The stratified random sample approach is most effective when the researcher's interest is related to the variables upon which the strata are based. For example, if a researcher is interested in health service utilization within a cohort based on income, then household income may be an appropriate characteristic upon which to base the sampling strata. When a researcher applies the stratified random sampling approach, each stratum is sampled randomly and the various sub-samples collected from the strata are combined to form the representative sample. However, when there are noticeable imbalances in the total number of individuals within the strata, it is suggested that the researcher sample the strata proportionally to preserve the natural concentrations of the subgroups within the population.
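The proportional approach described above can be sketched as follows; the household counts per income stratum are hypothetical:

```python
import random
from collections import Counter

# Hypothetical population of 1,000 households with deliberately
# unbalanced income strata (600 low / 300 middle / 100 high).
population = ([("low", i) for i in range(600)]
              + [("middle", i) for i in range(300)]
              + [("high", i) for i in range(100)])

def proportional_stratified_sample(pop, n):
    """Randomly sample each stratum in proportion to its share of the
    population, then combine the sub-samples."""
    strata = {}
    for label, member in pop:
        strata.setdefault(label, []).append((label, member))
    sample = []
    for members in strata.values():
        k = round(n * len(members) / len(pop))
        sample.extend(random.sample(members, k))
    return sample

random.seed(7)
sample = proportional_stratified_sample(population, n=100)
print(Counter(label for label, _ in sample))  # 60 low, 30 middle, 10 high
```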
The cluster sampling approach is a form of random sampling that is used to reduce the large numbers of individuals needed for stratified and simple random methods. For example, consider a study of university students separated by academic discipline. Each discipline is a pseudo-intact group that can then be further processed through stratification and random selection. The cluster may be considered the first-level filter of a population in which it is impossible or impractical to sample all members because of size. Clustering enables organization of the population based on arbitrary grouping criteria established by the researcher. The clusters are generally easy to define, and often individuals will self-identify within the cluster.
A form of fixed clustering is to use postal area codes as the criteria for cluster membership. The postal area code establishes fixed boundaries for a geographic region. Individuals representing households within the postal code area are then selected at random to represent all of the households in the entire area. One caveat to consider when using postal code areas, however, is the minimum sample size. This is especially true when selecting individuals based on postal codes for rural and remote areas, where only a few households are included in the entire postal code area. As a general rule a minimum sample of n = 15 is used to ensure anonymity of selected individuals when using cluster sampling approaches.
The term non-probability sampling or non-probabilistic sampling refers to sampling procedures in which randomness of selection is less important than meeting a priori characteristics that are specifically related to the research question. Typically the resulting sample is small in comparison to the larger population, and therefore the results may not generalize to the larger population from which the sample was drawn.
Often in health-based studies, non-probability samples refer to individuals that are best suited to the conditions of the research question. That is, in non-probability sampling, such individuals may be more likely to be accessible for the study or comply with the specific regimen of the researcher’s interest.
For example, in studies of children with complex and chronic conditions we selected individuals that met the specific a priori criteria for the definition of chronic and complex. In this way we were able to select from all children that met the criteria established by the definition. This approach allowed us to infer our results to the specific target "group" but not the general population.
Among the most common non-probabilistic sampling procedures is the convenience sample. This procedure is used most often because it is inexpensive, it takes advantage of the availability of subjects, and it is functional when other methods are less practical.
Convenience samples are efficient for enrolling willing participants where informed consent is required, but they include only those individuals who are willing to participate. A resulting challenge in selecting a convenience sample is self-selection bias: people who want to be involved may differ systematically from those who do not. In using convenience sampling the researcher should always check the geographic proximity of the sample, the likelihood that subjects may refrain from participation, and the intrinsic bias of the sampling approach.
Consecutive sampling is a form of non-probabilistic sampling whereby individuals are selected if they meet the study criteria and are available during the duration of the study. Like convenience sampling, consecutive sampling connects the researcher with an accessible group of individuals, but the results may not be inferential to a larger population. Finally, consecutive sampling is problematic when the duration of the selection process is inappropriate; for example, measuring an outcome over a month when a year's worth of data is suggested.
Judgmental sampling is a form of non-probabilistic sampling whereby individuals are selected by “handpicking” individuals for the study. The judgmental sampling approach resembles convenience sampling in its disregard for the effects of bias, but can produce results which are related to an accessible group of individuals. Participants selected using a judgmental sample may not be generally inferential to a larger population.
Statistical power is defined as the likelihood of finding an effect when in fact the effect really does exist. In other words, statistical power refers to the probability of correctly rejecting the null hypothesis.
The terms power, alpha (α) and beta (β) are all related to statistical decisions about accepting and rejecting the null hypothesis. Recall that the null hypothesis always proposes there is no statistically significant effect or difference between groups. Therefore, if you reject the null hypothesis, that means there IS a significant effect or difference. In contrast if you fail to reject the null hypothesis, it stands.
Consider a simple comparison of average heart rate between two groups. Here, the null hypothesis would be that the two groups have the same mean heart rate, given as follows: H0: μ1 = μ2 (with the alternative hypothesis H1: μ1 ≠ μ2).
Ideally, you want to plan your research study so that you have a large enough sample to be able to accurately discern whether or not there is a meaningful difference between the two groups and avoid making a false conclusion.
A Type I statistical error would be made if the researcher found a significant difference between the two group means (thus, rejecting the null hypothesis that there was no difference) when one did not actually exist. In this situation we would say you have a false positive.
The probability of making a Type I error is also referred to as “alpha”, which uses the Greek letter, α. By convention, most researchers decide that a 5% chance of making a Type I error is acceptable. Therefore α is often set at .05 (5%).
A Type II statistical error would be made if the researcher did not find a significant difference between the two group means (thus, failing to reject the null hypothesis), when in fact a true difference exists. This would be a false negative.
The probability of making a Type II error is inversely related to your statistical power:
β = 1 − Power
As power increases, the probability of making a Type II error (missing a significant finding) decreases. However, the sample size required for your study also increases as power increases, sometimes making it impractical. Generally, power is set to .80-.90, resulting in a β value from .20-.10, meaning that even with a power of .90 there is a 10% chance of missing a significant effect.
Statistical errors are based on the interconnection between the size of the sample, the effect size, and the power of the test. Statistical power is computed as: Power = 1 − β. Therefore, if we establish that α = 0.05 and β = 0.20, then Power = 1 − 0.20 = 0.80, in which case we would say that the statistical test has 80% power.
Effect size tells us about the magnitude or strength of an effect, difference, or relationship and is specific to a given statistical test. Smaller effect sizes require larger sample sizes, while medium to large effect sizes require smaller sample sizes. Many sample size calculations require effect size estimates which can be tricky if there is not a lot of past research on the variables you are interested in. Generally researchers use the best available estimates and calculate the actual effect size in their study once they have collected their data.
In this section we will work through four different approaches to sample size calculations using probabilistic formulae. To begin, the data in Table 40.2 list the Z-scores associated with common levels of Alpha and Beta estimates. These are essential values for the formula used below.
Table 40.2 Z-Scores associated with common Alpha and Beta Probabilities

| Alpha Probability | Z-Score (two-tailed) | Beta Probability | Z-Score (one-tailed) |
|---|---|---|---|
| 0.10 | 1.645 | 0.20 | 0.84 |
| 0.05 | 1.96 | 0.10 | 1.28 |
| 0.01 | 2.576 | 0.05 | 1.645 |
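The z-scores in Table 40.2 can be regenerated with Python's standard library (statistics.NormalDist); alpha probabilities are treated as two-tailed and beta probabilities as one-tailed:

```python
from statistics import NormalDist

z = NormalDist().inv_cdf  # inverse of the standard normal CDF

# Two-tailed Z for common alpha probabilities
for alpha in (0.10, 0.05, 0.01):
    print(f"alpha = {alpha:.2f}  Z = {z(1 - alpha / 2):.3f}")
# -> 1.645, 1.960, 2.576

# One-tailed Z for common beta probabilities
for beta in (0.20, 0.10, 0.05):
    print(f"beta  = {beta:.2f}  Z = {z(1 - beta):.3f}")
# -> 0.842, 1.282, 1.645
```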
Let’s say you want to create a representative sample for a population using simple random sampling but you’re not sure how big of a sample you need. The following formula can be used to calculate your sample size, “n”: n = (N × z² × p × q) / (error² × (N − 1) + z² × p × q)
Unpacking the formula for simple random sampling we see that N represents the population from which the sample will be drawn; p refers to the proportion of individuals displaying the characteristic of interest, while 1-p or q refers to the proportion of individuals in the population not displaying the characteristic of interest.
To determine the proportion of cases in a population, p is the ratio of all individuals displaying the characteristic of interest divided by the set of all cases from which the sample was drawn. In real life you might determine p based on past research or population statistical data such as the census.
The error term in the denominator refers to the expected accuracy or the allowable difference between the estimate of the proportion in the selected sample, as a result of the sample size calculation, and the true population proportion. A typical value here would range between 0.02 and 0.10.
The term Zα refers to the two-tailed standardized score associated with the researcher’s confidence. If α = 0.05, then the confidence level is 1 − α = 0.95, or a 95% confidence value, and the corresponding two-tailed z-score is 1.96.
Figure 40.6. Explaining the parts of the formula
Applying the formula
Consider the following application of the formula. You are asked by the Medical Officer of Health (MOH) to determine the sample size required to represent a proportion of persons who inject drugs (PWID) in a starting population of 157,000 individuals. From previous reports, the proportion of PWID within a sample (i.e. the proportion of the population that display the characteristic of interest) was p = 0.13 and therefore q = (1-p) = 0.87. The MOH wants the estimate to be within 3% of the true population proportion with 95% confidence.
Based on the application of the formula for simple random sampling for the estimate of a Population Proportion you report that the MOH will require a sample size of at least 481 individuals in order to be 95% confident that the proportion of PWID in her sample will represent the true population proportion of PWID individuals within 3%.
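The MOH example can be cross-checked in Python; the finite-population formula below is reconstructed from the stated inputs and the reported result, so treat it as a sketch rather than the book's exact notation:

```python
def srs_sample_size(N, p, error, z=1.96):
    """Sample size for estimating a population proportion when the
    population size N is known (finite-population form)."""
    q = 1 - p
    return (N * z**2 * p * q) / (error**2 * (N - 1) + z**2 * p * q)

n = srs_sample_size(N=157_000, p=0.13, error=0.03)
print(round(n))  # 481, matching the reported minimum sample size
```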
In some situations, you may not know the original size of the population from which to draw the sample. In these circumstances the more appropriate sample size formula is shown here as: n = (z² × p × q) / error²
Applying this formula to compute sample size where the original “N” is unknown, for the unbiased proportional estimate, given that p = 0.5, error = 0.05, and α = 0.05:
Working through the formula with the values given above produces a sample size of n = 384.
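This calculation reduces to n = z²pq / error² when N is unknown; a sketch:

```python
def srs_sample_size_unknown_N(p, error, z=1.96):
    """Sample size for a proportion when the population size is unknown:
    n = z^2 * p * (1 - p) / error^2."""
    return z**2 * p * (1 - p) / error**2

n = srs_sample_size_unknown_N(p=0.5, error=0.05)
print(round(n))  # 384, matching the result above
```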
In some research designs we are interested in comparing results between two or more groups. In this case we may not know the original population size, but we may know about variability with respect to the dependent variables. To determine sample size for a comparison study we need to know the measure of central tendency for the dependent variable and the amount of variability we could expect for the measures of the dependent variable.
Typically, the amount of variability is based on information from previous studies. In some situations this information may come from pilot work or from an actual study. The sample size formula for comparison studies is shown here as:
The terms of the formula are explained as follows. The variance term σ² is based on the variance reported in the literature or from pilot studies. Similarly, the expected mean is based on the mean from the literature or from pilot studies, while the expected % accuracy is the proximity of the estimated mean score to the true population score and is set by the researcher to be about 10%. The terms Zα (Z-alpha) and Zβ (Z-beta) use the standard scores for the alpha level and for the beta (power) level. Common values for α and Zβ are α = 0.05 and Zβ = 1.28, respectively.
Application of the sample size formula for a comparison study
In a recent healthy heart study, researchers measured the effects of red wine consumption on blood cholesterol concentrations of males over the age of 35 years. The researchers showed that males (n1 = 133) who consumed on average one serving of red wine per day, for a minimum of six days per week, had a lower concentration of the atherogenic low-density lipoprotein cholesterol than a group of age-matched control subjects (n2 = 143) who abstained from any alcohol consumption. The investigation followed the total group of 276 males for a 24-week period.
You believe that the data are valuable and therefore you wish to conduct the study with a group of males in your local community. In the reference study the average cholesterol concentration for the sample of interest (red wine consumers) was 4.6 mmol/L with a standard deviation of 0.32 millimoles per liter (mmol/L). You wish to use an alpha level of 0.05 with a corresponding beta level of 4 × alpha (beta = 4 × 0.05 = 0.20) and a power level of 1 − beta = 0.80. Further, you expect that the difference between your estimate of the mean and the TRUE estimate of the mean is within 3 percent.
The data you need in order to compute the appropriate sample size are:
- The estimated mean from previous studies = 4.6 mmol/L
- The standard deviation from previous studies = 0.32
- The z scores for the alpha probability (.05), Zα=1.96
- The z score for the beta probability (.20), Zβ=1.28
- The allowable percent difference between your estimate for the dependent variable and the expected estimate for the dependent variable from the true population = 3%
The results of this computation indicate that in order to be 95% confident that the estimates for the proposed sample will be within 3 percent of the true population value, you will need to have a minimum of 16 participants in the test group and 16 participants in the control group.
In the case-control study design, individuals with a specific measurable condition are “compared” to individuals that do not demonstrate the condition of interest. The case-control design is a retrospective study design type that evaluates, by comparison, the differences in outcome measures between groups of individuals with and without a disease, or the signs/symptoms of a condition.
Case-control studies are useful in demonstrating associations but may not show causation. The temporal characteristics (elements of time) are important to demonstrating the relationship. An essential consideration in a case-control study is the clear definition of the cases and of the controls.
In a case-control design we may also consider that the cases are more likely to occur given exposure to the stimulus, in which case we say this is a directional hypothesis or a one-tailed hypothesis. If we consider a one-tailed decision rule (cases are more likely than controls, given the characteristics of the scenario), then we see that a power of 80% has a beta term of 0.20 and a Zβ = 0.84. Estimates of statistical power for the one-tailed (directional) hypothesis and corresponding Zβ values are shown in the following table.
Figure 40.7 Power estimates and corresponding z scores
The sample size formula to determine the number of cases (or the number of controls) in each group of a case-control study is shown here as: n = (Zα + Zβ)² × [p1(1 − p1) + p0(1 − p0)] / (p1 − p0)²
The formula differs slightly from that published more recently by Kasiulevicius, Sapoka, and Filipaviciute (2006) but produces similar estimates. The elements of the formula include p1: the proportion of cases among those individuals suspected to have been exposed; p0: the proportion of cases among those individuals suspected not to have been exposed; Zα: the Z score for the alpha term, where Zα = 1.96; and Zβ: the Z score for the beta term for a one-tailed directional hypothesis (Zβ = 0.84).
Application of the sample size formula for a case-control study
In order to determine the effects of cannabis smoking on lung cancer you decide to conduct a retrospective case-control study in which your sample size estimate is based on the consideration that the relative risk of lung cancer among frequent cannabis smokers is about 5.7 times that of non-smokers. You decide to use p1 = 0.285 to represent the proportion of frequent cannabis smokers who had lung cancer and p0 = 0.05 to represent the proportion of individuals who never smoked cannabis who had lung cancer.
The data needed to compute the appropriate sample size are shown here as:
- Proportion of individuals that had lung cancer among individuals that were considered frequent smokers of cannabis, P1 = 0.285
- Proportion of individuals that had lung cancer among individuals that never smoked cannabis, P0 = 0.05
- The z scores for the alpha term (α=0.05), Zα=1.96
- The z scores for the beta term based on a one-tailed hypothesis, Zβ=0.84
The results of your computation showed that the appropriate sample size needed to conduct your study will require at least 36 individuals in the case group and at least 36 individuals in the control group.
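The case-control computation can be cross-checked with the simplified form n = (Zα + Zβ)²[p1(1 − p1) + p0(1 − p0)] / (p1 − p0)², which reproduces the stated inputs and result; a sketch:

```python
import math

def case_control_n(p1, p0, z_alpha=1.96, z_beta=0.84):
    """Cases (or controls) per group for a case-control design, using the
    simplified formula implied by the worked example."""
    numerator = (z_alpha + z_beta) ** 2 * (p1 * (1 - p1) + p0 * (1 - p0))
    return numerator / (p1 - p0) ** 2

n = case_control_n(p1=0.285, p0=0.05)
print(math.ceil(n))  # 36 individuals per group
```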
As described previously, the cohort comparison study design is a type of observational study in which the researcher simply observes an outcome without intervening. As a longitudinal study design the cohort study design follows a group of individuals with similar characteristics either forward in time (prospectively) or backward in time (retrospectively).
In the cohort comparison study design, a group demonstrating the characteristic(s) of interest is followed for a period of time while being compared to one or more similar comparison groups (the cohorts) that do not demonstrate the characteristic(s) of interest. The researcher intends to measure specific variables within the designated cohort of interest and to compare such measures to those reported for the comparison cohort(s). The selected measures are recorded at the onset of the monitoring activity, at pre-designated time points throughout the study, and at the completion of the study.
The formula to compute the sample size for the group of interest in a cohort comparison study where the data are normally distributed is shown here: n = [Zα√((1 + 1/m) × p̄(1 − p̄)) + Zβ√(P0(1 − P0)/m + P1(1 − P1))]² / (P0 − P1)²
However, if the data are based on a chi-square distribution the recommended approach by Fleiss (1981) is to use the following continuity correction formula, after computing the initial sample size with formula (1) above.
The elements required for formula (1) and formula (2) are shown here.
i). The alpha level – also referred to as the level of statistical significance: the value against which the estimated “test statistic” will be compared to determine if there is something happening in the research question under investigation (i.e. the drug worked, the neighbourhoods differed, more symptoms were reported, the light is brighter, the sound was louder, etc.). The alpha level is also referred to as the probability of committing a Type I error (rejecting the null hypothesis when in fact it is true). The typical value for the alpha level is 0.05 (also written as α = 0.05).
ii). The beta level – also associated with statistical power as in power = 1- beta. The beta value is an estimate of the probability associated with making a Type II statistical error (i.e. failure to reject the null hypothesis when in fact it is false).
According to Cohen (described in Fleiss, 1981), given that committing a Type I error is four times as serious as committing a Type II error, a researcher should set the beta value to 4 x alpha. That is, when a researcher states an alpha value of 0.05, the corresponding beta value should be set to 4 x 0.05 = 0.20.
A beta value of 0.20 is therefore an indication of the researcher’s willingness to accept a 20% chance of missing an event (i.e. the effect) that actually occurred. Considering the concept of power, a beta value of 0.20 represents a statistical power level of 0.80 or 80%. The typical value for the beta level is 0.20.
iii). The ratio estimate represented by the letter m in the formula refers to the ratio of the number of control (comparison) participants to the number of participants of interest. In the computation of sample size for a prospective multiple cohort design the researcher may be faced with cohorts of different sizes; the ratio term expresses the relative sizes of the cohort of interest and the control or comparison cohorts. The researcher may consider that the group of interest is half as large as the control group, in which case the ratio is presented as 0.5:1. Similarly, the researcher may consider a ratio of 2:1 or 3:1 for the group of interest and the control group, and in some situations ratios as high as 5:1, 10:1, or even 20:1. The value entered for m is simply the value of the ratio.
iv). The term P1 represents the expected proportion in the group of interest. In the computation of sample size for a prospective multiple cohort design the researcher may have access to previous research that indicates the expected proportion of outcome for individuals within a given cohort, or the expected proportion of individuals that are considered exposed or present with a given characteristic in a study.
For example, in previous research measuring the epidemiology of injuries in ice hockey, Montelpare, Pelletier and Stark (1996) reported injury rates that ranged from 17% to 68%, with an average proportion of injured among individuals that body check of about 43% (a proportion value of 0.43).
In computing the sample size for a prospective multiple cohort design the researcher should enter a decimal value to represent the expected proportion (i.e. outcome proportion) in their study.
v). The term P0 represents the expected proportion in the CONTROL group. In the computation of sample size for a prospective multiple cohort design the researcher may also have access to previous research that indicates the expected proportion of outcome for individuals within the control group or the not exposed or not at risk cohort. If unsure about the true value of the expected proportion for the control group, then enter 0.50, as this would be considered an unbiased level of expected exposure or risk. However, the formula takes any value for P0 between 0 and 1.
vi). The term p̄ (P-bar) refers to the average expected proportion and is computed from the formula p̄ = (P1 + m × P0) / (1 + m), which includes the ratio term and the expected proportion in the control group.
Application of the sample size formula for a cohort comparison study
In order to determine the effects of cannabis smoking on driving-related events, you decide to develop a prospective cohort comparison study by following a group of individuals that self-report driving after using cannabis containing the active ingredient delta-9-tetrahydrocannabinol (THC). The comparison control group will be composed of a similar number of individuals (i.e., a 1:1 ratio) that self-report not using cannabis and also self-report that they do not drive after using alcohol. The expected proportion of traffic-related events associated with cannabis use is based on previous work by Kelly, Darke, and Ross (2004), which reported that approximately 4% of the general population drive while under the influence of drugs but that 25% of traffic-related events involve drivers who tested positive for drug use.
Based on the formula above, the data used to determine the size of each cohort (observed and control) are as follows:
| Term | Value |
|---|---|
| Ratio (m) | 1:1 |
| P1 | 0.25 |
| P0 | 0.04 |
| p̄ | 0.145 |
| Zα | 1.96 |
| Zβ | 1.28 |
The results of this computation suggest that both the cohort of interest (individuals that self-report driving after using cannabis) and the control cohort (individuals that abstain from cannabis as well as alcohol when driving) should include approximately 57 individuals each.
Verifying the computations shown above with the following SAS code.
Begin by establishing the alpha and beta terms as probabilities and as z-scores.
ALPHA = 0.05; ZALPHA = 1.96; BETA = 0.20; ZBETA = 1.28;
M = 1; P_1 = 0.25; P_0 = 0.04; PBAR = 0.145;
Next unpack the elements of the formula above by creating variables and working through the mechanics of the formula.
For example: create a variable called NUMRTR1A to represent the first element of the numerator in the formula. Here the element is to compute just the value under the square root sign
NUMRTR1A = SQRT((1+(1/M))* (PBAR*(1-PBAR)));
Multiply the value computed above by the ZALPHA term, which you declared in the second line of the program as ZALPHA = 1.96
NUMRTR1B = (ZALPHA*NUMRTR1A);
Work through each element of the formula as you have above to simplify the stepwise calculations.
Here we create the third numerator element NUMRTR1C by calculating the value under the square root sign
NUMRTR1C = SQRT(((P_0*(1-P_0))/M)+ (P_1*(1-P_1)));
We multiply the value of NUMRTR1C by the ZBETA term that we declared in line 2 of the program as ZBETA = 1.28
NUMRTR1D = (ZBETA*NUMRTR1C);
/* Strictly adhere to the rules of BEDMAS to ensure that the formula is deconstructed and reconstructed in the proper order. For example, after computing the two elements of the numerator that are contained within the square brackets, add these values together, and then raise them to the exponent 2. */
NUMRTR1E = (NUMRTR1B + NUMRTR1D)**2;
/* Next work through the elements of the denominator in the same way as you did with the numerator. */
DENOM = (P_0 - P_1)**2;
N_APPRX = (NUMRTR1E/DENOM);
PROC PRINT; VAR NUMRTR1A NUMRTR1B NUMRTR1C NUMRTR1D NUMRTR1E DENOM N_APPRX; RUN;
The SAS output shown here was generated from the processing of the formula. The elements produced by this formula are a result of the values declared by the user. For example, in this calculation for sample size, you set an alpha level of 0.05 to control the Type I error, which translates to a Z-alpha value of 1.96. Likewise you set a beta level to control the Type II error, and the corresponding Z-beta term declared in the program is 1.28. You also knew a priori that the probability of the event of interest was 25%, so you set the P1 value to 0.25, and the probability related to the control was 4%, so you set the P0 value to 0.04. The calculations worked through by hand matched the calculations from the SAS program. That is, in both approaches you determined that the sample size for this study should be approximately 57 individuals.
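As a cross-check of the SAS program, the same stepwise computation can be sketched in Python using the values declared above:

```python
import math

# Values declared in the SAS program above
z_alpha, z_beta = 1.96, 1.28
m, p1, p0 = 1, 0.25, 0.04
pbar = (p1 + m * p0) / (1 + m)  # 0.145, the PBAR term

# Numerator: the two bracketed terms, summed and squared (NUMRTR1E)
term_a = z_alpha * math.sqrt((1 + 1 / m) * pbar * (1 - pbar))
term_b = z_beta * math.sqrt(p0 * (1 - p0) / m + p1 * (1 - p1))
numerator = (term_a + term_b) ** 2

# Denominator: squared difference of the proportions (DENOM)
denominator = (p0 - p1) ** 2

n = numerator / denominator
print(round(n))  # 57 individuals per cohort, matching the SAS output
```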
In this chapter you were introduced to:
- Describing the importance of establishing a sample to represent the population
- Identifying the difference between probabilistic and non-probabilistic sampling strategies.
- Computing sample size under different scenarios using SAS code
- Understanding when a given sample size calculation is most appropriate
- Applying the appropriate sampling strategy to a given research design
Kelly, E., Darke, S., & Ross, J. (2004). A review of drug use and driving: Epidemiology, impairment, risk factors and risk perceptions. Drug and Alcohol Review, 23, 319–344.