Measures of Central Tendency

William J. Montelpare; Emily Read; Teri McComber; Alyson Mahar; Krista Ritchie

PART 1: Measures of Central Tendency

The most common measure of central tendency is the mean or average score. The mean is a calculated score that is intended to represent all of the scores in the distribution (set of scores).

The formula for the mean of a sample is shown here:

[latex]{\overline{x}} = {\Sigma{x_i}\over{n}}[/latex]

Where:

[latex]{\overline{x}}[/latex] refers to the sample mean
[latex]\Sigma{(x_i)}[/latex] refers to the sum of all the scores
i refers to the “ith” case within the distribution
n refers to the total number of cases within the distribution.

To calculate the mean for a continuous variable, add up all of the values and divide the sum by the number of values. Below is a set of blood glucose measures for 5 patients, recorded in millimoles per litre (mmol/L). Pₙ is the nominal label for each patient, so that P₁ is patient 1.

P₁ 4.2 mmol/L
P₂ 5.6 mmol/L
P₃ 7.9 mmol/L
P₄ 10.2 mmol/L
P₅ 7.5 mmol/L

Follow these steps to calculate the mean:

First add the values together: 4.2 + 5.6 + 7.9 + 10.2 + 7.5 = 35.4.
Next, divide by the number of values (to produce the average): 35.4/5 = 7.08 mmol/L

We can also use SAS to compute the mean for a set of scores. Two SAS procedures that compute measures of central tendency are PROC MEANS and PROC UNIVARIATE. Each procedure was designed to produce descriptive statistics for a sample of scores. Below are the SAS commands to compute the mean for a set of 10 resting heart rate scores. In this first program we use PROC MEANS to compute the basic estimates for the sample of 10 numbers: the number of cases, the mean, the standard deviation, and the minimum and maximum scores.

SAS PROC MEANS to Produce Descriptive Statistics for a Sample of 10 Numbers

DATA MN_HR;
INPUT ID SCORE @@;
DATALINES;
01 48 02 54 03 66 04 72 05 56 06 68 07 48 08 67 09 55 10 84
;
PROC MEANS DATA=MN_HR;
VAR SCORE;
RUN;

Notice in the code above that the semicolon (;) ending the data is placed on a separate line below the set of scores. The output generated by this program, using PROC MEANS without options, is a table of summary estimates that includes the number of cases, the mean, the standard deviation, and the minimum and maximum values for the dataset. By contrast, PROC UNIVARIATE not only computes the mean but also creates the Basic Statistical Measures table, which provides a fuller summary of descriptive statistics.

SAS Output from the MEANS Procedure: Variable of interest was Heart Rate

N	Mean	Std Dev	Minimum	Maximum
10	61.80	11.55	48.00	84.00
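
The same procedure can be applied to the five blood glucose values used in the hand calculation earlier. The following is a minimal sketch: the dataset name BG_MN and the variable names PATIENT and GLUCOSE are our own labels for illustration, and the keywords N, SUM and MEAN with MAXDEC=2 simply limit the output to the number of cases, the sum and the mean, printed to two decimals.

```sas
/* Blood glucose values (mmol/L) for the five patients in the hand calculation */
/* BG_MN, PATIENT and GLUCOSE are illustrative names, not from the original text */
DATA BG_MN;
INPUT PATIENT $ GLUCOSE @@;
DATALINES;
P1 4.2 P2 5.6 P3 7.9 P4 10.2 P5 7.5
;
/* Request only the number of cases, the sum and the mean */
PROC MEANS DATA=BG_MN N SUM MEAN MAXDEC=2;
VAR GLUCOSE;
RUN;
```

The sum (35.40) and the mean (7.08 mmol/L) in the output should agree with the values computed by hand.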

When we call PROC UNIVARIATE, the output is a more complete set of summary tables that includes not only estimates of centrality but also the moments, measures of variability, and tests of the location of the mean, as shown below.

SAS PROC UNIVARIATE to Produce Descriptive Statistics for a Sample of 10 Numbers

PROC UNIVARIATE DATA=MN_HR;
VAR SCORE;
RUN;

The UNIVARIATE Procedure -- Variable: SCORE

MOMENTS
N	10	Sum Weights	10
Mean	61.8	Sum Observations	618
Std Deviation	11.5547008	Variance	133.511111
Skewness	0.55954538	Kurtosis	-0.2284272
Uncorrected SS	39394	Corrected SS	1201.6
Coeff Variation	18.6969269	Std Error Mean	3.65391723

Tests for Location: Mu0=0
Test	Statistic	Estimate	Probability	p Value
Student's t	t	16.91336	Pr > |t|	<.0001
Sign	M	5	Pr >= |M|	0.0020
Signed Rank	S	27.5	Pr >= |S|	0.0020

Comparing the Mean for a Sample to the Expected Mean for a Population

In the output from PROC UNIVARIATE, SAS includes a table in which the mean for the variable SCORE is compared to a hypothesized population mean, Mu0. By default Mu0 is set to 0, which is the mean of the Standard Normal Distribution (SND); the SND has a mean of 0 and a standard deviation of 1. In the SAS table shown above, entitled Tests for Location: Mu0=0, the comparison of the sample mean ([latex]{\overline{x}}[/latex]) to the hypothesized population mean ([latex]{\mu}[/latex]) is evaluated with the Student’s t-test.

The results presented in the table above show that the Student’s t-statistic is 16.91 and the probability associated with this estimate is <0.0001. Together these values indicate that the observed sample mean is significantly different from the hypothesized mean (set at Mu₀=0) for the population from which the sample was drawn.
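
As a check, the Student's t value in this table can be reproduced by hand from the Moments table. The t-statistic is the difference between the sample mean and the hypothesized mean, divided by the standard error of the mean (labelled Std Error Mean in the output, and equal to the standard deviation divided by the square root of n):

[latex]t = {{\overline{x}} - {\mu_0}\over{s/\sqrt{n}}} = {61.8 - 0\over{3.6539}} \approx 16.91[/latex]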

However, what if we wanted to establish a suggested value for the population mean that is not 0, but that is based on a value reported in the literature? In this case, we could assign a suggested value to the population mean and then compare the observed mean for the sample to the expected value for the population. In the following code, we test this notion.

Assign a suggested value to the population mean

PROC TTEST DATA=MN_HR H0=54
PLOTS(SHOWH0)
ALPHA=0.05;
VAR SCORE;
RUN;

The SAS output is given below. The results indicate that the average score for the sample ([latex]{\overline{x}}[/latex] = 61.80) is not significantly different from the expected score ([latex]{\mu}[/latex] = 54) at the 0.05 level of significance (t = 2.13, p = 0.0615). Notice that, in addition to the table of output, SAS also produces a graph illustrating the shape of the distribution and the comparison of the sample estimate to the expected population estimate of centrality.

The TTEST Procedure
DF	t Value	Pr > |t|
9	2.13	0.0615

Parameter estimates
Mean	95% CL Mean
61.8000	Lower Limit: 53.5343	Upper Limit: 70.0657
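
The values in this output can also be verified by hand, using the standard error of the mean (3.6539) reported earlier by PROC UNIVARIATE. The critical value used below, 2.262, is the two-tailed t value for 9 degrees of freedom at ALPHA=0.05; it is taken from a standard t table rather than from the SAS output.

[latex]t = {{\overline{x}} - {\mu_0}\over{s/\sqrt{n}}} = {61.8 - 54\over{3.6539}} \approx 2.13[/latex]

[latex]{\overline{x}} \pm 2.262 \times 3.6539 = 61.8 \pm 8.27[/latex]

The second line reproduces the 95% confidence limits of approximately 53.53 and 70.07.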

Because the 95% confidence interval for the sample mean (53.53 to 70.07) includes the population mean that we set a priori as 54, no significant difference is observed between that which is expected and that which was observed. This estimate is illustrated in the following graph.

Calculate the Mean for a Frequency Distribution

In the following example, we compute the mean for a frequency distribution. The formula to compute the mean of a frequency distribution is shown here:

[latex]{\overline{x}} = {\Sigma{fx_i}\over{n}}[/latex]

Where:

f refers to the frequency in each interval
[latex]x_i[/latex] refers to the mid-point of the “ith” interval
i refers to the “ith” interval within the distribution
n refers to the total number of cases within the distribution (the sum of the frequencies).

Below is the frequency distribution table for the heights of 200 individuals. The data represent heights recorded in centimetres and organized into seven categories. The SAS code to compute the mean for this set of data is shown below the table. Notice that, for the SAS program, the table is reduced to two variables: GRPMDPT, the mid-point of each category, and COUNTS, the number of individuals whose heights fall within that category.

Column 1: cell boundaries (cm)	Column 2: frequency (f)	Column 3: cell mid-point (cm)	Column 4: (f) x cell mid-point	Column 5: (col 4 ÷ n)
158.5 – 161.5	4	160	4 x 160 = 640	640/200 = 3.2
161.5 – 164.5	12	163	12 x 163 = 1956	1956/200 = 9.78
164.5 – 167.5	44	166	44 x 166 = 7304	7304/200 = 36.52
167.5 – 170.5	64	169	64 x 169 = 10816	10816/200 = 54.08
170.5 – 173.5	56	172	56 x 172 = 9632	9632/200 = 48.16
173.5 – 176.5	16	175	16 x 175 = 2800	2800/200 = 14.00
176.5 – 179.5	4	178	4 x 178 = 712	712/200 = 3.56
Total	200		33860	169.30

[latex]{\overline{x}} = {\Sigma{fx_i}\over{n}} = {33860\over 200} = 169.3[/latex] cm, which is also the sum of column 5.

The SAS code to compute the mean for data in the table above

DATA FREQMN;
INPUT GRPMDPT COUNTS @@;
CRSPRDCT= GRPMDPT*COUNTS;
/* COMPUTE RATIO FOR THE CROSS PRODUCT USING GROUP MIDPOINT X CELL FREQUENCY */
XP_RATIO=CRSPRDCT/200;
LABEL GRPMDPT = 'GROUP MIDPOINT'
COUNTS = 'NUMBER OF CASES PER CELL'
CRSPRDCT = 'CROSS PRODUCT PER CELL'
XP_RATIO = 'CROSS PRODUCT RATIO';
DATALINES;
160 4 163 12 166 44 169 64 172 56 175 16 178 4
;
PROC PRINT;
VAR GRPMDPT COUNTS CRSPRDCT XP_RATIO;
SUM CRSPRDCT XP_RATIO;
FOOTNOTE1 "* THE MEAN IS PRODUCED AS THE SUM OF THE VARIABLE XP_RATIO";
FOOTNOTE2 "** THE MEAN CAN ALSO BE CALCULATED FROM THE SUM OF THE VARIABLE CRSPRDCT ÷ 200";
RUN;

The output generated by the SAS program above is the table of raw data presented in column form and includes the sums of the columns used to compute the mean for the frequency distribution.

Obs	grpmdpt	counts	crsprdct	xp_ratio
1	160	4	640	3.20
2	163	12	1956	9.78
3	166	44	7304	36.52
4	169	64	10816	54.08
5	172	56	9632	48.16
6	175	16	2800	14.00
7	178	4	712	3.56
			33860	169.30

* The mean is produced as the sum of the variable XP_RATIO

** The mean can also be calculated from the sum of the variable crsprdct ÷ 200
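
As an alternative sketch, PROC MEANS can compute the same mean directly from the grouped data when the cell frequency is named in a FREQ statement, so that each mid-point is counted as many times as there are cases in its cell. The dataset name FREQMN2 below is our own label for illustration; the data values are those from the table above.

```sas
/* FREQMN2 is an illustrative dataset name; the values repeat the table above */
DATA FREQMN2;
INPUT GRPMDPT COUNTS @@;
DATALINES;
160 4 163 12 166 44 169 64 172 56 175 16 178 4
;
/* A null FOOTNOTE statement clears footnotes left over from earlier steps */
FOOTNOTE;
/* FREQ treats each row as COUNTS identical observations of GRPMDPT */
PROC MEANS DATA=FREQMN2 N MEAN MAXDEC=2;
VAR GRPMDPT;
FREQ COUNTS;
RUN;
```

The output should report N = 200 and a mean of 169.30, in agreement with the sum of XP_RATIO.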

The Weighted Mean Score

In some situations, we may wish to combine means from several samples. Under such circumstances, we need to consider the sample size (or weight) of each sample from which a mean was drawn. By weighting each sample mean by the number of subjects in its sample, each mean contributes to the combined mean in proportion to its sample size. The formula for a weighted mean from two samples is shown here:

[latex]{\overline{x}}={n_1{\overline{x}_1}+n_2{\overline{x}_2}\over{n_1 + n_2}}[/latex]
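
To illustrate, suppose the sample of 10 heart rate scores from earlier in this part (mean = 61.8) is combined with a second sample of 30 scores with a mean of 70.0. The second sample is hypothetical and is invented here only for illustration, as are the dataset and variable names. The weighted mean is (10 × 61.8 + 30 × 70.0) ÷ 40 = 67.95, whereas the simple average of the two means would be 65.9. A minimal SAS sketch uses the WEIGHT statement of PROC MEANS:

```sas
/* One row per sample: sample number, sample size, sample mean */
/* Sample 2 (n = 30, mean = 70.0) is hypothetical, for illustration only */
DATA WTMEAN;
INPUT SAMPLE N_CASES SMPL_MEAN @@;
DATALINES;
1 10 61.8 2 30 70.0
;
/* WEIGHT makes each sample mean count in proportion to its sample size */
PROC MEANS DATA=WTMEAN MEAN MAXDEC=2;
VAR SMPL_MEAN;
WEIGHT N_CASES;
RUN;
```

The mean reported in the output should be 67.95, which is closer to the mean of the larger sample.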

The Median Score

The median score is also a measure of central tendency, and it is defined as the middle score in a set of ordered scores. In the examples below, we begin with a set of scores (an array) and sort the scores from lowest to highest. Then we identify the number in the middle of the ordered set, where half the scores are above it and half are below it.

Example: Median

The median is the middle score. Consider 10 resting heart rate values. We put these readings in order of magnitude and then identify which value is in the middle:

57
59
59
75
78
78
85
88
88
88

In this case, we have an even number of values (n = 10) so we can calculate the average of the two values in the middle. It just so happens that they are the same value in this example (78) so the median is 78.

initial array of scores: {12, 72, 56, 34, 35, 13, 36, 16, 67}
sorted array of scores: {12, 13, 16, 34, 35, 36, 56, 67, 72}
middle (5th) score of the sorted array: 35

Notice in the example above, regardless of the actual scores, the middle score in the ordered set of scores is the median, which in this set is 35.

When we have an even number of scores in our array, there is no single middle score. Instead, two scores share the middle position, so we compute the average of these two middle scores and use that number as the median. That is, we add the two middle scores together and divide by 2.

initial array of scores: {22, 32, 86, 44, 25, 13, 16, 18, 47, 11}
sorted array of scores: {11, 13, 16, 18, 22, 25, 32, 44, 47, 86}
computed median for the array: (22 + 25) / 2 = 23.5
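
SAS can also report the median directly. The sketch below applies PROC MEANS with the MEDIAN keyword to the ten scores above; the dataset name MEDN is our own label for illustration. The scores do not need to be sorted first.

```sas
/* MEDN is an illustrative dataset name; scores are entered in their original order */
DATA MEDN;
INPUT SCORE @@;
DATALINES;
22 32 86 44 25 13 16 18 47 11
;
/* Request the number of cases and the median */
PROC MEANS DATA=MEDN N MEDIAN MAXDEC=1;
VAR SCORE;
RUN;
```

With its default method of computing quantiles, SAS averages the two middle scores when n is even, so the output should show a median of 23.5.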

The Mode Score

The mode score is the third measure of central tendency, and it is defined as the most frequently occurring score in a set of scores. In the example below, we simply count the number of scores that are the same within a set of scores, within an array or within a distribution.

Below are 10 resting heart rate values:

78, 88, 57, 59, 75, 85, 88, 78, 59, 88

The mode is 88 because it appears most often.

In the following example of 16 scores, the number 2 occurs 3 times but the number 27 occurs 4 times; therefore, we identify 27 as the mode score.

2, 2, 2, 5, 6, 14, 15, 23, 26, 27, 27, 27, 27, 28, 37, 41
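
One way to find the mode in SAS is to tabulate the scores and list the most frequent score first. The sketch below uses PROC FREQ with the ORDER=FREQ option on the 16 scores above; the dataset name MODE16 is our own label for illustration.

```sas
/* MODE16 is an illustrative dataset name */
DATA MODE16;
INPUT SCORE @@;
DATALINES;
2 2 2 5 6 14 15 23 26 27 27 27 27 28 37 41
;
/* ORDER=FREQ lists the scores from most frequent to least frequent */
PROC FREQ DATA=MODE16 ORDER=FREQ;
TABLES SCORE / NOCUM NOPERCENT;
RUN;
```

The first row of the resulting table should be the score 27 with a frequency of 4. The Basic Statistical Measures table from PROC UNIVARIATE also reports the mode.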

License

Applied Statistics in Healthcare Research Copyright © 2020 by William J. Montelpare, Ph.D., Emily Read, Ph.D., Teri McComber, Alyson Mahar, Ph.D., and Krista Ritchie, Ph.D. is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, except where otherwise noted.