Measurement Concepts

William J. Montelpare; Emily Read; Teri McComber; Alyson Mahar; Krista Ritchie

Measurement

Measurement, or the ability to assign value to pieces of information and allow comparisons between groups, is an essential aspect of applied health research and clinical practice. Although we might not always think deeply about what we are measuring or how we are measuring it in our daily lives, we are always collecting data for comparisons and decisions that inform our activities of daily living. For example, at the grocery store, your decision to buy one type of apple over another may be informed by the price of each (a measure of their economic value in the market); similarly, if you drove to the market, you most likely monitored your speed (a measure of speed = distance/time) to ensure your safety and the safety of others. In this chapter, we are going to delve more deeply into different types of quantitative data measurements. As applied health researchers, we may use measurement to inform clinical treatment decisions or, more broadly, to make health policy decisions. Therefore, it is crucial that we understand exactly what we are measuring and how to handle the different types of data collected from those measurements appropriately. I think we can all agree that these decisions are much more important than deciding between two apples at the store, or how fast we should drive to avoid a speeding ticket, and thus deserve our thoughtful attention. Let’s begin.

The term measure can be described as a verb (an action word) or a noun (a naming word).

As a verb, the term measure refers to the action of evaluating an entity and, in quantitative research, assigning it a value, as in “I am going to measure the number of individuals who use aspirin daily”. In this example, the term measure describes the action of quantifying the concept of interest: the number of individuals who use aspirin daily. In this context, we count the number of individuals who use aspirin daily within a given sample of participants. In clinical practice, health care professionals frequently assess and measure information about patients, including age, height, weight, heart rate, blood pressure, and lung volume, as well as depressive symptoms or anxiety levels.
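The counting described above can be sketched in a few lines of Python. The sample below is entirely hypothetical (eight invented participants) and is only meant to show how a count turns individual observations into a single value.

```python
# Hypothetical sample: True means the participant reports daily aspirin use.
uses_aspirin_daily = [True, False, False, True, True, False, True, False]

# Measuring by counting: True is treated as 1 and False as 0 when summed.
daily_users = sum(uses_aspirin_daily)

# Expressing the count relative to the sample size gives it context.
sample_size = len(uses_aspirin_daily)
proportion = daily_users / sample_size

print(f"{daily_users} of {sample_size} participants use aspirin daily "
      f"({proportion:.0%})")
```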

Measurement can also involve asking participants to answer questions about themselves (e.g., hair colour, postal code, income) or to complete a standardized assessment tool. For example, we might want to see how exercise self-efficacy, or the belief that you can complete an activity, influences junior high students’ participation in gym class. Since exercise self-efficacy is a complex concept, instead of counting, we would measure this concept by asking participants to rate their agreement with a series of pre-planned statements and then use this information to create a numeric score that represents their perceived level of self-efficacy. Examples of other standardized assessment tools include the Morse Falls Scale, which is commonly used to measure a person’s risk of falling, and the Braden Scale for Predicting Pressure Ulcer Risk, which, you guessed it, helps measure a person’s risk of developing a pressure ulcer.
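As a minimal sketch of how agreement ratings might be turned into a numeric score, the Python example below sums the ratings on a made-up five-item questionnaire. The item wording, the 1 to 5 rating range, and the simple summing rule are all assumptions for illustration; a published instrument defines its own items and scoring rules.

```python
# Hypothetical exercise self-efficacy items, each rated from
# 1 (strongly disagree) to 5 (strongly agree). Wording is invented.
responses = {
    "I can finish the warm-up in gym class": 4,
    "I can keep going when I feel tired": 3,
    "I can learn a new activity if I practise": 5,
    "I can take part even when I am not in the mood": 2,
    "I can keep up with my classmates": 4,
}

MIN_RATING, MAX_RATING = 1, 5


def total_score(ratings):
    """Sum the item ratings after checking that each is on the 1-5 scale."""
    ratings = list(ratings)
    for rating in ratings:
        if not MIN_RATING <= rating <= MAX_RATING:
            raise ValueError(f"rating {rating} is outside the 1-5 scale")
    return sum(ratings)


score = total_score(responses.values())
lowest_possible = MIN_RATING * len(responses)
highest_possible = MAX_RATING * len(responses)

print(f"Self-efficacy score: {score} "
      f"(possible range {lowest_possible} to {highest_possible})")
```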

As a noun, the term measure can also refer to the instrument or method used to assess the quantity of an attribute of something or someone. For example, the count of the number of individuals who use aspirin daily is the measure or method used to quantify the number of individuals who use this medication every day. Likewise, the General Health Questionnaire is a measure of participants’ self-reported overall mental and physical health.

In research, the information obtained during the measurement process is referred to as data. The word data is the plural of datum, which refers to a single value. Although data are often presented as numbers, data are not limited to numerical values. For example, data can be the verbal responses to an interview. Similarly, data can be raw materials, artifacts, diagrams, or specimens. However, in this book, we will focus on numerical data, which can be analyzed statistically.

When we begin our research we need to consider the type of data that we will be collecting because this information will inform our selection of the appropriate statistical tests for data analysis. Numerical data are often classified into four possible categories: nominal, ordinal, interval, or ratio. In the following sections, each type of data will be described in detail.

Nominal Data

The term nominal refers to the first level of measurement. This term is used to describe data that have no intrinsic hierarchy, meaning that there is no underlying number line upon which the measurements are based. Nominal data include those variables that have numeric values but whose numbers are not in a logical or meaningful order, as well as those variables that are simply assigned labels as part of the strategy to analyze data. Nominal variables are categorical and must not be treated as if they lie on a continuum (see ordinal data). Note that arithmetic on nominal values, such as adding or averaging them, does not produce a meaningful result.

Examples of nominal measurements that are numeric include telephone numbers and license plate numbers. Telephone numbers are randomly assigned to regions within a geographical area. A comparison of the numbers 688-5550 and 978-2345 does not provide us with any information about our participants except that they come from two different telephone regions.

Examples of nominal measurements that are assigned labels include sex, hair colour, or eye colour. For example, when we write a coding strategy in a computer program to analyze data, we often re-code the measure of sex as 1 = male, 2 = female, and 3 = other. In this example, sex is considered a nominal variable because the values held by this variable (i.e., 1, 2, and 3) do not have an intrinsic hierarchy with respect to the entities they represent (i.e., males, females, other). Even though other = 3, female = 2, and male = 1, it doesn’t make sense to say that the category of other is 3 times as much as the category of male. The labels are entirely arbitrary, so you could have coded them differently (other = 1, male = 2, and female = 3) or used other numbers.
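The short Python sketch below illustrates why arithmetic on nominal codes is not meaningful. It uses an invented sample of six participants and the two coding schemes mentioned above: the category counts are the same under either scheme, but the "average" changes with the arbitrary codes.

```python
from collections import Counter
from statistics import mean

# Hypothetical sample of six participants (invented for illustration).
labels = ["male", "female", "female", "other", "male", "female"]

# The two arbitrary coding schemes described in the text.
coding_a = {"male": 1, "female": 2, "other": 3}
coding_b = {"other": 1, "male": 2, "female": 3}

codes_a = [coding_a[label] for label in labels]
codes_b = [coding_b[label] for label in labels]

# Frequencies depend only on the categories, not on the numeric codes.
counts = Counter(labels)
most_common_category = counts.most_common(1)[0][0]

# The "mean" depends entirely on which arbitrary codes were chosen,
# so it tells us nothing about the participants.
mean_a = mean(codes_a)
mean_b = mean(codes_b)

print("Counts:", dict(counts))
print("Most common category:", most_common_category)
print(f"Mean under coding A: {mean_a:.2f}, under coding B: {mean_b:.2f}")
```

Counts and the most common category are therefore the summaries that remain valid for nominal data.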

Ordinal Data

The term ordinal refers to the second level of measurement. This term is used to describe measurements that do not have an intrinsic numerical hierarchy (as defined previously) but do have a distinct order. The importance of the order of ordinal measures is based on a hierarchy established by the researcher. In ordinal data, values represent agreement with subjective anchors. For example, Likert scale response options on questionnaires are often used to measure variables that provide perceptions of constructs such as health, burnout, stress, coping, anxiety, or ratings of emotions. In survey responses, the researcher sets the polarity and order. The values are often discrete, as shown in the following example:

Strongly Disagree 1 --- 2 --- 3 --- 4 --- 5 Strongly Agree

The polarity of these responses (negative to positive) can be reversed (positive to negative), so respondents need to pay close attention to the meaning of the poles selected for the scale.

Strongly Agree 1 --- 2 --- 3 --- 4 --- 5 Strongly Disagree
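When some items on a questionnaire use the reversed polarity, analysts commonly recode them so that every item points in the same direction before scores are combined. The Python sketch below shows one common way to do this on a 1 to 5 scale; the function name and the example ratings are hypothetical.

```python
def reverse_score(rating, low=1, high=5):
    """Flip a rating on a low-to-high scale (on a 1-5 scale, 1 <-> 5, 2 <-> 4)."""
    if not low <= rating <= high:
        raise ValueError(f"rating {rating} is outside the {low}-{high} scale")
    return low + high - rating


# Hypothetical ratings given on a scale where 1 = Strongly Agree.
ratings_agree_first = [1, 2, 3, 4, 5]

# The same answers expressed on a scale where 5 = Strongly Agree.
ratings_agree_last = [reverse_score(r) for r in ratings_agree_first]

print(ratings_agree_last)
```

Note that the middle rating (3) is unchanged and that only the order of the categories is used, which is consistent with treating the responses as ordinal.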

One example that illustrates the subjective nature of ordinal scales is the Rating of Perceived Exertion (RPE) scale (Borg, 1982). This scale is commonly used during exercise to determine participants’ feelings of exercise intensity and was developed by matching heart rate to perceptions of exercise intensity. Originally the scale ranged from 6 (~ heart rate of 60 beats/min) to 20 (~ heart rate of 200 beats/min), with scores of 6 to 8 corresponding with feelings of “very, very light” intensity and scores of 19 to 20 corresponding with feelings of “very, very hard” intensity (Table 2.1).

Table 2.1: The original rating of perceived exertion (RPE) scale (Borg, 1982)

Score	Perceived Exertion
6
7	Very, very light
8
9	Very light
10
11	Fairly light
12
13	Somewhat hard
14
15	Hard
16
17	Very hard
18
19	Very, very hard
20
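The original scale was anchored so that a score of 6 corresponds to roughly 60 beats/min and a score of 20 to roughly 200 beats/min. The Python sketch below assumes the simplest rule consistent with those two endpoints, heart rate of about 10 times the score. This is an illustrative approximation only, not a clinical conversion, and the function name is hypothetical.

```python
# Verbal anchors from the original 6-20 scale (Table 2.1).
# Scores without a verbal anchor in the table are omitted.
RPE_ANCHORS = {
    7: "Very, very light",
    9: "Very light",
    11: "Fairly light",
    13: "Somewhat hard",
    15: "Hard",
    17: "Very hard",
    19: "Very, very hard",
}


def approximate_heart_rate(rpe_score):
    """Rough heart rate (beats/min) implied by an original-scale RPE score.

    Assumption: heart rate is about 10 x the score, which matches the
    endpoints 6 ~ 60 beats/min and 20 ~ 200 beats/min.
    """
    if not 6 <= rpe_score <= 20:
        raise ValueError("the original RPE scale runs from 6 to 20")
    return rpe_score * 10


for score in (6, 13, 20):
    anchor = RPE_ANCHORS.get(score, "(no verbal anchor)")
    print(f"RPE {score}: {anchor}, about {approximate_heart_rate(score)} beats/min")
```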

Unfortunately, a scale from 6 to 20 is not intuitive to most people and is difficult to interpret, especially for those who do not know the typical range of an individual’s heart rate. Eventually, the RPE scale was changed to run from 0 (“nothing at all”) to 10 (“very, very strong”) because this range was more meaningful to people and easier to use (Table 2.2). It is worth noting that the rating of 0.5 was added to convert this ordinal scale to a ratio scale, discussed below.

Table 2.2: The modified RPE scale (Borg, 1982)

Score	Perceived Exertion
0	Nothing at all
0.5	Very, very weak
1	Very weak
2	Weak
3	Moderate
4	Somewhat strong
5	Strong
6
7	Very strong
8
9
10	Very, very strong
.	Maximal

In some applications, a dot is added to the end of the scale so that an individual can provide a rating of their perceived sensation to be over 10. For example, when running on a treadmill to exhaustion, some exercise physiology labs will include the phrase “Saw God!” Regardless, although including an extra indicator at the extreme margin of a scale may seem a bit odd, because it is an ordinal scale the authors are free to assign values to subjective ratings however they wish.

Ordinal-scaled scores can also provide data for ranking, in which the researcher establishes the order of the ranking pattern, usually from highest to lowest or vice versa. An example of ranked ordinal scores is shown in the following table (Table 2.3). Notice that the identification label (ID) assigned to each score is maintained after the original series of scores is ordered by rank (also known as rank-ordered). Notice also that the researcher assigned a rank of “1” to the lowest score, indicating to the reader that a lower score is perceived to be better (a decision of the researcher). It is important to recognize that while these data represent an arbitrary set of scores, data at the ordinal, interval, and ratio levels can all be ranked.

Table 2.3: Original scores and the same scores after ranking from lowest to highest

Original Scores		Scores After Ranking
ID	Score	ID	Score	Rank
Respondent A	77	Respondent C	68	1
Respondent B	76	Respondent D	74	2
Respondent C	68	Respondent B	76	3
Respondent D	74	Respondent A	77	4
Respondent E	78	Respondent E	78	5

In reviewing these data by rank alone, we do not get a sense of the difference in scores between ranks. A rank is not in and of itself an interval-level data point because the gap between ranks is not necessarily consistent or equal.
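The ranking of the five respondents' scores can be reproduced with a short Python sketch. It sorts the scores from lowest to highest, assigns rank 1 to the lowest score as the researcher did in this example, and then computes the gap in score between neighbouring ranks. Tied scores, which do not occur in these data, are not handled.

```python
# Scores of the five respondents in the ranking example.
scores = {
    "Respondent A": 77,
    "Respondent B": 76,
    "Respondent C": 68,
    "Respondent D": 74,
    "Respondent E": 78,
}

# Order from lowest to highest score; the ID stays attached to its score.
ranked = sorted(scores.items(), key=lambda item: item[1])

# Rank 1 goes to the lowest score (the researcher's choice in this example).
ranks = {respondent: rank
         for rank, (respondent, _score) in enumerate(ranked, start=1)}

# Difference in score between each rank and the next one.
gaps = [ranked[i + 1][1] - ranked[i][1] for i in range(len(ranked) - 1)]

for respondent, score in ranked:
    print(ranks[respondent], respondent, score)
print("Score gaps between consecutive ranks:", gaps)
```

Each step in rank is a step of exactly one, but the gaps in score are 6, 2, 1, and 1, which is why ranks alone cannot be treated as interval-level data.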

We often see the ranking of ordinal measures in sporting events, where teams or players are evaluated based on their win-loss record or on other performance indicators such as goal percentage. This way of presenting the information allows an observer to quickly determine the order of the participants (teams or individuals). For example, Table 2.4 below presents the top 10 men’s basketball teams from the 2017 NCAA post-season rankings (ESPN, 2017). Notice that teams are ranked by total points and that the difference in points between ranks is not the same. For example, North Carolina is ranked #1 and has 31 points more than Gonzaga (rank #2), but Gonzaga has 49 points more than the #3 ranked team, Oregon. This is why you cannot treat ranked data the same way as interval or ratio data: ranks simply tell us the relative position of each value within the data set. Ranks do not convey the absolute scores, which in this case are the total points, a ratio-level measure.

Table 2.4: The Top 10 men's basketball teams in the NCAA in 2017 according to post-season rankings

Rank	Team	Record	Points
1	North Carolina	33-7	775
2	Gonzaga	37-2	744
3	Oregon	33-6	695
4	Kansas	31-5	653
5	Kentucky	32-6	627
6	South Carolina	26-11	561
7	Arizona	32-5	548
8	Villanova	32-4	498
9	UCLA	31-5	492
10	Florida	27-9	468
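As a quick check on the point differences discussed above, the Python sketch below takes the points column of Table 2.4 and computes the gap between each team and the next-ranked team. The variable names are illustrative.

```python
# (team, points) pairs in rank order, taken from Table 2.4.
standings = [
    ("North Carolina", 775),
    ("Gonzaga", 744),
    ("Oregon", 695),
    ("Kansas", 653),
    ("Kentucky", 627),
    ("South Carolina", 561),
    ("Arizona", 548),
    ("Villanova", 498),
    ("UCLA", 492),
    ("Florida", 468),
]

# Point gap between each team and the team ranked immediately below it.
point_gaps = [
    (higher[0], lower[0], higher[1] - lower[1])
    for higher, lower in zip(standings, standings[1:])
]

for higher_team, lower_team, gap in point_gaps:
    print(f"{higher_team} leads {lower_team} by {gap} points")

smallest_gap = min(gap for _, _, gap in point_gaps)
largest_gap = max(gap for _, _, gap in point_gaps)
print(f"Smallest gap: {smallest_gap}, largest gap: {largest_gap}")
```

A difference of one rank corresponds to anywhere from 6 to 66 points in these data, so the rank by itself says little about how far apart two teams are.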

Interval Data

The term interval refers to the third level of measurement. Data at the interval level of measurement use a constant unit of measurement (i.e., the distance between numbers on the scale represents equal changes in the item being measured) on an underlying real number line. Therefore, measurements made on an interval scale have a distinct order, and the direction or polarity of that order is established beforehand. However, the “0” value of interval data is arbitrary: it is set by the users of the measurement tool and does not reflect the absence of the characteristic being measured. For example, when measuring temperature in degrees Celsius, 0 represents the freezing point of water. This does not represent the complete absence of heat (absolute zero). For the same reason, 100 degrees is not twice as hot as 50 degrees. Another example of an interval scale is the time of day on a clock. Although this measurement of time is meaningful and helps us stay organized, times of day only have meaning relative to other times on that scale. It is illogical to say that 12 o’clock is twice as much as 6 o’clock, though we can say that a duration of 60 minutes is double a duration of 30 minutes. Again, on this time scale, zero o’clock signifies midnight, not the absence of time.
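The temperature example can be checked numerically. The Python sketch below converts 50 and 100 degrees Celsius to Fahrenheit and to kelvin (a scale whose zero is absolute zero) using the standard conversion formulas. The apparent "ratio" between the two temperatures changes with the scale, while equal differences stay equal; the function names are illustrative.

```python
import math


def celsius_to_fahrenheit(celsius):
    """Standard conversion: F = C * 9/5 + 32."""
    return celsius * 9 / 5 + 32


def celsius_to_kelvin(celsius):
    """Standard conversion: K = C + 273.15 (0 K is absolute zero)."""
    return celsius + 273.15


cooler, warmer = 50.0, 100.0

# The "ratio" of the two readings depends on where each scale puts its zero.
ratio_celsius = warmer / cooler
ratio_fahrenheit = celsius_to_fahrenheit(warmer) / celsius_to_fahrenheit(cooler)
ratio_kelvin = celsius_to_kelvin(warmer) / celsius_to_kelvin(cooler)

print(f"Celsius ratio:    {ratio_celsius:.3f}")
print(f"Fahrenheit ratio: {ratio_fahrenheit:.3f}")
print(f"Kelvin ratio:     {ratio_kelvin:.3f}")

# Equal intervals survive the change of scale: 0 -> 50 and 50 -> 100 degrees
# Celsius are the same size in Fahrenheit as well.
first_step = celsius_to_fahrenheit(50.0) - celsius_to_fahrenheit(0.0)
second_step = celsius_to_fahrenheit(100.0) - celsius_to_fahrenheit(50.0)
print("Equal intervals preserved:", math.isclose(first_step, second_step))
```

Only the kelvin ratio, roughly 1.15, reflects the physical quantity, which is the sense in which 100 degrees Celsius is not twice as hot as 50 degrees Celsius.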

Ratio Data

The term ratio refers to the fourth level of measurement. The ratio level of measurement also has an intrinsic hierarchy and is based on the real number line. Ratio measures have a distinct order, a distinct direction, and a distinct polarity. Yet, the distinguishing characteristic of the ratio level of measurement is the presence of a real 0, which indicates an absolute absence of the item being measured. In health research you will find many examples of ratio-level measures. You will often collect information on height, weight, blood pressure, heart rate, and age, or you may record income, the number of days employed, or the number of times a participant experienced an event. For example, when interpreting age as a ratio measure, 0 is meaningful and may indicate that an individual has just been born (the current date minus the birth date, on the day an infant was born), and someone who is 30 years old is twice as old as someone who is 15 years old. Similarly, if you were studying the number of ear infections children experience between birth and two years of age and a study participant recorded 0 ear infections during the study period, this would be meaningful data. When interpreting your data, a participant who experienced 2 ear infections would have twice as many as someone who recorded only 1, and so forth.
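One way to see what a true zero provides is that ratios stay the same when the unit changes. The Python sketch below uses the ages from the example above, expressed first in years and then in months. The helper function and the ear-infection counts for the two hypothetical children are for illustration only.

```python
MONTHS_PER_YEAR = 12


def times_as_much(a, b):
    """How many times larger a is than b (b must be non-zero)."""
    if b == 0:
        raise ValueError("cannot form a ratio against zero")
    return a / b


# Ages from the example in the text.
age_years = {"older": 30, "younger": 15}
age_months = {person: years * MONTHS_PER_YEAR
              for person, years in age_years.items()}

# The ratio is identical whether age is recorded in years or in months,
# because both units share the same true zero (the moment of birth).
ratio_in_years = times_as_much(age_years["older"], age_years["younger"])
ratio_in_months = times_as_much(age_months["older"], age_months["younger"])
print(ratio_in_years, ratio_in_months)

# Counts behave the same way: 2 ear infections is twice as many as 1,
# and a count of 0 means no infections were recorded at all.
infections = {"child_a": 2, "child_b": 1, "child_c": 0}
infection_ratio = times_as_much(infections["child_a"], infections["child_b"])
print(infection_ratio, infections["child_c"] == 0)
```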

If you are interested in reading more about levels of measurement the following book is an excellent resource: Pedhazur, E. J., & Pedhazur Schmelkin, L. (1991). Measurement and Scientific Inquiry. In Pedhazur & Pedhazur Schmelkin (Eds.), Measurement, Design, and Analysis: An integrated approach, pp. 15-29. New York: Psychology Press.

License

Applied Statistics in Healthcare Research Copyright © 2020 by William J. Montelpare, Ph.D., Emily Read, Ph.D., Teri McComber, Alyson Mahar, Ph.D., and Krista Ritchie, Ph.D. is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, except where otherwise noted.