Observations and Variables

Pile of observations

In statistics we try to make sense of a clutter of observations. Sometimes the observations we obtain are messy; at other times they are collected in a predefined, ordered pattern. In most situations we arrange the observations in a spreadsheet with rows and columns. Each row is an observation and each column is a variable.
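The spreadsheet layout described above can be sketched in a few lines of Python. This is only an illustration: the variable names (`age_years`, `blood_group`) and the values are made up, and the helper `column` is a hypothetical name, not part of any library.

```python
# Hypothetical data set: one dict per observation (row),
# one key per variable (column).
observations = [
    {"id": 1, "age_years": 34, "blood_group": "A"},
    {"id": 2, "age_years": 51, "blood_group": "O"},
    {"id": 3, "age_years": 27, "blood_group": "B"},
]


def column(rows, variable):
    """Return all values of one variable, i.e. one spreadsheet column."""
    return [row[variable] for row in rows]


ages = column(observations, "age_years")
blood_groups = column(observations, "blood_group")
```

Keeping one row per observation makes it easy to ask the questions below separately for each variable.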

Important questions

  • Is each observation independent? If not, are the observations clustered/nested, and do we have information about that?
  • For each variable we need to ask what scale would be appropriate to use.
  • For nominal variables we also want to know if they are dichotomous or not. For variables measured by an interval or ratio scale we also want to know what distribution they follow.

The answers to these questions are essential for deciding which descriptive and inferential statistics are appropriate to use.

Different scales of measure (levels of measurement)

The major divider is whether a variable is quantitative or qualitative (Table 1). However, the labels quantitative and qualitative have gradually become less used because they are so easily mixed up with the labels for quantitative and qualitative approaches / methods.

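Table 1. Properties of the four scales of measure.

| Properties | Nominal scale | Ordinal scale | Interval scale | Ratio scale |
|---|---|---|---|---|
| Type of variable | Qualitative (= categorical) | Qualitative (= categorical) | Quantitative | Quantitative |
| Categories | X | X | (X) | (X) |
| Order | | X | X | X |
| Equidistant steps | | | X | X |
| Zero point (enables creating a ratio) | | | | X |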
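The properties in the table above can be written down as a small lookup, which makes it easy to check what a given scale allows. This is a minimal sketch, not an established API: the names `SCALE_PROPERTIES`, `is_quantitative` and `supports_ratios` are assumptions made for illustration.

```python
# Properties of each scale, following the table above.
# The table marks "categories" as (X) for interval and ratio scales,
# i.e. such variables can be treated as categories but usually are not.
SCALE_PROPERTIES = {
    "nominal": {"categories"},
    "ordinal": {"categories", "order"},
    "interval": {"categories", "order", "equidistant_steps"},
    "ratio": {"categories", "order", "equidistant_steps", "zero_point"},
}


def is_quantitative(scale):
    """Quantitative variables have equidistant scale steps."""
    return "equidistant_steps" in SCALE_PROPERTIES[scale]


def supports_ratios(scale):
    """A ratio is only meaningful when the scale has a zero point."""
    return "zero_point" in SCALE_PROPERTIES[scale]
```

For example, `is_quantitative("ordinal")` is false because an ordinal scale has order but no equidistant steps.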

Variables measured with an interval or ratio scale

Two features are characteristic of quantitative variables: order between the scale steps and equidistant (equal) scale steps (increments). Imagine that you want to measure the number of children in all families living in a residential area. One family has four children and another has two. There is an order, in that four is more than two. Since each scale step, one child, is of equal size, you can also say that four children are exactly twice as many as two. When the scale steps are equal we say they are equidistant.

There are two main groups of variables measured with an interval or ratio scale: discrete variables and continuous variables. Continuous variables can assume any value, for example blood pressure or blood glucose levels. Variables that can only take on certain values, such as integers, are labelled discrete variables. Examples of discrete variables are the number of children in a family or the number of visits per year. If a discrete variable has many possible values, it makes sense to treat it as a continuous variable. In practice, if the variable is quantitative, i.e., has an order and equidistant scale steps, we rarely care whether it is discrete or continuous.

The defining feature of a ratio scale is that it has a zero point. An example of a discrete variable measured with a ratio scale is the number of visits to a medical center. Ratios, such as the number of visits per year, can be calculated if there is a zero point. Another example is comparing the number of children at two day care centers: the number can be twice as high in one center as in the other. Temperature on the Kelvin scale is an example of a continuous variable that can also be described by a ratio scale. Since 0 K is a true zero point (the temperature cannot go below it), one can say that 20 K is exactly twice as much as 10 K. Temperature on the Celsius scale is an example of a continuous variable that cannot be described using a ratio scale. You can say that +20°C is more than +10°C. However, because 0°C is not a true zero point and the temperature can go below it, one can't say that +20°C is twice as much as +10°C.
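The temperature example can be checked with a short calculation. The sketch below converts the two Celsius readings to Kelvin, where a ratio is meaningful; the function name `celsius_to_kelvin` is chosen here for illustration.

```python
def celsius_to_kelvin(celsius):
    """Convert a Celsius temperature to Kelvin (0 K is absolute zero)."""
    return celsius + 273.15


# Dividing the Celsius numbers gives 2.0, but this ratio is not
# meaningful because 0 degrees Celsius is not a true zero point.
ratio_of_celsius_numbers = 20 / 10

# On the Kelvin scale the same two temperatures differ by only a few percent.
ratio_in_kelvin = celsius_to_kelvin(20) / celsius_to_kelvin(10)
```

Here `ratio_in_kelvin` is about 1.035, so +20°C is roughly 3.5% warmer than +10°C in absolute terms, not twice as warm.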

Variables measured with an ordinal scale

The typical feature is that there is an order but the scale steps are not equidistant. An example is measuring pain using a visual analogue scale (VAS). Individuals are asked to describe their current pain experience by putting an X on a 100 mm long line. The reading is obtained by measuring the distance from the start of the scale (left) to the X. You can say that 40 mm on the pain scale is more pain than 20 mm. In general, values to the right on the scale mean more pain than values to the left. 40 mm is exactly twice as far from the start as 20 mm. However, 40 mm does not represent exactly twice as much pain as 20 mm. Although there is an order, the scale steps are not equidistant.

Variables measured by a nominal scale

Variables measured by a nominal scale are unordered categorical variables and, as the name says, there is no order between the categories. Blood group is a classic example. You cannot say that blood group A is more or better than blood group B. They are simply different blood groups. Another classic example of an unordered categorical variable is gender. The latter is a dichotomous, or binary, version of the nominal scale.
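For a nominal variable the natural summary is how often each category occurs. The following sketch uses made-up blood group data; the helper names `proportions` and `is_dichotomous` are assumptions for illustration.

```python
from collections import Counter


def proportions(values):
    """Return the proportion of observations in each category."""
    counts = Counter(values)
    total = len(values)
    return {category: count / total for category, count in counts.items()}


def is_dichotomous(values):
    """A nominal variable is dichotomous if exactly two categories occur."""
    return len(set(values)) == 2


# Hypothetical observations of blood group.
blood_groups = ["A", "O", "B", "A", "O", "O", "AB", "A"]
blood_group_proportions = proportions(blood_groups)
```

Note that the code never sorts or averages the categories, because a nominal scale has no order and no scale steps.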

Different scales of measure and descriptive statistics

Addition and subtraction are meaningless if the scale of measure does not have equidistant scale steps. This means that you can't calculate a sum score or an average, or do a subtraction to work out change. The latter causes problems if you want to look at changes over time as measured by, for example, a VAS (Visual Analogue Scale). Thus, you should not take the final value minus the initial value to calculate change when the variable is measured using an ordinal scale.

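| Scale and distribution | Central tendency: Mean | Central tendency: Median | Central tendency: Proportion | Dispersion: Standard deviation | Dispersion: Interquartile range | Dispersion: Confidence interval |
|---|---|---|---|---|---|---|
| Interval or ratio scale, normal distribution | X | | | X | | |
| Interval or ratio scale, skewed distribution | | X | | | X | |
| Ordinal scale | | X | | | X | |
| Nominal scale | | | X | | | X |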
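One reading of the table above is: mean and standard deviation for normally distributed interval or ratio data, median and interquartile range for skewed or ordinal data, and proportions for nominal data. The sketch below follows that reading using only Python's standard library. The function name `describe` and its parameters are assumptions, and the confidence interval for a proportion is left out to keep the example short.

```python
import statistics
from collections import Counter


def describe(values, scale, normally_distributed=False):
    """Pick descriptive statistics based on scale and distribution."""
    if scale == "nominal":
        total = len(values)
        counts = Counter(values)
        return {"proportions": {k: c / total for k, c in counts.items()}}
    if scale in ("interval", "ratio") and normally_distributed:
        return {
            "mean": statistics.mean(values),
            "standard_deviation": statistics.stdev(values),
        }
    if scale in ("ordinal", "interval", "ratio"):
        # quantiles(n=4) returns the three quartile cut points.
        q1, _, q3 = statistics.quantiles(values, n=4)
        return {
            "median": statistics.median(values),
            "interquartile_range": (q1, q3),
        }
    raise ValueError(f"unknown scale: {scale!r}")
```

A call such as `describe(vas_readings, "ordinal")`, where `vas_readings` is a hypothetical list of VAS values in mm, returns the median and quartiles and never a mean, in line with the point that averages are not meaningful without equidistant scale steps.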

Choice of inferential statistical tests partly depends on type of variable

Parametric tests are inferential statistical tests that assume the observations are measured with an interval or ratio scale and are normally distributed. Non parametric tests do not require the observations to be measured with an interval or ratio scale, and the observations do not need to be normally distributed. Parametric tests are often (but not always) more sensitive, meaning that they have a slightly greater chance of detecting an effect that actually exists. However, the difference in sensitivity between parametric and non parametric testing is often surprisingly small.

| Scale and distribution | Parametric tests | Non parametric tests |
|---|---|---|
| Interval or ratio scale, normal distribution | X | (X) |
| Interval or ratio scale, skewed distribution | | X |
| Ordinal scale, normal distribution | ((X)) | X |
| Ordinal scale, skewed distribution | | X |
| Nominal scale | | X |
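The primary choice in the table above can be expressed as a small decision function. This is a simplified sketch: the name `choose_test_family` is an assumption, and the bracketed alternatives in the table, such as parametric tests on normally distributed ordinal data, are deliberately ignored.

```python
def choose_test_family(scale, normally_distributed):
    """Return the primary family of tests suggested by the table above."""
    if scale in ("interval", "ratio"):
        # Parametric tests need equidistant steps and a normal distribution.
        return "parametric" if normally_distributed else "non parametric"
    if scale in ("ordinal", "nominal"):
        # Without equidistant scale steps, use a non parametric test.
        return "non parametric"
    raise ValueError(f"unknown scale: {scale!r}")
```

For example, `choose_test_family("ordinal", True)` still returns `"non parametric"`, which is the choice with theoretical support.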

Some statisticians would apply parametric tests to large sets of observations that are measured with an ordinal scale and are normally distributed. Some consider this a practical approach. However, it lacks theoretical support, since parametric testing requires equidistant scale steps. It is never wrong to use a non parametric test if you are unsure.

You should cite this article if you use its information in other circumstances. An example of citing this article is:
Ronny Gunnarsson. Observations and Variables [in Science Network TV]. Available at: http://science-network.tv/observations-and-variables/. Accessed June 23, 2017.
