Ronny Gunnarsson. Effect size [in Science Network TV]. Available at: http://science-network.tv/effect-size/. Accessed May 27, 2017.
This page will explain the difference between “statistical significance” and “effect size”, the latter also labelled clinical significance in health care. The label “effect size” is most often used when discussing effect of an intervention in study designs comparing groups. However, it is also used in observational studies trying to explore effects.
P-value versus effect size
We often talk about the p-value. P stands for “probability”, but probability of what? When comparing two interventions, the p-value is the probability of obtaining a difference at least as large as the one observed if the true difference in effect between the groups is zero. A low p-value therefore says that the observed data would be unlikely if there were no real difference between treatments. You can read more about this on the page Level of significance. It is important to remember that the p-value only speaks to statistical significance and says nothing about importance (clinical significance in health care).
In large studies we may reach statistical significance even though the difference between groups is so small that it is of no importance (no clinical significance). Hence, we need a measure other than the p-value that shows the importance (clinical significance) of the difference in effect between treatments. This measure is labelled “effect size”. The p-value is always a probability, a single figure between 0 and 1. The effect size is different: it is not one specific figure but a family of different estimates that may all be labelled effect size when used to evaluate the effect of interventions.
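A minimal sketch can make this concrete. The code below (with made-up blood-pressure figures) computes Cohen’s d from summary statistics and the corresponding approximate z statistic, z = d·√(n/2) for two equal groups of size n: the z value, and hence the p-value, shrinks or grows with sample size, while the effect size d stays the same.

```python
import math

def cohens_d(mean1, mean2, sd1, sd2, n1, n2):
    """Cohen's d from summary statistics, using the pooled standard deviation."""
    pooled_sd = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    return (mean1 - mean2) / pooled_sd

# Hypothetical example: a 1 mmHg difference in blood pressure, SD 15 mmHg.
d = cohens_d(141.0, 140.0, 15.0, 15.0, 10_000, 10_000)

# The approximate z statistic grows with n even though d is fixed, so a
# trivially small effect becomes "statistically significant" in a large study.
for n in (50, 500, 10_000):
    z = d * math.sqrt(n / 2)
    print(f"n per group = {n:6d}  d = {d:.3f}  z = {z:.2f}")
```

With n = 10 000 per group the z statistic exceeds the conventional 1.96 cut-off even though d ≈ 0.07, a negligible effect by any rule of thumb.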
Different types of effect size
The appropriate type of effect size depends on the outcome variable and on the scenario: a plain group comparison with no need to adjust for confounding variables, or an analysis of associations, the latter commonly used also for group comparisons where confounding variables must be adjusted for. You may find the web page about Choosing statistical analysis clarifying. The most common types of effect sizes are:
Type of effect size | Effect size |
---|---|
A. Difference in outcome between groups* where the outcome is measured on a continuous scale (such as blood pressure) | Cohen’s d, Glass’ Δ, Hedges’ g, The Drug-Placebo Response Curve |
B. Difference in outcome between groups* where the outcome is binary (yes/no or 0/1) | Relative risk reduction (RRR), Absolute risk reduction (ARR), Number needed to treat (NNT), Number needed to harm (NNH), Non occurrence probability increase (NOPI) |
C. Correlation between group allocation* and an outcome measured on a continuous scale | R, R-squared, Beta coefficient |
D. Correlation between group allocation* and a binary outcome (yes/no or 0/1) | Odds ratio (OR), Relative risk (RR), Hazard ratio (HR) |

*Group allocation is a variable indicating whether participants received one intervention/exposure rather than another, or placebo.
The main difference between types A+B and C+D is that C+D allow adjustment for confounding factors. Hence, an effect size of type A or B should only be used in a randomised controlled trial with no statistically significant difference between groups at baseline. In all other situations it is more relevant to use an effect size of type C or D.
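For type B (binary outcomes) the measures in the table are all derived from the same 2×2 table of events by group. The sketch below computes them with the standard formulas from hypothetical trial counts (the numbers are made up for illustration):

```python
def binary_effect_sizes(events_treat, n_treat, events_ctrl, n_ctrl):
    """Type B / type D effect sizes for a binary outcome, from event counts.

    Standard formulas:
      ARR = control risk - treatment risk
      RRR = ARR / control risk
      NNT = 1 / ARR
      RR  = treatment risk / control risk
      OR  = odds(treatment) / odds(control)
    """
    risk_t = events_treat / n_treat
    risk_c = events_ctrl / n_ctrl
    arr = risk_c - risk_t
    return {
        "ARR": arr,
        "RRR": arr / risk_c,
        "NNT": 1 / arr,
        "RR": risk_t / risk_c,
        "OR": (risk_t / (1 - risk_t)) / (risk_c / (1 - risk_c)),
    }

# Hypothetical figures: 10 of 100 treated vs 20 of 100 controls had the outcome.
print(binary_effect_sizes(10, 100, 20, 100))
```

With these figures the absolute risk reduction is 0.10, so the number needed to treat is 10: ten patients must be treated to prevent one additional outcome.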
Cohen’s d is sometimes referred to as the effect size, which may cause confusion. It is important to note that Cohen’s d is only one of several estimates of effect size and is not always suitable (see table above). Furthermore, it is common to have more than one outcome measure, so you may have to mix different types of effect size estimates. However, it is logical to stay within types A+B or types C+D, depending on whether you need to adjust for covariates.
The p-value is given as a single estimate, while the effect size is given as a point estimate with a confidence interval (usually a 95% confidence interval). The two are tied to each other: as the p-value increases towards 0.05, the 95% confidence interval for the effect size approaches the limit for no effect, and at p = 0.05 it touches that limit. Hence, the p-value and the effect size are two sides of the same coin.
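This correspondence can be sketched for a simple case, a difference in means with a large-sample normal approximation (the numbers below are hypothetical): if the 95% confidence interval excludes 0 (no effect), the two-sided p-value is below 0.05, and vice versa.

```python
import math

def diff_ci95(mean1, mean2, sd1, sd2, n1, n2):
    """95% CI for a difference in means (normal approximation, large samples)."""
    diff = mean1 - mean2
    se = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    return diff - 1.96 * se, diff + 1.96 * se

lo, hi = diff_ci95(10.0, 8.0, 5.0, 5.0, 100, 100)

# The CI carries the same significance information as the p-value:
# excluding 0 here corresponds to a two-sided p below 0.05.
print(f"95% CI: ({lo:.2f}, {hi:.2f})  excludes no effect: {lo > 0 or hi < 0}")
```

The advantage of reporting the interval rather than only the p-value is that the reader also sees how large the effect plausibly is, which is exactly the clinical-significance question.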
Magnitude of effect size
The table below gives rough rules of thumb that should be adjusted to your particular context.
 | No effect | Very small | Small | Medium | Large | Very large | Huge |
---|---|---|---|---|---|---|---|
Cohen’s d* | <0.01 | 0.01-0.19 | 0.20-0.49 | 0.50-0.79 | 0.80-1.1 | 1.2-1.9 | 2.0- |
R* | 0.0 | | 0.10-0.29 | 0.30-0.49 | 0.50- | | |
R squared | 0.0 | | 0.010-0.089 | 0.090-0.24 | 0.25- | | |
Odds ratio** | 1.0 | | 1.5-3.5 or 0.29-0.67 | 3.5-4.9 or 0.20-0.29 | 5.0- or <0.20 | | |
Hazard ratio** | 1.0 | | | | | | |

*Irrespective of whether the effect size is positive or negative.
**Near 1.0 means no effect. The further it moves away from 1.0 (upwards or downwards), the larger the effect.
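The Cohen’s d row of the table can be written directly as a small helper, which makes the cut-offs explicit (these are the rules of thumb above, not fixed standards):

```python
def classify_cohens_d(d):
    """Rule-of-thumb magnitude label for Cohen's d, per the table's cut-offs."""
    m = abs(d)  # the sign only indicates direction, not magnitude
    if m < 0.01:
        return "no effect"
    if m < 0.20:
        return "very small"
    if m < 0.50:
        return "small"
    if m < 0.80:
        return "medium"
    if m < 1.20:
        return "large"
    if m < 2.00:
        return "very large"
    return "huge"

print(classify_cohens_d(0.65))   # prints "medium" under these rules of thumb
print(classify_cohens_d(-0.65))  # direction is ignored, so also "medium"
```

Remember the table’s caveat: in some clinical contexts even a “small” d may matter, and in others a “large” d may not, so the labels are a starting point rather than a verdict.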
Cohen’s d
(Under construction)
Hedges’ g
(Under construction)
(This page is still under construction)
Useful links