Ronny Gunnarsson. Level of Significance [in Science Network TV]. Available at: http://science-network.tv/level-of-significance/. Accessed November 19, 2018.

## The difference between level of significance (alpha) and the p-value

A low p-value says it is unlikely that we would get observations like ours (or more extreme ones) if the effect or correlation we are looking for is in reality zero. A low p-value indicates that the null hypothesis can be rejected and that the alternative hypothesis is the most likely. How low must the p-value be for us to believe that our alternative hypothesis is the most plausible? This should be decided from case to case, and the limit is called the level of significance, or alpha.

We use inferential statistics to calculate a p-value. The next step is to compare the calculated p-value (the probability of getting observations at least as extreme as ours if the null hypothesis were true) with the predetermined level of significance (alpha). If the p-value is below alpha, we can reject the null hypothesis and consider the alternative hypothesis to be the most likely. If the p-value is higher, the null hypothesis cannot be rejected: the result does not contradict the null hypothesis being true.
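The decision rule can be sketched in a few lines of Python. This is a minimal sketch, assuming the test statistic is a z statistic that follows a standard normal distribution under the null hypothesis (other tests, such as the t-test or chi-square test, use other distributions). The helper names `two_sided_p_from_z` and `report` are hypothetical. Note that `ALPHA` is a constant chosen before the analysis, while the p-value is computed from the data.

```python
import math

# Chosen BEFORE looking at the data: a decision, not a calculation.
ALPHA = 0.05


def two_sided_p_from_z(z: float) -> float:
    """Two-sided p-value for a z statistic, assuming z ~ N(0, 1) under the null hypothesis."""
    # P(|Z| >= |z|) = erfc(|z| / sqrt(2)) for a standard normal Z
    return math.erfc(abs(z) / math.sqrt(2))


def report(p: float, alpha: float = ALPHA) -> str:
    """Report the exact p-value together with the decision against the pre-set alpha."""
    decision = "reject H0" if p < alpha else "cannot reject H0"
    return f"p = {p:.3f} ({decision} at alpha = {alpha})"


print(report(two_sided_p_from_z(2.0)))  # p = 0.046 (reject H0 at alpha = 0.05)
print(report(two_sided_p_from_z(1.9)))  # p = 0.057 (cannot reject H0 at alpha = 0.05)
```

Returning the exact p-value alongside the decision, rather than a bare "significant / not significant", keeps near-threshold results like these two visibly close to each other.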

By tradition, alpha is most often set to 0.05. This means that, if the null hypothesis is in fact true, we accept a 5% risk of wrongly claiming a statistical finding (a Type I error). Thus, a p-value of 0.045 might be classified as statistically significant while a p-value of 0.055 might not. However, it is important to keep in mind that the limit of 0.05 is not black and white: findings of p = 0.045 and p = 0.055 are very similar. P-values should be presented as they are, rather than just stating whether a finding was statistically significant or not.
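Alpha can be read as a long-run false-positive rate: if there is no real effect and a study is repeated many times, about a fraction alpha of the p-values will still fall below alpha. The simulation below illustrates this. It is a sketch under stated assumptions: each simulated study yields a z statistic drawn from a standard normal distribution (that is, the null hypothesis is true by construction), and `type_i_error_rate` is a hypothetical helper. The numbers are simulated, not data from any study.

```python
import math
import random

ALPHA = 0.05


def two_sided_p_from_z(z: float) -> float:
    # Two-sided p-value, assuming z is standard normal under the null hypothesis
    return math.erfc(abs(z) / math.sqrt(2))


def type_i_error_rate(n_studies: int, alpha: float = ALPHA, seed: int = 1) -> float:
    """Fraction of simulated studies declared significant when the null hypothesis is TRUE."""
    rng = random.Random(seed)
    # Every z is drawn from N(0, 1): there is no real effect in any simulated study.
    false_positives = sum(
        two_sided_p_from_z(rng.gauss(0.0, 1.0)) < alpha for _ in range(n_studies)
    )
    return false_positives / n_studies


rate = type_i_error_rate(20_000)
print(f"Share of 'significant' results with no real effect: {rate:.3f}")  # close to 0.05
```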

In summary, the level of significance (alpha) is a fixed limit that is determined in advance. It does not depend on our observations and is not calculated; it is a decision based on how large a risk of making a Type I error we are willing to accept. The p-value, however, is calculated and depends on our observations.

## The level of significance and pure chance

Assume we want to know which variables differ between two groups: those who have experienced an illness and those who have not (or those who answer yes to a question and those who answer no). Also assume that we want to investigate 50 different variables, some categorical and others continuous. Using a chi-square test or t-test to compare the two groups would result in 50 p-values, some below and some above 0.05. However, a p-value below 0.05 can occur by chance without representing a real difference between the groups. When there is no real difference, on average one in every 20 p-values calculated can be expected to fall below 0.05 by pure chance.
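The arithmetic behind this can be sketched as follows. With m tests and no real differences, the expected number of p-values below alpha is m × alpha, and the probability of getting at least one is 1 − (1 − alpha)^m. The second formula assumes the tests are independent, which variables measured on the same people often are not, so treat it as an approximation. The function names are hypothetical.

```python
def expected_false_positives(m: int, alpha: float = 0.05) -> float:
    """Expected number of p-values below alpha among m tests when every null hypothesis is true."""
    return m * alpha


def prob_at_least_one(m: int, alpha: float = 0.05) -> float:
    """Probability of at least one p-value below alpha among m tests,
    ASSUMING the tests are independent and every null hypothesis is true."""
    return 1 - (1 - alpha) ** m


for m in (1, 20, 50):
    print(m, expected_false_positives(m), round(prob_at_least_one(m), 3))
```

For the 50-variable example this gives an expected 2.5 chance findings and roughly a 92% probability of at least one, even if the two groups do not differ at all.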

## Deciding the level of significance

To compensate for the possibility of getting statistical significance by pure chance, we need to lower the limit at which we consider a statistical finding significant as soon as we present more than one p-value. There are multiple ways of doing this:

- Bonferroni’s adjustment: This simple method divides the desired level of significance (often 0.05) by the number of p-values calculated. In the example with 50 p-values, a Bonferroni adjustment means that only p-values below 0.001 (0.05 / 50) should be considered statistically significant. The Bonferroni adjustment works well for a few p-values (fewer than 10). If you have more p-values, use another method of adjustment.
- Scheffé's method: adjusts for multiple group comparisons after a one-way ANOVA (explained in a video by Todd Grande).
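Bonferroni's adjustment is simple enough to sketch directly. This is a minimal illustration; `bonferroni_threshold` and `bonferroni_significant` are hypothetical helper names, and the p-values in the usage line are made up for illustration.

```python
def bonferroni_threshold(alpha: float, m: int) -> float:
    """Bonferroni-adjusted significance level: the desired alpha divided by the number of p-values."""
    if m < 1:
        raise ValueError("m must be at least 1")
    return alpha / m


def bonferroni_significant(p_values: list[float], alpha: float = 0.05) -> list[bool]:
    """Flag each p-value as significant only if it is below the Bonferroni-adjusted level."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p < threshold for p in p_values]


# With 50 p-values the adjusted level is 0.05 / 50 = 0.001.
print(bonferroni_threshold(0.05, 50))

# Three hypothetical p-values: the adjusted level is 0.05 / 3, about 0.0167.
print(bonferroni_significant([0.0004, 0.003, 0.02]))
```

For larger numbers of p-values, libraries such as statsmodels offer other adjustment methods; check their documentation for the options available.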

## In what context do I need to consider adjusting the level of significance?

The level of significance needs to be lowered if you present multiple p-values. But do I need to consider all p-values presented in a single table, all p-values presented in one manuscript, or all p-values I have ever calculated in my life? If the latter were correct, all experienced statisticians would be out of work because of the hefty adjustments of the level of significance they would need.

It would be reasonable to adjust for all p-values in a manuscript that are presented as results. P-values are sometimes calculated in unadjusted regressions purely as a sorting mechanism to decide which variables to include in a multivariable regression, and these unadjusted p-values should not be considered results. A reasonable suggestion might be to adjust the level of significance only for primary outcomes, a view presented by Steve Grambow in a video on this topic.
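The idea of adjusting only for primary outcomes can be sketched as follows. This is a hypothetical illustration: the `PValue` record, the role labels `"primary"` and `"screening"`, and all p-values are made up, and Bonferroni is used as the adjustment simply because it is the method described above.

```python
from dataclasses import dataclass


@dataclass
class PValue:
    outcome: str
    p: float
    role: str  # hypothetical labels: "primary" or "screening"


def adjusted_alpha_for_primary(results: list[PValue], alpha: float = 0.05) -> float:
    """Bonferroni-adjust alpha over the primary-outcome p-values only.

    Screening p-values (e.g. from unadjusted regressions used to pick variables)
    are not counted, because they are not presented as results.
    """
    n_primary = sum(1 for r in results if r.role == "primary")
    return alpha / max(n_primary, 1)


results = [
    PValue("outcome A", 0.012, "primary"),
    PValue("outcome B", 0.030, "primary"),
    PValue("candidate predictor X", 0.20, "screening"),
    PValue("candidate predictor Y", 0.004, "screening"),
]
level = adjusted_alpha_for_primary(results)  # 0.05 / 2 primary outcomes = 0.025
for r in results:
    if r.role == "primary":
        print(r.outcome, r.p, "significant" if r.p < level else "not significant")
```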

## Many or few p-values?

A consequence of including many p-values in the primary result is that very few or none of them would be considered statistically significant. A better approach is to carefully decide which p-values are absolutely necessary and calculate only those. If you have many variables, a better strategy is to refrain from simple group comparisons and instead analyse associations. This is well described on the page "Choosing statistical analysis".
