
Suggested pre-reading	What this web page adds
Introduction to statistics; Choosing statistical analysis	This web page describes what sample size calculation is and what you need to consider. Reading this will give you the ability to do simple sample size calculations yourself.


Introduction to sample size calculation

You first collect data and process them before analysing the numbers. The next steps are calculating descriptive statistics and, if applicable, inferential statistics. Most research projects involve inferential statistics.

If your project only includes descriptive statistics, focus your sample size calculation on that. An example of focusing on descriptive statistics: confirming a 5% prevalence of a condition with a margin of error of 3% (2-8%) would require 377 observations.

However, if your project includes some inferential statistics, focus your sample size calculation on that. In most projects this has higher priority than a sample size calculation for descriptive statistics.

The statistical calculations (= inferential statistics) look at your data and produce results such as p-values, odds ratios, hazard ratios, etc. When a result does not reach statistical significance, is the reason that there is no correlation / no difference between groups, or is the reason that your sample size was too small? To avoid ending up with the latter problem, it is recommended to use dedicated software and do the statistical calculation backwards using some assumptions. It has gradually become more common for ethics committees to require a sample size estimation before approving a project.

Different approaches to sample size estimation

Get a convenient sample and hope it is enough
See how many observations other published projects included and imitate them
Follow a rule of thumb
Make a calculation based on your best assumptions.

Hope is good in many situations, except this one. Imitating others is not good advice either. What if the others did an underpowered study? Why replicate their mistake? There are some rules of thumb, such as:

For group comparisons of means (t-test) have at least 30 in each group.
For group comparisons of proportions (chi-square) have at least 5 in each cell.
For standard linear regressions/correlations have at least 20 observations for each independent variable.
For logistic regression have at least 10 times more events / end points than independent variables.
For Cox regression have at least 10 times more events / end points than independent variables. For example: with four independent predictor variables in the model and an expected proportion of positive cases in the population of 0.30 (30%), the minimum number of cases required would be 133.
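The arithmetic behind the last example can be sketched as follows. This is a minimal illustration only; the function name and its parameters are hypothetical and not taken from any sample size software.

```python
import math


def minimum_sample_size(n_predictors, event_proportion, events_per_variable=10):
    """Return (events needed, observations needed) under the events-per-variable rule."""
    # Rule of thumb: at least 10 events for each independent variable.
    events_needed = events_per_variable * n_predictors
    # Only a fraction of all observations are events, so scale up accordingly.
    observations = events_needed / event_proportion
    return events_needed, observations


events, observations = minimum_sample_size(n_predictors=4, event_proportion=0.30)
print(events)                   # 40
print(round(observations, 1))   # 133.3
print(math.ceil(observations))  # 134 if you round up to be on the safe side
```

Four predictors need 40 events, and 40 events at a 30% event proportion correspond to roughly 133 observations, which is the figure given above.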

However, these rules of thumb are quite rudimentary because they do not consider the magnitude of the effect size or correlation you are looking for. They just give the bare minimum number you should have to avoid violating underlying mathematical assumptions, but they do not consider your particular situation. The best approach to estimating the sample size is to do a proper sample size calculation that considers the situation in your study. This is done by first making four important decisions:

Decide what statistical method is going to be used for the inferential statistics.
Decide what effect size / correlation you are looking for. It is best if this can be estimated using data from previous publications. You have to make a qualified guess if no prior publications exist.
Decide what would be an acceptable safety margin to avoid making a type one error (claiming a statistical finding that is not true). This safety margin is labelled alpha or level of significance and is commonly set to 0.05. This means that you accept a one in twenty chance of making a type one error when there is in fact no effect.
Decide what power your study should have. Power is the complement of the risk of making a type two error (not identifying an effect/correlation that is true), that is, power = 1 - risk of a type two error. The power is often set to something between 0.80 and 0.95, which corresponds to a 5-20% chance of making a type two error.

The rest is quite easy once we have made these four decisions. We enter our decisions into software that does the statistical calculation backwards and states how large a sample we need. Examples of such software are G*Power and PASS. G*Power is free but PASS is quite expensive. G*Power can manage most situations except Cox regression.
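To illustrate what "doing the calculation backwards" means, here is a rough sketch for comparing the means of two equally sized groups. It uses the common normal approximation, which is an assumption of this sketch and not necessarily what G*Power or PASS do internally. The function name is hypothetical, and the effect size is expressed as the difference in means divided by the standard deviation.

```python
import math
from statistics import NormalDist


def n_per_group_two_means(effect_size, alpha=0.05, power=0.80):
    """Approximate observations per group for a two-sided comparison of two means."""
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the chosen alpha
    z_beta = z.inv_cdf(power)           # value matching the chosen power
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)


print(n_per_group_two_means(0.5))              # 63 per group
print(n_per_group_two_means(0.5, power=0.95))  # 104 per group
```

Dedicated software bases the calculation on the t-distribution and will typically ask for one or two more observations per group than this approximation. The sketch still shows how the four decisions (method, effect size, alpha and power) together determine the sample size.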

Different scenarios

Sample size for a single proportion (such as prevalence or incidence)
Sample size for group comparisons
Sample size for regression analysis
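For the first scenario, a single proportion, the calculation is simple enough to sketch by hand. The example below assumes the usual normal approximation for a confidence interval around a proportion. The function name is hypothetical, and the inputs are taken from the prevalence example in the protocol table further down this page.

```python
import math
from statistics import NormalDist


def n_for_proportion(p, margin, confidence=0.95):
    """Observations needed to estimate proportion p within +/- margin."""
    # Two-sided critical value for the chosen confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


# 90% confidence level and a 3% margin of error for three expected prevalences.
for prevalence in (0.075, 0.12, 0.091):
    print(prevalence, n_for_proportion(prevalence, 0.03, confidence=0.90))
# 0.075 209
# 0.12 318
# 0.091 249
```

These results are close to the 210, 320 and 250 quoted in that protocol example. Software that uses exact or corrected intervals can return somewhat different numbers, so treat this as an approximation.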

(This section is under construction)

Examples of sample size calculations

Videos explaining further:

Examples using the software G*Power

Example 1 of sample size calculation for comparing two groups – t-test and Mann-Whitney test

Example 2 of sample size calculation for comparing two groups – t-test and Mann-Whitney test

Example of sample size calculation for unconditional binary logistic regression when the independent variable is binary (such as gender)

Examples using the software PASS

Example of sample size calculation for Cox regression

Sample size calculation in multivariable regression

You may plan for a multivariable regression as your preferred final statistical analysis. There are a few approaches to this situation:

Make one sample size calculation for each independent variable as if you are going to do simple (unadjusted) regressions. You will get one sample size for each independent variable. Pick the one with the highest number as your preferred sample size (and perhaps add a margin of 20% extra). This is the most common strategy and the one used in the videos above.
In case you are only interested in one independent variable and want to add a few more only to adjust for them (as confounding variables), try to estimate the contribution from the covariates (“R square other X” in G*Power) and enter it in G*Power together with the expected information about your main independent variable to calculate the sample size required. Finding the right value for “R square other X” is tricky and might be impossible. Either make a reasonable guess or go with strategy 1 above.
There may be many independent variables in an exploratory study, with none initially more important than another. The simplest solution is to use strategy 1 above. It may be difficult to sort out how the variables may relate in a multivariable model without making a lot of guesses.
Calculating sample size for interaction variables in a regression is tricky for two reasons. Firstly, it is often difficult to find support for the assumptions you need to make so you may be left with some wild guessing. Secondly, you would need more advanced software than G*Power and a statistician who has experience of this advanced calculation (not all statisticians would have that).
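Strategy 1 above boils down to taking the largest of the per-variable sample sizes and optionally adding a margin. A minimal sketch follows, using the four per-variable results from the stress test example in the protocol table further down. The function name and the dictionary are hypothetical helpers for illustration.

```python
import math


def overall_sample_size(per_variable_n, margin_percent=20):
    """Largest per-variable sample size plus an optional safety margin."""
    largest = max(per_variable_n.values())
    return math.ceil(largest * (100 + margin_percent) / 100)


per_variable_n = {"gender": 312, "age": 337, "diabetes": 510, "COPD": 236}
print(overall_sample_size(per_variable_n, margin_percent=0))  # 510
print(overall_sample_size(per_variable_n))                    # 612
```

How large a margin to add is a judgement call. The 20% used here is only the optional figure mentioned in strategy 1.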

Level of significance (alpha) versus p-value

A low p-value says it is unlikely that we would get the observed data if the effect / correlation we are looking for is in reality zero. A low p-value indicates that the null hypothesis can be rejected and that the alternative hypothesis is the more plausible. How low must the p-value be for us to believe that our alternative hypothesis is the most plausible? This should be determined from case to case. Read more about this on the page describing the level of significance or alpha.

Sample size estimation in clustered studies

Observations are often grouped (clustered). A typical example is observations clustered in different primary health care centres (GP clinics) or different hospitals. These clusters add a random variation between clusters that makes your vision slightly blurred. It means that you must increase your sample size to maintain your ability to find what you are looking for. It can be shown that it is better to have many clusters contributing a few observations each than a few clusters contributing many observations each. To estimate this, first calculate the required sample size as if there was no cluster effect. After that, use the calculator below to estimate the effect different cluster designs will have on the required sample size.

The impact of clusters is measured with the intraclass correlation coefficient (ICC). You need to find a suitable assumption for ICC to put in below. The ideal situation is that you find a publication with a study similar to yours stating the ICC. If that is the case, use that. Otherwise make a reasonable guess to estimate ICC. In a hospital setting common values of ICC are 0.02-0.1. In a primary care setting common estimates of ICC are 0.1-0.2, although estimates up to 0.3 may occasionally be seen.
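A common way to quantify the cluster effect is the design effect, 1 + (m - 1) × ICC, where m is the number of observations per cluster. The sketch below assumes this standard approximation with equally sized clusters. It is not necessarily the method used by any particular calculator, and the function names and the starting sample size of 200 are hypothetical.

```python
import math


def design_effect(cluster_size, icc):
    """Inflation factor for equally sized clusters: 1 + (m - 1) * ICC."""
    return 1 + (cluster_size - 1) * icc


def clustered_sample_size(n_unclustered, cluster_size, icc):
    """Return (total observations, number of clusters) after inflating for clustering."""
    # round() guards against floating point noise before rounding up.
    n = math.ceil(round(n_unclustered * design_effect(cluster_size, icc), 6))
    return n, math.ceil(n / cluster_size)


# 200 observations needed without clustering, ICC assumed to be 0.05.
print(clustered_sample_size(200, cluster_size=10, icc=0.05))  # (290, 29)
print(clustered_sample_size(200, cluster_size=50, icc=0.05))  # (690, 14)
```

With the same ICC, 29 clusters of 10 observations need 290 observations in total, while clusters of 50 need 690. This illustrates why many small clusters are preferable to a few large ones.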

Examples of how to write

Below are examples of how to write the sample size section in a study protocol.

Situation	Example of how to present the sample size calculation
An observational study establishing prevalence of symptoms	Assuming a 90% confidence level, a prevalence of 7.5% for confusion, 12% for fatigue and 9.1% for restlessness (reference) requires a sample size of 210, 320 and 250, respectively, to achieve a margin of error of less than 3%. To ensure a suitable sample we aim to include 400 participants.
An observational study exploring covariations with logistic regression (This is about predicting a failed stress test to discover coronary artery disease)	In all sample size calculations the level of significance is set to 0.05, the power to 95% and we are using a two-sided test. All sample size calculations are made using the software G*Power version 3.1.9.2. The sample sizes required for analysing the different independent variables are: a) Gender: Assuming 50% of men and 30% of women complete a test, with an equal distribution between men and women: 312 patients. b) Age: We assume that 50% of patients at the mean age complete the test and that this is reduced by 10% if patients are one standard deviation older: 337 patients. c) Diabetes: We assume that 60% of non-diabetic and 40% of diabetic patients will complete the test. Furthermore we assume 20% of patients will have diabetes: 510 patients. d) COPD: We assume that 60% of non-COPD patients and 20% of patients with COPD will complete the test. We estimate the prevalence of COPD to be 10%: 236 patients. We aim to include a total of 550 patients. This should be achievable given that 40 exercise stress tests are done weekly at the hospital.
An observational study exploring covariations with logistic regression (This is about clarifying the correlation between non-specific symptoms, diabetes and antibiotic treatment in nursing home residents)	To estimate the covariation between presence of a symptom and having diabetes we assume that 3% of non-diabetics and 12% of diabetics have confusion or fatigue or restlessness. With an alpha error of 0.05, a power of 90% and a prevalence of diabetes of 15% this requires 620 participants. To estimate the covariation between being on antibiotics and having diabetes we assume that 1% of non-diabetics and 8% of diabetics are on antibiotics. With an alpha error of 0.05, a power of 90% and a prevalence of diabetes of 15% this requires 602 participants. To ensure a suitable sample we aim to include 850 participants. All sample size calculations are made using the software G*Power version 3.1.9.2.
A long term follow up of a randomized controlled trial. We are doing a long term follow up of a cohort that has been followed for more than a decade. Hence, the sample size is already determined and we aim here to present the likely power this study will have.	According to the main initial objective of the project, sample size was first calculated to detect differences of five points on the urinary irritative-obstructive score of the EPIC questionnaire between treatment groups. For the aim of this manuscript, the statistical power has been calculated assuming an effect size of 0.5 between groups with and without relapse. We further decided to use a two-sided t-test with a level of significance of 5%. The smallest treatment group includes 188 patients (32 biochemical relapse) and the biggest 306 (43 biochemical relapse). The calculation shows that our study may have a statistical power between 73% and 86%, which we find acceptable. Estimation of power was made using the software G*Power version 3.1.9.2.
An observational study exploring covariations with Cox proportional hazards regression	We assume a level of significance of 0.05, a power of 0.95 and a hazard ratio of 1.75 for the variable having diabetes at baseline. Furthermore we assume that 15% of data are censored (we lack information on whether they reached the end point or not). Under these circumstances we would need a total of 198 patients. The sample size calculation was done with the software PASS version 11.0.8 (Hintze, J. (2011). PASS 11. NCSS, LLC. Kaysville, Utah, USA. www.ncss.com.)
A randomized controlled trial analysed with Cox proportional hazards regression	We assume a hazard ratio between intervention and control group of 1.75 (a patient at any given time has a 75% higher probability of being cured at the next time point compared to patients in the other group). Furthermore we assume a power of 0.95, an alpha of 0.05 and that 15% of data are censored (we lack information on whether they got well or not). Under these circumstances we would need 99 patients in each group, in total 198 patients. The sample size calculation was done with the software PASS version 11.0.8 (Hintze, J. (2011). PASS 11. NCSS, LLC. Kaysville, Utah, USA. www.ncss.com.)
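The power figures in the long term follow up example can be approximated by hand. The sketch below assumes a normal approximation to the two-sided t-test and reads the example as comparing patients with and without relapse within each treatment group (32 versus 156 in the smallest group, 43 versus 263 in the biggest). Both the approximation and this reading are assumptions of the sketch, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def power_two_groups(effect_size, n1, n2, alpha=0.05):
    """Approximate power of a two-sided comparison of two means."""
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    # Expected size of the test statistic for the assumed effect size.
    expected_z = effect_size * math.sqrt(n1 * n2 / (n1 + n2))
    return z.cdf(expected_z - z_alpha)


# Smallest treatment group: 188 patients, 32 with biochemical relapse.
print(round(power_two_groups(0.5, 32, 188 - 32), 2))  # about 0.73
# Biggest treatment group: 306 patients, 43 with biochemical relapse.
print(round(power_two_groups(0.5, 43, 306 - 43), 2))  # about 0.86
```

Under these assumptions the approximation lands at roughly 73% and 86%, in line with the range reported in the example. Software that uses the t-distribution may give slightly lower values.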

