# Study design (in studies using statistics)

Ronny Gunnarsson. Study design (in studies using statistics) [in Science Network TV]. Available at: https://science-network.tv/study-design/. Accessed July 16, 2024.

Start thinking about study design early, before submitting an application to the ethics committee. Reading this page will give you a bird’s-eye view of the different options and their pros and cons. Pilot studies are a separate topic, dealt with on the page Pilot studies (feasibility studies).

# A brief overview of research design

## Observational versus experimental studies

Studies are split into observational and experimental. The difference is that in observational studies there is no attempt to actively tamper with reality. Effects of different interventions may be roughly estimated in observational studies that collect observations about previous interventions deployed outside the project. These observational studies are often retrospective reviews of existing data sets such as patient charts. As soon as we actively introduce some kind of tampering with reality, labelled an intervention, we have an experimental study design.

## N-Factor design of experimental studies

For the experimental studies we often talk about:

• Zero factor design: No variables are used for group allocation. This means that a single group is either compared to a fixed predefined value or assessed with a more or less advanced pre-post comparison.
• One factor design: It is considered a one factor design if one variable is used to allocate observations to separate groups. A common example is if two or more independent groups are compared. One factor design is the most common design in group comparisons. Seferiadis’s study about basic body awareness therapy to patients with chronic whiplash associated disorders is an example of a one factor design with two groups.
• Two factor design: Two factors (such as type of treatment and timing) are used for group allocation. If each factor has two categories we get a two factor design with four separate groups. It would still be a two factor design if each factor had three categories, but we would then have nine groups (which is more complicated). Rosenfeld’s study about management of patients exposed to a whiplash trauma is an example of a two factor design with four groups.
• N-factor design: N stands for any number of factors, such as 0, 1 or 2 (as mentioned above) or more. Hence, in theory you can use a three, four or five factor design, and so on. Studies using that many factors / variables for group allocation are rare and very complicated to implement in reality.
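The group counts mentioned in the list (four groups for a 2 x 2 design, nine for a 3 x 3 design) follow from multiplying the number of categories in each factor. A minimal sketch of that arithmetic is shown here; the factor names and levels are made up for illustration only.

```python
from itertools import product

def allocation_groups(factors):
    """Return every combination of factor levels, one tuple per study group."""
    return list(product(*factors.values()))

# Hypothetical factors and levels, chosen only to illustrate the counting.
two_by_two = {"treatment": ["intervention", "control"], "timing": ["early", "late"]}
three_by_three = {"treatment": ["A", "B", "C"], "timing": ["early", "mid", "late"]}

groups_2x2 = allocation_groups(two_by_two)
groups_3x3 = allocation_groups(three_by_three)
groups_zero = allocation_groups({})  # zero factor design: one single group

print(len(groups_2x2))   # number of groups in a 2 x 2 two factor design
print(len(groups_3x3))   # number of groups in a 3 x 3 two factor design
print(len(groups_zero))  # number of groups when no factor is used
```

Adding a third factor multiplies the number of groups again, which is one reason why designs with many factors quickly become hard to run in practice.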

## The bird’s-eye view

1. Observational studies = Non-experimental studies
    1. Prospective studies (the observations do not yet exist and need to be collected – data on an individual level can be collected)
        1. Longitudinal studies = Cohort studies (One or several groups are followed to see what happens. An example would be to follow one group of non-smokers and compare their risk for lung cancer with another cohort of smokers.)
        2. Cross-sectional studies
    2. Retrospective studies (observations on an individual level already exist in databases, charts or archives and just need to be compiled – data on an individual level can be collected)
        1. Longitudinal studies
            1. Case-control studies (Smokers and non-smokers are followed in a cohort study to see if there later is a difference in risk for lung cancer. In a case-control study it is the other way around. The proportion of smokers is compared between patients with diagnosed lung cancer and a group of healthy individuals. Hence you start with the disease and look for exposure.)
            2. Historic cohort studies (You have a database or register enabling you to identify people who in the past were smokers and non-smokers. You compare this with other existing data to see if there is a difference in risk for lung cancer. Hence, a historic cohort study is similar to a cohort study, with the difference that in a historic cohort study all data already reside somewhere.)
        2. Cross-sectional studies
    3. Ecological studies (data on an individual level do not exist, only aggregated data for large groups of individuals)
        1. Geographical (comparing health and/or exposure between geographical areas)
        2. Longitudinal (assessing changes in health and/or other confounding factors over time in one population)
        3. Migration (focusing on health and/or exposure in different population types by studying migrant populations)
2. Experimental studies (are always prospective and longitudinal – data on an individual level can be collected)
    1. Interrupted time series (All individuals / groups get the intervention) = zero factor design
        1. Single baseline design = Single Case Research Experimental Design – SCRED (A baseline period, labelled A, is followed by a period of intervention labelled B. This sequence can be repeated once or several times.)
            1. AB design
            2. ABA design
            3. ABAB design
        2. Multiple baseline design (the intervention is introduced in several individuals or groups with some delay between individuals / groups. Allocation to time for intervention is sometimes done using randomization.)
            1. Multiple baseline design across cases (the intervention is introduced at different times for different individuals or groups of individuals). This design is also labelled stepped wedge design.
            2. Multiple baseline design within a case (two or more phenomena are measured and the interventions for these phenomena are introduced at different times within an individual or group of individuals)
    2. Controlled Trial (Group comparisons but without randomization)
        1. One factor design (only one factor used for group allocation)
            1. Unmatched groups (most common with only two groups)
            2. Matched pairs design
            3. Cross-over design
        2. N-factor design (Here it means two or more factors used for group allocation)
            1. Unmatched N-factor design
            2. Matched N-factor design = Block trial
            3. Latin square (cross-over for an N-factor design)
    3. Randomized Controlled Trial – RCT (Group comparisons using random allocation to groups)
        1. One factor design (only one factor used for group allocation)
            1. Unmatched groups (most common with only two groups)
            2. Matched pairs design
            3. Cross-over design
        2. N-factor design (Here it means two or more factors used for group allocation)
            1. Unmatched N-factor design
            2. Matched N-factor design = Block trial
            3. Latin square (cross-over for an N-factor design)

In a cross-sectional study an individual is observed only once, in contrast to longitudinal studies where the same individual is observed (measured) more than once, with a short or long time period in between. A non-randomized controlled clinical trial done prospectively should be labelled a controlled clinical trial (CCT). If it is done retrospectively it would be logical to label it a historic cohort study.

# The most common types

## Observational studies

There is no attempt to tamper with the reality in observational studies (no intervention). There are different types of study design within observational studies (see brief overview above).

### Case studies / case series

One or a few patients are described without using any inferential statistics. This is an observational study that most often is retrospective in its nature.

### Cohort studies

Following a group of individuals over a period of time (often a long period) to see how disease develops is labelled a cohort study. It is common to follow several different groups (called cohorts) to see if there is any difference between them, for example smokers compared with non-smokers.

### Case-Control studies

In case-control studies a group of individuals with a particular disease, such as lung cancer, is compared with a control group that does not have the disease, with respect to their exposure to something, such as smoking. Case-control studies can be matched or unmatched. A cohort study usually provides more reliable conclusions than a case-control study.
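The comparison described above can be sketched with a few lines of arithmetic. The counts are invented for illustration, and the odds ratio is added here as one commonly used summary of a case-control comparison; it is an assumption of this sketch rather than something prescribed on this page.

```python
def exposure_summary(cases_exposed, cases_unexposed, controls_exposed, controls_unexposed):
    """Compare exposure between cases and controls in an unmatched case-control study."""
    p_cases = cases_exposed / (cases_exposed + cases_unexposed)
    p_controls = controls_exposed / (controls_exposed + controls_unexposed)
    # Odds of exposure among cases divided by odds of exposure among controls.
    odds_ratio = (cases_exposed * controls_unexposed) / (cases_unexposed * controls_exposed)
    return p_cases, p_controls, odds_ratio

# Hypothetical counts: 80 of 100 lung cancer cases smoked, 40 of 100 controls smoked.
p_cases, p_controls, odds_ratio = exposure_summary(80, 20, 40, 60)

print(f"Proportion of smokers among cases:    {p_cases:.2f}")
print(f"Proportion of smokers among controls: {p_controls:.2f}")
print(f"Odds ratio: {odds_ratio:.1f}")
```

Note that the starting point is the disease: the groups are defined by outcome, and exposure is what gets compared.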

### Historic cohort studies

The historic cohort study is similar to the case-control study in that both are retrospective, but it starts from a group of individuals with a particular feature, such as being a smoker, and follows them to see what happened to them. What percentage of them developed lung cancer? Case-control studies start from a group of individuals with a particular disease (or other outcome) and look for an association with different exposures. A historic cohort study does the opposite and starts from individuals exposed to a risk factor. The most common scenario where the historic cohort design is used is the retrospective chart review. This design has a few pitfalls, the most dangerous of which is probably Simpson’s statistical paradox.
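Simpson’s paradox is easier to grasp with numbers. The sketch below uses invented chart review counts (not taken from any real study) in which treatment A looks better within each severity group, yet treatment B looks better when the groups are pooled, simply because A was mostly given to the severe cases.

```python
# Hypothetical chart review counts: (recovered, total) per treatment and severity.
counts = {
    "A": {"mild": (18, 20), "severe": (48, 80)},
    "B": {"mild": (64, 80), "severe": (10, 20)},
}

def recovery_rate(recovered, total):
    """Proportion recovered in one stratum."""
    return recovered / total

def overall_rate(strata):
    """Proportion recovered when all strata are pooled."""
    recovered = sum(r for r, _ in strata.values())
    total = sum(t for _, t in strata.values())
    return recovered / total

for severity in ("mild", "severe"):
    rate_a = recovery_rate(*counts["A"][severity])
    rate_b = recovery_rate(*counts["B"][severity])
    print(f"{severity}: A = {rate_a:.2f}, B = {rate_b:.2f}")

overall_a = overall_rate(counts["A"])
overall_b = overall_rate(counts["B"])
print(f"pooled: A = {overall_a:.2f}, B = {overall_b:.2f}")
```

In a retrospective chart review nobody randomized who got which treatment, so this kind of imbalance between groups is a real risk.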

### Ecological studies

In some situations there are no individual data, only aggregated data for large groups, such as prevalence, incidence etc. These data are often already compiled and published. Hence accessing data is usually relatively simple and cheap. Data are usually analysed using regression techniques to adjust for confounding factors, preferably using multilevel techniques. Despite this, ecological studies have a potential problem that is unique to them, named the ecological fallacy.
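The ecological fallacy means drawing conclusions about individuals from data that only describe groups. A small, deliberately extreme sketch with invented numbers: the association between exposure and outcome is perfectly negative within every region, yet perfectly positive when only the regional averages are compared. The region names and values are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical individual-level data per region: (exposure values, outcome values).
regions = {
    "north": ([1, 2, 3], [3, 2, 1]),
    "centre": ([4, 5, 6], [6, 5, 4]),
    "south": ([7, 8, 9], [9, 8, 7]),
}

# Correlation among individuals inside each region.
within = {name: pearson(x, y) for name, (x, y) in regions.items()}

# Correlation between regional averages, which is all an ecological study sees.
mean_exposure = [sum(x) / len(x) for x, _ in regions.values()]
mean_outcome = [sum(y) / len(y) for _, y in regions.values()]
between = pearson(mean_exposure, mean_outcome)

print(within)
print(between)
```

Real data are rarely this clear-cut, but the sketch shows why an association seen between groups does not have to hold for the individuals in them.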

## Experimental studies

Experimental studies are always prospective and longitudinal. In most experimental studies (also in SCRED) individuals are in one sense their own controls; that is, the statistics are calculated on the individual’s change in an outcome variable rather than on measurements at the last follow-up. We do not label this as a matched pairs design.

The most common variant of experimental study is the one factor design using randomization between one study group and one control group. Patients are randomized to one of two groups in this scenario. The greater the number of patients included, the less random variation (and the greater the accuracy of the statistics). A matched pairs design is often a little better than an unmatched study but at the cost of much more complicated administration. A matched pairs design may be appropriate if all individuals are recruited at once, and is less suitable when individuals are included in the study consecutively, one by one.
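The link between the number of patients and random variation can be illustrated with the standard error of a group mean, which is the standard deviation divided by the square root of the group size. The standard deviation of 10 used below is an arbitrary assumption for the sketch.

```python
from math import sqrt

def standard_error(sd, n):
    """Standard error of a mean for n independent observations with standard deviation sd."""
    return sd / sqrt(n)

# Hypothetical standard deviation of the outcome variable.
SD = 10.0

for n in (25, 100, 400):
    print(n, standard_error(SD, n))
```

Each time the group size is quadrupled the standard error is halved, so precision improves with more patients, but with diminishing returns.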

We talk about studies using zero factor design, one factor design or n factor design. This relates to the number of variables used to determine group allocation. A zero factor design does not have a variable for group allocation because all participants belong to the one and only group. The common situation where participants are randomised to one of two treatment groups is a one factor design.

Let us take an example. We aim to evaluate a physiotherapy intervention for patients (intervention group = IG) just exposed to a whiplash trauma compared to a control group (CG). Let us assume that we also want to evaluate if it matters whether the patient gets the treatment early or with some delay. This would give four different groups: group 1 (IG early), group 2 (CG early), group 3 (IG late) and group 4 (CG late). We could create one single variable and for each patient state the group allocation 1-4. Doing so would mean analysing the data as a one factor design even though we have four separate treatment groups. The other alternative would be to create one variable for treatment (IG or CG) and one variable for timing (early or late). By doing the latter we could analyse the data using a two-factor design. There are a few significant advantages of using a two-factor design rather than a one factor design in this case:

• A two factor design would allow estimation of interaction between intervention and timing. Is the combination of IG and early as if 1+1=2 or could it be like 1+1=5?
• A two factor design uses the data better and would give a more reliable answer to the question if type of intervention or timing matters.

I guess you now see that the number of variables used for group allocation decides the N in N-factor design. A one or two factor design is not very complicated but anything more than a two-factor design usually requires collaboration with statisticians and other personnel with prior experience of N-factor design.
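The interaction between intervention and timing in the whiplash example can be sketched from the four group means. The improvement scores below are invented; a real analysis would use a two-way model on individual data, so treat this only as an illustration of what "interaction" means.

```python
# Hypothetical mean improvement per group (treatment, timing).
cell_means = {
    ("IG", "early"): 8.0,
    ("CG", "early"): 3.0,
    ("IG", "late"): 5.0,
    ("CG", "late"): 2.0,
}

def treatment_effect(timing):
    """Difference between intervention and control at a given timing."""
    return cell_means[("IG", timing)] - cell_means[("CG", timing)]

effect_early = treatment_effect("early")
effect_late = treatment_effect("late")

# Main effect of treatment: the treatment effect averaged over both timings.
main_effect_treatment = (effect_early + effect_late) / 2

# Interaction: how much the treatment effect depends on timing.
interaction = effect_early - effect_late

print(effect_early, effect_late, main_effect_treatment, interaction)
```

An interaction of zero would mean that the intervention helps equally much whether it is given early or late. Coding the four groups as a single variable with values 1-4 hides this question.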

### Single Case Research Experimental Design – SCRED

Single Case Research Experimental Design (SCRED) is also known as Single Subject Design or Single-case experimental design (SCED). In SCRED all individuals receive the same intervention. SCRED is not a cross-over study in which treatment options are compared.

SCRED is useful in intervention studies where it is very difficult to recruit sufficient numbers of patients for a randomized controlled trial such as when studying rare diseases. A randomized controlled trial is always a much better option if there are enough participants. SCRED should not be used simply because you don’t have the resources to do a proper randomized controlled trial.

One common cause of systematic errors in SCRED is an outcome variable that is not stable over time, such as in diseases with substantial spontaneous healing or in children, who by nature are always changing. Hence, SCRED is especially unsuitable in children and when you want to study a disease that is not stable over a reasonable time period.

SCRED can be done as an AB, ABA or ABAB design, with A referring to a period without intervention and B to a period of intervention. The more periods / cycles showing a change only in the B periods, the more likely it is that this change is actually caused by the intervention. A consequence of an intervention with lasting effect is that no deterioration is seen in an A period that follows a B period. If you see continued improvement in an A period following a B period, it implies that the improvement is part of a spontaneous recovery rather than caused by the intervention.

There are two traditions when evaluating the results of a SCRED. One involves making a graph with a line for each individual. On the y-axis is the variable of interest and on the x-axis time. You then look at the lines and decide if the trend indicates improvement, deterioration or no change in the B periods. A more accurate method is to calculate the individual change between the various periods and then use appropriate statistical tests to decide if the change during the B periods is statistically significant. The latter method is considered safer than mere visual inspection of a chart. You should make a prior estimation of sample size if you decide to evaluate the effect with statistical tests. It is important to report these studies properly, preferably following SCRIBE.
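The second tradition can be sketched for a simple AB design. The pain scores are invented, and the exact sign test is only one simple choice made for this sketch; which test is appropriate depends on the data and should be decided in advance.

```python
from math import comb
from statistics import mean

# Hypothetical pain scores (lower is better): (A-period readings, B-period readings).
participants = {
    "P1": ([7, 8, 7], [5, 4, 4]),
    "P2": ([6, 6, 7], [5, 5, 4]),
    "P3": ([8, 9, 8], [6, 6, 5]),
    "P4": ([5, 6, 5], [4, 4, 3]),
    "P5": ([7, 7, 8], [6, 5, 5]),
    "P6": ([9, 8, 9], [7, 6, 6]),
}

def phase_change(a_values, b_values):
    """Individual change: mean of the B period minus mean of the A period."""
    return mean(b_values) - mean(a_values)

def sign_test_p(changes):
    """Exact two-sided sign test of 'no systematic change' (zero changes are dropped)."""
    nonzero = [c for c in changes if c != 0]
    n = len(nonzero)
    positives = sum(1 for c in nonzero if c > 0)
    tail = min(positives, n - positives)
    p = 2 * sum(comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(1.0, p)

changes = [phase_change(a, b) for a, b in participants.values()]
p_value = sign_test_p(changes)

print([round(c, 2) for c in changes])
print(p_value)
```

With only six individuals the smallest p-value a two-sided sign test can reach is about 0.03, which illustrates why a prior sample size estimation matters.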

### Clinical trials

The word “clinical” refers to a focus on health outcomes. Hence, a clinical trial is a planned clinical study of the safety, efficacy and optimal dosing schedule of one or more diagnostic, therapeutic or prophylactic drugs, devices, or techniques, performed on humans selected according to predefined criteria to study the relationship between a health-related intervention and a health outcome. It may also be used for veterinary studies that meet the above criteria. A clinical trial is an experimental study, even if you rarely use the latter term.

Please ensure you register a clinical trial before commencing data collection. Failing to do so will make it more difficult to publish your manuscript.

Clinical trials are usually divided into Phase I, II, III and IV trials. Phase I is the first time a drug is tested on humans, usually a small group of healthy individuals. Phase II tests the agent on a larger group of healthy people (a couple of hundred) and often also on a small group of patients, among other things to see which dose is best to use in further studies. Phase III is when a large group of patients is enrolled and the outcome is compared with a control group. Phase I trials often last around one year, Phase II trials about two years, and Phase III trials longer, often three years or more. Phase IV studies are conducted after the product is approved for general sale to get a better grip on efficacy and less common side effects.

What makes it all a bit messy is that a randomized controlled study can sometimes be classified, at the same time, as an experimental study, a clinical trial, an intervention study and an epidemiological study (see below).

### Cross-over and Latin square

Random variation is like dirt on your glasses. It means that observations spread out from the group mean and it makes it harder to see the details that are there (such as a difference between groups). Reducing random variation increases the chances of detecting something that is there. We have several types of random variation such as variation within individuals, variation between individuals and random variation in measurements. The variation between individuals is often the largest random variation in most scenarios.

One way to remove the inter-individual variation is to have the same individual in both the treatment group and the control group. The same individual can of course not have multiple treatments simultaneously but may take one followed by the other. If everyone first got the active treatment and then placebo, any time-bound phenomenon could affect our readings and result in an incorrect conclusion.

Imagine that someone wants to investigate vitamin C’s ability to prevent colds. Assume that 100 patients are given two grams of vitamin C daily for six months. Thereafter, the patients are without vitamin C for six months and the number of colds during this period is recorded and compared with the previous period. If the first period falls during the summer and the second during the winter, this can lead to the incorrect conclusion that vitamin C prevents colds. One way to solve this problem is to form pairs of individuals. Randomisation decides who in the pair starts with placebo, and the other individual in the matched pair starts with vitamin C. They switch halfway through. This is known as a cross-over trial.
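The allocation step of the vitamin C example can be sketched as follows. The participant codes, the function name and the seed are hypothetical; the sketch only shows the idea that chance decides who in each pair starts with placebo.

```python
import random

def randomise_crossover(pairs, seed=None):
    """For each matched pair, randomly pick who starts with placebo; the other starts with vitamin C."""
    rng = random.Random(seed)
    schedule = {}
    for first, second in pairs:
        starts_placebo = rng.choice([first, second])
        for person in (first, second):
            if person == starts_placebo:
                schedule[person] = ("placebo", "vitamin C")
            else:
                schedule[person] = ("vitamin C", "placebo")
    return schedule

# Hypothetical matched pairs of participants.
pairs = [("P01", "P02"), ("P03", "P04"), ("P05", "P06")]
schedule = randomise_crossover(pairs, seed=2024)

for person, (period_1, period_2) in schedule.items():
    print(person, period_1, "->", period_2)
```

Because half of the participants take vitamin C in each period, a seasonal difference in colds affects both treatments equally.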

If more than two groups are involved, that is if more than one new treatment is to be evaluated, we label this cross-over trial a Latin square. In a Latin square, as for block design with more than two individuals, each block requires one individual more than the number of new treatments to be examined. The additional individual in the block serves as a control and this extra “treatment” is placebo or an established treatment that the new therapies should be compared with.
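A Latin square for one block can be sketched with a simple cyclic construction, assuming two new treatments plus placebo and therefore three individuals per block. The treatment labels are placeholders, and in a real trial the assignment of individuals to rows would be randomized.

```python
def latin_square(treatments):
    """Cyclic Latin square: row = individual in the block, column = treatment period."""
    k = len(treatments)
    return [[treatments[(row + col) % k] for col in range(k)] for row in range(k)]

# Two hypothetical new treatments plus placebo as the control "treatment".
treatments = ["placebo", "new A", "new B"]
square = latin_square(treatments)

for individual, sequence in enumerate(square, start=1):
    print(f"Individual {individual}: " + " -> ".join(sequence))
```

Every individual receives every treatment once, and in every period each treatment is given to exactly one individual in the block.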

### Stepped wedge cluster randomised trial

The stepped wedge cluster randomised trial is a pragmatic trial where all participants in the end get the intervention. This makes long-term follow-up against a control group impossible. Early names for this design were “waiting list designs” or “phased implementations”.
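The roll-out schedule of a stepped wedge design can be sketched as a table of clusters by periods, where 0 means control and 1 means intervention. The cluster names, the number of steps and the helper function are assumptions made for this illustration.

```python
import random

def stepped_wedge(clusters, n_steps, seed=None):
    """Randomly order clusters and let them cross over to the intervention step by step.

    Period 0 is a baseline where every cluster is still in the control condition.
    """
    rng = random.Random(seed)
    order = clusters[:]
    rng.shuffle(order)
    schedule = {}
    for index, cluster in enumerate(order):
        start = 1 + index % n_steps  # first period with the intervention
        schedule[cluster] = [0] * start + [1] * (n_steps + 1 - start)
    return schedule

# Hypothetical clusters, for example four clinics.
clusters = ["clinic A", "clinic B", "clinic C", "clinic D"]
schedule = stepped_wedge(clusters, n_steps=4, seed=1)

for cluster, periods in schedule.items():
    print(cluster, periods)
```

In the last period every cluster has the intervention, which is why no untreated comparison group remains afterwards.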

## Epidemiological studies

The word epidemiology originally comes from the word epidemic. Initially the focus was on infectious diseases. Today epidemiology embraces teaching and research of the occurrence of diseases in different populations and their causal factors. Epidemiological research consists of studies designed to investigate correlations and, if possible, also to form hypotheses about causality. Common purposes of epidemiological research are:

• Establish factors associated with a disease of interest. These factors are labelled risk factors or predictors. Combinations of risk factors can be used to create prediction models predicting presence of a disease. Some kind of regression would usually be used to establish this.
• Establish if any of the identified risk factors is also a causal factor. If possible also to clarify the exact relation between exposure to the risk factor and subsequent disease.
• Clarify transmission pathways for infectious diseases.

The most common types of epidemiological studies are case-control studies, cohort studies and cross-sectional studies. Epidemiological studies aiming to establish causality can sometimes be experimental and are then called intervention studies. Hence, epidemiological research can sometimes be experimental although most epidemiological research is observational in its nature.

# Evidence in experimental studies

The trustworthiness of different experimental study designs is roughly:

1. High quality randomized controlled trial. This is generally considered as being the most robust and reliable design. However, this design is for various reasons not always practical.
2. High quality multiple baseline designs (the more advanced versions of interrupted time series) are likely to come second in terms of trustworthiness.
3. High quality prospective cohort study
4. Various designs relating an intervention with an outcome (without any clear order).
    • Suboptimal randomized controlled trial
    • Suboptimal multiple baseline design
    • Controlled trial
    • Single baseline designs = Single Case Research Experimental Design – SCRED (the less advanced versions of interrupted time series).
    • Other observational studies
5. Expert recommendations

There are crappy randomized controlled trials and well conducted studies using interrupted time series. Hence, the priority for trustworthiness above should be considered as a general guide that is not necessarily applicable to every single study.

# References

1. WHO [Internet]. [cited 2018 Jul 3]. WHO | Trial Registration. Available from: http://www.who.int/ictrp/trial_reg/en/
2. WHO [Internet]. [cited 2018 Jul 3]. WHO | Clinical trials. Available from: http://www.who.int/topics/clinical_trials/en/
3. ICMJE | Recommendations | Clinical Trials [Internet]. [cited 2018 Jul 3]. Available from: http://www.icmje.org/recommendations/browse/publishing-and-editorial-issues/clinical-trial-registration.html
4. Seferiadis A, Ohlin P, Billhult A, Gunnarsson R. Basic body awareness therapy or exercise therapy for the treatment of chronic whiplash associated disorders: a randomized comparative clinical trial. Disability and Rehabilitation [Internet]. 2016 Feb 27 [cited 2018 Feb 28];38(5):442–51. Available from: https://doi.org/10.3109/09638288.2015.1044036
5. Biglan A, Ary D, Wagenaar AC. The Value of Interrupted Time-Series Experiments for Community Intervention Research. Prev Sci [Internet]. 2000 Mar 1 [cited 2018 Feb 22];1(1):31–49. Available from: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4553062/
6. Tate R, Perdices M, Rosenkoetter U, McDonald S, Togher L, Shadish W, et al. The Single-Case Reporting Guideline In BEhavioural Interventions (SCRIBE) 2016: Explanation and elaboration. Archives of Scientific Psychology [Internet]. 2016;4(1):10–31. Available from: http://psycnet.apa.org/record/2016-17384-001
7. Pearce N. The ecological fallacy strikes back. J Epidemiol Community Health [Internet]. 2000 May 1 [cited 2017 Jan 9];54(5):326–7. Available from: http://jech.bmj.com/content/54/5/326
8. Levin KA. Study Design VI - Ecological Studies. Evid-based Dent [Internet]. 2006 Dec [cited 2017 Jan 9];7(4):108–108. Available from: http://www.nature.com/ebd/journal/v7/n4/full/6400454a.html
9. Rosenfeld M, Seferiadis A, Carlsson J, Gunnarsson R. Active intervention in patients with whiplash-associated disorders improves long-term prognosis: a randomized controlled clinical trial. Spine. 2003 Nov 15;28(22):2491–8.
10. Hemming K, Haines TP, Chilton PJ, Girling AJ, Lilford RJ. The stepped wedge cluster randomised trial: rationale, design, analysis, and reporting. BMJ [Internet]. 2015 Feb 6 [cited 2016 Nov 23];350:h391. Available from: http://www.bmj.com/content/350/bmj.h391