Statistics consists of two main parts: descriptive statistics and inferential statistics. Descriptive statistics describe the observations, usually in terms of their central tendency and the dispersion around it. Inferential statistics draw conclusions from your observations. This page focuses on the choice of statistical method for inferential statistics. The figure to the right gives a bird's-eye view of the different approaches in inferential statistics. The two main approaches are:

- Group comparisons (one group against a fixed value, matched groups or unmatched groups) with no or limited adjustment for confounding factors.
- Analyzing covariation in one single group.

Statistical methods used in inferential statistics can be split into parametric and non-parametric methods. Once you have looked at the descriptive statistics, the next step is to decide whether the inferential statistics should use parametric or non-parametric methods. The ground rule is to use parametric methods if your observations fulfill all requirements for them, and otherwise to use non-parametric methods. Parametric methods are slightly more "sensitive", meaning they are somewhat more likely to find what you are looking for when it really exists. It is quite common that, within the same study, some observations can be analysed using parametric methods while others require non-parametric methods.

## Group comparison or analyzing association with prediction models?

It can be shown that a simple group comparison and an analysis of the same scenario using association with prediction modelling (usually some sort of regression) give the same results, because most group comparisons are a special case of association with prediction modelling. Does it then matter whether I use direct group comparison or association with prediction modelling? Yes it does! It is quite common that confounding factors influence the outcome of a group comparison. Examples of such confounding factors are gender, age, being a smoker or not, having diabetes or not, and so on. If you go down the path of a simple group comparison, you might want to adjust for this by doing sub-group analyses for different combinations of gender, age group, having diabetes or not, being a smoker or not, etc. However, this approach results in a couple of important problems:

- You need to make quite a lot of separate analyses, leading to many p-values. Assume an example where we have three separate outcome (effect) variables for estimating a difference between groups: reduction in mortality, reduction in the proportion of patients experiencing a heart attack, and reduction in cholesterol level. This means three p-values (if we use p-values as our measure of difference between groups). If we also want to adjust for every subgroup of gender, age (below or above 65 years), having diabetes or not, and being a smoker or not, we would have to calculate 3*2*2*2*2=48 p-values. This creates a need to adjust the level of significance for multiple testing. With many subgroups you will soon find that the magnitude of the adjustment for multiple testing makes it very difficult (sometimes almost impossible) to find a difference between groups.
- Splitting your sample into many subgroups also gives fewer observations for each sub-group analysis, and your study is likely to be grossly under-powered.

The alternative is to use a statistical method that can incorporate all variables in one go and tell you the importance of each. You would do one separate analysis for each outcome variable (labelled the dependent variable) and include all other variables (including group allocation) as independent variables. In the same example as above we would get 3*(1+1+1+1+1)=15 estimates, such as p-values, odds ratios or hazard ratios. The adjustment for multiple testing needed when simultaneously producing 15 p-values is much smaller than when producing 48.
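The arithmetic in this example can be checked with a short sketch. The text does not name a correction method, so the Bonferroni correction and the overall significance level of 0.05 used here are assumptions made purely for illustration.

```python
from math import prod

ALPHA = 0.05  # assumed overall significance level (not stated in the text)

outcomes = 3  # mortality, heart attack, cholesterol level
# Each confounder is split into two strata, as in the example above.
strata_per_confounder = {"gender": 2, "age": 2, "diabetes": 2, "smoker": 2}

# Sub-group approach: one test per outcome per combination of strata.
n_subgroup_tests = outcomes * prod(strata_per_confounder.values())

# Regression approach: per outcome, one estimate for group allocation
# plus one estimate for each confounder.
n_regression_estimates = outcomes * (1 + len(strata_per_confounder))


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level under a Bonferroni correction (assumed method)."""
    return alpha / n_tests


print(n_subgroup_tests, bonferroni_threshold(ALPHA, n_subgroup_tests))
print(n_regression_estimates, bonferroni_threshold(ALPHA, n_regression_estimates))
```

Under these assumptions each of the 48 sub-group tests would need a p-value below roughly 0.001, while each of the 15 regression estimates would need a p-value below roughly 0.003.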

**The conclusion is that you should use simple group comparison if you have no need at all to adjust for confounding variables. This only occurs in a properly conducted randomized controlled trial. In all other situations (and also in some randomized controlled trials) you would be far better off analyzing association using techniques that also allow the creation of prediction models (see below). This is very important in retrospective studies (such as retrospective chart reviews), where confounding factors virtually always exist.**
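The statement at the start of this section, that a simple group comparison is a special case of regression, can be checked numerically. The sketch below uses made-up data and only the Python standard library. It compares the t statistic of an unpaired Student's t-test (pooled variance) with the t statistic of the slope in a simple linear regression where group allocation is coded 0/1. The function names `pooled_t` and `slope_t` are hypothetical helpers written for this illustration.

```python
from math import sqrt
from statistics import mean

# Hypothetical outcome values for two independent groups (illustration only).
control = [5.1, 4.8, 6.0, 5.5, 4.9, 5.7]
treated = [6.2, 5.9, 6.8, 6.1, 7.0, 6.4]


def pooled_t(a, b):
    """t statistic of Student's unpaired t-test with pooled variance."""
    na, nb = len(a), len(b)
    ma, mb = mean(a), mean(b)
    ss = sum((x - ma) ** 2 for x in a) + sum((x - mb) ** 2 for x in b)
    pooled_var = ss / (na + nb - 2)
    return (mb - ma) / sqrt(pooled_var * (1 / na + 1 / nb))


def slope_t(x, y):
    """t statistic of the slope in the simple linear regression y = a + b*x."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    standard_error = sqrt(rss / (n - 2) / sxx)
    return slope / standard_error


# Code group allocation as a 0/1 independent variable.
x = [0] * len(control) + [1] * len(treated)
y = control + treated

t_group = pooled_t(control, treated)
t_regression = slope_t(x, y)
print(round(t_group, 6), round(t_regression, 6))
```

The two statistics agree, which is why the regression approach loses nothing compared with the group comparison while leaving room for additional independent variables.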

## More about important considerations in different scenarios

**Randomized Controlled Trials (RCT):** The p-value for any baseline difference is usually not very impressive, rarely below 0.01. If you find a baseline difference with a p-value of less than 0.001 you would question the randomization procedure and, in the worst case, throw the whole study in the waste bin. Statistical methods for simple group comparisons (see below) can usually be used if there are no baseline differences.

Sensitivity and specificity inform you about "the health of the diagnostic test". The likelihood ratio tells you how much information the test adds. Finally, the predictive value of a test informs you about the health of your patients (if patients are the focus). Hence, sensitivity and specificity are of the utmost importance for manufacturers of diagnostic tests, while predictive values are much more useful for health care staff.
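As a minimal sketch of how these measures relate, the following computes them from one 2x2 table of test result against a gold standard. The counts are invented for illustration.

```python
# Hypothetical 2x2 table: test result against a gold standard.
tp, fp = 90, 30    # test positive: with disease, without disease
fn, tn = 10, 170   # test negative: with disease, without disease

# Properties of the test itself.
sensitivity = tp / (tp + fn)   # proportion of diseased who test positive
specificity = tn / (tn + fp)   # proportion of healthy who test negative

# How much a test result shifts the odds of disease.
lr_positive = sensitivity / (1 - specificity)
lr_negative = (1 - sensitivity) / specificity

# What a test result says about the patient in front of you.
ppv = tp / (tp + fp)   # positive predictive value
npv = tn / (tn + fn)   # negative predictive value

print(f"sensitivity={sensitivity:.2f} specificity={specificity:.2f}")
print(f"LR+={lr_positive:.2f} LR-={lr_negative:.2f}")
print(f"PPV={ppv:.2f} NPV={npv:.2f}")
```

Note that the predictive values depend on how common the disease is in the sample, whereas sensitivity and specificity are computed within the diseased and the healthy separately.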

Estimating associations is usually a good alternative to doing a case-control study. Group allocation would in such cases be one of several independent variables. The outcome should not be interpreted as cause and effect but as associations, where cause and effect need to be clarified in proper randomized controlled trials.

## Specific advice on choosing statistical method

Analyzing association between two variables (with no need for prediction)

| Scale of measure | Suitable statistical test |
|---|---|
| Nominal with two categories (dichotomous) | Contingency coefficient |
| | Phi coefficient |
| | Cramér's phi coefficient = Cramér's V coefficient |
| | Relative risk (RR) |
| | Odds ratio (OR) |
| Nominal with more than two categories | Cramér's phi coefficient = Cramér's V coefficient |
| Ordinal (has an order but not equidistant scale steps) | Spearman's rank correlation coefficient |
| | Kendall's rank correlation coefficient = Kendall's tau |
| | Somers' D |
| Interval scale or Ratio scale | Pearson's correlation coefficient |
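For the dichotomous case, three of the measures listed above can be computed directly from a 2x2 table. The following is a small sketch with invented counts, where rows are exposure and columns are outcome.

```python
from math import sqrt

# Hypothetical 2x2 table: rows are exposure, columns are outcome.
a, b = 30, 70   # exposed: outcome yes, outcome no
c, d = 15, 85   # unexposed: outcome yes, outcome no

risk_exposed = a / (a + b)
risk_unexposed = c / (c + d)

relative_risk = risk_exposed / risk_unexposed
odds_ratio = (a * d) / (b * c)

# Phi coefficient: correlation between two dichotomous variables.
phi = (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))

print(f"RR={relative_risk:.2f} OR={odds_ratio:.2f} phi={phi:.2f}")
```

In this made-up table the odds ratio is larger than the relative risk, which is typical when the outcome is not rare.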

| Dependent variable | Observations | Suitable statistical test |
|---|---|---|
| Nominal with two categories (dichotomous) | Independently chosen observations | Unconditional binary logistic regression |
| | | Propensity score matching |
| | Independently chosen matched pairs | Conditional binary logistic regression |
| Nominal with more than two categories | Independently chosen observations | Multinomial logistic regression (= multiclass logistic regression) |
| Ordinal (has an order but not equidistant scale steps) | Independently chosen observations | Ordered logistic regression (= ordinal regression). Also possible to introduce a cut-off and use unconditional binary logistic regression |
| Interval or ratio scale | Independently chosen observations with only one independent variable | Simple linear regression (labelled analysis of covariance if the independent variable is dichotomous) |
| | Independently chosen observations with more than one independent variable | Multiple linear regression (labelled analysis of covariance if the independent variable is dichotomous) |
| | | Propensity score matching |
| Time to an event (this is a special case of Interval or ratio scale) | Independently chosen observations with more than one independent variable | Cox proportional hazards regression |

Important questions to clarify before choosing statistical method:

- Establish how many separate variables (factors) are used to allocate observations to a group.
  - Zero factor design: No variables are used for group allocation. This means that a single group is compared to a fixed predefined value, or that a pre-post comparison is made.
  - One factor design: One variable is used to allocate observations to separate groups. A common example is a comparison of two or more independent groups. One factor design is the most common situation in group comparisons.
  - Two factor design: Two factors (such as type of treatment and timing) are used for group allocation. If each factor has two categories we get a two factor design with four separate groups. It would still be a two factor design if each factor had three categories, but now we would have nine groups.
  - N-factor design: There are study designs using more than two factors / variables for group allocation. They are rare (and complicated).
- If the design has at least two groups (is at least a one factor design), are the groups matched or unmatched?
- What scale of measure is appropriate for the observations you have?
- If the interval or ratio scale is appropriate for some variables, are these observations normally distributed or not?
- If any variables are measured on the nominal scale, are there only two possible labels (making the variable dichotomous) or are more than two labels possible?
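The answers to these questions act as keys into the tables that follow. As a rough sketch only, the lookup for comparing two groups in a one factor design could be written as below. The function `suggest_test` is hypothetical, it returns just one commonly listed option per situation, and it assumes that the test for matched groups on a parametric interval or ratio scale is the paired t-test.

```python
def suggest_test(scale: str, matched: bool, parametric: bool = False) -> str:
    """Suggest a first-choice test for comparing two groups (one factor design).

    scale      -- "dichotomous", "ordinal" or "interval" (interval or ratio)
    matched    -- True if the two groups are matched (paired)
    parametric -- True if interval/ratio data meet the parametric requirements
    """
    if scale not in {"dichotomous", "ordinal", "interval"}:
        raise ValueError(f"unknown scale of measure: {scale!r}")
    if scale == "dichotomous":
        return "McNemar's test" if matched else "Chi-square or Fisher's exact test"
    # Ordinal data, and interval/ratio data failing the parametric
    # requirements, both go to non-parametric tests.
    if scale == "ordinal" or not parametric:
        return "Sign test" if matched else "Mann-Whitney U test"
    return "Paired t-test" if matched else "Unpaired t-test"


print(suggest_test("interval", matched=False, parametric=True))
print(suggest_test("ordinal", matched=True))
```

A lookup like this is no substitute for the tables, which list several alternatives and their requirements for each situation.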

| Scale of measure | Suitable statistical test | Comment |
|---|---|---|
| Nominal with two categories (dichotomous) | Chi-square | Requires an expected count of at least five observations in each cell |
| | Fisher's exact test | Robust with few requirements |
| Nominal with more than two categories | Chi-square | Requires an expected count of at least five observations in each cell |
| Ordinal (has an order but not equidistant scale steps) | Mann-Whitney U test = Wilcoxon rank-sum test (unpaired) | Very common test |
| | Fisher's permutation test | |
| | Cochran–Mantel–Haenszel (CMH) test | |
| Interval or ratio scale not fulfilling requirements for parametric testing (often due to skewed observations) | Mann-Whitney U test = Wilcoxon rank-sum test (unpaired) | Very common test |
| | Fisher's permutation test | |
| | Cochran–Mantel–Haenszel (CMH) test | |
| | Z-test | |
| Interval or ratio scale fulfilling requirements for parametric testing (such as being normally distributed) | Student's t-test - two sample unpaired test | Very common test. Only for two independent groups. |
| | One way analysis of variance | Can be used if there are more than two independent groups in a one factor design. Will give the same result as the t-test if there are only two groups. |
| | Cohen's d | |
| | Z-test | |
| Time to an event (this is a special case of Interval or ratio scale) | Log rank test = Mantel–Cox test = time-stratified Cochran–Mantel–Haenszel test | Can include more than two groups. Cox proportional hazards regression is the preferred method if there is a need to adjust for confounding variables. |
| | Kaplan-Meier curves | This is a graphical representation |
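To illustrate the first row of this table, here is a minimal sketch of a chi-square test for two unmatched groups with a dichotomous outcome, including a check of the cell-count requirement against the expected counts. The counts are invented, no continuity correction is applied, and the closed-form p-value is valid only for a 2x2 table (one degree of freedom).

```python
from math import erfc, sqrt

# Hypothetical counts: rows are groups, columns are outcome yes / no.
table = [[18, 32],
         [30, 20]]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)

# Expected count in each cell if group and outcome were independent.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Pearson chi-square statistic (no continuity correction).
chi2 = sum(
    (obs - exp) ** 2 / exp
    for obs_row, exp_row in zip(table, expected)
    for obs, exp in zip(obs_row, exp_row)
)

# With one degree of freedom the p-value has a closed form.
p_value = erfc(sqrt(chi2 / 2))

# Requirement from the table: at least five (expected) observations per cell.
requirement_met = min(min(row) for row in expected) >= 5

print(f"chi2={chi2:.3f} p={p_value:.4f} requirement_met={requirement_met}")
```

If the requirement is not met, the table above points to Fisher's exact test instead.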

| Scale of measure | Suitable statistical test | Comment |
|---|---|---|
| Nominal with two categories (dichotomous) | McNemar's test | |
| Nominal with more than two categories | | |
| Ordinal (has an order but not equidistant scale steps) | Sign test | |
| | (McNemar's test) | Sign test is better but McNemar's test will work and give a very similar result. |
| Interval or ratio scale not fulfilling requirements for parametric testing (often due to skewed observations) | Sign test | |
| | (McNemar's test) | Sign test is better but McNemar's test will work and give a very similar result. |
| Interval or ratio scale fulfilling requirements for parametric testing (such as being normally distributed) | Student's t-test - paired test (a one sample t-test on the within-pair differences) | Very common test. Only for two matched groups. |
| | One way ANOVA with repeated measures | When individuals are matched, measurements from matched individuals are treated like repeated measures. Can be used if there are more than two matched groups in a one factor design. Will give the same result as the paired t-test if there are only two groups. |
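The sign test listed for matched groups is simple enough to sketch by hand. The example below uses invented before/after scores for ten matched individuals and computes an exact two-sided p-value from the binomial distribution. The helper name `sign_test_p` is hypothetical.

```python
from math import comb

# Hypothetical paired observations (before, after) for matched individuals.
pairs = [(7, 5), (6, 6), (8, 5), (5, 6), (9, 6),
         (7, 4), (6, 5), (8, 7), (7, 5), (6, 4)]

# Count the direction of change; tied pairs carry no information and are dropped.
n_pos = sum(1 for before, after in pairs if before > after)
n_neg = sum(1 for before, after in pairs if before < after)


def sign_test_p(n_pos: int, n_neg: int) -> float:
    """Exact two-sided sign test p-value under H0: both directions equally likely."""
    n = n_pos + n_neg
    k = min(n_pos, n_neg)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


p_value = sign_test_p(n_pos, n_neg)
print(n_pos, n_neg, round(p_value, 4))
```

Only the direction of each within-pair change is used, not its size, which is why the test suits ordinal data and skewed interval or ratio data.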

| Scale of measure | Suitable statistical test | Comments |
|---|---|---|
| Nominal with two categories (dichotomous) | Cohen's kappa coefficient | |
| | Sensitivity and Specificity | Tells you about the health of the diagnostic test |
| | Likelihood ratio | Tells you how much more information a test adds |
| | Predictive value of test | Tells you about the health of the patient (if your test is about patients and their health) |
| | Etiologic predictive value | Predictive value of test while adjusting for possible carriers ill from another agent than the test is looking for. Does not require a gold standard. |
| Nominal with more than two categories | Cohen's kappa coefficient | |
| Ordinal (has an order but not equidistant scale steps) | Cohen's kappa coefficient | |
| | Weighted kappa coefficient | |
| Interval scale or Ratio scale | Limits of agreement | Often combined with Bland-Altman plot |
| | Bland-Altman plot = Difference plot = Tukey mean-difference plot | This is a graphical representation of comparing two tests |
| | Lin's Concordance correlation coefficient | |
| | Intra class correlation (= ICC) | (usually better to use one of the above methods) |
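As an illustration of the limits of agreement listed for interval and ratio scales, the sketch below uses invented paired measurements from two methods. It assumes the conventional 95% limits, computed as the mean difference plus and minus 1.96 standard deviations of the differences.

```python
from statistics import mean, stdev

# Hypothetical paired measurements of the same subjects by two methods.
method_a = [10.2, 11.5, 9.8, 12.1, 10.9, 11.3, 9.5, 12.4]
method_b = [10.0, 11.9, 9.6, 12.5, 10.4, 11.0, 9.9, 12.0]

differences = [a - b for a, b in zip(method_a, method_b)]

bias = mean(differences)   # systematic difference between the methods
sd = stdev(differences)    # sample standard deviation of the differences

# Assumed conventional 95% limits of agreement.
lower = bias - 1.96 * sd
upper = bias + 1.96 * sd

print(f"bias={bias:.3f} limits of agreement=({lower:.3f}, {upper:.3f})")
```

A Bland-Altman plot shows the same quantities graphically, with each pair's difference plotted against the pair's mean and horizontal lines drawn at the bias and the two limits.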

Ronny Gunnarsson. Choosing statistical analysis [in Science Network TV]. Available at: http://science-network.tv/choosing-statistical-analysis/. Accessed July 22, 2018.