Written by Ronny Gunnarsson and first published on May 19, 2015.
Last revised on June 13, 2020.

You must cite this article if you use its information in other circumstances. An example of citing this article is:
Ronny Gunnarsson. Agreements [in Science Network TV]. Available at: https://science-network.tv/agreements/. Accessed July 6, 2025.

Suggested pre-reading	What this web page adds
Introduction to statistics, Inferential statistics	This web page describes what testing agreement is and gives examples of such statistical tests. Reading it will give you an overview of what analysis of agreement is.

Analysis of agreement investigates to what extent different measurements of the same phenomenon agree with each other. Typical situations are:

- Comparing two different measurements of the same phenomenon, as when you evaluate a diagnostic test against a gold standard.
- Estimating inter-rater agreement (whether different users come to the same result when estimating the same phenomenon).
- Estimating test-retest agreement (whether the same user comes to the same estimate when testing of the same phenomenon is repeated).

The most suitable statistical approach depends on which level of measurement (or scale of measurement) is most appropriate for the investigated variable:

Agreement between variables measured with the nominal scale
- The outcome is dichotomous (only two possible outcomes)
  - Kappa coefficient
  - Sensitivity / Specificity
  - Likelihood ratio
  - Predictive value of tests
  - Etiologic predictive value (when there is no gold standard)
- The outcome has more than two possible categories
  - Kappa coefficient
Agreement between variables measured with the ordinal scale
- Kappa coefficient
- Weighted Kappa coefficient
Agreement between variables measured with an interval scale or a ratio scale
- Limits of agreement and/or Bland-Altman plot
- Intraclass correlation coefficient (ICC)
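Two of the methods listed above are easy to compute by hand. The sketch below assumes two raters (or two measurement methods) assessing the same items. It shows unweighted Cohen's kappa for nominal data and Bland-Altman 95% limits of agreement (mean difference ± 1.96 SD of the differences) for interval or ratio data. The function names are hypothetical. Weighted kappa and ICC are not covered, and a real analysis would normally use an established statistics package.

```python
from collections import Counter
from statistics import mean, stdev


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters rating the same items (hypothetical helper)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equally long, non-empty lists of ratings")
    n = len(rater_a)
    # Observed agreement: share of items where both raters chose the same category
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement expected from each rater's marginal category frequencies
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    categories = set(count_a) | set(count_b)
    p_expected = sum(count_a[c] * count_b[c] for c in categories) / n**2
    if p_expected == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_expected) / (1 - p_expected)


def limits_of_agreement(method_1, method_2):
    """Bland-Altman 95% limits of agreement (hypothetical helper).

    Returns (lower limit, mean difference, upper limit).
    """
    if len(method_1) != len(method_2) or len(method_1) < 2:
        raise ValueError("need two equally long lists with at least two paired values")
    diffs = [x - y for x, y in zip(method_1, method_2)]
    bias = mean(diffs)   # systematic difference between the methods
    sd = stdev(diffs)    # sample standard deviation of the differences
    return bias - 1.96 * sd, bias, bias + 1.96 * sd
```

For example, `cohens_kappa(["yes", "yes", "no", "no", "yes", "no"], ["yes", "no", "no", "no", "yes", "yes"])` gives about 0.33. The raters agree on 4 of 6 items, but half of that agreement would be expected by chance alone.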

Always require a 95% confidence interval for estimates of sensitivity, specificity, likelihood ratios and predictive values! Point estimates without a confidence interval are useless.
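Sensitivity, specificity and predictive values are proportions, so a confidence interval for a proportion applies to each of them. The source does not say which interval method to use. The sketch below uses the Wilson score interval, which is one common choice; the exact (Clopper-Pearson) interval is another. The function name is hypothetical. Likelihood ratios are ratios of proportions and need a different method, for example one based on the log of the ratio.

```python
from math import sqrt


def wilson_ci(successes, total, z=1.96):
    """Wilson score confidence interval for a proportion (hypothetical helper).

    z = 1.96 gives an approximately 95% interval.
    """
    if total <= 0 or not 0 <= successes <= total:
        raise ValueError("need total > 0 and 0 <= successes <= total")
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half_width = z * sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half_width, centre + half_width
```

For instance, a test that detects 45 of 50 diseased patients has a sensitivity of 90%. `wilson_ci(45, 50)` gives roughly 0.79 to 0.96, which is a much less precise picture than the point estimate alone suggests.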

Gold standard

(This section is still under construction. Sorry for the inconvenience.)

Estimating the clinical value of a test

Sensitivity and specificity inform us about the health of the diagnostic test being evaluated. This is great if you are a manufacturer of a diagnostic test but of limited value if you are a doctor. Likelihood ratios inform us how much information a test adds, and predictive values inform us about the health of our patient (they provide the probability that the individual has what we are looking for). The tables below aim to show the relation between likelihood ratios and predictive values.
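The measures in the paragraph above are all calculated from the same 2x2 table of test result against gold standard. The sketch below assumes counts of true positives (tp), false positives (fp), false negatives (fn) and true negatives (tn), with no empty row or column. The names are hypothetical. PPV and NPV calculated this way only hold for the prevalence in the study sample. The second function shows the standard odds form of Bayes' theorem: a likelihood ratio converts a pre-test probability into a post-test probability.

```python
import math


def accuracy_measures(tp, fp, fn, tn):
    """Diagnostic accuracy measures from a 2x2 table (hypothetical helper)."""
    sensitivity = tp / (tp + fn)   # share of diseased with a positive test
    specificity = tn / (tn + fp)   # share of non-diseased with a negative test
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        # PLR: how much more likely a positive result is in diseased than non-diseased
        "PLR": sensitivity / (1 - specificity) if specificity < 1 else math.inf,
        # NLR: how much more likely a negative result is in diseased than non-diseased
        "NLR": (1 - sensitivity) / specificity if specificity > 0 else math.inf,
        "PPV": tp / (tp + fp),         # depends on prevalence in this sample
        "NPV": tn / (tn + fn),         # depends on prevalence in this sample
    }


def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert probability to odds, multiply by the likelihood ratio, convert back."""
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)
```

With tp = 90, fp = 30, fn = 10 and tn = 170, the sketch gives sensitivity 0.90, specificity 0.85, PLR about 6 and PPV 0.75. In a setting where the pre-test probability is only 10%, `post_test_probability(0.1, 6.0)` gives 0.4 after a positive test.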

Positive predictive value of test (PPV)	Positive likelihood ratio (PLR)	Interpretation
>60%	>1.5	The test supplies useful information.
>60%	<1.5	Prior to testing it may be assumed that the patient probably has the disease. The test only increases knowledge marginally.
<60%	>1.5	The test only provides information of limited clinical value.
<60%	<1.5	The test is not useful in this situation.

Negative predictive value of test (NPV)	Negative likelihood ratio (NLR)	Interpretation
>90%	>0.67	Prior to testing it may be assumed that the patient probably doesn’t have the disease. The test only increases knowledge marginally.
>90%	<0.67	The test supplies useful information.
<90%	>0.67	The test is not useful in this situation.
<90%	<0.67	The test only provides information of limited clinical value.

(The limits of 60%, 90%, 1.5 and 0.67 in the table above are arbitrarily chosen to enhance understanding.)
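The two tables above can be read as a simple lookup. The sketch below encodes them with the same arbitrary cut-offs as adjustable parameters. The function names and shortened wording are hypothetical. The tables do not cover values lying exactly on a cut-off, so this sketch uses strict comparisons as written, and such a value falls into the other row. That tie-breaking is an assumption.

```python
def interpret_positive(ppv, plr, ppv_cut=0.60, plr_cut=1.5):
    """Interpretation of a positive result per the PPV/PLR table (hypothetical helper)."""
    if ppv > ppv_cut and plr > plr_cut:
        return "The test supplies useful information."
    if ppv > ppv_cut:  # high PPV but PLR not above the cut-off
        return "The patient probably has the disease before testing; the test adds little."
    if plr > plr_cut:  # PLR above the cut-off but PPV not
        return "The test only provides information of limited clinical value."
    return "The test is not useful in this situation."


def interpret_negative(npv, nlr, npv_cut=0.90, nlr_cut=0.67):
    """Interpretation of a negative result per the NPV/NLR table (hypothetical helper)."""
    if npv > npv_cut and nlr < nlr_cut:
        return "The test supplies useful information."
    if npv > npv_cut:  # high NPV but NLR not below the cut-off
        return "The patient probably doesn't have the disease before testing; the test adds little."
    if nlr < nlr_cut:  # NLR below the cut-off but NPV not
        return "The test only provides information of limited clinical value."
    return "The test is not useful in this situation."
```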
