This page is under construction. Please also see EPV calculator.
Diagnostic tests such as blood tests, microbiological diagnostic tests, diagnostic X-rays, questionnaires or orthopedic diagnostic maneuvers are often used to enhance the diagnosis of a patient. These tests are only practicable if they have been properly evaluated and their usefulness estimated. The usefulness of a test may be described in terms of sensitivity, specificity, likelihood ratios, predictive values, etc. However, if carriers of the etiologic agent which our test is designed to detect are present, then our test results may be misleading.
Let us consider a common example. Carriers of potentially pathogenic bacteria who are simultaneously ill from a viral infection complicate the diagnostic procedure in respiratory tract infections. The statistical methods currently available for the evaluation of common diagnostic tests either ignore the phenomenon of carriers or provide test characteristics that are difficult to apply in clinical decision making. The etiologic predictive value (EPV) is a new statistical method developed for determining the probability of symptoms truly being related to (perhaps caused by) a bacteriological finding, while taking carriers into consideration. To calculate EPV, one must know the number of positive and negative tests among patients and healthy controls as well as the sensitivity of the test. This enables calculating the positive and negative EPV with a 95% confidence interval. Below are sections explaining the motivation for constructing EPV. The idea of EPV may raise several objections. For example, is it possible to calculate predictive values without a gold standard? An introductory video (cited at the end of this page) presents the problem and describes the principles behind EPV.
Respiratory tract infections
In the early 1990s, I worked as a doctor at a pediatric clinic in the southwestern part of Sweden. Respiratory tract infections with sore throat or cough were common complaints. As a young doctor, I tried to assimilate knowledge from more experienced colleagues. However, it was not clear to me when to treat an upper respiratory tract infection with antibiotics. There were several different diagnostic and therapeutic strategies among the doctors for this disorder. Some relied on their clinical judgment, others relied on tests, such as throat or nasopharyngeal cultures.
However, the daily challenge was to decide whether a respiratory tract infection was of viral or bacterial etiology. At the clinic, throat cultures, nasopharyngeal cultures and C-reactive protein (CRP) were tests used in the diagnostic procedure for a large number of patients with upper respiratory tract infections. How useful was the information obtained by these tests? I found the nasopharyngeal culture to be especially difficult to interpret because potentially pathogenic bacteria were found in tests from most of the patients. Should they be treated with antibiotics that could eradicate the bacterium found? Most colleagues recommended antibiotic treatment if the condition had not improved spontaneously by the time the results of the nasopharyngeal culture arrived.
The appropriateness of prescribing antibiotic treatment when the nasopharyngeal culture showed growth of potentially pathogenic bacteria was questionable. As one of the senior doctors mentioned, most child patients, as well as healthy children, harbor these bacteria in a nasopharyngeal culture. It was then obvious to me that I, and perhaps many of my colleagues, had not fully understood the consequences of carriers.
How useful are throat and nasopharyngeal cultures in deciding whether the symptomatic infection is of viral or bacterial origin? If one could obtain the answer to this, how should the answer be presented? At this time, another colleague at the clinic presented different statistical methods of calculating test characteristics. Although I had previously heard of these methods, they became far more relevant to me at this time. Predictive values of throat and nasopharyngeal cultures, taking symptomatic carriers into consideration, would be an aid in understanding the usefulness of these cultures. However, the literature did not provide this information, which led to this project at the beginning of 1990.
Respiratory tract infections are very common. Approximately one-third of all visits to doctors in primary health care centers are due to upper respiratory tract infections. The proportion is higher among children, with up to 80% of consultations due to respiratory tract infections. As respiratory tract infections represent one of the main reasons for antibiotic therapy, the diagnostic procedure for patients with this type of infection is of vital importance if the usage of antibiotics is to be diminished.
How can those few patients with a respiratory tract infection who need antibiotic therapy be identified? A prerequisite for developing and redefining guidelines in this subject is proper information on how to use available tests to confirm or exclude the presence of potentially pathogenic bacteria.
Evaluation of dichotomous diagnostic tests
A test to diagnose a disease caused by a microbiologic agent usually has a dichotomous outcome: presence or absence of the etiologic agent. A fundamental prerequisite for its usefulness is that a test designed to detect a bacterium can detect this bacterium better than if the doctor made a guess based on a preliminary clinical observation. In some situations the doctor’s guess of viral or bacterial etiology is not much more accurate than setting the diagnosis by flipping a coin. When can it be expected that the test provides more information than a random choice? In order to answer this question the test may be described by means of sensitivity and specificity, or by various indices such as the Youden index, the efficiency, the index of validity or kappa. The Youden index depends on sensitivity and specificity, while the index of validity and the efficiency also depend on the prevalence of disease. Thus they are more informative than the Youden index. The disadvantage of all the indices is that they do not differentiate between the outcomes growth of bacteria (T+) and no growth of bacteria (T-). In some tests T- may be highly relevant but T+ of little value. An example of this is the outcome of throat cultures in children (as will be shown later in this dissertation). However, likelihood ratios or predictive values consider T+ and T- separately.
Likelihood ratios depend on sensitivity and specificity alone. Since predictive values also depend on the prevalence of disease, they yield more information concerning the evaluation of bacterial cultures than likelihood ratios. The positive likelihood ratio tells you by how much the odds of the phenomenon the test is designed to detect increase in the case of a positive test. Likelihood ratios cannot be used in clinical practice unless you know the pre-test odds or pre-test probability. The positive predictive value (PPV) provides you with the probability of the phenomenon the test is designed to detect, given a positive test.
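The relation between likelihood ratios, pre-test probability and predictive values can be illustrated with a minimal sketch. It uses the standard textbook definitions; the function names and the numbers (sensitivity 0.90, specificity 0.80, prevalence 0.20) are hypothetical and chosen only for illustration, and the prevalence is assumed to serve as the pre-test probability.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) from sensitivity and specificity alone."""
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert a pre-test probability to a post-test probability via odds."""
    pre_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV); unlike likelihood ratios these need the prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Hypothetical test characteristics, for illustration only.
sens, spec, prev = 0.90, 0.80, 0.20

lr_pos, lr_neg = likelihood_ratios(sens, spec)
ppv, npv = predictive_values(sens, spec, prev)

print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.3f}")
print(f"PPV = {ppv:.3f}, NPV = {npv:.3f}")
# Applying LR+ to the pre-test probability reproduces the PPV.
print(f"Post-test probability after T+ = {post_test_probability(prev, lr_pos):.3f}")
```

With the prevalence as pre-test probability, the post-test probability after a positive test equals the PPV, which is why a likelihood ratio on its own cannot guide a clinical decision. Note that none of these quantities accounts for symptomatic carriers.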
Although predictive values seem to be the ideal measure of a test, they do not take into consideration the presence of symptomatic carriers (individuals harboring the agent our test is supposed to detect and at the same time ill from something else, usually a virus). Methods that may consider asymptomatic carriers are relative risk and hypothesis testing.
Sensitivity and specificity
In order to evaluate a test, sensitivity and specificity are most often used. They are calculated by comparing the observed test outcome with the outcome of the gold standard in a sample of n subjects.
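As a minimal sketch of this calculation, assume the n subjects are cross-classified into a 2×2 table of test outcome against gold standard. The cell names (tp, fp, fn, tn) and the counts are hypothetical and not taken from any study; the definitions are the standard ones.

```python
def sensitivity_specificity(tp, fp, fn, tn):
    """Standard definitions from a 2x2 table of test versus gold standard.

    tp: test positive, gold standard positive
    fp: test positive, gold standard negative
    fn: test negative, gold standard positive
    tn: test negative, gold standard negative
    """
    sensitivity = tp / (tp + fn)  # proportion of gold-standard positives detected
    specificity = tn / (tn + fp)  # proportion of gold-standard negatives cleared
    return sensitivity, specificity


# Hypothetical counts for a sample of n = 200 subjects.
tp, fp, fn, tn = 45, 30, 5, 120
n = tp + fp + fn + tn

sens, spec = sensitivity_specificity(tp, fp, fn, tn)
print(f"n = {n}, sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

Both measures are computed within the columns defined by the gold standard, so they are only as meaningful as the gold standard itself.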
The sensitivity is mathematically independent of the disease prevalence. However, if the test is a microbiologic diagnostic test, in situations with a low disease prevalence, every test will probably be examined less carefully compared to a situation with a higher disease prevalence. Thus, a decrease in the disease prevalence might reduce the sensitivity of the test. A well-known effect on the sensitivity is seen by altering the cut-off limit for considering the test as positive, an issue of great interest for manufacturers of rapid tests for detection of GABHS. These phenomena can be studied by constructing Receiver Operating Characteristic curves (ROC curves). As long as the disease prevalence is below 50%, the influence of the disease prevalence on the sensitivity is small.
It could be appropriate to say that the sensitivity and the specificity inform you about the health status of your test rather than the health status of your patient. Therefore, there is also a need for another method to evaluate throat and nasopharyngeal culture.
As a measure of a test's efficiency, Youden suggested in 1950 an index (J), defined as sensitivity + specificity − 1.
This index does not take into account the prevalence of disease and therefore it contains less information than the index of validity or the efficiency. The Youden index is rarely used.
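A short sketch, using the standard definition J = sensitivity + specificity − 1 and hypothetical test characteristics, shows both properties: the index ignores prevalence, and it cannot tell a test that is good at ruling out from one that is good at ruling in.

```python
def youden_index(sensitivity, specificity):
    """Youden's J: 0 for a test no better than chance, 1 for a perfect test."""
    return sensitivity + specificity - 1.0


# A coin flip and a perfect test mark the two ends of the scale.
coin_flip = youden_index(0.5, 0.5)
perfect = youden_index(1.0, 1.0)

# Two hypothetical tests with mirrored characteristics get the same J,
# although one is mainly useful when negative and the other when positive.
rule_out_test = youden_index(0.95, 0.60)
rule_in_test = youden_index(0.60, 0.95)

print(f"coin flip J = {coin_flip:.2f}, perfect test J = {perfect:.2f}")
print(f"rule-out test J = {rule_out_test:.2f}, rule-in test J = {rule_in_test:.2f}")
```

Prevalence never enters the calculation, so the same J is reported whether the disease is rare or common.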
Index of validity and efficiency
One way of characterizing a diagnostic test is to calculate the proportion of correctly classified individuals as an index of validity (Iv).
If the sensitivity and the specificity are equal, then Iv is independent of the disease prevalence. In all other situations, Iv depends on the sensitivity, the specificity and the prevalence of disease. The efficiency is the same as Iv multiplied by 100 and expressed in per cent.
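The dependence on prevalence can be checked numerically. Counting the correctly classified diseased and non-diseased individuals separately gives Iv = prevalence × sensitivity + (1 − prevalence) × specificity. The sketch below uses this identity; the function names and all numbers are hypothetical.

```python
def index_of_validity(sensitivity, specificity, prevalence):
    """Proportion of correctly classified individuals (Iv)."""
    return prevalence * sensitivity + (1.0 - prevalence) * specificity


def efficiency(sensitivity, specificity, prevalence):
    """Efficiency: Iv expressed in per cent."""
    return 100.0 * index_of_validity(sensitivity, specificity, prevalence)


prevalences = [0.05, 0.20, 0.50]

# Equal sensitivity and specificity: Iv stays the same at every prevalence.
iv_equal = [index_of_validity(0.85, 0.85, p) for p in prevalences]

# Unequal sensitivity and specificity: Iv moves with the prevalence.
iv_unequal = [index_of_validity(0.95, 0.60, p) for p in prevalences]

for p, a, b in zip(prevalences, iv_equal, iv_unequal):
    print(f"prevalence {p:.2f}: Iv (equal) = {a:.3f}, Iv (unequal) = {b:.3f}")
```

When sensitivity exceeds specificity, Iv rises with prevalence because a larger share of the sample falls in the group the test classifies well.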
The choice of statistical method to evaluate a diagnostic test
Gold standard and carriers
A gold standard is necessary for calculating sensitivity, specificity, likelihood ratios and predictive values. It is either the accepted reference method or the best known predictor of the truth, hopefully both. In a situation where presence of a marker, like Group A beta-hemolytic streptococci (GABHS), does not necessarily mean that the individual has a specified disease, there is a difference between predicting the presence of a marker and predicting the presence of a disease [1, 2]. Is the gold standard showing the presence of a marker or the presence of a disease? If the test indicates presence of a marker, for example GABHS, which may cause disease but may also be present as transient commensals, then it can be unclear what is actually being predicted. Thus, the question of a proper gold standard ought to be discussed in every evaluation of a test [2, 3].
The predictive value of a direct test to detect GABHS has been estimated by using a conventional throat culture as the gold standard [4, 5-8]. A conventional throat culture has also been used as the gold standard to evaluate an office culture or another conventional throat culture [10-12]. These predictive values do not relate to the prediction of streptococcal throat infection caused by GABHS but rather to the presence of GABHS in the throat. The accepted strategy of not giving antibiotics to symptomatic carriers of GABHS who are sick from other causes, such as a virus [13-15], creates an obvious need for a distinction between predicting a marker on the one hand and a disease on the other.
This problem has been in focus for years, especially in patients with a sore throat caused by GABHS. One attempt to solve the problem was the use of a significant rise in streptococcal antibody titers as the gold standard to predict the presence of a sore throat caused by GABHS as opposed to the presence of GABHS in the throat. This gold standard has been used to evaluate rapid tests for the detection of GABHS [16, 17] and to evaluate conventional throat cultures. The crucial question in every test evaluation is how well the gold standard predicts the truth [2, 18-20]. Streptococcal antibody titers are questionable as a gold standard since several studies have shown that they have great difficulty in predicting true streptococcal disease [13, 21].
In the study by Gerber et al., all patients with a sore throat received antibiotics and a throat culture was done. Streptococcal serology for antistreptolysin (ASO) and antideoxyribonuclease B (ADB) was performed in those patients who, at the first follow-up after 18-24 hours, had growth of GABHS in the throat culture. A rise in antibody titers of two or more dilutions (>0.2 log rise) between the first blood sample and convalescent sera four weeks later was considered to be a significant rise in streptococcal antibody titers. Thus, all patients belonged to one of three possible groups: those with a negative throat culture (group one), those with growth of GABHS and a rise in streptococcal antibody titers (group two), and finally, those with growth of GABHS but no rise in streptococcal antibody titers (group three). The majority (80%) of patients in group one still had a sore throat at the follow-up after 18-24 hours and only 32% experienced an overall improvement. In groups two and three, only a few had throat pain at the follow-up (8% and 9%) and most patients felt an overall improvement in their disease (92% and 91%). Both groups two and three experienced a dramatic improvement with no differences between the groups. This finding contradicts the theory that streptococcal antibody titers can distinguish symptomatic carriers with a viral disease from patients actually ill from GABHS. In fact there is no acceptable gold standard predicting throat infection caused by GABHS [1, 8].
The situation becomes more difficult if the doctor wants to have predictive values for a nasopharyngeal culture predicting the presence of a disease with bacterial etiology. There are more bacterial species and symptoms to consider than in the situation with a sore throat caused by GABHS. However, some attempts have been made to provide predictive values for nasopharyngeal culture to predict bacterial etiology for otitis media. If the gold standard is the presence of bacteria in a middle ear aspirate, and if the middle ear is considered to be sterile under normal conditions, then the predictive value may actually predict presence of the disease, acute purulent otitis media, with bacterial etiology. For whooping cough, there might be other ways to find a gold standard predicting the presence of cough caused by B. pertussis. Since asymptomatic carriers of B. pertussis are uncommon, one may interpret the predictive values as predicting cough caused by B. pertussis. Thus, nasopharyngeal culture is usually used in suspected cases of B. pertussis or in the event of therapeutic failure in acute purulent otitis media.
Using nasopharyngeal cultures to predict bacterial etiology of long-standing cough caused by S. pneumoniae, H. influenzae or M. catarrhalis will result in the same problem as with throat infection caused by GABHS. There is no appropriate gold standard predicting the presence of the particular disease. Predictive values will predict presence of bacteria, not presence of disease with bacterial etiology.
In order to predict the presence of the disease “a sore throat caused by GABHS” or “long-standing cough caused by potentially pathogenic bacteria”, and not just presence of bacteria, the estimation of the truth has to be made some other way. Finding a gold standard predicting disease caused by the found bacterium is an important challenge.
One possible way to solve the problem is the use of construct validity, where one or more logical consequences of the specified disease are selected and defined as the gold standard. In this way the methacholine challenge test was constructed, where the response to methacholine exposure is considered to be a gold standard for asthma. Another way is to find a mathematical model that can produce predictive values predicting disease without using a gold standard. This website aims to present such a mathematical model.
Ronny Gunnarsson. Introduction to EPV [in Science Network TV]. Available at: https://science-network.tv/introduction-to-epv/. Accessed March 21, 2019.