Positive Predictive Value and Prevalence

October 7, 2020 | Commentary

This is not an easy or intuitive concept to understand, and I struggle with it, but it is critical because it is the reason we must have an incredible number of false positives, certainly in Minnesota and probably in most states.  Let us take a PCR test with a typical supposed sensitivity (the ability to detect people who actually have the disease) of 95% and specificity (the ability to correctly clear people who don’t have the disease) of 99%.  (Let’s ignore the fact that these are the lab-validated sensitivity and specificity and that real-world clinical performance will be lower.  I am using numbers which appear in a number of studies of PCR tests.)  So you would think that means if I test 100 people who have the disease, I am only going to wrongly tell 5 of them they didn’t have it, and if I test 100 people who don’t have it, I am only going to erroneously tell 1 of them that he was positive.

But you aren’t testing people you know do or don’t have the disease; you are testing a bunch of people who may or may not have it, and the key factor in how far you can trust a positive result is the actual prevalence of the disease among the people tested.  In Minnesota, we haven’t had a positivity rate over 5% for a long time.  What happens if you test a population with zero prevalence?  Every positive is false.  How much does that change as prevalence increases?  Let’s explore that.

To date Minnesota has had 106,651 cases of the disease.  The population is about 5,690,000.  So cumulative prevalence is 1.9%.  Very low.  But wait, don’t make that mistake: cumulative prevalence isn’t what matters.  Current prevalence is what matters when you are testing today.  Minnesota currently has 10,045 patients in the unrecovered bucket. (I am going to ignore here that a number of these “cases” are themselves false positives.  Doing so makes the situation look better than it is.)  So current prevalence, as far as we know, is 0.18%.  That is a very, very low prevalence.  The number of unrecovered patients hasn’t varied a lot for weeks.  So we have some new cases every day, but also some people who recover every day.  If you took the people who have already been infected out of the population, on the theory that they are unlikely to become infected again, you have an ongoing prevalence of disease in the population of maybe 0.2%, at most.
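For anyone who wants to check these prevalence figures, here is a minimal Python sketch using only the counts quoted above.  The variable names are my own, and the "adjusted" figure reflects one reading of the adjustment described (dropping recovered people from the denominator), so treat it as an assumption rather than an official method.

```python
# Figures quoted in the post (Minnesota, early October 2020).
population = 5_690_000
cumulative_cases = 106_651
unrecovered = 10_045

# Everyone ever counted as a case, as a share of the population.
cumulative_prevalence = cumulative_cases / population

# People counted as currently infected, as a share of the population.
current_prevalence = unrecovered / population

# Assumption: remove recovered people from the denominator, on the
# theory that they are unlikely to be infected again.
recovered = cumulative_cases - unrecovered
adjusted_prevalence = unrecovered / (population - recovered)

print(f"cumulative: {cumulative_prevalence:.2%}")
print(f"current:    {current_prevalence:.2%}")
print(f"adjusted:   {adjusted_prevalence:.2%}")
```

All three figures are well under 2%, and the two current figures both round to roughly 0.2%.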

We are averaging over 20,000 tests a day.  If you test 20,000 people in a day, when the population prevalence is 0.2%, how many real positives should occur?  An astounding 40.  How many cases are we reporting a day?  Several hundred, often over 1,000.  Now, let us cut the state some slack and acknowledge that this isn’t random testing.  It is contact tracing and people who think they have symptoms, but it is also a lot of forced testing by employers and schools of people who probably aren’t sick.  Multiply the 0.2% by five for selection factors and you get 1%, or 200 real cases a day.
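The arithmetic in this paragraph can be sketched in a few lines of Python.  The function name and the `selection_multiplier` variable are illustrative; the factor of five is the rough allowance for non-random testing described above, not a measured value.

```python
def expected_infected(tests_per_day: int, prevalence: float) -> float:
    """Expected number of tested people who are actually infected."""
    return tests_per_day * prevalence

tests_per_day = 20_000
population_prevalence = 0.002   # 0.2% current prevalence
selection_multiplier = 5        # assumed allowance for targeted testing

random_testing = expected_infected(tests_per_day, population_prevalence)
targeted_testing = expected_infected(
    tests_per_day, population_prevalence * selection_multiplier
)

print(round(random_testing))    # 40
print(round(targeted_testing))  # 200
```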

Now you begin to see why athletes who test positive are often negative on retesting: they were false positives.  This is why I have heard from so many readers about either themselves or people they know having the same experience.  I just had this happen last week when a co-worker got tested, was told he was positive, was retested, and that test was negative.  Now here is another interesting question.  The state knows how many times a person has a positive test quickly followed by a negative result.  Why don’t they give us that information?  Why don’t they use it to adjust their case estimates?  Imagine the consequences if you don’t get a retest to verify positivity.

There is a formula for accuracy of a diagnostic test depending on prevalence.  You can google and find a lot of explanations.  Here is one, from the National Institutes of Health.  (Accuracy Formulas)   The overall formula is (the sensitivity of the test times prevalence), plus (the specificity of the test times (one minus prevalence)).  So let’s figure out how accurate testing is in Minnesota under this formula.  Let’s use my cut-the-state-some-slack number of 1% current prevalence.  That would mean there are currently about 57,000 Minnesotans with CV-19.  You can see I am being generous.  Overall accuracy looks like 99%, because there are so many negatives.
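Here is the overall accuracy formula as a short Python sketch, using the 95% sensitivity, 99% specificity, and 1% prevalence assumed in this post.  The function name is my own.

```python
def overall_accuracy(sensitivity: float, specificity: float,
                     prevalence: float) -> float:
    """Share of all test results (positive and negative) that are correct."""
    return sensitivity * prevalence + specificity * (1 - prevalence)

accuracy = overall_accuracy(sensitivity=0.95, specificity=0.99, prevalence=0.01)
print(f"{accuracy:.1%}")  # 99.0%
```

At low prevalence the second term dominates, so overall accuracy is driven almost entirely by how well the test handles people who are not infected.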

But how does the positive predictive value look?  That formula is (sensitivity times prevalence), divided by ((sensitivity times prevalence) plus ((1 minus specificity) times (1 minus prevalence))).  Okay, check my math, many of you are better than I am at this, but it is 49%.  Only half the time is the positive result right.  Half the time it is false.  Change the prevalence to 2%, and you still have an alarming number of false positives: about a third of all positive results.
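To make it easy to check the math, here is the positive predictive value formula as a small Python sketch, with the same assumed 95% sensitivity and 99% specificity.  The function and variable names are illustrative.

```python
def positive_predictive_value(sensitivity: float, specificity: float,
                              prevalence: float) -> float:
    """Probability that a person with a positive result is truly infected."""
    true_positive_share = sensitivity * prevalence
    false_positive_share = (1 - specificity) * (1 - prevalence)
    return true_positive_share / (true_positive_share + false_positive_share)

for prevalence in (0.01, 0.02):
    ppv = positive_predictive_value(0.95, 0.99, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.0%}, "
          f"false share of positives {1 - ppv:.0%}")
```

With these inputs the sketch gives a PPV of about 49% at 1% prevalence and about 66% at 2%.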

Now, again, I ask, why, when the state can figure this out as well as I can, aren’t they double testing every positive?  And why aren’t they explaining this to the population?  Oh, by the way, the share of our positives that are false is likely 50%.  Even if it were 20%, wouldn’t that be alarming?  Again, someone check my math, but I am pretty sure it is right.  And I know the state owes us an explanation of its view on this subject.

Update:   Thanks to Darin Oenning for reviewing my math and other comments.  Darin points out that even if you assume a 3% prevalence rate, about 25% of positive results are false.  It seems to me that it is hard to avoid the fact that we have a lot of false positives in Minnesota.
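The 3% case can also be checked by counting expected results for one day of testing, which some readers may find more intuitive than the formula.  This is a minimal Python sketch assuming 20,000 tests a day, 95% sensitivity, and 99% specificity; the function name and dictionary keys are my own.

```python
def expected_counts(tests: int, sensitivity: float, specificity: float,
                    prevalence: float) -> dict:
    """Expected number of each test outcome for one batch of tests."""
    infected = tests * prevalence
    uninfected = tests - infected
    return {
        "true_positives": infected * sensitivity,
        "false_negatives": infected * (1 - sensitivity),
        "false_positives": uninfected * (1 - specificity),
        "true_negatives": uninfected * specificity,
    }

counts = expected_counts(tests=20_000, sensitivity=0.95,
                         specificity=0.99, prevalence=0.03)
positives = counts["true_positives"] + counts["false_positives"]
false_share = counts["false_positives"] / positives

print(f"{counts['true_positives']:.0f} true positives, "
      f"{counts['false_positives']:.0f} false positives, "
      f"{false_share:.0%} of positives are false")
```

Under these assumptions, roughly 570 of the day's positives are real and roughly 194 are false, which is where the figure of about 25% comes from.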
