Think positive

17 Oct 2007 01:05 am

Derek Lowe dives into a problem that is far too poorly understood by most of the public: the problem of false positives.

The news of a possible diagnostic test for Alzheimer’s disease is very interesting, although there’s always room to wonder about the utility of a diagnosis of a disease for which there is little effective therapy. The sample size for this study is smaller than I’d like to see, but the protein markers that they’re finding seem pretty plausible, and I’m sure that many of them will turn out to have some association with the disease.

But let’s run some numbers. The test was 91% accurate when run on stored blood samples of people who were later checked for development of Alzheimer’s, which compared to the existing techniques is pretty good. Is it good enough for a diagnostic test, though? We’ll concentrate on the younger elderly, who would be most in the market for this test. The NIH estimates that about 5% of people from 65 to 74 have AD. According to the Census Bureau (pdf), we had 17.3 million people between those ages in 2000, and that’s expected to grow to almost 38 million in 2030. Let’s call it 20 million as a nice round number.

What if all 20 million had been tested with this new method? We’ll break that down into the two groups – the 1 million who are really going to get the disease and the 19 million who aren’t. When that latter group gets their results back, 17,290,000 people are going to be told, correctly, that they don’t seem to be on track to get Alzheimer’s. Unfortunately, because of that 91% accuracy rate, 1,710,000 people are going to be told, incorrectly, that they are. You can guess what this will do for their peace of mind. Note, also, that almost twice as many people have just been wrongly told that they’re getting Alzheimer’s than the total number of people who really will.
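For anyone who wants to check the arithmetic in the quoted passage, here is a minimal sketch. It follows the passage in assuming that "91% accurate" means the test is right 91% of the time for both groups, which the study may not actually claim, and the variable names are my own.

```python
# Rough check of the Alzheimer's screening numbers quoted above.
# Assumption (as in the passage): "91% accurate" applies equally to people
# who will and who will not develop the disease.
population = 20_000_000
prevalence = 0.05      # NIH estimate for ages 65 to 74
accuracy = 0.91

will_get_ad = round(population * prevalence)          # 1,000,000
will_not_get_ad = population - will_get_ad            # 19,000,000

true_negatives = round(will_not_get_ad * accuracy)    # correctly told they are fine
false_positives = will_not_get_ad - true_negatives    # wrongly told they are at risk
true_positives = round(will_get_ad * accuracy)        # correctly flagged

# Share of all positive results that are actually right
share_of_positives_correct = true_positives / (true_positives + false_positives)

print(f"{false_positives:,} false positives vs {true_positives:,} true positives")
print(f"Chance a positive result is right: {share_of_positives_correct:.1%}")
```

Under those assumptions, roughly one positive result in three belongs to someone who will really develop the disease.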

People look at tests with small error rates--a false positive rate of, say, .5%--and conclude that if they test positive, that means it's overwhelmingly likely that they have the disease. But this is true only for conditions that are relatively frequent. Take a test for a disease that has a false positive rate of 5%, and a disease prevalence of 1 in 1000--lupus, say. If you test positive in a random assay, what are the odds that you actually have the disease?

Most people--even, apparently, a shocking number of doctors--would say that the odds are 95%. But this is all wrong. If you test 1,000 people for lupus, 1 of them will correctly test positive for lupus--and 50 of them will falsely test positive. The chances are only 1 in 51, less than 2%, that you actually have the disease.
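The 1-in-51 figure can be reproduced with a few lines of Python. This is only a sketch of the head-count reasoning above; it assumes the test catches every real case (no false negatives) and, like the post, rounds the healthy group up to the full 1,000.

```python
from fractions import Fraction

# Head-count version of the lupus example.
# Assumption: the test catches every real case (no false negatives).
tested = 1000
prevalence = Fraction(1, 1000)
false_positive_rate = Fraction(5, 100)

true_positives = tested * prevalence              # 1 person
# The post rounds the 999 healthy people up to 1,000 here
false_positives = tested * false_positive_rate    # 50 people

chance_really_sick = true_positives / (true_positives + false_positives)
print(chance_really_sick, float(chance_really_sick))
```

Using exact fractions keeps the result as 1/51 rather than a rounded decimal.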

These are in fact the actual numbers for anti-nuclear antibody tests and systemic lupus, at least as relayed to me by my immunologist after I got a borderline positive result on a screen. These suggest that no one should ever do a random ANA; the information it gives is garbage, particularly since they don't treat lupus until you manifest symptoms. Yet lots of doctors, including mine, do.

Comments (37)

It's NEVER lupus.

That's bizarre. Don't they know how to do an elementary tree diagram of conditional probabilities? That is, literally, freshman or sophomore math.

No need to test for lupus. It's NEVER lupus.

Megan writes: "Yet lots of doctors, including mine, do."

Just say no.

Besides, what are you, 28 or so? Why would you even have a doctor?

I will never understand the addiction so many people have for medical attention.

(Just turned 46 and I haven't seen a doctor since I was 21, unless giving blood counts.)

Your lupus example works if people are being tested randomly. But how often do people submit to random tests for disease? Re commenter 2, I know several people with lupus. Commenters 1 & 3 seem to be showing off, and commenter 3 is 1) obviously a guy and 2) apparently does not know any chicks.

All tests are designed to have higher false positives than false negatives. That's because the danger of allowing lupus to go undetected (and I bet a lot of people who develop symptoms do not think "lupus," even if a majority do) is far more concerning than freaking a healthy person out.

That being said, if you have no symptoms and are aware of the symptoms of lupus, and you can get a test turned around fairly quickly, it's probably best to forgo the test.

BTW, the most obvious example of a test where you want no false negatives, so you'll take a fair number of false positives, is HIV.

Bruce Schneier has discussed this issue in broad terms pretty often on his blog. It comes up in stuff like terrorist screening, where terrorists are so rare that even a really low false-positive rate can make your screening unworkable.

This reminds me of HIV testing. In the 1980's, when AIDS first attracted national attention, some states proposed HIV testing as a requirement for issuing a marriage license. This may have been an outgrowth of the politically correct (but factually false) idea that people of all categories were equally likely to have AIDS. In fact people who apply for marriage licenses constitute a very low-risk category, where the false positives would probably outnumber the true positives by a huge margin. Can you picture a couple planning their wedding for months and then when they get the blood test for the marriage license, getting hit with the diagnosis of AIDS?
That would have non-trivial results.

I will never understand the addiction so many people have for medical attention.

You may recall that this prince of compassion was one of the people snarling the loudest about the effrontery of conservatives in questioning whether the Frost family needed to be on welfare.

I was reading about that new test for Alzheimer's, and it seems to me ... oh, I forget.

The well-educated layman should understand ROC curves and their import. Maybe you can get your editor to commission a piece on them as a public service.

It also depends on the kind of false positive. I'm sure there are technical terms, but there should be two kinds - one which will be consistent for a given person, and one which is random. That is, if you follow up on the test, do you continue to get a false positive? If not, then there's a much lower personal cost to receiving a false positive, since there's a minuscule chance that you will get very far into treatment before the second test reveals that you're ok.

Paul Zrimsek posts! :
"You may recall that this prince of compassion was one of the people snarling the loudest about the effrontery of conservatives in questioning whether the Frost family needed to be on welfare."

I'm glad my humorous comment drew you out of the woodwork, chuckles. Welcome!

Obviously I have no problem with people who need health care getting it. I do think, though, that we're a nation of hypochondriacs, and that far too many of us are running off to doctors with every ache and pain. Capisce?

And the conservatives who blasted the Frost family suck, each and every one.

This is an interesting subject. The statistical aspect of it makes it apparent why many manufacturers strive for a six sigma standard of excellence. Though an error rate may be small (5%), applied to a sample of this magnitude it can produce impressive numbers. On a lighter note, I find it very reassuring to know that the number of hospitals that use a six sigma standard of excellence is growing.

Oh, and “No need to test for lupus. It's NEVER lupus.” … House quote, no?

It depends on the prevalence of the disease in the population that the patient belongs to (ie, female, over 50 etc). I'm a medical student now, we're taught ROCs and how to determine if doing a test is worthwhile. We were told "What you don't want is a false positive, cause now you need to do something about it (or the patient will see you in court)."

http://alejandrogonzalez.typepad.com/my_weblog/2007/09/dont-just-order.html

Could you post the math for us who don't know how to figure out the answer, that "of them will correctly test positive for lupus--and 50 of them will falsely test positive. The chances are only 1 in 51, less than 2%, that you actually have the disease." Thanks, that would help.

Megan is an asthmatic.
ML&J is an asshole.

I meant to write

"Could you post the math for us who don't know how to figure out the answer, that "1 of them will correctly test positive for lupus--and 50 of them will falsely test positive. The chances are only 1 in 51, less than 2%, that you actually have the disease." Thanks, that would help."

Is this really easy conditional probability? How do I do the "elementary tree diagram of conditional probabilities" that ScentOfViolets writes about above? I thought this required a Bayesian equation?

How do you know that out of 1000, 1 will correctly test positive for lupus, when you know the false positive rate is 5%--I would appreciate it as I'm interested in the math.

Here's a section from False Positives in medical diagnoses from Wikipedia, and they're using a Bayesian test.

http://en.wikipedia.org/wiki/Bayesian_inference#False_positives_in_a_medical_test

"False positives in a medical test

False positives result when a test falsely or incorrectly reports a positive result. For example, a medical test for a disease may return a positive result indicating that patient has a disease even if the patient does not have the disease. We can use Bayes' theorem to determine the probability that a positive result is in fact a false positive. We find that if a disease is rare, then the majority of positive results may be false positives, even if the test is accurate.

Suppose that a test for a disease generates the following results:

* If a tested patient has the disease, the test returns a positive result 99% of the time, or with probability 0.99
* If a tested patient does not have the disease, the test returns a negative result 95% of the time, or with probability 0.95.

Suppose also that only 0.1% of the population has that disease, so that a randomly selected patient has a 0.001 prior probability of having the disease.

We can use Bayes' theorem to calculate the probability that a positive test result is a false positive."
Is this what you are using for your results?

Fred, it depends on the likelihood of the disease. If there is a 0% chance people in your demographic can have the disease and the false positive rate is 5% and you test positive, you still have a 0% chance of getting it. So you take 1 divided by the following: your false positive rate (5%) times the number of people per case implied by the overall prevalence of the disease (1,000, since it is 1 in 1000 in Megan's case), plus 1

or

1 / ((5% * 1000) + 1) = 1/51 = .0196, or just under 2%.

At least this is how I understood it

According to Wikipedia,
http://en.wikipedia.org/wiki/Bayesian_inference#False_positives_in_a_medical_test

the probability of a false positive = 1 minus the probability that the patient actually has the disease given the positive test result. So if you're saying the false positive for lupus is 5%, then by algebra, the probability that the patient actually has the disease given the positive test result is 1 minus the false positive.

In your case, 1- false positive of 5% = 95% probability that the patient actually has the disease given a positive test result.

So, from your information, isn't it true, by this Bayesian analysis from Wikipedia, that if you KNOW the false positive rate is 5%, given a positive test result, you are 95% likely to have lupus: 1-false positive=probability that the patient actually has the disease? That seems to flow from the definition of what false positive means. So aren't the doctors you mention right?

From wikipedia:
"Let A represent the condition in which the patient has the disease, and B represent the evidence of a positive test result. Then, probability that the patient actually has the disease given the positive test result is

\begin{aligned}
P(A \mid B) &= \frac{P(B \mid A)\,P(A)}{P(B \mid A)\,P(A) + P(B \mid \mathrm{not}\,A)\,P(\mathrm{not}\,A)} \\
&= \frac{0.99 \times 0.001}{0.99 \times 0.001 + 0.05 \times 0.999} \\
&\approx 0.019.
\end{aligned}

and hence the probability that a positive result is a false positive is about (1 – 0.019) = 0.981."
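The quoted formula is easy to try out. Below is a minimal sketch; the function name and argument names are mine, not Wikipedia's. Note that the 5% in the example is the chance of a positive result given that you are healthy, which is a different quantity from the chance of being healthy given a positive result.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(disease | positive test) by Bayes' theorem.

    prior               -- P(A), share of the population with the disease
    sensitivity         -- P(B | A), chance a sick patient tests positive
    false_positive_rate -- P(B | not A), chance a healthy patient tests positive
    """
    true_pos = sensitivity * prior
    false_pos = false_positive_rate * (1 - prior)
    return true_pos / (true_pos + false_pos)


# Numbers from the quoted Wikipedia example
p = posterior(prior=0.001, sensitivity=0.99, false_positive_rate=0.05)
print(f"P(disease | positive)     = {p:.3f}")
print(f"P(false alarm | positive) = {1 - p:.3f}")
```

Raising `prior` while leaving the other two arguments alone shows how quickly a positive result becomes more believable when the condition is common.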

If I'm right about this I'll add it to my recent post about misunderstanding/misusing statistics...

http://trickledown.wordpress.com/2007/10/16/the-trouble-with-statistics-are-we-worse-off-financially-than-previous-generations-or-not/

More proof that ML&J has no clue about anything that he ever talks about. As has been noted above, Megan is both a female and an asthmatic. She has mentioned that she has other health issues where ongoing monitoring is rather important.

For many, if not the vast majority of, single women, an ongoing relationship with a doctor is rather key for ongoing prescriptions and several recurrent tests. Further, all women of childbearing age need a doctor so that they have a good relationship with a network of medical specialists if/when they get pregnant. Your first or second trimester is no time to be shopping for a doctor, pouring out your medical history to n strangers until you find a relationship that works with someone who has hospital privileges at someplace that you trust and will work with your situation. This is why nearly all of my male friends in their 20s and 30s don't have a doctor barring a personal friendship or ongoing condition, while I do not know any woman who doesn't have at least one doctor - they usually have several, maintaining contact with some in other cities to fall back on in case of moves or if they can't reach a doc for some reason.

Young men can easily get by with a walk in clinic and the emerg department because we're pretty much guaranteed to only have two types of problems: a minor acute or chronic condition treatable by a walk in clinic or a major acute condition that needs emerg. If a major acute condition results in a medium to long-term chronic condition, we'll pick up a doctor (or set) that specialises in that area as needed, if we don't just try and walk it off.

One of my math professors said that when people, even academics, who are non-mathematicians/statisticians (or not in inherently mathematical or statistical fields such as physics and economics respectively) attempt to use mathematical and statistical methods they did not properly learn and do not properly understand, they will almost inevitably end up misusing them. While he said this in the context of the Malthusians of the 60's and 70's (most of whom were political scientists and sociologists and the like), I have found it to be true in many other situations.

Fred: you and Wikipedia have made a simple mistake. The false positive figure is based on the total population that takes the test, whereas the false negative is based on the population who have the condition but showed as negative.

The best way to model this is two bell curves. The x-axis is the value of the testable variable and the y-axis is the %ge of the 2 reference populations with that value.

In designing your test, there are two aims: to get as wide a separation between the two curves as possible by appropriate selection of testable variable, and to choose a threshold value for that variable that includes as much of the positive population and as little of the negative population as is possible while still meeting the overall aim of the test.
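To make the two-curve picture concrete, here is a small sketch using Python's standard library. The two distributions and the thresholds are made-up numbers chosen purely for illustration, not data from any real test.

```python
from statistics import NormalDist

# Hypothetical distributions of the tested variable in each population.
# These parameters are invented for illustration only.
healthy = NormalDist(mu=0.0, sigma=1.0)
diseased = NormalDist(mu=2.0, sigma=1.0)


def rates(threshold):
    """Return (sensitivity, false_positive_rate) when any value above
    the threshold is reported as a positive."""
    sensitivity = 1 - diseased.cdf(threshold)
    false_positive_rate = 1 - healthy.cdf(threshold)
    return sensitivity, false_positive_rate


# Sliding the threshold trades missed cases against false alarms.
for threshold in (0.0, 0.5, 1.0, 1.5, 2.0):
    sens, fpr = rates(threshold)
    print(f"threshold {threshold:.1f}: "
          f"sensitivity {sens:.2f}, false positive rate {fpr:.2f}")
```

Plotting the false positive rate against the sensitivity for every possible threshold gives the ROC curve that other commenters mention. Moving the two means further apart is the "wider separation" aim, and picking one row of the printout is the "choose a threshold" aim.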

The nature of typical distributions results in there always being some portion of each curve that won't satisfy the threshold. Black holes radiating particles, the proportion being inversely related to size thanks to the Heisenberg principle letting them break the lightspeed barrier, is an excellent example of this.

This creates massive problems when the populations in the two curves are very different, but you want to include as much of one curve as possible but need to use a variable where the two curves are relatively close and similarly shaped. Doing tests on large populations of generally similar individuals, such as medical tests do, means that your two curves will be very similar in shape, though the populations can differ markedly.

The terrorist vs business/holiday traveler test is one that presents significant problems in design. The costs of a false negative are rather catastrophic - 9/11 was on the medium to low-end in terms of costs of what could have resulted from their plan. Meanwhile the costs of false positives and the testing protocol are borne over a long period of time by a large population, with false positives facing several stages of escalating tests as part of an acknowledgement of the likelihood of a false positive. In a year there are billions of tests of air travelers, with 10-100 possible positives.

The costs of failure are so catastrophic that a successful identification leads to an increase in the strictness of the testing protocol, while attacks that should be viewed as successfully prevented (such as the summer's failed London bombings and the attack on Prestwick airport) are seen as failures of the overall system. Airport security is ridiculous because there is a 7 or 8 order of magnitude difference in the population of terrorists and innocent travelers. A 0.01% false positive rate on a billion screenings would give you 100,000 falsely IDed terrorists. Medical screening isn't quite that challenging, but it does explain why mass screening tends to be a bad idea, as well as a waste of money that creates a larger waste of money.

"Fred: you and Wikipedia have made a simple mistake. The false positive figure is based on the total population that takes the test, whereas the false negative is based on the population who have the condition but showed as negative."

Thanks for commenting Hey--this could help, as I said, I'm not too familiar with the math.

However, could you please point me to some website where I can get a good description of the difference you're talking about between the false positive and false negative stuff? And could you tell me why the difference you mention would apply to Megan's problem as stated here? We're only discussing a supposed false positive rate of 5%, and not discussing any false negative rate. I don't see where false negatives fit in the problem as cited here. According to the Wikipedia section I cited above, the probability that the patient actually has a disease given a positive test result is 1-the false positive rate of the test. Megan has given the false positive rate as 5%, so according to this equation, the probability that the patient actually has lupus given the positive test result, and given a false positive rate of 5% for the test, is 95%.

Now, given this information, is this method from Wikipedia wrong? I don't see where information on false negatives fit in, Megan didn't give any info on false negatives. It looks like you can use the Bayesian equation with just information on "true positives," "true negatives," and the rate of the disease in the population to calculate the false positive rate--why do you mention needing false negatives at all? Thanks for any help/information.

These suggest that no one should ever do a random ANA; the information it gives is garbage, particularly since they don't treat lupus until you manifest symptoms. Yet lots of doctors, including mine, do.

I think the operative word here is "random". I am unaware (and I am a pathologist, i.e., I supervise clinical laboratories) of great numbers of physicians "randomly" ordering ANA's. In your case, you may have had symptoms suggesting autoimmune disease that you have not divulged, or your physician may have been suspicious for some reason. I very much doubt it was done randomly, or for "screening".

But this talk about statistics misses something essential about the context in which you raised it: physicians do not order tests based on the best statistical models devised by the greatest minds in their fields, or because they get off on over-ordering tests, or they don't understand sensitivity and specificity; they often order tests to cover their asses from the depredations of trial lawyers. We routinely over-test, not because we don't understand the costs to the system involved, and not (for you cynics out there) because we stand to make money from it, but because the standard that we are held to is one that itself does not rely on logical statistical analysis; the standard is perfection.

Next time you show up in court to defend the physician who didn't order the ANA on the fair-skinned female with vague complaints who later is diagnosed with lupus, but who sues for delayed diagnosis because it's your fault her kidneys aren't functioning properly - and you convince a jury that not ordering the ANA was perfectly logical statistically - is the time I will accept your approach to this issue as being a pragmatic one. In the meantime, I will continue to try to talk female relatives out of having their CA-125 checked (I usually don't win that one either).

Okay,
maybe it looks like the Wikipedia entry oversimplifies!

Here's a post which seems in line with what Megan was saying (I wish she would have shown the math/given a citation in the first place)

An Intuitive (and Short) Explanation of Bayes’ Theorem

http://betterexplained.com/articles/an-intuitive-and-short-explanation-of-bayes-theorem/

Anatomy of a Test

The article describes a cancer testing scenario:

* 1% of women have breast cancer (and therefore 99% do not).
* 80% of mammograms detect breast cancer when it is there (and therefore 20% miss it).
* 9.6% of mammograms detect breast cancer when it’s not there (and therefore 90.4% correctly return a negative result).

Put in a table, the probabilities look like this:

bayes_table.png

How do we read it?

* 1% of people have cancer
* If you already have cancer, you are in the first column. There’s an 80% chance you will test positive. There’s a 20% chance you will test negative.
* If you don’t have cancer, you are in the second column. There’s a 9.6% chance you will test positive, and a 90.4% chance you will test negative.

'Now suppose you get a positive test result. What are the chances you have cancer? 80%? 99%? 1%?

Here’s how I think about it:

* Ok, we got a positive result. It means we’re somewhere in the top row of our table. Let’s not assume anything — it could be a true positive or a false positive.
* The chances of a true positive = chance you have cancer * chance test caught it = 1% * 80% = .008
* The chances of a false positive = chance you don’t have cancer * chance test caught it anyway = 99% * 9.6% = 0.09504

The table looks like this:

bayes_table_computed.png

And what was the question again? Oh yes: what’s the chance we really have cancer if we get a positive result. The chance of an event is the number of ways it could happen given all possible outcomes:

Probability = desired event / all possibilities

The chance of getting a real, positive result is .008. The chance of getting any type of positive result is the chance of a true positive plus the chance of a false positive (.008 + 0.09504 = .10304).

So, our chance of cancer is .008/.10304 = 0.0776, or about 7.8%.

Interesting — a positive mammogram only means you have a 7.8% chance of cancer, rather than 80% (the supposed accuracy of the test). It might seem strange at first but it makes sense: the test gives a false positive 10% of the time, so there will be a ton of false positives in any given population. There will be so many false positives, in fact, that most of the positive test results will be wrong.

If you take 100 people, only 1 person will have cancer. Another 10 will not have cancer but will get a false positive result. Getting a positive result means you only have a roughly 1/11 chance of being the person who really has cancer (7.8% to be exact)."
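The same steps can be written out in Python. This sketch uses the article's mammogram figures for the test itself and then varies the prevalence, which is my own addition, to show how much the answer depends on how common the disease is.

```python
# Test characteristics from the mammogram example above
P_POS_GIVEN_CANCER = 0.80     # test catches cancer when it is there
P_POS_GIVEN_HEALTHY = 0.096   # test "detects" cancer when it is not there


def chance_of_cancer_given_positive(prevalence):
    """Desired event / all possibilities, restricted to positive results."""
    true_positive = prevalence * P_POS_GIVEN_CANCER
    false_positive = (1 - prevalence) * P_POS_GIVEN_HEALTHY
    return true_positive / (true_positive + false_positive)


# 1% is the article's figure; the other prevalences are hypothetical.
for prevalence in (0.001, 0.01, 0.10, 0.50):
    chance = chance_of_cancer_given_positive(prevalence)
    print(f"prevalence {prevalence:6.1%} -> chance of cancer {chance:.1%}")
```

The test is identical in every row; only the share of people who actually have the disease changes.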

The tree diagram is really quite simple for this sort of analysis. Let's put a few figures into this. Suppose one person in a thousand has shingles (not realistic, but I need some quick numbers), and suppose the test gives the right answer 95% of the time whether or not you have the disease. Then in a population of a million, 999,000 will not have shingles, while 1,000 will. So draw the first part of the tree with two lines coming out of a point; the top line will have 1,000 associated with it, for actually having the condition, and the bottom line will have 999,000 associated with it, for actually being free of the condition.

Now, at the points at the end of those lines (called nodes), draw two more lines coming out, designating the test for the disease. In each case, the top line will be associated with a positive indicator from the test, the bottom line with a negative indicator. So going from the top, we have 95% of 1,000, or 950. The next down, indicating having the disease but testing negative, is 5% of 1,000, or 50 (note that these sum to 1,000; this is always the case at whatever node of the tree you are at). Going down to the top line of the next node, we have the number of people who test positive even though they do not have the disease. This is 5% of 999,000, or 49,950. Finally, we have as the bottom point those who are shingles free and do not test positive; this is 95% of 999,000, or 949,050 (again, note that these sum up to the number coming into the node, 999,000).

So now we can ask questions of a conditional nature. For example, given that you test positive, what is the probability that you actually have the disease? That's just the number of those who have the disease and tested positive over all the positives, or: 950/(950+49,950)=950/50,900=0.019=1.9%.

I hope this helps; usually I have a blackboard to draw the diagram. And as so often the case in math, one picture is worth a thousand words.

Oh, to show how nice this is for organizing small amounts of data types, let's figure out the probability of a false negative. That's just the number of people who have the disease but test negative over all the negatives: 50/(50+949,050)=50/949,100=0.000053=0.0053%.
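Lacking a blackboard, the tree can also be held in a small dictionary. This is just a sketch of the shingles example; it assumes, as the example does, a test that is right 95% of the time for sick and healthy people alike.

```python
# Tree for the shingles example: 1 in 1,000 prevalence, and a test assumed
# to be right 95% of the time for both sick and healthy people.
population = 1_000_000
sick = population // 1000            # 1,000
healthy = population - sick          # 999,000

# Each key is a path through the tree: (true condition, test result)
tree = {
    ("sick", "positive"): sick * 95 // 100,
    ("sick", "negative"): sick * 5 // 100,
    ("healthy", "positive"): healthy * 5 // 100,
    ("healthy", "negative"): healthy * 95 // 100,
}

# The branches leaving a node must add back up to the node itself
assert tree["sick", "positive"] + tree["sick", "negative"] == sick
assert tree["healthy", "positive"] + tree["healthy", "negative"] == healthy

all_positives = tree["sick", "positive"] + tree["healthy", "positive"]
all_negatives = tree["sick", "negative"] + tree["healthy", "negative"]

# Conditional questions are just one leaf divided by a sum of leaves
p_sick_given_positive = tree["sick", "positive"] / all_positives
p_sick_given_negative = tree["sick", "negative"] / all_negatives

print(f"P(sick | positive) = {p_sick_given_positive:.2%}")
print(f"P(sick | negative) = {p_sick_given_negative:.4%}")
```

A test with different error rates for sick and healthy people only needs different percentages on the four leaves; the rest stays the same.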

Another example of how to do this calculation (with chart):

http://alejandrogonzalez.typepad.com/my_weblog/2007/09/how-to-calculat.html

1,000 patients. The prevalence of a disease (percentage of people who have it) is 10%. The sensitivity (ability to detect true positives) of the test is 80% and the specificity (ability to detect true negatives) is 40%.

This test would return 80 true positives and 540 false positives even though its "sensitivity" is 80%.

I assume they taught me how to do this @ Tufts so that I'd use it when I'm out diagnosing.
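Here is a minimal sketch of that 1,000-patient example as a two-by-two table. The predictive values at the end are not in the comment above; they are the standard quantities you would read off such a table.

```python
# The 1,000-patient example: prevalence 10%, sensitivity 80%, specificity 40%
patients = 1000
prevalence = 0.10
sensitivity = 0.80   # share of sick patients who test positive
specificity = 0.40   # share of healthy patients who test negative

sick = round(patients * prevalence)
healthy = patients - sick

true_positives = round(sick * sensitivity)
false_negatives = sick - true_positives
true_negatives = round(healthy * specificity)
false_positives = healthy - true_negatives

# Predictive values: how much a given result can be trusted
ppv = true_positives / (true_positives + false_positives)
npv = true_negatives / (true_negatives + false_negatives)

print(f"true positives {true_positives}, false positives {false_positives}")
print(f"false negatives {false_negatives}, true negatives {true_negatives}")
print(f"positive predictive value {ppv:.1%}, negative predictive value {npv:.1%}")
```

The low specificity is what swamps the result: most of the 900 healthy patients test positive, so a positive tells you very little.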

Can I echo the call for a public service essay describing how ROC curves work? I've had zero luck explaining them to two different well educated audiences, and they're such a valuable tool for understanding any situation involving uncertainty.

And in regards to the commenters defending certain specificity/sensitivity trade-offs: you're absolutely right that low false negative risk curves are desirable in medical tests. But we need to make sure of three things. First, that doctors administering tests understand the statistical situation as well as Alejandro does. Secondly, that that understanding is effectively communicated to patients who test positive (as I think he at least can do). And finally that any legislator or bureaucrat who is tempted to mandate such tests (whether for security or health programs) understands them.

Paul Zrimsek writes: "Megan is an asthmatic.
ML&J is an asshole."

Then I can see why she needs a doctor. Thanks for the info.

By the way, is "sek" Croatian for "job"?

Hey says: "For many, if not the vast majority of, single women, an ongoing relationship with a doctor is rather key for ongoing prescriptions and several recurrent tests. Further, all women of childbearing age need a doctor so that they have a good relationship with a network of medical specialists if/when they get pregnant. Your first or second trimester is no time to be shopping for a doctor, pouring out your medical history to n strangers until you find a relationship that works with someone who has hospital privileges at someplace that you trust and will work with your situation. This is why nearly all of my male friends in their 20s and 30s don't have a doctor barring a personal friendship or ongoing condition, while I do not know any woman who doesn't have at least one doctor - they usually have several, maintaining contact with some in other cities to fall back on in case of moves or if they can't reach a doc for some reason."

I don't think I've ever seen a more perfect illustration of the hysterical hypochondriac way of thinking.

So I guess all women need a Verizon-network-gang-of-doctors following them around. That must be where that commercial campaign idea came from.

I especially loved the "maintaining contact with some in other cities" bit.

You should work for The Onion.

ScentOfViolets, I think the probability of a false positive with these tests is usually different than that of a false negative, rather than identical. So you can't just plug in 5% on both sides. Some tests will give you a fair number of false positives but almost never a false negative.

No link to overcomingbias?

No Yudkowsky plug?

The Bayes Council is Very Disappointed.

Certainly. But then you'd just use a different tree. I was somewhat surprised that no one seems to have heard of the tree method for assessing conditional probabilities, it's a very versatile, very elementary method of organizing data to answer questions like these. The only problem is that as the conditionals multiply, so does the branching of the tree . . . potentially exponentially.

Maybe people are forgetting: when an individual tests positive for ANA, you are right that it does not immediately mean lupus, but there are 80 different diseases, all very serious, that are connected to autoimmune disorders. These are diseases that can affect anyone of any age, whether you are 10, 28, 46... A healthy individual scoring a positive result is as frequent as an unhealthy individual scoring a negative. For example, people who suffer from lupus can still score a negative ANA. A positive result cannot be taken lightly, but not too seriously either. Your diagnosis depends on your current symptoms and recorded health history.