Words are slippery. They have many meanings. Change an ending and you change everything. Not exactly everything, but enough to confuse everybody.
Bayes theorem is not the same as Bayesianism.
Bayes theorem is an elementary mathematical truth of elementary probability.
Bayesianism is a big trend in statistics for creating and interpreting new statistical tests.
So it is fairly obvious now that they are different. They are certainly related, but they are still different things altogether.
Some details are in order.
Thomas Bayes was the man after whom the famous Bayes theorem (of elementary probability) is named.
The Bayesian school of statistics was likewise named after him, because of how the theorem treats probabilities.
Sure, all three, the man, the theorem, and the statistics philosophy, are related. But not as closely as you would imagine.
First, Bayes was an eighteenth-century Presbyterian minister in Kent, England. I'm sure he was quite important to the people around him, but the theorem named after him was never published by Bayes himself. It was stated, more or less in passing, by someone else, who also mentioned in passing that Bayes had discovered it. That someone else was Richard Price, who was also probably important to the people around him but never had anything like a theorem named after him.
Now to the theorem.
Symbolically, Bayes theorem is, at its simplest,
$$
P(A|B) = \frac{P(B|A) P(A) }{P(B)}
$$
Simple enough, if you know elementary probability. Translated into English, this means that the probability of an event $A$ having occurred, given that you know $B$ has occurred, can be computed as the probability of $A$ times the probability of $B$ given $A$, divided by the probability of $B$.
Big deal, right? So what? The clever twist is that it allows you to reverse the direction of the conditioning: to use the past history of $A$'s and $B$'s together to determine the probability of something you don't know about $A$ from things you do know (the three quantities on the right-hand side). The calculation is elementary: you just count all your events (where $A$, $B$, both $A$ and $B$, and neither have occurred). $P(x)$ is the fraction given by the number of events where $x$ happened divided by the total number of events. $P(x|y)$ is the fraction of the events where $y$ happened in which $x$ also happened (note all these fractions are between 0 and 1).
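To make the counting concrete, here is a minimal sketch in Python, using made-up event data, that computes these fractions by counting and checks that the two sides of the theorem agree:

```python
# A made-up record of past events: each entry says whether A and B occurred.
events = [
    {"A": True,  "B": True},
    {"A": True,  "B": False},
    {"A": False, "B": True},
    {"A": False, "B": False},
    {"A": True,  "B": True},
    {"A": False, "B": True},
]

total = len(events)
n_A = sum(e["A"] for e in events)               # events where A occurred
n_B = sum(e["B"] for e in events)               # events where B occurred
n_AB = sum(e["A"] and e["B"] for e in events)   # events where both occurred

P_A = n_A / total
P_B = n_B / total
P_A_given_B = n_AB / n_B   # fraction of B-events that are also A-events
P_B_given_A = n_AB / n_A   # fraction of A-events that are also B-events

# Bayes theorem: P(A|B) should equal P(B|A) * P(A) / P(B)
print(P_A_given_B, P_B_given_A * P_A / P_B)  # both print the same number
```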
The proof is also elementary. From the explanation above, $P(x|y)$ is the number of events where both $x$ and $y$ occur divided by the number of $y$ events (whether $x$ occurred or not). That is
$$
P(x|y) = \frac{P(x {\rm \ and\ } y)}{P(y)}
$$
Since "$x$ and $y$" is no different from "$y$ and $x$", this means that
$$
P(A|B) P(B) = P(A {\rm \ and\ } B) = P(B|A) P(A)
$$
Divide the two ends by $P(B)$ and you're done.
This is a very short chain of inferences, almost purely arithmetic. Thinking of the Venn diagram, the division is really the proportion that one set takes up inside another, larger set. A little more involved, but giving even more insight, is to look at a 2x2 contingency table: $A$ given $B$ is a proportion within a row, $B$ given $A$ is a proportion within a column, and the theorem lets you move between row and column.
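As an illustration, here is a sketch with a made-up 2x2 table of counts (rows indexed by $B$, columns by $A$), showing that conditioning on $B$ means normalizing within a row, while conditioning on $A$ means normalizing within a column:

```python
# Made-up 2x2 contingency table of counts.
# Rows: B occurred / B did not occur; columns: A occurred / A did not occur.
counts = {
    ("B", "A"): 30, ("B", "not A"): 70,           # the B row
    ("not B", "A"): 10, ("not B", "not A"): 890,  # the not-B row
}

row_B = counts[("B", "A")] + counts[("B", "not A")]      # all B events
col_A = counts[("B", "A")] + counts[("not B", "A")]      # all A events
total = sum(counts.values())

P_A_given_B = counts[("B", "A")] / row_B   # proportion within the B row
P_B_given_A = counts[("B", "A")] / col_A   # proportion within the A column

# Moving between row and column via Bayes theorem:
P_A = col_A / total
P_B = row_B / total
print(P_A_given_B, P_B_given_A * P_A / P_B)  # same number both ways
```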
To use it, tabulate a number of events where both $A$ and $B$ may have occurred. That table lets you calculate any of the probabilities in the theorem directly, just by counting.
It might seem at this point that the theorem is a lot of extra thought work: if you have the contingency table already, just compute anything you want right there, $A$ given $B$, $B$ given $A$, whatever. The point of using the theorem is that often you are not presented with the contingency table, but you do have a good idea of some of the values. The theorem lets you compute $A$ given $B$ if $B$ given $A$ is somehow already known to you (and often you don't even know $P(B)$ directly).
To make this more concrete (a common example), let $A$ be 'patient has toenail cancer' and $B$ be 'new test on toenail is positive for cancer'. As you may suspect, tests are not perfect. Sometimes they raise a false alarm: they are positive when there really is no cancer. And sometimes they miss the cancer: they are negative when there really is cancer. Usually there is a huge cost in discovering whether the patient really, truly has cancer (like the patient dies and you do an autopsy, or you do a biopsy and discover the cancer too late to do anything). You do know $P(B)$, how often your tests come back positive, at least for your lab, because you just count. You may have a good idea of $P(A)$ from national statistics, although your much smaller set of patients may be unusually prone to the disease or unusually healthy. And you may have a good idea of $P(B|A)$ because you know which of your patients already have cancer and which of them had a positive test. What you want to know is the probability that this one new patient really has cancer, given that the new test came back positive.
So for a given positive test, you can calculate the probability of cancer. (There are lots of classroom examples showing how this is not trivial: to make sure you don't miss a cancer, the test is made very sensitive, and that may bring a lot of false positives along with it. Then even if your test is positive, there is still a low chance of actually having cancer, just a higher chance than if the test had been negative.)
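Here is a minimal sketch of that classroom calculation; the prevalence, sensitivity, and false positive rate below are made-up numbers for illustration, not figures from any real test:

```python
# Hypothetical numbers for illustration only.
P_cancer = 0.01              # P(A): prevalence of toenail cancer
P_pos_given_cancer = 0.95    # P(B|A): sensitivity of the test
P_pos_given_healthy = 0.10   # false positive rate

# P(B): overall probability of a positive test, built from the pieces.
P_pos = (P_pos_given_cancer * P_cancer
         + P_pos_given_healthy * (1 - P_cancer))

# Bayes theorem: P(A|B) = P(B|A) * P(A) / P(B)
P_cancer_given_pos = P_pos_given_cancer * P_cancer / P_pos
print(P_cancer_given_pos)  # about 0.088: still low, but well above the 1% base rate
```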
But the point is you don't always have the full contingency table at hand. That's what makes Bayes theorem so useful.
To give some perspective, Bayes theorem is a very useful theorem, but it is very simple. It is one of the simplest theorems ever; it's mostly just a convenience of calculation. It is simple on the order of the theorem that multiplication distributes over addition: almost trivial to prove, and its application is mostly just a simplification of calculation. Calling it a theorem is on the order of calling Monaco a country: it is certainly a country, but in practice it is more like a small, popular, and overpopulated district in an overpopulated area of a much larger country.
Now to Bayesian statistics. Statistics as a discipline is only a couple of centuries old (and that is stretching it). Its mathematical foundations, with axiomatic probability, statistical distributions based on the central limit theorem and the normal curve, special functions, and practice centered on p-values, really started taking off in the early 1900s with Pearson, Fisher, and others. This manner of doing statistics was eventually named 'frequentist' statistics, in distinction to the newer trend called Bayesian statistics. The trendy new field got that name not because frequentists did not use or believe in Bayes theorem, but because its manipulation of distributions relies on computing new distributions from prior ones, analogous to how Bayes theorem computes new probabilities.
The point is that Bayesian statistics is not some super exploded hypergeneralization of Bayes theorem, but rather a large set of mathematical machinery that allows one to compute, in many different ways, how well different hypotheses are supported by data. Instead of frequentist procedures like the t-test and ANOVA, which have very strict assumptions on the distributions of the data, these procedures let you specify a prior distribution over the parameters of your hypothesis, which then gets updated by the data (and if you have no idea of the distribution, you can always assume a uniform prior).
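As an illustrative sketch (not a full Bayesian analysis), here is the simplest update of that sort: a uniform prior on a coin's bias, updated by made-up observed flips into a Beta posterior:

```python
# Minimal sketch of a Bayesian update: estimating a coin's bias.
# Start from a uniform prior, Beta(1, 1), and update it with observed flips.
# The Beta prior is conjugate to the binomial likelihood, so updating is
# just adding counts -- no integrals needed.

prior_heads, prior_tails = 1.0, 1.0     # Beta(1, 1) is the uniform prior

observed_heads, observed_tails = 7, 3   # made-up data

post_heads = prior_heads + observed_heads   # Beta posterior parameters
post_tails = prior_tails + observed_tails

# Posterior mean estimate of the bias (compare with the raw frequency 0.7).
posterior_mean = post_heads / (post_heads + post_tails)
print(posterior_mean)  # 8 / 12 = 0.666..., pulled slightly toward the prior's 0.5
```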
You'll notice I could say quite a bit more about Bayesianism, yet I have spent more words on Bayes theorem. Volumes could be written (and have been) about Bayesianism and the religious wars between Bayesianism and frequentism (one could have a meta-religious war over whether that war is religious or substantive). There are very specific things I can write about Bayes theorem, but with Bayesianism it is easy to stop early, before getting into lots of complicated math.
So Bayes theorem does not mean the same thing as Bayesianism. Bayes theorem is a tiny, almost trivial calculation in elementary probability, with a lot of uses. There is no controversy about the theorem. Bayesianism (or Bayesian statistics) is a forceful trend in statistical practice with a large set of alternative theoretical and practical procedures. It is a controversial trend among statisticians, or rather it was controversial in the 70's and 80's and is now mostly mainstream among them, right alongside traditional frequentism.
So don't let the similarity of the names lead you to think they are the same. They are similar and related, but not that much.
- If you are thinking of a calculation of a probability of $A$ given $B$ using the probability of $B$ given $A$ (a very very specific usage), then you're using Bayes theorem.
- If you're thinking of a trend in statistics that avoids distribution assumptions, or rather allows you to specify your arbitrary distribution assumption, then you're talking about Bayesianism.