Monday, March 28, 2016

A small nitpick about a small p-value problem trope


Lately the p-value has been getting a lot of press, almost entirely bad (tip of the iceberg). Whatever it is and whatever it means has been up for discussion in a way it hasn't been since NPHT was created (the Bayesians have been fighting against it, or rather fighting for alternatives, since the '60s).

Hidden in the middle of this storm, or a small tornado off to the side, is the issue of data fishing or p-hacking. Since only a p-value of less than .05 is considered 'statistically significant', only such values are considered publishable, leading to two problems: selective publishing (ignoring non-significant results) and p-hacking (if one p-value isn't good enough, change your hypothesis and your tests little by little until you get a p-value below the threshold). The problematic trope is stated in roughly the following manner:

With the p-value threshold set at 5% (or .05, or 1/20), all you need is 20 tested hypotheses to get one hypothesis that is significant by chance.

It's so obvious! With a 1/20 probability, you need 20 tries to guarantee a hit! The intent is that you shouldn't run many hypothesis tests at once, otherwise you'll get some false hypothesis declared true.

But you may notice from that wording that it is the classic gambler's fallacy: 'the run has to end!'. Each hypothesis test is independent, so the probability for the next test does not change whether the previous tests were all hits, all misses, or anything in between.
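In symbols, calling a test that comes up 'significant' purely by chance a hit:

\[
P({\rm hit\ on\ the\ next\ test} \mid {\rm any\ previous\ results}) = P({\rm hit}) = 0.05
\]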

Whatever you think of p-values, and whatever you think they mean, they are probabilities. Probabilities of what is complicated and nuanced and misleadingly stated and problematic, and yet the firm basis of statistical inference for the past hundred years. But still, they are probabilities of something, and a strict threshold of 5% for accepting a hypothesis over rej... forget that verbiage. It's a 1/20 probability event.


So now we're in the realm of basic probability (which has its own difficulties, but ones shared by Bayesians, frequentists, Kolmogorov...ians, Keynesians (he had his own!)). So any hypothesis has a probability of .05 of coming up positive, a hit, purely by chance. What's the probability of a hit in 1 trial? 5%. What's the probability of a hit (at least one hit) in 2 trials? 3? n trials? Those are harder, but only a tiny bit; it's basic probability/combinatorics. What's the probability of at least one hit? One minus the probability of no hits at all. What's the probability of no hits in n trials? (The probability of no hit in one trial)^n, since they are independent events and you multiply. Final answer:


\[
P({\rm at\ least\ one\ hit\ in\ }n{\rm\ trials}) = 1-P({\rm no\ hit\ in\ one\ trial})^n = 1- 0.95^n
\]

Well, not final exactly, it's just a formula. We don't yet have a good picture of what it means in relation to our intuition that 'it'll take 20 to make sure we have a hit'.
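For a quick numeric check, here is a minimal Python sketch of that formula (the function name and the 5% default are just for illustration):

    def prob_at_least_one_hit(n, alpha=0.05):
        # Probability of at least one chance 'hit' in n independent tests,
        # each with false-positive probability alpha.
        return 1 - (1 - alpha) ** n

    print(prob_at_least_one_hit(1))  # 0.05 for a single test
    print(prob_at_least_one_hit(2))  # 0.0975, a bit less than 2 * 0.05

Note that the probabilities don't simply add: two tests give 9.75%, not 10%.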

So then a picture:



It starts at 0 and rises asymptotically toward 1 (the no-hit probability decays exponentially), but a little more slowly than you'd expect for an exponential because the base, 0.95, is so close to 1.
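A minimal sketch to reproduce that curve, assuming numpy and matplotlib are available:

    import numpy as np
    import matplotlib.pyplot as plt

    n = np.arange(0, 101)   # number of independent tests
    p = 1 - 0.95 ** n       # probability of at least one chance hit

    plt.plot(n, p)
    plt.axhline(0.5, linestyle='--', color='gray')  # the 50/50 line
    plt.xlabel('number of tests n')
    plt.ylabel('P(at least one hit)')
    plt.title('1 - 0.95^n')
    plt.show()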


The usual way to present such probabilities is, as with the birthday paradox, to say how many trials it takes to reach a 50% chance. Multiplying .95 a few times, we see that it takes 14 trials for there to be better than a 50/50 chance that at least one test is a hit. To get to a 95% chance, it takes 59 trials. There's no guarantee that there'll ever be a hit; it's just that the probability of no hit gets smaller and smaller.
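Those counts come straight from taking logs of the no-hit probability:

\[
1 - 0.95^n \ge 0.5 \iff n \ge \frac{\ln 0.5}{\ln 0.95} \approx 13.5,
\qquad
1 - 0.95^n \ge 0.95 \iff n \ge \frac{\ln 0.05}{\ln 0.95} \approx 58.4,
\]

so 14 and 59 are the smallest whole numbers of trials that clear the 50% and 95% marks.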


I make this point to... well, to pick a nit. The more experiments you do, the more likely there will be one that is 'statistically significant' totally by chance. Intuition and logic lead to that immediately. But the logic is never done, and one correct step doesn't make the next two steps correct. I do realize it is a bit of a mouthful, and harder to digest, to say 'after 14 trials there's a fifty-fifty chance of a false positive'. 50/50? It's not obvious how that relates to 5%, whereas the erroneous 20 * .05 = 100% relates all too obviously.
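For comparison, the correct value at 20 tests is

\[
1 - 0.95^{20} \approx 0.64,
\]

about a 64% chance of at least one false positive, not a certainty.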


In the end, to say '5% means 20 experiments', which seems so directly and intuitively obvious, is wrong. In the right direction, but wrong.







