Friday, November 18, 2016

Effect size versus statistical significance

One of the major tropes in the p-value wars is the difference between statistical significance and effect size. The usual (important) observation is that a calculation on data can yield a very small p-value, meaning very high statistical significance, and yet a very small effect. And this can often be achieved simply by increasing the number of instances: the more instances, the smaller the p-value you can guarantee - that is, the more confident you can be that the phenomenon is not due to chance - no matter how small the phenomenon actually is. This is not to say that the phenomenon is not real, just that the phenomenon doesn't change that much in one direction.

This difference is often presented laconically ("Using Effect Size—or Why the P Value Is Not Enough") as:

Statistical significance is the least interesting thing about the results. You should describe the results in terms of measures of magnitude – not just, does a treatment affect people, but how much does it affect them. (Kline RB)

This makes it sound like you have two things that can be presented, and one is much more important than the other. But it's a false dichotomy. You want both. The magnitude is descriptive stats - how big the effect is: in an experiment on n individuals, fish oil tablets increased memory performance by 10%. If you don't know the effect size, what exactly, beyond 'better', do you know about the phenomenon? Statistical significance is trust - how (mathematically) representative the sample is of the population. You can claim something is better, but can you really trust the claim?
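
To make the two numbers concrete, here is a minimal sketch (Python, assuming numpy and scipy) that reports both for a hypothetical fish-oil-style experiment. The data are simulated and the group means are made up for illustration, not taken from any real study.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
control = rng.normal(loc=50, scale=10, size=100)   # placebo group memory scores
treated = rng.normal(loc=55, scale=10, size=100)   # treated group, ~10% higher mean

# Effect size: Cohen's d, the mean difference in pooled-SD units (descriptive).
pooled_sd = np.sqrt((control.var(ddof=1) + treated.var(ddof=1)) / 2)
cohens_d = (treated.mean() - control.mean()) / pooled_sd

# Statistical significance: two-sample t-test p-value (how much to trust the difference).
t_stat, p_value = stats.ttest_ind(treated, control)

print(f"Cohen's d = {cohens_d:.2f}, p = {p_value:.4f}")
```

Reporting both lines up with the point above: d describes how big the difference is, p describes how much you should trust that it isn't a sampling fluke.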

It's very easy to see how to manufacture high statistical significance with a low effect size - just increase the number of instances. In fact, as you increase n, almost any statistical test asymptotically approaches statistical significance (for real-world phenomena, where the true effect is almost never exactly zero). Chi-squared is the worst!
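
You can watch this happen directly. The sketch below (same numpy/scipy assumptions; the 0.05-standard-deviation "true effect" is an arbitrary choice) holds a tiny difference fixed and grows n: the p-value collapses toward zero while Cohen's d stays small.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
for n in [100, 1_000, 10_000, 100_000]:
    a = rng.normal(0.00, 1.0, size=n)
    b = rng.normal(0.05, 1.0, size=n)          # tiny true effect: 0.05 SD
    pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    d = (b.mean() - a.mean()) / pooled_sd      # effect size hovers near 0.05
    _, p = stats.ttest_ind(b, a)               # p-value marches toward 0
    print(f"n={n:>7}  d={d:6.3f}  p={p:.2e}")
```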

A consistently high effect size (across samples) obviously leads to high statistical significance.

But it is also possible to have a high effect size and low significance - most commonly when the sample is very small.
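
Here is a sketch of that converse case, with made-up numbers (numpy/scipy assumed as above): five subjects per group, a large observed difference, and yet the t-test comes out above the conventional 0.05 cutoff.

```python
import numpy as np
from scipy import stats

control = np.array([48.0, 52.0, 50.0, 47.0, 53.0])
treated = np.array([58.0, 61.0, 49.0, 66.0, 52.0])    # looks ~15% better on average

pooled_sd = np.sqrt((control.var(ddof=1) + treated.var(ddof=1)) / 2)
d = (treated.mean() - control.mean()) / pooled_sd     # large effect size (d > 1)
_, p = stats.ttest_ind(treated, control)              # but only n=5 per group

print(f"Cohen's d = {d:.2f}, p = {p:.3f}")            # big d, p lands above 0.05
```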

So in the end, it is not one or the other. Both should be presented. The effect size tells you how big the phenomenon shown by the sample is, and the p-value tells you how much you can trust the sample that showed it.
