Hail the impossible: p-values, evidence, and likelihood

Scand J Psychol. 2011 Apr;52(2):113-25. doi: 10.1111/j.1467-9450.2010.00852.x. Epub 2010 Nov 16.

Abstract

Significance testing based on p-values is standard in psychological research and teaching. Typically, research articles and textbooks present and use p as a measure of statistical evidence against the null hypothesis (the Fisherian interpretation), although using concepts and tools based on a completely different usage of p as a tool for controlling long-term decision errors (the Neyman-Pearson interpretation). There are four major problems with using p as a measure of evidence and these problems are often overlooked in the domain of psychology. First, p is uniformly distributed under the null hypothesis and can therefore never indicate evidence for the null. Second, p is conditioned solely on the null hypothesis and is therefore unsuited to quantify evidence, because evidence is always relative in the sense of being evidence for or against a hypothesis relative to another hypothesis. Third, p designates probability of obtaining evidence (given the null), rather than strength of evidence. Fourth, p depends on unobserved data and subjective intentions and therefore implies, given the evidential interpretation, that the evidential strength of observed data depends on things that did not happen and subjective intentions. In sum, using p in the Fisherian sense as a measure of statistical evidence is deeply problematic, both statistically and conceptually, while the Neyman-Pearson interpretation is not about evidence at all. In contrast, the likelihood ratio escapes the above problems and is recommended as a tool for psychologists to represent the statistical evidence conveyed by obtained data relative to two hypotheses.

MeSH terms

  • Models, Statistical*
  • Probability*