P values: from suggestion to superstition

John Concato; John A Hartigan

doi:10.1136/jim-2016-000206

J Investig Med. 2016 Oct;64(7):1166-71. doi: 10.1136/jim-2016-000206. Epub 2016 Aug 3.

Authors

John Concato¹, John A Hartigan²

Affiliations

¹ Clinical Epidemiology Research Center, Cooperative Studies Program, Veterans Affairs Connecticut Healthcare System, West Haven, Connecticut, USA; Department of Medicine, Yale University School of Medicine, New Haven, Connecticut, USA.
² Department of Statistics, Yale University, New Haven, Connecticut, USA.

Abstract

A threshold probability value of 'p≤0.05' is commonly used in clinical investigations to indicate statistical significance. To help clinicians better understand the evidence generated by research studies, this review defines the p value, summarizes the historical origins of the p value approach to hypothesis testing, describes various applications of p≤0.05 in clinical research, and discusses the emergence of p≤5×10⁻⁸ and other values as thresholds for genomic statistical analyses. Related issues include the conceptual approach of evaluating whether data are inconsistent with a null hypothesis (ie, no exposure-outcome association). Importantly, in the historical context in which p≤0.05 was first proposed, the 1-in-20 chance of a false-positive inference (ie, falsely concluding that an exposure-outcome association exists) was offered only as a suggestion. In current usage, however, p≤0.05 is often misunderstood as a rigid threshold, sometimes with a misguided 'win' (p≤0.05) or 'lose' (p>0.05) approach. Also, in contemporary genomic studies, a threshold of p≤10⁻⁸ has been endorsed as a boundary for statistical significance when numerous genetic comparisons are analyzed for each participant. A value of p≤0.05, or any other threshold, should not be used reflexively to determine whether a clinical research investigation is scientifically trustworthy. Rather, and in parallel with conceptual issues of validity and generalizability, quantitative results should be interpreted using a combined assessment of strength of association, p values, CIs, and sample size.
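The following minimal Python sketch makes the abstract's quantitative points concrete. It is not taken from the article. It assumes a two-group comparison of means analyzed with a z test and a known standard deviation of 1, and every helper name (`two_sided_p`, `z_test_mean_difference`, `false_positive_rate`, `bonferroni_threshold`) is hypothetical. The sketch illustrates three things. When the null hypothesis is true, roughly 1 in 20 simulated studies still reach p≤0.05. Dividing 0.05 by an assumed one million independent genetic comparisons (a Bonferroni correction, a common rationale for the genome-wide threshold that the abstract itself does not state) gives 5×10⁻⁸. Finally, strength of association, CI, p value and sample size need to be read together.

```python
import math
import random

ALPHA = 0.05  # the conventional threshold, originally offered only as a suggestion


def two_sided_p(z: float) -> float:
    """Two-sided p value for a standard normal statistic: P(|Z| >= |z|) under H0."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def z_test_mean_difference(diff: float, n_per_group: int, sd: float = 1.0) -> dict:
    """Hypothetical helper: z test for a difference in two group means.

    Assumes both groups share a known standard deviation `sd`.
    """
    se = sd * math.sqrt(2.0 / n_per_group)      # standard error of the difference
    z = diff / se
    ci = (diff - 1.96 * se, diff + 1.96 * se)   # approximate 95% confidence interval
    return {"diff": diff, "n": n_per_group, "ci95": ci, "p": two_sided_p(z)}


def false_positive_rate(trials: int = 20_000, n_per_group: int = 20, seed: int = 1) -> float:
    """Simulate studies in which H0 is true (no exposure-outcome association)
    and return the fraction that nonetheless reach p <= ALPHA."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        a = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        b = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        diff = sum(a) / n_per_group - sum(b) / n_per_group
        if z_test_mean_difference(diff, n_per_group)["p"] <= ALPHA:
            hits += 1
    return hits / trials


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test threshold that keeps the family-wise error rate at or below alpha."""
    return alpha / n_tests


if __name__ == "__main__":
    # Under a true null, about 1 in 20 studies crosses p <= 0.05.
    print(f"False-positive rate under H0: {false_positive_rate():.3f}")
    # Assumed one million independent comparisons per participant.
    print(f"Bonferroni threshold, 1e6 tests: {bonferroni_threshold(ALPHA, 1_000_000):.0e}")
    # Report effect size, CI, p value, and sample size together.
    for diff, n in [(0.05, 10_000), (0.5, 20)]:
        r = z_test_mean_difference(diff, n)
        lo, hi = r["ci95"]
        print(f"diff={diff}, n/group={n}, 95% CI=({lo:.2f}, {hi:.2f}), p={r['p']:.4f}")
```

In the final loop, a very small difference in a very large study yields a small p value. A tenfold larger difference in a small study misses p≤0.05, and its CI spans zero. Neither p value alone indicates which finding matters clinically, and that is why the combined assessment is recommended.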

Keywords: Biostatistics; Clinical Research; Research Design.

Publication types

Review
Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Confidence Intervals
Genomics
Probability*
Sample Size
Superstitions*