There is a lot of confusion about P-values. Most statisticians prefer confidence intervals any day of the week. Why?


A P-value represents the probability of getting the sample result (say, a difference between means) or an even more extreme result when the null hypothesis is true (H0 = no real difference). When that probability is small (say, less than 5% or 1%), we have strong evidence against the null hypothesis.

How do you calculate a P-value? It's a two-step procedure. First, calculate the test statistic: the size of the observed difference relative to the dispersion of the difference (a signal-to-noise ratio). Take an effect size (a difference between two means or a percentage difference) and divide it by a variability estimate of that effect size, based on the standard deviation(s) and size of the sample(s). Next, check this test statistic against the appropriate theoretical distribution (e.g. normal or binomial) – for a two-sided test, the P-value is the area under the curve in the two tails.
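As a sketch of those two steps, here is a two-sided z-test for a difference between two means. All the numbers (means, standard deviations, sample sizes) are made up for illustration.

```python
import math

# Hypothetical example data: two groups, compared with a two-sided z-test.
mean_a, mean_b = 5.2, 4.6      # sample means
sd_a, sd_b = 1.1, 1.3          # sample standard deviations
n_a, n_b = 50, 50              # sample sizes

# Step 1: test statistic = effect size / standard error (signal-to-noise).
effect = mean_a - mean_b
se = math.sqrt(sd_a**2 / n_a + sd_b**2 / n_b)
z = effect / se

# Step 2: area under the standard normal curve in the two tails.
# Phi (the normal CDF) written via math.erf; p = 2 * P(Z > |z|).
p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

With real data you would more likely reach for `scipy.stats.ttest_ind`, which also handles the t-distribution for small samples; the version above just makes the two steps explicit.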

Confidence interval

The confidence interval (CI) uses the same elements as a P-value (effect size, variability, and sample size) plus an additional factor, the critical value, to get the interval. This gives us an idea of the precision of the estimate. The 95% CI is the estimated effect size plus or minus 1.96 times the variability estimate of the effect size (the standard error). Wider intervals indicate less certainty about the estimated effect. If the interval includes the value of the null hypothesis (most often zero), the P-value will be non-significant at the .05 level. The 95% CI thus contains the same information as the P-value, but focuses on effect size and precision of the estimate. Confidence intervals make uncertainty explicit and support meta-analyses by moving away from "reject or do-not-reject" decisions in single studies.
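The formula above can be written out in a few lines. The effect size and standard error here are hypothetical placeholders, not real study data.

```python
# Hypothetical estimate: a difference between two means and its standard error.
effect = 0.6                   # estimated effect size
se = 0.24                      # standard error of the effect
z_crit = 1.96                  # two-sided 95% critical value (normal distribution)

# 95% CI = estimate +/- 1.96 * standard error
lower = effect - z_crit * se
upper = effect + z_crit * se
print(f"95% CI: ({lower:.2f}, {upper:.2f})")

# If the interval excludes the null value (here 0), p < .05;
# if it includes 0, the result is non-significant at the .05 level.
significant = not (lower <= 0 <= upper)
print("significant at .05 level:", significant)
```

Note how the same three ingredients (effect size, variability, sample size via the standard error) drive both the P-value and the interval; the CI just reports them on the scale of the effect itself.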

Bayesian perspective

The 95% CI is often interpreted as a plausible range of values for the true mean or true effect size. However, this is really a Bayesian interpretation, concerning the probable distribution of the parameter in the light of the data. At least that is what I think now, knowing close to nothing about Bayesian statistics. Time for a summer course?