Somehow, I always get confused when these types of errors have to be explained. So to help myself, here is my first and final attempt to clarify the basics.

All tests are variations on a basic theme. First, calculate the test statistic: the size of the observed difference relative to the dispersion of that difference (a signal-to-noise ratio). Next, compare this statistic with the range of values expected under the assumption that the null hypothesis is true. Now we are at a statistical T-intersection: significance testing (data oriented) or hypothesis testing (test oriented). If the test statistic falls within the expected range, significance testing is done (the result is not statistically significant), while hypothesis testing would conclude that the null is not rejected. Hypothesis testing may result in two types of error.
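The recipe above can be sketched in a few lines of Python. The numbers are made up for illustration, and the cutoff of roughly 2 is the familiar rule of thumb for a two-sided test at alpha = .05; this is a sketch of the signal-to-noise idea, not a full test procedure.

```python
from math import sqrt
from statistics import mean, variance

def t_statistic(a, b):
    # Welch-style t: the observed difference in means divided by its
    # standard error (the "dispersion of the difference").
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

# Hypothetical measurements for two groups (made-up numbers).
group_a = [5.1, 4.9, 5.6, 5.2, 4.8, 5.4]
group_b = [4.2, 4.5, 4.1, 4.6, 4.0, 4.4]

t = t_statistic(group_a, group_b)
# Under the null, |t| beyond roughly 2 lies outside the expected range.
print(f"t = {t:.2f}")
```

Here the statistic is far outside the expected range, so significance testing would call the result statistically significant and hypothesis testing would reject the null.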

Type I error: the null-hypothesis is true (no difference between means, there is no fire), but it is rejected. This is an error probability, the chance of a false alarm or false positive. To reduce the risk of “crying wolf” we can choose a stricter (lower) significance level: alpha = .01 or .001, so that only 1 in 100 or 1 in 1000 tests of a true null will result in a false alarm. Reject the null-hypothesis when the P-value is lower than alpha; the actual data-dependent P-value itself is of no further importance (P is not an error probability, but the probability of a statistic at least as extreme as the one observed, given that the null-hypothesis is true).
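The false-alarm rate can be checked by simulation: draw both groups from the same distribution, so the null is true by construction, and count how often the test still rejects. A minimal sketch (pure standard library; sample sizes and the cutoff of 2.0, roughly alpha = .05, are my own choices):

```python
import random
from math import sqrt
from statistics import mean, variance

random.seed(1)

def t_statistic(a, b):
    # Observed difference in means over its standard error.
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

# Both groups come from the SAME normal distribution: the null is true.
trials = 2000
false_alarms = 0
for _ in range(trials):
    a = [random.gauss(0, 1) for _ in range(30)]
    b = [random.gauss(0, 1) for _ in range(30)]
    if abs(t_statistic(a, b)) > 2.0:  # cutoff roughly matching alpha = .05
        false_alarms += 1

rate = false_alarms / trials
print(f"false-alarm rate: {rate:.3f}")  # should land close to alpha
```

Lowering the cutoff's alpha to .01 or .001 would shrink this rate accordingly; that is all "choosing a stricter significance level" means.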

Type II error: the null-hypothesis is false but is not rejected (an alternative hypothesis is true: there is a real difference). We are not alarmed by this false negative. To reduce the “not alarmed” risk (beta) we can choose a high-powered test (power: 1 - beta = .80 or .90, the probability of rejecting a false null-hypothesis).
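Power, too, can be estimated by simulation: build in a real difference, so the null is false by construction, and count how often the test rejects. In this sketch the effect size (half a standard deviation) and the per-group sample size of 64 are my own assumptions, chosen because they should give roughly the conventional 80% power at alpha = .05:

```python
import random
from math import sqrt
from statistics import mean, variance

random.seed(2)

def t_statistic(a, b):
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

# A real difference of 0.5 SD between the groups: the null is false.
trials = 2000
n = 64  # per-group sample size, picked for roughly 80% power
rejections = 0
for _ in range(trials):
    a = [random.gauss(0.5, 1) for _ in range(n)]
    b = [random.gauss(0.0, 1) for _ in range(n)]
    if abs(t_statistic(a, b)) > 2.0:
        rejections += 1

power = rejections / trials
print(f"power = {power:.2f}, beta = {1 - power:.2f}")
```

The rejection rate estimates power (1 - beta); the remaining fraction of runs are the false negatives, the times we were "not alarmed" despite a real difference.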

When we reduce the risk of a Type I error, the risk of a Type II error increases, and vice versa. Therefore, the balance between these types of error must be struck depending on the setting and type of problem. When we want to be absolutely sure before changing anything, we risk doing nothing where things are wrong and change is needed. But when we are eager to make changes, we risk a change for the worse.
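The tradeoff shows up directly if we score the same simulated experiments against two cutoffs. Again the effect size, sample size, and cutoffs (2.0 and 2.6, roughly alpha = .05 and .01) are illustrative assumptions:

```python
import random
from math import sqrt
from statistics import mean, variance

random.seed(3)

def t_statistic(a, b):
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

# One batch of experiments with a real 0.5 SD difference (null is false).
n, trials = 40, 2000
stats = []
for _ in range(trials):
    a = [random.gauss(0.5, 1) for _ in range(n)]
    b = [random.gauss(0.0, 1) for _ in range(n)]
    stats.append(abs(t_statistic(a, b)))

# Stricter alpha (higher cutoff) means fewer false alarms when the null
# is true, but here, where the null is false, it also means lower power.
powers = {}
for cutoff, label in [(2.0, "alpha ~ .05"), (2.6, "alpha ~ .01")]:
    powers[label] = sum(s > cutoff for s in stats) / trials
    print(f"{label}: power = {powers[label]:.2f}")
```

Raising the bar for rejection buys protection against false alarms at the direct cost of missing real differences; the only way to improve both at once is more data (or a bigger true effect).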

Very invasive interventions (say ECT or wider criteria for compulsory admission) probably should emphasize reducing Type I errors: reject the no-difference hypothesis only when test results are obvious. Tests of very expensive but patient-oriented interventions (such as Assertive Community Treatment) probably should go for high-power testing. But these strategies only make sense when a Yes or No decision on the null-hypothesis is relevant.