## Cohen's powerful attack on NHST

Jacob Cohen (1923-1998) was a pioneer of the "new statistics", promoting power analysis and effect sizes. Even in retirement he kept battling Null Hypothesis Significance Testing (NHST).

The p-value is the probability of getting test results at least as extreme as those observed if the null hypothesis is true. Most often the null hypothesis is a nil hypothesis: no difference between groups. If the p-value is very small we reject that hypothesis; a large p-value, however, cannot support the claim that there is no difference. What we really want, based on our data, is the probability of the hypothesis. Unfortunately, the p-value does not tell us that. Nor can it tell us the probability that the results are due to chance: "due to chance" is simply a restatement of the nil hypothesis, and the p-value is computed on the assumption that this hypothesis is true, that is, that the results are 100% due to chance. In "The earth is round (p < .05)," Cohen (1994) observed that NHST “does not tell us what we want to know, and we so much want to know what we want to know that, out of desperation, we nevertheless believe that it does!”
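The gap between the p-value and the probability of the hypothesis can be made concrete with a small calculation. The sketch below is illustrative only: the coin-flip data (15 heads in 20 tosses), the prior probabilities of the null, and the assumed power of 0.8 are hypothetical numbers chosen for the example, not values from Cohen's paper. The first function computes an exact two-sided binomial p-value, P(data at least this extreme | H0); the second applies Bayes' rule to get P(H0 | significant result).

```python
from math import comb


def binom_p_value_two_sided(k, n, p0=0.5):
    """Exact two-sided p-value: the probability, under H0 (success rate p0),
    of every outcome that is no more likely than the observed count k."""
    pmf = [comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # Small relative tolerance guards against floating-point ties.
    return sum(q for q in pmf if q <= observed * (1 + 1e-9))


def prob_null_given_significant(prior_null, alpha, power):
    """P(H0 | significant result) by Bayes' rule.

    prior_null: assumed prior probability that H0 is true (hypothetical input)
    alpha:      probability of a significant result when H0 is true
    power:      probability of a significant result when H0 is false
    """
    false_positive = prior_null * alpha
    true_positive = (1 - prior_null) * power
    return false_positive / (false_positive + true_positive)


# Hypothetical data: 15 heads in 20 tosses of a coin assumed fair under H0.
p_value = binom_p_value_two_sided(k=15, n=20)
print(f"p-value = {p_value:.4f}")

# The same significant result leaves very different probabilities for H0,
# depending on how plausible H0 was to begin with.
for prior in (0.5, 0.9):
    posterior = prob_null_given_significant(prior, alpha=0.05, power=0.8)
    print(f"prior P(H0) = {prior:.1f} -> P(H0 | significant) = {posterior:.3f}")
```

With these assumed inputs the p-value falls just below .05, yet the probability that the null is true after a significant result ranges from roughly 6% to 36% depending on the prior. The p-value alone cannot supply that number.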

## Box's model

G.E.P. (George Edward Pelham) Box is often quoted for his intriguing line: "all models are wrong, but some are useful". He was son-in-law to the famous Sir R.A. Fisher and one of the scientists who adapted the Fisherian approach to statistical testing. Box was an engineer, and the well-known quote must be interpreted in that context, preferably in a somewhat extended form and from an engineer's perspective. The more extended quote is:

"Essentially, all models are wrong, but some are useful. However, the approximate nature of the model must always be borne in mind."

Clearly, Box did not question the usefulness of models in general, but added a basic caveat: a model may work without necessarily giving us an adequate picture of reality. The first task of researchers is to develop models that work. In the field of (social) psychiatry we are still a long way off, but better statistical analysis is a necessary (although not sufficient) step in the right direction.

## Two men, two types of errors

The Polish/Russian and English tandem, Jerzy Neyman (1894-1981) and Egon Sharpe Pearson (1895-1980), introduced the notion that we reject the null hypothesis because the test result is more likely under an alternative hypothesis. Two hypotheses bring two types of error: Type I is rejecting the null when it is correct; Type II is failing to reject the null when the alternative is true. Only in the long run can both types of error be limited:

“These two sources of error can rarely be eliminated completely; in some cases it will be more important to avoid the first, in others the second … The use of these statistical tools in any given case, in determining just how the balance should be struck, must be left to the investigator” (Neyman and Pearson, 1928).
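The balance Neyman and Pearson describe can be illustrated numerically. The following is a minimal sketch under stated assumptions, not their original procedure: it uses a one-sided z-test with known standard deviation 1, and the effect size (0.5) and sample size (20) are hypothetical values picked for the example. For a fixed sample, tightening the Type I error rate (alpha) raises the Type II error rate (beta).

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution: mean 0, sd 1


def type_ii_error(alpha, effect_size, n):
    """Type II error rate (beta) of a one-sided z-test.

    Assumes H0: mu = 0 versus H1: mu = effect_size, with known sd = 1
    and n independent observations. All inputs are illustrative.
    """
    critical_z = STD_NORMAL.inv_cdf(1 - alpha)  # rejection threshold under H0
    shift = effect_size * n**0.5                # mean of the z statistic under H1
    # Beta is the chance that the statistic stays below the threshold under H1.
    return STD_NORMAL.cdf(critical_z - shift)


# Hypothetical study: effect size 0.5 standard deviations, 20 observations.
for alpha in (0.10, 0.05, 0.01):
    beta = type_ii_error(alpha, effect_size=0.5, n=20)
    print(f"alpha = {alpha:.2f}  beta = {beta:.3f}  power = {1 - beta:.3f}")
```

Each row shows one way of striking the balance: a stricter alpha buys fewer false rejections at the price of more missed effects. Only a larger sample or a larger effect lowers both error rates at once.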

## Fisher on P-values

Sir Ronald Aylmer Fisher (1890-1962) considered his P-value an index of the strength of the evidence against the hypothesis being tested. “If P … is below .02 it is strongly indicated that the hypothesis fails to account for the whole of the facts. We shall not often be astray if we draw a conventional line at .05 …” (Fisher, 1990/1925).

In his great book "The Lady Tasting Tea", on the effect of statistics on modern science, David Salsburg (2001) writes that Fisher was never too clear on his use of P-values, giving only examples. The 5% line, however, was only a practical guideline. In Fisher's words: “…no scientific worker has a fixed level of significance at which from year to year, and in all circumstances, he rejects hypotheses; he rather gives his mind to each particular case in the light of his evidence and his ideas” (Fisher, 1956).

## Most quoted - probably

Religion and morals are among Auden's central themes. In this poem he combines the two, using commandments as a vehicle to criticize the social sciences:

Thou shalt not answer questionnaires

Or quizzes upon World Affairs,

Nor with compliance

Take any test. Thou shalt not sit

With statisticians nor commit

A social science.

-W. H. Auden