Wilhelm, a psychologist in the Department of Instructional Technology at the University of Twente, together with three pupils from the Bonhoeffer College in Enschede, studied the effect of a single can of energy drink on the outcomes of six cognitive tests. The study had a control group that drank water (n = 34), a placebo group that received sugar-free lemonade (n = 35) and an experimental group that received a caffeinated drink (n = 34).

The longest cognitive test showed a statistically significant effect in the predicted direction, although the placebo and experimental groups ended up close to each other. We are told only that none of the other tests showed differences between the three conditions. Differences in proportions and numbers of correct answers were tested with ANOVA. This method, however, assumes that the dependent variable is continuous and normally distributed with constant variance. Proportions and counts do not, by definition, follow a normal distribution. More appropriate analyses (logistic and Poisson regression) might have revealed differences. Moreover ...

With groups of 34 subjects, a statistically significant result requires a medium-to-large effect size, which could only be expected if "just one can" were a panacea. The conclusion that ‘one energy drink most likely has no effect on cognitive performance of high school pupils’ is therefore too firm; more importantly, it is fundamentally incorrect. A significance test does not address the probability of the hypothesis. The authors tried to falsify the hypothesis of no effect, and failing to reject the null hypothesis is not proof that the conditions are equal. This basic rule is easily forgotten, but forgetting it has consequences. Tweets echoed the paper’s title, announcing that young people have little to gain cognitively from energy drinks. The study results seem to be socially desirable.
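The claim about effect sizes can be checked with a short power calculation. The sketch below assumes conventional values that the text does not state (two-sided alpha = 0.05, power = 0.80) and considers a pairwise comparison of two groups of 34, not the three-group ANOVA. It uses the normal approximation, which slightly understates what an exact t-test requires. The function names are illustrative.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def min_detectable_d(n_per_group, alpha=0.05, power=0.80):
    """Smallest standardized difference (Cohen's d) that a two-sided,
    two-group comparison detects with the given power (normal approximation)."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = _STD_NORMAL.inv_cdf(power)
    return (z_alpha + z_power) * math.sqrt(2 / n_per_group)

def power_two_group(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-group test when the true effect is d."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = d * math.sqrt(n_per_group / 2)
    # Probability of rejecting in either tail
    return _STD_NORMAL.cdf(shift - z_alpha) + _STD_NORMAL.cdf(-shift - z_alpha)

if __name__ == "__main__":
    print(min_detectable_d(34))
    print(power_two_group(0.3, 34))
```

Under these assumptions, `min_detectable_d(34)` comes to about 0.68, between Cohen's conventional "medium" (0.5) and "large" (0.8) benchmarks. `power_two_group(0.3, 34)` comes to roughly 0.24, so a modest true effect would be missed about three times out of four. This is why a non-significant result here cannot establish that the conditions are equal.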

### Reply to Wierdsma

In his comments Wierdsma notes that numbers and proportions do not, by definition, follow a normal distribution. That is correct, and for that reason we used raw scores in the main analyses, in which we tested the differences in test scores between conditions. He also notes that, with these numbers of subjects, a very large difference would have been required to confirm the hypothesis. Moreover, one cannot reverse the reasoning: finding no difference is not proof of the opposite conclusion (no effect). Leaving aside the question of what experiment would have allowed such a conclusion, these sample sizes are customary (though not large) for the analysis methods used, and in our view the findings are indeed informative. Furthermore, our conclusions are presented with caution and we suggest alternative explanations. Research into the effects of energy drinks on young people is scarce, while their health risks are currently hotly debated. Research may inform this debate. Social desirability could silence critical reflection.

Pascal Wilhelm, PhD

### A note on raw scores, statistics and social desirability

Raw scores on cognitive tests are bounded (there is a maximum number of correct answers), and the distribution of correct answers is probably not normal. That is why I suggested Poisson regression. But within this frequentist framework no design can confirm a hypothesis. Only Bayesian statistics can do what most researchers aim for: reflect on the probability of the hypothesis. As researchers we should not be interested in "informative" findings unless we are doing exploratory analyses, and in that case we can leave formal testing aside. The conclusion captured in the paper’s title is not presented with caution, and in this case critical reflection could be silenced by a socially desirable interpretation of the findings, not the other way around. I believe that scientists have a moral obligation to counter every jump to conclusions in the public debate. However, if the authors intend to act as change agents of a sort, other journals may be more effective.
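One way to attach a probability to the null hypothesis is a Bayes factor. The sketch below uses the BIC approximation, BF01 ≈ exp((BIC1 − BIC0) / 2), which implicitly assumes a unit-information prior. It also treats each bounded score as binomial out of a maximum number of items, and uses the number of pupils as the sample size in the BIC penalty. All of these are modelling assumptions, not the study's method, and a full Bayesian analysis with explicit priors may give different numbers. Function names are hypothetical.

```python
import math

def _binom_loglik(scores, max_score, p):
    """Binomial log-likelihood, omitting log C(max_score, y), which is the
    same under both models and cancels when the BICs are compared."""
    ll = 0.0
    for y in scores:
        if y > 0:
            ll += y * math.log(p)
        if max_score - y > 0:
            ll += (max_score - y) * math.log(1 - p)
    return ll

def bic_bayes_factor_01(groups, max_score):
    """Approximate BF01: evidence for H0 (one shared success probability)
    over H1 (a separate probability per group)."""
    pooled = [y for g in groups for y in g]
    n = len(pooled)  # number of pupils, used as the BIC sample size
    ll0 = _binom_loglik(pooled, max_score, sum(pooled) / (n * max_score))
    ll1 = sum(_binom_loglik(g, max_score, sum(g) / (len(g) * max_score))
              for g in groups)
    bic0 = -2 * ll0 + 1 * math.log(n)            # one free parameter
    bic1 = -2 * ll1 + len(groups) * math.log(n)  # one parameter per group
    return math.exp((bic1 - bic0) / 2)

def posterior_h0(bf01, prior_odds=1.0):
    """Posterior probability of H0 given BF01 and prior odds H0:H1."""
    post_odds = bf01 * prior_odds
    return post_odds / (1 + post_odds)
```

Unlike a p-value, BF01 above 1 counts as evidence for "no difference", and `posterior_h0` turns it into a probability of the hypothesis under stated prior odds.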