In an earlier study, Penterman et al. (2009) found that using the ‘Checklist Risk Emergency Service’ helped to predict aggressive behaviour of patients contacted by outpatient psychiatric emergency teams. The replication study (Penterman et al. 2013) also showed that a visual-analogue aggression scale (range 0 to 100) and the presence of one or more dangerous persons in the patient’s social environment (yes or no) are useful ‘predictors’ that together arrive at 92% correct classifications (91% in the previous study).

The Journal is to be praised for making room for papers that report the results of a replication study. However, in this case a prediction model was replicated that has no predictive value. For how good is 92% correct classifications?


Information is lacking

A first answer could be to look at the largest category. In this study 101 aggression incidents were recorded: 8.5% of 1185 emergency service contacts. That percentage sets a lower bound for assessing the model’s predictive value: simply expecting non-aggressive behaviour in every contact results in 91.5% correct estimates.
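The arithmetic behind this lower bound can be sketched as follows. The counts are those reported above; the function name is hypothetical and serves only as an illustration.

```python
def majority_class_accuracy(n_events: int, n_total: int) -> float:
    """Share of correct classifications when always predicting the more frequent outcome."""
    base_rate = n_events / n_total
    return max(base_rate, 1.0 - base_rate)


incidents, contacts = 101, 1185  # counts reported for the replication study
base_rate = incidents / contacts
baseline = majority_class_accuracy(incidents, contacts)

print(f"base rate of aggression: {base_rate:.1%}")
print(f"always predicting 'no aggression': {baseline:.1%} correct")
```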

A second answer starts from the assumption that the model has no predictive value, so that observed and predicted classifications are independent. For this calculation we need to know the classification table, but that information is lacking. We do know that in the previous study the model had a maximum sensitivity of 74% and specificity of 84%. The ‘practical examples’ that Penterman et al. calculated also show that the risk of aggressive behaviour is fifty-fifty even when the score on the visual-analogue scale is very high (97 out of 100) and dangerous persons are likely to be present.
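Without the classification table the chance-level figure cannot be computed exactly, but the calculation itself is simple. The sketch below is illustrative only. The share of contacts flagged as high risk (`flag_rate`) is a hypothetical value, not a figure from either study, and combining the sensitivity and specificity of the previous study with the base rate of the replication study is an assumption made purely to show how the quantities relate.

```python
def chance_accuracy(base_rate: float, flag_rate: float) -> float:
    """Expected share of correct classifications if prediction and outcome are independent."""
    return base_rate * flag_rate + (1 - base_rate) * (1 - flag_rate)


def accuracy_and_ppv(base_rate: float, sensitivity: float, specificity: float):
    """Overall accuracy and positive predictive value implied by sensitivity and specificity."""
    true_pos = sensitivity * base_rate
    true_neg = specificity * (1 - base_rate)
    false_pos = (1 - specificity) * (1 - base_rate)
    return true_pos + true_neg, true_pos / (true_pos + false_pos)


base_rate = 101 / 1185   # replication study
flag_rate = 0.02         # HYPOTHETICAL share of contacts flagged as high risk

expected_by_chance = chance_accuracy(base_rate, flag_rate)
# ASSUMPTION: sensitivity and specificity from the previous study apply at one cut-off.
accuracy, ppv = accuracy_and_ppv(base_rate, sensitivity=0.74, specificity=0.84)

print(f"correct by chance at flag rate {flag_rate:.0%}: {expected_by_chance:.1%}")
print(f"accuracy at assumed cut-off: {accuracy:.1%}, positive predictive value: {ppv:.1%}")
```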

Thus the emergency teams generally can expect to encounter no aggressive behaviour and can toss a coin when there is imminent danger.


Conclusion not tested

The authors conclude that recording aggression incidents and holding weekly evaluations seem to be adequate tools to increase employees’ safety, as the number of reported aggression incidents decreased. That is an interesting hypothesis, but not a conclusion that can be drawn from this replication study.


Penterman EJM, Nijman HLI. Het inschatten van agressie bij patiënten van de ggz-crisisdienst. Tijdschr Psychiatr 2009; 51: 355-64.

Penterman EJM, Nijman HLI, Saalmink K, Rasing S, van der Staak CPF. [Assessing aggressive behaviour at the psychiatric emergency service with a checklist: a replication study.] Tijdschr Psychiatr 2013; 55: 93-100.

Reaction on ‘Assessing aggressive behaviour at the psychiatric emergency service with a checklist: a replication study’. Tijdschr Psychiatr 2013; 55: 312-314.



Reply to Wierdsma

According to Wierdsma, employees of emergency services ‘generally can expect to encounter no aggressive behaviour and can toss a coin when there is imminent danger’. Employees of the emergency service Uden/Veghel prefer to structure their assessment of aggression risks on the basis of a checklist instead of tossing a coin. The two outcomes, aggression or no aggression, are not equivalent in the way the two sides of a coin are, where you might not care which one comes up. You want to prevent emergency service personnel from stepping into a dangerous situation unprepared, because the consequences of an aggressive incident can be very serious.

In a classic paper, Wilson and Jungner (1968) formulated ten principles that routine screening for undesirable outcomes and risks must meet to be meaningful. Thus (1) the undesirable outcome must have serious consequences; (2) there must potentially be measures to prevent the outcome or lessen its consequences; and (3) the screening should be feasible in terms of the investments necessary (in time and money, possible side effects, etcetera) and balanced in relation to the likely damage caused by the undesirable outcome. In our view the procedure developed within our emergency service satisfies these principles. The consequences of an aggression incident can be serious, there are measures that can be taken to reduce the likelihood of aggression or lessen its consequences, and the screening requires only a small investment. Following Wierdsma’s advice would mean that emergency service personnel will also enter, unprepared, situations in which aggression occurs in more than one in two cases, even though there are opportunities to reduce potential danger, such as taking along a colleague, meeting the patient at a different location or arranging police support.

For all phenomena that have a low frequency (base rate), it holds that predicting that they will not occur at all yields a numerically high overall ‘predictive’ value. Percentage-wise you will do rather well when predicting that in a sample of 100 people no one will get a specific rare but severe disease or that no one will die because of a criminal offence. However, the consequences of these low-frequency incidents are so serious that it is generally considered unacceptable not to make use of variables or risk assessment methods that are statistically significantly associated with an increased risk of such outcomes. A good example is the mandatory use of particular (risk assessment) instruments in the forensic sector. Because of the serious consequences an undesirable outcome has for the victims, one often prefers to be safe rather than sorry when there is still an elevated risk. Consequently there will undoubtedly be numerous ‘false positives’ in long-stay forensic care: people who remain there although they would not reoffend.

In the practical examples we presented, the model indicates that the chances of aggression vary from 3% to 54%. These are indeed substantial differences. Incidentally, we calculated these examples in order to warn emergency team workers not to overestimate the predictive value even of extremely high risk assessments, because ‘even then a 50% chance of a “false alarm” exists’ and ‘even for high aggression risk assessments the chance is considerable to get a false alarm’ (page 97).

In his comments Wierdsma repeats conclusions and limitations of our study that we mentioned ourselves in the discussion section. We discussed at length the point he raises, namely that because of the study design the decrease in the number of emergency contacts with aggression incidents can only hypothetically be attributed to the procedures in Veghel/Uden. We gave several alternative explanations for the decrease, for example that after some time staff members may become less motivated to fill in the aggression report forms.

Wierdsma’s criticism of the results of regression analyses with which one tries to predict low-frequency occurrences that could have serious consequences turns out to concern many scientific papers. The fact that Wierdsma articulated his criticism of our article in a letter to Tijdschrift voor Psychiatrie opens up the discussion. In which cases can the results of regression analyses be relevant for practical application? The outcomes one tries to predict using logistic regression analysis often have different ‘weights’ in terms of undesirability, which in our opinion is the case with aggression versus no aggression. Put in statistical terms, for some outcomes one would prefer to anticipate a ‘false alarm’ more often than to be confronted with an unexpected ‘hit’. One could draw a parallel here with large-scale vaccine supplies for diseases that are nowadays very rare but severe.
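The idea of unequal weights can be made concrete with a minimal expected-cost sketch. The cost values below are hypothetical and chosen only for illustration; the 3% and 54% risks are the extremes of the practical examples. The sketch also makes the simplifying assumption that preparing for aggression fully avoids the cost of an unexpected incident.

```python
def alarm_threshold(cost_false_alarm: float, cost_missed_incident: float) -> float:
    """Risk above which preparing has a lower expected cost than not preparing.

    Preparing costs cost_false_alarm * (1 - risk) in expectation, not preparing
    costs cost_missed_incident * risk; the two are equal at the returned risk.
    """
    return cost_false_alarm / (cost_false_alarm + cost_missed_incident)


def should_prepare(risk: float, cost_false_alarm: float, cost_missed_incident: float) -> bool:
    """True if the estimated risk exceeds the cost-weighted threshold."""
    return risk > alarm_threshold(cost_false_alarm, cost_missed_incident)


# HYPOTHETICAL weights: an unexpected incident counts ten times as heavily as a false alarm.
COST_FALSE_ALARM, COST_MISSED_INCIDENT = 1.0, 10.0
threshold = alarm_threshold(COST_FALSE_ALARM, COST_MISSED_INCIDENT)

for risk in (0.03, 0.54):  # lowest and highest risk in the practical examples
    decision = should_prepare(risk, COST_FALSE_ALARM, COST_MISSED_INCIDENT)
    print(f"risk {risk:.0%}: prepare = {decision} (threshold {threshold:.1%})")
```

With equal weights the threshold would be 50%; the heavier an unexpected incident is weighted, the lower the risk at which taking precautions becomes worthwhile.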


Henk Nijman, Berry Penterman, Cees van der Staak



Wilson JMG, Jungner G. Principles and practice of screening for disease. Public Health papers 34. Genève: World Health Organization; 1968. pp. 26-7.



Speaking of aggressive behaviour

In their rather lengthy reply Nijman et al. use some of the oldest debating tricks in the book: make it personal and change the subject. To suggest that my advice for employees of emergency services would be to expect no danger or toss a coin is either sarcastic or just silly. It is not my advice but their model! To suggest that I would expect mental health care workers to step into dangerous situations alone and unprepared is vicious; it is an example of the informal logical fallacy of ad hominem. I have a background in public health and overall I am very much in favour of screening. However, Wilson and Jungner’s classic screening criterion number 5 states that there should be a suitable test or examination (Nijman et al. appear to be very selective in their quotes). In this case the predictors that Penterman et al. have come up with are not very helpful. It’s like trying to make a surgical incision with a blunt knife. Screening is not the issue here, model fit is. And Penterman et al.’s model just doesn’t fit.