Because of complex problems, adolescents often fail to complete their course of treatment. Van der Reijen and colleagues aimed to investigate whether gender and symptom severity predict the course of treatment. The English summary is truly puzzling. First of all, the aim of the study does not relate to the background information: complexity (having both psychiatric and behavioural problems) as a background concept is different from symptom severity (having many psychiatric problems). Secondly, the method section does not indicate the design or the type of analyses. The authors attempted to find out whether treatment outcome could be predicted by symptom severity and gender differences “of 127 male adolescent patients” (something went wrong in translation). But the summary results include only inter-rater reliability, not symptom and gender differences. Finally, the conclusion is descriptive more than persuasive: “the questionnaire was not able to predict accurately whether patients would complete their treatment”. However, the devil is in the full text…

Significantly small

The results section contains a paragraph on “descriptive statistics” where the authors report t-tests on gender differences (so these are descriptive tests?). The t-value for the gender difference in age was 1.95 and not significant (p=0.053), but the difference in hospitalisation time was significant: the paper shows p<0.05, yet the statistic is only a fraction larger: t(121)=2.04. The first P-value (p=0.053) fits the framework of Sir Ronald Aylmer Fisher (1890-1962), who considered his P-value an index of the level of confidence in the research hypothesis. The second P-value (p<0.05), however, fits the ideas of Jerzy Neyman (1894-1981) and Egon Sharpe Pearson (1895-1980): reject the null hypothesis when the test result is more likely under an alternative hypothesis, setting the Type I error rate (rejecting the null when it is correct) at 5%; any value lower than .05 will do. Despite their differences, these eminent statisticians would probably agree that the difference as reported by Van der Reijen et al. is not significant.
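To see how close the two test results really are, the reported t-values can be converted to two-sided P-values. The sketch below uses only the Python standard library and integrates the t density numerically. It assumes 121 degrees of freedom for both tests; the paper is quoted with that figure only for hospitalisation time, so using it for age is an assumption. The function names are illustrative, not taken from the paper.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p(t, df, steps=2000):
    """Two-sided P-value, using Simpson's rule on the interval [0, |t|]."""
    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3  # P(0 < T < |t|)
    return 1.0 - 2.0 * central_area

DF = 121  # quoted for hospitalisation time; ASSUMED for age as well

p_age = two_sided_p(1.95, DF)
p_stay = two_sided_p(2.04, DF)
print(f"age:                  t = 1.95, p = {p_age:.3f}")
print(f"hospitalisation time: t = 2.04, p = {p_stay:.3f}")
```

Under these assumptions the two P-values should land just above and just below .05, which is the whole point: the verdicts "not significant" and "significant" rest on a difference of about 0.09 in the test statistic.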

The authors report that the gender difference in age was less than 2 months and the difference in hospitalisation time about 2 months. However, hospitalisation time is not normally distributed, and boys and girls have different group sizes and unequal standard deviations (a combination that violates the assumptions of the standard t-test). More importantly, this is not a random sample but a so-called convenience sample: all hospitalised patients in a five-year period. So sampling theory doesn’t apply and p-values are difficult to interpret. The only relevant question is: how important is a 2-month difference? Answer: given a standard deviation in hospitalisation time of about 6 months, the effect of gender on hospitalisation time is small to medium (Cohen’s d=0.33) and not very important from a mental health policy point of view.
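The effect size quoted above is simple arithmetic: the mean difference divided by the standard deviation. A minimal sketch, using the rounded figures from the text (2 months and 6 months) and Cohen's conventional benchmarks of 0.2, 0.5 and 0.8; the variable names are illustrative only.

```python
# Rounded figures taken from the text above, in months.
mean_difference = 2.0
standard_deviation = 6.0

# Cohen's d: standardised mean difference.
d = mean_difference / standard_deviation

# Cohen's conventional benchmarks for interpreting d.
BENCHMARKS = {"small": 0.2, "medium": 0.5, "large": 0.8}
reached = [name for name, value in BENCHMARKS.items() if d >= value]
not_reached = [name for name, value in BENCHMARKS.items() if d < value]

print(f"Cohen's d = {d:.2f}")
print("benchmarks reached:    ", reached)
print("benchmarks not reached:", not_reached)
```

With these inputs d passes the "small" benchmark but stays well short of "medium", which is what "small to medium" means here.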

Inconclusive conclusion

Van der Reijen et al. conclude that they found indications of gender differences: girls seem to benefit more from treatment than boys. This treatment success is explained by type of discharge: boys are more often “pushed out” because of incidents or serious rule violations. “Treatment success” is itself defined by type of discharge: patient and therapist agree that less intensive treatment is an option (day care or outpatient care). But doesn’t this sound like a tautology: explaining treatment success by a shift to less intensive treatment?

The paper concludes that the questionnaire on symptom severity was not able to predict accurately whether patients would complete their treatment. Moreover, in the final analysis the authors did not find gender differences either. Logistic regression analyses with hostility scores and gender as predictors of treatment success showed poor model fit. The classification table showed 66% correctly classified (sensitivity 37%, specificity 87%), which is an optimistic estimate because the model and the model-fit assessment are based on the same dataset, and still it is only a 7% improvement. Given this poor model fit, the results are inconclusive, which is also reflected in a 95% confidence interval for the regression coefficient for gender (apparently reported as an odds ratio) ranging from 0.97 to 4.84. The P-value for the gender coefficient was 0.059, with alpha set at 5%. So once hostility scores are controlled for, the association between gender and treatment success is no longer significant.
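The reported interval and P-value for gender can be checked against each other. The sketch below assumes that the interval of 0.97 to 4.84 is a Wald interval on the odds-ratio scale, which makes it symmetric on the log scale; the paper as quoted here does not state this, so it is an assumption. From the interval it recovers the point estimate, the standard error and the two-sided P-value. The function name is hypothetical.

```python
import math

def wald_from_ci(lower, upper, z_crit=1.959964):
    """Recover estimate, SE, z and two-sided p from a 95% Wald interval.

    ASSUMPTION: the bounds are odds ratios, so the interval is
    symmetric around the estimate on the log-odds scale.
    """
    log_lower, log_upper = math.log(lower), math.log(upper)
    log_estimate = (log_lower + log_upper) / 2
    standard_error = (log_upper - log_lower) / (2 * z_crit)
    z = log_estimate / standard_error
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return math.exp(log_estimate), standard_error, z, p

odds_ratio, se, z, p = wald_from_ci(0.97, 4.84)
print(f"odds ratio = {odds_ratio:.2f}")
print(f"SE(log OR) = {se:.2f}")
print(f"z = {z:.2f}, p = {p:.3f}")
```

Under that assumption the recovered P-value comes out close to the reported 0.059, so the interval and the P-value tell the same story: an odds ratio of roughly 2 whose interval just includes 1.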