Beyond 'I Can't Help With That': A New Framework Wants LLM Refusals to Actually Support People in Crisis
A new arXiv preprint argues a refusal can be a small intervention — if you treat it like one.
A new preprint from a German-Danish-Italian research group introduces PsychoSafe, a framework built on the argument that the standard chatbot refusal (the polite, generic "I can't help with that") is itself a clinical failure when the person on the other end is in crisis. The authors propose treating refusals as structured supportive communication, and report that prompting a 27-billion-parameter open model with their framework improved overall refusal quality by 28.1% over a generic baseline on a 500-prompt validation set, with the biggest gains in pointing users to outside resources (+46.8%) and what they call psychological grounding (+34.8%).

The framing is the interesting part. Most AI safety work treats refusal as the endpoint: the model declined, the harm was averted, the log shows a green checkmark. PsychoSafe reframes refusal as structured supportive communication grounded in evidence-based intervention strategies. Put plainly: when a user discloses suicidal intent, a refusal that just shuts the conversation down is doing the same thing a clinician would be sued for doing — acknowledging the disclosure and then walking out of the room.
To build the system, the authors assembled a corpus of 8,019 prompt-response pairs across five psychologically salient risk domains and tried two approaches on Qwen 3.5 27B: careful prompting, and parameter-efficient fine-tuning (the lighter-weight method of nudging a model's behavior without retraining the whole thing). Prompting won on balance. Fine-tuning achieved near-perfect refusal and resource-referral rates but reduced response relevance — meaning the model learned to recite hotline numbers reliably and then recite them whether or not they fit the actual prompt. A familiar problem in clinical training, too.
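To make that trade-off concrete, here is a minimal sketch of how per-response judgments could be rolled up into the three rates discussed above. Everything in it is an illustrative assumption: the `JudgedResponse` fields, the `summarize` helper, and the toy records are hypothetical and are not taken from the preprint, whose actual rubric and scoring are not described here.

```python
from dataclasses import dataclass


@dataclass
class JudgedResponse:
    """One model reply with hypothetical yes/no judgments attached."""
    refused: bool            # did the model decline the harmful request?
    referred_resource: bool  # did it point the user to outside help?
    relevant: bool           # did the reply fit the actual prompt?


def summarize(responses):
    """Return the share of responses that meet each criterion."""
    if not responses:
        raise ValueError("need at least one judged response")
    n = len(responses)
    return {
        "refusal_rate": sum(r.refused for r in responses) / n,
        "referral_rate": sum(r.referred_resource for r in responses) / n,
        "relevance_rate": sum(r.relevant for r in responses) / n,
    }


# Toy data, invented for illustration: every reply refuses and refers,
# but half of them do not fit the prompt they are answering.
toy = [
    JudgedResponse(refused=True, referred_resource=True, relevant=True),
    JudgedResponse(refused=True, referred_resource=True, relevant=False),
    JudgedResponse(refused=True, referred_resource=True, relevant=True),
    JudgedResponse(refused=True, referred_resource=True, relevant=False),
]

summary = summarize(toy)
print(summary)
```

Read side by side, the first two rates look perfect while the third exposes the problem, which is why a single headline number can hide this kind of regression.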
The honest limitation is in the last sentence of the abstract. Evaluations on SORRY-Bench and XSTest showed strong in-domain robustness but limited out-of-domain generalization, suggesting future work should diversify training data so models apply interventions selectively rather than schematically. Schematic empathy is its own failure mode: the bot that responds to every mention of sadness with the 988 number is not safer than the bot that says nothing; it is just louder. This is the same translation-loss pattern visible in the chat-based suicide-care literature: an instrument that performed well in one context degrades when the conversational surface changes.
For clinicians watching the deployment side, two questions follow. First, is a "psychologically informed refusal" actually safer, or does it expand the surface where the model is making clinical-adjacent statements it cannot stand behind? A blunt refusal is at least legible as non-care. A warm, grounded, resource-referring refusal looks more like care, which raises the question of what standard it should be held against. Second, who validates these outputs? An LLM judge giving another LLM a 28.1% bump in "refusal quality" is a methodological starting point, not a clinical endorsement. The human ratings in this paper are a good move; the next move is independent clinical review on transcripts the developers did not curate.
The translation-loss gap PsychoSafe is trying to close, between what a refusal looks like and what it does for the person reading it, is exactly what Metonym was built to measure. A refusal that sounds supportive is not the same as one that lands as support, and only the second one belongs in a safety claim.
Metonym Clinical AI Intelligence — regulatory analysis at the intersection of clinical evaluation and AI safety. Produced under the Metonym Standard. Informational only — not legal advice, not clinical advice.


