The Supportiveness-Safety Tradeoff in LLM Well-Being Agents
Himanshi Lalwani, Hanan Salam
- Year
- 2026
- Access
- Open access
Abstract
Large language models (LLMs) are being integrated into socially assistive robots (SARs) and other conversational agents providing mental health and well-being support. These agents are often designed to sound empathic and supportive in order to maximize user's engagement, yet it remains unclear how increasing the level of supportive framing in system prompts influences safety relevant behavior. We evaluated 6 LLMs across 3 system prompts with varying levels of supportiveness on 80 synthetic queries spanning 4 well-being domains (1440 responses). An LLM judge framework, validated against human ratings, assessed safety and care quality. Moderately supportive prompts improved empathy and constructive support while maintaining safety. In contrast, strongly validating prompts significantly degraded safety and, in some cases, care across all domains, with substantial variation across models. We discuss implications for prompt design, model selection, and domain specific safeguards in SARs deployment.
Keywords
Related papers
A dual-loop framework for manufacturability-aware topology optimization of electric vehicle structures via wire arc additive manufacturing
Qiang Cui, Chuan Yu, Daoqian Yang +2 more
Robotics and Computer-Integrated Manufacturing · 2026
Geometric digital twin: A digital and intelligent model for aero-engine assembly accuracy prediction
Ke Shang, Xin Jin, Teli Xu +4 more
Robotics and Computer-Integrated Manufacturing · 2026
Design and dynamic performance prediction of a novel large-aperture offset-feed deployable antenna
Chuang Shi, Tianming Liu, Ning Xue +6 more
Aerospace Science and Technology · 2026
Revolutionizing Industries Through AI-Driven Robotics
Aryan Chaudhary
Recent Advances in Computer Science and Communications · 2026