Flexible Agent Alignment with Goal Inference from Open-Ended Dialog

Rachel Ma, Jingyi Qu, Andreea Bobu, Dylan Hadfield-Menell

发表年份: 2025
访问权限: 开放获取

摘要

We introduce Open-Universe Assistance Games (OU-AGs), a formal framework extending assistance games to LLM-based agents. Effective assistance requires reasoning over human preferences that are unbounded, underspecified, and evolving. Current LLM agents struggle in multi-turn interactions and with maintaining accurate models of user intent in collaborative settings. Existing assistance game formulations assume fixed, predefined preferences, an assumption that breaks down in open-ended dialogue where goals are revised incrementally and expressed in natural language. Grounded in cognitive science accounts of preference construction, we represent human preferences as a dynamically updated distribution over discrete natural-language goals. To operationalize OU-AGs, we introduce GOOD (GOals from Open-ended Dialogue), a data-efficient online method that extracts and ranks candidate goals during interaction, using LLM-simulated users to perform probabilistic inference over goal hypotheses. This allows for interpretable, uncertainty-aware preference representations without large offline datasets. We evaluate GOOD across three text-based domains: grocery shopping, household robotics (AI2-THOR), and coding. Compared to baselines without explicit goal tracking, GOOD produces semantically coherent goal representations and improves alignment with user intent across domains.

关键词

cs.AIcs.CLcs.LGcs.RO

Flexible Agent Alignment with Goal Inference from Open-Ended Dialog

摘要

关键词

相关论文

Statistical Learning Theory

Fractional Differential Equations

Applied Nonlinear Control

Genetic Programming: On the Programming of Computers by Means of Natural Selection