RACCOON: Grounding Embodied Question-Answering with State Summaries from Existing Robot Modules

Samuel Bustamante, Markus Knauer, Stefan Schneyer, Alin Albu‐Schäffer, Bernhard Weber, Freek Stulp

Publication year
2025
Citations
2

Abstract

Explainability is vital for establishing user trust, including in robotics. Recently, foundation models (e.g., vision-language models, VLMs) have fostered a wave of embodied agents that answer arbitrary queries about their environment and their interactions with it. However, naively prompting VLMs to answer queries based on camera images does not take into account existing robot architectures, which represent the robot's tasks, skills, and beliefs about the state of the world. To overcome this limitation, we propose RACCOON, a framework that combines foundation models' responses with a robot's internal knowledge. Inspired by Retrieval-Augmented Generation (RAG), RACCOON selects relevant context, retrieves information from the robot's state, and uses it to refine prompts for an LLM to answer questions accurately. This bridges the gap between the model's adaptability and the robot's domain expertise.
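
The select, retrieve, and refine steps described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `RobotModule` class, the keyword-overlap selection rule, the example modules, and the prompt wording are all assumptions made for illustration, and the actual LLM call is left out.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RobotModule:
    """Hypothetical robot module that can summarize its own state as text."""
    name: str
    keywords: List[str]
    summarize: Callable[[], str]


def select_modules(question: str, modules: List[RobotModule]) -> List[RobotModule]:
    """Select modules whose keywords appear in the question.

    Keyword overlap is an assumed stand-in for context selection.
    """
    words = set(question.lower().replace("?", " ").split())
    return [m for m in modules if words & set(m.keywords)]


def build_prompt(question: str, modules: List[RobotModule]) -> str:
    """Retrieve state summaries from the selected modules and put them in the prompt."""
    selected = select_modules(question, modules)
    lines = [f"[{m.name}] {m.summarize()}" for m in selected]
    context = "\n".join(lines) if lines else "(no relevant robot state found)"
    return (
        f"Robot state summaries:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer using only the state above."
    )


# Hypothetical modules with hard-coded summaries, for illustration only.
MODULES = [
    RobotModule("task_planner", ["task", "doing", "plan"],
                lambda: "Current task: pick up the mug."),
    RobotModule("world_model", ["where", "object", "mug"],
                lambda: "Mug is believed to be on the table."),
]

if __name__ == "__main__":
    print(build_prompt("Where is the mug?", MODULES))
```

In this sketch the resulting prompt would be passed to an LLM of choice; grounding comes from the fact that only summaries produced by the robot's own modules enter the context.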

Keywords

Embodied cognition, Robot, Computer science, Ground, State (computer science), Question answering, Artificial intelligence, Human–computer interaction, Electrical engineering, Engineering
