RACCOON: Grounding Embodied Question-Answering with State Summaries from Existing Robot Modules
Samuel Bustamante, Markus Knauer, Stefan Schneyer, Alin Albu‐Schäffer, Bernhard Weber, Freek Stulp
- 发表年份
- 2025
- 引用次数
- 2
摘要
Explainability is vital for establishing user trust, also in robotics. Recently, foundation models (e.g. vision-language models, VLMs) fostered a wave of embodied agents that answer arbitrary queries about their environment and their interactions with it. However, naively prompting VLMs to answer queries based on camera images does not take into account existing robot architectures which represent the robot's tasks, skills, and beliefs about the state of the world. To overcome this limitation, we propose RACCOON, a framework that combines foundation models' responses with a robot's internal knowledge. Inspired by Retrieval-Augmented Generation (RAG), RACCOON selects relevant context, retrieves information from the robot's state, and utilizes it to refine prompts for an LLM to answer questions accurately. This bridges the gap between the model's adaptability and the robot's domain expertise.
关键词
相关论文
Statistical Learning Theory
Yuhai Wu, Vladimir Vapnik
1999
Artificial intelligence: a modern approach
1995
Fractional Differential Equations
Igor Podlubný
2025
Applied Nonlinear Control
Jean-Jacques Slotine, Weiping Li
1991