MANIPULATION

RoboGPT: An LLM-Based Long-Term Decision-Making Embodied Agent for Instruction Following Tasks

Yaran Chen, Mining Tan, Zhang Xinyao, Dongbin Zhao

Publication year
2025
Citations
8

Abstract

Robotic agents must master common sense and make long-term sequential decisions to execute daily tasks from natural language instructions. Recent advances in large language models (LLMs) have catalyzed efforts in complex robotic planning. However, despite their superior generalization and comprehension capabilities, LLM-generated task plans sometimes suffer from problems of accuracy and feasibility. To address these challenges, we propose RoboGPT, an embodied agent specifically designed to make long-term decisions for instruction-following tasks. RoboGPT integrates three key modules: 1) RoboPlanner, an LLM-based planning module equipped with 67k embodied planning data, which breaks tasks down into logical subgoals. We compile a new robotic dataset using a template feedback-based self-instruction method to fine-tune the Llama model. With its strong generalization, RoboPlanner can plan hundreds of instruction-following tasks; 2) RoboSkill, customized for each subgoal to improve navigation and manipulation capabilities; and 3) Re-Plan, a module that dynamically adjusts the subgoals based on real-time environmental feedback. Using the precise semantic map generated by RoboSkill, Re-Plan can replace a target object by calculating the similarity between the subgoals and the objects present in the environment. Experimental results demonstrate that RoboGPT outperforms other state-of-the-art (SOTA) methods, particularly LLM-based methods, in task planning rationality on hundreds of unseen daily tasks and even on tasks from other domains.

For more details, please refer to the project page: https://github.com/Cwb0106/RoboGPT
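
The Re-Plan idea in the abstract (swapping a missing target object for a similar one that is actually present in the environment) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not say how similarity is computed, so the sketch substitutes plain string similarity from Python's standard `difflib`, and the names `similarity`, `replan_target`, and `threshold` are hypothetical rather than taken from RoboGPT.

```python
from difflib import SequenceMatcher
from typing import Optional, Sequence


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1].

    Assumption: a stand-in for whatever similarity measure the paper uses,
    which the abstract does not specify.
    """
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def replan_target(
    target: str,
    observed: Sequence[str],
    threshold: float = 0.4,  # hypothetical cut-off, not from the paper
) -> Optional[str]:
    """Pick the object a subgoal should act on.

    Returns the original target if it was observed, otherwise the most
    similar observed object whose score reaches the threshold, otherwise None.
    """
    if target in observed:
        return target
    # Most similar observed object; None when nothing has been observed.
    best = max(observed, key=lambda name: similarity(target, name), default=None)
    if best is not None and similarity(target, best) >= threshold:
        return best
    return None


if __name__ == "__main__":
    # Objects the (hypothetical) semantic map currently contains.
    seen = ["FloorLamp", "Sofa"]
    print(replan_target("DeskLamp", seen))
```

In this sketch the subgoal's target is only replaced when it is absent from the observed objects, so a plan that is already feasible is left unchanged.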

Keywords

Computer science; Embodied cognition; Term (time); Embodied agent; Human–computer interaction; Artificial intelligence
