
PERCEPTION

An Overview of Robot Embodied Intelligence Based on Multimodal Models: Tasks, Models, and System Schemes

Yao Cong, Hongwei Mo

Year: 2025
Citations: 5
Access: Open access

Abstract

Embodied intelligence is widely regarded in the field of artificial intelligence (AI) as a route toward artificial general intelligence (AGI). Classical AI models, which learn from labeled data, struggle to adapt to dynamic, unstructured environments because of their offline learning paradigms. By contrast, embodied intelligence emphasizes interactive learning: an agent acquires richer training information by interacting with its environment, which enables autonomous learning and action. Early embodied tasks centered primarily on navigation. With the rise of large language models (LLMs), the focus shifted to integrating LLMs and multimodal large models (MLMs) with robots, so that robots can tackle more complex tasks through reasoning and planning that draw on the prior knowledge these models provide. This work reviews early embodied tasks and the corresponding research, categorizes current embodied intelligence schemes deployed in robotics in the context of LLMs/MLMs, summarizes the perception–planning–action (PPA) paradigm, evaluates the performance of MLMs across different schemes, and offers insights into future development directions for the field.

Keywords

Embodied cognition; Cognitive robotics; Popularity; Robot; Context (archaeology); Focus (optics)
