Object-Centric Action-Enhanced Representations for Robot Visuo-Motor Policy Learning
Nikos Giannakakis, Argyris Manetas, Panagiotis P. Filntisis, Petros Maragos, George Retsinas
Year: 2025
Access: Open access
Abstract
Learning visual representations from observing actions to benefit robot visuo-motor policy generation is a promising direction that closely resembles human cognitive function and perception. Motivated by this, and further inspired by psychological theories suggesting that humans process scenes in an object-based fashion, we propose an object-centric encoder that performs semantic segmentation and visual representation generation in a coupled manner, unlike other works, which treat these as separate processes. To achieve this, we leverage the Slot Attention mechanism and use the SOLV model, pretrained on large out-of-domain datasets, to bootstrap fine-tuning on human action video data. Through simulated robotic tasks, we demonstrate that the resulting visual representations can enhance reinforcement and imitation learning training, highlighting the effectiveness of our integrated approach to semantic segmentation and encoding. Furthermore, we show that exploiting models pretrained on out-of-domain datasets can benefit this process, and that fine-tuning on datasets depicting human actions (although still out-of-domain) can significantly improve performance due to their close alignment with robotic tasks. These findings indicate that reliance on annotated or robot-specific action datasets can be reduced, and that existing visual encoders can be built upon to accelerate training and improve generalizability.
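For readers unfamiliar with the Slot Attention mechanism mentioned in the abstract, the following is a minimal NumPy sketch of its core iterative update. It is not the SOLV model or the encoder proposed in the paper: the function name `slot_attention`, the random projection matrices, and all hyperparameters are illustrative assumptions, and the learned GRU, MLP, and layer normalization of the original mechanism are replaced by a plain weighted-mean update.

```python
import numpy as np


def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def slot_attention(inputs, num_slots=4, iters=3, seed=0, eps=1e-8):
    """Simplified Slot Attention over `inputs` of shape (n, d).

    Assumptions (hypothetical, for illustration only): projections are
    random rather than learned, and the slot update is a direct weighted
    mean instead of a GRU followed by a residual MLP.
    """
    rng = np.random.default_rng(seed)
    n, d = inputs.shape
    w_q, w_k, w_v = (rng.normal(scale=d ** -0.5, size=(d, d)) for _ in range(3))
    slots = rng.normal(size=(num_slots, d))  # randomly initialized slots

    k = inputs @ w_k  # keys, shape (n, d)
    v = inputs @ w_v  # values, shape (n, d)
    for _ in range(iters):
        q = slots @ w_q                  # queries, shape (num_slots, d)
        logits = k @ q.T / np.sqrt(d)    # shape (n, num_slots)
        # Softmax over the slot axis: slots compete for each input feature.
        attn = softmax(logits, axis=1)
        # Normalize over inputs so each slot takes a weighted mean of values.
        weights = attn / (attn.sum(axis=0, keepdims=True) + eps)
        slots = weights.T @ v            # shape (num_slots, d)
    return slots, attn


# Usage: 16 feature vectors (e.g. image patches) of dimension 8.
feats = np.random.default_rng(1).normal(size=(16, 8))
slots, attn = slot_attention(feats, num_slots=4, iters=3)
masks = attn.argmax(axis=1)  # hard assignment of each feature to a slot
```

In this sketch the per-feature attention map `attn` plays the role of a soft segmentation, while `slots` holds one representation vector per discovered group, which loosely illustrates how segmentation and representation can arise from a single coupled computation.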