Temporally Masked Diffusion: An Effective Behavioral Cloning Method in Robot Action Sequence Generation
Ying Zhang, Kai Zhao, Shuangshuang Han, Yingchun Wang, Yanfeng Lu
- Publication year: 2025
- Citations: 1
Abstract
Diffusion models have shown significant potential in imitation learning, particularly for modeling complex data distributions. However, challenges remain in effectively capturing temporal dependencies in sequential decision-making tasks that are crucial for accurately representing the dynamics of actions and predicting long-term outcomes. This paper proposes temporally-masked diffusion (TM-Diff), an innovative approach that enhances the learning of essential temporal dependencies within action sequences. The proposed TM-Diff introduces a temporal masking method, masking certain actions at the time-step level during training. A novel time-series diffusion Transformer is designed to reconstruct the masked positions as a self-supervised learning auxiliary task, while still following the diffusion training process. This enables TM-Diff to infer missing action information, allowing it to learn temporal relationships between action sequence tokens. The effectiveness of TM-Diff is validated across six complex manipulation tasks. Experimental results indicate that TM-Diff demonstrates improved generative performance compared to standard diffusion-based policies and traditional observation-to-action methods. Particularly in small-sample demonstration datasets, TM-Diff achieves notable success, with a 6.2% improvement in the success rate for the Square task, and a 10.1% improvement in the target coverage ratio for the Push-T task, using only 50 demonstration samples. As a reliable and robust task execution policy, the proposed TM-Diff supports the responsible deployment of intelligent robotic systems in critical domains such as the Industrial Internet of Things (IIoT), home services, healthcare and logistics.
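The time-step-level masking described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not state the masking ratio, the value used to fill masked positions, or how the reconstruction term is weighted against the diffusion loss, so `mask_ratio`, `mask_value`, and both function names below are assumptions chosen only for illustration. The sketch also leaves out the diffusion noising step and the Transformer itself, and shows only how whole time steps might be hidden and how a reconstruction error restricted to those steps might be computed.

```python
import random


def mask_time_steps(actions, mask_ratio=0.3, mask_value=0.0, seed=None):
    """Hide whole time steps of an action sequence (hypothetical helper).

    actions: list of per-step action vectors (list of lists of floats).
    Returns (masked_actions, mask), where mask[t] is True if step t was hidden.
    mask_ratio and mask_value are illustrative assumptions, not paper values.
    """
    rng = random.Random(seed)
    horizon = len(actions)
    num_masked = int(round(mask_ratio * horizon))
    masked_steps = set(rng.sample(range(horizon), num_masked))
    mask = [t in masked_steps for t in range(horizon)]
    # Replace every dimension of a masked step, so masking is per time step,
    # not per action dimension.
    masked_actions = [
        [mask_value] * len(step) if mask[t] else list(step)
        for t, step in enumerate(actions)
    ]
    return masked_actions, mask


def masked_reconstruction_loss(predicted, target, mask):
    """Mean squared error computed over the masked time steps only."""
    errors = [
        (p - q) ** 2
        for pred_step, tgt_step, is_masked in zip(predicted, target, mask)
        if is_masked
        for p, q in zip(pred_step, tgt_step)
    ]
    return sum(errors) / len(errors) if errors else 0.0


# Toy sequence: 10 time steps, 2 action dimensions per step.
actions = [[float(t), float(-t)] for t in range(10)]
masked, mask = mask_time_steps(actions, mask_ratio=0.3, seed=0)
```

In a training loop, a model would receive `masked` (plus whatever conditioning the policy uses) and its output would be compared with `actions` through `masked_reconstruction_loss`, alongside the usual diffusion objective. How the two terms are combined is left open here because the abstract does not specify it.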