Spacetime Optimal-Transport Attention for Visuo-Haptic Imitation Learning of Contact-Rich Manipulation
Yue Feng, Weicheng Huang, I-Ming Chen
- Publication year
- 2026
- Access
- Open access
Abstract
Contact-rich manipulation tasks such as tight-clearance insertion, connector mating, polishing, and surface-conforming wiping remain difficult for data-driven controllers because they couple discontinuous contact dynamics, partial observability, and strict safety constraints. No single sensing modality suffices: vision supplies global context before contact, force/torque (F/T) feedback governs interaction after contact, and proprioceptive pose provides a consistent kinematic backbone. Most prior imitation-learning policies for contact-rich tasks operate on uni- or bi-modal signals, and the few that fuse three modalities typically adopt off-the-shelf attention modules with no explicit prior on how attention mass should be distributed across task-relevant regions. We present Spacetime Optimal-Transport Attention (SO-TA), a tri-modal fusion backbone that replaces softmax-normalized patch attention with an entropy-regularized Optimal Transport (OT) alignment between force-pose-derived sub-queries and visual patches. Explicit marginal constraints act as a structured inductive bias for contact-rich tasks, encouraging conditioning-aware spatial selection that is stable across illumination, distractors, and partial occlusion. SO-TA is paired with a diffusion-based sequence policy that maps observation windows to pose-action chunks. We evaluate SO-TA on three real-robot tasks: tight peg-in-hole assembly, BCM wiring-connector insertion, and curved-surface mark erasing. With approximately 200 rollouts per condition, SO-TA reaches 100% success on tight peg-in-hole versus 93% for cross-attention at matched capacity, and retains 82.5% success under illumination, distractor, and partial-occlusion perturbations, where a concatenation baseline drops to 43.5%. OT-derived patch heatmaps and leave-one-out modality-influence ratios provide interpretable, phase-dependent diagnostics.
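To make the central idea concrete, the following is a minimal sketch of how an entropy-regularized OT plan can stand in for softmax attention between a handful of sub-queries and a set of visual patch features. It is not the authors' implementation. The function name `ot_attention`, the cosine-distance cost, the uniform marginals, and the values of `eps` and `n_iters` are all assumptions chosen for illustration; the abstract does not specify any of them.

```python
import numpy as np


def ot_attention(queries, patches, eps=0.5, n_iters=300):
    """Hypothetical sketch: Sinkhorn transport plan used as attention weights.

    queries: (K, D) sub-query vectors (assumed to come from force/pose features).
    patches: (N, D) visual patch features.
    Returns the (K, N) transport plan and the (K, D) attended patch features.
    """
    K, N = queries.shape[0], patches.shape[0]

    # Assumed cost: cosine distance between each sub-query and each patch.
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    p = patches / np.linalg.norm(patches, axis=1, keepdims=True)
    cost = 1.0 - q @ p.T  # shape (K, N), values in [0, 2]

    # Assumed marginals: uniform mass per sub-query and per patch.
    a = np.full(K, 1.0 / K)
    b = np.full(N, 1.0 / N)

    # Sinkhorn iterations: alternately rescale rows and columns of the
    # Gibbs kernel so the plan matches both marginals.
    kernel = np.exp(-cost / eps)
    u = np.ones(K)
    v = np.ones(N)
    for _ in range(n_iters):
        u = a / (kernel @ v)
        v = b / (kernel.T @ u)
    plan = u[:, None] * kernel * v[None, :]

    # Row-normalize the plan to obtain attention weights per sub-query,
    # then pool the patch features with those weights.
    weights = plan / plan.sum(axis=1, keepdims=True)
    attended = weights @ patches
    return plan, attended


# Toy usage with random features (4 sub-queries, 16 patches, 8 channels).
rng = np.random.default_rng(0)
queries = rng.normal(size=(4, 8))
patches = rng.normal(size=(16, 8))
plan, attended = ot_attention(queries, patches)
```

Unlike a row-wise softmax, the plan is constrained on both sides: each sub-query distributes a fixed amount of mass, and each patch can receive only a fixed amount. In this sketch, a single row `plan[k]` reshaped to the patch grid would give a per-sub-query heatmap; how the paper builds its heatmaps and sets its marginals may differ.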