SANTS: A State-Adaptive Scheduler for World Action Models
Yirui Sun, Guangyu Zhuge, Keliang Liu, Jie Gu, Xinyu Bing, Zhongxue Gan, Chunxu Tian
2026
Abstract
World Action Models (WAMs) improve robot manipulation by using video-based future representations to condition action generation. In pixel-space WAMs, however, the best action condition is not necessarily the fully denoised video. Controlled denoising-depth scans show that video refinement can reduce action error up to a state-dependent point, after which the gain may saturate or even reverse when late predictions become less action-relevant or physically unreliable. This suggests that action generation should use a state-dependent point along the video noise trajectory rather than a fixed terminal denoising depth. We introduce State-Adaptive Noise Trajectory Scheduler (SANTS), a lightweight scheduler for video-to-action diffusion policies. At each video decision point, SANTS reads the current video-state representation and noise level, then jointly predicts a cumulative stopping hazard and a relative noise-progression ratio. SANTS is post-trained with a path-level reward computed after the frozen action branch generates the final action chunk, so the scheduler is optimized for downstream action quality rather than intermediate video fidelity, while redundant video-state updates are explicitly penalized. Experiments show that SANTS reaches \(94.4\%\) overall success on RoboTwin 2.0 and \(73.1\%\) average success across seven real-robot tasks, while reducing latency by \(81.7\%\) and \(79.0\%\) relative to full video denoising, respectively. These results indicate that adaptive selection along the video noise trajectory can preserve the control benefits of WAM-style future reasoning while removing much of its redundant inference cost.
Keywords
Related papers
State-of-the-art in mobile robot-assisted grinding technologies for large-scale complex components
Yusen Li, Ziwei Wang, Xiangye Zhu +9 more
Robotics and Computer-Integrated Manufacturing · 2026
A fusion prediction model of tool wear based on physical information and machine learning in five-axis milling TC4 titanium alloy
Shaoqing Qin, Lida Zhu, Yanpeng Hao +7 more
Robotics and Computer-Integrated Manufacturing · 2026
Enhancing robotic milling quality via a novel piezoelectric active damping toolholder
Bo Li, Yuanbo Zhao, Huijie Xiao +3 more
Robotics and Computer-Integrated Manufacturing · 2026
A novel method of suppressing low-frequency chatter in robotic milling using magnetically-induced nonlinear broadband multidirectional passive vibration absorber
Hao Li, Yuhui Yu, Rui Fu +3 more
Robotics and Computer-Integrated Manufacturing · 2026