MM-LMPC: Multi-Modal Learning Model Predictive Control via Bandit-Based Mode Selection
Wataru Hashimoto, Kazumune Hashimoto
- 发表年份
- 2025
- 访问权限
- 开放获取
摘要
Learning Model Predictive Control (LMPC) improves performance on iterative tasks by leveraging data from previous executions. At each iteration, LMPC constructs a sampled safe set from past trajectories and uses it as a terminal constraint, with a terminal cost given by the corresponding cost-to-go. While effective, LMPC heavily depends on the initial trajectories: states with high cost-to-go are rarely selected as terminal candidates in later iterations, leaving parts of the state space unexplored and potentially missing better solutions. For example, in a reach-avoid task with two possible routes, LMPC may keep refining the initially shorter path while neglecting the alternative path that could lead to a globally better solution. To overcome this limitation, we propose Multi-Modal LMPC (MM-LMPC), which clusters past trajectories into modes and maintains mode-specific terminal sets and value functions. A bandit-based meta-controller with a Lower Confidence Bound (LCB) policy balances exploration and exploitation across modes, enabling systematic refinement of all modes. This allows MM-LMPC to escape high-cost local optima and discover globally superior solutions. We establish recursive feasibility, closed-loop stability, asymptotic convergence to the best mode, and a logarithmic regret bound. Simulations on obstacle-avoidance tasks validate the performance improvements of the proposed method.
关键词
相关论文
Statistical Learning Theory
Yuhai Wu, Vladimir Vapnik
1999
Fractional Differential Equations
Igor Podlubný
2025
Applied Nonlinear Control
Jean-Jacques Slotine, Weiping Li
1991
Genetic Programming: On the Programming of Computers by Means of Natural Selection
John R. Koza
1992