Home /Research /MM-LMPC: Multi-Modal Learning Model Predictive Control via Bandit-Based Mode Selection

OTHER

MM-LMPC: Multi-Modal Learning Model Predictive Control via Bandit-Based Mode Selection

Wataru Hashimoto, Kazumune Hashimoto

Year: 2025
Access: Open access

Abstract

Learning Model Predictive Control (LMPC) improves performance on iterative tasks by leveraging data from previous executions. At each iteration, LMPC constructs a sampled safe set from past trajectories and uses it as a terminal constraint, with a terminal cost given by the corresponding cost-to-go. While effective, LMPC heavily depends on the initial trajectories: states with high cost-to-go are rarely selected as terminal candidates in later iterations, leaving parts of the state space unexplored and potentially missing better solutions. For example, in a reach-avoid task with two possible routes, LMPC may keep refining the initially shorter path while neglecting the alternative path that could lead to a globally better solution. To overcome this limitation, we propose Multi-Modal LMPC (MM-LMPC), which clusters past trajectories into modes and maintains mode-specific terminal sets and value functions. A bandit-based meta-controller with a Lower Confidence Bound (LCB) policy balances exploration and exploitation across modes, enabling systematic refinement of all modes. This allows MM-LMPC to escape high-cost local optima and discover globally superior solutions. We establish recursive feasibility, closed-loop stability, asymptotic convergence to the best mode, and a logarithmic regret bound. Simulations on obstacle-avoidance tasks validate the performance improvements of the proposed method.

Keywords

eess.SY

MM-LMPC: Multi-Modal Learning Model Predictive Control via Bandit-Based Mode Selection

Abstract

Keywords

Related papers

Statistical Learning Theory

Fractional Differential Equations

Applied Nonlinear Control

Genetic Programming: On the Programming of Computers by Means of Natural Selection