Lagrangian Relaxation for Multi-Action Partially Observable Restless Bandits: Heuristic Policies and Indexability
Rahul Meshram, Kesav Kaza
- 发表年份
- 2025
- 访问权限
- 开放获取
摘要
Partially observable restless multi-armed bandits have found numerous applications including in recommendation systems, communication systems, public healthcare outreach systems, and in operations research. We study multi-action partially observable restless multi-armed bandits, it is a generalization of the classical restless multi-armed bandit problem -- 1) each bandit has finite states, and the current state is not observable, 2) each bandit has finite actions. In particular, we assume that more than two actions are available for each bandit. We motivate our problem with the application of public-health intervention planning. We describe the model and formulate a long term discounted optimization problem, where the state of each bandit evolves according to a Markov process, and this evolution is action dependent. The state of a bandit is not observable but one of finitely many feedback signals are observable. Each bandit yields a reward, based on the action taken on that bandit. The agent is assumed to have a budget constraint. The bandits are assumed to be independent. However, they are weakly coupled at the agent through the budget constraint. We first analyze the Lagrangian bound method for our partially observable restless bandits. The computation of optimal value functions for finite-state, finite-action POMDPs is non-trivial. Hence, the computation of Lagrangian bounds is also challenging. We describe approximations for the computation of Lagrangian bounds using point based value iteration (PBVI) and online rollout policy. We further present various properties of the value functions and provide theoretical insights on PBVI and online rollout policy. We study heuristic policies for multi-actions PORMAB. Finally, we discuss present Whittle index policies and their limitations in our model.
关键词
相关论文
Statistical Learning Theory
Yuhai Wu, Vladimir Vapnik
1999
Fractional Differential Equations
Igor Podlubný
2025
Applied Nonlinear Control
Jean-Jacques Slotine, Weiping Li
1991
Genetic Programming: On the Programming of Computers by Means of Natural Selection
John R. Koza
1992