首页 /研究 /Adaptive Scheduling: A Reinforcement Learning Whittle Index Approach for Wireless Sensor Networks
LEARNING

Adaptive Scheduling: A Reinforcement Learning Whittle Index Approach for Wireless Sensor Networks

Sokipriala Jonah, Seong Ki Yoo, Saurav Sthapit

发表年份
2026
访问权限
开放获取

摘要

We propose a reinforcement learning based scheduling framework for Restless Multi-Armed Bandit (RMAB) problems, centred on a Whittle Index Q-Learning policy with Upper Confidence Bound (UCB) exploration, referred to as WIQL-UCB. Unlike existing approaches that rely on fixed or adaptive epsilon-greedy strategies and require careful hyperparameter tuning, the proposed method removes problem-specific tuning and is therefore more generalisable across diverse RMAB settings. We evaluate WIQL-UCB on standard RMAB benchmarks and on a practical sensor scheduling application based on the Age of Incorrect Information (AoII), using an edge-based state estimation scheme that requires no prior knowledge of system dynamics. Experimental results show that WIQL-UCB achieves near-optimal performance while significantly improving computational and memory efficiency. For a representative problem size of N = 15 and M = 3, the proposed method requires only around 600 bytes of memory, compared with several kilobytes for tabular Q-learning and hundreds of kilobytes to megabytes for deep reinforcement learning baselines. In addition, WIQL-UCB achieves sub-millisecond per-decision runtimes and is several times faster than deep reinforcement learning approaches, while maintaining competitive performance. Overall, these results demonstrate that WIQL-UCB consistently outperforms both non-Whittle-based and Whittle-index learning baselines across a wide range of RMAB settings.

关键词

eess.SY

相关论文

查看 LEARNING 分类全部论文