Battery Management for Warehouse Robots via Average-Reward Reinforcement Learning
Yongjin Mu, Yanjie Li, Ke Lin, Ki Deng, Qi Liu
- Publication year
- 2022
- Citations
- 2
Abstract
In automated warehouses, the battery management strategy for Automated Guided Vehicles (AGVs) affects warehouse throughput and operational efficiency. In this paper, we first model the battery management problem as a Markov Decision Process (MDP) and adopt a deep reinforcement learning (DRL) algorithm as the battery management strategy. However, discounted-reward DRL algorithms underweight long-term benefits, which makes them ill-suited to this problem: orders arrive at the warehouse continuously, and every order matters and should be served. To address this, we introduce an average-reward DRL algorithm that focuses more on long-term benefits. Existing average-reward DRL algorithms, however, suffer from low sample utilization and unstable training. We therefore present a practical algorithm, average reward TD3 (ARTD3), that learns faster and is more stable. Finally, we conduct extensive experiments to confirm that ARTD3 outperforms a discounted-reward DRL algorithm and rule-based methods.
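The contrast between the discounted and average-reward objectives can be illustrated with their temporal-difference errors. The sketch below uses the standard textbook differential TD formulation; it is not the ARTD3 update, which the abstract does not specify. All function and parameter names are hypothetical.

```python
# Illustrative only: generic TD errors, not the paper's ARTD3 algorithm.

def discounted_td_error(reward, value, next_value, gamma=0.99):
    """TD error under the discounted objective: future value is shrunk by gamma."""
    return reward + gamma * next_value - value


def differential_td_error(reward, avg_reward, value, next_value):
    """TD error under the average-reward objective.

    Rewards are measured relative to a running estimate of the long-run
    average reward (avg_reward), and no discount factor is applied.
    """
    return reward - avg_reward + next_value - value


def update_avg_reward(avg_reward, td_error, step_size=0.01):
    """Move the average-reward estimate in the direction of the TD error."""
    return avg_reward + step_size * td_error


if __name__ == "__main__":
    delta = differential_td_error(reward=1.0, avg_reward=0.5, value=2.0, next_value=2.5)
    print("differential TD error:", delta)
    print("updated average reward:", update_avg_reward(0.5, delta, step_size=0.25))
```

In the average-reward form, a reward counts the same whether it arrives now or many steps later, which matches the abstract's point that orders arriving at any moment should be treated as equally important.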