
Battery Management for Warehouse Robots via Average-Reward Reinforcement Learning

Yongjin Mu, Yanjie Li, Ke Lin, Ki Deng, Qi Liu

Publication year
2022
Citations
2

Abstract

In automated warehouses, the battery management strategy of Automated Guided Vehicles (AGVs) affects warehouse throughput and operational efficiency. In this paper, we first model the battery management problem as a Markov Decision Process (MDP) and adopt a deep reinforcement learning (DRL) algorithm as the battery management strategy. However, discounted-reward DRL algorithms neglect long-term benefits, which makes them unsuitable for this strategy: orders arrive at the warehouse at every moment, and all of them are important and should be handled. To address this, we introduce an average-reward DRL algorithm, which focuses more on long-term benefits. Existing average-reward DRL algorithms, however, suffer from low sample utilization and unstable training. We therefore present a practical algorithm called average reward TD3 (ARTD3) that learns faster and is more stable. Finally, we conduct extensive experiments confirming that ARTD3 outperforms a discounted-reward DRL algorithm and rule-based methods.
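
To make the contrast between the two objectives concrete, the following is a minimal sketch of the textbook discounted TD target next to the textbook average-reward (differential) TD target. It is not the ARTD3 algorithm from the paper, whose details are not given in this abstract; the function names, the `gamma` default, and the `step_size` default are hypothetical choices made only for illustration.

```python
def discounted_td_target(reward, next_value, gamma=0.99):
    """Discounted target r + gamma * V(s'): future value is scaled down by gamma."""
    return reward + gamma * next_value


def differential_td_target(reward, next_value, avg_reward):
    """Average-reward target r - rho + V(s'): no discount, reward is measured
    relative to the running estimate rho of the long-run reward rate."""
    return reward - avg_reward + next_value


def update_avg_reward(avg_reward, td_error, step_size=0.01):
    """Move the reward-rate estimate rho in the direction of the TD error,
    as in standard differential TD learning (step_size is an assumed value)."""
    return avg_reward + step_size * td_error


if __name__ == "__main__":
    # Hypothetical one-step transition, values chosen arbitrarily.
    r, v_s, v_next, rho = 1.0, 1.5, 2.0, 0.5
    delta = differential_td_target(r, v_next, rho) - v_s
    print("discounted target:", discounted_td_target(r, v_next))
    print("differential target:", differential_td_target(r, v_next, rho))
    print("updated rho:", update_avg_reward(rho, delta))
```

In the discounted form, rewards far in the future contribute less and less to the target; in the differential form every time step is weighted equally, which matches the abstract's argument that orders arriving at any moment should count.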

Keywords

Reinforcement learning, Markov decision process, Computer science, Battery (electricity), Throughput, Robot, Artificial intelligence, Markov process, Mathematics, Power (physics)
