LEARNING

Deep Controlled Learning for Inventory Control

Tarkan Temizöz, Christina Imdahl, Remco Dijkman, Douniel Lamghari-Idrissi, Willem van Jaarsveld

Publication year
2025
Citations
20

Abstract

The application of Deep Reinforcement Learning (DRL) to inventory management is an emerging field. However, traditional DRL algorithms, originally developed for diverse domains such as game-playing and robotics, may not be well-suited for the specific challenges posed by inventory management. Consequently, these algorithms often fail to outperform established heuristics; for instance, no existing DRL approach consistently surpasses the capped base-stock policy in lost sales inventory control. This highlights a critical gap in the practical application of DRL to inventory management: the highly stochastic nature of inventory problems requires tailored solutions. In response, we propose Deep Controlled Learning (DCL), a new DRL algorithm designed for highly stochastic problems. DCL is based on approximate policy iteration and incorporates an efficient simulation mechanism, combining Sequential Halving with Common Random Numbers. Our numerical studies demonstrate that DCL consistently outperforms state-of-the-art heuristics and DRL algorithms across various inventory settings, including lost sales, perishable inventory systems, and inventory systems with random lead times. DCL achieves lower average costs in all test cases while maintaining an optimality gap of no more than 0.2%. Remarkably, this performance is achieved using the same hyperparameter set across all experiments, underscoring the robustness and generalizability of our approach. These findings contribute to the ongoing exploration of tailored DRL algorithms for inventory management, providing a foundation for further research and practical application in this area.

• Inventory management requires algorithms tailored to its requirements.
• We propose a deep reinforcement learning algorithm (DCL) for inventory management.
• DCL uses Sequential Halving with Common Random Numbers for efficient simulation.
• DCL outperforms state-of-the-art heuristics in lost sales and perishable inventory.
• DCL achieves an optimality gap of at most 0.2% in numerical experiments.
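
The abstract names Sequential Halving with Common Random Numbers as the simulation mechanism. As a rough illustration of how those two ideas fit together, here is a minimal sketch on a toy single-period lost-sales problem. It is not the authors' implementation: DCL applies the idea inside approximate policy iteration, whereas this sketch only picks one order quantity. The names `lost_sales_cost`, `uniform_demand`, and `sequential_halving_crn`, the cost parameters, and the uniform demand model are all assumptions made for illustration.

```python
import math
import random


def lost_sales_cost(order_qty, demand, holding=1.0, penalty=3.5):
    """Toy one-period cost (assumed): holding on leftovers, penalty on lost sales."""
    leftover = max(order_qty - demand, 0)
    lost = max(demand - order_qty, 0)
    return holding * leftover + penalty * lost


def uniform_demand(rng):
    """Hypothetical demand model: integer demand uniform on 0..10."""
    return rng.randint(0, 10)


def sequential_halving_crn(actions, cost_fn, sample_scenario, budget, seed=0):
    """Pick a low-cost action by Sequential Halving with common random numbers.

    The budget is split evenly over ceil(log2(n)) rounds. In each round every
    surviving action is evaluated on the SAME sampled scenarios (the common
    random numbers), and the better half of the actions survives.
    """
    rng = random.Random(seed)
    survivors = list(actions)
    rounds = max(1, math.ceil(math.log2(len(survivors))))
    for _ in range(rounds):
        if len(survivors) == 1:
            break
        # Samples per surviving action in this round.
        n_samples = max(1, budget // (len(survivors) * rounds))
        # One shared scenario set per round, reused for every survivor.
        scenarios = [sample_scenario(rng) for _ in range(n_samples)]
        mean_cost = {
            a: sum(cost_fn(a, s) for s in scenarios) / n_samples
            for a in survivors
        }
        survivors.sort(key=mean_cost.get)
        survivors = survivors[: math.ceil(len(survivors) / 2)]
    return survivors[0]


best = sequential_halving_crn(
    range(11), lost_sales_cost, uniform_demand, budget=40_000
)
print("selected order quantity:", best)
```

Because every surviving action faces the same demand draws within a round, differences in mean cost reflect the actions rather than sampling luck, which is why the pairing suits highly stochastic problems. For the assumed costs and demand model, the exact expected-cost minimizer is an order quantity of 8, which gives a reference point for checking the sketch.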

Keywords

Computer science, Inventory control, Control (management), Artificial intelligence, Operations research, Mathematics
