ERPPO: Entropy Regularization-based Proximal Policy Optimization
Changha Lee, Gyusang Cho
- Year
- 2026
- Access
- Open access
Abstract
Multi-Agent Proximal Policy Optimization (MAPPO) is a variant of the Proximal Policy Optimization (PPO) algorithm, specifically tailored for multi-agent reinforcement learning (MARL). MAPPO optimizes cooperative multi-agent settings by employing a centralized critic with decentralized actors. However, in case of multi-dimensional environment, MAPPO can not extract optimal policy due to non-stationary agent observation. To overcome this problem, we introduce a novel approach, Entropy Regularization-based Proximal Policy Optimization (ERPPO). For the policy optimization, we first define the object detection ambiguity under multi-dimensional observation environment. Distributional Spatiotemporal Ambiguity (DSA) learner is trained to estimate object detection uncertainty in non-stationary constraints. Then, we enhance PPO with a novel Entropy Regularization term. This regularization dynamically adjusts the policy update by applying a stronger (L1) regularization in high-ambiguity observation to encourage significant exploratory actions and a weaker (L2) regularization in low-ambiguity observation to stabilize the proximal policy optimization. This approach is designed to enhance the probability of successful object localization in time-critical operations by reducing detection failures and optimizing search policy. Experiments on a testbed with AirSim-based maritime searching scenarios show that the proposed ERPPO improves accuracy performance. Our proposed method improves higher gradient than MAPPO. Qualitative results confirm that ERPPO effectiveness in terms of suppressing false detection in visually uncertain conditions.
Keywords
Related papers
How to Relieve Distribution Shifts in Semantic Segmentation for Off-Road Environments
Ji-Hoon Hwang, Daeyoung Kim, Hyung-Suk Yoon +2 more
2026
Uncertainty-guided evolvable recognition framework for industrial robots via prototype-based fuzzy inference and evidence fusion
Yanrun Zhou, Zihao Lei, Guangrui Wen +4 more
Robotics and Computer-Integrated Manufacturing · 2026
Point cloud registration for non-destructive, high-resolution coating thickness measurement from 3D scans
Simon Duenser, Ivo Aschwanden, Raamadaas Krishnadas +2 more
Robotics and Computer-Integrated Manufacturing · 2026
Toward the intelligent robotics era: Multimodal flexible haptic sensors for advanced perception systems
Sili Ding, Feng Xu, Jie Chen +3 more
Progress in Materials Science · 2026