LOCOMOTION

Risk-Conditioned Reinforcement Learning: A Generalized Approach for Adapting to Varying Risk Measures

Gwangpyo Yoo, Jinwoo Park, Honguk Woo

Publication year
2024
Citations
4
Access
Open access

Abstract

In application domains requiring mission-critical decision making, such as finance and robotics, the optimal policy derived by reinforcement learning (RL) often hinges on a preference for risk management. Yet the dynamic nature of risk measures poses considerable challenges to the generalization and adaptation of risk-sensitive policies in RL. In this paper, we propose a risk-conditioned RL model that enables rapid policy adaptation to varying risk measures via a unified risk representation, the Weighted Value-at-Risk (WV@R). To sample risk measures that avoid undue optimism, we construct a risk proposal network employing a conditional adversarial auto-encoder and a normalizing flow. This network establishes coherent representations for risk measures, preserving continuity in terms of the Wasserstein distance on the risk measures. The normalizing flow is used to support non-crossing quantile regression that obtains valid samples for risk measures, and it is also applied to the agent’s critic to ensure that monotonicity is preserved in its quantile estimates. Through experiments with locomotion, finance, and self-driving scenarios, we show that our model is capable of adapting to a range of risk measures, achieving performance comparable to that of baseline models trained individually for each measure. Our model often outperforms the baselines, especially when exploration is required during training but risk aversion is favored during evaluation.

Keywords

Reinforcement, Reinforcement learning, Computer science, Psychology, Artificial intelligence, Social psychology
