Proposal of the Continuous-Valued Penalty Avoiding Rational Policy Making Algorithm
Kazuteru Miyazaki
- 发表年份
- 2012
- 引用次数
- 3
摘要
Applying reinforcement learning to actual problems, sometimes requires the treatment of continuousvalued input and output. We previously proposed a process called Exploitation-oriented Learning (XoL) to strongly enhance successful experience and thereby reduce the number of trial-and-error searches. A method based on Penalty-Avoiding Rational Policymaking (PARP) is proposed as a XoL method corresponding to continuous-valued input, but types of action treating continuous-valued output are not executed. We study the treatment of continuous-valued output suitable for a XoL method in which the environment includes both a reward and a penalty. We extend PARP in continuous-valued input to continuousvalued output. We apply our proposal to the pole-cart balancing problem and the biped LEGO robot, and confirm its effectiveness.
关键词
相关论文
Statistical Learning Theory
Yuhai Wu, Vladimir Vapnik
1999
Artificial intelligence: a modern approach
1995
Fractional Differential Equations
Igor Podlubný
2025
Applied Nonlinear Control
Jean-Jacques Slotine, Weiping Li
1991