
Proposal of the Continuous-Valued Penalty Avoiding Rational Policy Making Algorithm

Kazuteru Miyazaki

Year of publication
2012
Citations
3

Abstract

Applying reinforcement learning to real-world problems sometimes requires handling continuous-valued input and output. We previously proposed Exploitation-oriented Learning (XoL), an approach that strongly enhances successful experience and thereby reduces the number of trial-and-error searches. A method based on Penalty-Avoiding Rational Policymaking (PARP) has been proposed as an XoL method that handles continuous-valued input, but it does not handle actions with continuous-valued output. We study how to treat continuous-valued output in a way suited to an XoL method in which the environment provides both a reward and a penalty, and we extend PARP for continuous-valued input to continuous-valued output. We apply the proposed method to the pole-cart balancing problem and to a biped LEGO robot, and confirm its effectiveness.

Keywords

Computer science, Reinforcement learning, Process (computing), Algorithm, Mathematical optimization, Action (physics), Artificial intelligence, Mathematics
