
LEARNING

Proposal of the Continuous-Valued Penalty Avoiding Rational Policy Making Algorithm

Kazuteru Miyazaki

Year published: 2012
Citations: 3

Abstract

Applying reinforcement learning to real-world problems sometimes requires handling continuous-valued input and output. We previously proposed Exploitation-oriented Learning (XoL), which strongly reinforces successful experiences and thereby reduces the number of trial-and-error searches. A method based on Penalty Avoiding Rational Policy Making (PARP) has been proposed as an XoL method for continuous-valued input, but it does not handle actions with continuous-valued output. We study how to treat continuous-valued output in an XoL method for environments that include both rewards and penalties, and we extend PARP for continuous-valued input to continuous-valued output. We apply our proposal to the pole-cart balancing problem and to a biped LEGO robot and confirm its effectiveness.
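For readers unfamiliar with PARP, the sketch below illustrates the penalty-avoidance idea as it is commonly described for the discrete case: a rule (state, action) is a penalty rule if it was penalized directly or if it can lead to a penalty state, and a penalty state is one in which every available rule is a penalty rule. This is a minimal sketch under stated assumptions (finite states, a recorded set of observed successor states per rule); the function names `find_penalty_rules` and `safe_actions` are hypothetical, and the code does not implement the continuous-valued extension proposed in this paper or the reward-side policy selection.

```python
from typing import Dict, Hashable, Set, Tuple

State = Hashable
Action = Hashable
Rule = Tuple[State, Action]


def find_penalty_rules(
    actions: Dict[State, Set[Action]],
    transitions: Dict[Rule, Set[State]],
    penalized: Set[Rule],
) -> Set[Rule]:
    """Mark penalty rules by fixed-point iteration (hypothetical discrete sketch).

    A rule (s, a) is a penalty rule if it was penalized directly, or if some
    observed successor state is a penalty state, i.e. a state in which every
    available action is already a penalty rule.
    """
    penalty = set(penalized)
    changed = True
    while changed:
        changed = False
        # States where no non-penalty action remains.
        penalty_states = {
            s for s, acts in actions.items()
            if acts and all((s, a) in penalty for a in acts)
        }
        # Any rule that may transit into such a state becomes a penalty rule.
        for rule, successors in transitions.items():
            if rule not in penalty and successors & penalty_states:
                penalty.add(rule)
                changed = True
    return penalty


def safe_actions(state: State, actions: Dict[State, Set[Action]],
                 penalty: Set[Rule]) -> Set[Action]:
    """Actions in `state` that are not penalty rules (candidates for the policy)."""
    return {a for a in actions[state] if (state, a) not in penalty}


# Toy example (assumed data): both actions in B were penalized, so B is a
# penalty state and ("A", "left"), which leads to B, is marked as well.
actions = {"A": {"left", "right"}, "B": {"up", "down"}, "C": {"stay"}}
transitions = {("A", "left"): {"B"}, ("A", "right"): {"C"}, ("C", "stay"): {"C"}}
penalized = {("B", "up"), ("B", "down")}
penalty = find_penalty_rules(actions, transitions, penalized)
print(safe_actions("A", actions, penalty))  # {'right'}
```

In this toy model the agent in state A is left with only `right`; choosing among such non-penalty rules using reward information is the part this sketch deliberately omits.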

Keywords

Computer science, Reinforcement learning, Process (computing), Algorithm, Mathematical optimization, Action (physics), Artificial intelligence, Mathematics
