XCS with Combined Reward Method (XCSCR) for Policy Search in Multistep Problems

Zheming Zhang, Will N. Browne, Dale A. Carnegie

Year
2019
Citations
2

Abstract

A reward mechanism is critical for a Reinforcement Learning (RL) agent to learn action policies from rewards. The reward mechanism establishes a policy by estimating the contributions of the policy's constituents to a reward. Traditionally, rewards from an environment fall into two categories: long-term rewards for guiding the policy learning process, and short-term rewards for optimisation. However, long-term, positive rewards are scarce in the initial learning phase of multistep problems, so existing reward mechanisms lack sufficient stimulus to learn policies effectively. This paper proposes XCSCR, an Accuracy-based Learning Classifier System (XCS) algorithm with a combined reward (CR) method, to guide the search for globally optimal policies in multistep maze problems. XCSCR discriminates between long-term and short-term rewards through four novel reward-assignment mechanisms: 1) A short-term reward mechanism encourages exploration by the RL agent as it searches for policies based on short-term rewards. 2) An imprinting mechanism amends the negative impact of indiscriminate rewards between exploration and exploitation. 3) A learning-rate switching mechanism emphasises the impact of long-term positive rewards in the policy searching process. 4) A learning step-threshold mechanism creates an optimisation pressure for policies. Experiments were conducted in three maze environments, as this enabled the effects of XCSCR on policies to be interpreted easily. Results show that XCSCR learns the optimum path-finding policies more quickly and more often than previous XCS algorithms. XCSCR's improvements to the policy search will facilitate real-world applications, e.g. robotic applications.
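
As a rough illustration of the third mechanism, the sketch below pairs the standard XCS Widrow-Hoff prediction update, p <- p + beta * (P - p), with a switch that picks a larger learning rate when the long-term positive reward arrives. The abstract does not specify how the switch works, so the function `switched_beta`, the threshold `goal_reward`, and the values 0.2 and 0.8 are assumptions made only for this example, not the authors' method. For brevity the update target is the immediate reward alone, without the discounted next-step prediction that multistep XCS normally adds.

```python
def update_prediction(prediction, target, beta):
    """Standard XCS Widrow-Hoff update: p <- p + beta * (target - p)."""
    return prediction + beta * (target - prediction)


def switched_beta(reward, goal_reward, beta_default=0.2, beta_goal=0.8):
    """Hypothetical switch (an assumption, not taken from the paper):
    use a larger learning rate when the long-term positive reward is received."""
    return beta_goal if reward >= goal_reward else beta_default


# Two steps without reward, then the (assumed) goal reward of 1000.
prediction = 0.0
for reward in [0.0, 0.0, 1000.0]:
    beta = switched_beta(reward, goal_reward=1000.0)
    prediction = update_prediction(prediction, reward, beta)

print(prediction)
```

With the larger rate, a single goal reward moves the prediction most of the way to its target, whereas the default rate would move it only a fifth of the way. This is one way a scarce long-term reward could be given more weight early in learning.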

Keywords

Reinforcement learning, Computer science, Artificial intelligence, Mechanism (biology), Term (time), Stimulus (psychology), Process (computing), Machine learning, Classifier (UML), Cognitive psychology
