Reward-penalty reinforcement learning scheme for planning and reactive behaviour
A.F.R. Araújo, Andreza Pereira Braga
- Publication year: 2002
- Citations: 8
Abstract
This paper describes a reinforcement learning algorithm that allows a point robot to learn navigation strategies within initially unknown indoor environments containing fixed and dynamic obstacles. The knowledge is encoded in two surfaces: a reward surface, updated when a target is found, and a penalty surface, updated whenever the robot moves. The proposed policy is suitable for both planning and reactive behaviour. The tests involve different kinds of obstacles: a fixed passage, a barrier, a U-shaped obstacle and a simple maze. The results suggest that the model solves the goal-directed exploration problem: the robot is able to reach a desired goal from any starting position within the environment, avoiding obstacles and following a viable trajectory. However, the robot may get stuck at dynamic obstacles, may depend on randomness to avoid them, and generally does not solve the goal-directed reinforcement learning problem.
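The abstract names the ingredients (a reward surface updated on reaching a target, a penalty surface updated on every move, a policy usable for both planning and reactive behaviour) but not the update rules. The sketch below is a minimal illustration under loudly stated assumptions, not the authors' algorithm: a 4-connected grid world, a reward that is written back along the successful path with a geometric `decay`, a fixed `penalty_step` added to each cell the robot occupies, and a greedy policy that moves to the free neighbour maximising reward minus penalty, breaking ties at random. The class `RewardPenaltyNavigator`, its methods and both parameters are hypothetical names chosen for illustration.

```python
import random
from typing import Dict, List, Optional, Set, Tuple

Cell = Tuple[int, int]  # (x, y) grid coordinates


class RewardPenaltyNavigator:
    """Hypothetical grid-world sketch of a reward/penalty surface navigator.

    The update rules are illustrative assumptions, not the paper's algorithm.
    """

    def __init__(self, width: int, height: int, obstacles: Set[Cell],
                 decay: float = 0.9, penalty_step: float = 0.1,
                 seed: int = 0) -> None:
        self.width = width
        self.height = height
        self.obstacles = set(obstacles)
        self.decay = decay                # assumed reward attenuation per step back from the goal
        self.penalty_step = penalty_step  # assumed penalty added each time a cell is occupied
        self.reward: Dict[Cell, float] = {}   # reward surface (missing cell = 0.0)
        self.penalty: Dict[Cell, float] = {}  # penalty surface (missing cell = 0.0)
        self.rng = random.Random(seed)

    def neighbours(self, cell: Cell) -> List[Cell]:
        """Free 4-connected neighbours inside the grid."""
        x, y = cell
        candidates = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
        return [c for c in candidates
                if 0 <= c[0] < self.width and 0 <= c[1] < self.height
                and c not in self.obstacles]

    def on_move(self, cell: Cell) -> None:
        # Penalty surface: incremented each time the robot enters (or starts on)
        # a cell, which discourages revisiting the same places.
        self.penalty[cell] = self.penalty.get(cell, 0.0) + self.penalty_step

    def on_target_found(self, path: List[Cell]) -> None:
        # Reward surface: updated only when the target is reached. Walking the
        # path backwards, cells nearer the goal receive a larger reward.
        value = 1.0
        for cell in reversed(path):
            self.reward[cell] = max(self.reward.get(cell, 0.0), value)
            value *= self.decay

    def score(self, cell: Cell) -> float:
        return self.reward.get(cell, 0.0) - self.penalty.get(cell, 0.0)

    def choose(self, cell: Cell) -> Optional[Cell]:
        # Greedy on reward minus penalty; ties are broken at random.
        options = self.neighbours(cell)
        if not options:
            return None
        best = max(self.score(c) for c in options)
        return self.rng.choice([c for c in options if self.score(c) == best])

    def run_episode(self, start: Cell, goal: Cell,
                    max_steps: int = 500) -> List[Cell]:
        cell = start
        path = [cell]
        self.on_move(cell)
        steps = 0
        while cell != goal and steps < max_steps:
            nxt = self.choose(cell)
            if nxt is None:  # boxed in: no free neighbour
                break
            cell = nxt
            path.append(cell)
            self.on_move(cell)
            steps += 1
        if cell == goal:
            self.on_target_found(path)
        return path


if __name__ == "__main__":
    # A vertical barrier at x = 2 with gaps at the top and bottom rows.
    nav = RewardPenaltyNavigator(5, 5, obstacles={(2, 1), (2, 2), (2, 3)}, seed=1)
    path = nav.run_episode(start=(0, 2), goal=(4, 2), max_steps=10_000)
    print(len(path), path[-1])
```

In the first episode the reward surface is empty, so the policy reduces to moving towards the least-penalised neighbour, a purely reactive exploration; once a target has been found, the reward surface biases later episodes towards the remembered route. Random tie-breaking is one simple way to inject the randomness the abstract mentions, and the fixed `seed` keeps runs reproducible.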