Reinforcement learning of reactive navigation for computer animation of simulated agents
Welton Becket
- Year: 1997
- Citations: 2
Abstract
Behavioral control has been an effective method for controlling low-level motion for autonomous agents. However, one difficulty is the complexity of designing behaviors and arbitration among behaviors for all but the simplest navigation or motor control tasks. The approach taken here applies reinforcement learning techniques with delayed rewards to behavioral control, building on existing approaches from robotics, computer graphics, and machine learning by dealing with issues specific to autonomous agents for computer animation. In addition, behaviors are assumed to be part of a larger architecture, such as a symbolic reasoner or task-network system, so that the learning can focus on problems for which behavioral control is most appropriate. Three learning approaches are first considered and compared on two single-agent navigation problems where agents have only local information through simulated sensors. The first uses numerical optimization to find a single configuration of parameters of behaviors. This method is very fast and takes advantage of pre-defined behaviors. However, it is conceptually limited because it does not change parameters over time. The second approach models the problem as a Markov Decision Process (MDP) and finds a policy (a mapping from states to actions) that directly learns behavior from delayed reinforcement without the use of pre-defined behaviors. This approach, though conceptually very powerful, is extremely slow even when a generalization method is applied. The third approach, behavior-parameter learning (BP-learning), combines advantages of the first two approaches by learning a policy from perceived state to parameters of pre-defined behaviors using an MDP model: it uses the second approach to schedule, rather than optimize, the parameters of the first approach. This solution is both powerful, because the parameters vary over time, and quick to converge, because it takes advantage of pre-defined behaviors.
The power of BP-learning for animation is then demonstrated on a group navigation problem that is extremely difficult to solve by hand. Finally, all three methods are found to be applicable in different situations and can apply to other single-agent and group navigation problems for computer animation.
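The difference between a single fixed parameter configuration and a learned schedule of parameters can be illustrated with a toy sketch. This is not the implementation from the work itself: the one-dimensional track, the pit, the `advance` behavior with its `stride` parameter, the reward values, and the use of tabular Q-learning are all assumptions made for illustration. The agent senses the pit only when it is one or two cells ahead, and the learned policy maps that perceived state to a parameter of the pre-defined behavior rather than to a raw action.

```python
import random

# Hypothetical toy problem (not from the original work): cells 0..7 on a line.
TRACK_END = 7      # goal cell
PIT = 4            # hazard cell the agent must not land on
STRIDES = (1, 2)   # candidate parameter settings of the pre-defined behavior
SENSOR_RANGE = 2   # the pit is sensed only when it is 1 or 2 cells ahead


def sense(pos):
    """Local perception: distance to the pit if within sensor range, else 0."""
    dist = PIT - pos
    return dist if 1 <= dist <= SENSOR_RANGE else 0


def advance(pos, stride):
    """Pre-defined behavior: move forward by `stride` cells.

    Returns (new_pos, reward, done, success)."""
    new_pos = pos + stride
    if new_pos == PIT:
        return new_pos, -10.0, True, False   # landed on the pit
    if new_pos >= TRACK_END:
        return new_pos, 10.0, True, True     # delayed reward at the goal
    return new_pos, -1.0, False, False       # small cost per step


def rollout(choose_stride, max_steps=20):
    """Run one episode from cell 0; return True if the goal is reached."""
    pos = 0
    for _ in range(max_steps):
        pos, _reward, done, success = advance(pos, choose_stride(sense(pos)))
        if done:
            return success
    return False


def train(episodes=2000, alpha=0.2, gamma=0.95, epsilon=0.2, seed=0):
    """Tabular Q-learning from perceived state to behavior parameter."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(SENSOR_RANGE + 1) for a in STRIDES}
    for _ in range(episodes):
        pos, done = 0, False
        while not done:
            state = sense(pos)
            if rng.random() < epsilon:
                stride = rng.choice(STRIDES)                        # explore
            else:
                stride = max(STRIDES, key=lambda a: q[(state, a)])  # exploit
            pos, reward, done, _success = advance(pos, stride)
            future = 0.0 if done else max(q[(sense(pos), a)] for a in STRIDES)
            q[(state, stride)] += alpha * (reward + gamma * future - q[(state, stride)])
    return q


q = train()
# Greedy parameter schedule: perceived state -> stride.
policy = {s: max(STRIDES, key=lambda a: q[(s, a)]) for s in range(SENSOR_RANGE + 1)}
# Baseline: one fixed parameter for the whole episode.
fixed_results = {stride: rollout(lambda _s, k=stride: k) for stride in STRIDES}
learned_result = rollout(lambda s: policy[s])
print("fixed stride reaches goal:", fixed_results)
print("learned schedule:", policy, "reaches goal:", learned_result)
```

In this toy, both fixed strides land on the pit by construction (0, 1, 2, 3, 4 and 0, 2, 4), so no single parameter setting succeeds. A schedule that uses a stride of 1 when the pit is two cells ahead and a stride of 2 when it is one cell ahead clears it, and that is the kind of state-dependent parameter choice the learner is expected to find.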