Making reinforcement learning work on real robots
Leslie Pack Kaelbling, William D. Smart
- Year: 2002
- Citations: 93
Abstract
Programming robots is hard. It often takes a great deal of time to fine-tune the many parameters in a typical control algorithm. For some robot tasks, we may not even know a good solution without extensive experimentation. Even when we, as humans, have good intuitions about how to perform a given task, it is often difficult to translate these into the sensor and actuator spaces of the robot. Having the robot learn how to perform a given task is one way of addressing these problems. Specifying what the robot should be doing, and allowing it to fill in the details of how through learning, is an appealing idea. In general, describing a task at a higher, more behavioral level is easier for humans than having to specify the exact mapping from sensors to actuators that defines a control policy. In particular, reinforcement learning is a very promising paradigm for learning on real robots. However, simply applying existing reinforcement learning techniques will almost certainly lead to failure. Issues such as large, continuous state and action spaces, extremely limited amounts of training data, lack of initial knowledge about the task and environment, and the necessity of keeping the robot physically safe during learning must be explicitly addressed if learning is to succeed. In this dissertation, we identify some of the problems that must be overcome when attempting to implement a reinforcement learning system on a real mobile robot. We discuss some solutions to these problems and present two components that, together, allow us to use reinforcement learning techniques effectively on a real robot. HEDGER is a safe value-function approximation algorithm designed to be used with continuous state and action spaces, and with sparse reward functions. JAQL is our general framework for reinforcement learning on real robots, and deals with the problems of initial knowledge and robot safety. We validate the effectiveness of both components using a variety of simulated and real robot task domains.