Active Exploration in Dynamic Environments
Sebastian Thrun, Knut Möller
- 发表年份
- 1991
- 引用次数
- 114
摘要
Whenever an agent learns to control an unknown environment, two opposing principles have to be combined, namely: exploration (long-term optimization) and exploitation (short-term optimization). Many real-valued connectionist approaches to learning control realize exploration by randomness in action selection. This might be disadvantageous when costs are assigned to "negative experiences". The basic idea presented in this paper is to make an agent explore unknown regions in a more directed manner. This is achieved by a so-called competence map, which is trained to predict the controller's accuracy, and is used for guiding exploration. Based on this, a bistable system enables smoothly switching attention between two behaviors -- exploration and exploitation -- depending on expected costs and knowledge gain. The appropriateness of this method is demonstrated by a simple robot navigation task.
关键词
相关论文
Statistical Learning Theory
Yuhai Wu, Vladimir Vapnik
1999
Artificial intelligence: a modern approach
1995
Fractional Differential Equations
Igor Podlubný
2025
Applied Nonlinear Control
Jean-Jacques Slotine, Weiping Li
1991