Active Exploration in Dynamic Environments

Sebastian Thrun, Knut Möller

Year of publication: 1991
Citations: 114

Abstract

Whenever an agent learns to control an unknown environment, two opposing principles have to be combined, namely: exploration (long-term optimization) and exploitation (short-term optimization). Many real-valued connectionist approaches to learning control realize exploration by randomness in action selection. This might be disadvantageous when costs are assigned to "negative experiences". The basic idea presented in this paper is to make an agent explore unknown regions in a more directed manner. This is achieved by a so-called competence map, which is trained to predict the controller's accuracy, and is used for guiding exploration. Based on this, a bistable system enables smoothly switching attention between two behaviors -- exploration and exploitation -- depending on expected costs and knowledge gain. The appropriateness of this method is demonstrated by a simple robot navigation task.
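
The idea of directed exploration described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the paper uses connectionist networks and a bistable switching system, whereas this sketch substitutes a lookup table for the competence map and a single blend weight for the switch. The names `CompetenceMap`, `select_action`, `expected_cost`, and `explore_weight` are all hypothetical and chosen only for illustration.

```python
class CompetenceMap:
    """Tabular stand-in (an assumption) for the competence map:
    a running estimate of the controller's prediction error per state."""

    def __init__(self, n_states, initial_error=1.0, rate=0.5):
        # Unvisited states start with a high error estimate, so they look
        # attractive to explore.
        self.error = [initial_error] * n_states
        self.rate = rate

    def update(self, state, observed_error):
        # Move the estimate toward the error actually observed in this state.
        self.error[state] += self.rate * (observed_error - self.error[state])

    def expected_gain(self, state):
        # Low competence (high expected error) is treated as high knowledge gain.
        return self.error[state]


def select_action(candidates, competence, expected_cost, explore_weight):
    """Pick the action whose predicted next state has the best trade-off.

    candidates: dict mapping action -> predicted next state (hypothetical model output)
    expected_cost: per-state cost estimates (hypothetical)
    explore_weight: 1.0 means pure exploration, 0.0 means pure exploitation
    """
    def score(action):
        state = candidates[action]
        gain = competence.expected_gain(state)
        cost = expected_cost[state]
        return explore_weight * gain - (1.0 - explore_weight) * cost

    return max(candidates, key=score)


competence = CompetenceMap(n_states=3)
competence.update(1, observed_error=0.0)   # state 1 is now better known
candidates = {"left": 1, "right": 2}
expected_cost = [0.0, 0.1, 0.6]

explore_choice = select_action(candidates, competence, expected_cost, explore_weight=1.0)
exploit_choice = select_action(candidates, competence, expected_cost, explore_weight=0.0)
```

With `explore_weight` at 1.0 the agent heads for the less familiar state, and at 0.0 it heads for the cheaper one. Intermediate values trade the two off, which is only a rough analogue of the smooth attention switching mentioned in the abstract.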

Keywords

Computer science, Randomness, Action selection, Robot, Artificial intelligence, Task (project management), Reinforcement learning, Machine learning, Engineering
