首页 /研究 /Incremental acquisition of behaviors and signs based on a reinforcement learning schemata model and a spike timing-dependent plasticity network

LEARNING

Incremental acquisition of behaviors and signs based on a reinforcement learning schemata model and a spike timing-dependent plasticity network

Tadahiro Taniguchi, Tetsuo Sawaragi

发表年份: 2007
引用次数: 21

摘要

A novel integrative learning architecture based on a reinforcement learning schemata model (RLSM) with a spike timing-dependent plasticity (STDP) network is described. This architecture models operant conditioning with discriminative stimuli in an autonomous agent engaged in multiple reinforcement learning tasks. The architecture consists of two constitutional learning architectures: RLSM and STDP. RLSM is an incremental modular reinforcement learning architecture, and it makes an autonomous agent acquire several behavioral concepts incrementally through continuous interactions with its environment and/or caregivers. STDP is a learning rule of neuronal plasticity found in cerebral cortices and the hippocampus of the human brain. STDP is a temporally asymmetric learning rule that contrasts with the Hebbian learning rule. We found that STDP enabled an autonomous robot to associate auditory input with its acquired behaviors and to select reinforcement learning modules more effectively. Auditory signals interpreted based on the acquired behaviors were revealed to correspond to 'signs' of required behaviors and incoming situations. This integrative learning architecture was evaluated in the context of on-line modular learning.

关键词

Reinforcement learningHebbian theoryComputer scienceModular designArtificial intelligenceReinforcementContext (archaeology)Discriminative modelMachine learningArtificial neural network

Incremental acquisition of behaviors and signs based on a reinforcement learning schemata model and a spike timing-dependent plasticity network

摘要

关键词

相关论文

Statistical Learning Theory

Artificial intelligence: a modern approach

Applied Nonlinear Control

A new optimizer using particle swarm theory