An Efficient Deep $Q$-learning Strategy for Sequential Decision-making in Game-playing
Oscar Chang, Manuel Eugenio Morocho-Cayamcela, Israel Pineda, K. Cardenas
- Publication year
- 2022
- Citation count
- 5
Abstract
This paper presents a deep reinforcement learning model that efficiently learns a sequential decision-making policy to play tic-tac-toe intelligently, directly from high-dimensional video. To produce a stable, sparse neural representation of the states of the tic-tac-toe board, a pre-trained convolutional neural network is used, followed by a fully-connected sigmoidal network. The assembly behaves as a $Q$-matrix and produces the final state-decision pairs that control a robotic arm placing physical tokens on the board. The hyperparameters of the whole network are tuned to produce a stable, trainable array of elements. An internal clock composed of internal neurons is integrated to give the agent a sense of sequential timing. To solve the $\max(\cdot)$ function, a novel algorithm is introduced to search for the $Q$-network values. The algorithm uses a dedicated sigmoidal net initialized with random parameters. Under backpropagation, it iteratively moves to a stable plateau that mimics the all-zeros condition of an initial $Q$-matrix. Next, the agent uses Bellman's reinforcement principles to learn an optimal policy with a noticeable look-ahead capability. Computer simulations driving a physical robot proved the convergence and effectiveness of the proposed methodology and demonstrated a marked ability in sequential decision-making, taking raw video frames as input.
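For readers unfamiliar with the Bellman update the abstract refers to, the following is a minimal tabular sketch, not the authors' network-based method. It replaces the paper's convolutional and sigmoidal networks with a plain dictionary that starts at all zeros, and the names `ALPHA`, `GAMMA`, `legal_moves`, `max_q`, and `q_update`, the learning-rate and discount values, and the 9-character board encoding are all assumptions made for illustration.

```python
from collections import defaultdict

ALPHA = 0.5  # learning rate (assumed value, not from the paper)
GAMMA = 0.9  # discount factor (assumed value, not from the paper)


def legal_moves(board):
    """Return the indices of empty cells on a 9-character board string."""
    return [i for i, cell in enumerate(board) if cell == " "]


def max_q(q, board):
    """Largest Q(board, a) over legal actions; 0.0 if the board is full."""
    return max((q[(board, a)] for a in legal_moves(board)), default=0.0)


def q_update(q, state, action, reward, next_state, done):
    """Apply one Bellman backup: Q(s, a) += ALPHA * (target - Q(s, a))."""
    target = reward if done else reward + GAMMA * max_q(q, next_state)
    q[(state, action)] += ALPHA * (target - q[(state, action)])
    return q[(state, action)]


# Hypothetical stand-in for the Q-matrix: every entry starts at zero.
q = defaultdict(float)
```

Calling `q_update` on a terminal move with reward 1.0 moves that entry halfway toward 1.0 under these assumed constants. A later update on a preceding state then picks up a discounted share of that value through `max_q`, which is the look-ahead effect the abstract describes.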