Home /Research /Decentralized Reinforcement Learning Robust Optimal Tracking Control for Time Varying Constrained Reconfigurable Modular Robot Based on ACI and<mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="M1"><mml:mrow><mml:mi>Q</mml:mi></mml:mrow></mml:math>-Function
LEARNING

Decentralized Reinforcement Learning Robust Optimal Tracking Control for Time Varying Constrained Reconfigurable Modular Robot Based on ACI and<mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="M1"><mml:mrow><mml:mi>Q</mml:mi></mml:mrow></mml:math>-Function

Bo Dong, Yuanchun Li

Year
2013
Citations
15
Access
Open access

Abstract

A novel decentralized reinforcement learning robust optimal tracking control theory for time varying constrained reconfigurable modular robots based on action-critic-identifier (ACI) and state-action value function (<mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="M2"><mml:mrow><mml:mi>Q</mml:mi></mml:mrow></mml:math>-function) has been presented to solve the problem of the continuous time nonlinear optimal control policy for strongly coupled uncertainty robotic system. The dynamics of time varying constrained reconfigurable modular robot is described as a synthesis of interconnected subsystem, and continuous time state equation and<mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="M3"><mml:mrow><mml:mi>Q</mml:mi></mml:mrow></mml:math>-function have been designed in this paper. Combining with ACI and RBF network, the global uncertainty of the subsystem and the HJB (Hamilton-Jacobi-Bellman) equation have been estimated, where critic-NN and action-NN are used to approximate the optimal<mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="M4"><mml:mrow><mml:mi>Q</mml:mi></mml:mrow></mml:math>-function and the optimal control policy, and the identifier is adopted to identify the global uncertainty as well as RBF-NN which is used to update the weights of ACI-NN. On this basis, a novel decentralized robust optimal tracking controller of the subsystem is proposed, so that the subsystem can track the desired trajectory and the tracking error can converge to zero in a finite time. The stability of ACI and the robust optimal tracking controller are confirmed by Lyapunov theory. Finally, comparative simulation examples are presented to illustrate the effectiveness of the proposed ACI and decentralized control theory.

Keywords

Reinforcement learningAlgorithmComputer scienceModular designIdentifierFunction (biology)Artificial intelligenceMachine learningMathematics

Related papers

Browse all LEARNING papers