Behavior Cloning Assisted Reinforcement Learning for Cable-Driven Continuum Space Robots in Sparse Reward Environments
Xianru Tian, Bo Xia, Junbo Tan, Bo Yuan, Zhiheng Li
- Year: 2025
- Citations: 1
Abstract
Deep reinforcement learning (DRL) has emerged as a powerful tool for controlling cable-driven continuum space robots (CDCSRs), offering a solution that bypasses complex system modeling. However, DRL based on dense reward functions (DRLDR) requires meticulous tuning of the reward structure, whereas DRL based on sparse reward functions (DRLSR) exhibits limited decision-making ability, particularly in space environments. To avoid extensive fine-tuning and improve performance in controlling CDCSRs, we propose behavior cloning assisted twin delayed deep deterministic policy gradient (BATD3), a novel algorithm that integrates behavior cloning (BC) with DRLSR. First, a DRLSR-based control framework is developed, which reformulates the control problem as a Markov decision process (MDP). Building on this framework, the BATD3 algorithm is proposed, comprising two training phases: a prior phase, which trains the BC model on demonstrations, and a formal phase, which pre-fills the RL replay buffer with demonstrations and successful BC-environment interaction trajectories and then optimizes the RL model with the assistance of BC. Finally, extensive experiments are conducted in the MuJoCo environment to assess the performance of BATD3 in controlling CDCSRs. The results highlight the effectiveness, generalization, stability, robustness, and potential of BATD3, along with the practicality and feasibility of the DRLSR-based control framework for CDCSRs.
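The buffer pre-filling step of the formal phase can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no reward definition, dynamics, or code, so everything below is an assumption made for illustration. In particular, the 0 / -1 sparse reward convention, the tolerance `tol`, the 1-D toy dynamics `next_state = state + action` (standing in for the MuJoCo simulation), and the helper names `sparse_reward`, `rollout`, and `prefill_buffer` are all hypothetical.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple

# (state, action, reward, next_state, done); scalar state is a toy assumption.
Transition = Tuple[float, float, float, float, bool]
Policy = Callable[[float, float], float]  # maps (state, goal) -> action


def sparse_reward(position: float, goal: float, tol: float = 0.05) -> float:
    """Assumed sparse reward: 0 once the goal is reached, -1 otherwise."""
    return 0.0 if abs(position - goal) <= tol else -1.0


def rollout(policy: Policy, start: float, goal: float,
            max_steps: int = 20, tol: float = 0.05):
    """Run one episode in the toy environment; return (trajectory, success)."""
    state = start
    trajectory: List[Transition] = []
    for _ in range(max_steps):
        action = policy(state, goal)
        next_state = state + action  # toy dynamics, not a robot model
        reward = sparse_reward(next_state, goal, tol)
        done = reward == 0.0
        trajectory.append((state, action, reward, next_state, done))
        state = next_state
        if done:
            return trajectory, True
    return trajectory, False


def prefill_buffer(buffer: Deque[Transition],
                   demonstrations: List[List[Transition]],
                   bc_policy: Policy,
                   starts: List[float],
                   goal: float) -> int:
    """Add all demonstrations, then only the BC rollouts that succeeded.

    Returns the number of BC trajectories that were kept.
    """
    for trajectory in demonstrations:
        buffer.extend(trajectory)
    kept = 0
    for start in starts:
        trajectory, success = rollout(bc_policy, start, goal)
        if success:  # failed BC rollouts are discarded
            buffer.extend(trajectory)
            kept += 1
    return kept
```

In this sketch, filtering on `success` means every pre-filled BC trajectory ends in a transition with a non-negative reward, so the RL learner sees the rare success signal from the first update onward. How BATD3 actually selects and weights these trajectories, and how BC assists the subsequent TD3 updates, is not specified in the abstract.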