
Behavior Cloning Assisted Reinforcement Learning for Cable-Driven Continuum Space Robots in Sparse Reward Environments

Xianru Tian, Bo Xia, Junbo Tan, Bo Yuan, Zhiheng Li

Publication year
2025
Citations
1

Abstract

Deep reinforcement learning (DRL) has emerged as a powerful tool for controlling cable-driven continuum space robots (CDCSRs), offering a solution that bypasses complex system modeling. However, DRL based on dense reward functions (DRLDR) requires meticulous tuning of the reward structure, whereas DRL based on sparse reward functions (DRLSR) exhibits limited decision-making ability, particularly in space environments. To avoid extensive fine-tuning and improve performance in controlling CDCSRs, we propose behavior cloning assisted twin delayed deep deterministic policy gradient (BATD3), a novel algorithm that integrates behavior cloning (BC) with DRLSR. First, a DRLSR-based control framework is developed, which reformulates the control problem as a Markov decision process (MDP). Building on this, the BATD3 algorithm is proposed, comprising two training phases: a prior phase, which trains the BC model on demonstrations, and a formal phase, which pre-fills the RL replay buffer with demonstrations and successful BC-environment interaction trajectories and then optimizes the RL model with the assistance of BC. Finally, extensive experiments are conducted in the MuJoCo environment to assess the performance of BATD3 in controlling CDCSRs. The results highlight the effectiveness, generalization, stability, robustness and potential of BATD3, along with the practicality and feasibility of the DRLSR-based control framework for CDCSRs.
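
The buffer pre-filling step of the formal phase can be illustrated with a minimal Python sketch. It is not the authors' implementation: the 0 / -1 reward convention, the distance tolerance `tol`, the `(state, action, reward, next_state, done)` transition layout, and the names `sparse_reward`, `ReplayBuffer`, and `prefill` are all assumptions made for illustration, and the BC model and TD3 updates are left out entirely.

```python
import math
import random
from collections import deque


def sparse_reward(tip_pos, goal_pos, tol=0.05):
    """Assumed sparse reward: 0.0 if the tip is within tol of the goal, else -1.0."""
    return 0.0 if math.dist(tip_pos, goal_pos) <= tol else -1.0


class ReplayBuffer:
    """Hypothetical FIFO buffer of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity=100_000):
        self.data = deque(maxlen=capacity)

    def add_trajectory(self, trajectory):
        self.data.extend(trajectory)

    def sample(self, batch_size):
        # Uniform sampling without replacement, capped at the buffer size.
        return random.sample(list(self.data), min(batch_size, len(self.data)))

    def __len__(self):
        return len(self.data)


def prefill(buffer, demonstrations, bc_rollouts):
    """Store every demonstration, plus only those BC rollouts that ended in success.

    Success is assumed to mean that the final transition carries the
    success reward of 0.0. Returns the number of BC rollouts kept.
    """
    for trajectory in demonstrations:
        buffer.add_trajectory(trajectory)
    kept = 0
    for trajectory in bc_rollouts:
        if trajectory and trajectory[-1][2] == 0.0:
            buffer.add_trajectory(trajectory)
            kept += 1
    return kept
```

Under these assumptions, an off-policy learner such as TD3 would begin training with a buffer that already contains rewarded transitions, which is the situation a sparse reward otherwise makes rare early in training.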

Keywords

Reinforcement learning, Cloning (programming), Robot, Reinforcement, Space (punctuation), Computer science, Artificial intelligence, Engineering, Structural engineering
