MANIPULATION

Hybrid Deep Learning Framework for Eye-in-Hand Visual Control Systems

Adrian-Paul Botezatu, A. Iancu, Adrian Burlacu

Year: 2025
Citations: 3
Access: Open access

Abstract

This work proposes a hybrid deep learning framework for visual feedback control in an eye-in-hand robotic system. The framework uses an early fusion approach, and its training data combine real and synthetic images. The first layer of a ResNet-18 backbone is augmented to fuse interest-point maps with the RGB channels, enabling the network to better capture scene geometry. A manipulator robot in an eye-in-hand configuration provides a reference image, while subsequent poses and images are generated synthetically, removing the need for extensive real data collection. The experimental results show that this enriched input representation significantly improves convergence accuracy and velocity smoothness compared with a baseline that processes real images alone. Specifically, including interest-point maps allows the network to discriminate crucial elements in the scene, resulting in more precise velocity commands and stable end-effector trajectories. Thus, integrating additional, synthetically generated map data into convolutional architectures can enhance the robustness and performance of a visual servoing system, particularly when real-world data gathering is challenging. Unlike existing visual servoing methods, this early fusion strategy integrates the interest-point maps directly into the network's initial convolutional layer, allowing the model to learn critical geometric details from the very first stage of training. This approach yields superior velocity predictions and smoother servoing compared with conventional frameworks.
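The early fusion step can be illustrated with a minimal NumPy sketch. It assumes that interest points arrive as (x, y) pixel coordinates from some detector, that they are rendered as a Gaussian heatmap, and that the pretrained ResNet-18 first convolution (64 filters of size 7×7 over 3 input channels) is widened to a fourth channel by initializing the new filter slice with the mean of the RGB filters. The function names `interest_point_map`, `early_fuse`, and `widen_first_conv`, the Gaussian rendering, and the mean initialization are hypothetical choices for illustration; the abstract does not specify how the authors build the maps or initialize the extra channel.

```python
import numpy as np


def interest_point_map(points, height, width, sigma=2.0):
    """Render (x, y) pixel coordinates as a Gaussian heatmap in [0, 1].

    Hypothetical encoding: each point becomes a Gaussian blob with peak 1.0,
    and overlapping blobs are merged with an element-wise maximum.
    """
    ys, xs = np.mgrid[0:height, 0:width]  # row (y) and column (x) index grids
    heat = np.zeros((height, width), dtype=np.float32)
    for x, y in points:
        g = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma ** 2))
        heat = np.maximum(heat, g.astype(np.float32))
    return heat


def early_fuse(rgb, heat):
    """Stack an HxWx3 uint8 RGB image and an HxW map into a 4xHxW input.

    The RGB channels are scaled to [0, 1] and moved to channels-first layout,
    then the interest-point map is appended as the fourth channel.
    """
    rgb_chw = np.transpose(rgb.astype(np.float32) / 255.0, (2, 0, 1))
    return np.concatenate([rgb_chw, heat[None, :, :]], axis=0)


def widen_first_conv(weight):
    """Extend pretrained (out, 3, k, k) conv weights to 4 input channels.

    Assumption: the extra input channel's filters start as the mean of the
    three RGB filters, so the new slice has a comparable scale.
    """
    extra = weight.mean(axis=1, keepdims=True)  # shape (out, 1, k, k)
    return np.concatenate([weight, extra], axis=1)


if __name__ == "__main__":
    rgb = np.zeros((64, 64, 3), dtype=np.uint8)  # placeholder camera frame
    heat = interest_point_map([(10, 20), (40, 32)], 64, 64)
    fused = early_fuse(rgb, heat)
    print(fused.shape)  # (4, 64, 64)

    # Stand-in for pretrained ResNet-18 conv1 weights: 64 filters, 3 channels, 7x7.
    w = np.random.default_rng(0).standard_normal((64, 3, 7, 7)).astype(np.float32)
    print(widen_first_conv(w).shape)  # (64, 4, 7, 7)
```

Mean initialization keeps the new channel's filters on the same scale as the pretrained RGB filters; zero initialization is a common alternative that leaves the pretrained network's initial responses unchanged and lets training decide how much weight the map receives. In a PyTorch model, the widened array would, under these assumptions, be copied into a replacement `conv1` layer created with `in_channels=4`.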

Keywords

Eye–hand coordination; Artificial intelligence; Visual servoing; Computer science; Control (management); Deep learning; Computer vision; Psychology; Image (mathematics)
