A Lightweight Detection Model Without Convolutions for Complex Stacked Grasping Tasks
Li Ren, Haitao Jia, Shaojiang Wang, Rui Zhong
- Year: 2025
- Citations: 2
- Access: Open access
Abstract
In complex environments where multiple objects are stacked, the spatial relationships between objects dictate the order in which they must be grasped, both to keep the target objects safe and to keep robotic arm operation efficient. To address this challenge, this study adopts a Visual Manipulation Relationship Network (VMRN) to determine the optimal grasping sequence. Traditional VMRN frameworks typically rely on convolutional neural networks (CNNs) for feature extraction, which often struggle with high-frequency feature extraction, long-tail data distributions, and real-time computational demands in multi-object stacking scenarios. To overcome these limitations, we propose a lightweight, convolution-free, Transformer-based feature extraction network and integrate it into the visual detection model. The network is designed for visual reasoning and optimized to stay lightweight while improving feature extraction for stacked objects. It combines local window attention, global information aggregation and broadcasting, and a feedforward network based on dual-dimensional attention to improve feature representation. In addition, a novel loss function addresses the degraded detection performance on long-tail categories by mitigating the over-suppression of rare objects in imbalanced datasets. Experimental results show that the proposed model improves both detection accuracy and computational efficiency, making it well suited to real-time robotic grasping in complex environments.
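To make the idea of a grasping sequence concrete, the following is a minimal sketch of how an order could be derived once the stacking relationships between objects are known. It is not the paper's method: the `stacked_on` mapping, the `grasp_order` function, and the object names are hypothetical, and the sketch assumes the relationships have already been predicted and contain no cycles.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def grasp_order(stacked_on, target):
    """Return the objects to grasp, in order, ending with `target`.

    `stacked_on` is a hypothetical representation of predicted manipulation
    relationships: it maps each object to the set of objects lying on top of it.
    """
    # Collect the target plus everything that directly or indirectly covers it.
    needed, stack = set(), [target]
    while stack:
        obj = stack.pop()
        if obj not in needed:
            needed.add(obj)
            stack.extend(stacked_on.get(obj, ()))

    # TopologicalSorter takes node -> predecessors. Objects on top of a node
    # must be removed before it, so they act as its predecessors.
    graph = {obj: set(stacked_on.get(obj, ())) for obj in needed}
    return list(TopologicalSorter(graph).static_order())


# Illustrative scene: a cup and a book rest on the box, a pen rests on the
# book, and an unrelated fork rests on a plate.
relations = {"box": {"cup", "book"}, "book": {"pen"}, "plate": {"fork"}}
order = grasp_order(relations, "box")
print(order)
```

The returned list always places an object after everything stacked on it, so the target comes last and objects unrelated to the target (here the plate and fork) are left untouched. The relative order of independent objects, such as the cup and the pen, is not fixed by the relationships.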