Robot Instance Segmentation with Few Annotations for Grasping
Moshe Kimhi, David Vainshtein, Chaim Baskin, Dotan Di Castro
- Year: 2025
- Citations: 4
Abstract
The ability of robots to manipulate objects relies heavily on their aptitude for visual perception. In domains characterized by cluttered scenes and high object variability, such as traffic, navigation, and object grasping, most methods call for vast labeled datasets, laboriously hand-annotated, with the aim of training capable models. Once deployed, the challenge of generalizing to unfamiliar objects implies that the model must evolve alongside its domain. To address this, we propose a novel framework that combines Semi-Supervised Learning (SSL) with Learning Through Interaction (LTI), allowing a model to learn by observing scene alterations and leverage visual consistency despite temporal gaps without requiring curated data of interaction sequences. As a result, our approach exploits partially annotated data through self-supervision and incorporates temporal context using pseudo-sequences generated from unlabeled still images. We validate our method on two common benchmarks, ARMBench mix-object-tote and OCID, where it achieves state-of-the-art performance. Notably, on ARMBench, we attain an AP50 of 86.37, almost a 20% improvement over existing work, and obtain remarkable results in scenarios with extremely low annotation, achieving an AP50 score of 84.89 with just 1% of annotated data, compared to the previous state of the art of 82, which targeted the fully annotated dataset.