Object Positions Interpretation System for Service Robots Through Targeted Object Marking
Kosei Yamao, Daiju Kanaoka, Kosei Isomoto, Hakaru Tamukoh
- Publication year
- 2025
- Citations
- 2
Abstract
Service robots are typically required to interpret and execute a variety of complex tasks in home environments. Recognizing elements of the environment, such as furniture, and understanding the positional relationships between objects are critical for executing these tasks. Set-of-Mark (SoM) is a visual prompting method that helps interpret the relationships between semantic regions by overlaying a mark on each region. However, SoM also marks segmented regions that are not objects, such as walls and floors, and these marks act as noise when interpreting object positions. To address this problem, we propose a novel object-position interpretation system that combines an object detection model with a vision-language model (VLM). The system uses the object detection model to mark only objects, allowing the VLM to interpret object positions efficiently. Furthermore, accuracy is improved by including the original image and the labels output by the object detection model in the input to the VLM. Experimental results show that the proposed system outperforms SoM in interpreting object positions.
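The pipeline described in the abstract (detect objects, mark only those objects, then pass the marks and labels to the VLM together with the original image) can be illustrated with a minimal sketch. This is not the authors' implementation: the `Detection` type, the `assign_marks` and `build_prompt` helpers, the choice of box centers as mark positions, and the prompt wording are all hypothetical, and the object detector and VLM calls are left out entirely.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Detection:
    """Hypothetical detector output: a class label and a pixel box (x1, y1, x2, y2)."""
    label: str
    box: Tuple[int, int, int, int]


def assign_marks(detections: List[Detection]) -> List[Dict]:
    """Give each detected object a numeric mark, placed at its box center.

    Only detector outputs receive marks, so background regions such as
    walls and floors are never marked (assumed marking rule).
    """
    marks = []
    for mark_id, det in enumerate(detections, start=1):
        x1, y1, x2, y2 = det.box
        center = ((x1 + x2) // 2, (y1 + y2) // 2)
        marks.append({"id": mark_id, "label": det.label, "center": center})
    return marks


def build_prompt(marks: List[Dict], question: str) -> str:
    """Compose the text part of the VLM input: a mark-to-label table and the question.

    The original image and the marked image would be attached alongside this text.
    """
    lines = [
        "Image 1 is the original image.",
        "Image 2 is the same image with numbered marks on detected objects.",
        "Marks:",
    ]
    lines += [f"[{m['id']}] {m['label']}" for m in marks]
    lines.append(f"Question: {question}")
    return "\n".join(lines)


detections = [
    Detection("cup", (10, 20, 30, 40)),
    Detection("table", (0, 50, 100, 90)),
]
marks = assign_marks(detections)
print(build_prompt(marks, "Which object is the cup on?"))
```

In a real system, the marks would be drawn onto a copy of the image before both images and the prompt text are sent to the VLM; how the marks are rendered and how the prompt is phrased are design choices the abstract does not specify.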