Vision-Language Artificial Intelligence for Robotic-Based Monitoring: Concrete Defect Detection, Classification, and Localization in Two-Dimensional Maps
Farzad Azizi Zade, Arvin Ebrahimkhanlou
- Publication year
- 2025
- Citation count
- 2
Abstract
This paper introduces a novel framework that combines vision-language models (VLMs) and localization techniques to detect, classify, and localize visual structural defects from moving platforms such as robots and handheld devices, with an emphasis on concrete defects. The framework interactively searches for defects by analyzing images captured from various locations and perspectives, using VLMs such as, but not limited to, the Vision Transformer for Open-World Localization (OWL-ViT). Once a defect is detected, its location is estimated from the moving platform's position, orientation, view angles, and depth measurements, and a postprocessing module further improves detection relevance by combining estimates from distinct views. Evaluations in the real world, in simulation, and on a custom dataset include prompt engineering and a comparison with classic models (e.g., YOLO). The framework achieves an average Euclidean error of 0.56 m with OWL-ViT's optimal prompt, compared to 0.75 m with YOLO and 0.97 m with DETR, demonstrating its potential for robotic inspection of concrete structures.
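The localization step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's actual formulation, so everything below is an assumption made for illustration: a pinhole camera whose optical axis is aligned with the platform heading, yaw measured counterclockwise in radians, depth taken as the range along the viewing ray, and a plain average as the multi-view fusion rule. The function names `locate_defect`, `fuse_estimates`, and `euclidean_error` are hypothetical.

```python
import math


def locate_defect(x, y, yaw, u, image_width, hfov, depth):
    """Project one detection into the 2D map (illustrative assumptions only).

    x, y, yaw   : platform pose in the map frame (metres, radians, CCW positive)
    u           : horizontal pixel coordinate of the bounding-box centre
    image_width : image width in pixels
    hfov        : horizontal field of view in radians
    depth       : measured range along the viewing ray in metres
    """
    cx = image_width / 2.0
    # Focal length in pixels, derived from the horizontal field of view.
    fx = cx / math.tan(hfov / 2.0)
    # Angle of the viewing ray relative to the optical axis;
    # pixels right of centre give a positive bearing.
    bearing = math.atan2(u - cx, fx)
    # A ray to the right of the axis rotates the heading clockwise.
    heading = yaw - bearing
    return (x + depth * math.cos(heading), y + depth * math.sin(heading))


def fuse_estimates(points):
    """Combine estimates from several views by simple averaging (assumed rule)."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def euclidean_error(estimate, truth):
    """Euclidean distance between an estimated and a reference map position."""
    return math.dist(estimate, truth)
```

For example, a detection at the image centre seen from the origin with zero yaw and a 2 m range maps to a point 2 m straight ahead; averaging it with a second estimate from another view gives a fused location whose Euclidean distance to a surveyed reference point is the kind of error the abstract reports.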