
Vision-Language Artificial Intelligence for Robotic-Based Monitoring: Concrete Defect Detection, Classification, and Localization in Two-Dimensional Maps

Farzad Azizi Zade, Arvin Ebrahimkhanlou

Publication year
2025
Citations
2

Abstract

This paper introduces a novel framework that combines vision-language models (VLMs) and localization techniques to detect, classify, and localize visual structural defects from moving platforms such as robots and handheld devices, with an emphasis on concrete defects. The framework interactively searches for defects by analyzing images captured from various locations and perspectives, using models that include, but are not limited to, the Vision Transformer for Open-World Localization (OWL-ViT). Upon detection, the defect’s location is estimated from the moving platform’s position, orientation, view angles, and depth measurements, and a postprocessing module further improves detection relevancy by combining estimates from distinct views. Evaluations in the real world, in simulation, and on a custom dataset include prompt engineering and a comparison with classic models (e.g., YOLO). The framework achieves an average Euclidean error of 0.56 m with OWL-ViT’s optimal prompt, compared to 0.75 m with YOLO and 0.97 m with DETR, demonstrating its potential for robotic inspection of concrete structures.
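The abstract does not spell out the localization geometry, so the following is only a minimal sketch of how a detection could be placed on a two-dimensional map from the platform pose, the camera view angle, and a depth reading. It assumes a pinhole camera, a depth value measured along the optical axis, and a planar pose (x, y, yaw); the function names `locate_defect_2d`, `fuse_views`, and `euclidean_error` are hypothetical, and the plain averaging in `fuse_views` is a stand-in for the paper's postprocessing module, not a description of it.

```python
import math


def locate_defect_2d(robot_x, robot_y, robot_yaw, pixel_u,
                     image_width, hfov_rad, depth_m):
    """Project the centre column of a detection box onto a 2D map.

    Assumptions (not from the paper): pinhole camera aligned with the
    robot heading, depth_m measured along the optical axis, yaw in radians
    measured counter-clockwise from the map x-axis.
    """
    cx = image_width / 2.0
    fx = cx / math.tan(hfov_rad / 2.0)  # focal length in pixels
    forward = depth_m                   # distance ahead of the camera
    # A pixel right of the image centre lies to the robot's right (negative left).
    left = -depth_m * (pixel_u - cx) / fx
    # Rotate the robot-frame offset by yaw and translate by the robot position.
    map_x = robot_x + math.cos(robot_yaw) * forward - math.sin(robot_yaw) * left
    map_y = robot_y + math.sin(robot_yaw) * forward + math.cos(robot_yaw) * left
    return map_x, map_y


def fuse_views(estimates):
    """Average several (x, y) estimates of the same defect (illustrative only)."""
    n = len(estimates)
    return (sum(x for x, _ in estimates) / n,
            sum(y for _, y in estimates) / n)


def euclidean_error(estimate, ground_truth):
    """Euclidean distance in metres between an estimate and the true position."""
    return math.hypot(estimate[0] - ground_truth[0],
                      estimate[1] - ground_truth[1])
```

Averaging `euclidean_error` over all detected defects would give a figure comparable in kind to the average Euclidean error reported above, though the paper's exact evaluation protocol is not stated in the abstract.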

Keywords

Mobile robot; Robot; Mobile device; Visual inspection; Robotics; Euclidean distance; Machine vision; Transformer
