Vision-Language Artificial Intelligence for Robotic-Based Monitoring: Concrete Defect Detection, Classification, and Localization in Two-Dimensional Maps

Farzad Azizi Zade, Arvin Ebrahimkhanlou

Year: 2025
Citations: 2

Abstract

This paper introduces a novel framework that combines vision-language models (VLMs) and localization techniques to detect, classify, and localize visual structural defects using moving platforms such as robots and handheld devices, with an emphasis on concrete defects. The framework interactively searches for defects by analyzing images captured from various locations and perspectives, employing, but not limited to, the vision transformer for open-world localization (OWL-ViT). Upon detection, the defect's location is estimated from the moving platform’s position, orientation, view angles, and depth measurements, and a postprocessing module further improves detection relevancy by combining estimates from distinct views. Evaluations in the real world, in simulation, and on a custom dataset include prompt engineering and a comparison with classic models (e.g., YOLO). The framework achieves an average Euclidean error of 0.56 m with OWL-ViT’s optimal prompt, compared to 0.75 m with YOLO and 0.97 m with DETR, demonstrating its potential for robotic inspection of concrete structures.
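
The localization step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a pinhole camera mounted at the platform origin, a depth value measured along the optical axis, and a simple greedy radius-based merge standing in for the postprocessing module, whose actual method the abstract does not specify. The function names, the 0.5 m merge radius, and the parameters are hypothetical.

```python
import math

def localize_detection(robot_x, robot_y, robot_yaw,
                       pixel_u, image_width, hfov_rad, depth_m):
    """Project a detection (pixel column + depth) into 2D map coordinates.

    Assumptions (not from the paper): pinhole camera at the platform origin,
    optical axis aligned with the platform heading, depth along that axis.
    """
    # Focal length in pixels derived from the horizontal field of view.
    fx = (image_width / 2.0) / math.tan(hfov_rad / 2.0)
    cx = image_width / 2.0
    forward = depth_m                      # distance along the optical axis
    left = depth_m * (cx - pixel_u) / fx   # lateral offset, positive to the left
    # Rotate the camera-frame offset by the heading and translate by the pose.
    map_x = robot_x + forward * math.cos(robot_yaw) - left * math.sin(robot_yaw)
    map_y = robot_y + forward * math.sin(robot_yaw) + left * math.cos(robot_yaw)
    return (map_x, map_y)

def _centroid(cluster):
    n = len(cluster)
    return (sum(p[0] for p in cluster) / n, sum(p[1] for p in cluster) / n)

def fuse_estimates(points, radius_m=0.5):
    """Greedy merge of estimates from distinct views (hypothetical stand-in).

    Each point joins the first cluster whose centroid lies within radius_m;
    otherwise it starts a new cluster. Returns one centroid per cluster.
    """
    clusters = []
    for p in points:
        for c in clusters:
            if math.dist(p, _centroid(c)) <= radius_m:
                c.append(p)
                break
        else:
            clusters.append([p])
    return [_centroid(c) for c in clusters]

def mean_euclidean_error(estimates, ground_truth):
    """Average Euclidean distance between paired estimates and ground truth."""
    pairs = list(zip(estimates, ground_truth))
    return sum(math.dist(e, g) for e, g in pairs) / len(pairs)
```

In this sketch, a detection centered in the image lands directly ahead of the platform at the measured depth, and repeated sightings of the same defect from different poses collapse into a single map point before the error metric is computed.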

Keywords

Mobile robot, Robot, Mobile device, Visual inspection, Robotics, Euclidean distance, Machine vision, Transformer
