UAV-VLA: Vision-Language-Action System for Large Scale Aerial Mission Generation

Oleg Sautenkov, Yasheerah Yaqoot, Artem Lykov, Muhammad Ahsan Mustafa, Grik Tadevosyan, Aibek Akhmetkazy, Miguel Altamirano Cabrera, Mikhail Martynov, Sausar Karaf, Dzmitry Tsetserukou

Year of publication: 2025
Citations: 22

Abstract

The UAV-VLA (Visual-Language-Action) system is a tool designed to facilitate communication with aerial robots. By integrating satellite imagery processing with a Visual Language Model (VLM) and the capabilities of GPT, UAV-VLA enables users to generate general flight path-and-action plans through simple text requests. The system leverages the rich contextual information in satellite images to support decision-making and mission planning. Combining visual analysis by the VLM with natural language processing by GPT provides the user with a path-and-action set, making aerial operations more efficient and accessible. The newly developed method showed a 22% difference in the length of the generated trajectory and a mean error of 34.22 m in locating objects of interest on a map, measured by Euclidean distance using the K-Nearest Neighbors (KNN) approach. Additionally, the UAV-VLA system generates all flight plans in just 5 minutes and 24 seconds, making it 6.5 times faster than an experienced human operator. The code is available here: https://github.com/sautenich/uav-vla
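
The two evaluation figures quoted in the abstract (a percentage difference in trajectory length and a mean Euclidean error from a nearest-neighbor match) can be illustrated with a minimal sketch. The abstract does not say how the authors compute either quantity, so everything below is an assumption: the function names are hypothetical, coordinates are taken to be planar and in metres, the length difference is expressed relative to a reference path, and the KNN step is simplified to k = 1 (each ground-truth point is matched to its closest predicted point). The toy data are invented and are not results from the paper.

```python
import math


def path_length(points):
    """Total length of a polyline given as a list of (x, y) points in metres."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def length_difference_percent(generated, reference):
    """Absolute path-length difference as a percentage of the reference length."""
    ref_len = path_length(reference)
    return abs(path_length(generated) - ref_len) / ref_len * 100.0


def mean_nearest_neighbor_error(predicted, ground_truth):
    """Mean Euclidean distance from each ground-truth point to its
    nearest predicted point (a k = 1 nearest-neighbor match)."""
    nearest = [min(math.dist(g, p) for p in predicted) for g in ground_truth]
    return sum(nearest) / len(nearest)


# Hypothetical toy data, not taken from the paper.
reference = [(0.0, 0.0), (100.0, 0.0), (100.0, 100.0)]
generated = [(0.0, 0.0), (90.0, 0.0), (90.0, 120.0), (100.0, 120.0)]

ground_truth = [(100.0, 0.0), (100.0, 100.0)]
predicted = [(90.0, 0.0), (100.0, 120.0), (90.0, 120.0)]

length_diff = length_difference_percent(generated, reference)
mean_error = mean_nearest_neighbor_error(predicted, ground_truth)

print(f"trajectory length difference: {length_diff:.1f} %")
print(f"mean nearest-neighbor error: {mean_error:.1f} m")
```

With real data, the predicted points would come from the system's output and the ground-truth points from manually labelled objects on the same map. Both sets would first have to be projected into a common metric coordinate frame for the distances to be in metres.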

Keywords

Computer science, Scale (ratio), Remote sensing, Action (physics), Aerospace engineering, Artificial intelligence, Astrobiology, Geology, Engineering, Physics
