首页 /研究 /Flex: End-to-End Text-Instructed Visual Navigation from Foundation Model Features

OTHER

Flex: End-to-End Text-Instructed Visual Navigation from Foundation Model Features

Makram Chahine, Alex Quach, Alaa Maalouf, Tsun-Hsuan Wang, Daniela Rus

发表年份: 2024
访问权限: 开放获取

摘要

End-to-end learning directly maps sensory inputs to actions, creating highly integrated and efficient policies for complex robotics tasks. However, such models often struggle to generalize beyond their training scenarios, limiting adaptability to new environments, tasks, and concepts. In this work, we investigate the minimal data requirements and architectural adaptations necessary to achieve robust closed-loop performance with vision-based control policies under unseen text instructions and visual distribution shifts. Our findings are synthesized in Flex (Fly lexically), a framework that uses pre-trained Vision Language Models (VLMs) as frozen patch-wise feature extractors, generating spatially aware embeddings that integrate semantic and visual information. We demonstrate the effectiveness of this approach on a quadrotor fly-to-target task, where agents trained via behavior cloning on a small simulated dataset successfully generalize to real-world scenes with diverse novel goals and command formulations.

关键词

cs.ROcs.AI

Flex: End-to-End Text-Instructed Visual Navigation from Foundation Model Features

摘要

关键词

相关论文

一种面向线弧增材制造的电动汽车结构可制造性拓扑优化的双环框架

几何数字孪生：一种用于航空发动机装配精度预测的数字智能模型

通过人工智能驱动的机器人技术革新产业

新型大口径偏置馈电可展开天线设计与动态性能预测