Robust Driving QA through Metadata-Grounded Context and Task-Specific Prompts
Seungjun Yu, Junsung Park, Youngsun Lim, Hyunjung Shim
- 发表年份
- 2025
- 访问权限
- 开放获取
摘要
We present a two-phase vision-language QA system for autonomous driving that answers high-level perception, prediction, and planning questions. In Phase-1, a large multimodal LLM (Qwen2.5-VL-32B) is conditioned on six-camera inputs, a short temporal window of history, and a chain-of-thought prompt with few-shot exemplars. A self-consistency ensemble (multiple sampled reasoning chains) further improves answer reliability. In Phase-2, we augment the prompt with nuScenes scene metadata (object annotations, ego-vehicle state, etc.) and category-specific question instructions (separate prompts for perception, prediction, planning tasks). In experiments on a driving QA benchmark, our approach significantly outperforms the baseline Qwen2.5 models. For example, using 5 history frames and 10-shot prompting in Phase-1 yields 65.1% overall accuracy (vs.62.61% with zero-shot); applying self-consistency raises this to 66.85%. Phase-2 achieves 67.37% overall. Notably, the system maintains 96% accuracy under severe visual corruption. These results demonstrate that carefully engineered prompts and contextual grounding can greatly enhance high-level driving QA with pretrained vision-language models.
关键词
相关论文
如何缓解越野环境中语义分割的分布偏移
Ji-Hoon Hwang, Daeyoung Kim, Hyung-Suk Yoon 等 5 位作者
2026
基于原型模糊推理与证据融合的不确定性引导工业机器人可进化识别框架
Yanrun Zhou, Zihao Lei, Guangrui Wen 等 7 位作者
Robotics and Computer-Integrated Manufacturing · 2026
基于点云配准的非破坏性高分辨率涂层厚度三维扫描测量
Simon Duenser, Ivo Aschwanden, Raamadaas Krishnadas 等 5 位作者
Robotics and Computer-Integrated Manufacturing · 2026
迈向智能机器人时代:用于高级感知系统的多模态柔性触觉传感器
Sili Ding, Feng Xu, Jie Chen 等 6 位作者
Progress in Materials Science · 2026