Feedback-Enhanced Hallucination-Resistant Vision-Language Model for Real-Time Scene Understanding
Zahir Alsulaimawi
- 发表年份
- 2025
- 访问权限
- 开放获取
摘要
Real-time scene comprehension is a key advance in artificial intelligence, enhancing robotics, surveillance, and assistive tools. However, hallucination remains a challenge. AI systems often misinterpret visual inputs, detecting nonexistent objects or describing events that never happened. These errors, far from minor, threaten reliability in critical areas like security and autonomous navigation where accuracy is essential. Our approach tackles this by embedding self-awareness into the AI. Instead of trusting initial outputs, our framework continuously assesses them in real time, adjusting confidence thresholds dynamically. When certainty falls below a solid benchmark, it suppresses unreliable claims. Combining YOLOv5's object detection strength with VILA1.5-3B's controlled language generation, we tie descriptions to confirmed visual data. Strengths include dynamic threshold tuning for better accuracy, evidence-based text to reduce hallucination, and real-time performance at 18 frames per second. This feedback-driven design cuts hallucination by 37 percent over traditional methods. Fast, flexible, and reliable, it excels in applications from robotic navigation to security monitoring, aligning AI perception with reality.
关键词
相关论文
如何缓解越野环境中语义分割的分布偏移
Ji-Hoon Hwang, Daeyoung Kim, Hyung-Suk Yoon 等 5 位作者
2026
基于原型模糊推理与证据融合的不确定性引导工业机器人可进化识别框架
Yanrun Zhou, Zihao Lei, Guangrui Wen 等 7 位作者
Robotics and Computer-Integrated Manufacturing · 2026
基于点云配准的非破坏性高分辨率涂层厚度三维扫描测量
Simon Duenser, Ivo Aschwanden, Raamadaas Krishnadas 等 5 位作者
Robotics and Computer-Integrated Manufacturing · 2026
迈向智能机器人时代:用于高级感知系统的多模态柔性触觉传感器
Sili Ding, Feng Xu, Jie Chen 等 6 位作者
Progress in Materials Science · 2026