首页 /研究 /Synthesizing the Kill Chain: A Zero-Shot Framework for Target Verification and Tactical Reasoning on the Edge
PERCEPTION

Synthesizing the Kill Chain: A Zero-Shot Framework for Target Verification and Tactical Reasoning on the Edge

Jesse Barkley, Abraham George, Amir Barati Farimani

发表年份
2026
访问权限
开放获取

摘要

Deploying autonomous edge robotics in dynamic military environments is constrained by both scarce domain-specific training data and the computational limits of edge hardware. This paper introduces a hierarchical, zero-shot framework that cascades lightweight object detection with compact Vision-Language Models (VLMs) from the Qwen and Gemma families (4B-12B parameters). Grounding DINO serves as a high-recall, text-promptable region proposer, and frames with high detection confidence are passed to edge-class VLMs for semantic verification. We evaluate this pipeline on 55 high-fidelity synthetic videos from Battlefield 6 across three tasks: false-positive filtering (up to 100% accuracy), damage assessment (up to 97.5%), and fine-grained vehicle classification (55-90%). We further extend the pipeline into an agentic Scout-Commander workflow, achieving 100% correct asset deployment and a 9.8/10 reasoning score (graded by GPT-4o) with sub-75-second latency. A novel "Controlled Input" methodology decouples perception from reasoning, revealing distinct failure phenotypes: Gemma3-12B excels at tactical logic but fails in visual perception, while Gemma3-4B exhibits reasoning collapse even with accurate inputs. These findings validate hierarchical zero-shot architectures for edge autonomy and provide a diagnostic framework for certifying VLM suitability in safety-critical applications.

关键词

cs.CVcs.AIcs.RO

相关论文

查看 PERCEPTION 分类全部论文