PERCEPTION

Real2Sim via Active Perception with Behavior Trees Automatically Generated by VLMs

Alessandro Adami, Sebastian Zudaire, Ruggero Carli, Pietro Falco

Publication year
2026
Access
Open access

Abstract

Constructing physically accurate simulation environments (Real2Sim) traditionally relies on manual system identification or rigid, exhaustive exploration routines. These task-agnostic pipelines often fail to leverage semantic scene context, leading to redundant physical interactions and inefficient data acquisition. In this paper, we present an autonomous, intent-driven Real2Sim framework that leverages Vision-Language Models (VLMs) for Semantic Task Decomposition. Given a high-level natural language request, an incomplete simulation description, and a visual observation, the framework autonomously identifies the minimal subset of missing physical parameters required for the simulation task. It then generates a reactive Behavior Tree (BT) composed of atomic motion and sensing primitives to selectively acquire these parameters through contact-rich robotic interaction. Extensive real-world experiments on a torque-controlled Franka Emika Panda demonstrate that our approach accurately estimates object mass, surface geometry, and derived parameters such as friction. Quantitative evaluations reveal significant operational efficiency gains compared to exhaustive baseline methods, while ablation studies confirm the robustness of the prompt architecture across different state-of-the-art VLMs. Furthermore, the reactive hierarchy of the BT acts as a deterministic safety filter, successfully mitigating generative VLM hallucinations and preventing unsafe physical anomalies. Ultimately, this work provides a scalable, efficient, and interpretable pipeline for building physics-aware digital twins directly from unstructured human intent.
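The two mechanisms the abstract describes, selecting only the missing parameters and guarding each primitive with a reactive safety check, can be illustrated with a minimal sketch. This is not the authors' implementation: in the paper a VLM performs the parameter selection and generates the tree, whereas here both are hand-written. Every name (`missing_parameters`, `build_mass_tree`, `lift_and_weigh`), the force limit, and the placeholder mass value are hypothetical and chosen only for illustration.

```python
from enum import Enum


class Status(Enum):
    SUCCESS = "success"
    FAILURE = "failure"
    RUNNING = "running"


def missing_parameters(required, sim_description):
    """Return the required parameters that the simulation description lacks."""
    return sorted(p for p in required if sim_description.get(p) is None)


class Condition:
    """Leaf node that checks a predicate on the shared blackboard."""

    def __init__(self, predicate):
        self.predicate = predicate

    def tick(self, blackboard):
        return Status.SUCCESS if self.predicate(blackboard) else Status.FAILURE


class Action:
    """Leaf node standing in for a motion/sensing primitive (hypothetical)."""

    def __init__(self, name, ticks_needed, on_done):
        self.name = name
        self.ticks_needed = ticks_needed
        self.on_done = on_done
        self.ticks_done = 0

    def tick(self, blackboard):
        blackboard["log"].append(self.name)
        self.ticks_done += 1
        if self.ticks_done < self.ticks_needed:
            return Status.RUNNING
        self.on_done(blackboard)  # write the estimated parameter
        return Status.SUCCESS


class ReactiveSequence:
    """Re-checks every child from the left on each tick.

    A guard that fails therefore stops the tree before the running
    primitive is ticked again.
    """

    def __init__(self, children):
        self.children = children

    def tick(self, blackboard):
        for child in self.children:
            status = child.tick(blackboard)
            if status != Status.SUCCESS:
                return status
        return Status.SUCCESS


def build_mass_tree(force_limit=20.0):
    """Guarded mass-estimation subtree; the limit and mass are placeholders."""
    return ReactiveSequence([
        Condition(lambda bb: bb["force"] <= force_limit),
        Action("lift_and_weigh", 3, lambda bb: bb.update(mass=0.4)),
    ])


if __name__ == "__main__":
    print(missing_parameters(["mass", "friction"], {"mass": None, "friction": 0.3}))
    bb = {"force": 5.0, "log": []}
    tree = build_mass_tree()
    print(tree.tick(bb))   # first tick: primitive still running
    bb["force"] = 50.0     # simulated force spike
    print(tree.tick(bb))   # guard fails before the primitive is ticked again
```

Because the guard sits to the left of the primitive and is re-evaluated on every tick, its outcome does not depend on what generated the rest of the tree. That is the sense in which such a structure can act as a deterministic filter in this sketch.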

Keywords

cs.RO
