首页 /研究 /DAG: Unleash the Potential of Diffusion Model for Open-Vocabulary 3D Affordance Grounding
HRI

DAG: Unleash the Potential of Diffusion Model for Open-Vocabulary 3D Affordance Grounding

Hanqing Wang, Zhenhao Zhang, Kaiyang Ji, Mingyu Liu, Wenti Yin, Yuchao Chen, Zhirui Liu, Xiangyu Zeng, Tianxiang Gui, Hangxing Zhang

发表年份
2025
访问权限
开放获取

摘要

3D object affordance grounding aims to predict the touchable regions on a 3d object, which is crucial for human-object interaction, human-robot interaction, embodied perception, and robot learning. Recent advances tackle this problem via learning from demonstration images. However, these methods fail to capture the general affordance knowledge within the image, leading to poor generalization. To address this issue, we propose to use text-to-image diffusion models to extract the general affordance knowledge because we find that such models can generate semantically valid HOI images, which demonstrate that their internal representation space is highly correlated with real-world affordance concepts. Specifically, we introduce the DAG, a diffusion-based 3d affordance grounding framework, which leverages the frozen internal representations of the text-to-image diffusion model and unlocks affordance knowledge within the diffusion model to perform 3D affordance grounding. We further introduce an affordance block and a multi-source affordance decoder to endow 3D dense affordance prediction. Extensive experimental evaluations show that our model excels over well-established methods and exhibits open-world generalization.

关键词

cs.CV

相关论文

查看 HRI 分类全部论文