Multi-Object 3D Grounding with Dynamic Modules and Language-Informed Spatial Attention

Haomeng Zhang, Chiao-An Yang, Raymond A. Yeh

Year of publication: 2024
Access: Open access

Abstract

Multi-object 3D grounding involves locating 3D boxes in a point cloud based on a given query phrase. It is a challenging and significant task with numerous applications in visual understanding, human-computer interaction, and robotics. To tackle this challenge, we introduce D-LISA, a two-stage approach incorporating three innovations: first, a dynamic vision module that enables a variable and learnable number of box proposals; second, dynamic camera positioning that extracts features for each proposal; and third, a language-informed spatial attention module that better reasons over the proposals to output the final prediction. Experiments show that our method outperforms state-of-the-art methods on multi-object 3D grounding by 12.8% (absolute) and is competitive in single-object 3D grounding.

Keywords

cs.CV
