PERCEPTION

Multi-scale Fusion and Global Semantic Encoding for Affordance Detection

Yang Zhang, Huiyong Li, Tao Ren, Yuanbo Dou, Qingfeng Li

Publication year
2022
Citations
7

Abstract

Affordance detection is of great importance in robot operational tasks because it helps robots interact effectively with objects. Many affordance detectors have been proposed, primarily based on two-stage object detection, and they suffer significantly from slow detection speed. Hence, recent years have seen the rise of one-stage affordance detectors based on encoder-decoder structures that adopt dilated convolutions to extract high-resolution feature maps. However, dilated convolutions on high-resolution features tend to be computation- and memory-intensive, greatly limiting the practicality of one-stage detectors. To address this issue, this paper proposes a novel convolutional neural network (CNN) based encoder-decoder architecture that does not require dilated convolution. A repeated multi-scale feature-map-fusion network is introduced to produce high-resolution features, effectively improving the feature representation of the model. In addition, a semantic encoding module is embedded to capture global semantic information and enhance category-relevant feature maps. Extensive experiments show that the proposed framework outperforms state-of-the-art methods with only 1/2 of the computational cost while maintaining an inference speed of 26 ms per image, indicating the promising affordance-detection performance of our network on the IIT-AFF and UMD datasets.
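
To make the cost argument about dilated convolutions concrete, the following is a rough back-of-the-envelope sketch, not the paper's own analysis. It counts multiply-accumulate operations (MACs) for a single 3x3 convolution layer applied at two feature-map resolutions. The input size (512), channel width (512), and output strides (8 and 32) are illustrative assumptions that do not come from the abstract, and the helper names `conv_macs` and `feature_size` are hypothetical.

```python
def conv_macs(height, width, c_in, c_out, kernel=3):
    """MACs for one stride-1, same-padded convolution layer.

    Dilation spreads the kernel taps apart but does not change how many
    taps there are, so a dilated layer has the same per-pixel cost.
    """
    return height * width * c_in * c_out * kernel * kernel


def feature_size(input_size, output_stride):
    """Spatial side length of a feature map at a given output stride."""
    return input_size // output_stride


# Illustrative values only (assumptions, not taken from the paper).
INPUT_SIZE = 512
CHANNELS = 512

# A dilated backbone typically keeps deep features at a small output
# stride (assumed 8 here); a plain strided backbone downsamples further
# (assumed 32 here).
side_dilated = feature_size(INPUT_SIZE, 8)
side_strided = feature_size(INPUT_SIZE, 32)

macs_dilated = conv_macs(side_dilated, side_dilated, CHANNELS, CHANNELS)
macs_strided = conv_macs(side_strided, side_strided, CHANNELS, CHANNELS)
ratio = macs_dilated // macs_strided

print(f"high-resolution layer: {macs_dilated:,} MACs")
print(f"low-resolution layer:  {macs_strided:,} MACs")
print(f"cost ratio: {ratio}x")
```

Under these assumed numbers, the same layer costs 16 times more at the higher resolution, since the cost grows with the number of output pixels. Activation memory scales the same way, which is the motivation the abstract gives for producing high-resolution features through multi-scale fusion instead of dilation.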

Keywords

Computer science; Affordance; Convolution (computer science); Artificial intelligence; Encoding (memory); Encoder; Convolutional neural network; Object detection; Feature (linguistics); Inference
