Multi-scale Fusion and Global Semantic Encoding for Affordance Detection
Yang Zhang, Huiyong Li, Tao Ren, Yuanbo Dou, Qingfeng Li
- Year: 2022
- Citations: 7
Abstract
Affordance detection is important in robotic manipulation tasks because it helps robots interact effectively with objects. Many affordance detectors have been proposed, most of them based on two-stage object detection, and they suffer from slow detection speed. In recent years, one-stage affordance detectors have therefore become popular; these use encoder-decoder structures that adopt dilated convolutions to extract high-resolution feature maps. However, dilated convolutions on high-resolution features tend to be computation- and memory-intensive, which greatly limits the practicality of one-stage detectors. To address this issue, this paper proposes a novel convolutional neural network (CNN) based encoder-decoder architecture that does not require dilated convolution. A repeated multi-scale feature-map fusion network is introduced to produce high-resolution features, effectively improving the feature representation of the model. In addition, a semantic encoding module is embedded to capture global semantic information and enhance category-relevant feature maps. Extensive experiments on the IIT-AFF and UMD datasets show that the proposed framework outperforms state-of-the-art methods at only half the computational cost, while maintaining an inference time of 26 ms per image.
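
The two ideas named in the abstract, fusing feature maps of different resolutions and re-weighting feature maps using global information, can be illustrated with a toy sketch. This is not the paper's architecture: the function names (`upsample_nearest`, `fuse`, `global_channel_gate`), the nearest-neighbour upsampling, the additive fusion, and the sigmoid-of-global-average gate are all assumptions chosen for illustration, and plain Python lists stand in for tensors.

```python
from math import exp


def upsample_nearest(fmap, factor):
    """Nearest-neighbour upsampling of a 2D map given as a list of rows."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(factor)]  # repeat each column
        out.extend(list(wide) for _ in range(factor))   # repeat each row
    return out


def fuse(high_res, low_res):
    """Upsample low_res to the size of high_res and add elementwise.

    Assumes the high-res side length is an integer multiple of the low-res one.
    """
    factor = len(high_res) // len(low_res)
    up = upsample_nearest(low_res, factor)
    return [[h + u for h, u in zip(h_row, u_row)]
            for h_row, u_row in zip(high_res, up)]


def global_channel_gate(channels):
    """Scale each channel by a sigmoid of its global average.

    A hypothetical stand-in for a module that uses global context to
    emphasise some feature maps over others.
    """
    gated = []
    for ch in channels:
        count = sum(len(row) for row in ch)
        mean = sum(sum(row) for row in ch) / count
        gate = 1.0 / (1.0 + exp(-mean))
        gated.append([[v * gate for v in row] for row in ch])
    return gated


high = [[1.0, 2.0, 3.0, 4.0] for _ in range(4)]  # 4x4 "high-resolution" map
low = [[10.0, 20.0], [30.0, 40.0]]               # 2x2 "low-resolution" map
fused = fuse(high, low)
gated = global_channel_gate([[[2.0, -2.0], [2.0, -2.0]]])
```

In a real network, the fusion and gating would typically involve learned convolutions and be repeated across several scales; the sketch only shows the data flow.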