Efficient Transformer-Based Road Scene Segmentation Approach with Attention-Guided Decoding for Memory-Constrained Systems
Bartas Lisauskas, Rytis Maskeliūnas
- Publication year: 2025
- Citations: 4
- Access: Open access
Abstract
Accurate object detection and an understanding of the surroundings are key requirements when applying computer vision systems in the automotive and robotics industries, for example in autonomous vehicles or self-driving robots. A precise understanding of road users and obstacles is essential to avoid potential accidents. Because road scenes contain many objects and vary widely across environments, road scene segmentation remains a challenging task. In our approach, a Transformer-based backbone is employed in the encoder module for robust feature extraction. In addition, we developed a custom decoder module that uses attention-based fusion mechanisms to combine features effectively. The decoder is specifically designed to preserve fine spatial details and enhance global context understanding, which sets our method apart from conventional approaches that typically use simple projection layers or standard query-based decoders. The model has 17.2 million parameters and achieves competitive performance, with a mean intersection over union (mIoU) of 76.41% on the Cityscapes validation set. The results indicate that the model captures both the global context and the fine spatial details that are critical to accurate segmentation of urban scenes. Furthermore, its lightweight design makes the approach suitable for deployment on memory-limited devices.
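The mIoU figure reported in the abstract is the mean, over classes, of the per-class intersection over union, IoU_c = TP_c / (TP_c + FP_c + FN_c), computed from a confusion matrix accumulated over the evaluation set. The sketch below illustrates that metric in plain Python; it is not the authors' evaluation code. It assumes predictions and labels are flattened sequences of Cityscapes-style train IDs, with 255 as the ignore label (a common Cityscapes convention), and it skips classes whose union is zero. The function names are hypothetical.

```python
from typing import List, Optional, Sequence

IGNORE_INDEX = 255  # assumed ignore label for unlabeled pixels (Cityscapes train-ID convention)


def confusion_matrix(preds: Sequence[int], labels: Sequence[int],
                     num_classes: int, ignore_index: int = IGNORE_INDEX) -> List[List[int]]:
    """Rows index the ground-truth class, columns index the predicted class."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(preds, labels):
        if t == ignore_index:  # ignored pixels contribute to no class
            continue
        cm[t][p] += 1
    return cm


def per_class_iou(cm: List[List[int]]) -> List[Optional[float]]:
    """IoU_c = TP / (TP + FP + FN); None when the class never occurs in labels or predictions."""
    n = len(cm)
    ious: List[Optional[float]] = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # class-c pixels predicted as something else
        fp = sum(cm[r][c] for r in range(n)) - tp   # other pixels predicted as class c
        union = tp + fp + fn
        ious.append(tp / union if union > 0 else None)
    return ious


def mean_iou(cm: List[List[int]]) -> float:
    """Average the IoU over classes that actually appear (assumption of this sketch)."""
    valid = [v for v in per_class_iou(cm) if v is not None]
    return sum(valid) / len(valid) if valid else 0.0


if __name__ == "__main__":
    # Toy example with 3 classes; the last pixel carries the ignore label.
    cm = confusion_matrix([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 255], num_classes=3)
    print(round(mean_iou(cm), 4))  # prints 0.7222
```

On a real validation set, the confusion matrix would be summed over all images before the IoUs are computed. Averaging per-image mIoU values gives a different number. For Cityscapes this would involve 19 evaluation classes.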