
Human Tide, Clear Sight: Semantically Enhanced Visual Localization in High-Crowd Scenarios

Yida Wei, Sikang Liu, Zixuan Huang, Wei He, You Li

Publication year
2025
Citations
1

Abstract

Accurate visual localization is essential in IoT applications, particularly robotics, autonomous systems, and augmented reality. Traditional feature-based methods struggle with both efficiency and robustness to environmental variations. To make visual localization more robust to such variations, state-of-the-art (SOTA) methods incorporate semantic information into their models as an additional, higher-level dimension, but they still have several shortcomings. They typically embed semantic information implicitly, which limits extensibility and interpretability. Moreover, unstable semantic labels can actually degrade localization accuracy. Modularity, quantification, and stability-based filtering of semantic labels are therefore critical. To address these gaps, this article proposes a method that integrates semantic information explicitly and quantitatively through a plug-and-play module. The module scores image-to-image and feature-to-feature correspondences by semantic similarity and label stability, with a particular focus on smartphone-based visual localization in crowded indoor scenes. It is introduced into the two key stages of hierarchical visual localization: 1) visual place recognition (coarse localization) and 2) six-degree-of-freedom (6-DoF) pose estimation (fine localization). Correspondences with low scores are more likely to be mismatches and are therefore suppressed. To validate the approach, a new dataset for semantic visual localization is collected, rich in dynamic objects and scene variations. The method achieves superior accuracy and robustness, particularly under significant changes in scene appearance, improving localization accuracy by 13.6% on the Cafds dataset and 5.4% on the Libds dataset compared with the SOTA approach.
The code and dataset are available at https://github.com/1da1da/SEVL.
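The scoring-and-suppression idea described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the SEVL implementation: it assumes every keypoint already carries a semantic class label and that each class has a hypothetical stability weight in [0, 1] (low for dynamic classes such as "person"). The image-level score here is an assumed stability-weighted cosine similarity of class histograms, used to re-rank place-recognition candidates; the feature-level score keeps a match only when both keypoints share a sufficiently stable class. The names `STABILITY`, `image_score`, `rerank`, `match_score`, and `filter_matches`, the weight values, and the 0.5 threshold are all hypothetical.

```python
from collections import Counter
from math import sqrt

# Hypothetical per-class stability weights in [0, 1]; dynamic classes such as
# "person" are down-weighted. Values are illustrative, not from the paper.
STABILITY = {"wall": 1.0, "floor": 0.9, "sign": 0.8, "chair": 0.4, "person": 0.0}


def stability(label):
    """Assumed stability of a semantic class; unknown classes default to 0.5."""
    return STABILITY.get(label, 0.5)


def image_score(query_labels, db_labels):
    """Image-to-image score: cosine similarity of stability-weighted class histograms."""
    q, d = Counter(query_labels), Counter(db_labels)
    classes = set(q) | set(d)
    qv = {c: q[c] * stability(c) for c in classes}
    dv = {c: d[c] * stability(c) for c in classes}
    dot = sum(qv[c] * dv[c] for c in classes)
    nq = sqrt(sum(v * v for v in qv.values()))
    nd = sqrt(sum(v * v for v in dv.values()))
    # A histogram made only of unstable labels carries no evidence.
    return dot / (nq * nd) if nq > 0 and nd > 0 else 0.0


def rerank(query_labels, candidates):
    """Coarse stage: sort place-recognition candidates by semantic score, best first."""
    return sorted(candidates,
                  key=lambda name: image_score(query_labels, candidates[name]),
                  reverse=True)


def match_score(label_a, label_b):
    """Feature-to-feature score: 0 if the classes disagree, else the class stability."""
    return stability(label_a) if label_a == label_b else 0.0


def filter_matches(matches, threshold=0.5):
    """Fine stage: keep (i, j, label_i, label_j) matches scoring at least `threshold`."""
    return [m for m in matches if match_score(m[2], m[3]) >= threshold]


if __name__ == "__main__":
    query = ["wall", "wall", "sign", "person", "person", "person"]
    candidates = {
        "db_cafe": ["wall", "wall", "sign", "chair"],
        "db_lobby": ["floor", "floor", "person", "person", "person"],
    }
    print(rerank(query, candidates))  # ['db_cafe', 'db_lobby']

    matches = [(0, 5, "wall", "wall"), (1, 7, "person", "person"), (2, 9, "sign", "chair")]
    print(filter_matches(matches))  # [(0, 5, 'wall', 'wall')]
```

In this toy example the crowd labels in the query carry zero weight, so `db_cafe` ranks first; with unweighted histograms the shared `person` labels would instead favor `db_lobby` (cosine about 0.67 versus 0.55). In practice the stability weights would need to be estimated from data, and the semantic scores would likely be combined with appearance-based scores rather than used alone.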

Keywords

Computer science, Sight, Visualization, Human–computer interaction, Computer vision, Remote sensing, Artificial intelligence, Geology
