PERCEPTION

Memory-Maze: Scenario Driven Visual Language Navigation Benchmark for Guiding Blind People

Masaki Kuribayashi, K. Uehara, Daisuke Sato, Renato Alexandre Ribeiro, Simon Chu, Shigeo Morishima

Year: 2025
Citations: 1

Abstract

Visual Language Navigation (VLN) powered robots have the potential to guide blind people by understanding route instructions provided by sighted passersby. This capability allows robots to operate in environments often unknown a priori. Existing VLN models are insufficient for the scenario of navigation guidance for blind people, as such models need to understand routes described from human memory, which frequently contain stutters, errors, and omissions of details, as opposed to those obtained by thinking out loud, such as in the R2R dataset. However, existing benchmarks do not contain instructions obtained from human memory in natural environments. To this end, we present our benchmark, Memory-Maze, which simulates the scenario of seeking route instructions for guiding blind people. Our benchmark contains a maze-like structured virtual environment and novel route instruction data from human memory. Our analysis demonstrates that instruction data collected from memory was longer and contained more varied wording. We further demonstrate that addressing errors and ambiguities from memory-based instructions is challenging, by evaluating state-of-the-art models alongside our baseline model with modularized perception and controls.

Keywords

Benchmark (surveying), Robot, Perception, Natural language, Baseline (sea), Virtual machine, Visual language, Task analysis
