Home /Research /Lightweight Keyword Spotting with Inter-Domain Interaction and Attention for Real-Time Voice-Controlled Robotics
OTHER

Lightweight Keyword Spotting with Inter-Domain Interaction and Attention for Real-Time Voice-Controlled Robotics

Hien Pham, Thuy Phuong Vu, Huong Thi Nguyen, Minhhuy Le

Year
2025
Citations
1
Access
Open access

Abstract

This study introduces a novel lightweight Keyword Spotting (KWS) model optimized for deployment on resource-constrained microcontrollers, with potential applications in robotic control and end-effector operations. The proposed model employs inter-domain interaction to effectively extract features from both Mel-frequency cepstral coefficients (MFCCs) and temporal audio characteristics, complemented by an attention mechanism to prioritize relevant audio segments for enhanced keyword detection. Achieving a 93.70% accuracy on the Google Command v2-12 commands dataset, the model outperforms existing benchmarks. It also demonstrates remarkable efficiency in inference speed (0.359 seconds) and resource utilization (34.9KB peak RAM and 98.7KB flash memory), offering a 3x faster inference time and reduced memory footprint compared to the DS-CNN-S model. These attributes make it particularly suitable for real-time voice command applications in low-power robotic systems, enabling intuitive and responsive control of robotic arms, end-effectors, and navigation systems. In this work, however, the KWS model is demonstrated in a simple non-destructive testing system for controlling sensor movement. This research lays the groundwork for advancing voice-activated robotic technologies on resource-limited hardware platforms.

Keywords

Keyword spottingSpottingComputer scienceRoboticsDomain (mathematical analysis)Artificial intelligenceSpeech recognitionTime domainHuman–computer interactionComputer vision

Related papers

Browse all OTHER papers