At the Speed of Sound: Efficient Audio Scene Classification

Bo Dong, Cristian Lumezanu, Yuncong Chen, Dongjin Song, Takehiko Mizoguchi, Haifeng Chen, Latifur Khan

发表年份: 2020
引用次数: 5

摘要

Efficient audio scene classification is essential for smart sensing platforms such as robots, medical monitoring, surveillance, or autonomous vehicles. We propose a retrieval-based scene classification architecture that combines recurrent neural networks and attention to compute embeddings for short audio segments. We train our framework using a custom audio loss function that captures both the relevance of audio segments within a scene and that of sound events within a segment. Using experiments on real audio scenes, we show that we can discriminate audio scenes with high accuracy after listening in for less than a second. This preserves 93% of the detection accuracy obtained after hearing the entire scene.

关键词

Computer scienceArtificial intelligenceAudio signal processingAudio analyzerActive listeningComputer visionAudio signalSpeech recognitionSpeech coding

At the Speed of Sound: Efficient Audio Scene Classification

摘要

关键词

相关论文

Statistical Learning Theory

Artificial intelligence: a modern approach

Applied Nonlinear Control

A new optimizer using particle swarm theory