
Efficient CNN Inference on Ultra-Low-Power MCUs via Saturation-Aware Convolution

Shiming Li, Luca Mottola, Yuan Yao, Stefanos Kaxiras

Year
2025
Access
Open access

Abstract

Quantized CNN inference on ultra-low-power MCUs performs unnecessary computation in neurons whose outputs saturate. Such values fall outside the range the neuron's output can represent and are eventually clamped to its boundaries. Often, a neuron can save time by stopping as soon as its intermediate result is extreme enough to guarantee the clamped output, rather than completing the computation, and it can do so without introducing any error. Based on this observation, we present saturation-aware convolution: an inference technique that reorders the computations in convolution kernels to induce earlier saturation and inserts value checks to skip the remaining computations once the intermediate result is sufficiently extreme. Our experimental results show inference time savings of up to 24% on a Cortex-M0+ MCU, with zero impact on accuracy.
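The idea of reordering a kernel and exiting early can be illustrated with a minimal sketch. It is not the authors' implementation, which targets a Cortex-M0+ and is not described in detail here. The sketch makes several assumptions of its own: inputs are non-negative (for example, activations after a ReLU with zero point 0), the clamp is applied directly to the integer accumulator, and only saturation at the lower bound is detected. All function names (`reorder_positive_first`, `dot_saturation_aware`, `dot_reference`) are hypothetical. Non-negative weights are processed first, so that during the negative-weight phase the accumulator can only decrease; once it reaches the lower bound, the remaining multiply-accumulates cannot change the clamped result.

```python
def clamp(value, lo, hi):
    """Clamp value to the closed interval [lo, hi]."""
    return max(lo, min(hi, value))


def dot_reference(weights, inputs, bias, lo, hi):
    """Plain dot product followed by clamping (no early exit)."""
    acc = bias
    for w, x in zip(weights, inputs):
        acc += w * x
    return clamp(acc, lo, hi)


def reorder_positive_first(weights):
    """Offline step (assumed): visit non-negative weights first, then negative ones.

    Returns the visiting order and the number of non-negative weights.
    """
    pos = [i for i, w in enumerate(weights) if w >= 0]
    neg = [i for i, w in enumerate(weights) if w < 0]
    return pos + neg, len(pos)


def dot_saturation_aware(weights, inputs, bias, lo, hi, order, n_pos):
    """Dot product with an early exit on lower-bound saturation.

    Assumes every input is >= 0. In the negative-weight phase each remaining
    term is <= 0, so acc <= lo implies the final clamped output equals lo.
    Returns (clamped output, number of multiply-accumulates performed).
    """
    acc = bias
    macs = 0
    for k, i in enumerate(order):
        if k >= n_pos and acc <= lo:
            return lo, macs  # remaining terms cannot raise acc above lo
        acc += weights[i] * inputs[i]
        macs += 1
    return clamp(acc, lo, hi), macs


# Hypothetical example values, chosen only for illustration.
weights = [3, -5, 2, -4, -1, -6]
inputs = [1, 4, 2, 3, 5, 2]
order, n_pos = reorder_positive_first(weights)
result, macs = dot_saturation_aware(weights, inputs, 0, 0, 127, order, n_pos)
print(result, macs, dot_reference(weights, inputs, 0, 0, 127))
```

In this example the early exit fires after three of the six multiply-accumulates and returns the same value as the reference. Detecting saturation at the upper bound would need the mirrored ordering (negative weights first), and a real quantized kernel would also have to account for requantization and non-zero zero points, which the sketch leaves out.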

Keywords

eess.SY
