Efficient CNN Inference on Ultra-Low-Power MCUs via Saturation-Aware Convolution
Shiming Li, Luca Mottola, Yuan Yao, Stefanos Kaxiras
- 发表年份
- 2025
- 访问权限
- 开放获取
摘要
Quantized CNN inference on ultra-low-power MCUs incurs unnecessary computations in neurons that produce saturated output values. These values are too extreme and are eventually clamped to the boundaries allowed by the neuron. Often times, the neuron can save time by only producing a value that is extreme enough to lead to the clamped result, instead of completing the computation, yet without introducing any error. Based on this, we present saturation-aware convolution: an inference technique whereby we alter the order of computations in convolution kernels to induce earlier saturation, and value checks are inserted to omit unnecessary computations when the intermediate result is sufficiently extreme. Our experimental results display up to 24% inference time saving on a Cortex-M0+ MCU, with zero impact on accuracy.
关键词
相关论文
Statistical Learning Theory
Yuhai Wu, Vladimir Vapnik
1999
Fractional Differential Equations
Igor Podlubný
2025
Applied Nonlinear Control
Jean-Jacques Slotine, Weiping Li
1991
Genetic Programming: On the Programming of Computers by Means of Natural Selection
John R. Koza
1992