From Black-Box to White-Box: Control-Theoretic Neural Network Interpretability
Jihoon Moon
- 发表年份
- 2025
- 访问权限
- 开放获取
摘要
Deep neural networks achieve state of the art performance but remain difficult to interpret mechanistically. In this work, we propose a control theoretic framework that treats a trained neural network as a nonlinear state space system and uses local linearization, controllability and observability Gramians, and Hankel singular values to analyze its internal computation. For a given input, we linearize the network around the corresponding hidden activation pattern and construct a state space model whose state consists of hidden neuron activations. The input state and state output Jacobians define local controllability and observability Gramians, from which we compute Hankel singular values and associated modes. These quantities provide a principled notion of neuron and pathway importance: controllability measures how easily each neuron can be excited by input perturbations, observability measures how strongly each neuron influences the output, and Hankel singular values rank internal modes that carry input output energy. We illustrate the framework on simple feedforward networks, including a 1 2 2 1 SwiGLU network and a 2 3 3 2 GELU network. By comparing different operating points, we show how activation saturation reduces controllability, shrinks the dominant Hankel singular value, and shifts the dominant internal mode to a different subset of neurons. The proposed method turns a neural network into a collection of local white box dynamical models and suggests which internal directions are natural candidates for pruning or constraints to improve interpretability.
关键词
相关论文
面向学习与规划的并行可微可达性:具有认证神经动力学与控制器的系统
Keyi Shen, Glen Chou
2026
人工智能增强的智能焊接岛:基础模型革新制造业
Xiwei Wu, Wei Wu, Qiqi Chen 等 9 位作者
Robotics and Computer-Integrated Manufacturing · 2026
基于深度强化学习和动态图神经网络的多任务机器人调度代理
Hedi Boukamcha, Anas Neumann, Monia Rekik 等 6 位作者
Robotics and Computer-Integrated Manufacturing · 2026
基于微调与AAS增强检索的LLM驱动自动化DFA评估
Jiaxin Liu, Xiaofeng Zhou, Suyang Yu 等 8 位作者
Robotics and Computer-Integrated Manufacturing · 2026