In Defense of Knowledge Distillation for Task Incremental Learning and Its Application in 3D Object Detection
Yun Peng, Yuxuan Liu, Ming Liu
- Year: 2021
- Citations: 20
Abstract
Making robots learn skills incrementally is an efficient way to design real intelligent agents. To achieve this, researchers adopt knowledge distillation to transfer old-task knowledge from old models to new ones. However, when the length of the task sequence increases, the effectiveness of knowledge distillation to prevent models from forgetting old-task knowledge degrades, which we call the long-sequence effectiveness degradation (LED) problem. In this letter, we analyze the LED problem in the task-incremental-learning setting, and attribute it to the inevitable data distribution differences among tasks. To address this problem, we propose to correct the knowledge distillation for task incremental learning with a Bayesian approach. It additionally maximizes the posterior probability related to the data distributions of all seen tasks. To demonstrate its effectiveness, we further apply our proposed corrected knowledge distillation to 3D object detection. The comparison between the results of increment-at-once and increment-in-sequence experiments shows that our proposed method solves the LED problem. Besides, it reaches the upper-bound performance in the task-incremental-learning experiments on the KITTI dataset. The code and supplementary materials are available at https://sites.google.com/view/c-kd/.