Multimodal Gesture Recognition Using 3-D Convolution and Convolutional LSTM

Guangming Zhu, Liang Zhang, Peiyi Shen, Juan Song

Year: 2017
Citations: 277

Abstract

Gesture recognition aims to recognize meaningful movements of the human body and is of utmost importance in intelligent human-computer and human-robot interaction. In this paper, we present a multimodal gesture recognition method based on 3-D convolution and convolutional long short-term memory (LSTM) networks. The proposed method first learns short-term spatiotemporal features of gestures with a 3-D convolutional neural network, and then learns long-term spatiotemporal features with convolutional LSTM networks applied to the extracted short-term features. In addition, fine-tuning among multimodal data is evaluated, and we find that it can serve as an optional technique for preventing overfitting when no pre-trained models exist. The proposed method is evaluated on the ChaLearn LAP large-scale isolated gesture data set (IsoGD) and the Sheffield Kinect gesture (SKIG) data set. The results show that the proposed method achieves state-of-the-art recognition accuracy (51.02% on the validation set of IsoGD and 98.89% on SKIG).
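
To make the second stage concrete, here is a minimal pure-Python sketch of a convolutional LSTM recurrence run over a sequence of 2-D feature maps. It is an illustration under simplifying assumptions, not the network described in the paper: the maps are single-channel stand-ins for the 3-D CNN output, the gates follow the common ConvLSTM form without peephole terms, and every name (`conv2d_same`, `convlstm_step`, `run_convlstm`, the `params` layout) and every size is hypothetical.

```python
import math


def conv2d_same(x, k):
    """Cross-correlate a single-channel 2-D map with kernel k, zero-padded to keep its size."""
    h, w = len(x), len(x[0])
    kh, kw = len(k), len(k[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            s = 0.0
            for a in range(kh):
                for b in range(kw):
                    y, z = i + a - ph, j + b - pw
                    if 0 <= y < h and 0 <= z < w:
                        s += x[y][z] * k[a][b]
            out[i][j] = s
    return out


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def convlstm_step(x, h_prev, c_prev, params):
    """One ConvLSTM step. params maps each gate name ('i', 'f', 'o', 'g')
    to a tuple (input kernel, hidden kernel, scalar bias). Assumed layout."""
    rows, cols = len(x), len(x[0])
    pre = {}
    for gate, (wx, wh, bias) in params.items():
        ax = conv2d_same(x, wx)        # convolution over the current input map
        ah = conv2d_same(h_prev, wh)   # convolution over the previous hidden map
        pre[gate] = [[ax[r][q] + ah[r][q] + bias for q in range(cols)]
                     for r in range(rows)]
    h_new = [[0.0] * cols for _ in range(rows)]
    c_new = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for q in range(cols):
            i = sigmoid(pre["i"][r][q])    # input gate
            f = sigmoid(pre["f"][r][q])    # forget gate
            o = sigmoid(pre["o"][r][q])    # output gate
            g = math.tanh(pre["g"][r][q])  # candidate cell update
            c_new[r][q] = f * c_prev[r][q] + i * g
            h_new[r][q] = o * math.tanh(c_new[r][q])
    return h_new, c_new


def run_convlstm(feature_maps, params):
    """Fold a time-ordered list of 2-D maps into a final hidden and cell state."""
    rows, cols = len(feature_maps[0]), len(feature_maps[0][0])
    h = [[0.0] * cols for _ in range(rows)]
    c = [[0.0] * cols for _ in range(rows)]
    for x in feature_maps:
        h, c = convlstm_step(x, h, c, params)
    return h, c
```

Because the gates are computed by convolution rather than by full matrix products, the hidden and cell states keep the spatial layout of the input maps, which is what allows the recurrence to carry spatiotemporal rather than purely temporal information.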

Keywords

Computer science; Convolutional neural network; Gesture; Overfitting; Artificial intelligence; Gesture recognition; Convolution (computer science); Set (abstract data type); Data set; Hidden Markov model
