
Speech-Driven Conversational Agents using Conditional Flow-VAEs

Sarah Taylor, Jonathan Windle, David Greenwood, Iain Matthews

Publication year
2021
Citations
17

Abstract

Automatic control of conversational agents has applications from animation, through human-computer interaction, to robotics. In interactive communication, an agent must move to express its own discourse, and also react naturally to incoming speech. In this paper we propose a Flow Variational Autoencoder (Flow-VAE) deep learning architecture for transforming conversational speech to body gesture, during both speaking and listening. The model uses a normalising flow to perform variational inference in an autoencoder framework, which yields a more expressive distribution than the Gaussian approximation of conventional variational autoencoders. Our model is non-deterministic, so it can produce variations of plausible gestures for the same speech. Our evaluation demonstrates that our approach produces expressive body motion that is close to the ground truth using a fraction of the trainable parameters compared with the previous state of the art.
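
The abstract does not say which kind of normalising flow is used, so the following is only a minimal NumPy sketch of the general idea, assuming a planar flow applied to samples from a diagonal Gaussian. The function names (`planar_flow`, `sample_flow_posterior`) and the randomly drawn parameters are hypothetical and are not taken from the paper. It illustrates how a simple Gaussian sample can be pushed through invertible transforms while its log-density is tracked with the change-of-variables formula.

```python
import numpy as np

def planar_flow(z, u, w, b):
    """Apply one planar flow step f(z) = z + u_hat * tanh(w.z + b).

    Returns the transformed samples and log|det df/dz| for each sample.
    Assumption: a planar flow stands in for whatever flow the paper uses.
    """
    # Reparameterise u so that w.u_hat > -1, which keeps the map invertible.
    wu = w @ u
    m = -1.0 + np.log1p(np.exp(wu))
    u_hat = u + (m - wu) * w / (w @ w)

    a = np.tanh(z @ w + b)               # shape (n,)
    z_new = z + np.outer(a, u_hat)       # shape (n, d)
    psi = (1.0 - a ** 2)[:, None] * w    # shape (n, d)
    log_det = np.log(np.abs(1.0 + psi @ u_hat))
    return z_new, log_det

def sample_flow_posterior(mu, log_var, flows, n, rng):
    """Draw n samples from a diagonal Gaussian and push them through the flows.

    Returns the final samples and their log-density under the flowed distribution.
    """
    d = mu.shape[0]
    eps = rng.standard_normal((n, d))
    z = mu + np.exp(0.5 * log_var) * eps
    # Log-density of the diagonal Gaussian base distribution.
    log_q = -0.5 * np.sum(eps ** 2 + log_var + np.log(2 * np.pi), axis=1)
    for u, w, b in flows:
        z, log_det = planar_flow(z, u, w, b)
        log_q = log_q - log_det          # change of variables
    return z, log_q

# Hypothetical usage: random flow parameters stand in for learned ones.
rng = np.random.default_rng(0)
d = 2
flows = [(rng.standard_normal(d), rng.standard_normal(d), rng.standard_normal())
         for _ in range(4)]
z, log_q = sample_flow_posterior(np.zeros(d), np.zeros(d), flows, n=5, rng=rng)
print(z.shape, log_q.shape)
```

In a conditional setting such as the one described above, `mu`, `log_var` and the flow parameters would typically be produced by a network conditioned on the speech input; that part is omitted here. Drawing several samples for the same conditioning is what makes the output non-deterministic.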

Keywords

Autoencoder, Computer science, Gesture, Artificial intelligence, Speech recognition, Humanoid robot, Inference, Robot, Optical flow, Animation
