Speech-Driven Conversational Agents using Conditional Flow-VAEs

Sarah Taylor, Jonathan Windle, David Greenwood, Iain Matthews

Year of publication: 2021
Citations: 17

Abstract

Automatic control of conversational agents has applications ranging from animation, through human-computer interaction, to robotics. In interactive communication, an agent must move to express its own discourse and also react naturally to incoming speech. In this paper we propose a Flow Variational Autoencoder (Flow-VAE) deep learning architecture for transforming conversational speech to body gesture during both speaking and listening. The model uses a normalising flow to perform variational inference in an autoencoder framework, giving a more expressive distribution than the Gaussian approximation of conventional variational autoencoders. Our model is non-deterministic, so it can produce varied plausible gestures for the same speech. Our evaluation demonstrates that our approach produces expressive body motion that is close to the ground truth while using a fraction of the trainable parameters of the previous state of the art.

Keywords

Autoencoder, Computer science, Gesture, Artificial intelligence, Speech recognition, Humanoid robot, Inference, Robot, Optical flow, Animation
