

Speaker/Style-Dependent Neural Network Speech Synthesis Based on Speaker/Style Embedding

Milan Sečujski, Darko Pekar, Siniša Suzić, Anton A Smirnov, Tijana Nosek

Year of publication: 2020
Citations: 11
Access: Open access

Abstract

The paper presents a novel architecture and method for training neural networks to produce synthesized speech in a particular voice and speaking style, based on a small quantity of target speaker/style training data. The method is based on neural network embedding, i.e., the mapping of discrete variables into continuous vectors in a low-dimensional space, which has been shown to be a very successful universal deep learning technique. In this particular case, different speaker/style combinations are mapped into different points in a low-dimensional space, which enables the network to capture the similarities and differences between speakers and speaking styles more efficiently. The initial model from which speaker/style adaptation was carried out was a multi-speaker/multi-style model based on 8.5 hours of American English speech data, corresponding to 16 different speaker/style combinations. The results of the experiments show that both versions of the obtained system, one using 10 minutes and the other as little as 30 seconds of target data, outperform the state of the art in parametric speaker/style-dependent speech synthesis. This opens a wide range of applications for speaker/style-dependent speech synthesis based on small quantities of training data, in domains ranging from customer interaction in call centers to robot-assisted medical therapy.
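
The embedding idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' architecture: the embedding dimensionality, the random initialization, the feature values, and all function names below are assumptions made for illustration. Only the count of 16 speaker/style combinations comes from the abstract. In a trained system the vectors in the table would be learned jointly with the rest of the network rather than left at random values.

```python
import random

NUM_COMBINATIONS = 16  # speaker/style combinations in the initial model (from the abstract)
EMBEDDING_DIM = 3      # assumed dimensionality; the abstract does not state it


def make_embedding_table(num_ids, dim, seed=0):
    """Return one small random vector per discrete ID (a stand-in for a trainable table)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(num_ids)]


def embed(table, combo_id):
    """Map a discrete speaker/style ID to its continuous vector."""
    return table[combo_id]


def build_network_input(linguistic_features, table, combo_id):
    """Concatenate per-frame features with the speaker/style vector (hypothetical input layout)."""
    return list(linguistic_features) + embed(table, combo_id)


table = make_embedding_table(NUM_COMBINATIONS, EMBEDDING_DIM)
frame = [0.0, 1.0, 0.5, 0.0]  # hypothetical linguistic features for one frame
x = build_network_input(frame, table, combo_id=5)
print(len(x))  # 7: four frame features plus a three-dimensional embedding
```

Because each speaker/style combination is represented by a point in a shared continuous space instead of a separate one-hot input, similar voices or styles can end up with nearby vectors, which is the property the abstract credits for efficient adaptation from small amounts of target data.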

Keywords

Speech recognition, Computer science, Speaker diarisation, Speech synthesis, Artificial neural network, Speaker recognition, Embedding, Style (visual arts), Space (punctuation), Range (aeronautics)

