Voice Mapping of Text-to-Speech Systems: A Metric-Based Approach for Voice Quality Assessment
Huanchen Cai, Sten Ternström
Year: 2026
Access: Open access
Abstract
This study investigates voice mapping as an evaluation framework for text-to-speech (TTS) synthesis quality. We analyzed six influential TTS models, spanning historical and recent systems: Merlin, Tacotron 2, Transformer TTS, FastSpeech 2, Glow-TTS, and VITS. The metrics are crest factor, spectrum balance, and smoothed cepstral peak prominence (CPPs). The results show that voice range serves as a primary indicator of model capability, with VITS showing the largest range among the tested models. Glow-TTS performed better in soft phonation, as indicated by higher spectrum balance, despite its limited voice range. CPPs values of 7 to 8 dB indicated natural voice quality, whereas speech with CPPs above 10 dB tended to sound robotic. These findings underscore the need for voice mapping to evaluate vocal effort and to capture how TTS systems handle voice dynamics and expressiveness.
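As an illustration of the simplest of the three metrics, the sketch below computes crest factor using its standard definition, the ratio of peak amplitude to RMS amplitude expressed in dB. The abstract does not describe the authors' analysis pipeline, so the function name `crest_factor_db`, the whole-buffer windowing, and the synthetic test signals are assumptions made for this example only.

```python
import math


def crest_factor_db(samples):
    """Crest factor in dB: 20 * log10(peak amplitude / RMS amplitude).

    Hypothetical helper for illustration; it treats the whole buffer as
    one analysis window, which may differ from the paper's procedure.
    """
    if not samples:
        raise ValueError("samples must be non-empty")
    peak = max(abs(x) for x in samples)
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    if rms == 0.0:
        raise ValueError("signal is silent; crest factor is undefined")
    return 20.0 * math.log10(peak / rms)


# Synthetic signals (not data from the study): one full period each.
n = 1000
sine = [math.sin(2.0 * math.pi * k / n) for k in range(n)]
square = [1.0 if k < n // 2 else -1.0 for k in range(n)]

sine_cf = crest_factor_db(sine)      # about 3.01 dB (peak / RMS = sqrt(2))
square_cf = crest_factor_db(square)  # 0 dB (peak equals RMS)
```

A pure sine has a crest factor of about 3 dB and a square wave 0 dB; waveforms with sharper peaks relative to their average energy score higher.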