A Two-stage Prosodic Structure Generation Strategy for Mandarin Text-to-speech Systems
Abstract: Prosodic structure generation is a key component in improving the intelligibility and naturalness of synthetic speech in a text-to-speech (TTS) system. This paper investigates the automatic segmentation of prosodic words and prosodic phrases, two fundamental layers in the hierarchical prosodic structure of Mandarin, and presents a two-stage prosodic structure generation strategy. At the front end, conditional random fields (CRF) models are built for both prosodic word and prosodic phrase prediction, each with its own feature selection. At the back end, a transformation-based error-driven learning (TBL) modification module amends the initial prediction. Experimental results show that the approach combining CRF and TBL achieves an F-score of 94.66% for prosodic word and prosodic phrase segmentation.
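The back-end correction idea can be illustrated with a minimal sketch of greedy transformation-based learning over boundary labels. This is not the authors' implementation: the abstract does not specify the tag set, features, or rule templates, so the labels (`O`, `W`, `P`), the single rule template (rewrite a label when the word has a given part-of-speech tag), the toy data, and all function names below are assumptions made for illustration. The CRF front end is replaced by a hard-coded initial labeling.

```python
from itertools import product

# Hypothetical tag set: no boundary / prosodic word / prosodic phrase boundary
LABELS = ("O", "W", "P")


def apply_rule(rule, tags, labels):
    """Rewrite label `src` to `dst` wherever the word carries POS tag `pos`."""
    src, dst, pos = rule
    return [dst if (lab == src and tag == pos) else lab
            for tag, lab in zip(tags, labels)]


def errors(pred, gold):
    """Count positions where the predicted label differs from the reference."""
    return sum(p != g for p, g in zip(pred, gold))


def learn_tbl(tags, initial, gold, max_rules=10):
    """Greedy TBL: repeatedly keep the rule that removes the most errors."""
    current, learned = list(initial), []
    pos_set = sorted(set(tags))
    for _ in range(max_rules):
        base = errors(current, gold)
        best_rule, best_gain = None, 0
        for src, dst, pos in product(LABELS, LABELS, pos_set):
            if src == dst:
                continue
            candidate = apply_rule((src, dst, pos), tags, current)
            gain = base - errors(candidate, gold)
            if gain > best_gain:
                best_rule, best_gain = (src, dst, pos), gain
        if best_rule is None:  # no rule reduces the error count any further
            break
        learned.append(best_rule)
        current = apply_rule(best_rule, tags, current)
    return learned


def boundary_f_score(pred, gold, label):
    """F-score (harmonic mean of precision and recall) for one boundary type."""
    tp = sum(p == label and g == label for p, g in zip(pred, gold))
    if tp == 0:
        return 0.0
    precision = tp / sum(p == label for p in pred)
    recall = tp / sum(g == label for g in gold)
    return 2 * precision * recall / (precision + recall)


# Toy data (hypothetical): POS tags, a stand-in for the first-stage output,
# and reference labels for the boundary that follows each word.
tags = ["n", "u", "n", "w", "d", "v", "n"]
initial = ["W", "W", "W", "W", "W", "W", "P"]
gold = ["O", "W", "W", "P", "O", "W", "P"]

rules = learn_tbl(tags, initial, gold)
final = list(initial)
for rule in rules:  # rules are applied in the order they were learned
    final = apply_rule(rule, tags, final)

print("learned rules:", rules)
print("errors before/after:", errors(initial, gold), errors(final, gold))
print("phrase-boundary F-score:", boundary_f_score(final, gold, "P"))
```

In a real system the rule templates would condition on richer context (neighboring words, tags, and labels), and the learned rules would be applied to unseen first-stage output rather than to the data they were learned from.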