Speaker Verification with Model-based and Score-based Unsupervised Adaptation Method

WANG Eryu (王尔玉); GUO Wu (郭武); LI Yijie (李轶杰); DAI Lirong (戴礼荣); WANG Renhua (王仁华)

doi:10.3724/SP.J.1004.2009.00267




CLC number: TP391
Publication history
- Received: 2007-12-03
- Revised: 2008-09-06
- Published: 2009-03-20

iFly Speech Laboratory, Department of Electronic Engineering and Information Science, University of Science and Technology of China, Hefei 230027


Corresponding author: GUO Wu

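Abstract (translated from the Chinese)

In speaker recognition research, information from earlier test utterances can be used to dynamically update the model parameters or the test scores, so that the models reflect the relationship between test utterances and speaker models more precisely. This update strategy is called the unsupervised mode, and research on it matters greatly for practical speaker recognition systems. In addition to unsupervised adaptation of the speaker models, this paper proposes an unsupervised adaptation algorithm in the score domain: a bi-Gaussian function is first used to build a prior model of the scores, and the maximum a posteriori criterion is then used to adjust the score normalization model. During testing, the score-domain and model-domain unsupervised algorithms complement each other and improve recognition accuracy. On the NIST SRE 2006 one-training-segment, one-test-segment data set, the system using both model-domain and score-domain unsupervised adaptation achieves an equal error rate of 4.3% and a detection cost function of 0.021.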
Abstract: In text-independent speaker verification, information from previous trials can be used to update the speaker models or the test scores dynamically. This strategy, known as the unsupervised mode, couples the trials to the speaker models, and it is very useful in real speaker recognition applications. In this paper, a score-based unsupervised adaptation method is proposed alongside model-based unsupervised adaptation. In the score-based method, a bi-Gaussian model is introduced as a prior score distribution, and the maximum a posteriori (MAP) criterion is then used to adjust the parameters of the score normalization. During testing, unsupervised score adaptation and unsupervised model adaptation both improve performance. On the NIST SRE 2006 1conv4w-1conv4w corpus, the proposed system achieves an equal error rate (EER) of 4.3% and a minimum detection cost function (minDCF) of 0.021.
Keywords: speaker verification; Gaussian mixture model (GMM); unsupervised mode; score normalization
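
The abstract names the ingredients of the score-domain adaptation (a bi-Gaussian prior over scores and a MAP adjustment of the score normalization) but gives no update equations. The following is only a minimal sketch of how such a scheme could look, under several assumptions that are not taken from the paper: the two Gaussians stand for target and impostor scores, only the impostor mean is adapted, the variances stay fixed, the MAP update uses a relevance factor as is common in GMM adaptation, and the normalization is a z-norm style shift and scale. All names (`Gaussian`, `ScoreAdapter`, `relevance`, `p_target`) are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Gaussian:
    mean: float
    var: float

    def pdf(self, x: float) -> float:
        norm = math.sqrt(2.0 * math.pi * self.var)
        return math.exp(-0.5 * (x - self.mean) ** 2 / self.var) / norm


class ScoreAdapter:
    """Bi-Gaussian prior over scores with a MAP-adapted impostor mean.

    Assumption: this is an illustrative scheme, not the paper's algorithm.
    """

    def __init__(self, target: Gaussian, impostor: Gaussian,
                 p_target: float = 0.5, relevance: float = 16.0):
        self.target = target
        self.impostor = impostor
        self.prior_mean = impostor.mean  # prior impostor mean, kept fixed
        self.p_target = p_target         # assumed weight of the target Gaussian
        self.relevance = relevance       # assumed MAP relevance factor
        self.count = 0.0                 # soft count of impostor evidence
        self.total = 0.0                 # soft sum of impostor scores

    def posterior_impostor(self, score: float) -> float:
        """Probability that a score came from the impostor Gaussian."""
        t = self.p_target * self.target.pdf(score)
        i = (1.0 - self.p_target) * self.impostor.pdf(score)
        return i / (t + i)

    def normalize(self, score: float) -> float:
        """Z-norm style normalization with the current impostor statistics."""
        return (score - self.impostor.mean) / math.sqrt(self.impostor.var)

    def update(self, score: float) -> None:
        """Unsupervised step: weight the unlabeled score by its posterior,
        then re-estimate the impostor mean with a MAP update."""
        gamma = self.posterior_impostor(score)
        self.count += gamma
        self.total += gamma * score
        # MAP mean: data average and prior mean blended by the relevance factor.
        new_mean = ((self.total + self.relevance * self.prior_mean)
                    / (self.count + self.relevance))
        self.impostor = Gaussian(new_mean, self.impostor.var)


if __name__ == "__main__":
    adapter = ScoreAdapter(Gaussian(2.0, 1.0), Gaussian(0.0, 1.0), relevance=4.0)
    for raw in [-1.2, -0.8, 2.5, -1.0]:
        print(round(adapter.normalize(raw), 3))
        adapter.update(raw)
```

In this sketch each trial is normalized with the statistics available before it is seen and only then used for adaptation, so no labels are needed. A larger `relevance` keeps the normalization closer to its prior, which limits the damage when a target trial is mistaken for an impostor.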