Speaker Recognition with Kernel-Based IVEC-SVM
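Abstract: In speaker recognition research, speaker modeling based on the identity vector (IVEC) extracts speaker information effectively and is currently a state-of-the-art modeling method. This paper studies the speaker recognition system in which identity vectors are followed by a support vector machine (IVEC-SVM), compares its recognition performance under ten different kernel functions, and compares it with the identity vector followed by cosine distance scoring (IVEC-CDS) system from the literature. Experimental results on the core telephone-telephone condition of the 2010 speaker recognition evaluation organized by the US National Institute of Standards and Technology (NIST) show that the kernel-based IVEC-SVM systems clearly outperform the IVEC-CDS system. In addition, the IVEC-SVM system with the spline kernel achieves the best recognition performance: compared with the IVEC-CDS system, it reduces the equal error rate (EER) by 10% before score normalization and by 3% after score normalization.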
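As a rough illustration of the IVEC-CDS baseline, the sketch below scores a trial by the cosine of the angle between the enrollment and test i-vectors and accepts the claimed identity when the score reaches a threshold. It is a minimal sketch, not the authors' implementation: the function names `cosine_score` and `accept`, the toy 4-dimensional vectors, and the 0.5 threshold are all hypothetical, and the channel compensation (for example LDA or WCCN) commonly applied to i-vectors before scoring is omitted.

```python
import math


def cosine_score(w_enroll, w_test):
    """Cosine score between two i-vectors; higher means more likely the same speaker."""
    if len(w_enroll) != len(w_test):
        raise ValueError("i-vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(w_enroll, w_test))
    norm_e = math.sqrt(sum(a * a for a in w_enroll))
    norm_t = math.sqrt(sum(b * b for b in w_test))
    if norm_e == 0.0 or norm_t == 0.0:
        raise ValueError("zero-length i-vector")
    return dot / (norm_e * norm_t)


def accept(w_enroll, w_test, threshold):
    """Accept the claimed identity when the cosine score reaches the (hypothetical) threshold."""
    return cosine_score(w_enroll, w_test) >= threshold


# Hypothetical toy "i-vectors"; real i-vectors typically have hundreds of dimensions.
enroll = [0.9, -0.2, 0.4, 0.1]
same = [0.8, -0.1, 0.5, 0.0]
other = [-0.3, 0.7, -0.2, 0.6]
print(cosine_score(enroll, same))
print(accept(enroll, same, 0.5), accept(enroll, other, 0.5))
```

Because the cosine score equals the dot product of the two length-normalized vectors, CDS can be read as a fixed linear kernel on length-normalized i-vectors; the IVEC-SVM systems studied here replace this fixed similarity with an SVM classifier built on a chosen kernel function.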
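Keywords:
- Identity vector followed by cosine distance scoring (IVEC-CDS)
- Identity vector followed by support vector machine (IVEC-SVM)
- Spline kernel
- Speaker recognition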
Abstract: In text-independent speaker recognition, identity vector (IVEC) based modeling has recently emerged as a state-of-the-art and effective method for extracting speaker information. This paper studies the identity vector followed by support vector machine (IVEC-SVM) system, compares its performance under ten different kernel functions, and compares it with the identity vector followed by cosine distance scoring (IVEC-CDS) system. Experiments on the core telephone-telephone condition of the 2010 Speaker Recognition Evaluation (SRE) released by the US National Institute of Standards and Technology (NIST) show that the kernel-based IVEC-SVM systems perform better than the IVEC-CDS system. Among the kernel-based IVEC-SVM systems, the spline kernel performs best, with relative EER reductions of 10% and 3% compared with the IVEC-CDS system before and after score normalization, respectively.
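The results are reported as equal error rates (EER), the operating point at which the miss rate equals the false-alarm rate. The sketch below approximates the EER by sweeping a decision threshold over the observed scores, and shows how a relative reduction such as the 10% and 3% figures is computed. It is a minimal sketch: the function names `equal_error_rate` and `relative_reduction` and all score and EER values are hypothetical, not data from the paper, and more careful implementations interpolate between operating points instead of picking the closest observed threshold.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    A trial is accepted when its score >= threshold. The miss rate is the
    fraction of target trials rejected; the false-alarm rate is the fraction
    of non-target trials accepted. The EER is reported as the average of the
    two rates at the threshold where they are closest.
    """
    if not target_scores or not nontarget_scores:
        raise ValueError("need at least one target and one non-target score")
    best_gap, best_eer = float("inf"), None
    for t in sorted(set(target_scores) | set(nontarget_scores)):
        miss = sum(s < t for s in target_scores) / len(target_scores)
        fa = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(miss - fa)
        if gap < best_gap:
            best_gap, best_eer = gap, (miss + fa) / 2
    return best_eer


def relative_reduction(baseline_eer, new_eer):
    """Relative EER reduction of a new system with respect to a baseline."""
    if baseline_eer <= 0:
        raise ValueError("baseline EER must be positive")
    return (baseline_eer - new_eer) / baseline_eer


# Hypothetical scores for illustration only.
targets = [0.9, 0.8, 0.7, 0.4]
nontargets = [0.6, 0.3, 0.2, 0.1]
print(equal_error_rate(targets, nontargets))
# Hypothetical EERs: a drop from 5.0% to 4.5% is a 10% relative reduction.
print(relative_reduction(0.050, 0.045))
```

A relative reduction is measured against the baseline's own EER, so a 10% relative reduction is not the same as a 10-percentage-point drop.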