Language Recognition System Based on Manifold Regularized Extreme Learning Machine

Xu Jiaming; Zhang Weiqiang; Yang Dengzhou; Liu Jia; Xia Shanhong

doi:10.16383/j.aas.2015.c140916

cstr: 32138.14.j.aas.2015.c140916

Xu Jiaming^1,2, Zhang Weiqiang^3, Yang Dengzhou^1,2, Liu Jia^3, Xia Shanhong^2

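About the authors:

Zhang Weiqiang: Associate researcher at the Department of Electronic Engineering, Tsinghua University. Research interests: speech signal processing and machine learning. E-mail: wqzhang@tsinghua.edu.cn

Yang Dengzhou: Ph.D. candidate at the Institute of Electronics, Chinese Academy of Sciences. Research interests: speech signal processing and machine learning. E-mail: yangdengzhou@sina.com

Liu Jia: Professor at the Department of Electronic Engineering, Tsinghua University. Research interests: speech recognition and signal processing. E-mail: liuj@tsinghua.edu.cn

Xia Shanhong: Researcher at the Institute of Electronics, Chinese Academy of Sciences. Research interests: signal processing and sensor technology. E-mail: shxia@mail.ie.ac.cn

Corresponding author:

Xu Jiaming: Ph.D. candidate at the Institute of Electronics, Chinese Academy of Sciences. Research interests: speech signal processing and machine learning. Corresponding author of this paper. E-mail: xujiaming09@sina.com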

Publication history
- Received: 2015-01-05
- Revised: 2015-06-07
- Published: 2015-09-20

Manifold Regularized Extreme Learning Machine for Language Recognition

1. University of Chinese Academy of Sciences, Beijing 100190;
2. State Key Laboratory of Transducer Technology, Institute of Electronics, Chinese Academy of Sciences, Beijing 100190;
3. Tsinghua National Laboratory for Information Science and Technology, Department of Electronic Engineering, Tsinghua University, Beijing 100084

Funding:

Supported by the National Natural Science Foundation of China (61273268, 61370034, 61403224)

Abstract

Support vector machines (SVMs) have played an important role in language recognition. In recent years, the extreme learning machine (ELM) has been applied successfully in many areas. Compared with SVM, the main advantages of ELM are that it is very easy to implement and fast to train, and it usually achieves recognition performance comparable to or better than that of SVM. Motivated by these properties, this paper introduces ELM into language recognition and proposes a manifold regularized extreme learning machine (MRELM) algorithm to address the potential problems caused by the random initialization of ELM model parameters. Experimental results show that, in the Gaussian supervector (GSV) feature space, the proposed algorithm clearly improves recognition performance on 30-second utterances relative to the SVM baseline system. The algorithm can also be applied to the i-vector feature space, where it achieves recognition performance comparable to that of current mainstream scoring methods.

Key words:
- language recognition
- extreme learning machine (ELM)
- manifold learning
- support vector machine (SVM)
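The abstract describes ELM as easy to implement and fast to train because its hidden-layer parameters are initialized randomly and left untrained, with a manifold regularizer added to offset the drawbacks of that randomness. The following is a minimal NumPy sketch of one common way to write such a model. It is not the authors' implementation. The `tanh` activation, the ridge parameter `C`, the k-nearest-neighbor graph Laplacian, the weight `lam`, and the function names `train_elm`, `predict_elm`, and `knn_laplacian` are all assumptions made for illustration.

```python
import numpy as np


def knn_laplacian(X, k=5):
    """Unnormalized graph Laplacian L = D - S of a symmetric k-NN graph (assumed graph type)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)  # pairwise squared distances
    np.fill_diagonal(d2, np.inf)                              # exclude self-neighbors
    nn = np.argsort(d2, axis=1)[:, :k]                        # indices of the k nearest points
    n = len(X)
    S = np.zeros((n, n))
    S[np.repeat(np.arange(n), k), nn.ravel()] = 1.0           # binary adjacency
    S = np.maximum(S, S.T)                                    # symmetrize
    return np.diag(S.sum(axis=1)) - S


def train_elm(X, T, n_hidden=64, C=1.0, lam=0.0, L=None, seed=0):
    """Fit output weights in closed form; hidden weights stay random.

    Solves (I/C + H^T H + lam * H^T L H) beta = H^T T, where H is the
    hidden-layer output matrix. With lam = 0 this is a ridge-regularized ELM.
    """
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))  # random input weights, never trained
    b = rng.standard_normal(n_hidden)                # random biases, never trained
    H = np.tanh(X @ W + b)
    A = np.eye(n_hidden) / C + H.T @ H
    if L is not None and lam > 0:
        A = A + lam * (H.T @ L @ H)                  # manifold smoothness penalty
    beta = np.linalg.solve(A, H.T @ T)
    return W, b, beta


def predict_elm(X, W, b, beta):
    """Return one score per class; take argmax over columns for a decision."""
    return np.tanh(X @ W + b) @ beta
```

In this sketch the only trained quantity is `beta`, obtained from a single linear solve, which is why training is fast. The Laplacian term penalizes output differences between neighboring training vectors, so the solution depends less on any particular random draw of `W` and `b`.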
