Improved Mispronunciation Detection Based on JSM and MLP

Yuan Hua (袁桦); Shi Yongzhe (史永哲); Zhao Junhong (赵军红); Liu Jia (刘加)

doi:10.3724/SP.J.1004.2014.02815

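Yuan Hua^1, Shi Yongzhe^1, Zhao Junhong^2,3, Liu Jia^1

1. Tsinghua National Laboratory for Information Science and Technology, Department of Electronic Engineering, Tsinghua University, Beijing 100084
2. State Key Laboratory of Transducer Technology, Institute of Electronics, Chinese Academy of Sciences, Beijing 100190
3. University of Chinese Academy of Sciences, Beijing 100049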

Funding:

Supported by the National Natural Science Foundation of China (61370034, 61005019, 61273268, 61105017)

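Author biography:
Shi Yongzhe: Ph.D. candidate in the Department of Electronic Engineering, Tsinghua University. Research interests: speech recognition, language modeling, and audio retrieval. E-mail: shiyz09@gmail.com

Corresponding author:
Yuan Hua: Ph.D. candidate in the Department of Electronic Engineering, Tsinghua University. Research interest: mispronunciation detection. Corresponding author of this paper. E-mail: yuanh08@mails.tsinghua.edu.cn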

Publication history
- Received: 2013-06-03
- Revised: 2013-09-06
- Published: 2014-12-20


Abstract

This paper proposes a method for generating pronunciation dictionaries for mispronunciation detection, based on the joint-sequence multi-gram model (JSM) and the multi-layer perceptron (MLP). First, the JSM is used to model mispronunciations: each canonical pronunciation is combined with its erroneous realization into a pronunciation pair that represents the correspondence between the two, and an N-gram over these pairs captures how mispronunciations depend on their context. Then, an MLP re-models the relationships among pronunciation pairs in order to learn similar errors that occur in similar contexts. Experiments show that re-estimating the probabilities of the high-order model with the MLP effectively smooths the probability space and improves mispronunciation detection performance.

Keywords:
- Mispronunciation detection
- Joint-sequence multi-gram model (JSM)
- Multi-layer perceptron (MLP)
- Computer-assisted language learning (CALL)
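
The pairing and N-gram steps summarized in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes the canonical and realized phone sequences are already aligned one-to-one (a full JSM also learns the alignment and allows multi-phone units), it uses a bigram with add-k smoothing in place of a high-order model, and it leaves out the MLP re-estimation step entirely. The phone labels, the toy data, and the function names `make_pairs`, `train_bigram`, and `bigram_prob` are all hypothetical.

```python
from collections import Counter


def make_pairs(canonical, realized):
    """Zip aligned phone sequences into joint units such as 'ih:iy'.

    Assumption: the two sequences are pre-aligned and of equal length.
    """
    if len(canonical) != len(realized):
        raise ValueError("sequences must be pre-aligned to equal length")
    return [f"{c}:{r}" for c, r in zip(canonical, realized)]


def train_bigram(pair_sequences):
    """Count bigrams over joint units, with sentence boundary markers."""
    bigram_counts = Counter()
    context_counts = Counter()
    vocab = set()
    for seq in pair_sequences:
        padded = ["<s>"] + seq + ["</s>"]
        vocab.update(padded[1:])  # every unit that can be predicted
        for prev, cur in zip(padded, padded[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return bigram_counts, context_counts, vocab


def bigram_prob(prev, cur, model, k=0.1):
    """Add-k smoothed estimate of P(cur | prev) over pronunciation pairs."""
    bigram_counts, context_counts, vocab = model
    return (bigram_counts[(prev, cur)] + k) / (context_counts[prev] + k * len(vocab))


# Toy training data: canonical /ih/ is sometimes realized as /iy/.
data = [
    make_pairs(["s", "ih", "t"], ["s", "iy", "t"]),
    make_pairs(["f", "ih", "t"], ["f", "iy", "t"]),
    make_pairs(["s", "ih", "t"], ["s", "ih", "t"]),
]
model = train_bigram(data)

p_error = bigram_prob("s:s", "ih:iy", model)    # error seen after "s:s"
p_correct = bigram_prob("s:s", "ih:ih", model)  # correct form seen after "s:s"
p_unseen = bigram_prob("f:f", "ih:ih", model)   # never observed after "f:f"
print(p_error, p_correct, p_unseen)
```

The unseen case receives only the smoothing floor, even though the same pair was observed after a similar context (`s:s`). This is the kind of sparsity that the abstract says the MLP re-estimation is meant to smooth, by sharing information across similar contexts.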
