Improved Mispronunciation Detection Based on JSM and MLP

YUAN Hua, SHI Yong-Zhe, ZHAO Jun-Hong, LIU Jia

Citation: YUAN Hua, SHI Yong-Zhe, ZHAO Jun-Hong, LIU Jia. Improved Mispronunciation Detection Based on JSM and MLP. ACTA AUTOMATICA SINICA, 2014, 40(12): 2815-2823. doi: 10.3724/SP.J.1004.2014.02815

Funds: 

Supported by National Natural Science Foundation of China (61370034, 61005019, 61273268, 61105017)

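Details
    Author biography:

    SHI Yong-Zhe  Ph.D. candidate at the Department of Electronic Engineering, Tsinghua University. Research interests include speech recognition, language modeling, and audio retrieval. E-mail: shiyz09@gmail.com

    Corresponding author:

    YUAN Hua  Ph.D. candidate at the Department of Electronic Engineering, Tsinghua University. Main research interest is mispronunciation detection. Corresponding author of this paper. E-mail: yuanh08@mails.tsinghua.edu.cn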


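  • Abstract: This paper proposes a method based on the joint-sequence multigram (JSM) model and the multi-layer perceptron (MLP) for generating the pronunciation lexicon used in mispronunciation detection. First, the JSM models mispronunciations: each canonical pronunciation is combined with its mispronounced counterpart into a pronunciation pair that represents the correspondence between the two. An N-gram model is then estimated over these pairs to describe how mispronunciations depend on their context. Finally, an MLP re-models the relations among the pronunciation pairs so that similar errors occurring in similar contexts can be learned. Experiments show that using the MLP to re-estimate the probabilities of the higher-order model effectively smooths the probability space and improves mispronunciation detection performance.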
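The first two steps described in the abstract (pairing canonical and realized pronunciations, then estimating an N-gram over the pairs) can be illustrated with a minimal sketch. It rests on several assumptions that are not in the paper: the phone sequences are already aligned one-to-one (a real JSM learns the joint units and their alignment from data), the model is a plain bigram with unsmoothed maximum-likelihood estimates, and the function names `make_pairs`, `train_bigram`, and `bigram_prob` as well as the toy data are hypothetical. The MLP re-estimation step is not shown.

```python
from collections import Counter

BOS = ("<s>", "<s>")    # sentence-start joint unit (illustrative convention)
EOS = ("</s>", "</s>")  # sentence-end joint unit (illustrative convention)


def make_pairs(canonical, realized):
    """Zip two pre-aligned phone sequences into (canonical, realized) pairs."""
    if len(canonical) != len(realized):
        raise ValueError("sequences must be pre-aligned to equal length")
    return list(zip(canonical, realized))


def train_bigram(pair_sequences):
    """Count contexts and bigrams over sequences of pronunciation pairs."""
    context_counts = Counter()
    bigram_counts = Counter()
    for pairs in pair_sequences:
        padded = [BOS] + pairs + [EOS]
        for prev, cur in zip(padded, padded[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return context_counts, bigram_counts


def bigram_prob(context_counts, bigram_counts, prev, cur):
    """Unsmoothed estimate of P(cur | prev); 0.0 if the context is unseen."""
    if context_counts[prev] == 0:
        return 0.0
    return bigram_counts[(prev, cur)] / context_counts[prev]


# Toy, made-up learner data: /th/ is realized as /s/ in two of three words.
data = [
    make_pairs(["th", "ih", "n"], ["s", "ih", "n"]),
    make_pairs(["th", "ih", "k"], ["th", "ih", "k"]),
    make_pairs(["th", "ih", "ng"], ["s", "ih", "ng"]),
]
ctx, big = train_bigram(data)

p_error = bigram_prob(ctx, big, BOS, ("th", "s"))     # 2/3
p_correct = bigram_prob(ctx, big, BOS, ("th", "th"))  # 1/3
```

Because the estimates are unsmoothed, any pair that never follows a given context in the training data receives probability zero. This sparsity grows with the N-gram order, which is the kind of problem the abstract's MLP re-estimation is said to address by smoothing the probability space.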
Publication history
  • Received:  2013-06-03
  • Revised:  2013-09-06
  • Published:  2014-12-20
