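Improvement Comparison of Different Lattice-based Discriminative Training Methods in Chinese-monolingual and Chinese-English-bilingual Speech Recognition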

QIAN Yan-Min, SHAN Yu-Xiang, WANG Lin-Fang, LIU Jia

Citation: QIAN Yan-Min, SHAN Yu-Xiang, WANG Lin-Fang, LIU Jia. Improvement Comparison of Different Lattice-based Discriminative Training Methods in Chinese-monolingual and Chinese-English-bilingual Speech Recognition. ACTA AUTOMATICA SINICA, 2012, 38(7): 1162-1168. doi: 10.3724/SP.J.1004.2012.01162

doi: 10.3724/SP.J.1004.2012.01162

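  • Abstract: In recent years, discriminative training methods such as MPE, fMPE, and BMMI have yielded considerable performance gains in speech recognition, yet much work on discriminative training remains to be done. This paper investigates three lattice-based discriminative training methods in detail and reports the performance of each. It then analyzes and compares different I-smoothing methods, obtaining more robust models for Chinese monolingual speech recognition. The complementary properties of the different discriminative training methods are also studied, and the resulting systems are combined with the ROVER algorithm. Although discriminative training is usually applied to monolingual speech recognition systems, this paper also systematically studies its application to bilingual speech recognition, covering MPE, fMPE, and BMMI. A new method is used to generate better lattices for bilingual model training, and complementary discriminative training methods are studied in the bilingual setting to obtain the best ROVER combination performance. Experimental results show that the different forms of discriminative training reduce the word error rate in both monolingual and bilingual speech recognition systems, and that combining complementary discriminative training methods substantially improves system performance.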
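The abstract reports its results as word error rate. As a reminder of what that metric measures, here is a minimal sketch of the standard definition: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a recognizer hypothesis, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; the page does not describe the scoring tool or the Chinese word segmentation the authors used.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: both strings are already segmented into words separated
    by whitespace (for Chinese, segmentation must be done beforehand).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution (b -> x) and one deletion (d) over four reference words.
    print(word_error_rate("a b c d", "a x c"))
```

Because insertions count as errors, this ratio can exceed 1.0 when the hypothesis is much longer than the reference.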
Publication history
  • Received: 2011-07-25
  • Revised: 2012-02-10
  • Published: 2012-07-20
