







SHAN Yu-Xiang, DENG Yan, LIU Jia

Citation: SHAN Yu-Xiang, DENG Yan, LIU Jia. A Novel Large Vocabulary Continuous Speech Recognition Algorithm Combined with Language Recognition. ACTA AUTOMATICA SINICA, 2012, 38(3): 366-374. doi: 10.3724/SP.J.1004.2012.00366


doi: 10.3724/SP.J.1004.2012.00366

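    SHAN Yu-Xiang, Ph.D. candidate in the Department of Electronic Engineering, Tsinghua University. Research interests include speech recognition, keyword spotting, and speaker recognition. E-mail: syx06@mails.tsinghua.edu.cn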

A Novel Large Vocabulary Continuous Speech Recognition Algorithm Combined with Language Recognition

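  • Abstract: This paper proposes a novel large vocabulary continuous speech recognition (LVCSR) algorithm combined with language recognition, and builds a real-time processing system around it. The algorithm makes full use of the phone recognition hypotheses collected during speech decoding, so that it identifies the language while recognizing the speech content. The system can be applied in multilingual environments. It replaces a standalone language recognition module at a lower overall computational cost, and it also copes with utterances in which non-target languages are mixed in, greatly reducing the meaningless recognition errors introduced by non-target languages and preventing accumulated errors from misleading subsequent recognition. To integrate speech content recognition and language recognition tightly into a single decoding process, three different algorithms are proposed for adjusting (reconstructing) the phone lattice produced by decoding. On the one hand, they remove the target-language bias that the pronunciation dictionary and the language model introduce into speech recognition; on the other hand, they include richer phone recognition hypotheses in the phone lattice. Experiments show that the phone lattice reconstruction algorithms effectively improve the accuracy of language recognition in joint recognition. Tests on a Chinese-English mixed conversational telephone speech corpus, with Chinese as the target language, show that the proposed joint recognition algorithm reduces the meaningless recognition errors caused by out-of-set languages by 91.76%, with a pure Chinese character error rate of 54.98%.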
  • Received:  2011-06-03
  • Revised:  2011-10-08
  • Published:  2012-03-20
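
The abstract describes reusing the phone hypotheses gathered during decoding to identify the language. The page gives no algorithmic detail, so the following is only a minimal sketch of one common phonotactic approach, not the authors' method: posterior-weighted phone bigram counts are scored against per-language bigram models. The weighted hypothesis list standing in for a phone lattice, the add-one smoothing, the toy phone set, and all function names are assumptions made for illustration.

```python
import math
from collections import defaultdict


def expected_bigram_counts(hypotheses):
    """Posterior-weighted phone bigram counts.

    hypotheses: list of (phone_sequence, posterior) pairs, a simplified
    stand-in (assumption) for the paths of a phone lattice.
    """
    counts = defaultdict(float)
    for phones, posterior in hypotheses:
        for prev, cur in zip(phones, phones[1:]):
            counts[(prev, cur)] += posterior
    return dict(counts)


def train_bigram_model(sequences, phone_set):
    """Add-one smoothed bigram log-probabilities over a shared phone set."""
    pair_counts = defaultdict(float)
    prev_counts = defaultdict(float)
    for phones in sequences:
        for prev, cur in zip(phones, phones[1:]):
            pair_counts[(prev, cur)] += 1.0
            prev_counts[prev] += 1.0
    vocab = len(phone_set)
    return {
        (p, c): math.log(
            (pair_counts.get((p, c), 0.0) + 1.0)
            / (prev_counts.get(p, 0.0) + vocab)
        )
        for p in phone_set
        for c in phone_set
    }


def language_score(counts, model):
    """Average log-likelihood of the expected bigram counts under a model."""
    total = sum(counts.values())
    if total == 0:
        return float("-inf")
    return sum(w * model[bigram] for bigram, w in counts.items()) / total


def identify_language(hypotheses, models):
    """Return the language whose bigram model scores the hypotheses highest."""
    counts = expected_bigram_counts(hypotheses)
    return max(models, key=lambda lang: language_score(counts, models[lang]))


# Toy data (hypothetical): three phones, one training sequence per language.
PHONES = {"a", "b", "c"}
MODELS = {
    "zh": train_bigram_model([["a", "b", "a", "b", "a", "b"]], PHONES),
    "en": train_bigram_model([["a", "c", "a", "c", "a", "c"]], PHONES),
}
HYPOTHESES = [(["a", "b", "a", "b"], 0.7), (["a", "c", "a", "b"], 0.3)]
BEST = identify_language(HYPOTHESES, MODELS)
```

In this toy setup every phone in the hypotheses must belong to the shared phone set, since the models only hold bigrams over that set. A real system would also compare the best score against a threshold to flag out-of-set speech rather than always choosing a language.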


