A Novel Large Vocabulary Continuous Speech Recognition Algorithm Combined with Language Recognition

单煜翔; 邓妍; 刘加

doi: 10.3724/SP.J.1004.2012.00366 cstr: 32138.14.SP.J.1004.2012.00366

1. Tsinghua National Laboratory for Information Science and Technology, Department of Electronic Engineering, Tsinghua University, Beijing 100084

Corresponding author:
单煜翔, Ph.D. candidate, Department of Electronic Engineering, Tsinghua University. Research interests: speech recognition, keyword spotting, and speaker recognition. E-mail: syx06@mails.tsinghua.edu.cn

Publication history
- Received: 2011-06-03
- Revised: 2011-10-08
- Published: 2012-03-20

Abstract

Abstract: This paper proposes a novel large vocabulary continuous speech recognition (LVCSR) algorithm combined with language recognition, and describes a real-time processing system built on it. The algorithm makes full use of the phone hypotheses collected during speech decoding, so that it identifies the language while it recognizes the speech content. In a multilingual environment, the system can replace a standalone language recognition module at a lower overall computational cost. It also copes with utterances in which a non-target language is mixed with the target language: it greatly reduces the meaningless recognition errors introduced by non-target speech, and it keeps accumulated errors from misleading the subsequent decoding. To integrate content recognition and language recognition tightly into a single decoding procedure, three different algorithms are proposed that adjust (reconstruct) the phone lattice produced by the decoder. On the one hand, they remove the bias toward the target language that the pronunciation dictionary and language model of the LVCSR decoder introduce; on the other hand, they include richer phone hypotheses in the lattice. Experiments show that the lattice reconstruction algorithms effectively improve language recognition accuracy in the combined recognition. Evaluated on a Mandarin/English mixed conversational telephone speech corpus in which Mandarin is the target language, the proposed combined recognition algorithm reduced the meaningless recognition errors caused by out-of-language speech by 91.76% and achieved a Chinese character error rate of 54.98%.

Key words:
- speech recognition
- language recognition
- out-of-language problem
- phone lattice reconstruction
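
The abstract does not say which features the language recognizer extracts from the phone lattice. As a hedged illustration only, the sketch below computes expected phone unigram and bigram counts with a forward-backward pass, a feature that is common in lattice-based phonotactic language recognition. The function name `expected_phone_counts`, the edge tuple layout, and the toy lattice are all assumptions made for this example; they are not taken from the paper.

```python
from collections import defaultdict


def expected_phone_counts(edges, num_nodes):
    """Return expected phone unigram and bigram counts for one lattice.

    Hypothetical input format (not from the paper):
    edges is a list of (src, dst, phone, weight) tuples. Node ids are
    assumed to be topologically ordered (src < dst), node 0 is the start
    node and node num_nodes - 1 is the end node. weight is a non-negative
    linear-domain score for traversing the edge.
    """
    alpha = [0.0] * num_nodes  # total weight of partial paths start -> node
    beta = [0.0] * num_nodes   # total weight of partial paths node -> end
    alpha[0] = 1.0
    beta[num_nodes - 1] = 1.0

    # Forward pass: edges sorted by source so alpha[src] is final when used.
    for src, dst, _, w in sorted(edges, key=lambda e: e[0]):
        alpha[dst] += alpha[src] * w
    # Backward pass: edges sorted by destination, highest first.
    for src, dst, _, w in sorted(edges, key=lambda e: e[1], reverse=True):
        beta[src] += w * beta[dst]

    total = alpha[num_nodes - 1]
    if total <= 0.0:
        raise ValueError("lattice has no complete path")

    outgoing = defaultdict(list)
    for edge in edges:
        outgoing[edge[0]].append(edge)

    unigrams = defaultdict(float)
    bigrams = defaultdict(float)
    for src, dst, phone, w in edges:
        # Posterior probability of this edge = expected count of its phone.
        unigrams[phone] += alpha[src] * w * beta[dst] / total
        # Posterior of each pair of consecutive edges gives bigram counts.
        for _, dst2, phone2, w2 in outgoing[dst]:
            bigrams[(phone, phone2)] += alpha[src] * w * w2 * beta[dst2] / total
    return dict(unigrams), dict(bigrams)


# Toy lattice: two competing first phones followed by one shared phone.
toy_lattice = [(0, 1, "a", 0.6), (0, 1, "b", 0.4), (1, 2, "c", 1.0)]
unigrams, bigrams = expected_phone_counts(toy_lattice, num_nodes=3)
```

Counts of this kind keep every competing phone hypothesis in play rather than only the single best path, which is one plausible reason a lattice with richer phone hypotheses could help language recognition. How the paper's three reconstruction algorithms actually change the lattice is not described in the abstract.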
