A Variable Weighting Based Training Data Selection Method for Discriminative Training of Acoustic Models

Chen Bin; Niu Tong; Zhang Lianhai; Li Bicheng; Qu Dan

doi:10.3724/SP.J.1004.2014.02899

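1. Institute of Information System Engineering, PLA Information Engineering University, Zhengzhou 450002

Funds: Supported by the National Natural Science Foundation of China (61175017)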

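Author biography: Niu Tong, Ph.D. candidate at the Institute of Information System Engineering, PLA Information Engineering University. Research interests: speech enhancement and speech recognition. E-mail: niutong0072@gmail.com

Corresponding author: Chen Bin, Ph.D. candidate at the Institute of Information System Engineering, PLA Information Engineering University. Research interests: continuous speech recognition and discriminative training. E-mail: chenbin873335@163.com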

Publication history
- Received: 2013-12-30
- Revised: 2014-03-31
- Published: 2014-12-20

Abstract

A training data selection method based on variable weighting is proposed and applied to the discriminative training of acoustic models for continuous speech recognition. The method selects data by combining posterior probability with phone accuracy. First, the word lattice is pruned with a posterior-based beam algorithm, and each hypothesis word is then assigned a posterior-based weight that varies with the error rate of the hypothesis path containing it. Second, the degree of confusion between phone pairs is estimated, and easily confused phone pairs receive different penalty weights when the phone accuracy is computed. Finally, the distribution of the expected accuracies of the arcs is estimated, and the expected phone accuracies of all competing arcs are soft-weighted with a Gaussian function. Experimental results show that, compared with the minimum phone error criterion, the variable weighting method improves recognition accuracy by 0.61% and also reduces the required training time.

Keywords:
- discriminative training
- speech recognition
- training data selection
- variable weighting
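
The abstract gives only an outline of the pruning and soft-weighting steps, so the following Python sketch is an illustration under stated assumptions rather than the authors' implementation. The `Arc` structure, the ratio-based `beam` threshold, and the choice to centre the Gaussian on the posterior-weighted mean of the expected accuracies are all hypothetical; the paper may define these quantities differently.

```python
import math
from dataclasses import dataclass


@dataclass
class Arc:
    """A lattice arc. All field names are illustrative assumptions."""
    label: str
    posterior: float          # arc posterior probability in [0, 1]
    expected_accuracy: float  # expected phone accuracy of paths through the arc


def beam_prune(arcs, beam=1e-3):
    """Keep arcs whose posterior is at least `beam` times the best posterior.

    Assumption: the beam is a ratio relative to the best-scoring arc.
    """
    best = max(a.posterior for a in arcs)
    return [a for a in arcs if a.posterior >= beam * best]


def gaussian_soft_weights(arcs, sigma_floor=1e-6):
    """Return one Gaussian weight per arc.

    Assumption: the Gaussian is centred on the posterior-weighted mean of the
    expected accuracies, with the posterior-weighted standard deviation as
    its width, so arcs far from the centre are down-weighted smoothly
    instead of being discarded by a hard threshold.
    """
    total = sum(a.posterior for a in arcs)
    mean = sum(a.posterior * a.expected_accuracy for a in arcs) / total
    var = sum(a.posterior * (a.expected_accuracy - mean) ** 2 for a in arcs) / total
    sigma = max(math.sqrt(var), sigma_floor)  # floor avoids division by zero
    return [
        math.exp(-0.5 * ((a.expected_accuracy - mean) / sigma) ** 2)
        for a in arcs
    ]


if __name__ == "__main__":
    lattice = [
        Arc("a", 0.6, 0.9),
        Arc("b", 0.3, 0.5),
        Arc("c", 0.0001, 0.1),  # very unlikely arc, removed by the beam
    ]
    kept = beam_prune(lattice)
    for arc, weight in zip(kept, gaussian_soft_weights(kept)):
        print(arc.label, round(weight, 4))
```

Soft weighting keeps every surviving arc in the statistics, which avoids the abrupt cut-off of hard selection; the width of the Gaussian controls how quickly arcs away from the centre lose influence.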
