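A Variable Weighting Based Training Data Selection Method for Discriminative Training of Acoustic Models

CHEN Bin, NIU Tong, ZHANG Lian-Hai, LI Bi-Cheng, QU Dan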

Citation: CHEN Bin, NIU Tong, ZHANG Lian-Hai, LI Bi-Cheng, QU Dan. A Variable Weighting Based Training Data Selection Method for Discriminative Training of Acoustic Models. ACTA AUTOMATICA SINICA, 2014, 40(12): 2899-2907. doi: 10.3724/SP.J.1004.2014.02899


doi: 10.3724/SP.J.1004.2014.02899
Funding: Supported by the National Natural Science Foundation of China (61175017)

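Author information
    About the authors:

    NIU Tong  Ph.D. candidate at the School of Information Systems Engineering, PLA Information Engineering University. Research interests: speech enhancement and speech recognition. E-mail: niutong0072@gmail.com

    Corresponding author:

    CHEN Bin  Ph.D. candidate at the School of Information Systems Engineering, PLA Information Engineering University. Research interests: continuous speech recognition and discriminative training. Corresponding author of this paper. E-mail: chenbin873335@163.com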

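  • Abstract: This paper proposes a training data selection method based on variable (dynamic) weighting and applies it to the discriminative training of acoustic models for continuous speech recognition. The method selects data by jointly using posterior probability and phone accuracy. First, word lattices are pruned with a posterior-probability-based beam algorithm; each surviving candidate word is then dynamically assigned a posterior-based weight according to the error rate of the candidate path on which it lies. Second, the degree of confusion between phone pairs is estimated from statistics, and easily confused phone pairs are dynamically given different penalty weights when phone accuracy is computed. Finally, once the distribution of the expected arc accuracy has been estimated, the expected phone accuracies of all competing arcs are soft-weighted with a Gaussian-shaped function. Experimental results show that, compared with the minimum phone error (MPE) criterion, the proposed variable weighting method improves recognition accuracy by 0.61% and effectively reduces training time.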
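The three weighting steps named in the abstract can be illustrated with a minimal Python sketch. The abstract gives no formulas, so everything below is an assumption made for illustration: the function names, the ratio-style beam threshold, the per-pair penalty lookup, and the choice to center the Gaussian on the mean expected accuracy are hypothetical and are not the authors' implementation.

```python
import math

# Hypothetical sketch only: names, formulas, and constants are assumptions,
# not taken from the paper.

def posterior_beam_prune(arc_posteriors, beam=0.01):
    """Keep arcs whose posterior is at least `beam` times the best posterior."""
    best = max(arc_posteriors.values())
    return {arc: p for arc, p in arc_posteriors.items() if p >= beam * best}

def phone_accuracy(ref_phone, hyp_phone, confusion_penalty):
    """Return 1.0 for a correct phone, otherwise minus a pair-specific penalty.

    `confusion_penalty` maps (ref, hyp) pairs to penalties estimated from
    confusion statistics; pairs not in the map default to a penalty of 1.0.
    """
    if ref_phone == hyp_phone:
        return 1.0
    return -confusion_penalty.get((ref_phone, hyp_phone), 1.0)

def gaussian_soft_weights(expected_acc):
    """Weight each arc with a Gaussian of its expected accuracy.

    Assumption: the Gaussian is centered on the mean of the expected
    accuracies and uses their standard deviation as its width.
    """
    values = list(expected_acc.values())
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(var) or 1.0  # avoid division by zero when all values match
    return {
        arc: math.exp(-((acc - mean) ** 2) / (2 * std ** 2))
        for arc, acc in expected_acc.items()
    }

# Toy usage with made-up numbers.
kept = posterior_beam_prune({"a": 0.6, "b": 0.3, "c": 0.001})
weights = gaussian_soft_weights({"x": 0.0, "y": 0.5, "z": 1.0})
```

In this toy run, arc `c` falls below the beam threshold and is dropped, and arc `y`, whose expected accuracy sits at the center of the distribution, receives the largest soft weight. A hard selection rule would instead discard arcs outside a fixed accuracy range; the soft weighting keeps every competing arc but scales its contribution.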
Publication history
  • Received:  2013-12-30
  • Revised:  2014-03-31
  • Published:  2014-12-20
