

Feature Space Eigenvoice Speaker Adaptation

QU Dan, YANG Xu-Kui, ZHANG Wen-Lin

Citation: QU Dan, YANG Xu-Kui, ZHANG Wen-Lin. Feature Space Eigenvoice Speaker Adaptation. ACTA AUTOMATICA SINICA, 2015, 41(7): 1244-1252. doi: 10.16383/j.aas.2015.c140644


doi: 10.16383/j.aas.2015.c140644
Funds:

Supported by the National Natural Science Foundation of China (61175017, 61403415, 61302107)

Details
    Biography:

    YANG Xu-Kui  Ph.D. candidate at the School of Information System Engineering, PLA Information Engineering University. His research interest covers speech signal processing and speech recognition. E-mail: gzyangxk@163.com


  • Abstract: A feature space eigenvoice speaker adaptation algorithm is proposed. Borrowing the idea of the RATZ algorithm, the method first uses a Gaussian mixture model (GMM) to model the speaker information in the feature space. It then applies a subspace method to estimate the feature compensation terms, which reduces the number of parameters to be estimated, so that the feature space is modeled accurately while the amount of adaptation data required is lowered. Chinese continuous speech recognition experiments on the Microsoft corpus show that the algorithm still performs well when very little adaptation data is available; combined with speaker adaptive training it further reduces the word error rate, and it runs faster than the eigenvoice speaker adaptation algorithm.
Publication history
  • Received:  2014-09-12
  • Revised:  2015-01-24
  • Published:  2015-07-20
