Convolutional Neural Network for Robust Pitch Determination

ZHANG Hui, SU Hong, ZHANG Xue-Liang, GAO Guang-Lai

Citation: ZHANG Hui, SU Hong, ZHANG Xue-Liang, GAO Guang-Lai. Convolutional Neural Network for Robust Pitch Determination. ACTA AUTOMATICA SINICA, 2016, 42(6): 959-964. doi: 10.16383/j.aas.2016.c150672

doi: 10.16383/j.aas.2016.c150672

Funds: National Natural Science Foundation of China 61365006, 61263037

More Information
    Author Bio:

    ZHANG Hui Ph.D. candidate at Inner Mongolia University. He received his B.S. and M.S. degrees from Inner Mongolia University in 2011 and 2014, respectively. His research interest covers audio signal processing, speech separation, and machine learning. E-mail: alzhu.san@163.com

    SU Hong Master student at Inner Mongolia University. She received her B.S. degree from Inner Mongolia Normal University in 2013. Her research interest covers audio signal processing and machine learning. E-mail: sh123imu@163.com

    GAO Guang-Lai Professor in the Department of Computer Science, Inner Mongolia University. He received his B.S. degree from Inner Mongolia University in 1985 and his M.S. degree from the National University of Defense Technology in 1988. His research interest covers artificial intelligence and pattern recognition. E-mail: csggl@imu.edu.cn

    Corresponding author: ZHANG Xue-Liang Associate professor in the Department of Computer Science, Inner Mongolia University. He received his B.S. degree from Inner Mongolia University in 2003, his M.S. degree from Harbin Institute of Technology in 2005, and his Ph.D. degree from the Institute of Automation, Chinese Academy of Sciences in 2010. His research interest covers speech separation, computational auditory scene analysis, and speech signal processing. Corresponding author of this paper. E-mail: cszxl@imu.edu.cn
  • Abstract: Pitch is an important parameter of speech signals and has many uses. However, detecting the pitch of speech in noisy environments is a difficult task. Since convolutional neural networks (CNNs) are translation-invariant and can model the harmonic structure in a spectrogram well, we propose using a CNN for this task. Specifically, we use a CNN to select pitch candidates, and then apply dynamic programming (DP) for pitch tracking to generate continuous pitch contours. Experiments show that, compared with other methods, the proposed method has a clear performance advantage, generalizes well to new speakers and noises, and is thus more robust.
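The two-stage design in the abstract (a CNN scores per-frame pitch states, then DP links them into a smooth contour) can be illustrated with a Viterbi-style tracker. This is a minimal sketch, not the paper's implementation: the state layout, `freqs`, and `jump_penalty` below are hypothetical placeholders for the trained CNN's outputs and a tuned transition model.

```python
import numpy as np

def dp_pitch_track(posteriors, freqs, jump_penalty=2.0):
    """Viterbi-style dynamic programming over per-frame pitch-state
    posteriors of shape (T, K). Column 0 is 'unvoiced'; columns
    1..K-1 correspond to the candidate frequencies in `freqs`
    (length K-1). Returns the best state index for each frame."""
    T, K = posteriors.shape
    log_post = np.log(posteriors + 1e-12)

    # Transition cost: large jumps in log-frequency between voiced
    # states are penalized; toggling voicing costs a flat penalty.
    logf = np.log(freqs)
    trans = np.full((K, K), jump_penalty)
    trans[1:, 1:] = jump_penalty * np.abs(logf[:, None] - logf[None, :])
    trans[0, 0] = 0.0  # staying unvoiced is free

    score = log_post.copy()
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        cand = score[t - 1][:, None] - trans   # (prev state, cur state)
        back[t] = np.argmax(cand, axis=0)
        score[t] = cand[back[t], np.arange(K)] + log_post[t]

    path = np.empty(T, dtype=int)              # backtrace the best path
    path[-1] = np.argmax(score[-1])
    for t in range(T - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path

# Illustrative use: 1 unvoiced state plus 67 log-spaced candidates
# (placeholder values), with random numbers standing in for CNN output.
freqs = np.geomspace(60.0, 400.0, 67)
posteriors = np.random.dirichlet(np.ones(68), size=200)
contour = dp_pitch_track(posteriors, freqs)
```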
  • Fig. 1 Harmonic structure in a spectrogram (the local patterns in the small boxes repeat; see the two black boxes, and the sketch after this figure list)

    Fig. 2 The proposed pitch determination algorithm

    Fig. 3 Structure of the proposed CNN

    Fig. 4 Example output of the proposed pitch determination method (the example mixture is a male utterance mixed with machine noise at 0 dB)

    Fig. 5 Performance comparisons
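Fig. 1's premise (the harmonics of a voiced sound form repeated local patterns in the time-frequency image, which a translation-invariant CNN can exploit) is easy to reproduce. A small sketch under assumed framing parameters (512-sample Hann windows, 160-sample hop at 16 kHz; none of these values are taken from the paper):

```python
import numpy as np

def log_spectrogram(signal, frame_len=512, hop=160):
    """Log-magnitude STFT: the time-frequency image in which the
    harmonic structure of Fig. 1 shows up as repeated patterns."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.log(np.abs(np.fft.rfft(frames, axis=1)) + 1e-8)

# A synthetic voiced sound with f0 = 150 Hz: energy appears at
# f0, 2*f0, 3*f0, ..., i.e. the striping visible in Fig. 1.
sr, f0 = 16000, 150
t = np.arange(sr) / sr
voiced = sum(np.sin(2 * np.pi * k * f0 * t) / k for k in range(1, 6))
S = log_spectrogram(voiced)
print(S.shape)  # (frames, frequency bins), ready for a CNN to scan
```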

    Table 1 Parameter settings of our method

                                       DR                              VDE
    SNR (dB)                  -5      0       5       10      -5      0       5       10
    Speaker-dependent test set, seen noise
        CNN                   0.5342  0.7179  0.8049  0.8292  0.2640  0.1753  0.1140  0.0994
        DNN                   0.4747  0.6659  0.7664  0.7994  0.2713  0.1746  0.1083  0.0951
        PEFAC                 0.4248  0.6131  0.7478  0.8187  0.3127  0.2443  0.1862  0.1413
        Jin                   0.2622  0.4316  0.5350  0.6042  0.3751  0.3021  0.2565  0.2244
    Speaker-dependent test set, new noise
        CNN                   0.4211  0.6278  0.7671  0.8224  0.3166  0.2287  0.1524  0.1133
        DNN                   0.3720  0.5888  0.7369  0.7934  0.3216  0.2216  0.1499  0.1154
        PEFAC                 0.3224  0.5291  0.7011  0.7988  0.3844  0.3125  0.2401  0.1815
        Jin                   0.2998  0.4403  0.5420  0.6070  0.3954  0.3324  0.2838  0.2484
    Speaker-independent test set, seen noise
        CNN                   0.4495  0.6177  0.7228  0.7699  0.3334  0.2156  0.1445  0.1242
        DNN                   0.3624  0.5449  0.6635  0.7177  0.3685  0.2478  0.1827  0.1590
        PEFAC                 0.3611  0.5302  0.6622  0.7421  0.3172  0.2546  0.2030  0.1624
        Jin                   0.2552  0.4524  0.5731  0.6538  0.3807  0.3074  0.2616  0.2293
    Speaker-independent test set, new noise
        CNN                   0.3097  0.4899  0.6306  0.6961  0.3724  0.2840  0.1875  0.1302
        DNN                   0.2714  0.4427  0.5762  0.6489  0.3689  0.2769  0.2026  0.1633
        PEFAC                 0.2999  0.4619  0.5902  0.6701  0.3631  0.2953  0.2348  0.1857
        Jin                   0.2680  0.4045  0.5362  0.6030  0.3981  0.3339  0.2845  0.2482
    ACC test set, new noise
        CNN                   0.3268  0.4739  0.5938  0.6519  0.3931  0.3160  0.2222  0.1600
        DNN                   0.2685  0.4053  0.5000  0.5425  0.4096  0.3516  0.2896  0.2519
        PEFAC                 0.2751  0.4201  0.5342  0.6051  0.3893  0.3190  0.2583  0.2102
        Jin                   0.2207  0.3624  0.4592  0.4642  0.4647  0.4002  0.3465  0.2822
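For reading the table: DR (detection rate) and VDE (voicing decision error) are the two scores reported at each SNR. Under the usual conventions for these metrics (as in Han and Wang [10]), DR is the fraction of truly voiced frames whose estimate lies within a small relative deviation of the reference pitch, and VDE is the fraction of all frames given the wrong voiced/unvoiced label. A sketch under those assumptions, with 0 marking unvoiced frames and an assumed 5% tolerance:

```python
import numpy as np

def dr_vde(ref_f0, est_f0, tol=0.05):
    """Detection rate and voicing decision error for two equal-length
    per-frame pitch tracks in Hz, where 0 marks an unvoiced frame.
    `tol` is an assumed 5% relative-deviation tolerance."""
    ref_f0 = np.asarray(ref_f0, dtype=float)
    est_f0 = np.asarray(est_f0, dtype=float)
    ref_v, est_v = ref_f0 > 0, est_f0 > 0

    # DR: among truly voiced frames, the share whose estimate is
    # voiced and within `tol` of the reference pitch.
    hit = est_v[ref_v] & (np.abs(est_f0[ref_v] - ref_f0[ref_v])
                          <= tol * ref_f0[ref_v])
    dr = float(hit.mean()) if ref_v.any() else 0.0

    # VDE: share of all frames with a wrong voiced/unvoiced decision.
    vde = float(np.mean(ref_v != est_v))
    return dr, vde

# Example: a perfect voicing decision with one 4% pitch deviation.
print(dr_vde([100, 110, 0, 120], [100, 114.4, 0, 120]))  # (1.0, 0.0)
```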
  • [1] Han K, Wang D L. A classification based approach to speech segregation. The Journal of the Acoustical Society of America, 2012, 132(5): 3475-3483
    [2] Zhao X J, Shao Y, Wang D L. CASA-based robust speaker identification. IEEE Transactions on Audio, Speech, and Language Processing, 2012, 20(5): 1608-1616
    [3] Huang F, Lee T. Pitch estimation in noisy speech using accumulated peak spectrum and sparse estimation technique. IEEE Transactions on Audio, Speech, and Language Processing, 2013, 21(1): 99-109
    [4] Rabiner L. On the use of autocorrelation analysis for pitch detection. IEEE Transactions on Acoustics, Speech, and Signal Processing, 1977, 25(1): 24-33
    [5] Wu M Y, Wang D L, Brown G J. A multipitch tracking algorithm for noisy speech. IEEE Transactions on Speech and Audio Processing, 2003, 11(3): 229-241
    [6] Gonzalez S, Brookes M. PEFAC - a pitch estimation algorithm robust to high levels of noise. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2014, 22(2): 518-530
    [7] Zhang H, Zhang X, Nie S, Gao G, Liu W. A pairwise algorithm for pitch estimation and speech separation using deep stacking network. In: Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). South Brisbane, QLD: IEEE, 2015. 246-250
    [8] Ciresan D, Meier U, Schmidhuber J. Multi-column deep neural networks for image classification. In: Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition. Providence, RI: IEEE, 2012. 3642-3649
    [9] Hinton G, Deng L, Yu D, Dahl G E, Mohamed A, Jaitly N, Senior A, Vanhoucke V, Nguyen P, Sainath T N, Kingsbury B. Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups. IEEE Signal Processing Magazine, 2012, 29(6): 82-97
    [10] Han K, Wang D L. Neural network based pitch tracking in very noisy speech. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2014, 22(12): 2158-2168
    [11] Kasi K, Zahorian S A. Yet another algorithm for pitch tracking. In: Proceedings of the 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Orlando, FL, USA: IEEE, 2002. I-361-I-364
    [12] Hu G N. 100 nonspeech sounds [Online], available: http://www.cse.ohio-state.edu/pnl/corpus/HuCorpus.html, April 1, 2006
    [13] Giannoulis D, Benetos E, Stowell D, Rossignol M, Lagrange M, Plumbley M D. Detection and classification of acoustic scenes and events: an IEEE AASP challenge. In: Proceedings of the 2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA). New Paltz, NY: IEEE, 2013. 1-4
    [14] Boersma P, Weenink D J M. PRAAT, a system for doing phonetics by computer. Glot International, 2001, 5(9-10): 341-345
    [15] Tieleman T, Hinton G. Lecture 6.5 - RMSprop. COURSERA: Neural Networks for Machine Learning, 2012
    [16] Jin Z Z, Wang D L. HMM-based multipitch tracking for noisy and reverberant speech. IEEE Transactions on Audio, Speech, and Language Processing, 2011, 19(5): 1091-1102
Publication history
  • Received: 2015-10-29
  • Accepted: 2016-04-01
  • Published: 2016-06-20
