A Single-channel Speech Enhancement Approach Based on Perceptual Masking Deep Neural Network
-
摘要: 本文将心理声学掩蔽特性应用于基于深度神经网络(Deep neural network,DNN)的单通道语音增强任务中,提出了一种具有感知掩蔽特性的DNN结构.首先,提出的DNN对带噪语音幅度谱特征进行训练并分别得到纯净语音和噪声的幅度谱估计.其次,利用估计的纯净语音幅度谱计算噪声掩蔽阈值.然后,将噪声掩蔽阈值和估计的噪声幅度谱联合计算得到一个感知增益函数.最后,利用感知增益函数从带噪语音幅度谱中估计出增强语音幅度谱.在TIMIT数据库上,对不同信噪比下的20种噪声进行的仿真实验表明,无论噪声类型是否在语音的训练集中出现,所提出的感知掩蔽DNN都能够在有效去除噪声的同时保持较小的语音失真,增强效果明显优于常见的DNN增强方法以及NMF(Nonnegative matrix factorization)增强方法.Abstract: A new deep neural network (DNN) is proposed for single-channel speech enhancement, which incorporates the perceptual masking properties of psychoacoustic models. Firstly, the proposed DNN is trained to learn both the clean speech magnitude spectrum and the noise magnitude spectrum from the noisy magnitude spectrum. Secondly, the estimated clean speech magnitude spectrum is used to calculate the noise masking threshold. Then, the noise masking threshold and the estimated noise magnitude spectrum are combined to calculate a perceptual gain function. Finally, the enhanced speech magnitude spectrum are obtained by jointly training the perceptual gain function and the noisy speech magnitude spectrum. Experimental results on TIMIT with 20 noise types at various SNR (signal-noise ratio) levels demonstrate that the proposed perceptual masking DNN can effectively remove the noise while maintaining small speech distortion, so as to obtain better performance than the common DNN methods and the NMF (nonnegative matrix factorization) method, no matter noise conditions are included in the training set or not.
-
Key words:
- Speech enhancement /
- deep neural network /
- perceptual gain function /
- masking threshold
-
-
表 1 4种信噪比下, 不同方法对20种噪声的PESQ均值
Table 1 The PESQ scores of different methods at four different input SNR levels (For each condition, the numbers are the mean values over all the 20 noise types.)
SNR (dB) NMF DNN IRM-DNN PM-DNN (First output) PM-DNN NMF (Mask) DNN (Mask) IRM-DNN (Mask) PM-DNN (Mask) -5 1.705 1.74 1.787 1.732 1.875 1.701 1.775 1.74 1.834 0 2.002 1.995 2.061 1.996 2.165 1.995 2.034 2.015 2.122 5 2.261 2.194 2.35 2.256 2.445 2.262 2.284 2.308 2.411 10 2.524 2.35 2.631 2.518 2.714 2.52 2.535 2.596 2.691 -
[1] Boll S F. Suppression of acoustic noise in speech using spectral subtraction. IEEE Transactions on Acoustics, Speech, and Signal Processing, 1979, 27(2):113-120 doi: 10.1109/TASSP.1979.1163209 [2] Chen J D, Benesty J, Huang Y T, Doclo S. New insights into the noise reduction Wiener filter. IEEE Transactions on Audio, Speech and Language Processing, 2006, 14(4):1218-1234 doi: 10.1109/TSA.2005.860851 [3] Ephraim Y, Malah D. Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator. IEEE Transactions on Acoustics, Speech, and Signal Processing, 1984, 32(6):1109-1121 doi: 10.1109/TASSP.1984.1164453 [4] Gerkmann T, Hendriks R C. Unbiased MMSE-based noise power estimation with low complexity and low tracking delay. IEEE Transactions on Audio, Speech, and Language Processing, 2012, 20(4):1383-1393 doi: 10.1109/TASL.2011.2180896 [5] Jensen J R, Benesty J, Christensen M G, Jensen S H. Enhancement of single-channel periodic signals in the time-domain. IEEE Transactions on Audio, Speech, and Language Processing, 2012, 20(7):1948-1963 doi: 10.1109/TASL.2012.2191957 [6] Wilson K W, Raj B, Smaragdis P, Divakaran A. Speech denoising using nonnegative matrix factorization with priors. In:Proceedings of the 2008 IEEE International Conference on Acoustics, Speech and Signal Processing. Las Vegas, USA:IEEE, 2008. 4029-4032 [7] Sun C L, Zhu Q, Wan M H. A novel speech enhancement method based on constrained low-rank and sparse matrix decomposition. Speech Communication, 2014, 60:44-55 doi: 10.1016/j.specom.2014.03.002 [8] Sun M, Li Y N, Gemmeke J, Zhang X W. Speech enhancement under low SNR conditions via noise estimation using sparse and low-rank NMF with Kullback-Leibler divergence. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2015, 23(7):1233-1242 doi: 10.1109/TASLP.2015.2427520 [9] Xu Y, Du J, Dai L R, Lee C H. A regression approach to speech enhancement based on deep neural networks. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2015, 23(1):7-19 doi: 10.1109/TASLP.2014.2364452 [10] Huang P S, Kim M, Hasegawa-Johnson M, Smaragdis P. Joint optimization of masks and deep recurrent neural networks for monaural source separation. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2015, 23(12):2136-2147 doi: 10.1109/TASLP.2015.2468583 [11] Wang Y X, Narayanan A, Wang D L. On training targets for supervised speech separation. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2014, 22(12):1849-1858 doi: 10.1109/TASLP.2014.2352935 [12] Sun M, Zhang X W, Van hamme H, Zheng T F. Unseen noise estimation using separable deep auto encoder for speech enhancement. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2016, 24(1):93-104 doi: 10.1109/TASLP.2015.2498101 [13] Williamson D S, Wang Y X, Wang D L. Complex ratio masking for monaural speech separation. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2016, 24(3):483-492 doi: 10.1109/TASLP.2015.2512042 [14] Narayanan A, Wang D L. Improving robustness of deep neural network acoustic models via speech separation and joint adaptive training. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2015, 23(1):92-101 https://www.researchgate.net/publication/273296153_Improving_Robustness_of_Deep_Neural_Network_Acoustic_Models_via_Speech_Separation_and_Joint_Adaptive_Training [15] Hinton G E, Osindero S, Teh Y W. A fast learning algorithm for deep belief nets. Neural Computation, 2006, 18(7):1527-1554 doi: 10.1162/neco.2006.18.7.1527 [16] Bengio Y. Learning deep architectures for AI. Foundations and Trends® in Machine Learning, 2009, 2(1):1-127 doi: 10.1561/2200000006 [17] Glorot X, Bordes A, Bengio Y. Deep sparse rectifier neural networks. In:Proceedings of the 14th International Conference on Artificial Intelligence and Statistics. Fort Lauderdale, USA:JMLR, 2011. 315-323 [18] 张勇, 刘轶, 刘宏.结合人耳听觉感知的两级语音增强算法.信号处理, 2014, 30(4):363-373 http://www.cnki.com.cn/Article/CJFDTOTAL-XXCN201404001.htmZhang Yong, Liu Yi, Liu Hong. A two-stage speech enhancement algorithm combined with human auditory perception. Journal of Signal Processing, 2014, 30(4):363-373 http://www.cnki.com.cn/Article/CJFDTOTAL-XXCN201404001.htm [19] Johnston J D. Transform coding of audio signals using perceptual noise criteria. IEEE Journal on Selected Areas in Communications, 1988, 6(2):314-323 doi: 10.1109/49.608 [20] Udrea R M, Vizireanu N D, Ciochina S. An improved spectral subtraction method for speech enhancement using a perceptual weighting filter. Digital Signal Processing, 2008, 18(4):581-587 doi: 10.1016/j.dsp.2007.08.002 [21] Hu Y, Loizou P C. Incorporating a psychoacoustical model in frequency domain speech enhancement. IEEE Signal Processing Letters, 2004, 11(2):270-273 doi: 10.1109/LSP.2003.821714 [22] Rix A W, Beerends J G, Hollier M P, Hekstra A P. Perceptual evaluation of speech quality (PESQ)-a new method for speech quality assessment of telephone networks and codecs. In:Proceedings of the 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Salt Lake City, USA:IEEE, 2001. 749-752 [23] 邹霞, 陈亮, 张雄伟.基于Gamma语音模型的语音增强算法.通信学报, 2006, 27(10):118-123 http://www.cnki.com.cn/Article/CJFDTOTAL-TXXB200610019.htmZou Xia, Chen Liang, Zhang Xiong-Wei. Speech enhancement with Gamma speech modeling. Journal on Communications, 2006, 27(10):118-123 http://www.cnki.com.cn/Article/CJFDTOTAL-TXXB200610019.htm [24] Hu Y, Loizou P C. Evaluation of objective quality measures for speech enhancement. IEEE Transactions on Audio, Speech, and Language Processing, 2008, 16(1):229-238 doi: 10.1109/TASL.2007.911054 [25] Huang P S, Kim M, Hasegawa-Johnson M, Smaragdis P. Deep learning for monaural speech separation. In:Proceedings of the 2014 IEEE International Conference on Acoustics, Speech and Signal Processing. Florence, Italy:IEEE, 2014. 1562-1566 期刊类型引用(29)
1. 李倩,聂简,黄鸿殿,孔庆宇,奔粤阳. 基于大脑海马认知机理的主从式AUV协同定位方法. 中国惯性技术学报. 2024(01): 27-33 . 百度学术
2. 游雄,李科,田江鹏,杨剑,余岸竹,贾奋励. 机器地图信息加工模型. 武汉大学学报(信息科学版). 2024(04): 516-526 . 百度学术
3. 高昊,王仁茂. 基于类脑仿生的环境感知技术. 舰船电子对抗. 2024(05): 42-46+55 . 百度学术
4. 陈荟慧,钟委钊. 基于人机协作的高质量城市图像采集方法. 应用科学学报. 2023(05): 801-814 . 百度学术
5. 朱祥维,沈丹,肖凯,马岳鑫,廖祥,古富强,余芳文,高柯夫,刘经南. 类脑导航的机理、算法、实现与展望. 航空学报. 2023(19): 6-38 . 百度学术
6. 于乃功,廖诣深. 基于鼠脑内嗅—海马认知机制的移动机器人空间定位模型. 生物医学工程学杂志. 2022(02): 217-227 . 百度学术
7. 刘溢,阳加远,张驰. 一种基于RTX的移动机器人实时控制平台. 电子技术与软件工程. 2022(08): 169-172 . 百度学术
8. 于子航,王改云. 基于路径积分强化的机器人目标导向运动控制. 计算机仿真. 2022(07): 412-415+516 . 百度学术
9. 董卫华,刘毅龙,黑巧松,杨天宇. 泛地图空间认知理论与方法研究框架. 武汉大学学报(信息科学版). 2022(12): 2007-2014 . 百度学术
10. 阮晓钢,李鹏,朱晓庆,刘鹏飞. 基于目标导向行为和空间拓扑记忆的视觉导航方法. 计算机学报. 2021(03): 594-608 . 百度学术
11. 赵辰豪,吴德伟,韩昆,代传金. 无环境信息下多尺度网格细胞群空间表征模型. 系统工程与电子技术. 2021(03): 814-822 . 百度学术
12. 阮晓钢,柴洁,武悦,张晓平,黄静. 基于海马体位置细胞的认知地图构建与导航. 自动化学报. 2021(03): 666-677 . 本站查看
13. 冀俊忠,刘金铎,邹爱笑,杨翠翠. 一种融合多源信息的脑效应连接网络蚁群学习算法. 自动化学报. 2021(04): 864-881 . 本站查看
14. 万刚,武易天. 地图空间认知的数学基础. 测绘学报. 2021(06): 726-738 . 百度学术
15. 洪涛,史涛,任红格. 一种改进型RatSLAM算法构建认知地图的研究. 现代计算机. 2021(21): 47-52 . 百度学术
16. 韩昆,吴德伟,来磊. 类脑导航中基于差分Hebbian学习的网格细胞构建模型. 系统工程与电子技术. 2020(03): 674-679 . 百度学术
17. 黄宜庆,王正刚,王徽,葛愿. 基于边缘梯度算法的多移动机器人协作地图构建. 信息与控制. 2020(01): 62-68 . 百度学术
18. 于乃功,廖诣深,郑相国. 一种基于海马位置细胞选择机制的空间认知模型. 生物医学工程学杂志. 2020(01): 27-37 . 百度学术
19. 胡小平,毛军,范晨,张礼廉,何晓峰,韩国良,范颖. 仿生导航技术综述. 导航定位与授时. 2020(04): 1-10 . 百度学术
20. 于乃功,冯慧,廖诣深,郑相国. 一种基于感知速度与感知角度的网格野计算模型. 生物医学工程学杂志. 2020(05): 863-874 . 百度学术
21. 晁丽君,熊智,杨闯,华冰,王雅婷,刘建业. 无人飞行器三维类脑SLAM自主导航方法. 飞控与探测. 2020(05): 35-43 . 百度学术
22. 张孝伍. 图上的概率分布及位置方向信息的表征方法. 青岛理工大学学报. 2019(01): 113-121 . 百度学术
23. 方略,何洪军. 基于鼠脑海马位置细胞与Q学习面向目标导航. 生物信息学. 2019(01): 31-38 . 百度学术
24. 王均,凌有铸,王静. 基于特征融合的仿生SLAM算法研究. 安徽工程大学学报. 2019(02): 26-33 . 百度学术
25. 刘建业,杨闯,熊智,赖际舟,熊骏. 无人机类脑吸引子神经网络导航技术. 导航定位与授时. 2019(05): 52-60 . 百度学术
26. 韩昆,吴德伟,来磊,杨林. 自主导航条件下网格细胞放电模型. 电子科技大学学报. 2019(05): 711-716 . 百度学术
27. 丛明,邹强,刘冬,杜宇. 定位细胞认知机理启发的机器人导航研究综述. 机械工程学报. 2019(23): 1-12 . 百度学术
28. 邹强,丛明,刘冬,杜宇. 仿鼠脑海马的机器人地图构建与路径规划方法. 华中科技大学学报(自然科学版). 2018(12): 83-88 . 百度学术
29. 吴德伟,何晶,韩昆,李卉. 无人作战平台认知导航及其类脑实现思想. 空军工程大学学报(自然科学版). 2018(06): 33-38 . 百度学术
其他类型引用(29)
-