A Single-channel Speech Enhancement Approach Based on Perceptual Masking Deep Neural Network

HAN Wei^1, ZHANG Xiong-Wei^1, MIN Gang^{1,2}, ZHANG Qi-Ye^3

doi: 10.16383/j.aas.2017.c150719

1. PLA University of Science and Technology, Nanjing 210007
2. Xi'an Communications Institute, Xi'an 710106
3. Unit 96637 of PLA, Beijing 102101

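Funds:

National Natural Science Foundation of China 61471394

Natural Science Foundation of Jiangsu Province BK20140074

National Natural Science Foundation of China 61402519

Natural Science Foundation of Jiangsu Province BK20140071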

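More Information

Author Bio:
HAN Wei Ph.D. candidate at the College of Command Information System, PLA University of Science and Technology. He received his master's degree from PLA University of Science and Technology in 2013. His research interests cover speech signal processing, deep learning, and speech separation. E-mail: lan3533065@163.com

MIN Gang Ph.D. candidate at the College of Command Information System, PLA University of Science and Technology, and lecturer at Xi'an Communications Institute. He received his master's degree from PLA University of Science and Technology in 2008. His research interests cover speech signal processing theory and techniques, speech coding, and speech enhancement. E-mail: mgxaty@gmail.com

ZHANG Qi-Ye Assistant engineer at Unit 96637 of PLA. He received his master's degree from PLA University of Science and Technology in 2013. His research interests cover optical communication theory and techniques. E-mail: wangwangzhang555@163.com

Corresponding author:
ZHANG Xiong-Wei Professor at the College of Command Information System, PLA University of Science and Technology. He received his Ph.D. degree from Nanjing Institute of Communication Engineering in 1992. His research interests cover intelligent information processing, speech and image signal processing, and digital communication. Corresponding author of this paper. E-mail: xwzhang9898@163.com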

Publication history
- Received: 2015-10-31
- Accepted: 2016-06-06
- Published: 2017-02-01


Abstract

Abstract: This paper applies psychoacoustic masking properties to single-channel speech enhancement based on deep neural networks (DNN) and proposes a DNN architecture with perceptual masking. First, the proposed DNN is trained on noisy speech magnitude spectra to estimate both the clean speech magnitude spectrum and the noise magnitude spectrum. Second, the estimated clean speech magnitude spectrum is used to calculate the noise masking threshold. Then, the noise masking threshold and the estimated noise magnitude spectrum are combined to calculate a perceptual gain function. Finally, the perceptual gain function is used to estimate the enhanced speech magnitude spectrum from the noisy speech magnitude spectrum. Simulation experiments on the TIMIT database with 20 noise types at various signal-to-noise ratio (SNR) levels show that, whether or not the noise type appears in the training set, the proposed perceptual masking DNN removes noise effectively while keeping speech distortion small, and clearly outperforms common DNN enhancement methods and the nonnegative matrix factorization (NMF) enhancement method.
- Speech enhancement
- deep neural network
- perceptual gain function
- masking threshold
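
The last three steps of the abstract (masking threshold, perceptual gain, enhanced spectrum) can be sketched for a single frame as follows. This is only an illustration under stated assumptions, not the authors' implementation: the page does not give the threshold model or the gain formula, so `toy_masking_threshold` is a hypothetical placeholder (smoothed speech power lowered by a fixed offset), and `perceptual_gain` is one commonly used form that leaves a bin untouched when the estimated noise lies below the threshold. The two DNN outputs are assumed to be available as arrays.

```python
import numpy as np


def toy_masking_threshold(speech_mag, offset_db=10.0, width=3):
    """Placeholder threshold (an assumption, not a psychoacoustic model):
    smooth the estimated speech power across frequency and lower it by
    a fixed offset in dB. Requires len(speech_mag) >= width."""
    power = speech_mag ** 2
    kernel = np.ones(width) / width
    smoothed = np.convolve(power, kernel, mode="same")
    return smoothed * 10.0 ** (-offset_db / 10.0)


def perceptual_gain(noise_mag, threshold, eps=1e-12):
    """Assumed gain form: 1 where noise power <= threshold (noise is masked),
    decreasing toward 0 as noise exceeds the threshold."""
    ratio = np.sqrt(noise_mag ** 2 / (threshold + eps))
    return 1.0 / (1.0 + np.maximum(ratio - 1.0, 0.0))


def enhance_frame(noisy_mag, speech_est, noise_est):
    """Apply the gain to the noisy magnitude spectrum of one frame.
    speech_est and noise_est stand in for the two DNN outputs."""
    threshold = toy_masking_threshold(speech_est)
    gain = perceptual_gain(noise_est, threshold)
    return gain * noisy_mag, gain


if __name__ == "__main__":
    noisy = np.array([1.0, 2.0, 3.0, 4.0, 3.0, 4.0])
    speech = np.array([0.9, 1.8, 0.1, 0.1, 0.1, 0.1])
    noise = np.array([0.0, 0.0, 3.0, 4.0, 3.0, 4.0])
    enhanced, gain = enhance_frame(noisy, speech, noise)
    print(np.round(gain, 3))
    print(np.round(enhanced, 3))
```

Because the gain never exceeds 1, the enhanced magnitude never exceeds the noisy magnitude in any bin. Bins where the estimated noise is already masked pass through unchanged, which is the intuition behind keeping speech distortion small.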
Notes:

1) Recommended by Associate Editor KE Deng-Feng

Fig. 1 Speech enhancement based on DNN

Fig. 2 Speech enhancement based on PM-DNN

Fig. 3 The framework of speech enhancement based on PM-DNN

Fig. 4 Mean PESQ scores over the 20 noise types for different weights $\alpha$ and $\beta$ in the PM-DNN objective function

Fig. 5 PESQ scores of the 4 enhancement methods for the 20 noise types (for each noise type, the value is the mean over four input SNR conditions: -5 dB, 0 dB, 5 dB, and 10 dB)

Fig. 6 LSD values of the 4 enhancement methods for the 20 noise types (for each noise type, the value is the mean over four input SNR conditions: -5 dB, 0 dB, 5 dB, and 10 dB)

Fig. 7 fwSNRseg values of the 4 enhancement methods for the 20 noise types (for each noise type, the value is the mean over four input SNR conditions: -5 dB, 0 dB, 5 dB, and 10 dB)

Fig. 8 Spectrograms

Table 1 Mean PESQ scores of different methods at four input SNR levels (for each condition, the value is the mean over all 20 noise types)

SNR (dB)	NMF	DNN	IRM-DNN	PM-DNN (First output)	PM-DNN	NMF (Mask)	DNN (Mask)	IRM-DNN (Mask)	PM-DNN (Mask)
-5	1.705	1.74	1.787	1.732	1.875	1.701	1.775	1.74	1.834
0	2.002	1.995	2.061	1.996	2.165	1.995	2.034	2.015	2.122
5	2.261	2.194	2.35	2.256	2.445	2.262	2.284	2.308	2.411
10	2.524	2.35	2.631	2.518	2.714	2.52	2.535	2.596	2.691


