Sound Source Localization Algorithm Based on Cepstral BRIR Binaural Cross-correlation in Reverberant Environment
Abstract: Reverberation in real enclosed environments degrades sound source localization performance. To address this problem, a binaural cross-correlation localization method based on the cepstral binaural room impulse response (BRIR) is proposed. The method subtracts the reverberation component from the cepstral BRIR and applies the inverse transform to obtain an estimated time-domain impulse response. This estimate is then cross-correlated with the head-related impulse responses (HRIRs) stored in a database, and the position whose HRIR gives the maximum cross-correlation value is taken as the estimated source position. Simulation results show that the proposed algorithm reduces the localization error caused by reverberation and improves localization accuracy.
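The final matching step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the cepstral dereverberation step is not shown, `estimate` stands in for the impulse-response pair it would produce, and the HRIR database, its values, and the function names are all hypothetical. Scoring both ears at a common lag is also an assumption, chosen here so that the interaural delay still influences the match.

```python
import math

def common_lag_score(est, ref):
    """Peak normalized cross-correlation between two (left, right) pairs.

    The same lag is applied to both ears (an assumption of this sketch),
    so a mismatch in interaural delay lowers the score.
    """
    (el, er), (rl, rr) = est, ref
    energy = math.sqrt((sum(v * v for v in el) + sum(v * v for v in er))
                       * (sum(v * v for v in rl) + sum(v * v for v in rr)))
    if energy == 0.0:
        return 0.0
    max_len = max(len(el), len(er), len(rl), len(rr))
    best = float("-inf")
    for lag in range(-(max_len - 1), max_len):
        acc = 0.0
        for x, h in ((el, rl), (er, rr)):
            for n, hv in enumerate(h):
                m = n + lag
                if 0 <= m < len(x):
                    acc += x[m] * hv
        best = max(best, acc / energy)
    return best

def localize(est, hrir_db):
    """Return (azimuth, score) of the database entry that best matches est."""
    scores = {az: common_lag_score(est, pair) for az, pair in hrir_db.items()}
    best_az = max(scores, key=scores.get)
    return best_az, scores[best_az]

# Hypothetical toy database: azimuth in degrees -> (left HRIR, right HRIR).
toy_db = {
    -30: ([1.0, 0.5, 0.0, 0.0, 0.0], [0.0, 0.0, 0.6, 0.3, 0.0]),
    0: ([0.0, 1.0, 0.5, 0.0, 0.0], [0.0, 1.0, 0.5, 0.0, 0.0]),
    30: ([0.0, 0.0, 0.6, 0.3, 0.0], [1.0, 0.5, 0.0, 0.0, 0.0]),
}

# Made-up estimate: the 30 degree entry with small perturbations.
estimate = ([0.02, 0.0, 0.58, 0.33, 0.01], [0.97, 0.52, 0.03, 0.0, 0.0])

azimuth, score = localize(estimate, toy_db)
print(azimuth)  # 30 for this toy data
```

With a measured HRIR database the loops would normally be replaced by an FFT-based correlation, but the selection rule (take the azimuth with the largest correlation peak) stays the same.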
Key words:
- sound source localization
- binaural cross-correlation
- cepstral
- robustness
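Table 1 Sound source azimuth estimates of the three localization methods at different reverberation times (RT). Column headings 0 to 35 are the actual source angles (°).

| Method | RT (s) | Quantity | 0 | 10 | 15 | 20 | 30 | 35 |
|---|---|---|---|---|---|---|---|---|
| CEP-BRIR-CC | 0 | Estimated angle (°) | 0.08 | 10.24 | 15.06 | 20.23 | 30.15 | 35.23 |
| CEP-BRIR-CC | 0 | Absolute error (°) | 0.08 | 0.24 | 0.06 | 0.23 | 0.15 | 0.23 |
| CEP-BRIR-CC | 0.3 | Estimated angle (°) | 0.17 | 9.03 | 14.82 | 21.09 | 30.25 | 36.39 |
| CEP-BRIR-CC | 0.3 | Absolute error (°) | 0.17 | 0.97 | 1.18 | 1.09 | 0.25 | 1.39 |
| CEP-BRIR-CC | 0.5 | Estimated angle (°) | -0.29 | 8.79 | 13.67 | 18.69 | 30.69 | 36.87 |
| CEP-BRIR-CC | 0.5 | Absolute error (°) | 0.29 | 1.21 | 1.33 | 1.31 | 0.69 | 1.87 |
| CEP-GCC-ITD | 0 | Estimated angle (°) | -0.08 | 10.67 | 15.92 | 20.86 | 30.42 | 35.37 |
| CEP-GCC-ITD | 0 | Absolute error (°) | 0.08 | 0.67 | 0.92 | 0.86 | 0.42 | 0.37 |
| CEP-GCC-ITD | 0.3 | Estimated angle (°) | 0.39 | 8.11 | 12.81 | 17.23 | 28.85 | 33.14 |
| CEP-GCC-ITD | 0.3 | Absolute error (°) | 0.39 | 1.89 | 2.19 | 2.77 | 1.14 | 1.86 |
| CEP-GCC-ITD | 0.5 | Estimated angle (°) | -1.69 | 7.06 | 11.91 | 16.14 | 28.15 | 32.06 |
| CEP-GCC-ITD | 0.5 | Absolute error (°) | 1.69 | 2.94 | 3.09 | 3.86 | 1.85 | 2.94 |
| CEP-CC-ITD | 0 | Estimated angle (°) | 0.07 | 10.73 | 15.95 | 21.46 | 30.85 | 35.62 |
| CEP-CC-ITD | 0 | Absolute error (°) | 0.07 | 0.73 | 0.95 | 1.46 | 0.85 | 0.62 |
| CEP-CC-ITD | 0.3 | Estimated angle (°) | 0.63 | 8.68 | 12.78 | 23.06 | 27.62 | 32.97 |
| CEP-CC-ITD | 0.3 | Absolute error (°) | 0.63 | 1.32 | 2.22 | 3.06 | 2.38 | 2.03 |
| CEP-CC-ITD | 0.5 | Estimated angle (°) | -2.06 | 6.12 | 11.66 | 15.89 | 26.85 | 38.77 |
| CEP-CC-ITD | 0.5 | Absolute error (°) | 2.06 | 3.88 | 3.34 | 4.11 | 3.15 | 3.77 |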
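Table 2 Statistical results of the three localization methods

| Actual angle | CEP-BRIR-CC estimate | CEP-BRIR-CC error | CEP-GCC-ITD estimate | CEP-GCC-ITD error | CEP-CC-ITD estimate | CEP-CC-ITD error |
|---|---|---|---|---|---|---|
| -60° | -54.8° | 5.2° | -67.6° | 7.6° | -50.9° | 9.1° |
| -15° | -19.6° | 4.6° | -22.3° | 7.3° | -23.5° | 8.5° |
| 0° | -3° | 3° | 7.5° | 7.5° | 8.8° | 8.8° |
| 30° | 35.2° | 5.2° | 36.9° | 6.9° | 22.0° | 8.0° |
| 45° | 41.1° | 3.9° | 52.8° | 7.8° | 54.2° | 9.2° |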
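As a rough summary of Table 1, the absolute errors can be averaged per method and reverberation time. The sketch below does this with the values copied from the table. The mean absolute error is an added summary statistic, not one reported in the source, and the variable names are illustrative.

```python
# Absolute errors in degrees, copied from Table 1.
# Outer key: method; inner key: reverberation time RT in seconds.
abs_err = {
    "CEP-BRIR-CC": {
        0.0: [0.08, 0.24, 0.06, 0.23, 0.15, 0.23],
        0.3: [0.17, 0.97, 1.18, 1.09, 0.25, 1.39],
        0.5: [0.29, 1.21, 1.33, 1.31, 0.69, 1.87],
    },
    "CEP-GCC-ITD": {
        0.0: [0.08, 0.67, 0.92, 0.86, 0.42, 0.37],
        0.3: [0.39, 1.89, 2.19, 2.77, 1.14, 1.86],
        0.5: [1.69, 2.94, 3.09, 3.86, 1.85, 2.94],
    },
    "CEP-CC-ITD": {
        0.0: [0.07, 0.73, 0.95, 1.46, 0.85, 0.62],
        0.3: [0.63, 1.32, 2.22, 3.06, 2.38, 2.03],
        0.5: [2.06, 3.88, 3.34, 4.11, 3.15, 3.77],
    },
}

def mean_abs_error(errors):
    """Average a list of absolute errors."""
    return sum(errors) / len(errors)

# Mean absolute error for every (method, RT) pair.
mae = {
    method: {rt: mean_abs_error(errs) for rt, errs in by_rt.items()}
    for method, by_rt in abs_err.items()
}

for method, by_rt in mae.items():
    print(method, {rt: round(value, 2) for rt, value in by_rt.items()})
```

Averaged this way, CEP-BRIR-CC has the smallest mean error at each of the three reverberation times listed in Table 1.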