Abstract: Blind source separation (BSS) of convolutive mixtures is a challenging problem of considerable practical importance. Within the framework of independent component analysis, this paper establishes a nonnegative matrix factorization (NMF) model and designs a new optimization objective function. New update rules for the model parameters are derived rigorously from this objective. The demixing matrix is normalized to avoid the scale ambiguity, and in the source reconstruction stage the NMF model parameters are updated in real time to avoid the permutation ambiguity. Experimental results verify the effectiveness and superiority of the proposed algorithm in separating Chinese speech mixtures, English speech mixtures, and music mixtures.
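As a rough illustration of the NMF building block mentioned in the abstract, the sketch below factorizes a nonnegative matrix (for example, a power spectrogram) with the classical Lee-Seung multiplicative updates for the Euclidean cost. This is not the update rule derived in the paper, which is not reproduced in this section. The function name `nmf_multiplicative`, the matrix names `T` and `H`, and the final column normalization (which only illustrates how a scale ambiguity can be fixed in a factorization, not the paper's demixing-matrix normalization) are assumptions made for the example.

```python
import numpy as np


def nmf_multiplicative(V, rank, n_iter=300, eps=1e-12, seed=0):
    """Approximate a nonnegative matrix V (F x N) as T @ H.

    Generic Lee-Seung multiplicative updates for the Euclidean cost
    ||V - T H||_F^2; NOT the update rules proposed in the paper.
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = V.shape
    T = rng.random((n_rows, rank)) + eps  # basis (hypothetical name)
    H = rng.random((rank, n_cols)) + eps  # activations (hypothetical name)

    for _ in range(n_iter):
        # Multiplicative updates keep every entry nonnegative.
        H *= (T.T @ V) / (T.T @ T @ H + eps)
        T *= (V @ H.T) / (T @ H @ H.T + eps)

    # Resolve the scale ambiguity between T and H: make each basis
    # column sum to one and move the scale into the activations.
    scale = T.sum(axis=0, keepdims=True) + eps
    T /= scale
    H *= scale.T
    return T, H


# Toy example: an exactly rank-2 nonnegative matrix.
rng = np.random.default_rng(1)
V = rng.random((6, 2)) @ rng.random((2, 8))

T1, H1 = nmf_multiplicative(V, rank=2, n_iter=1)
T, H = nmf_multiplicative(V, rank=2, n_iter=300)

err_first = np.linalg.norm(V - T1 @ H1)
err_final = np.linalg.norm(V - T @ H)
print(f"error after 1 iteration: {err_first:.4f}, after 300: {err_final:.4f}")
```

Moving the column scale of `T` into `H` leaves the product `T @ H` unchanged, which is why the normalization can be applied without affecting the fit.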
Table 1 Two groups of Chinese speech sources

Chinese data   Source signal        Duration
Speech 1       IC0936W0131          5 s
Speech 2       IC0936W0134          5 s

Table 2 Two groups of English speech sources

English data   Source signal        Duration
Speech 1       dev1_female3_src_1   10 s
Speech 2       dev1_female3_src_2   10 s

Table 3 Two groups of music sources

Music data     Source signal        Duration
Music 1        dev1_wdrums_src_1    11 s
Music 2        dev1_wdrums_src_3    11 s

Table 4 Experimental results in a high-reverberation, high-noise environment

             $RT_{60}=400$ ms        SNR = 5 dB
Method       SDR        SIR          SDR        SIR
Full-Rank    0.1969     4.5580       −4.2087    6.7379
VolMin-AO    1.1786     4.3729       −3.8684    6.6486
Rank1-NMF    −1.8239    0.7933       −9.8632    2.7641
RBTD         −6.7646    1.2411       −9.1111    1.8784
Proposed     1.0278     5.7190       −1.8554    4.6515
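Table 4 reports SDR (signal-to-distortion ratio) and SIR (signal-to-interference ratio). As a hedged illustration of what these two quantities measure, the sketch below splits an estimated source into a target part, an interference part, and an artifact part by orthogonal projection, and then forms the two energy ratios. It is a simplified variant that assumes only time-invariant gain distortion and no noise term. It is not the evaluation code behind the table, and the function name `sdr_sir` and the sinusoidal test signals are assumptions made for the example.

```python
import numpy as np


def sdr_sir(estimate, sources, j, eps=1e-12):
    """Simplified SDR and SIR (in dB) of `estimate` for true source j.

    sources: array of shape (n_sources, n_samples) with the true sources.
    Assumes time-invariant gains only (no filtering, no noise term).
    """
    s_j = sources[j]
    # Target part: projection of the estimate onto the true source j.
    s_target = (estimate @ s_j) / (s_j @ s_j + eps) * s_j
    # Projection of the estimate onto the span of all true sources.
    coeffs, *_ = np.linalg.lstsq(sources.T, estimate, rcond=None)
    p_all = sources.T @ coeffs
    e_interf = p_all - s_target   # leakage from the other sources
    e_artif = estimate - p_all    # whatever no true source explains

    def energy(x):
        return float(x @ x)

    sdr = 10 * np.log10(energy(s_target) / (energy(e_interf + e_artif) + eps))
    sir = 10 * np.log10(energy(s_target) / (energy(e_interf) + eps))
    return sdr, sir


# Toy check: two orthogonal, equal-energy sinusoids; the estimate of
# source 0 contains source 1 at one tenth of its amplitude.
t = np.arange(1000) / 1000
sources = np.stack([np.sin(2 * np.pi * 5 * t), np.sin(2 * np.pi * 9 * t)])
estimate = sources[0] + 0.1 * sources[1]

sdr, sir = sdr_sir(estimate, sources, j=0)
print(f"SDR = {sdr:.2f} dB, SIR = {sir:.2f} dB")
```

In this toy case the leakage has one hundredth of the target energy and there are no artifacts, so both ratios come out at about 20 dB. A negative SDR, as in some rows of Table 4, means the distortion energy exceeds the target energy.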