LIU Wen-Ju, NIE Shuai, LIANG Shan, ZHANG Xue-Liang

LIU Wen-Ju, NIE Shuai, LIANG Shan, ZHANG Xue-Liang. Deep Learning Based Speech Separation Technology and Its Developments. ACTA AUTOMATICA SINICA, 2016, 42(6): 819-833. doi: 10.16383/j.aas.2016.c150734

Deep Learning Based Speech Separation Technology and Its Developments


Supported by National Natural Science Foundation of China (61573357, 61503382, 61403370, 61273267, 91120303, 61365006)

    Author Bio:

    NIE Shuai Ph. D. candidate at the Institute of Automation, Chinese Academy of Sciences. He received his bachelor's degree from Inner Mongolia University in 2013. His research interests cover acoustic and speech signal processing, deep learning, speech separation, and computational auditory scene analysis.

    LIANG Shan Assistant professor at the Institute of Automation, Chinese Academy of Sciences. He received his bachelor's degree from Xidian University in 2008 and his Ph. D. degree from the Institute of Automation, Chinese Academy of Sciences in 2014. His research interests cover acoustic and speech signal processing, speech separation, computational auditory scene analysis, and speech recognition.

    ZHANG Xue-Liang Associate professor at Inner Mongolia University. He received his bachelor's degree from Inner Mongolia University in 2003, his master's degree from Harbin Institute of Technology in 2005, and his Ph. D. degree from the Institute of Automation, Chinese Academy of Sciences in 2010. His research interests cover speech separation, computational auditory scene analysis, and speech signal processing.

    Corresponding author: LIU Wen-Ju Professor at the Institute of Automation, Chinese Academy of Sciences. His research interests cover computational auditory scene analysis, speech enhancement, speech recognition, speaker recognition, sound source localization, and sound event detection.
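    Abstract: Speech interaction technology is increasingly used in everyday life. However, because of interference, the performance of speech interaction in real-world environments is still far from satisfactory. Speech separation against additive noise is an effective way to improve this performance, and over the past decades many researchers worldwide have devoted great effort to it and proposed many practical methods. In particular, with the rise of deep learning research in recent years, deep learning based speech separation has received growing attention, has shown promising application prospects, and is gradually becoming a new research trend in the field. Although many deep learning based speech separation methods have been proposed, a systematic analysis and summary of these techniques has long been lacking, and the connections and differences among methods have rarely been studied. To address this gap, this paper analyzes and summarizes the main workflow and overall framework of speech separation, gives a comprehensive and in-depth review of recent advances from three aspects (features, models, and targets), and concludes with an outlook on speech separation technology.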
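    The supervised workflow summarized in the abstract (take the noisy mixture, train a model, predict a training target) can be illustrated with one widely used target, the ideal ratio mask (IRM), defined per time-frequency unit as IRM(t, f) = (|S(t, f)|^2 / (|S(t, f)|^2 + |N(t, f)|^2))^β, where S and N are the short-time spectra of speech and noise. The Python sketch below is a minimal illustration under stated assumptions, not the method of this paper: it relies on NumPy, uses synthetic signals (a 1 kHz tone standing in for speech, plus white noise), a hypothetical `stft` helper, β = 0.5, and the oracle mask computed from the known speech and noise in place of a trained model's prediction.

```python
import numpy as np


def stft(x, frame_len=256, hop=128):
    """Hypothetical helper: Hann-windowed short-time Fourier transform.

    Returns a complex array of shape (num_frames, frame_len // 2 + 1).
    """
    window = np.hanning(frame_len)
    num_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack(
        [x[i * hop : i * hop + frame_len] * window for i in range(num_frames)]
    )
    return np.fft.rfft(frames, axis=1)


def ideal_ratio_mask(speech_spec, noise_spec, beta=0.5, eps=1e-12):
    """IRM(t, f) = (|S|^2 / (|S|^2 + |N|^2)) ** beta, computed per T-F unit."""
    s_pow = np.abs(speech_spec) ** 2
    n_pow = np.abs(noise_spec) ** 2
    return (s_pow / (s_pow + n_pow + eps)) ** beta


# Synthetic stand-ins (assumption): a 1 kHz tone as "speech" plus white noise.
fs = 16000
rng = np.random.default_rng(0)
t = np.arange(fs) / fs
speech = np.sin(2 * np.pi * 1000 * t)
noise = rng.normal(scale=0.5, size=fs)
mixture = speech + noise  # additive-noise model: y = s + n

S, N, Y = stft(speech), stft(noise), stft(mixture)

# A trained model would predict the mask from features of Y; here the oracle
# IRM computed from the known S and N stands in for that prediction.
mask = ideal_ratio_mask(S, N)
estimate_mag = mask * np.abs(Y)  # masked mixture magnitude

# Magnitude-domain error against the clean speech, before and after masking.
mse_mixture = np.mean((np.abs(Y) - np.abs(S)) ** 2)
mse_masked = np.mean((estimate_mag - np.abs(S)) ** 2)
print(f"magnitude MSE before masking: {mse_mixture:.3f}, after: {mse_masked:.3f}")
```

    Substituting a trained network's output for `mask` turns this oracle experiment into the test stage of a mask-based system; the masked magnitude is then commonly combined with the mixture phase and inverted back to a waveform.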
    Fig. 1  A block diagram of the supervised speech separation system

    Fig. 2  Network structure of the source separation system proposed by Huang et al. [28]

    Fig. 3  Network structure of the speech separation system proposed by Wang et al. [21]

    Fig. 4  Structure of the neural network proposed by Narayanan et al. [60]

    Fig. 5  Structure of the DNN-based speech separation system proposed by Xu et al. [18]

    Fig. 6  Structure of the DSN-TS-based speech separation system proposed by Nie et al. [33]

    Fig. 7  Structure of the DSN-based speech separation system proposed by Zhang et al. [34]

    Fig. 8  Structure of the DRNN-based speech separation system proposed by Huang et al. [29]

