[1] Godoy E, Rosec O, Chonavel T. Voice conversion using dynamic frequency warping with amplitude scaling, for parallel or nonparallel corpora. IEEE Transactions on Audio, Speech, and Language Processing, 2012, 20(4): 1313-1323
		
[2] Toda T, Chen L H, Saito D, Villavicencio F, Wester M, Wu Z, et al. The voice conversion challenge 2016. In: Proceedings of the 2016 Interspeech. San Francisco, USA, 2016. 1632−1636
		
[3] Dong M, Yang C, Lu Y, Ehnes J W, Huang D, Ming H, et al. Mapping frames with DNN-HMM recognizer for non-parallel voice conversion. In: Proceedings of the 2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA). Hong Kong, China: IEEE, 2015. 488−494
		
[4] Zhang M, Tao J, Tian J, Wang X. Text-independent voice conversion based on state mapped codebook. In: Proceedings of the 2008 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Las Vegas, USA: IEEE, 2008. 4605−4608
		
[5] Nakashika T, Takiguchi T, Minami Y. Non-parallel training in voice conversion using an adaptive restricted Boltzmann machine. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2016, 24(11): 2032-2045. doi: 10.1109/TASLP.2016.2593263
		
[6] Mouchtaris A, Van der Spiegel J, Mueller P. Nonparallel training for voice conversion based on a parameter adaptation approach. IEEE Transactions on Audio, Speech, and Language Processing, 2006, 14(3): 952-963. doi: 10.1109/TSA.2005.857790
		
[7] Hsu C C, Hwang H T, Wu Y C, Tsao Y, Wang H M. Voice conversion from non-parallel corpora using variational auto-encoder. In: Proceedings of the 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA). Jeju, South Korea: IEEE, 2016. 1−6
		
[8] Hsu C C, Hwang H T, Wu Y C, Tsao Y, Wang H M. Voice conversion from unaligned corpora using variational autoencoding Wasserstein generative adversarial networks. In: Proceedings of the 2017 Interspeech. Stockholm, Sweden, 2017. 3364−3368
		
[9] Kameoka H, Kaneko T, Tanaka K, Hojo N. StarGAN-VC: Non-parallel many-to-many voice conversion using star generative adversarial networks. In: Proceedings of the 2018 IEEE Spoken Language Technology Workshop (SLT). Athens, Greece: IEEE, 2018. 266−273
		
[10] Fang F, Yamagishi J, Echizen I, Lorenzo-Trueba J. High-quality nonparallel voice conversion based on cycle-consistent adversarial network. In: Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Calgary, Canada: IEEE, 2018. 5279−5283
		
[11] Arjovsky M, Chintala S, Bottou L. Wasserstein generative adversarial networks. In: Proceedings of the 34th International Conference on Machine Learning. Sydney, Australia: PMLR, 2017. 214−223
		
[12] Wang Kun-Feng, Gou Chao, Duan Yan-Jie, Lin Yi-Lun, Zheng Xin-Hu, Wang Fei-Yue. Generative adversarial networks: the state of the art and beyond. Acta Automatica Sinica, 2017, 43(3): 321-332 (in Chinese)
		
[13] Baby D, Verhulst S. SERGAN: Speech enhancement using relativistic generative adversarial networks with gradient penalty. In: Proceedings of the 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Brighton, United Kingdom: IEEE, 2019. 106−110
		
[14] Dehak N, Kenny P J, Dehak R, Dumouchel P, Ouellet P. Front-end factor analysis for speaker verification. IEEE Transactions on Audio, Speech, and Language Processing, 2011, 19(4): 788-798
		
[15] Wang Hai-Bin, Guo Jian-Yi, Mao Cun-Li, Yu Zheng-Tao. Speaker recognition based on universal background-joint estimation (UB-JE). Acta Automatica Sinica, 2018, 44(10): 1888-1895 (in Chinese)
		
[16] Matějka P, Glembek O, Castaldo F, Alam M J, Plchot O, Kenny P, et al. Full-covariance UBM and heavy-tailed PLDA in i-vector speaker verification. In: Proceedings of the 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Prague, Czech Republic: IEEE, 2011. 4828−4831
		
[17] Kanagasundaram A, Vogt R, Dean D, Sridharan S, Mason M. I-vector based speaker recognition on short utterances. In: Proceedings of the 12th Annual Conference of the International Speech Communication Association (Interspeech 2011). Florence, Italy: ISCA, 2011. 2341−2344
		
[18] Zhang Yi-Ke, Zhang Peng-Yuan, Yan Yong-Hong. Data augmentation for language models via adversarial training. Acta Automatica Sinica, 2018, 44(5): 891-900 (in Chinese)
		
[19] Mao X, Li Q, Xie H, Lau R Y K, Wang Z, Smolley S P. Least squares generative adversarial networks. In: Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV). Venice, Italy: IEEE, 2017. 2794−2802
		
[20] Morise M, Yokomori F, Ozawa K. WORLD: A vocoder-based high-quality speech synthesis system for real-time applications. IEICE Transactions on Information and Systems, 2016, E99-D(7): 1877-1884
		
[21] Gulrajani I, Ahmed F, Arjovsky M, Dumoulin V, Courville A C. Improved training of Wasserstein GANs. In: Proceedings of the Advances in Neural Information Processing Systems. Long Beach, USA, 2017. 5767−5777
		
[22] Lorenzo-Trueba J, Yamagishi J, Toda T, Saito D, Villavicencio F, Kinnunen T, et al. The voice conversion challenge 2018: Promoting development of parallel and nonparallel methods. In: Proceedings of the Odyssey 2018 The Speaker and Language Recognition Workshop. Les Sables d'Olonne, France: ISCA Speaker and Language Characterization Special Interest Group, 2018. 195−202
		
[23] Maas A L, Hannun A Y, Ng A Y. Rectifier nonlinearities improve neural network acoustic models. In: Proceedings of the 30th International Conference on Machine Learning (ICML) Workshop on Deep Learning for Audio, Speech and Language Processing. Atlanta, USA, 2013
		
[24] Liang Rui-Qiu, Zhao Li, Wang Qing-Yun. Speech Signal Processing (C++ Edition). Beijing: China Machine Press, 2018 (in Chinese)
		
[25] Zhang Xiong-Wei, Chen Liang, Yang Ji-Bin. Modern Speech Processing Technology and Application. Beijing: China Machine Press, 2003 (in Chinese)
		
[26] Chou J C, Lee H Y. One-shot voice conversion by separating speaker and content representations with instance normalization. In: Proceedings of the 2019 Interspeech. Graz, Austria, 2019. 664−668