
Deep Neural Networks with Visible Intermediate Layers

GAO Ying-Ying  ZHU Wei-Bin

Citation: GAO Ying-Ying, ZHU Wei-Bin. Deep Neural Networks with Visible Intermediate Layers. ACTA AUTOMATICA SINICA, 2015, 41(9): 1627-1637. doi: 10.16383/j.aas.2015.c150023

doi: 10.16383/j.aas.2015.c150023
About the authors:

    GAO Ying-Ying  Ph.D. candidate at the Institute of Information Science, Beijing Jiaotong University. Her main research interests are emotional speech synthesis and machine learning. E-mail: 10112060@bjtu.edu.cn

    Corresponding author:

    ZHU Wei-Bin  Associate professor at the Institute of Information Science, Beijing Jiaotong University. His main research interests are speech recognition, speech synthesis, and machine learning. Corresponding author of this paper. E-mail: wbzhu@bjtu.edu.cn


  • Abstract: The intermediate layers of a deep neural network are hidden and unknown, which makes the learning process of a deep network untraceable and its results hard to interpret, to some extent constraining the development of deep learning. This paper introduces prior knowledge so that the intermediate layers of a deep network carry explicit meaning and exert explicit influence, i.e., the intermediate layers are made visible, which allows partial manual intervention in the network's internal structure and constrains the direction of learning. Based on the deep stacking network (DSN), two deep neural networks with partially visible intermediate layers are proposed: the input-layer visible DSN (IVDSN) and the hidden-layer visible DSN (HVDSN). The visibility is kept partial in order to retain the ability to extract unknown information and a degree of error tolerance. The proposed networks are evaluated on text-based speech emotion computing; the results show that introducing prior knowledge helps improve the performance of deep neural networks, that both networks realize partial visibility of the intermediate layers, and that HVDSN has the more compact structure and the better performance. A minimal code sketch of the idea follows.
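To make the "partially visible hidden layer" idea concrete, here is a minimal NumPy sketch, not the paper's implementation: the names HVDSNModule, the split sizes, the auxiliary weight alpha, and the attribute matrix A are all hypothetical, and plain joint gradient descent stands in for the DSN's usual training scheme (which solves the upper-layer weights in closed form). The sketch shows one module whose first n_visible hidden units are additionally supervised with prior-knowledge attribute targets while the remaining units stay free, plus DSN-style stacking, where the next module's input is the raw input concatenated with the previous module's predictions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class HVDSNModule:
    """One DSN-style module whose hidden layer is split into a 'visible'
    part, supervised with prior-knowledge attribute targets, and a 'free'
    part left unsupervised to capture unknown information (hypothetical
    sketch, not the authors' implementation)."""

    def __init__(self, n_in, n_visible, n_free, n_out, lr=0.1):
        self.n_visible = n_visible
        self.lr = lr
        self.W = rng.normal(0.0, 0.1, (n_in, n_visible + n_free))   # input  -> hidden
        self.U = rng.normal(0.0, 0.1, (n_visible + n_free, n_out))  # hidden -> output

    def hidden(self, X):
        return sigmoid(X @ self.W)

    def forward(self, X):
        return self.hidden(X) @ self.U

    def train_step(self, X, Y, A, alpha=0.5):
        """One gradient step on the combined squared-error loss
        ||H U - Y||^2 + alpha * ||H[:, :n_visible] - A||^2,
        where A holds the prior-knowledge targets for the visible units
        (constant factors are folded into the learning rate)."""
        H = self.hidden(X)
        err_out = H @ self.U - Y                    # task output error
        err_vis = np.zeros_like(H)
        err_vis[:, :self.n_visible] = H[:, :self.n_visible] - A
        dH = err_out @ self.U.T + alpha * err_vis   # error reaching the hidden layer
        self.U -= self.lr * (H.T @ err_out) / len(X)
        self.W -= self.lr * (X.T @ (dH * H * (1.0 - H))) / len(X)

# DSN-style stacking: the next module sees the raw input concatenated
# with the previous module's predictions (toy data for illustration).
X = rng.normal(size=(32, 20))   # input features
A = rng.random((32, 5))         # prior-knowledge attribute targets
Y = rng.random((32, 3))         # task targets
mod1 = HVDSNModule(n_in=20, n_visible=5, n_free=10, n_out=3)
for _ in range(200):
    mod1.train_step(X, Y, A)
X2 = np.hstack([X, mod1.forward(X)])  # input for the next stacked module
```

Setting alpha to zero recovers an ordinary module with a fully hidden layer, while the free units preserve capacity for whatever the prior attributes do not cover, which is the "partial" in partial visibility.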
Publication history
  • Received:  2015-01-19
  • Revised:  2015-05-13
  • Published:  2015-09-20
