
Research on RBM Accelerating Learning Algorithm with Weight Momentum (基于权值动量的RBM加速学习算法研究)

LI Fei, GAO Xiao-Guang, WAN Kai-Fang

李飞, 高晓光, 万开方. 基于权值动量的RBM加速学习算法研究. 自动化学报, 2017, 43(7): 1142-1159. doi: 10.16383/j.aas.2017.c160325
LI Fei, GAO Xiao-Guang, WAN Kai-Fang. Research on RBM Accelerating Learning Algorithm with Weight Momentum. ACTA AUTOMATICA SINICA, 2017, 43(7): 1142-1159. doi: 10.16383/j.aas.2017.c160325


doi: 10.16383/j.aas.2017.c160325
Funds:

National Natural Science Foundation of China 61573285

National Natural Science Foundation of China 61305133

More Information
    Author Bios:

    LI Fei  Ph.D. candidate at the School of Electronics and Information, Northwestern Polytechnical University. He received his bachelor degree in systems engineering from Northwestern Polytechnical University in 2011. His research interest covers machine learning and deep learning. E-mail: nwpulf@mail.nwpu.edu.cn

    WAN Kai-Fang  Ph.D. candidate at the School of Electronics and Information, Northwestern Polytechnical University. He received his bachelor degree in systems engineering from Northwestern Polytechnical University in 2010. His main research interest is airborne fire control. E-mail: yibai2003@126.com

    Corresponding author:

    GAO Xiao-Guang  Professor at the School of Electronics and Information, Northwestern Polytechnical University. She received her Ph.D. degree in aircraft navigation and control systems from Northwestern Polytechnical University in 1989. Her research interest covers Bayesian methods and airborne fire control. Corresponding author of this paper. E-mail: cxg2012@nwpu.edu.cn

  • Abstract: In theory, momentum algorithms can accelerate the training of restricted Boltzmann machine (RBM) networks. Simulation studies of the existing momentum algorithms in this paper show, however, that they provide only weak acceleration in RBM training and gradually lose their accelerating effect in the later stages of training. To address this problem, the paper first analyzes the existing momentum algorithms theoretically, on the basis of the convergence theorem of Gibbs sampling, and proves that their acceleration is obtained at the cost of the network weights. It then studies the network weights further and finds that they carry a large amount of directional information about the true gradient, which can itself be used to train the network. On this basis, a weight-momentum algorithm driven by the network weights is proposed, and simulation experiments are reported. The results show that the proposed momentum algorithm accelerates training more effectively, retains its accelerating performance in the later stages of training, and thus compensates well for the shortcomings of the existing momentum algorithms.
    1)  Associate Editor for this paper: WEI Qing-Lai
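For readers unfamiliar with the setup, the classical momentum update that the paper analyzes can be sketched for CD-1 training of an RBM. The following is a minimal NumPy illustration under the Table 1 settings ($\eta = 0.1$, $\mu = 0.9$); all function and variable names are ours, probabilities are used directly instead of sampled binary states, and this is not the paper's exact implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy RBM matching Table 1: 784 visible units, 500 hidden units.
n_vis, n_hid = 784, 500
a = np.zeros(n_vis)                          # visible biases
b = np.zeros(n_hid)                          # hidden biases
w = 0.1 * rng.standard_normal((n_vis, n_hid))
eta, mu = 0.1, 0.9                           # learning rate, momentum coefficient
v_w = np.zeros_like(w)                       # momentum "velocity" for w

def cd1_momentum_step(x):
    """One CD-1 weight update with classical momentum; returns recon. error."""
    global w, v_w
    h0 = sigmoid(x @ w + b)                  # positive phase: P(h=1|x)
    x1 = sigmoid(h0 @ w.T + a)               # one Gibbs step: reconstruction
    h1 = sigmoid(x1 @ w + b)                 # negative phase: P(h=1|x1)
    grad = (x.T @ h0 - x1.T @ h1) / len(x)   # CD-1 estimate of the w-gradient
    v_w = mu * v_w + eta * grad              # accumulate velocity (momentum)
    w += v_w                                 # step along the velocity
    return float(np.mean((x - x1) ** 2))     # reconstruction error (cf. Figs. 3-4)

batch = rng.random((32, n_vis))              # stand-in for a data mini-batch
err = cd1_momentum_step(batch)
```

The paper's weight-based variants (CDW, CMW, NMW) additionally mix direction information from $w$ itself into the update; the sketch above only shows the classical momentum baseline (CM) against which they are compared.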
  • Fig. 1  Configuration of RBM
    Fig. 2  Momentum diagram
    Fig. 3  Comparison of reconstruction errors
    Fig. 4  Comparison of the differences of reconstruction errors
    Fig. 5  Diagram of the sigmoid function
    Fig. 6  Comparison of network weights $w$
    Fig. 7  Comparison of the differences of $w$
    Fig. 8  Comparison of gradients
    Fig. 9  Comparison of the differences of gradients
    Fig. 10  Comparison of $w$ under weight decay
    Fig. 11  Comparison of the differences of $w$ under weight decay
    Fig. 12  Comparison of reconstruction errors under weight decay
    Fig. 13  Comparison of the differences of reconstruction errors under weight decay
    Fig. 14  Comparison of gradients
    Fig. 15  Comparison of the differences of gradients
    Fig. 16  Comparison of reconstruction errors
    Fig. 17  Comparison of network weights $w$
    Fig. 18  Comparison of reconstruction errors
    Fig. 19  Comparison of the values of $w$
    Fig. 20  Comparison of gradients
    Fig. 21  Comparison of reconstruction errors
    Fig. 22  Comparison of the differences of reconstruction errors
    Fig. 23  Comparison of the differences of gradients in the initial stage of iteration
    Fig. 24  Comparison of the differences of gradients in the middle stage of iteration
    Fig. 25  Comparison of reconstruction errors in the middle stage of iteration
    Fig. 26  Comparison of gradients in the late stage of iteration
    Fig. 27  Comparison of gradients in the final stage of iteration
    Fig. 28  Comparison of network weights $w$
    Fig. 29  Comparison of the differences of gradients
    Fig. 30  Original image
    Fig. 31  Image reconstructed by CD
    Fig. 32  Image reconstructed by CM
    Fig. 33  Image reconstructed by NM
    Fig. 34  Image reconstructed by CDW
    Fig. 35  Image reconstructed by CMW
    Fig. 36  Image reconstructed by NMW
    Fig. 37  Original image
    Fig. 38  Comparison of reconstruction errors
    Fig. 39  Original image
    Fig. 40  Comparison of reconstruction errors
    Fig. 41  Original image
    Fig. 42  Comparison of reconstruction errors
    Fig. 43  Original image
    Fig. 44  Comparison of reconstruction errors

    Table 1  Values of the network parameters

    Parameter    Initial value
    $a$          zeros(1, 784)
    $b$          zeros(1, 500)
    $w$          $0.1 \times$ randn(784, 500)
    $\eta$       0.1
    $\mu$        0.9
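The MATLAB-style expressions in Table 1 (zeros, randn) translate directly to NumPy; the following is our rendering of that initialization for illustration, not code from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Table 1 initialization for the 784-visible / 500-hidden RBM.
a = np.zeros((1, 784))                       # visible biases start at zero
b = np.zeros((1, 500))                       # hidden biases start at zero
w = 0.1 * rng.standard_normal((784, 500))    # small zero-mean random weights
eta = 0.1                                    # learning rate
mu = 0.9                                     # momentum coefficient
```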

    Table 2  Training parameters

    Algorithm    $\mu$    $\lambda$    $\alpha$
    CD           0.9      –            –
    CM           0.9      –            –
    NM           0.9      –            –
    CMD          0.9      0.00001      –
    NMD          0.9      0.00001      –
    CDW          0.9      –            0.0001
    CMW          0.9      –            0.0001
    NMW          0.9      –            0.0001

    Table 3  Notation for the difference terms

    Code    Difference term
    A       CM − CD
    B       NM − CD
    C       CMD − CD
    D       NMD − CD
    E       CDW − CD
    F       CMW − CD
    G       NMW − CD

    Table 4  Values of the network parameters

    Parameter    Initial value
    $a$          zeros(1, 1024)
    $b$          zeros(1, 800)
    $w$          $0.1 \times$ randn(1024, 800)
    $\eta$       0.01
    $\mu$        0.9

    Table 5  Values of the network parameters

    Parameter    Initial value
    $a$          zeros(1, 3072)
    $b$          zeros(1, 2000)
    $w$          $0.1 \times$ randn(3072, 2000)
    $\eta$       0.01
    $\mu$        0.9

    Table 6  Values of the network parameters

    Parameter    Initial value
    $a$          zeros(1, 3072)
    $b$          zeros(1, 2000)
    $w$          $0.1 \times$ randn(3072, 2000)
    $\eta$       0.01
    $\mu$        0.9

    Table 7  Values of the network parameters

    Parameter    Initial value
    $a$          zeros(1, 4096)
    $b$          zeros(1, 3000)
    $w$          $0.1 \times$ randn(4096, 3000)
    $\eta$       0.01
    $\mu$        0.9
Figures (44) / Tables (7)
Publication history
  • Received: 2016-04-11
  • Accepted: 2016-09-30
  • Published: 2017-07-20
