

Recurrent Neural Networks With Recursive Least Squares

Zhao Jie, Zhang Chun-Yuan, Liu Chao, Zhou Hui, Ou Yi-Gui, Song Qi

Citation: Zhao Jie, Zhang Chun-Yuan, Liu Chao, Zhou Hui, Ou Yi-Gui, Song Qi. Recurrent neural networks with recursive least squares. Acta Automatica Sinica, 2022, 48(8): 2050−2061 doi: 10.16383/j.aas.c190847

doi: 10.16383/j.aas.c190847
Funds: Supported by National Natural Science Foundation of China (61762032, 61662019, 11961018)
More Information
    Author Bio:

    ZHAO Jie Master student at the School of Computer Science and Technology, Hainan University. His research interest covers deep learning and reinforcement learning. E-mail: zhaojie@lonelyme.cn

    ZHANG Chun-Yuan Associate professor at the School of Computer Science and Technology, Hainan University. He received his Ph.D. degree in computer software and theory from University of Electronic Science and Technology of China in 2016. His research interest covers deep learning and reinforcement learning. Corresponding author of this paper. E-mail: zcy7566@126.com

    LIU Chao Master student at the School of Computer Science and Technology, Hainan University. His research interest covers deep learning and reinforcement learning. E-mail: lcdyx0618@126.com

    ZHOU Hui Associate professor at the School of Computer Science and Technology, Hainan University. He received his Ph.D. degree from the Institute of Software, Chinese Academy of Sciences in 2008. His research interest covers natural language processing, artificial intelligence writing, and data visualization. E-mail: zhouhui@hainanu.edu.cn

    OU Yi-Gui Professor at the School of Science, Hainan University. He received his Ph.D. degree from University of Science and Technology of China in 2003. His main research interest is numerical optimization algorithms. E-mail: ouyigui@126.com

    SONG Qi Master student at the School of Computer Science and Technology, Hainan University. Her research interest covers deep learning and reinforcement learning. E-mail: songqihnu@163.com

  • Abstract: To address the low learning efficiency of first-order optimization algorithms and the excessive time and space overhead of second-order optimization algorithms for recurrent neural networks (RNNs), a new mini-batch recursive least squares (RLS) optimization algorithm is proposed. The proposed algorithm backpropagates non-activated linear output errors in place of the conventional activated output errors and, combined with the equivalent gradient of a weighted linear least squares objective with respect to the linear outputs of the hidden layer, derives the mini-batch recursive least squares solution of the RNN parameters layer by layer. Compared with stochastic gradient descent (SGD), the proposed algorithm adds only one covariance matrix to the hidden layer and one to the output layer of the RNN, so its time and space complexity are only about three times those of SGD. In addition, a solution is given for the adaptive forgetting factor problem and another for the overfitting problem of the proposed algorithm. Simulation results show that, for both classification and prediction tasks on sequential data, the proposed algorithm converges faster than existing mainstream first-order optimization algorithms and is fairly robust to hyperparameter settings.
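    To make the recursion behind the abstract concrete, the following is a minimal NumPy sketch of a generic mini-batch recursive least squares update with a forgetting factor for a single linear layer, written in the standard matrix-inversion-lemma form. It only illustrates the underlying technique, not the paper's layer-wise derivation; the function name `rls_update`, the variable names, and the initialization convention are illustrative assumptions.

```python
import numpy as np

def rls_update(W, P, X, D, lam=0.999):
    """Generic mini-batch RLS step for a linear map D ~= W @ X (illustrative sketch only).

    W   : (d_out, d_in)  current weights; X is assumed to already include a bias row
    P   : (d_in, d_in)   inverse input-covariance estimate, e.g. initialized as a scaled identity
    X   : (d_in, m)      mini-batch of inputs, one column per sample
    D   : (d_out, m)     desired linear (non-activated) outputs for this batch
    lam : forgetting factor in (0, 1]
    """
    m = X.shape[1]
    PX = P @ X                                               # (d_in, m)
    # Matrix inversion lemma: only an m x m linear system is solved per update.
    G = np.linalg.solve(lam * np.eye(m) + X.T @ PX, PX.T)    # (m, d_in), transposed gain
    E = D - W @ X                                            # a priori linear output error
    W = W + E @ G                                            # weight correction
    P = (P - PX @ G) / lam                                   # covariance update with forgetting
    return W, P
```

    In this generic form the output-layer targets D are desired pre-activation outputs, echoing the abstract's use of non-activated linear output errors; how the hidden-layer targets and the adaptive forgetting factor are obtained is specific to the paper and not reproduced here.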
  • Fig.  1  RNN model structure

    Fig.  2  Experimental results on the convergence comparisons

    Table  1  Complexity analysis of SGD-RNN and RLS-RNN

                            SGD-RNN                             RLS-RNN
    Time complexity
      $O_{s}$               ${\rm{O}}(\tau mdh)$                —
      $Z_{s}$               —                                   ${\rm{O}}(\tau mdh)$
      $H_{s}$               ${\rm{O}}(\tau mh(h+a))$            ${\rm{O}}(\tau mh(h+a))$
      ${\Delta}_{s}^{O}$    ${\rm{O}}(4\tau md)$                ${\rm{O}}(3\tau md)$
      ${\Delta}_{s}^{H}$    ${\rm{O}}(\tau mh(h+d))$            ${\rm{O}}(\tau mh(h+d))$
      ${P}_{s}^{O}$         —                                   ${\rm{O}}(2\tau mh^2)$
      ${P}_{s}^{H}$         —                                   ${\rm{O}}(2\tau m(h+a)^2)$
      ${\Theta}_{s}^{O}$    ${\rm{O}}(\tau mdh)$                ${\rm{O}}(\tau mdh)$
      ${\Theta}_{s}^{H}$    ${\rm{O}}(\tau mh(h+a))$            ${\rm{O}}(\tau mh(h+a))$
      Total                 ${\rm{O}}(\tau m(3dh+3h^2+2ha))$    ${\rm{O}}(\tau m(7h^2+2a^2+3dh+6ha))$
    Space complexity
      ${\Theta}_{s}^{O}$    ${\rm{O}}(hd)$                      ${\rm{O}}(hd)$
      ${\Theta}_{s}^{H}$    ${\rm{O}}(h(h+a))$                  ${\rm{O}}(h(h+a))$
      ${P}_{s}^{H}$         —                                   ${\rm{O}}((h+a)^2)$
      ${P}_{s}^{O}$         —                                   ${\rm{O}}(h^2)$
      Total                 ${\rm{O}}(h^2+hd+ha)$               ${\rm{O}}(hd+3ha+a^2+3h^2)$

    Table  2  Robustness analysis of the initializing factor $\alpha$

    $\alpha$ 0.01 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
    MNIST classification accuracy (%) 97.10 97.36 97.38 97.35 97.57 97.70 97.19 97.27 97.42 97.25 97.60
    IMDB classification accuracy (%) 72.21 73.50 73.24 73.32 74.02 73.01 73.68 73.25 73.20 73.42 73.12
    Stock price prediction MSE ($\times 10^{-4}$) 5.32 5.19 5.04 5.43 5.42 5.30 4.87 4.85 5.32 5.54 5.27
    PM2.5 prediction MSE ($\times 10^{-3}$) 1.58 1.55 1.53 1.55 1.61 1.55 1.55 1.54 1.57 1.58 1.57

    Table  3  Robustness analysis of the scaling factor $\eta$

    $\eta$ 0.1 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10.0
    MNIST classification accuracy (%) 97.80 97.59 97.48 97.61 97.04 97.62 97.44 97.33 97.38 97.37 97.45
    IMDB classification accuracy (%) 73.58 73.46 73.62 73.76 73.44 73.82 73.71 72.97 72.86 73.12 73.69
    Stock price prediction MSE ($\times 10^{-4}$) 5.70 5.32 5.04 5.06 5.61 4.73 5.04 5.14 4.85 4.97 5.19
    PM2.5 prediction MSE ($\times 10^{-3}$) 1.53 1.55 1.56 1.59 1.56 1.53 1.58 1.55 1.54 1.50 1.52
Figures (2) / Tables (3)
Metrics
  • Article views:  760
  • Full-text HTML views:  183
  • PDF downloads:  308
  • Citations: 0
Publication history
  • Received:  2019-12-12
  • Accepted:  2020-04-07
  • Published online:  2022-07-12
  • Issue date:  2022-06-01
