2.845

2023影响因子

(CJCR)

  • 中文核心
  • EI
  • 中国科技核心
  • Scopus
  • CSCD
  • 英国科学文摘

留言板

尊敬的读者、作者、审稿人, 关于本刊的投稿、审稿、编辑和出版的任何问题, 您可以本页添加留言。我们将尽快给您答复。谢谢您的支持!

姓名
邮箱
手机号码
标题
留言内容
验证码

一种求解符号回归问题的粒子群优化算法

马炫 李星 唐荣俊 刘庆

马炫, 李星, 唐荣俊, 刘庆. 一种求解符号回归问题的粒子群优化算法. 自动化学报, 2020, 46(8): 1714−1726 doi: 10.16383/j.aas.c180035
引用本文: 马炫, 李星, 唐荣俊, 刘庆. 一种求解符号回归问题的粒子群优化算法. 自动化学报, 2020, 46(8): 1714−1726 doi: 10.16383/j.aas.c180035
Ma Xuan, Li Xing, Tang Rong-Jun, Liu Qing. A particle swarm optimization approach for symbolic regression. Acta Automatica Sinica, 2020, 46(8): 1714−1726 doi: 10.16383/j.aas.c180035
Citation: Ma Xuan, Li Xing, Tang Rong-Jun, Liu Qing. A particle swarm optimization approach for symbolic regression. Acta Automatica Sinica, 2020, 46(8): 1714−1726 doi: 10.16383/j.aas.c180035

一种求解符号回归问题的粒子群优化算法

doi: 10.16383/j.aas.c180035
基金项目: 

国家自然科学基金 61502385

详细信息
    作者简介:

    李星  西安理工大学自动化与信息工程学院硕士研究生.主要研究方向为数据驱动建模, 机器学习. E-mail: peagle4@126.com

    唐荣俊  西安理工大学自动化与信息工程学院研究生.主要研究方向为数据驱动建模. E-mail: tangrj_xian@126.com

    刘庆  博士, 西安理工大学自动化与信息工程学院讲师.主要研究方向为数据驱动建模, 网络路由优化, 群智能算法. E-mail: liuqing@xaut.edu.cn

    通讯作者:

    马炫  西安理工大学自动化与信息工程学院教授. 2002年获得日本福井大学系统工程专业博士学位.主要研究方向为进化计算, 机器学习, 数据驱动建模.本文通信作者. E-mail: maxuan@xaut.edu.cn

A Particle Swarm Optimization Approach for Symbolic Regression

Funds: 

National Natural Science Foundation of China 61502385

More Information
    Author Bio:

    LI Xing  Master student at the School of Automation and Information Engineering, Xi\begin{document}$'$\end{document}an University of Technology. His research interest covers data-driven modeling, machine learning

    TANG Rong-Jun  Master student at the School of Automation and Information Engineering, Xi'an University of Technology. His main research interest is data-driven modeling

    LIU Qing  Ph. D., lecturer at the School of automation and Information Engineering, Xi'an University of technology. His research interest covers data-driven modeling, network routing optimization, and swarm intelligence algorithms

    Corresponding author: MA Xuan    Professor at the School of Automation and Information Engineering, Xi'an University of Technology. He received his Ph. D. degree in systems engineering from University of Fukui, Japan, in 2002. His research interest covers evolutionary computation, machine learning, and data-driven modeling. Corresponding author of this paper
  • 摘要: 符号回归以构建一个能拟合给定数据集的函数模型为目的, 是对基本函数、运算符、变量等进行组合优化的过程.本文提出了一种求解符号回归问题的粒子群优化算法.算法以语法树对函数模型进行表达, 采用基因表达式将语法树编码为一个粒子, 设计了粒子的飞行方法及$r$-邻域环状拓扑的粒子学习关系.为使粒子具有跳出局部极值的能力和减轻粒子快速趋同对全局寻优造成的不利影响, 分别设计了突变算子和散开算子.此外, 为了得到比较简洁的函数模型, 在粒子的评价函数中以罚函数的方式对编码的有效长度进行控制.仿真实验表明, 提出的算法可以获得拟合精度更高、简洁性更好的函数模型.
    Recommended by Associate Editor WU Zhou
    1)  本文责任编委 伍洲
  • 图  1  语法树

    Fig.  1  Syntax tree

    图  2  $X'$的语法树

    Fig.  2  The syntax tree of $X'$

    图  3  $r$-邻域环状粒子群拓扑

    Fig.  3  The ring topology of $r$-neighborhood particles

    图  4  散开算子

    Fig.  4  Scattering operator

    图  5  算法流程图

    Fig.  5  Algorithm flowchart

    图  6  成功运行百分比柱状图

    Fig.  6  Histogram of successful running percentage

    图  7  平均最佳适应值柱状图

    Fig.  7  Histogram of average best fitness value

    图  8  函数模型曲线

    Fig.  8  Function model curve

    图  9  相对误差分布比较

    Fig.  9  Comparison of the relative error distribution

    图  10  具有不同算子的算法迭代曲线比较

    Fig.  10  Iterative curve comparison of algorithms with different operators

    图  11  日本人口预测模型

    Fig.  11  Model of Japanese population prediction

    表  1  粒子编码

    Table  1  Particle coding

    Node code012345678910
    $X$+expln$x$212$x $1$x $$x$
    下载: 导出CSV

    表  2  粒子$X$、$gbest$、$pbest$以及$X'$的编码

    Table  2  The code of particle $X$, $gbest$, $pbest$ and $X'$

    Node code012345678910
    $X$+$ x $$\sin$$-$ln$ x $$a$$ x $$ x $$a$$ a$
    $pbest$+$ x $$\cos$/$-$$ x $$a$$ x $$a$$ x $$a$
    $gbest$$\exp$+$a$$-$ln$a$$a$$ x $$a$$ x $$a$
    $X'$++$\sin$/ln$ x $$a$$ x $$ x $$a$$a$
    下载: 导出CSV

    表  3  粒子$X$和$X'$的编码

    Table  3  The code of particle $X$ and $X'$

    Node code012345678910
    $X$++sin/ln$ x $$a$$ x $$ x $$a$$a$
    $X'$+$\times$sincoslna$a$$ x $$ x $$a$$a$
    下载: 导出CSV

    表  4  算法参数

    Table  4  Algorithm parameters

    参数参数取值
    节点评价个数$15\, 000\, 000$
    种群数量50
    首段长度10
    约束长度18
    罚因子0.0001
    环状结构邻域大小$2r+1, r=2$
    终止符$x, 1$ (单变量); $x, y$ (双变量)
    函数及运算符$+, -, \times, /$, sin, cos, e$^x$, ln
    测试次数100
    下载: 导出CSV

    表  5  测试函数

    Table  5  Benchmark functions

    FunctionsFitcases
    $F1=x^3+x^2+x$20 random points $\subseteq [-1, 1]$
    $F2=x^4+x^3+x^2+x$20 random points $\subseteq [-1, 1]$
    $F3=x^5+x^4+x^3+x^2+x$20 random points $\subseteq [-1, 1]$
    $F4=x^6+x^5+x^4+x^3+x^2+x$20 random points $\subseteq [-1, 1]$
    $F5=\sin(x^2)\cos(x)-1$20 random points $\subseteq [-1, 1]$
    $F6=\sin(x)+\sin(x+x^2)$20 random points $\subseteq [-1, 1]$
    $F7={\rm ln}(x+1)+{\rm ln}(x^2+1)$20 random points $\subseteq [0, 2]$
    $F8=\sqrt x$20 random points $\subseteq [0, 4]$
    $F9=\sin(x)+\sin(y^2)$100 random points $\subseteq [-1, 1]$
    $F10=2\sin(x)\cos(y)$100 random points $\subseteq [-1, 1]$
    下载: 导出CSV

    表  6  成功运行百分比(%)

    Table  6  Percentage of successful runs (%)

    TechniquesFunctions
    F1F2F3F4F5F6F7F8F9F10
    本文算法${\bf 100 }$${\bf 82}$${\bf 22}$3${\bf 60}$841179${\bf 94}$72
    AFSA978092477326${\bf 98}$78${\bf 80}$
    ABCP8950${\bf 22}$${\bf 12}$57${\bf 87}$58373321
    SC48227420353516718
    NSM48164419364028417
    SAC25325741732251344
    SAC35619622123251238
    SAC453171112023291438
    SAC553171111927301238
    CAC13419771222259115
    CAC23420771323259216
    CAC435227812222610316
    SBS31431596312831171333
    SBS32422678362744301727
    SBS345121109343346252633
    SBS41412295313438251933
    SBS4250221710413251242433
    SBS444025169354342283334
    SSC86628${\bf 22}$10485659212547
    SSC12673314${\bf 12}$474766383751
    SSC16553920114644${\bf 67}$293059
    SSC205827109524863263951
    下载: 导出CSV

    表  7  平均最佳适应值(%)

    Table  7  Mean best fitness values (%)

    TechniquesFunctions
    F1F2F3F4F5F6F7F8F9F10
    本文算法${\bf 0.00}$${\bf 0.03}$0.300.35${\bf 0.02}$${\bf 0.01}$0.140.05${\bf 0.01}$0.82
    AFSA${\bf 0.00}$0.060.350.33${\bf 0.02}$0.050.13${\bf 0.01}$0.23${\bf 0.69}$
    ABCP0.010.05${\bf 0.07}$${\bf 0.10}$0.050.02${\bf 0.06}$0.100.471.06
    SC0.180.260.390.410.210.220.130.265.542.26
    NSM0.160.290.340.400.190.170.110.195.442.16
    SAC20.160.270.420.500.220.230.150.275.993.19
    SAC30.130.270.410.480.180.230.150.275.773.13
    SAC40.150.290.400.460.170.220.150.265.773.03
    SAC50.150.290.510.460.170.210.150.265.772.98
    CAC10.330.410.520.530.310.420.170.357.834.40
    CAC20.320.410.530.530.310.420.170.357.384.30
    CAC40.330.410.330.530.300.420.170.197.804.32
    SBS310.180.290.300.360.170.300.150.184.782.75
    SBS320.180.230.290.360.130.280.100.184.472.77
    SBS340.160.230.310.330.130.210.110.194.172.90
    SBS410.180.260.270.380.120.200.130.204.402.75
    SBS420.120.240.290.300.120.180.100.163.952.76
    SBS440.180.240.330.350.150.160.110.192.851.75
    SSC80.090.150.190.290.100.090.070.153.911.53
    SSC120.170.170.180.280.100.120.070.133.541.45
    SSC160.100.150.230.260.100.10${\bf 0.06}$0.143.111.22
    SSC200.100.180.230.300.090.10${\bf 0.06}$0.142.641.23
    下载: 导出CSV

    表  8  "Ⅴ"型函数建模的算法参数

    Table  8  Algorithm parameters for "V" function

    参数参数取值
    迭代次数50 000
    种群数量300
    首段长度25
    约束长度35
    罚因子0.05
    环状结构邻域大小$2r+1, r=2$
    终止符$x, 1, 0.2, 2, 4, 0.7, 0.1, 0.1415, 0.01, 3$
    函数及运算符$+, -, \times, /, \sqrt{~~}, \sin, \cos, {\rm e}^x, \ln, x^2$
    测试次数100
    下载: 导出CSV

    表  9  与文献[20]结果的比较

    Table  9  Comparison with results in reference [20]

    算法评价指标F1F2F3F11
    TSOAverage fitness00.090.630.26
    Standard deviation00.350.790.36
    Perfect hits in % 100925752
    Number of evaluations1 2204 6607 0009 020
    本文算法Average fitness000.040.23
    Standard deviation000.260.30
    Perfect hits in % 1001009773
    Number of evaluations6921 5896 6458 350
    下载: 导出CSV

    表  10  观测数据

    Table  10  Observation data

    $x$1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
    13, 14, 15, 16, 17, 18, 19, 20
    $y$0.250841, $-$0.373517, $-$1.957016, $-$2.699129, 1.619885
    4.684452, 2.686799, $-$0.501882, $-$4.718648, $-$7.787834
    0.056755, 9.363730, 6.761007, 0.824883, $-$5.821076
    $ -$12.512195, $-$5.349206, 12.234454, 12.324229, 3.656108
    下载: 导出CSV

    表  11  不同约束长度下的函数模型

    Table  11  Function models with different constrained length

    约束长度有效长度编码模型表达式
    2524$\times, \cos, \times, x, /, \cos, x, \sqrt{~~}, \cos, \exp, \cos, \sin, \sin, x, \sqrt{~~}, \cos, $$\dfrac{x\cdot \cos x}{\sqrt {{\rm e}^{\sin x}}}\cdot \cos(\cos(\cos(\sin(\sqrt{\cos(\dfrac{\sqrt{x}-x}{x^2})}))))$
    $/, -, \times, \sqrt{~~}, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x$
    2221$\times, \cos, /, x, x, \sqrt{~~}, \exp, +, \sin, \sin, \sin, x, \cos, /, \ln, \cos, \sqrt{~~}, $ $\dfrac{x\cdot \cos x}{\sqrt{{\rm exp}({\sin x}+ \sin(\sin(\cos(\dfrac{\ln{\sqrt {{\rm e}^{\cos x}}}}{\cos x}))))}}$
    $x, \exp, \cos, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x$
    2019$\times, /, x, \cos, \sqrt{~~}, x, \exp, +, \cos, \sin, \sin, x, \sqrt{~~}, \cos, /, x, +, $ $\dfrac{x\cdot \cos x}{\sqrt{{\rm exp}({\sin x}+ \cos(\sin(\sqrt{\cos(\dfrac{x}{2x})})))}}$
    $/x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x$
    1818$ /, \cos, /, x, \sqrt{~~}, x, +, \exp, -, \sin, \exp, \ln, x, \sin, /, x, x, $ $\dfrac{x\cdot \cos x}{\sqrt{{\rm e}^{\sin x}+{\rm e}^{\sin x}-\ln{(\dfrac{x}{x})}}}$
    $x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x$
    1513$/, \cos, /, x, \sqrt{~~}, x, +, \exp, \exp, \sin, \sin, x, x, x, x, x, \times, $ $\dfrac{x\cdot \cos x}{\sqrt{{\rm e}^{\sin x}+{\rm e}^{\sin x}}}$
    $x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x$
    下载: 导出CSV

    表  12  预测值(百万)

    Table  12  Predicted values (million)

    年份文献[31]本文算法二者之差
    2020125.325125.2380.087
    2030119.125119.9020.777
    2040110.919112.3851.466
    2050101.923103.4251.502
    206092.84093.8451.005
    下载: 导出CSV
  • [1] 刘强, 秦泗钊.过程工业大数据建模研究展望.自动化学报, 2016, 42(2): 161-171 doi: 10.16383/j.aas.2016.c150510

    Liu Qiang, Qin S. Joe. Perspectives on big data modeling of process industries. Acta Automatica Sinica, 2016, 42(2): 161-171 doi: 10.16383/j.aas.2016.c150510
    [2] 司小胜, 胡昌华, 周东华.带测量误差的非线性退化过程建模与剩余寿命估计.自动化学报, 2013, 39(5): 530-541 doi: 10.3724/SP.J.1004.2013.00530

    Si Xiao-Sheng, Hu Chang-Hua, Zhou Dong-Hua. Nonlinear degradation process modeling and remaining useful life estimation subject to measurement error. Acta Automatica Sinica, 2013, 39(5): 530-541 doi: 10.3724/SP.J.1004.2013.00530
    [3] Yao X, Liu Y, Li J, He J. Current developments and future directions of bio-inspired computation and implications for ecoinformatics. Ecological informatics, 2006, 1(1): 9-22 doi: 10.1016/j.ecoinf.2005.07.001
    [4] Koza J R. Genetic Programming: On the Programming of Computers by Means of Natural Selection. Cambridge: MIT Press, 1992.
    [5] Ferreira C. Gene Expression Programming: A New Adaptive Algorithm for Solving Problems. Computer Science, 2001, 21(2): 87-129
    [6] Yassin M A, Alazba A A, Mattar M A. A new predictive model for furrow irrigation infiltration using gene expression programming. Computers and Electronics in Agriculture, 2016, 122: 168-175 doi: 10.1016/j.compag.2016.01.035
    [7] Mohamed M. Mostafa, Ahmed A. El-Masry. Oil price forecasting using gene expression programming and artificial neural networks. Economic Modelling, 2016, 54(1): 40-53 http://www.sciencedirect.com/science/article/pii/S0264999315004101
    [8] Weatheritt J, Sandberg R. A novel evolutionary algorithm applied to algebraic modifications of the RANS stress-strain relationship. Journal of Computational Physics, 2016, 325: 22-37 doi: 10.1016/j.jcp.2016.08.015
    [9] Phukoetphim P, Shamseldin A Y, Adams K. Multimodel approach using neural networks and symbolic regression to combine the estimated discharges of rainfall-runoff models. Journal of Hydrologic Engineering, 2016, 21(8): 04016022 doi: 10.1061/(ASCE)HE.1943-5584.0001332
    [10] Razaq S A, Shahid S, Ismail T, et al. Prediction of flow duration curve in ungauged catchments using genetic expression programming. Procedia Engineering, 2016, 154: 1431-1438 doi: 10.1016/j.proeng.2016.07.516
    [11] Soule T. Code growth in genetic programming. In: Proceedings of the 1995 Conference Genetic Programming. Atlanta, GA, USA: ACM, 1995. 148-159
    [12] Ragalo A W, Pillay N. A building block conservation and extension mechanism for improved performance in polynomial symbolic regression tree-based genetic programming. In: Proceedings of the 2012 Nature and Biologically Inspired Computing. New York, USA: IEEE, 2012. 123-129
    [13] Sotto L F D P, de Melo V V. Investigation of linear genetic programming techniques for symbolic regression. In: Proceedings of the 2014 Brazilian Conference on Intelligent Systems. New York, USA: IEEE, 2014. 146-151
    [14] Peng Y Z, Yuan C A, Qin X, et al. An improved gene expression programming approach for symbolic regression problems. Neurocomputing, 2014, 137(15): 293-301 http://www.wanfangdata.com.cn/details/detail.do?_type=perio&id=9b3fe6fd0d6560aabb8e480a7ce4be02
    [15] Zhong J, Ong Y S, Cai W. Self-learning gene expression programming. IEEE Transactions on Evolutionary Computation, 2016, 20(1): 65-80 http://www.wanfangdata.com.cn/details/detail.do?_type=perio&id=13200aa4d92c48a51b9eb60989f98571
    [16] Olmo J L, Romero J R, Ventura S. Swarm-based metaheuristics in automatic programming: A survey. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 2014, 4(6): 445-469 doi: 10.1002/widm.1138
    [17] Karaboga D, Ozturk C, Karaboga N, et al. Artificial bee colony programming for symbolic regression. Information Sciences, 2012, 209: 1-15 doi: 10.1016/j.ins.2012.05.002
    [18] Liu Q, Odaka T, Kuroiwa J, et al. Application of an artificial fish swarm algorithm in symbolic regression. IEICE Transactions on Information & Systems, 2013, E96. D(4): 872-885 http://www.researchgate.net/publication/274462480_Application_of_an_Artificial_Fish_Swarm_Algorithm_in_Symbolic_Regression
    [19] Qi F, Ma Y H, Liu X Y, Ji G Y. A hybrid genetic programming with particle swarm optimization. In: Proceedings of the 2013 International Conference in Swarm Intelligence. Berlin, Germany: Springer-Verlag, 2013. 11-18
    [20] Veenhuis C, Koppen M, Kruger J, Nickolay B. Tree swarm optimization: An approach to PSO-based tree discovery. In: Proceedings of the 2005 IEEE Congress on Evolutionary Computation. New York, USA: IEEE, 2005. 1238-1245
    [21] Kennedy J, Eberhart R C. Particle swam optimization. In: Proceedings of the 1995 Neural Networks. Picataupy, USA: IEEE, 1995. 1942-1948
    [22] Zhang S, Xu J, Lee L H, et al. Optimal computing budget allocation for particle swarm optimization in stochastic optimization. IEEE Transactions on Evolutionary Computation, 2017, 21(2): 206-219 doi: 10.1109/TEVC.2016.2592185
    [23] Wu H Y, Nie C H, Kuo F C, Leung H, Colbourn C J. A discrete particle swarm optimization for covering array generation. IEEE Transactions on Evolutionary Computation, 2015, 19(4): 575-591 doi: 10.1109/TEVC.2014.2362532
    [24] Gong Y J, Zhang J, Chung H S H, et al. An efficient resource allocation scheme using particle swarm optimization. IEEE Transactions on Evolutionary Computation, 2012, 16(6): 801-816 doi: 10.1109/TEVC.2012.2185052
    [25] Lee K B, Kim J H. Multiobjective particle swarm optimization with preference-based sort and its application to path following footstep optimization for humanoid robots. IEEE Transactions on Evolutionary Computation, 2013, 17(6): 755-766 doi: 10.1109/TEVC.2013.2240688
    [26] Thongchart Kerdphol, Kiyotaka Fuji, Yasunori Mitani, Yaser Qudaih. Optimization of a battery energy storage system using particle swarm optimization for stand-alone microgrid. International Journal of Electrical Power & Energy Systems, 2016, 81: 32-39 http://www.wanfangdata.com.cn/details/detail.do?_type=perio&id=03d5dc3e6af2f9033ef9603d73ebbbc0
    [27] Iba H, Garis H D, Sato T. Genetic programming using a minimum description length principle. Advances in Genetic Programming, 1994: 265-284 http://www.wanfangdata.com.cn/details/detail.do?_type=perio&id=1611327e61e1eed6ae6e76abb3a46aa1
    [28] Liu Q F, Wei W H, Yuan H Q, Li Y. Topology selection for particle swarm optimization. Information Sciences, 2016, 363: 154-173 doi: 10.1016/j.ins.2016.04.050
    [29] 周晓君, 阳春华, 桂卫华, 董天雪.带可变随机函数和变异算子的粒子群优化算法.自动化学报, 2014, 40(7): 1339-1347 doi: 10.3724/SP.J.1004.2014.01339

    Zhou Xiao-Jun, Yang Chun-Hua, Gui Wei-Hua, Dong Tian-Xue. A particle swarm optimization algorithm with variable random functions and mutation. Acta Automatica Sinica, 2014, 40(7): 1339-1347 doi: 10.3724/SP.J.1004.2014.01339
    [30] Uy N Q, Hoai N X, O'Neill M, et al. Semantically-based crossover in genetic programming: Application to real-valued symbolic regression. Genetic Programming and Evolvable Machines, 2011, 12(2): 91-119 doi: 10.1007/s10710-010-9121-2
    [31] Population Statistics of Japan 2017[Online], available: http://www.ipss.go.jp/p-info/e/psj2017/PSJ2017.asp, October 9, 2018
  • 加载中
图(11) / 表(12)
计量
  • 文章访问数:  1477
  • HTML全文浏览量:  263
  • PDF下载量:  147
  • 被引次数: 0
出版历程
  • 收稿日期:  2018-01-16
  • 录用日期:  2018-10-06
  • 刊出日期:  2020-08-26

目录

    /

    返回文章
    返回