-
摘要: 针对K2算法过度依赖节点序和节点序搜索算法评价节点序效率较低的问题, 提出一种基于节点块序列约束的局部贝叶斯网络结构搜索算法, 该算法首先通过评分定向构建定向支撑树结构, 在此基础上构建节点块序列, 然后利用节点块序列确定每个节点的潜在父节点集, 通过搜索每个节点的父节点集构建网络结构, 最后对该结构进行非法结构修正得到最优贝叶斯网络结构.利用标准网络将算法与几种不同类型的改进算法进行对比分析, 验证该算法的有效性.Abstract: The performance of the K2 algorithm depends on node ordering heavily. The Bayesian learning algorithm based on node order search needs to evaluate the quality of the node sequence by K2 algorithm, resulting in the problem of low efficiency in large and medium-sized networks. In this paper, a new Bayesian structure learning algorithm is proposed to solve the BN structure learning problem by searching local Bayesian network structure construction using node chunk sequence constraints. The algorithm firstly constructs a directional maximum weight spanning tree structure by using score orientation and generates the node chunk sequence by this structure. Then, the node chunk sequence is used to determine the potential parent node set of each node. The network structure is built by searching the parent node set of each node, and the structure is modified by illegal structure to get the optimal Bayesian network structure. Finally, some experiments are designed to evaluate the performance of the proposed algorithm, in which the standard network is used to compare the algorithm with several different improved algorithms to verify the effectiveness of the algorithm. The results indicate that the proposed algorithm is worthy of being studied in the field of BNs construction.
-
Key words:
- Bayesian network structure learning /
- directional maximum weight spanning tree /
- the node chunk sequence /
- K2 algorithm
1) 本文责任编委 黎铭 -
表 1 标准贝叶斯网络的参数
Table 1 Parameters of standard Bayesian networks
网络 节点数 边数 变量域 最大节点出入度 Alarm 37 46 2~4 6 Insurance 27 52 2~5 9 Hailfinder 56 66 2~11 17 表 2 标准贝叶斯网络结构平均BIC得分
Table 2 Average BIC score in standard Bayesian network structure
网络 1 000 2 000 3 000 5 000 Alarm -10 874.35 -20 382.51 -30 057.28 -48 056.66 Insurance -15 998.20 -30 103.21 -43 863.15 -73 069.35 Hailfinder -67 046.62 -126 837.31 -176 348.73 -279 766.12 表 3 NCSC算法与标准节点序的K2算法在Alarm网络中运行结果
Table 3 Results of NCSC algorithm and standard node sequence K2 algorithm in Alarm network
Alarm BIC Ext (s) 结构 C M R A W 1 000 NCSC -11 410.83 32.98±1.82 41.8 2.1 2.8 6.5 11.4 K2 -11 189.51 10.27±0.61 44.2 1.8 0 4.3 6.1 2 000 NCSC -20 791.86 42.14±1.88 41.7 1.9 2.2 6.2 10.3 K2 -20 605.99 11.53±0.47 44.5 1.5 0 2.7 4.2 3 000 NCSC -31 025.28 45.99±5.93 42.9 1.3 2.2 4.2 7.7 K2 -30 505.01 14.04±1.95 44.9 1.1 0 3.1 4.2 5 000 NCSC -49 812.61 56.25±2.13 43.2 1.2 1.1 4.7 7 K2 -49 235.77 16.51±0.74 45.1 1.0 0.0 3.1 4.1 表 4 NCSC算法与标准节点序的K2算法在Insurance网络中运行结果
Table 4 Results of NCSC algorithm and standard node sequence K2 algorithm in Insurance network
Insurance BIC Ext (s) 结构 C M R A W 1 000 NCSC -16 486.31 17.28±0.34 38.3 11.9 2.3 2.6 16.8 K2 -16 219.71 4.58±0.25 39.7 12.3 0 0.3 12.6 2 000 NCSC -30 756.3 19.61±1.13 39.1 10.7 2.4 3.2 16.3 K2 -30 544.86 5.11±0.31 42.2 9.8 0 0.5 10.3 3 000 NCSC -45 369.23 24.34±3.22 39.8 10.2 2.3 3.3 15.8 K2 -44 748.8 6.32±0.40 42.6 9.4 0 0.5 9.9 5 000 NCSC -73 566.77 29.92±2.91 40.6 9.4 2.3 3.2 14.9 K2 -73 148.28 7.68±0.45 43.8 8.2 0 0.6 8.8 表 5 NCSC算法与标准节点序的K2算法在Hailfinder网络中运行结果
Table 5 Results of NCSC algorithm and standard node sequence K2 algorithm in Hailfinder network
Insurance BIC Ext (s) 结构 C M R A W 1 000 NCSC -78 396.62 232.18±16.31 47.8 16.3 2.8 9.7 29.3 K2 -78 157.99 25.71±1.55 48.3 17.7 0 8.1 25.8 2 000 NCSC -131 604.2 250.58±7.94 50.2 13.9 1.6 10.1 25.6 K2 -130 945.5 30.22±1.51 51.6 14.4 0 9.1 23.5 3 000 NCSC -182 167.2 292.22±20.01 51.5 13.4 1.4 10.3 25.1 K2 -181 831.9 36.58±1.84 52.9 13.1 0 9.5 22.6 5 000 NCSC -283 288.4 356.14±16.77 51.8 12.8 1.8 10.1 24.7 K2 -282 937.2 43.37±1.83 53.5 12.5 0 9.9 22.4 表 6 五种算法在Alarm网络中运行时间(s)
Table 6 Running time of five algorithms in Alarm network (s)
Alarm 1 000 2 000 3 000 5 000 NCSC 32.98±1.82 42.14±1.88 45.99±2.04 56.25±2.13 K2+T 11.72±0.74 14.82±1.62 16.33±1.25 18.82±2.71 MAK 942.60±31.4 1 153.37±32.37 1 299.01±42.55 1 582.35±46.86 SHC 3 474.01±90.80 4 079.66±121.98 4 784.10±156.85 6 013.93±216.64 SAR 46.77±0.81 49.64±3.52 62.30±3.38 77.39±10.91 表 7 五种算法在Insurance网络中运行时间
Table 7 Running time of five algorithms in Insurance network (s)
Insurance 1 000 2 000 3 000 5 000 NCSC 17.28±0.34 19.61±1.13 24.34±3.22 29.92±2.91 K2+T 4.55±0.41 6.03±1.20 7.17±0.83 9.42±2.59 MAK 388.99±12.41 420.61±27.73 481.50±32.86 694.92±30.30 SHC 4 051.46±123.82 5 521.37±179.04 5 701.31±207.10 7 072.01±241.25 SAR 19.38±0.90 19.95±1.55 31.67±4.81 40.60±5.95 表 8 五种算法在Hailfinder网络中运行时间(s)
Table 8 Running time of five algorithms in Hailfinder network (s)
Hailflnder 1 000 2 000 3 000 5 000 NCSC 232.18±6.31 250.58±7.94 292.22±10.73 356.14±14.86 K2+T 28.54±0.92 32.82±0.93 39.21±1.16 49.38±1.76 MAK 2 203.76±61.58 2 611.30±66.94 3 100.82±96.87 3 854.99±120.45 SHC 18 273.69±468.55 23 500.71±318.11 29 255.29±380.52 35 785.38±420.00 SAR 229.07±7.82 264.35±15.53 307.52±12.35 362.27±20.55 -
[1] Tien I, Der Kiureghian A. Algorithms for Bayesian network modeling and reliability assessment of infrastructure systems. Reliability Engineering and System Safety, 2016, 156: 134-147 doi: 10.1016/j.ress.2016.07.022 [2] Gendelman R, Xing H M, Mirzoeva O K, Sarde P, Curtis C, Feiler H S, et al. Bayesian network inference modeling identifies TRIB1 as a novel regulator of cell-cycle progression and survival in cancer cells. Cancer Research, 2017, 77(7): 1575-1585 doi: 10.1158/0008-5472.CAN-16-0512 [3] Lee D, Pan R. A nonparametric Bayesian network approach to assessing system reliability at early design stages. Reliability Engineering and System Safety, 2018, 171: 57-66 doi: 10.1016/j.ress.2017.11.009 [4] Chaturvedi I, Ragusa E, Gastaldo P, Zunino R, Cambria E. Bayesian network based extreme learning machine for subjectivity detection. Journal of the Franklin Institute, 2018, 355(4): 1780-1797 doi: 10.1016/j.jfranklin.2017.06.007 [5] Khakzad N, Van Gelder P. Vulnerability of industrial plants to flood-induced natechs: a Bayesian network approach. Reliability Engineering and System Safety, 2018, 169: 403- 411 doi: 10.1016/j.ress.2017.09.016 [6] 刘建伟, 黎海恩, 罗雄麟.概率图模型学习技术研究进展.自动化学报, 2014, 40(6): 1025-1044 doi: 10.3724/SP.J.1004.2014.01025Liu Jian-Wei, Li Hai-En, Luo Xiong-Lin. Learning technique of probabilistic graphical models: a review. Acta Automatica Sinica, 2014, 40(6): 1025-1044 doi: 10.3724/SP.J.1004.2014.01025 [7] Contaldi C, Vafaee F, Nelson P C. Bayesian network hybrid learning using an elite-guided genetic algorithm. Artificial Intelligence Review, 2019, 52: 245-272 doi: 10.1007/s10462-018-9615-5 [8] Chow C, Liu C. Approximating discrete probability distributions with dependence trees. IEEE Transactions on Information Theory, 1968, 14(3): 462-467 doi: 10.1109/TIT.1968.1054142 [9] Koopman R, Wang S H. Mutual information based labelling and comparing clusters. Scientometrics, 2017, 111(2): 1157 -1167 doi: 10.1007/s11192-017-2305-2 [10] Pearl J. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. San Francisco, CA: Morgan Kaufmann Publishers, 1988. [11] Jiang J K, Wang J Y, Yu H, Xu H J. Poison identification based on Bayesian network: a novel improvement on K2 algorithm via Markov blanket. In: Proceedings of the 2013 Advances in Swarm Intelligence. Lecture Notes in Computer Science. Berlin, Germany: Heidelberg: Springer, 2013. 173- 182 [12] Cooper G F, Herskovits E. A Bayesian method for the induction of probabilistic networks from data. Machine Learning, 1992, 9(4): 309-347 [13] Chen X W, Anantha G, Lin X T. Improving Bayesian network structure learning with mutual information-based node ordering in the K2 algorithm. IEEE Transactions on Knowledge and Data Engineering, 2008, 20(5): 628-640 doi: 10.1109/TKDE.2007.190732 [14] Ko S, Kim D W. An efficient node ordering method using the conditional frequency for the K2 algorithm. Pattern Recognition Letters, 2014, 40: 80-87 doi: 10.1016/j.patrec.2013.12.021 [15] Leray P, Francois O. BNT Structure Learning Package: Documentation and Experiments, Technical Report, Laboratoire PSI, 2006. [16] Faulkner E. K2GA: heuristically guided evolution of Bayesian network structures from data. In: Proceedings of the 2007 IEEE Symposium on Computational Intelligence and Data Mining. Honolulu, HI, USA: IEEE, 2007. 18-25 [17] Wu Y H, McCall J, Corne D. Two novel ant colony optimization approaches for Bayesian network structure learning. In: Proceedings of the 2010 IEEE Congress on Evolutionary Computation. Barcelona, Spain: IEEE, 2010. 1-7 [18] Aouay S, Jamoussi S, Ben Ayed Y. Particle swarm optimization based method for Bayesian network structure learning. In: Proceedings of the 5th International Conference on Modeling, Simulation, and Applied Optimization. Hammamet, Tunisia: IEEE, 2013. 1-6 [19] 刘浩然, 孙美婷, 李雷, 刘永记, 刘彬.基于蚁群节点寻优的贝叶斯网络结构算法研究.仪器仪表报, 2017, 38(1): 143-150 http://d.old.wanfangdata.com.cn/Periodical/yqyb201701019Liu Hao-Ran, Sun Mei-Ting, Li Lei, Liu Yong-Ji, Liu Bin. Study on Bayesian network structure learning algorithm based on ant colony node order optimization. Chinese Journal of Scientific Instrument, 2017, 38(1): 143-150 http://d.old.wanfangdata.com.cn/Periodical/yqyb201701019 [20] Malone B, Kangas K, Järvisalo M, Koivisto M, Myllymäki P. Empirical hardness of finding optimal Bayesian network structures: algorithm selection and runtime prediction. Machine Learning, 2018, 107(1): 247-283 doi: 10.1007/s10994-017-5680-2 [21] 王双成, 冷翠平, 李小琳.小数据集的贝叶斯网络结构学习.自动化学报, 2009, 35(8): 1063-1070 doi: 10.3724/SP.J.1004.2009.01063Wang Shuang-Cheng, Leng Cui-Ping, Li Xiao-Lin. Learning Bayesian network structure from small data set. Acta Automatica Sinica, 2009, 35(8): 1063-1070 doi: 10.3724/SP.J.1004.2009.01063 [22] Schwarz G. Estimating the dimension of a model. The Annals of Statistics, 1978, 6(2): 461-464 http://d.old.wanfangdata.com.cn/OAPaper/oai_doaj-articles_97aaabbf8da8b3a5ff9b5d690a7fdbc3 [23] Murphy K. The Bayes net toolbox for MATLAB. Computing Science and Statistics, 2001, 33: 1024-1034 http://www.researchgate.net/publication/2413249_The_Bayes_Net_Toolbox_for_Matlab [24] Wang J Y, Liu S Y. Novel binary encoding water cycle algorithm for solving Bayesian network structures learning problem. Knowledge-Based Systems, 2018, 150: 95-110 doi: 10.1016/j.knosys.2018.03.007 [25] 刘浩然, 吕晓贺, 李轩, 李世昭, 史永红.基于Bayesian改进算法的回转窑故障诊断模型研究.仪器仪表学报, 2015, 36(7): 1554- 1561 doi: 10.3969/j.issn.0254-3087.2015.07.014Liu Hao-Ran, Lv Xiao-He, Li Xuan, Li Shi-Zhao, Shi Yong-Hong. A study on the fault diagnosis model of rotary kiln based on an improved algorithm of Bayesian. Chinese Journal of Scientific Instrument, 2015, 36(7): 1554-1561 doi: 10.3969/j.issn.0254-3087.2015.07.014 [26] Liu H, Zhou S G, Lam W, Guan J H. A new hybrid method for learning Bayesian networks: separation and reunion. Knowledge-Based Systems, 2017, 121: 185-197 doi: 10.1016/j.knosys.2017.01.029