
Concept Drift Adaptive Method Based on Active-passive Incremental Ensemble

Qi Xiao-Bo, Chen Jia-Ming, Shi Ying, Qi Hui, Guo Hu-Sheng, Wang Wen-Jian

Citation: Qi Xiao-Bo, Chen Jia-Ming, Shi Ying, Qi Hui, Guo Hu-Sheng, Wang Wen-Jian. Concept drift adaptive method based on active-passive incremental ensemble. Acta Automatica Sinica, xxxx, xx(x): x−xx doi: 10.16383/j.aas.c240503


doi: 10.16383/j.aas.c240503 cstr: 32138.14.j.aas.c240503
Funds: Supported by National Natural Science Foundation of China (62476157, U21A20513, 62076154, 62276157), the Shanxi Province Patent Transformation Special Programs (202302009, 202302012), the Basic Research Program (Free Exploration) of Shanxi Province (20210302123334), and Taiyuan Normal University Achievement Transformation and Technology Transfer Base (2023P003)

Author Bio:

    QI Xiao-Bo Associate professor at the School of Computer Science and Technology, Taiyuan Normal University. Her research interest covers data mining and machine learning. E-mail: xbqi@tynu.edu.cn

    CHEN Jia-Ming Master's student at the School of Computer Science and Technology, Taiyuan Normal University. His research interest covers data mining and machine learning. E-mail: chenjiaming1023@163.com

    SHI Ying Ph.D. candidate at the School of Computer and Information Technology, Shanxi University. Her research interest covers image processing and machine learning. E-mail: sy@tynu.edu.cn

    QI Hui Professor at the School of Computer Science and Technology, Taiyuan Normal University. Her research interest covers data mining and machine learning. E-mail: qihui@tynu.edu.cn

    GUO Hu-Sheng Professor at the School of Computer and Information Technology, Shanxi University. His research interest covers data mining and computational intelligence. E-mail: guohusheng@sxu.edu.cn

    WANG Wen-Jian Professor at the Key Laboratory of Computational Intelligence and Chinese Information Processing of the Ministry of Education, Shanxi University. Her research interest covers data mining and machine learning. Corresponding author of this paper. E-mail: wjwang@sxu.edu.cn

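  • Abstract: A data stream is a sequence of data that keeps arriving, without bound, over time. While a stream is being generated, various factors may cause its data distribution to change over time in unpredictable ways; this phenomenon is called concept drift. After a drift occurs, the current learning model must respond promptly to the real-time distribution change in the stream and handle different types of concept drift effectively, so that its generalization performance does not degrade. To address this problem, this paper proposes a concept drift adaptation method based on active-passive incremental ensemble (CDAM-APIE). The method first uses an online incremental ensemble strategy to build a passive ensemble model, which predicts each new sample in real time and uses the outcome to dynamically update the weights of its base models; this lets the model respond quickly to transient changes in the data distribution and strengthens its ability to adapt to concept drift. On this basis, incremental learning and concept drift detection are used to build an active base model, which improves the model's robustness while the stream is stationary and its generalization performance after a drift. Experimental results show that CDAM-APIE responds to concept drift in a timely manner and effectively improves the model's generalization performance.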
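The abstract describes the passive ensemble only at a high level: each new sample is predicted in real time and the outcome is used to re-weight the base models. The Python sketch below shows one way such a scheme can work. It assumes a Dynamic Weighted Majority style rule (DWM is one of the baselines in Table 2) rather than the authors' exact update: a base model that misclassifies the current sample has its weight multiplied by a decay rate `beta`, the weights are renormalised, and prediction is a weighted vote. The class names `PassiveWeightedEnsemble` and `FixedRule` are hypothetical, and reading `beta` as the "weight decay rate" of Table 5 is an assumption.

```python
from collections import defaultdict


class FixedRule:
    """Toy base learner for the demo: a fixed decision rule that never learns."""

    def __init__(self, rule):
        self.rule = rule

    def predict(self, x):
        return self.rule(x)

    def learn(self, x, y):
        pass  # a real base model would update itself incrementally here


class PassiveWeightedEnsemble:
    """Weighted-majority ensemble with multiplicative weight decay (sketch).

    Any object with predict(x) and learn(x, y) methods can be a base learner.
    """

    def __init__(self, learners, beta=0.95):
        if not 0.0 < beta < 1.0:
            raise ValueError("beta must lie in (0, 1)")
        self.learners = list(learners)
        self.beta = beta
        self.weights = [1.0 / len(self.learners)] * len(self.learners)

    def predict(self, x):
        # Sum the weights behind each predicted label and return the heaviest label.
        votes = defaultdict(float)
        for weight, learner in zip(self.weights, self.learners):
            votes[learner.predict(x)] += weight
        return max(votes, key=votes.get)

    def learn(self, x, y):
        """Test-then-train: score each learner on (x, y), decay wrong ones, then update."""
        for i, learner in enumerate(self.learners):
            if learner.predict(x) != y:
                self.weights[i] *= self.beta   # penalise the learner that erred
            learner.learn(x, y)                # incremental update on the new sample
        total = sum(self.weights)
        self.weights = [w / total for w in self.weights]


if __name__ == "__main__":
    above = FixedRule(lambda x: int(x[0] > 0.5))
    below = FixedRule(lambda x: int(x[0] <= 0.5))
    ens = PassiveWeightedEnsemble([above, below], beta=0.9)
    for i in range(20):                        # concept A: label = x > 0.5
        v = i / 20
        ens.learn([v], int(v > 0.5))
    print("weights after concept A:", ens.weights)
    for i in range(40):                        # concept B: labels flipped
        v = (i % 20) / 20
        ens.learn([v], int(v <= 0.5))
    print("weights after concept B:", ens.weights)
```

Because a losing learner's weight shrinks geometrically, recovery after a long stable period can take many samples; DWM itself bounds this by deleting learners whose weight falls below a threshold, and a floor on the weights is another option.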
  • Fig. 1 The overall framework of CDAM-APIE

    Fig. 2 Four types of concept drift

    Fig. 3 Process of passive incremental ensemble

    Fig. 4 Process of active base model

    Fig. 5 Cumulative accuracy of different methods

    Fig. 6 Real-time accuracy of different methods

    Fig. 7 Bonferroni-Dunn test for average real-time accuracy of different methods

    Fig. 8 Comparison of the robustness of different methods

    Fig. 9 Average ranking of different methods (mean ± standard deviation)
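Fig. 4 depicts the active base model, which the abstract says combines incremental learning with concept drift detection. The excerpt does not name the detector, so the sketch below substitutes the classical Drift Detection Method (DDM) of Gama et al.: it tracks the online error rate p and its standard deviation s, remembers the point where p + s was smallest, raises a warning once p + s exceeds p_min + 2·s_min and signals drift once it exceeds p_min + 3·s_min. On a drift, the hypothetical `ActiveBaseModel` wrapper discards its learner and starts a fresh one; `MajorityClass` is only a toy stand-in for a real incremental classifier.

```python
import math
from collections import Counter


class DDM:
    """Drift Detection Method (Gama et al., 2004) over a stream of 0/1 errors."""

    def __init__(self, min_samples=30, warn_level=2.0, drift_level=3.0):
        self.min_samples = min_samples    # no decisions before this many samples
        self.warn_level = warn_level
        self.drift_level = drift_level
        self.reset()

    def reset(self):
        self.n = 0
        self.p = 0.0                      # running error rate
        self.p_min = math.inf             # error rate where p + s was smallest
        self.s_min = math.inf             # standard deviation at that point

    def update(self, error):
        """Record one prediction outcome; return 'stable', 'warning' or 'drift'."""
        self.n += 1
        self.p += (float(error) - self.p) / self.n
        s = math.sqrt(self.p * (1.0 - self.p) / self.n)
        if self.n < self.min_samples:
            return "stable"
        if self.p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = self.p, s
        if self.p + s > self.p_min + self.drift_level * self.s_min:
            self.reset()
            return "drift"
        if self.p + s > self.p_min + self.warn_level * self.s_min:
            return "warning"
        return "stable"


class MajorityClass:
    """Toy incremental learner: predicts the most frequent label seen so far."""

    def __init__(self):
        self.counts = Counter()

    def predict(self, x):
        return self.counts.most_common(1)[0][0] if self.counts else 0

    def learn(self, x, y):
        self.counts[y] += 1


class ActiveBaseModel:
    """Wraps an incremental learner and rebuilds it whenever the detector signals drift."""

    def __init__(self, make_learner, detector=None):
        self.make_learner = make_learner
        self.learner = make_learner()
        self.detector = detector if detector is not None else DDM()
        self.drift_points = []            # sample indices where drift was signalled
        self.t = 0

    def predict(self, x):
        return self.learner.predict(x)

    def learn(self, x, y):
        error = self.learner.predict(x) != y       # test first ...
        if self.detector.update(error) == "drift":
            self.drift_points.append(self.t)
            self.learner = self.make_learner()     # forget the outdated concept
        self.learner.learn(x, y)                   # ... then train
        self.t += 1


if __name__ == "__main__":
    model = ActiveBaseModel(MajorityClass)
    for i in range(1000):
        if i < 500:
            y = 0 if i % 10 == 9 else 1            # concept A: mostly label 1
        else:
            y = 1 if i % 10 == 9 else 0            # concept B: mostly label 0
        model.learn([float(i)], y)
    print("drift signalled at samples:", model.drift_points)
```

The strict `>` comparisons keep an error-free stream (where s = 0) from triggering spurious alarms. Resetting the learner outright is the simplest reaction; a detector's warning zone is often used instead to start training the replacement early.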

    Table 1 Datasets used in the experiments

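    Dataset       Features  Classes  Samples  Drift type   Drifts  Drift positions
    Hyperplane    10        2        100k     Incremental  -       -
    Sea           3         2        100k     Gradual      3       25k, 50k, 75k
    Sea-re        3         2        100k     Recurring    3       25k, 50k, 75k
    LED-gradual   24        10       100k     Gradual      3       25k, 50k, 75k
    LED-abrupt    24        10       100k     Abrupt       1       50k
    RBFblips      20        4        100k     Abrupt       3       25k, 50k, 75k
    Tree          30        10       100k     Abrupt       3       25k, 50k, 75k
    Sine          4         2        100k     Recurring    3       25k, 50k, 75k
    KDDcup99      41        23       494k     -            -       -
    Electricity   6         2        45k      -            -       -
    Covertype     54        7        581k     -            -       -
    Weather       9         3        95k      -            -       -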
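Table 1 lists Sea as a 100k-sample stream with gradual drifts at 25k, 50k and 75k. The excerpt does not give the generator settings, so the sketch below is an assumption-laden reconstruction rather than the authors' setup: it uses the four thresholds θ ∈ {8, 9, 7, 9.5} of the original SEA concepts of Street and Kim, labels a sample 1 when f1 + f2 ≤ θ (the class coding is assumed), and blends consecutive concepts with the sigmoid 1/(1 + e^(-4(t - p)/w)) used by MOA's ConceptDriftStream, where p is the drift position and w the transition width. The function names and the width value are hypothetical.

```python
import math
import random

# Thresholds of the four SEA concepts; the order in which the paper's
# Sea stream visits them is an assumption.
SEA_THRESHOLDS = (8.0, 9.0, 7.0, 9.5)


def sigmoid_switch(t, position, width):
    """Probability that sample t already follows the next concept (MOA-style sigmoid)."""
    z = -4.0 * (t - position) / width
    z = max(-700.0, min(700.0, z))     # keep math.exp within float range
    return 1.0 / (1.0 + math.exp(z))


def sea_stream(n, drift_points, width, seed=0):
    """Yield (x, y, concept) for a SEA stream with gradual drifts at drift_points."""
    if len(drift_points) >= len(SEA_THRESHOLDS):
        raise ValueError("at most three drifts are supported")
    rng = random.Random(seed)
    for t in range(n):
        concept = 0
        for position in drift_points:
            # Nested switch: move to the next concept only if this drift "fires".
            if rng.random() < sigmoid_switch(t, position, width):
                concept += 1
            else:
                break
        x = [rng.uniform(0.0, 10.0) for _ in range(3)]   # third feature is irrelevant
        y = int(x[0] + x[1] <= SEA_THRESHOLDS[concept])
        yield x, y, concept


if __name__ == "__main__":
    stream = list(sea_stream(100_000, [25_000, 50_000, 75_000], width=1_000, seed=1))
    for start in range(0, 100_000, 25_000):
        concepts = {c for _, _, c in stream[start + 1_000:start + 24_000]}
        print(f"samples {start + 1_000}-{start + 24_000}: concepts {sorted(concepts)}")
```

A small `width` approximates an abrupt drift, while a large one spreads the transition over many samples; recurring variants such as Sea-re can be obtained by returning to an earlier threshold.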

    Table 2 Average real-time accuracy of different methods on each dataset

    Dataset       AWE         Oza Bagging  DWM         OOB         AC_OE       ATNN        CDAM-APIE (ours)
    Hyperplane    0.8882 (4)  0.8758 (5)   0.9029 (2)  0.8223 (6)  0.8966 (3)  0.8195 (7)  0.9088 (1)
    Sea           0.8335 (3)  0.8159 (4)   0.8410 (2)  0.7754 (7)  0.8027 (5)  0.7871 (6)  0.8432 (1)
    Sea-re        0.8564 (4)  0.8596 (2)   0.8581 (3)  0.8030 (7)  0.8055 (6)  0.8166 (5)  0.8605 (1)
    LED-gradual   0.6055 (4)  0.5979 (5)   0.5022 (7)  0.6163 (3)  0.5054 (6)  0.6282 (2)  0.6330 (1)
    LED-abrupt    0.5944 (5)  0.6075 (3)   0.4918 (7)  0.5948 (4)  0.5178 (6)  0.6147 (2)  0.6240 (1)
    RBFblips      0.8208 (5)  0.8852 (3)   0.7861 (6)  0.7811 (7)  0.9316 (2)  0.9855 (1)  0.8309 (4)
    Tree          0.3630 (7)  0.4982 (6)   0.6449 (3)  0.6938 (2)  0.5480 (5)  0.6300 (4)  0.8072 (1)
    Sine          0.9331 (4)  0.7489 (7)   0.9363 (3)  0.8595 (6)  0.9155 (5)  0.9381 (1)  0.9374 (2)
    KDDcup99      0.9796 (4)  0.9920 (2)   0.9793 (5)  0.9913 (3)  0.9446 (7)  0.9589 (6)  0.9926 (1)
    Electricity   0.7678 (7)  0.7928 (5)   0.8153 (3)  0.8110 (4)  0.7919 (6)  0.8912 (1)  0.8300 (2)
    Covertype     0.2288 (7)  0.8735 (2)   0.8135 (4)  0.8052 (5)  0.7813 (6)  0.9362 (1)  0.8400 (3)
    Weather       0.8893 (7)  0.9952 (2)   0.9941 (3)  0.9862 (4)  0.9069 (6)  0.9616 (5)  0.9956 (1)
    Average rank  5.1         3.8          4.0         4.8         5.3         3.4         1.6
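The average ranks in Table 2 follow the usual Friedman-style procedure: methods are ranked on each dataset (1 = highest accuracy, ties share the mean rank) and the ranks are averaged. The Python sketch below recomputes them from the table and adds the Bonferroni-Dunn critical difference CD = q_α·sqrt(k(k + 1)/(6N)) that underlies comparisons such as Fig. 7, with q_α taken as the standard normal quantile at 1 - α/(2(k - 1)). The significance level α = 0.05 is an assumption; the excerpt does not state the level used in Fig. 7, so the verdicts printed here need not match that figure.

```python
import math
from statistics import NormalDist

METHODS = ["AWE", "Oza Bagging", "DWM", "OOB", "AC_OE", "ATNN", "CDAM-APIE"]
# Average real-time accuracy from Table 2; columns follow METHODS.
TABLE2 = {
    "Hyperplane":  [0.8882, 0.8758, 0.9029, 0.8223, 0.8966, 0.8195, 0.9088],
    "Sea":         [0.8335, 0.8159, 0.8410, 0.7754, 0.8027, 0.7871, 0.8432],
    "Sea-re":      [0.8564, 0.8596, 0.8581, 0.8030, 0.8055, 0.8166, 0.8605],
    "LED-gradual": [0.6055, 0.5979, 0.5022, 0.6163, 0.5054, 0.6282, 0.6330],
    "LED-abrupt":  [0.5944, 0.6075, 0.4918, 0.5948, 0.5178, 0.6147, 0.6240],
    "RBFblips":    [0.8208, 0.8852, 0.7861, 0.7811, 0.9316, 0.9855, 0.8309],
    "Tree":        [0.3630, 0.4982, 0.6449, 0.6938, 0.5480, 0.6300, 0.8072],
    "Sine":        [0.9331, 0.7489, 0.9363, 0.8595, 0.9155, 0.9381, 0.9374],
    "KDDcup99":    [0.9796, 0.9920, 0.9793, 0.9913, 0.9446, 0.9589, 0.9926],
    "Electricity": [0.7678, 0.7928, 0.8153, 0.8110, 0.7919, 0.8912, 0.8300],
    "Covertype":   [0.2288, 0.8735, 0.8135, 0.8052, 0.7813, 0.9362, 0.8400],
    "Weather":     [0.8893, 0.9952, 0.9941, 0.9862, 0.9069, 0.9616, 0.9956],
}


def rank_row(scores):
    """Rank 1 = highest score; tied scores share the mean of their rank positions."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def average_ranks(table):
    rows = [rank_row(scores) for scores in table.values()]
    return [sum(col) / len(rows) for col in zip(*rows)]


def bonferroni_dunn_cd(k, n, alpha=0.05):
    """Critical difference when k methods are compared with one control over n datasets."""
    q = NormalDist().inv_cdf(1 - alpha / (2 * (k - 1)))
    return q * math.sqrt(k * (k + 1) / (6 * n))


avg = average_ranks(TABLE2)
cd = bonferroni_dunn_cd(len(METHODS), len(TABLE2))
control = avg[METHODS.index("CDAM-APIE")]
for name, r in zip(METHODS, avg):
    verdict = "differs from control" if r - control > cd else "-"
    print(f"{name:12s} {r:.2f}  {verdict}")
print(f"CD = {cd:.3f}")
```

With k = 7 methods and N = 12 datasets the critical difference comes out at roughly 2.33 rank units for α = 0.05; by this measure CDAM-APIE's lead over AWE, DWM, OOB and AC_OE exceeds the threshold, while its lead over Oza Bagging and ATNN does not.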

    Table 3 Recovery speed of different methods (lower values rank better)

    Drift position  Dataset       AWE   Oza Bagging  DWM   OOB   AC_OE  ATNN  CDAM-APIE (ours)
    25k             Sea           0.48  0.52         0.46  0.53  0.76   0.56  0.46
                    LED-gradual   1.15  1.20         0.72  1.09  1.33   1.12  1.16
                    RBFblips      0.29  0.38         0.28  0.37  0.17   0.07  0.15
                    Tree          3.60  1.77         1.39  0.89  1.29   1.33  0.58
                    Average rank  4.8   5.8          2.8   3.8   5.0    3.5   2.3
    50k             Sea           0.20  0.53         0.19  1.17  0.45   0.36  0.17
                    LED-gradual   0.58  0.57         0.63  0.58  0.61   0.54  0.53
                    LED-abrupt    1.63  1.36         1.35  1.36  1.28   1.66  1.47
                    RBFblips      0.86  0.36         1.26  1.17  0.52   0.04  0.87
                    Tree          0.89  1.55         1.60  1.16  1.44   1.56  0.89
                    Average rank  3.6   3.8          5.0   4.6   3.8    4.0   2.6
    75k             Sea           0.52  0.44         0.33  0.61  0.47   0.20  0.33
                    LED-gradual   1.09  0.44         0.51  1.06  1.23   0.99  0.47
                    RBFblips      0.11  0.13         0.24  0.19  0.07   0.02  0.08
                    Tree          2.20  1.74         1.04  2.62  1.44   1.18  0.76
                    Average rank  5.5   3.8          3.5   6.3   4.5    2.3   2.0
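Figs. 5 and 6 contrast cumulative and real-time accuracy, and Table 3 reports how quickly each method recovers after the drifts at 25k, 50k and 75k. The excerpt defines neither the real-time window nor the recovery measure or its unit, so the Python sketch below uses illustrative choices only: real-time accuracy as accuracy over a sliding window of recent predictions, and recovery time as the number of samples after the drift until that curve climbs back to a fixed fraction (`ratio`) of its pre-drift mean. All function names, parameters and defaults are hypothetical.

```python
from collections import deque


def cumulative_accuracy(correct):
    """Accuracy over all predictions so far (the kind of curve shown in Fig. 5)."""
    hits, curve = 0, []
    for n, ok in enumerate(correct, start=1):
        hits += int(ok)
        curve.append(hits / n)
    return curve


def real_time_accuracy(correct, window=1000):
    """Accuracy over the most recent `window` predictions, reported after each sample."""
    recent = deque(maxlen=window)
    hits, curve = 0, []
    for ok in correct:
        if len(recent) == window:
            hits -= recent[0]            # this outcome is about to leave the window
        recent.append(int(ok))
        hits += int(ok)
        curve.append(hits / len(recent))
    return curve


def recovery_time(curve, drift_at, ratio=0.95, lookback=1000):
    """Samples after `drift_at` until `curve` regains `ratio` times its pre-drift mean.

    Returns 0 if the curve never falls below that level and None if it never recovers.
    """
    before = curve[max(0, drift_at - lookback):drift_at]
    if not before:
        raise ValueError("need at least one point before the drift")
    target = ratio * sum(before) / len(before)
    dipped = False
    for t in range(drift_at, len(curve)):
        if curve[t] < target:
            dipped = True
        elif dipped:
            return t - drift_at
    return None if dipped else 0


if __name__ == "__main__":
    # Synthetic outcomes: 2000 correct, a 300-sample failure burst, then correct again.
    flags = [True] * 2000 + [False] * 300 + [True] * 3000
    curve = real_time_accuracy(flags, window=100)
    print("recovery:", recovery_time(curve, drift_at=2000), "samples")
    print("cumulative accuracy at the end:", round(cumulative_accuracy(flags)[-1], 4))
```

The cumulative curve dilutes a post-drift failure over the whole history, which is why a windowed curve is the more natural basis for measuring recovery; the window length trades responsiveness against noise.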

    Table 4 Ablation analysis

    Dataset       Base model   Passive ensemble model   Active base model   CDAM-APIE
    Hyperplane    0.8603       0.8904                   0.8716              0.9088
    Sea           0.8118       0.8315                   0.8333              0.8432
    Sea-re        0.8554       0.8505                   0.8517              0.8605
    LED-gradual   0.5859       0.6266                   0.6217              0.6330
    LED-abrupt    0.6057       0.6154                   0.6197              0.6240
    RBFblips      0.8430       0.7825                   0.8097              0.8309
    Tree          0.4902       0.7873                   0.7917              0.8072
    Sine          0.6562       0.8941                   0.9372              0.9374
    KDDcup99      0.9904       0.9786                   0.9922              0.9926
    Electricity   0.7828       0.8117                   0.8092              0.8300
    Covertype     0.8247       0.8178                   0.8246              0.8400
    Weather       0.9953       0.9771                   0.9929              0.9956
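Table 4 compares the full model with its components. The short Python script below only re-reads the table: it computes how much CDAM-APIE gains over the better of its passive ensemble model and active base model on each dataset, and lists the datasets on which the plain base model scores higher than the full model.

```python
# Table 4: average real-time accuracy of each variant.
ABLATION = {
    #              base    passive  active  CDAM-APIE
    "Hyperplane":  (0.8603, 0.8904, 0.8716, 0.9088),
    "Sea":         (0.8118, 0.8315, 0.8333, 0.8432),
    "Sea-re":      (0.8554, 0.8505, 0.8517, 0.8605),
    "LED-gradual": (0.5859, 0.6266, 0.6217, 0.6330),
    "LED-abrupt":  (0.6057, 0.6154, 0.6197, 0.6240),
    "RBFblips":    (0.8430, 0.7825, 0.8097, 0.8309),
    "Tree":        (0.4902, 0.7873, 0.7917, 0.8072),
    "Sine":        (0.6562, 0.8941, 0.9372, 0.9374),
    "KDDcup99":    (0.9904, 0.9786, 0.9922, 0.9926),
    "Electricity": (0.7828, 0.8117, 0.8092, 0.8300),
    "Covertype":   (0.8247, 0.8178, 0.8246, 0.8400),
    "Weather":     (0.9953, 0.9771, 0.9929, 0.9956),
}


def gain_over_components(table):
    """CDAM-APIE accuracy minus the better of its passive and active parts."""
    return {name: full - max(passive, active)
            for name, (base, passive, active, full) in table.items()}


gains = gain_over_components(ABLATION)
beaten_by_base = [name for name, (base, _, _, full) in ABLATION.items() if base > full]

for name, g in sorted(gains.items(), key=lambda kv: kv[1]):
    print(f"{name:12s} {g:+.4f}")
print(f"mean gain over best component: {sum(gains.values()) / len(gains):.4f}")
print("datasets where the base model is better:", beaten_by_base)
```

On these figures the full model beats both components on all twelve datasets, by about 0.01 on average, with the smallest margins on Sine and KDDcup99; the plain base model scores higher than CDAM-APIE only on RBFblips.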

    Table 5 Average real-time accuracy of CDAM-APIE under different parameter settings

    Fixed data unit k     50                           100                          150
    Weight decay rate β   0.80   0.85   0.90   0.95    0.80   0.85   0.90   0.95    0.80   0.85   0.90   0.95
    Hyperplane            0.9005 0.9020 0.9046 0.9076  0.9050 0.9062 0.9076 0.9088  0.9087 0.9099 0.9106 0.9113
    Sea                   0.8388 0.8400 0.8413 0.8420  0.8417 0.8423 0.8430 0.8432  0.8431 0.8432 0.8431 0.8427
    Sea-re                0.8573 0.8584 0.8591 0.8600  0.8596 0.8599 0.8603 0.8605  0.8604 0.8607 0.8609 0.8606
    LED-gradual           0.6309 0.6313 0.6317 0.6320  0.6314 0.6319 0.6328 0.6330  0.6348 0.6353 0.6358 0.6362
    LED-abrupt            0.6220 0.6223 0.6230 0.6234  0.6229 0.6233 0.6237 0.6240  0.6228 0.6235 0.6241 0.6243
    RBFblips              0.8366 0.8369 0.8378 0.8368  0.8308 0.8314 0.8312 0.8309  0.8503 0.8503 0.8505 0.8500
    Tree                  0.8073 0.8079 0.8086 0.8089  0.8066 0.8068 0.8071 0.8072  0.7960 0.7961 0.7963 0.7970
    Sine                  0.9372 0.9372 0.9372 0.9372  0.9371 0.9373 0.9371 0.9374  0.9378 0.9379 0.9381 0.9381
    KDDcup99              0.9922 0.9922 0.9922 0.9922  0.9922 0.9922 0.9924 0.9926  0.9931 0.9931 0.9931 0.9932
    Electricity           0.8548 0.8515 0.8482 0.8416  0.8401 0.8385 0.8358 0.8300  0.8205 0.8205 0.8207 0.8204
    Covertype             0.8569 0.8543 0.8510 0.8460  0.8455 0.8444 0.8426 0.8400  0.8451 0.8448 0.8441 0.8415
    Weather               0.9946 0.9945 0.9947 0.9956  0.9956 0.9957 0.9956 0.9956  0.9951 0.9952 0.9952 0.9953
    Overall std. dev.     0.1210                       0.1208                       0.1210
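Table 5 varies the fixed data unit k and the weight decay rate β. The Python script below summarises the grid for each dataset: the spread between the best and worst of the twelve settings and the setting that scores highest. It also checks that the k = 100, β = 0.95 column reproduces the CDAM-APIE column of Table 2, which suggests, though the excerpt does not say so, that these were the settings used for the main comparison.

```python
KS = (50, 100, 150)
BETAS = (0.80, 0.85, 0.90, 0.95)
SETTINGS = [(k, b) for k in KS for b in BETAS]   # column order of Table 5

# Table 5: one tuple per dataset, ordered as SETTINGS.
TABLE5 = {
    "Hyperplane":  (0.9005, 0.9020, 0.9046, 0.9076, 0.9050, 0.9062, 0.9076, 0.9088, 0.9087, 0.9099, 0.9106, 0.9113),
    "Sea":         (0.8388, 0.8400, 0.8413, 0.8420, 0.8417, 0.8423, 0.8430, 0.8432, 0.8431, 0.8432, 0.8431, 0.8427),
    "Sea-re":      (0.8573, 0.8584, 0.8591, 0.8600, 0.8596, 0.8599, 0.8603, 0.8605, 0.8604, 0.8607, 0.8609, 0.8606),
    "LED-gradual": (0.6309, 0.6313, 0.6317, 0.6320, 0.6314, 0.6319, 0.6328, 0.6330, 0.6348, 0.6353, 0.6358, 0.6362),
    "LED-abrupt":  (0.6220, 0.6223, 0.6230, 0.6234, 0.6229, 0.6233, 0.6237, 0.6240, 0.6228, 0.6235, 0.6241, 0.6243),
    "RBFblips":    (0.8366, 0.8369, 0.8378, 0.8368, 0.8308, 0.8314, 0.8312, 0.8309, 0.8503, 0.8503, 0.8505, 0.8500),
    "Tree":        (0.8073, 0.8079, 0.8086, 0.8089, 0.8066, 0.8068, 0.8071, 0.8072, 0.7960, 0.7961, 0.7963, 0.7970),
    "Sine":        (0.9372, 0.9372, 0.9372, 0.9372, 0.9371, 0.9373, 0.9371, 0.9374, 0.9378, 0.9379, 0.9381, 0.9381),
    "KDDcup99":    (0.9922, 0.9922, 0.9922, 0.9922, 0.9922, 0.9922, 0.9924, 0.9926, 0.9931, 0.9931, 0.9931, 0.9932),
    "Electricity": (0.8548, 0.8515, 0.8482, 0.8416, 0.8401, 0.8385, 0.8358, 0.8300, 0.8205, 0.8205, 0.8207, 0.8204),
    "Covertype":   (0.8569, 0.8543, 0.8510, 0.8460, 0.8455, 0.8444, 0.8426, 0.8400, 0.8451, 0.8448, 0.8441, 0.8415),
    "Weather":     (0.9946, 0.9945, 0.9947, 0.9956, 0.9956, 0.9957, 0.9956, 0.9956, 0.9951, 0.9952, 0.9952, 0.9953),
}
# CDAM-APIE column of Table 2, for the consistency check.
TABLE2_CDAM = {
    "Hyperplane": 0.9088, "Sea": 0.8432, "Sea-re": 0.8605, "LED-gradual": 0.6330,
    "LED-abrupt": 0.6240, "RBFblips": 0.8309, "Tree": 0.8072, "Sine": 0.9374,
    "KDDcup99": 0.9926, "Electricity": 0.8300, "Covertype": 0.8400, "Weather": 0.9956,
}

spread = {name: max(row) - min(row) for name, row in TABLE5.items()}
best = {name: SETTINGS[row.index(max(row))] for name, row in TABLE5.items()}
default_col = SETTINGS.index((100, 0.95))
matches_table2 = all(TABLE5[name][default_col] == TABLE2_CDAM[name] for name in TABLE5)

for name in TABLE5:
    k, beta = best[name]
    print(f"{name:12s} spread {spread[name]:.4f}  best k={k}, beta={beta:.2f}")
print("k=100, beta=0.95 matches Table 2:", matches_table2)
```

The spread stays below 0.035 on every dataset and is largest on Electricity (0.0344); Electricity and Covertype peak at k = 50, β = 0.80, whereas Hyperplane peaks at k = 150, β = 0.95.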
Publication history
  • Received: 2024-07-15
  • Accepted: 2024-12-13
  • Published online: 2025-01-07
