Semi-Supervised Concept Drift Detection Method by Combining Sample Output Space and Feature Space with Its Application

Sun Zi-Jian, Tang Jian, Qiao Jun-Fei

Citation: Sun Zi-Jian, Tang Jian, Qiao Jun-Fei. Semi-supervised concept drift detection method by combining sample output space and feature space with its application. Acta Automatica Sinica, 2021, x(x): 1−14 doi: 10.16383/j.aas.c200984
Funds: Supported by the National Natural Science Foundation of China (62073006, 62021003, 61890930-5), the Natural Science Foundation of Beijing, China (4212032, 4192009), the National Key R&D Program of China (2018YFC1900800-5), and the National (Beijing) Key Laboratory of Automatic Control Technology for Mining and Metallurgical Process (BGRIMM-KZSKL-2020-02)
More Information
    Author Bio:

    SUN Zi-Jian  Master student at the Faculty of Information Technology, Beijing University of Technology. His research interest covers concept drift detection and soft measurement of difficult-to-measure parameters in the municipal solid waste incineration process. E-mail: sunzj@emails.bjut.edu.cn

    TANG Jian  Professor at the Faculty of Information Technology, Beijing University of Technology. His research interest covers small sample data modeling and intelligent control of the municipal solid waste treatment process. E-mail: freeflytang@bjut.edu.cn

    QIAO Jun-Fei  Professor at the Faculty of Information Technology, Beijing University of Technology. His research interest covers intelligent control of the wastewater treatment process, and structure design and optimization of neural networks. Corresponding author of this paper. E-mail: junfeiq@bjut.edu.cn

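  • Abstract: The municipal solid waste incineration (MSWI) process exhibits concept drift because of fluctuating waste composition, equipment wear and maintenance, and seasonal change, so the data used to model pollutant emission concentrations are time-varying. New samples that represent the concept drift must therefore be identified in order to update the pollutant measurement model, but existing drift detection methods are hard to apply effectively to industrial processes in which the true values of modeling samples are difficult to obtain. To address this problem, this paper proposes a semi-supervised concept drift detection method that combines the sample output space and the feature space. First, an unsupervised mechanism based on principal component analysis (PCA) identifies concept drift samples in the feature space. Then, in the sample output space, a semi-supervised mechanism based on temporal-difference (TD) learning labels these samples with pseudo-true values, after which the Page-Hinkley test confirms which samples represent concept drift. Finally, the new samples obtained by these steps are combined with historical samples to update the model. Simulation results on synthetic and real industrial process data sets show that the proposed method outperforms existing methods, strengthening the model's adaptability to drift while effectively reducing the cost of sample labeling.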
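The first stage described in the abstract, flagging drift candidates in the feature space with PCA, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how the control limits are computed, so the sketch uses empirical quantiles of the Hotelling T² and SPE statistics on historical data. The names `fit_pca_monitor`, `flag_drift`, `var_keep`, and `conf`, and the synthetic data, are hypothetical.

```python
import numpy as np

def _t2_spe(Z, P, lam):
    """Return Hotelling T^2 and SPE statistics for standardized rows Z."""
    T = Z @ P                          # scores in the principal subspace
    t2 = np.sum(T**2 / lam, axis=1)    # variance-scaled distance inside the subspace
    resid = Z - T @ P.T                # part of each sample the PCA model cannot explain
    spe = np.sum(resid**2, axis=1)     # squared prediction error
    return t2, spe

def fit_pca_monitor(X_train, var_keep=0.9, conf=0.9):
    """Fit PCA on historical samples and set control limits.

    Assumption: limits are empirical quantiles at level `conf`,
    not the analytical limits a paper might use.
    """
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0, ddof=1)
    std[std == 0] = 1.0                # guard against constant features
    Z = (X_train - mean) / std
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    eigvals = s**2 / (len(Z) - 1)      # eigenvalues of the covariance of Z
    ratio = np.cumsum(eigvals) / eigvals.sum()
    k = min(int(np.searchsorted(ratio, var_keep)) + 1, len(eigvals))
    P, lam = Vt[:k].T, eigvals[:k]
    t2, spe = _t2_spe(Z, P, lam)
    return {"mean": mean, "std": std, "P": P, "lam": lam,
            "t2_lim": np.quantile(t2, conf), "spe_lim": np.quantile(spe, conf)}

def flag_drift(monitor, X_new):
    """Flag samples whose T^2 or SPE exceeds its control limit."""
    Z = (X_new - monitor["mean"]) / monitor["std"]
    t2, spe = _t2_spe(Z, monitor["P"], monitor["lam"])
    return (t2 > monitor["t2_lim"]) | (spe > monitor["spe_lim"])

# Hypothetical data: 5 features driven by 2 latent factors plus noise.
rng = np.random.default_rng(0)
W = np.array([[1.0, 0.5, -1.0, 0.8, 0.2],
              [0.3, -1.0, 0.5, 0.1, 1.0]])

def sample(n):
    return rng.normal(size=(n, 2)) @ W + 0.1 * rng.normal(size=(n, 5))

X_hist = sample(500)
monitor = fit_pca_monitor(X_hist)
X_drift = sample(200) + 5.0            # same structure, shifted mean
print("flag rate, historical:", flag_drift(monitor, X_hist).mean())
print("flag rate, shifted:   ", flag_drift(monitor, X_drift).mean())
```

With quantile limits, roughly `1 - conf` of the historical samples exceed each statistic by construction, so a lower `conf` sends more samples on to the labeling stage and a higher one sends fewer.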
  • Fig. 1  Flow chart of the MSWI process

    Fig. 2  Common ways of handling concept drift

    Fig. 3  Strategy of the proposed algorithm

    Fig. 4  Changes of each feature in the concept drift environment

    Fig. 5  Measurement results of the original model

    Fig. 6  Drift detection results in the feature space

    Fig. 7  Pseudo-true value labeling results for samples with concept drift in the feature space

    Fig. 8  Drift detection results in the output space

    Fig. 9  Changes in model measurement error after adopting the proposed drift detection algorithm

    Fig. 10  Changes in model measurement error when using different algorithms

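    Table 1  Parameters of each data set

    | Data set | Total samples | Modeling samples | Validation samples | Drift samples | Feature space dimension |
    |---|---|---|---|---|---|
    | Synthetic | 1500 | 500 | 500 | 500 | 5 |
    | Process | 1500 | 500 | 500 | 500 | 18 |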

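    Table 2  Simulation parameter settings

    | Parameter | Synthetic data set | Process data set |
    |---|---|---|
    | GPR kernel function | Radial basis function | Radial basis function |
    | Kernel width | 0.5967 | 1.5116 |
    | Kernel characteristic length | 0.7939 | 1.4734 |
    | Window capacity for samples to be labeled (w) | 8 | 50 |
    | PCA control limit confidence (ConfSPE, ConfT2) | 0.8, 0.8 | 0.9, 0.9 |
    | Number of nearest neighbors in TD learning (ε) | 6 | 5 |
    | Page-Hinkley baseline cumulative mean measurement error (${\phi _0}$) | 2.2919 | 16.8846 |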
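The Page-Hinkley test named in Table 2 can be sketched in its textbook form, which raises an alarm when the cumulative deviation of the measurement error from its running mean rises far enough above its historical minimum. This is an illustration only: the excerpt does not explain how the baseline ${\phi _0}$ enters the authors' test, so the sketch uses a running mean instead, and the class name `PageHinkley` and the values of `delta` and `threshold` are assumptions.

```python
class PageHinkley:
    """Textbook Page-Hinkley test for an upward shift in a stream's mean."""

    def __init__(self, delta=0.05, threshold=5.0):
        self.delta = delta            # tolerated magnitude of change (assumed value)
        self.threshold = threshold    # alarm threshold, often written lambda (assumed value)
        self.n = 0
        self.mean = 0.0               # running mean of the observed errors
        self.cum = 0.0                # cumulative deviation m_t
        self.cum_min = 0.0            # smallest m_t seen so far

    def update(self, error):
        """Feed one measurement error; return True if drift is signalled."""
        self.n += 1
        self.mean += (error - self.mean) / self.n
        self.cum += error - self.mean - self.delta
        self.cum_min = min(self.cum_min, self.cum)
        return self.cum - self.cum_min > self.threshold


# Hypothetical error stream: stable at 1.0, then jumping to 4.0 at index 50.
errors = [1.0] * 50 + [4.0] * 50
detector = PageHinkley(delta=0.05, threshold=5.0)
alarms = [t for t, e in enumerate(errors) if detector.update(e)]
print("first alarm at index:", alarms[0] if alarms else None)
```

A larger `threshold` delays the alarm but makes it less sensitive to isolated large errors, which matters when the errors are computed from pseudo-true values rather than measured ones.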

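    Table 3  Detection information of the proposed algorithm

    | Item | Synthetic data set | Process data set |
    |---|---|---|
    | Times the buffer window was filled | 50 | 9 |
    | Model updates | 44 | 8 |
    | Drift samples labeled with pseudo-true values | 350 | 441 |
    | RMSE of the original model | 7.6478 | 53.0210 |
    | RMSE of the model after applying the proposed algorithm | 2.5840 | 28.8785 |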
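The RMSE figures in Table 3 can be turned into relative reductions with a short calculation. The snippet below uses only the four RMSE values from the table; the helper name `relative_reduction` is illustrative.

```python
# RMSE before and after applying the proposed algorithm, taken from Table 3.
rmse = {
    "synthetic": (7.6478, 2.5840),
    "process": (53.0210, 28.8785),
}

def relative_reduction(before, after):
    """Fraction by which the error dropped relative to the original model."""
    return (before - after) / before

reductions = {name: relative_reduction(b, a) for name, (b, a) in rmse.items()}
for name, value in reductions.items():
    print(f"{name}: {value:.1%}")
```

This gives a reduction of about 66.2% on the synthetic data set and about 45.5% on the process data set.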

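    Table 4  Comparison of detection performance of different algorithms

    | Data set | Detection algorithm | Model updates | True values required for updating | Model measurement RMSE | Remarks |
    |---|---|---|---|---|---|
    | Synthetic | Unsupervised | 101 | 101 | 2.5846 | Updating requires true values |
    | Synthetic | Supervised | 99 | 990 | 2.2943 | Detection and updating require true values |
    | Synthetic | Proposed algorithm | 44 | 50 | 2.5840 | Updating uses pseudo-true values |
    | Process | Unsupervised | 463 | 463 | 35.8261 | Updating requires true values |
    | Process | Supervised | 19 | 450 | 28.4729 | Detection and updating require true values |
    | Process | Proposed algorithm | 8 | 9 | 28.8785 | Updating uses pseudo-true values |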

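    Table 5  Comparison of measurement performance of different models

    | Data set | Measurement model | Kernel function (kernel width) | Minimum leaf size | Training RMSE | Training R2 | Measurement RMSE |
    |---|---|---|---|---|---|---|
    | Synthetic | SVR | Radial basis (0.5600) |  | 0.2479 | 0.94 | 3.7900 |
    | Synthetic | RT |  | 4 | 0.3034 | 0.91 | 3.1241 |
    | Synthetic | GPR | Radial basis (0.5967) |  | 0.1899 | 0.96 | 2.5840 |
    | Process | SVR | Radial basis (1.1000) |  | 0.1369 | 0.98 | 30.3916 |
    | Process | RT |  | 4 | 0.1630 | 0.97 | 29.9548 |
    | Process | GPR | Radial basis (1.5116) |  | 0.1348 | 0.98 | 28.8785 |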

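    Table 6  Influence of different distance functions on model updating performance

    | Data set | Distance function | Mean pseudo-true value labeling error | Model measurement RMSE |
    |---|---|---|---|
    | Synthetic | Manhattan distance | 3.3434 | 3.1939 |
    | Synthetic | Chebyshev distance | 3.2382 | 3.2484 |
    | Synthetic | Euclidean distance | 3.2760 | 2.5840 |
    | Process | Manhattan distance | 38.0043 | 28.9954 |
    | Process | Chebyshev distance | 37.7392 | 28.9947 |
    | Process | Euclidean distance | 35.9429 | 28.8785 |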

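    Table 7  Algorithm performance for different values of the variable parameters

    | Window capacity w | Nearest neighbors ε | PCA control limits ConfSPE, ConfT2 | Times buffer window filled | Pseudo-true values labeled | Updates | Mean pseudo-true value labeling error | Model measurement RMSE |
    |---|---|---|---|---|---|---|---|
    | 30 | 3 | 0.85, 0.85 | 16 | 464 | 13 | 38.9005 | 31.0823 |
    | 30 | 3 | 0.9, 0.9 | 16 | 464 | 15 | 48.2016 | 35.2513 |
    | 30 | 3 | 0.95, 0.95 | 16 | 464 | 12 | 37.7528 | 28.9876 |
    | 30 | 5 | 0.85, 0.85 | 16 | 464 | 15 | 40.0004 | 30.4071 |
    | 30 | 5 | 0.9, 0.9 | 16 | 464 | 15 | 47.6636 | 34.2694 |
    | 30 | 5 | 0.95, 0.95 | 15 | 435 | 13 | 39.0258 | 31.0078 |
    | 30 | 8 | 0.85, 0.85 | 16 | 464 | 12 | 40.1782 | 28.8912 |
    | 30 | 8 | 0.9, 0.9 | 16 | 464 | 15 | 46.5567 | 32.8323 |
    | 30 | 8 | 0.95, 0.95 | 15 | 435 | 14 | 38.4400 | 30.5321 |
    | 50 | 3 | 0.85, 0.85 | 9 | 441 | 8 | 42.9923 | 30.1536 |
    | 50 | 3 | 0.9, 0.9 | 9 | 441 | 8 | 36.8999 | 29.7216 |
    | 50 | 3 | 0.95, 0.95 | 9 | 441 | 7 | 31.2822 | 29.3330 |
    | 50 | 5 | 0.85, 0.85 | 9 | 441 | 8 | 43.4483 | 29.8960 |
    | 50 | 5 | 0.9, 0.9 | 9 | 441 | 9 | 35.9429 | 28.8785 |
    | 50 | 5 | 0.95, 0.95 | 9 | 441 | 7 | 31.9674 | 29.9178 |
    | 50 | 8 | 0.85, 0.85 | 9 | 441 | 8 | 42.9759 | 29.4615 |
    | 50 | 8 | 0.9, 0.9 | 9 | 441 | 8 | 37.0338 | 29.2796 |
    | 50 | 8 | 0.95, 0.95 | 9 | 441 | 6 | 31.4267 | 29.3356 |
    | 70 | 3 | 0.85, 0.85 | 6 | 414 | 5 | 44.7315 | 33.6308 |
    | 70 | 3 | 0.9, 0.9 | 6 | 414 | 5 | 46.9859 | 36.2573 |
    | 70 | 3 | 0.95, 0.95 | 6 | 414 | 5 | 33.4711 | 33.1686 |
    | 70 | 5 | 0.85, 0.85 | 6 | 414 | 5 | 41.9744 | 32.4663 |
    | 70 | 5 | 0.9, 0.9 | 6 | 414 | 5 | 44.4580 | 34.3495 |
    | 70 | 5 | 0.95, 0.95 | 6 | 414 | 5 | 33.6287 | 34.2660 |
    | 70 | 8 | 0.85, 0.85 | 6 | 414 | 5 | 42.3929 | 31.0446 |
    | 70 | 8 | 0.9, 0.9 | 6 | 414 | 5 | 45.8771 | 34.5003 |
    | 70 | 8 | 0.95, 0.95 | 6 | 414 | 5 | 33.2206 | 33.5950 |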
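The counts in Table 7 follow a simple pattern that can be checked numerically: the number of pseudo-true values labeled equals the number of window fills multiplied by w − 1. The excerpt does not state this rule, so treat it as an observation about the tabulated numbers. One possible reading, which is an assumption, is that one sample per filled window receives a true value while the remaining w − 1 are pseudo-labeled.

```python
# (window capacity w, times the window was filled, pseudo-true values labeled)
# Rows from Table 7, plus the synthetic data set (w = 8) from Tables 2 and 3.
rows = [
    (30, 16, 464),
    (30, 15, 435),
    (50, 9, 441),
    (70, 6, 414),
    (8, 50, 350),
]

def expected_pseudo_labels(w, fills):
    """Pseudo-labels implied by the observed pattern fills * (w - 1)."""
    return fills * (w - 1)

consistent = [expected_pseudo_labels(w, fills) == labeled
              for w, fills, labeled in rows]
print("all rows consistent:", all(consistent))
```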
Publication history
  • Received:  2020-11-27
  • Accepted:  2021-03-02
  • Published online:  2021-05-16
