Abstract: Data stream classification is an important task in data stream mining that aims to capture changing class structures in massive, continuously evolving data. Almost no existing framework simultaneously handles the multi-class imbalance, concept drift, outliers, and high labeling cost that commonly occur in data streams. To address this, we propose an online active learning method for imbalanced data streams (OALM-IDS). AdaBoost is an ensemble classification method that iteratively combines multiple weak classifiers into a strong classifier, and AdaBoost.M2 additionally introduces the confidence of the weak classifiers; both are typically applied to static data. First, we define an importance measure for training samples based on the imbalance ratio and an adaptive forgetting factor, which makes AdaBoost.M2 applicable to imbalanced data streams and improves the performance of the ensemble classifier. Second, we propose an adaptive adjustment method for the margin threshold matrix, which optimizes the label request strategy. Third, we incorporate the degree of concept drift into model construction by defining an adaptive forgetting factor based on a concept drift index, which enables model reconstruction after drift. Comparative experiments on six artificial and four real data streams show that OALM-IDS achieves better classification performance than five existing learning methods for imbalanced data streams.
Key words: active learning, data stream classification, multi-class imbalance, concept drift
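The abstract describes an importance measure built from the imbalance ratio and a forgetting factor, but this page does not give its formula. The following is a minimal sketch of one common way to combine the two ideas, assuming a time-decayed class-proportion estimate and an inverse-frequency weight. The class `DecayedClassStats`, its `cap` parameter, and the fixed (non-adaptive) `forgetting` value are hypothetical and are not the OALM-IDS definition.

```python
class DecayedClassStats:
    """Time-decayed class proportions for a labelled stream (illustrative only)."""

    def __init__(self, n_classes: int, forgetting: float = 0.99):
        if not 0.0 < forgetting < 1.0:
            raise ValueError("forgetting must lie in (0, 1)")
        self.n_classes = n_classes
        # Assumption: a fixed forgetting factor; the paper adapts it to drift.
        self.forgetting = forgetting
        # Start from a uniform estimate so early weights are well defined.
        self.proportion = [1.0 / n_classes] * n_classes

    def update(self, label: int) -> None:
        """Decay every class estimate, then credit the observed class."""
        lam = self.forgetting
        for c in range(self.n_classes):
            hit = 1.0 if c == label else 0.0
            self.proportion[c] = lam * self.proportion[c] + (1.0 - lam) * hit

    def importance(self, label: int, cap: float = 10.0) -> float:
        """Hypothetical weight: above 1 for under-represented classes, capped."""
        uniform = 1.0 / self.n_classes
        return min(cap, uniform / max(self.proportion[label], 1e-12))


if __name__ == "__main__":
    stats = DecayedClassStats(n_classes=3, forgetting=0.9)
    for y in [0] * 30 + [1] * 5:
        stats.update(y)
    for c in range(3):
        print(c, round(stats.proportion[c], 4), round(stats.importance(c), 2))
```

Because each update is a convex combination, the estimated proportions always sum to 1. A smaller forgetting factor makes the estimate react faster to a change in class distribution, which is the role a drift-dependent factor would play.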
Table 1 Characteristics of the data streams

| Data stream | Samples | Features | Classes | Class distribution | Drifts | Outliers |
|---|---|---|---|---|---|---|
| $\mathrm{DS}_1$ | 200000 | 21 | 5 | (0.2, 0.2, 0.2, 0.2, 0.2) | 0 | 0 |
| $\mathrm{DS}_2$ | 200000 | 21 | 5 | (0.2, 0.2, 0.2, 0.2, 0.2) | 3 | 10 |
| $\mathrm{DS}_3$ | 200000 | 21 | 5 | (0.1, 0.3, 0.4, 0.2, 0.1) | 0 | 0 |
| $\mathrm{DS}_4$ | 200000 | 21 | 5 | (0.1, 0.3, 0.4, 0.2, 0.1) | 3 | 10 |
| $\mathrm{DS}_5$ | 200000 | 21 | 5 | (0.1, 0.3, 0.4, 0.2, 0.1), (0.4, 0.2, 0.1, 0.1, 0.2) | 0 | 0 |
| $\mathrm{DS}_6$ | 200000 | 21 | 5 | (0.1, 0.3, 0.4, 0.2, 0.1), (0.4, 0.2, 0.1, 0.1, 0.2) | 3 | 10 |
| Kddcup99_10% | 494000 | 42 | 23 | - | - | - |
| Statlog | 570000 | 10 | 7 | - | - | - |
| IoT | 663000 | 115 | 11 | - | - | - |
| HAR | 10299 | 561 | 6 | - | - | - |
Table 2 Precision values of the six algorithms

| Data stream | LB | BOLE | $\mathrm{ARF}_{RE}$ | OALE | CALMID | OALM-IDS |
|---|---|---|---|---|---|---|
| $\mathrm{DS}_1$ | $94.56 \pm 0.12$ | $\boldsymbol{95.61 \pm 0.11}$ | $93.54 \pm 0.13$ | $89.78 \pm 0.21$ | $94.76 \pm 0.16$ | $95.48 \pm 0.15$ |
| $\mathrm{DS}_2$ | $92.27 \pm 0.17$ | $92.44 \pm 0.14$ | $91.04 \pm 0.19$ | $88.31 \pm 0.23$ | $92.81 \pm 0.13$ | $\boldsymbol{93.94 \pm 0.12}$ |
| $\mathrm{DS}_3$ | $88.39 \pm 0.22$ | $89.52 \pm 0.14$ | $90.95 \pm 0.13$ | $88.83 \pm 0.16$ | $92.57 \pm 0.13$ | $\boldsymbol{93.72 \pm 0.13}$ |
| $\mathrm{DS}_4$ | $86.55 \pm 0.31$ | $88.68 \pm 0.26$ | $89.89 \pm 0.23$ | $86.29 \pm 0.29$ | $91.31 \pm 0.18$ | $\boldsymbol{92.18 \pm 0.21}$ |
| $\mathrm{DS}_5$ | $85.64 \pm 0.29$ | $87.04 \pm 0.34$ | $89.61 \pm 0.51$ | $88.83 \pm 0.21$ | $91.13 \pm 0.21$ | $\boldsymbol{92.92 \pm 0.16}$ |
| $\mathrm{DS}_6$ | $82.10 \pm 0.69$ | $83.15 \pm 0.73$ | $86.54 \pm 0.72$ | $83.42 \pm 0.55$ | $90.64 \pm 0.42$ | $\boldsymbol{92.41 \pm 0.21}$ |
| Kddcup99_10% | $83.87 \pm 0.43$ | $81.09 \pm 0.56$ | $85.48 \pm 0.65$ | $81.01 \pm 0.36$ | $92.06 \pm 0.19$ | $\boldsymbol{92.07 \pm 0.18}$ |
| Statlog | $64.55 \pm 0.31$ | $63.78 \pm 0.61$ | $79.97 \pm 0.39$ | $73.78 \pm 0.43$ | $85.40 \pm 0.34$ | $\boldsymbol{85.68 \pm 0.33}$ |
| IoT | $64.03 \pm 0.48$ | $61.54 \pm 0.43$ | $66.66 \pm 0.53$ | $55.81 \pm 0.51$ | $70.85 \pm 0.54$ | $\boldsymbol{73.12 \pm 0.38}$ |
| HAR | $61.63 \pm 0.53$ | $59.76 \pm 0.46$ | $63.22 \pm 0.49$ | $55.16 \pm 0.69$ | $68.64 \pm 0.71$ | $\boldsymbol{69.98 \pm 0.51}$ |
Table 3 Recall values of the six algorithms

| Data stream | LB | BOLE | $\mathrm{ARF}_{RE}$ | OALE | CALMID | OALM-IDS |
|---|---|---|---|---|---|---|
| $\mathrm{DS}_1$ | $95.37 \pm 0.18$ | $95.96 \pm 0.13$ | $93.39 \pm 0.11$ | $90.13 \pm 0.13$ | $95.91 \pm 0.11$ | $\boldsymbol{96.14 \pm 0.12}$ |
| $\mathrm{DS}_2$ | $92.39 \pm 0.21$ | $92.28 \pm 0.35$ | $91.35 \pm 0.26$ | $89.45 \pm 0.18$ | $92.51 \pm 0.15$ | $\boldsymbol{94.08 \pm 0.14}$ |
| $\mathrm{DS}_3$ | $87.55 \pm 0.19$ | $88.19 \pm 0.22$ | $86.14 \pm 0.21$ | $88.52 \pm 0.22$ | $90.55 \pm 0.13$ | $\boldsymbol{92.52 \pm 0.13}$ |
| $\mathrm{DS}_4$ | $84.57 \pm 0.36$ | $86.73 \pm 0.29$ | $87.47 \pm 0.28$ | $83.05 \pm 0.31$ | $89.89 \pm 0.21$ | $\boldsymbol{92.44 \pm 0.18}$ |
| $\mathrm{DS}_5$ | $84.14 \pm 0.43$ | $86.44 \pm 0.49$ | $87.26 \pm 0.69$ | $83.26 \pm 0.36$ | $90.25 \pm 0.18$ | $\boldsymbol{91.16 \pm 0.13}$ |
| $\mathrm{DS}_6$ | $83.98 \pm 1.13$ | $81.87 \pm 0.91$ | $84.56 \pm 1.31$ | $78.87 \pm 0.69$ | $90.46 \pm 0.13$ | $\boldsymbol{90.71 \pm 0.21}$ |
| Kddcup99_10% | $60.82 \pm 0.71$ | $62.75 \pm 0.64$ | $58.17 \pm 1.32$ | $58.44 \pm 1.63$ | $61.88 \pm 0.43$ | $\boldsymbol{63.71 \pm 0.37}$ |
| Statlog | $61.39 \pm 0.91$ | $50.92 \pm 1.32$ | $54.36 \pm 1.11$ | $51.20 \pm 1.34$ | $59.52 \pm 0.63$ | $\boldsymbol{63.12 \pm 0.39}$ |
| IoT | $40.73 \pm 2.14$ | $42.29 \pm 1.58$ | $39.35 \pm 1.89$ | $40.42 \pm 2.15$ | $48.04 \pm 1.04$ | $\boldsymbol{51.26 \pm 0.81}$ |
| HAR | $61.64 \pm 1.18$ | $60.57 \pm 0.97$ | $57.91 \pm 1.43$ | $54.11 \pm 1.36$ | $65.53 \pm 0.76$ | $\boldsymbol{66.57 \pm 0.46}$ |
Table 4 F1 values of the six algorithms

| Data stream | LB | BOLE | $\mathrm{ARF}_{RE}$ | OALE | CALMID | OALM-IDS |
|---|---|---|---|---|---|---|
| $\mathrm{DS}_1$ | $94.96 \pm 0.11$ | $\boldsymbol{95.80 \pm 0.10}$ | $93.42 \pm 0.13$ | $89.93 \pm 0.15$ | $95.33 \pm 0.11$ | $\boldsymbol{95.80 \pm 0.10}$ |
| $\mathrm{DS}_2$ | $92.32 \pm 0.16$ | $92.34 \pm 0.13$ | $91.18 \pm 0.15$ | $88.85 \pm 0.21$ | $92.65 \pm 0.13$ | $\boldsymbol{94.01 \pm 0.12}$ |
| $\mathrm{DS}_3$ | $87.91 \pm 0.20$ | $88.81 \pm 0.24$ | $88.11 \pm 0.36$ | $88.67 \pm 0.20$ | $91.50 \pm 0.16$ | $\boldsymbol{93.07 \pm 0.14}$ |
| $\mathrm{DS}_4$ | $85.35 \pm 0.42$ | $87.38 \pm 0.36$ | $88.42 \pm 0.51$ | $84.50 \pm 0.33$ | $90.51 \pm 0.21$ | $\boldsymbol{92.29 \pm 0.20}$ |
| $\mathrm{DS}_5$ | $84.85 \pm 0.41$ | $86.67 \pm 0.43$ | $88.30 \pm 0.46$ | $85.36 \pm 0.48$ | $90.62 \pm 0.21$ | $\boldsymbol{91.93 \pm 0.18}$ |
| $\mathrm{DS}_6$ | $82.97 \pm 0.87$ | $82.43 \pm 0.71$ | $85.35 \pm 0.91$ | $80.59 \pm 0.63$ | $90.46 \pm 0.39$ | $\boldsymbol{91.53 \pm 0.31}$ |
| Kddcup99_10% | $73.12 \pm 0.55$ | $72.47 \pm 0.63$ | $72.01 \pm 0.46$ | $72.81 \pm 0.51$ | $73.56 \pm 0.33$ | $\boldsymbol{74.65 \pm 0.20}$ |
| Statlog | $66.18 \pm 0.83$ | $54.32 \pm 1.91$ | $63.85 \pm 1.03$ | $63.42 \pm 0.98$ | $74.42 \pm 0.36$ | $\boldsymbol{75.19 \pm 0.31}$ |
| IoT | $47.01 \pm 1.24$ | $48.40 \pm 0.96$ | $47.34 \pm 1.89$ | $44.94 \pm 1.36$ | $54.26 \pm 0.65$ | $\boldsymbol{56.73 \pm 0.67}$ |
| HAR | $59.93 \pm 0.91$ | $58.81 \pm 1.21$ | $58.52 \pm 0.79$ | $54.43 \pm 1.13$ | $65.43 \pm 0.63$ | $\boldsymbol{67.76 \pm 0.58}$ |
Table 5 Kappa coefficient values of the six algorithms

| Data stream | LB | BOLE | $\mathrm{ARF}_{RE}$ | OALE | CALMID | OALM-IDS |
|---|---|---|---|---|---|---|
| $\mathrm{DS}_1$ | $90.17 \pm 0.12$ | $91.18 \pm 0.14$ | $90.59 \pm 0.16$ | $85.47 \pm 0.21$ | $90.48 \pm 0.19$ | $\boldsymbol{91.31 \pm 0.12}$ |
| $\mathrm{DS}_2$ | $88.85 \pm 0.19$ | $88.14 \pm 0.23$ | $87.91 \pm 0.39$ | $83.18 \pm 0.56$ | $89.97 \pm 0.31$ | $\boldsymbol{90.66 \pm 0.23}$ |
| $\mathrm{DS}_3$ | $85.25 \pm 0.22$ | $85.86 \pm 0.38$ | $86.68 \pm 0.29$ | $83.91 \pm 0.39$ | $88.91 \pm 0.26$ | $\boldsymbol{89.93 \pm 0.21}$ |
| $\mathrm{DS}_4$ | $84.15 \pm 0.55$ | $86.04 \pm 0.63$ | $87.14 \pm 0.66$ | $83.42 \pm 0.71$ | $88.92 \pm 0.33$ | $\boldsymbol{89.33 \pm 0.36}$ |
| $\mathrm{DS}_5$ | $83.85 \pm 0.77$ | $85.83 \pm 0.69$ | $86.45 \pm 0.81$ | $86.67 \pm 0.70$ | $88.57 \pm 0.31$ | $\boldsymbol{89.12 \pm 0.29}$ |
| $\mathrm{DS}_6$ | $81.49 \pm 1.12$ | $82.98 \pm 1.69$ | $84.15 \pm 1.87$ | $79.92 \pm 1.48$ | $89.01 \pm 0.41$ | $\boldsymbol{89.73 \pm 0.28}$ |
| Kddcup99_10% | $80.93 \pm 0.67$ | $75.62 \pm 1.13$ | $79.32 \pm 1.32$ | $78.31 \pm 0.91$ | $83.32 \pm 0.26$ | $\boldsymbol{85.83 \pm 0.18}$ |
| Statlog | $58.71 \pm 1.42$ | $61.43 \pm 1.18$ | $73.72 \pm 0.93$ | $71.21 \pm 1.24$ | $79.39 \pm 0.46$ | $\boldsymbol{80.11 \pm 0.19}$ |
| IoT | $67.53 \pm 1.54$ | $65.02 \pm 1.89$ | $68.99 \pm 2.14$ | $59.53 \pm 2.12$ | $71.65 \pm 0.71$ | $\boldsymbol{73.29 \pm 0.68}$ |
| HAR | $60.49 \pm 1.12$ | $60.01 \pm 1.38$ | $61.86 \pm 1.13$ | $56.75 \pm 2.03$ | $68.52 \pm 0.76$ | $\boldsymbol{69.64 \pm 0.71}$ |
Table 6 Effect of parameter $\theta$ on OALM-IDS

| Data stream | $\theta$ | $b$ | Precision | Recall | F1 | Kappa |
|---|---|---|---|---|---|---|
| $\mathrm{DS}_1$ | 0.4 | 0.17143 | $94.21 \pm 0.16$ | $93.18 \pm 0.12$ | $94.13 \pm 0.11$ | $90.11 \pm 0.12$ |
| $\mathrm{DS}_1$ | $\boldsymbol{0.5}$ | $\boldsymbol{0.18026}$ | $\boldsymbol{95.48 \pm 0.15}$ | $\boldsymbol{96.14 \pm 0.12}$ | $\boldsymbol{95.80 \pm 0.10}$ | $\boldsymbol{91.31 \pm 0.12}$ |
| $\mathrm{DS}_1$ | 0.6 | 0.19782 | $95.03 \pm 0.15$ | $93.19 \pm 0.12$ | $95.16 \pm 0.10$ | $91.01 \pm 0.12$ |
| $\mathrm{DS}_2$ | 0.4 | 0.17136 | $93.01 \pm 0.12$ | $92.81 \pm 0.16$ | $93.04 \pm 0.13$ | $89.09 \pm 0.26$ |
| $\mathrm{DS}_2$ | $\boldsymbol{0.5}$ | $\boldsymbol{0.19178}$ | $\boldsymbol{93.94 \pm 0.12}$ | $\boldsymbol{94.08 \pm 0.14}$ | $\boldsymbol{94.01 \pm 0.12}$ | $\boldsymbol{90.66 \pm 0.23}$ |
| $\mathrm{DS}_2$ | 0.6 | 0.20000 | $93.18 \pm 0.13$ | $93.16 \pm 0.14$ | $93.75 \pm 0.12$ | $90.07 \pm 0.23$ |
| $\mathrm{DS}_3$ | 0.4 | 0.17821 | $93.24 \pm 0.13$ | $92.05 \pm 0.13$ | $92.54 \pm 0.16$ | $88.56 \pm 0.22$ |
| $\mathrm{DS}_3$ | $\boldsymbol{0.5}$ | $\boldsymbol{0.19512}$ | $\boldsymbol{93.72 \pm 0.13}$ | $\boldsymbol{92.52 \pm 0.13}$ | $\boldsymbol{93.07 \pm 0.14}$ | $\boldsymbol{89.93 \pm 0.21}$ |
| $\mathrm{DS}_3$ | 0.6 | 0.20000 | $93.43 \pm 0.13$ | $92.24 \pm 0.13$ | $92.10 \pm 0.14$ | $88.71 \pm 0.21$ |
| $\mathrm{DS}_4$ | 0.4 | 0.18423 | $91.63 \pm 0.21$ | $91.34 \pm 0.18$ | $91.76 \pm 0.20$ | $88.54 \pm 0.38$ |
| $\mathrm{DS}_4$ | $\boldsymbol{0.5}$ | $\boldsymbol{0.19877}$ | $\boldsymbol{92.18 \pm 0.21}$ | $\boldsymbol{92.44 \pm 0.18}$ | $\boldsymbol{92.29 \pm 0.20}$ | $\boldsymbol{89.33 \pm 0.36}$ |
| $\mathrm{DS}_4$ | 0.6 | 0.20000 | $91.06 \pm 0.21$ | $91.56 \pm 0.19$ | $91.80 \pm 0.21$ | $88.63 \pm 0.36$ |
| $\mathrm{DS}_5$ | 0.4 | 0.18002 | $92.01 \pm 0.16$ | $90.46 \pm 0.13$ | $90.76 \pm 0.18$ | $88.42 \pm 0.29$ |
| $\mathrm{DS}_5$ | $\boldsymbol{0.5}$ | $\boldsymbol{0.19722}$ | $\boldsymbol{92.92 \pm 0.16}$ | $\boldsymbol{91.16 \pm 0.13}$ | $\boldsymbol{91.93 \pm 0.18}$ | $\boldsymbol{89.12 \pm 0.29}$ |
| $\mathrm{DS}_5$ | 0.6 | 0.20000 | $92.50 \pm 0.16$ | $90.76 \pm 0.13$ | $91.21 \pm 0.19$ | $88.56 \pm 0.30$ |
| $\mathrm{DS}_6$ | 0.4 | 0.18331 | $91.02 \pm 0.21$ | $89.03 \pm 0.22$ | $90.32 \pm 0.31$ | $88.12 \pm 0.28$ |
| $\mathrm{DS}_6$ | $\boldsymbol{0.5}$ | $\boldsymbol{0.19923}$ | $\boldsymbol{92.41 \pm 0.21}$ | $\boldsymbol{90.71 \pm 0.21}$ | $\boldsymbol{91.53 \pm 0.31}$ | $\boldsymbol{89.73 \pm 0.28}$ |
| $\mathrm{DS}_6$ | 0.6 | 0.20000 | $91.01 \pm 0.21$ | $89.92 \pm 0.22$ | $90.12 \pm 0.31$ | $89.13 \pm 0.28$ |
| Kddcup99_10% | 0.4 | 0.18188 | $90.59 \pm 0.18$ | $63.51 \pm 0.37$ | $73.35 \pm 0.20$ | $83.14 \pm 0.18$ |
| Kddcup99_10% | $\boldsymbol{0.5}$ | $\boldsymbol{0.19961}$ | $\boldsymbol{92.07 \pm 0.18}$ | $\boldsymbol{63.71 \pm 0.37}$ | $\boldsymbol{74.65 \pm 0.20}$ | $\boldsymbol{85.83 \pm 0.18}$ |
| Kddcup99_10% | 0.6 | 0.20000 | $91.63 \pm 0.18$ | $63.63 \pm 0.37$ | $74.43 \pm 0.21$ | $85.61 \pm 0.18$ |
| Statlog | 0.4 | 0.19022 | $84.75 \pm 0.33$ | $62.19 \pm 0.39$ | $74.85 \pm 0.31$ | $78.86 \pm 0.19$ |
| Statlog | $\boldsymbol{0.5}$ | $\boldsymbol{0.19994}$ | $\boldsymbol{85.68 \pm 0.33}$ | $\boldsymbol{63.12 \pm 0.39}$ | $\boldsymbol{75.19 \pm 0.31}$ | $\boldsymbol{80.11 \pm 0.19}$ |
| Statlog | 0.6 | 0.20000 | $85.66 \pm 0.33$ | $63.01 \pm 0.39$ | $75.19 \pm 0.31$ | $79.89 \pm 0.19$ |
| IoT | 0.4 | 0.19113 | $71.21 \pm 0.38$ | $49.86 \pm 0.81$ | $51.21 \pm 0.67$ | $71.61 \pm 0.68$ |
| IoT | $\boldsymbol{0.5}$ | $\boldsymbol{0.19684}$ | $\boldsymbol{73.12 \pm 0.38}$ | $\boldsymbol{51.26 \pm 0.81}$ | $\boldsymbol{56.73 \pm 0.67}$ | $\boldsymbol{73.29 \pm 0.68}$ |
| IoT | 0.6 | 0.20000 | $72.11 \pm 0.39$ | $50.06 \pm 0.81$ | $54.33 \pm 0.67$ | $71.34 \pm 0.68$ |
| HAR | 0.4 | 0.18634 | $66.54 \pm 0.52$ | $64.32 \pm 0.48$ | $65.05 \pm 0.59$ | $66.81 \pm 0.72$ |
| HAR | $\boldsymbol{0.5}$ | $\boldsymbol{0.19547}$ | $\boldsymbol{69.98 \pm 0.51}$ | $\boldsymbol{66.57 \pm 0.46}$ | $\boldsymbol{67.76 \pm 0.58}$ | $\boldsymbol{69.64 \pm 0.71}$ |
| HAR | 0.6 | 0.20000 | $64.32 \pm 0.52$ | $65.14 \pm 0.46$ | $66.11 \pm 0.58$ | $64.32 \pm 0.71$ |
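For readers who want to relate the metrics reported in Tables 2 to 5, the sketch below shows the textbook definitions of the F1 score and Cohen's kappa. It assumes nothing about how the paper aggregates results: the reported figures are presumably averaged over runs or classes, so they need not equal the harmonic mean of the reported precision and recall. The function names and the small confusion matrix are hypothetical.

```python
from typing import List


def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


def cohen_kappa(confusion: List[List[int]]) -> float:
    """Cohen's kappa from a square confusion matrix (rows: true, columns: predicted)."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    # Observed agreement: fraction of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(n)) / total
    # Chance agreement: product of row and column marginals, summed over classes.
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion) for i in range(n)
    ) / (total * total)
    if expected == 1.0:
        return 0.0  # degenerate case: every sample falls in a single class
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Precision and recall reported for OALM-IDS on DS_1 (Tables 2 and 3).
    print(round(f1_from_precision_recall(95.48, 96.14), 2))
    # Hypothetical two-class confusion matrix, for illustration only.
    print(round(cohen_kappa([[45, 5], [10, 40]]), 2))
```

For the DS_1 inputs the harmonic mean is about 95.81, close to the 95.80 reported in Table 4; on the real data streams the gap is larger, which is consistent with the reported F1 being averaged differently.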