Online Active Learning Method for Imbalanced Data Stream

LI Yan-Hong^{1, 2}, REN Lin^{1, 2}, WANG Su-Ge^{1, 2}, LI De-Yu^{1, 2}

doi: 10.16383/j.aas.c211246

1. School of Computer and Information Technology, Shanxi University, Taiyuan 030006
2. Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University, Taiyuan 030006

Funds: Supported by National Natural Science Foundation of China (62076158, 62072294, 41871286) and Shanxi Key Research and Development Program (201903D421041)

Author Bio:
LI Yan-Hong: Associate professor at the School of Computer and Information Technology, Shanxi University. Her research interests cover data mining and machine learning. Corresponding author of this paper. E-mail: liyh@sxu.edu.cn

REN Lin: Master's student at the School of Computer and Information Technology, Shanxi University. His research interests cover data mining and machine learning. E-mail: renlinssdx@163.com

WANG Su-Ge: Professor at the School of Computer and Information Technology, Shanxi University. Her research interests cover natural language processing and machine learning. E-mail: wsg@sxu.edu.cn

LI De-Yu: Professor at the School of Computer and Information Technology, Shanxi University. His research interests cover data mining and artificial intelligence. E-mail: lidy@sxu.edu.cn

Publication history
- Received: 2021-12-29
- Accepted: 2022-04-07
- Published online: 2024-06-19
- Issue published: 2024-07-23

Abstract

Data stream classification is an important task in data stream mining; its goal is to capture changing class structures from massive, continuously changing data. At present, almost no framework simultaneously handles the problems common in data streams: multi-class imbalance, concept drift, outliers, and the high cost of labeling samples. To address this, we propose an online active learning method for imbalanced data streams (OALM-IDS). AdaBoost is an ensemble classification method that iteratively combines multiple weak classifiers into a strong classifier, and AdaBoost.M2 additionally introduces the confidence of the weak classifiers; such methods are usually applied to static data. First, we define an importance measure for training samples based on the imbalance ratio and an adaptive forgetting factor, which makes AdaBoost.M2 applicable to imbalanced data streams and improves the performance of the ensemble classifier. Second, we propose an adaptive adjustment method for the margin threshold matrix, which optimizes the label request strategy. Finally, we incorporate the degree of concept drift into model construction and define an adaptive forgetting factor based on a concept drift index, which enables the model to be rebuilt after a drift. Comparative experiments on six artificial data streams and four real data streams show that the classification performance of the proposed method is better than that of five other learning methods for imbalanced data streams.

Keywords: active learning; data stream classification; multi-class imbalance; concept drift
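
The abstract describes weighting training samples by an imbalance ratio and an adaptive forgetting factor, but this page does not give the formula. The sketch below is one plausible reading and not the authors' definition: class counts are decayed by a forgetting factor `lam` at every step, and the importance of a sample is the largest decayed class count divided by the count of its own class. The class name `DecayedClassCounter`, the parameter `lam`, and the exact ratio are assumptions made for illustration.

```python
from collections import defaultdict


class DecayedClassCounter:
    """Time-decayed class frequencies for a labeled data stream (illustrative)."""

    def __init__(self, lam=0.99):
        # lam is a hypothetical forgetting factor in (0, 1]; smaller forgets faster.
        self.lam = lam
        self.counts = defaultdict(float)

    def update(self, label):
        # Decay all existing counts, then credit the observed class.
        for c in self.counts:
            self.counts[c] *= self.lam
        self.counts[label] += 1.0

    def importance(self, label):
        # Assumed measure: majority count / own-class count,
        # so samples of rare classes receive larger weights.
        own = self.counts.get(label, 0.0)
        if own == 0.0:
            return 1.0
        return max(self.counts.values()) / own


counter = DecayedClassCounter(lam=1.0)  # lam=1.0 disables forgetting
for y in ["a"] * 8 + ["b"] * 2:
    counter.update(y)
print(counter.importance("a"), counter.importance("b"))
```

In a boosting setting, such a weight could scale the contribution of each labeled sample when the weak classifiers are updated; making `lam` depend on a drift indicator would give the adaptive behaviour the abstract refers to.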

Fig. 1 Algorithm framework

Fig. 2 ROC curves of six algorithms

Fig. 3 Precision curve of ${\rm{DS}}_{6}$

Fig. 4 Precision curve of Kddcup$99\_10\%$

Fig. 5 Precision curve of Statlog

Fig. 6 Results of the ablation experiment

Table 1 Characteristics of the data streams

Data stream	Samples	Features	Classes	Class distribution	Drifts	Outliers
${\rm{DS}}_{1}$	200000	21	5	(0.2, 0.2, 0.2, 0.2, 0.2)	0	0
${\rm{DS}}_{2}$	200000	21	5	(0.2, 0.2, 0.2, 0.2, 0.2)	3	10
${\rm{DS}}_{3}$	200000	21	5	(0.1, 0.3, 0.4, 0.2, 0.1)	0	0
${\rm{DS}}_{4}$	200000	21	5	(0.1, 0.3, 0.4, 0.2, 0.1)	3	10
${\rm{DS}}_{5}$	200000	21	5	(0.1, 0.3, 0.4, 0.2, 0.1), (0.4, 0.2, 0.1, 0.1, 0.2)	0	0
${\rm{DS}}_{6}$	200000	21	5	(0.1, 0.3, 0.4, 0.2, 0.1), (0.4, 0.2, 0.1, 0.1, 0.2)	3	10
Kddcup$99\_10\%$	494000	42	23	—	—	—
Statlog	570000	10	7	—	—	—
IoT	663000	115	11	—	—	—
HAR	10299	561	6	—	—	—
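
As a rough illustration of the class distributions in Table 1, the sketch below draws labels for a stream whose class prior changes partway through, as listed for ${\rm{DS}}_{5}$ and ${\rm{DS}}_{6}$. It is not the generator used in the paper: the switch point `switch_at`, the function name `label_stream`, and the stream length are hypothetical. The tuples are treated as relative weights, since the first distribution as printed sums to 1.1 rather than 1.

```python
import random
from collections import Counter


def label_stream(n, priors, switch_at=None, seed=0):
    """Yield n class labels; priors holds one or two class distributions."""
    rng = random.Random(seed)
    classes = list(range(len(priors[0])))
    for t in range(n):
        # Use the second distribution after the (hypothetical) switch point.
        use_second = switch_at is not None and t >= switch_at and len(priors) > 1
        weights = priors[1] if use_second else priors[0]
        # random.choices normalizes the weights internally.
        yield rng.choices(classes, weights=weights, k=1)[0]


priors = [(0.1, 0.3, 0.4, 0.2, 0.1), (0.4, 0.2, 0.1, 0.1, 0.2)]
labels = list(label_stream(20000, priors, switch_at=10000))
first, second = Counter(labels[:10000]), Counter(labels[10000:])
print(first.most_common(1)[0][0], second.most_common(1)[0][0])
```

The majority class differs between the two halves, which is the kind of change in class prior that an imbalance-aware stream learner has to track.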

Table 2 Precision values of six algorithms

Data stream	LB	BOLE	${\rm{ARF}}_{RE}$	OALE	CALMID	OALM-IDS
${\rm{DS}}_{1}$	$94.56\pm0.12$	${\boldsymbol{95.61} }\;{\boldsymbol{\pm}}\;{\boldsymbol{0.11} }$	$93.54\pm0.13$	$89.78\pm0.21$	$94.76\pm0.16$	$95.48\pm0.15$
${\rm {DS}}_{2}$	$92.27\pm0.17$	$92.44\pm0.14$	$91.04\pm0.19$	$88.31\pm0.23$	$92.81\pm0.13$	${\boldsymbol{93.94} }\;{\boldsymbol{\pm}}\;{\boldsymbol{0.12} }$
${\rm {DS}}_{3}$	$88.39\pm0.22$	$89.52\pm0.14$	$90.95\pm0.13$	$88.83\pm0.16$	$92.57\pm0.13$	${\boldsymbol{93.72} }\;{\boldsymbol{\pm}}\;{\boldsymbol{0.13} }$
${\rm {DS}}_{4}$	$86.55\pm0.31$	$88.68\pm0.26$	$89.89\pm0.23$	$86.29\pm0.29$	$91.31\pm0.18$	${\boldsymbol{92.18} }\;{\boldsymbol{\pm}}\;{\boldsymbol{0.21} }$
${\rm {DS}}_{5}$	$85.64\pm0.29$	$87.04\pm0.34$	$89.61\pm0.51$	$88.83\pm0.21$	$91.13\pm0.21$	${\boldsymbol{92.92} }\;{\boldsymbol{\pm}}\;{\boldsymbol{0.16} }$
${\rm {DS}}_{6}$	$82.10\pm0.69$	$83.15\pm0.73$	$86.54\pm0.72$	$83.42\pm0.55$	$90.64\pm0.42$	${\boldsymbol{92.41} }\;{\boldsymbol{\pm}}\;{\boldsymbol{0.21} }$
Kddcup$99\_10\%$	$83.87\pm0.43$	$81.09\pm0.56$	$85.48\pm0.65$	$81.01\pm0.36$	$92.06\pm0.19$	${\boldsymbol{92.07} }\;{\boldsymbol{\pm}}\;{\boldsymbol{0.18} }$
Statlog	$64.55\pm0.31$	$63.78\pm0.61$	$79.97\pm0.39$	$73.78\pm0.43$	$85.40\pm0.34$	${\boldsymbol{85.68} }\;{\boldsymbol{\pm}}\;{\boldsymbol{0.33} }$
IoT	$64.03\pm0.48$	$61.54\pm0.43$	$66.66\pm0.53$	$55.81\pm0.51$	$70.85\pm0.54$	${\boldsymbol{73.12} }\;{\boldsymbol{\pm}}\;{\boldsymbol{0.38} }$
HAR	$61.63\pm0.53$	$59.76\pm0.46$	$63.22\pm0.49$	$55.16\pm0.69$	$68.64\pm0.71$	${\boldsymbol{69.98} }\;{\boldsymbol{\pm}}\;{\boldsymbol{0.51} }$

Table 3 Recall values of six algorithms

Data stream	LB	BOLE	${\rm{ARF}}_{RE}$	OALE	CALMID	OALM-IDS
${\rm{DS}}_{1}$	$95.37\pm0.18$	$95.96\pm0.13$	$93.39\pm0.11$	$90.13\pm0.13$	$95.91\pm0.11$	${\boldsymbol{96.14} }\;{\boldsymbol{\pm}}\;{\boldsymbol{0.12} }$
${\rm{DS}}_{2}$	$92.39\pm0.21$	$92.28\pm0.35$	$91.35\pm0.26$	$89.45\pm0.18$	$92.51\pm0.15$	${\boldsymbol{94.08} }\;{\boldsymbol{\pm}}\;{\boldsymbol{0.14} }$
${\rm{DS}}_{3}$	$87.55\pm0.19$	$88.19\pm0.22$	$86.14\pm0.21$	$88.52\pm0.22$	$90.55\pm0.13$	${\boldsymbol{92.52} }\;{\boldsymbol{\pm}}\;{\boldsymbol{0.13} }$
${\rm{DS}}_{4}$	$84.57\pm0.36$	$86.73\pm0.29$	$87.47\pm0.28$	$83.05\pm0.31$	$89.89\pm0.21$	${\boldsymbol{92.44} }\;{\boldsymbol{\pm}}\;{\boldsymbol{0.18} }$
${\rm{DS}}_{5}$	$84.14\pm0.43$	$86.44\pm0.49$	$87.26\pm0.69$	$83.26\pm0.36$	$90.25\pm0.18$	${\boldsymbol{91.16} }\;{\boldsymbol{\pm}}\;{\boldsymbol{0.13} }$
${\rm{DS}}_{6}$	$83.98\pm1.13$	$81.87\pm0.91$	$84.56\pm1.31$	$78.87\pm0.69$	$90.46\pm0.13$	${\boldsymbol{90.71} }\;{\boldsymbol{\pm}}\;{\boldsymbol{0.21} }$
Kddcup$99\_10\%$	$60.82\pm0.71$	$62.75\pm0.64$	$58.17\pm1.32$	$58.44\pm1.63$	$61.88\pm0.43$	${\boldsymbol{63.71} }\;{\boldsymbol{\pm}}\;{\boldsymbol{0.37} }$
Statlog	$61.39\pm0.91$	$50.92\pm1.32$	$54.36\pm1.11$	$51.20\pm1.34$	$59.52\pm0.63$	${\boldsymbol{63.12} }\;{\boldsymbol{\pm}}\;{\boldsymbol{0.39} }$
IoT	$40.73\pm2.14$	$42.29\pm1.58$	$39.35\pm1.89$	$40.42\pm2.15$	$48.04\pm1.04$	${\boldsymbol{51.26} }\;{\boldsymbol{\pm}}\;{\boldsymbol{0.81} }$
HAR	$61.64\pm1.18$	$60.57\pm0.97$	$57.91\pm1.43$	$54.11\pm1.36$	$65.53\pm0.76$	${\boldsymbol{66.57} }\;{\boldsymbol{\pm}}\;{\boldsymbol{0.46} }$

Table 4 F1 values of six algorithms

Data stream	LB	BOLE	${\rm{ARF}}_{RE}$	OALE	CALMID	OALM-IDS
${\rm{DS}}_{1}$	$94.96\pm0.11$	${\boldsymbol{95.80} }\;{\boldsymbol{\pm}}\;{\boldsymbol{0.10} }$	$93.42\pm0.13$	$89.93\pm0.15$	$95.33\pm0.11$	${\boldsymbol{95.80} }\;{\boldsymbol{\pm}}\;{\boldsymbol{0.10} }$
${\rm{DS}}_{2}$	$92.32\pm0.16$	$92.34\pm0.13$	$91.18\pm0.15$	$88.85\pm0.21$	$92.65\pm0.13$	${\boldsymbol{94.01} }\;{\boldsymbol{\pm}}\;{\boldsymbol{0.12} }$
${\rm{DS}}_{3}$	$87.91\pm0.20$	$88.81\pm0.24$	$88.11\pm0.36$	$88.67\pm0.20$	$91.50\pm0.16$	${\boldsymbol{93.07} }\;{\boldsymbol{\pm}}\;{\boldsymbol{0.14} }$
${\rm{DS}}_{4}$	$85.35\pm0.42$	$87.38\pm0.36$	$88.42\pm0.51$	$84.50\pm0.33$	$90.51\pm0.21$	${\boldsymbol{92.29} }\;{\boldsymbol{\pm}}\;{\boldsymbol{0.20} }$
${\rm{DS}}_{5}$	$84.85\pm0.41$	$86.67\pm0.43$	$88.30\pm0.46$	$85.36\pm0.48$	$90.62\pm0.21$	${\boldsymbol{91.93} }\;{\boldsymbol{\pm}}\;{\boldsymbol{0.18} }$
${\rm{DS}}_{6}$	$82.97\pm0.87$	$82.43\pm0.71$	$85.35\pm0.91$	$80.59\pm0.63$	$90.46\pm0.39$	${\boldsymbol{91.53} }\;{\boldsymbol{\pm}}\;{\boldsymbol{0.31} }$
Kddcup$99\_10\%$	$73.12\pm0.55$	$72.47\pm0.63$	$72.01\pm0.46$	$72.81\pm0.51$	$73.56\pm0.33$	${\boldsymbol{74.65} }\;{\boldsymbol{\pm}}\;{\boldsymbol{0.20} }$
Statlog	$66.18\pm0.83$	$54.32\pm1.91$	$63.85\pm1.03$	$63.42\pm0.98$	$74.42\pm0.36$	${\boldsymbol{75.19} }\;{\boldsymbol{\pm}}\;{\boldsymbol{0.31} }$
IoT	$47.01\pm1.24$	$48.40\pm0.96$	$47.34\pm1.89$	$44.94\pm1.36$	$54.26\pm0.65$	${\boldsymbol{56.73} }\;{\boldsymbol{\pm}}\;{\boldsymbol{0.67} }$
HAR	$59.93\pm0.91$	$58.81\pm1.21$	$58.52\pm0.79$	$54.43\pm1.13$	$65.43\pm0.63$	${\boldsymbol{67.76} }\;{\boldsymbol{\pm}}\;{\boldsymbol{0.58} }$

Table 5 Kappa coefficient values of six algorithms

Data stream	LB	BOLE	${\rm{ARF}}_{RE}$	OALE	CALMID	OALM-IDS
${\rm{DS}}_{1}$	$90.17\pm0.12$	$91.18\pm0.14$	$90.59\pm0.16$	$85.47\pm0.21$	$90.48\pm0.19$	${\boldsymbol{91.31} }\;{\boldsymbol{\pm}}\;{\boldsymbol{0.12} }$
${\rm{DS}}_{2}$	$88.85\pm0.19$	$88.14\pm0.23$	$87.91\pm0.39$	$83.18\pm0.56$	$89.97\pm0.31$	${\boldsymbol{90.66} }\;{\boldsymbol{\pm}}\;{\boldsymbol{0.23} }$
${\rm{DS}}_{3}$	$85.25\pm0.22$	$85.86\pm0.38$	$86.68\pm0.29$	$83.91\pm0.39$	$88.91\pm0.26$	${\boldsymbol{89.93} }\;{\boldsymbol{\pm}}\;{\boldsymbol{0.21} }$
${\rm{DS}}_{4}$	$84.15\pm0.55$	$86.04\pm0.63$	$87.14\pm0.66$	$83.42\pm0.71$	$88.92\pm0.33$	${\boldsymbol{89.33} }\;{\boldsymbol{\pm}}\;{\boldsymbol{0.36} }$
${\rm{DS}}_{5}$	$83.85\pm0.77$	$85.83\pm0.69$	$86.45\pm0.81$	$86.67\pm0.70$	$88.57\pm0.31$	${\boldsymbol{89.12} }\;{\boldsymbol{\pm}}\;{\boldsymbol{0.29} }$
${\rm{DS}}_{6}$	$81.49\pm1.12$	$82.98\pm1.69$	$84.15\pm1.87$	$79.92\pm1.48$	$89.01\pm0.41$	${\boldsymbol{89.73} }\;{\boldsymbol{\pm}}\;{\boldsymbol{0.28} }$
Kddcup$99\_10\%$	$80.93\pm0.67$	$75.62\pm1.13$	$79.32\pm1.32$	$78.31\pm0.91$	$83.32\pm0.26$	${\boldsymbol{85.83} }\;{\boldsymbol{\pm}}\;{\boldsymbol{0.18} }$
Statlog	$58.71\pm1.42$	$61.43\pm1.18$	$73.72\pm0.93$	$71.21\pm1.24$	$79.39\pm0.46$	${\boldsymbol{80.11} }\;{\boldsymbol{\pm}}\;{\boldsymbol{0.19} }$
IoT	$67.53\pm1.54$	$65.02\pm1.89$	$68.99\pm2.14$	$59.53\pm2.12$	$71.65\pm0.71$	${\boldsymbol{73.29} }\;{\boldsymbol{\pm}}\;{\boldsymbol{0.68} }$
HAR	$60.49\pm1.12$	$60.01\pm1.38$	$61.86\pm1.13$	$56.75\pm2.03$	$68.52\pm0.76$	${\boldsymbol{69.64} }\;{\boldsymbol{\pm}}\;{\boldsymbol{0.71} }$
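
Tables 2 to 5 report precision, recall, F1, and the Kappa coefficient. The page does not state how the per-class values are averaged, so the sketch below assumes macro averaging over classes and takes F1 as the harmonic mean of macro precision and macro recall; Kappa is Cohen's kappa computed from the same confusion matrix. The function name `stream_metrics` is hypothetical, and the values are fractions rather than the percentages shown in the tables.

```python
def stream_metrics(conf):
    """conf[i][j] = number of samples of true class i predicted as class j."""
    k = len(conf)
    total = sum(sum(row) for row in conf)
    row_sums = [sum(conf[i]) for i in range(k)]
    col_sums = [sum(conf[i][j] for i in range(k)) for j in range(k)]
    # Per-class precision and recall, guarding against empty rows or columns.
    prec = [conf[c][c] / col_sums[c] if col_sums[c] else 0.0 for c in range(k)]
    rec = [conf[c][c] / row_sums[c] if row_sums[c] else 0.0 for c in range(k)]
    macro_p = sum(prec) / k
    macro_r = sum(rec) / k
    f1 = 2 * macro_p * macro_r / (macro_p + macro_r) if macro_p + macro_r else 0.0
    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_o = sum(conf[c][c] for c in range(k)) / total
    p_e = sum(row_sums[c] * col_sums[c] for c in range(k)) / (total * total)
    kappa = (p_o - p_e) / (1 - p_e) if p_e != 1 else 0.0
    return macro_p, macro_r, f1, kappa


# Two-class toy confusion matrix: rows are true classes, columns are predictions.
p, r, f1, kappa = stream_metrics([[8, 2], [1, 9]])
print(round(p, 4), round(r, 4), round(f1, 4), round(kappa, 4))
```

Macro averaging gives every class the same weight regardless of its frequency, which is why it is a common choice for multi-class imbalanced problems.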

Table 6 Effect of parameter $\theta$ on OALM-IDS

Data stream	$\theta$	$b$	Precision	Recall	F1	Kappa coefficient
${\rm{DS}}_{1}$	0.4	0.17143	$94.21\pm0.16$	$93.18\pm0.12$	$94.13\pm0.11$	$90.11\pm0.12$
${\rm{DS}}_{1}$	$\boldsymbol{0.5}$	$\boldsymbol{0.18026}$	$\boldsymbol{95.48\pm0.15}$	$\boldsymbol{96.14\pm0.12}$	$\boldsymbol{95.80\pm0.10}$	$\boldsymbol{91.31\pm0.12}$
${\rm{DS}}_{1}$	0.6	0.19782	$95.03\pm0.15$	$93.19\pm0.12$	$95.16\pm0.10$	$91.01\pm0.12$
${\rm{DS}}_{2}$	0.4	0.17136	$93.01\pm0.12$	$92.81\pm0.16$	$93.04\pm0.13$	$89.09\pm0.26$
${\rm{DS}}_{2}$	$\boldsymbol{0.5}$	$\boldsymbol{0.19178}$	$\boldsymbol{93.94\pm0.12}$	$\boldsymbol{94.08\pm0.14}$	$\boldsymbol{94.01\pm0.12}$	$\boldsymbol{90.66\pm0.23}$
${\rm{DS}}_{2}$	0.6	0.20000	$93.18\pm0.13$	$93.16\pm0.14$	$93.75\pm0.12$	$90.07\pm0.23$
${\rm{DS}}_{3}$	0.4	0.17821	$93.24\pm0.13$	$92.05\pm0.13$	$92.54\pm0.16$	$88.56\pm0.22$
${\rm{DS}}_{3}$	$\boldsymbol{0.5}$	$\boldsymbol{0.19512}$	$\boldsymbol{93.72\pm0.13}$	$\boldsymbol{92.52\pm0.13}$	$\boldsymbol{93.07\pm0.14}$	$\boldsymbol{89.93\pm0.21}$
${\rm{DS}}_{3}$	0.6	0.20000	$93.43\pm0.13$	$92.24\pm0.13$	$92.10\pm0.14$	$88.71\pm0.21$
${\rm{DS}}_{4}$	0.4	0.18423	$91.63\pm0.21$	$91.34\pm0.18$	$91.76\pm0.20$	$88.54\pm0.38$
${\rm{DS}}_{4}$	$\boldsymbol{0.5}$	$\boldsymbol{0.19877}$	$\boldsymbol{92.18\pm0.21}$	$\boldsymbol{92.44\pm0.18}$	$\boldsymbol{92.29\pm0.20}$	$\boldsymbol{89.33\pm0.36}$
${\rm{DS}}_{4}$	0.6	0.20000	$91.06\pm0.21$	$91.56\pm0.19$	$91.80\pm0.21$	$88.63\pm0.36$
${\rm{DS}}_{5}$	0.4	0.18002	$92.01\pm0.16$	$90.46\pm0.13$	$90.76\pm0.18$	$88.42\pm0.29$
${\rm{DS}}_{5}$	$\boldsymbol{0.5}$	$\boldsymbol{0.19722}$	$\boldsymbol{92.92\pm0.16}$	$\boldsymbol{91.16\pm0.13}$	$\boldsymbol{91.93\pm0.18}$	$\boldsymbol{89.12\pm0.29}$
${\rm{DS}}_{5}$	0.6	0.20000	$92.50\pm0.16$	$90.76\pm0.13$	$91.21\pm0.19$	$88.56\pm0.30$
${\rm{DS}}_{6}$	0.4	0.18331	$91.02\pm0.21$	$89.03\pm0.22$	$90.32\pm0.31$	$88.12\pm0.28$
${\rm{DS}}_{6}$	$\boldsymbol{0.5}$	$\boldsymbol{0.19923}$	$\boldsymbol{92.41\pm0.21}$	$\boldsymbol{90.71\pm0.21}$	$\boldsymbol{91.53\pm0.31}$	$\boldsymbol{89.73\pm0.28}$
${\rm{DS}}_{6}$	0.6	0.20000	$91.01\pm0.21$	$89.92\pm0.22$	$90.12\pm0.31$	$89.13\pm0.28$
Kddcup$99\_10\%$	0.4	0.18188	$90.59\pm0.18$	$63.51\pm0.37$	$73.35\pm0.20$	$83.14\pm0.18$
Kddcup$99\_10\%$	$\boldsymbol{0.5}$	$\boldsymbol{0.19961}$	$\boldsymbol{92.07\pm0.18}$	$\boldsymbol{63.71\pm0.37}$	$\boldsymbol{74.65\pm0.20}$	$\boldsymbol{85.83\pm0.18}$
Kddcup$99\_10\%$	0.6	0.20000	$91.63\pm0.18$	$63.63\pm0.37$	$74.43\pm0.21$	$85.61\pm0.18$
Statlog	0.4	0.19022	$84.75\pm0.33$	$62.19\pm0.39$	$74.85\pm0.31$	$78.86\pm0.19$
Statlog	$\boldsymbol{0.5}$	$\boldsymbol{0.19994}$	$\boldsymbol{85.68\pm0.33}$	$\boldsymbol{63.12\pm0.39}$	$\boldsymbol{75.19\pm0.31}$	$\boldsymbol{80.11\pm0.19}$
Statlog	0.6	0.20000	$85.66\pm0.33$	$63.01\pm0.39$	$75.19\pm0.31$	$79.89\pm0.19$
IoT	0.4	0.19113	$71.21\pm0.38$	$49.86\pm0.81$	$51.21\pm0.67$	$71.61\pm0.68$
IoT	$\boldsymbol{0.5}$	$\boldsymbol{0.19684}$	$\boldsymbol{73.12\pm0.38}$	$\boldsymbol{51.26\pm0.81}$	$\boldsymbol{56.73\pm0.67}$	$\boldsymbol{73.29\pm0.68}$
IoT	0.6	0.20000	$72.11\pm0.39$	$50.06\pm0.81$	$54.33\pm0.67$	$71.34\pm0.68$
HAR	0.4	0.18634	$66.54\pm0.52$	$64.32\pm0.48$	$65.05\pm0.59$	$66.81\pm0.72$
HAR	$\boldsymbol{0.5}$	$\boldsymbol{0.19547}$	$\boldsymbol{69.98\pm0.51}$	$\boldsymbol{66.57\pm0.46}$	$\boldsymbol{67.76\pm0.58}$	$\boldsymbol{69.64\pm0.71}$
HAR	0.6	0.20000	$64.32\pm0.52$	$65.14\pm0.46$	$66.11\pm0.58$	$64.32\pm0.71$
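
This page does not define $\theta$ or $b$. Assuming, as is common in margin-based active learning on streams, that $\theta$ is a margin threshold and $b$ is the fraction of samples whose labels were requested (capped by a labeling budget), the label request step might look like the sketch below. It uses a single scalar threshold in place of the margin threshold matrix mentioned in the abstract, and the function name `should_query` and its parameters are hypothetical.

```python
def should_query(probs, theta, labeled, seen, budget):
    """Decide whether to request the true label of the current sample.

    probs   : predicted class probabilities (at least two classes)
    theta   : assumed margin threshold
    labeled : number of labels requested so far
    seen    : number of samples seen so far
    budget  : assumed maximum fraction of samples that may be labeled
    """
    if seen > 0 and labeled / seen >= budget:
        return False  # labeling budget exhausted
    top = sorted(probs, reverse=True)[:2]
    margin = top[0] - top[1]  # gap between the two most likely classes
    return margin < theta     # a small margin means an uncertain prediction


# Toy stream of predicted probability vectors.
stream = [[0.5, 0.4, 0.1], [0.9, 0.05, 0.05], [0.45, 0.45, 0.1], [0.6, 0.3, 0.1]]
labeled = 0
for seen, probs in enumerate(stream):
    if should_query(probs, theta=0.5, labeled=labeled, seen=seen, budget=0.5):
        labeled += 1
print(labeled / len(stream))
```

Under this reading, a larger threshold triggers more label requests until the budget caps them, which would be consistent with the $b$ column saturating at 0.20000 for $\theta = 0.6$.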
