-
摘要: 偏标记数据消歧是利用偏标记数据进行机器学习的基础.针对偏标记数据中广泛存在的数据不平衡问题, 以及现有消歧算法对样本间约束信息利用不足的问题, 本文提出一种基于成对约束的偏标记数据消歧算法.首先, 基于低秩表示, 推导出数据不平衡条件下样本低秩表示系数和样本相似度之间的关系; 其次, 基于推导结果, 分别构建基于样本间正约束和负约束的图模型, 通过最小化图模型的能量函数求解偏标记数据的标签.在5个公开数据集上的实验结果表明本文方法相对基准算法在消歧准确率上平均提高了2.9 % ~ 14.9 %.Abstract: Partial label data disambiguation is the basis of machine learning using partial label data. In order to solve the data imbalance problem widely existing in partial label data, and the problem that the existing disambiguation algorithms have insufficient utilization of constraints between samples, a partial label data disambiguation algorithm based on pairwise constraints is proposed in this paper. Firstly, the relation between low-rank representation coefficients and sample similarities in unbalanced datasets is deduced by utilizing low-rank representation. Secondly, according to the deduced results, two graphs are created based on positive constraint and negative constraint respectively. Finally, the labels of partial label data samples are obtained by minimizing energy functions based on graphs. Experimental results on five open datasets indicate that the proposed algorithm outperforms benchmark algorithms by 2.9 % ~ 14.9 % at disambiguation accuracy.
-
Key words:
- Partial label data /
- disambiguation /
- imbalanced data /
- low-rank representation /
- pairwise constraints
1) 本文责任编委 王立威 -
表 1 数据集信息
Table 1 The information of datasets
数据集 样本数量 特征维度 类别数量 平均候选标签数量 领域 Lost 1 122 108 16 2.23 人脸自动标注 MSRCV2 1 758 48 23 3.16 目标分类 BirdSong 4 998 38 13 2.18 鸟鸣分类 Soccer Player 17 472 279 171 2.09 人脸自动标注 Yahoo!News 22 991 163 219 1.91 人脸自动标注 表 2 各算法消歧准确率(%)
Table 2 The disambiguation accuracy of each algorithm (%)
算法 Lost MSRCV2 BirdSong Soccer Player Yahoo!News PLDPC-abs 57.93 65.81 76.73 70.28 82.03 PLDPC-$p$ 65.81 67.46 77.05 71.58 83.29 PLDPC-$n$ 41.98 37.26 62.30 62.84 57.17 PL-KNN 64.53 58.25 70.99 57.76 72.32 IPAL 77.54 71.44 76.61 67.35 82.37 PL-LEAF 79.32 66.67 75.55 70.50 82.90 MMS 91.71 68.27 66.47 70.03 87.32 PLDPC 87.61 72.70 79.25 73.68 85.22 表 3 各算法消歧处理时间(秒(s)、分钟(min)、天(d))
Table 3 The processing time of each algorithm (second (s), minute (min), day (d))
算法 Lost MSRCV2 BirdSong Soccer Player Yahoo!News PLDPC-abs 2.13 s 2.39 s 11.62 s 4 min 6 min PLDPC-$p$ 2.05 s 2.30 s 11.07 s 4 min 6 min PLDPC-$n$ 2.12 s 2.69 s 11.71 s 4 min 6 min PL-KNN 0.06 s 0.08 s 0.10 s 59.27 69.78 s IPAL 0.51 s 0.63 s 1.56 s 73.75 s 94.62 s PL-LEAF 56.04 s 4 min 35 min $>$1 d $>$1 d MMS 57.02 s 1 min 2 min 34 min 35min PLDPC 2.16 s 2.45 s 11.61 s 4 min 6 min -
[1] Su X P, Peng J Y, Feng X Y, Wu J. Labeling faces with names based on the name semantic network. Multimedia Tools and Applications, 2016, 75(11): 6445-6462 doi: 10.1007/s11042-015-2581-x [2] Jin R, Ghahramani Z. Learning with multiple labels. In: Proceedings of the 15th International Conference on Neural Information Processing Systems. Cambridge, MA: MIT Press, 2002. 921-928 [3] Zhang M L, Yu F, Tang C Z. Disambiguation-free partial label learning. IEEE Transactions on Knowledge and Data Engineering, 2017, 29(10): 2155-2167 doi: 10.1109/TKDE.2017.2721942 [4] 张敏灵.偏标记学习研究综述.数据采集与处理, 2015, 30(1): 77-87 http://d.old.wanfangdata.com.cn/Periodical/sjcjycl201501007Zhang Min-Ling. Research on partial label learning. Journal of Data Acquisition and Processing, 2015, 30(1): 77-87 http://d.old.wanfangdata.com.cn/Periodical/sjcjycl201501007 [5] Nguyen N, Caruana R. Classification with partial labels. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Las Vegas, Nevada, USA: ACM, 2008: 551-559 [6] Chen Y C, Patel V M, Chellappa R, Phillips P J. Ambiguously labeled learning using dictionaries. IEEE Transactions on Information Forensics and Security, 2014, 9(12): 2076-2088 doi: 10.1109/TIFS.2014.2359642 [7] Hüllermeier E, Beringer J. Learning from ambiguously labeled examples. In: Proceedings of the 6th International Symposium on Intelligent Data Analysis. Madrid, Spain: Springer, 2005. 168-179 [8] Cour T, Sapp B, Taskar B. Learning from partial labels. Journal of Machine Learning Research, 2011, 12: 1501-1536 http://d.old.wanfangdata.com.cn/NSTLQK/NSTL_QKJJ0230239804/ [9] Zhang M L, Yu F. Solving the partial label learning problem: an instance-based approach. In: Proceedings of the 24th International Conference on Artificial Intelligence. Buenos Aires, Argentina: AAAI Press, 2015. 4048-4054 [10] Yu F, Zhang M L. Maximum margin partial label learning. Machine Learning, 2017, 106(4): 573-593 doi: 10.1007/s10994-016-5606-4 [11] Olson C C, Judd K P, Nichols J M. Manifold learning techniques for unsupervised anomaly detection. Expert Systems with Applications, 2018, 91: 374-385 doi: 10.1016/j.eswa.2017.08.005 [12] Liu G C, Lin Z C, Yu Y. Robust subspace segmentation by low-rank representation. In: Proceedings of the 27th International Conference on Machine Learning. Haifa, Israel: ICML, 2010. 663-670 [13] 王卫卫, 李小平, 冯象初, 王斯琪.稀疏子空间聚类综述.自动化学报, 2015, 41(8): 1373-1384 doi: 10.16383/j.aas.2015.c140891Wang Wei-Wei, Li Xiao-Ping, Feng Xiang-Chu, Wang Si-Qi. A survey on sparse subspace clustering. Acta Automatica Sinica, 2015, 41(8): 1373-1384 doi: 10.16383/j.aas.2015.c140891 [14] 李波, 卢春园, 冷成财, 金连宝.基于局部图拉普拉斯约束的鲁棒低秩表示聚类方法.自动化学报, 2015, 41(11): 1971-1980 doi: 10.16383/j.aas.2015.c150031Li Bo, Lu Chun-Yuan, Leng Cheng-Cai, Jin Lian-Bao. Robust low rank subspace clustering based on local graph Laplace constraint. Acta Automatica Sinica, 2015, 41(11): 1971-1980 doi: 10.16383/j.aas.2015.c150031 [15] Hou X, Yao G J, Wang J. Semi-supervised classification based on low rank representation. Algorithms, 2016, 9(3): Article No. 48 [16] Pasteris S, Vitale F, Gentile C, Herbster M. On pairwise clustering with side information[Online], available http://arxiv.org/abs/1706.06474, December 9, 2017 [17] 徐明亮, 王士同, 杭文龙.一种基于同类约束的半监督近邻反射传播聚类方法.自动化学报, 2016, 42(2): 255-269 doi: 10.16383/j.aas.2016.c150059Xu Ming-Liang, Wang Shi-Tong, Hang Wen-Long. A semi-supervised affinity propagation clustering method with homogeneity constraint. Acta Automatica Sinica, 2016, 42(2): 255-269 doi: 10.16383/j.aas.2016.c150059 [18] Zhu X J, Ghahramani Z, Lafferty J. Semi-supervised learning using Gaussian fields and harmonic functions. In: Proceedings of the Twentieth International Conference on Machine Learning. Washington DC, USA: ICML, 2003. 912-919 [19] Zhu X J, Goldberg A B. Introduction to Semi-Supervised Learning. San Rafael: Morgan and Claypool Publishers, 2009. 1-130 [20] 由从哲.子空间聚类分析新算法及应用研究[博士学位论文], 江南大学, 中国, 2017You Cong-Zhe. Novel Subspace Clustering Algorithms and Applications[Ph. D. dissertation], Jiangnan University, China, 2017 [21] Zeng Z N, Xiao S J, Jia K, Chan T H, Gao S H, Xu D, et al. Learning by associating ambiguously labeled images. In: Proceedings of the 2013 IEEE Computer Vision and Pattern Recognition. Portland, OR, USA: IEEE, 2013. 708-715 [22] Guillaumin M, Verbeek J, Schmid C. Multiple instance metric learning from automatically labeled bags of faces. In: Proceedings of the 11th European Conference on Computer Vision. Heraklion, Crete, Greece: Springe, 2010. 634-647 [23] Liu L P, Dietterich T G. A conditional multinomial mixture model for superset label learning. In: Proceedings of the 25th International Conference on Neural Information Processing Systems. Lake Tahoe, Nevada: Curran Associates Inc., 2012. 548-556 [24] Briggs F, Fern X Z, Raich R. Rank-loss support instance machines for MIML instance annotation. In: Proceedings of the 18th International Conference on Knowledge Discovery and Data Mining. Beijing, China: ACM, 2012. 534-542 [25] Luo J, Orabona F. Learning from candidate labeling sets. In: Proceedings of the 23rd International Conference on Neural Information Processing Systems. Vancouver, British Columbia, Canada: Curran Associates Inc., 2010. 1504-1512 [26] Zhang M L, Zhou B B, Liu X Y. Partial label learning via feature-aware disambiguation. In: Proceedings of the 22th International Conference on Knowledge Discovery and Data Mining. San Francisco, California, USA: ACM, 2016. 1335-1344