Abstract: Some existing multi-label classification algorithms become infeasible because multi-label data often contain high-dimensional feature or label information. To address this problem, this paper proposes Deep AE-MF, a joint-embedding multi-label classification algorithm based on a denoising autoencoder and matrix factorization. The algorithm consists of two parts: the feature-embedding part uses a denoising autoencoder to learn a nonlinear representation of the feature space, while the label-embedding part uses matrix factorization to directly learn a latent representation of the label space together with the corresponding decoding matrix. Deep AE-MF couples the two embedding stages and jointly learns a single latent space used for prediction, yielding an effective multi-label classification model. To further improve performance, Deep AE-MF additionally exploits negative correlations between labels. Experiments on different datasets demonstrate the effectiveness and robustness of the proposed method.

1) Recommended by Associate Editor ZHANG Min-Ling
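The key point of the joint embedding is that both parts share one latent space: a denoising autoencoder corrupts the input features, encodes them into latent codes, and reconstructs the clean features, while a learnable decoding matrix maps the same codes back to the label space (the matrix-factorization part). The PyTorch sketch below illustrates this joint objective; the layer sizes, corruption rate, loss weight alpha, and optimizer settings are illustrative assumptions rather than the paper's exact configuration, and the +neg variant that exploits negative label correlations is omitted.

```python
# A minimal sketch of the joint-embedding idea behind Deep AE-MF,
# under the assumptions stated above (not the authors' exact model).
import torch
import torch.nn as nn
import torch.nn.functional as F

d_feat, d_label, d_latent = 8000, 53, 128   # e.g. an enron-sized problem

encoder = nn.Sequential(nn.Linear(d_feat, 512), nn.ReLU(),
                        nn.Linear(512, d_latent))
decoder = nn.Sequential(nn.Linear(d_latent, 512), nn.ReLU(),
                        nn.Linear(512, d_feat))
# Label embedding as matrix factorization: labels Y are decoded from the
# shared latent codes C through a learnable matrix D.
D = nn.Parameter(0.01 * torch.randn(d_latent, d_label))

opt = torch.optim.Adam(list(encoder.parameters()) +
                       list(decoder.parameters()) + [D], lr=1e-3)

def train_step(X, Y, corruption=0.3, alpha=1.0):
    # Denoising: corrupt the input with masking noise, reconstruct clean X.
    X_noisy = X * (torch.rand_like(X) > corruption).float()
    C = encoder(X_noisy)                    # shared latent codes
    loss_ae = F.mse_loss(decoder(C), X)     # feature reconstruction term
    # Label-decoding term; Y is a float {0, 1} indicator matrix.
    loss_mf = F.binary_cross_entropy_with_logits(C @ D, Y)
    loss = loss_ae + alpha * loss_mf        # joint embedding objective
    opt.zero_grad(); loss.backward(); opt.step()
    return float(loss)

def predict(X, threshold=0.5):
    # At test time: encode the clean features and decode label scores.
    with torch.no_grad():
        return (torch.sigmoid(encoder(X) @ D) > threshold).float()
```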
Table 1 Multi-label datasets and associated statistics
Dataset     #Labels  #Instances  #Features  Label density  Avg. #labels
enron            53       1 702      8 000         0.0637         3.378
ohsumed          23      13 928      8 000         0.0720         1.663
movieLens        20      10 076      8 000         0.1020         2.043
TJ                9       5 892      8 000         0.2001         1.801
Delicious       983      16 105        500         0.0193        19.03
EURLex-4K     3 993      19 438      5 000         0.0013         5.31
Note: label density is the average number of labels per instance divided by the total number of labels (e.g. enron: 3.378 / 53 ≈ 0.0637).
Table 2 Character-count statistics of the multi-label datasets
Dataset     Proportion of samples by character count
            <50       50~100    100~200   200~400   400~800   >800
enron       0.437133  0.287309  0.165100  0.052291  0.014101  0.0440658
ohsumed     0.591008  0.325526  0.082473  0.000992  0         0
movieLens   0.427197  0.558372  0.014431  0         0         0
TJ          0.134589  0.354888  0.339613  0.159708  0.011202  0
Table 3 Hamming loss of the ten multi-label algorithms on different datasets
Algorithm / Dataset  enron    ohsumed  movieLens  TJ       Delicious  EURLex-4K
BR                   0.0771   0.1484   0.1992     0.2923   0.0185     0.0032
CCA-SVM              0.1593   0.2148   0.3116     0.3764   -          -
CCA-Ridge            0.1549   0.2140   0.3045     0.3268   -          -
LS_ML                0.1000   0.2119   0.2474     0.2842   -          -
PLST                 0.0843   0.1510   0.2186     0.2906   0.0183     0.0037
CPLST                0.0841   0.1512   0.2186     0.2906   0.0182     0.0038
FaIE                 0.0841   0.1505   0.2188     0.2882   0.0183     0.0038
ML_CSSP              0.0836   0.1479   0.2075     0.2804   0.0181     0.0036
Deep AE-MF           0.0518   0.1693   0.1416     0.1891   0.0310     0.0013
Deep AE-MF+neg       0.0509   0.1630   0.1445     0.1869   0.0279     0.0012
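Hamming loss counts the fraction of instance-label pairs that are predicted incorrectly, so lower is better, and values on label-rich datasets such as Delicious and EURLex-4K are naturally small. A minimal sanity check with scikit-learn, using toy indicator matrices rather than data from the paper:

```python
import numpy as np
from sklearn.metrics import hamming_loss

# Toy 3-instance, 4-label indicator matrices (illustrative only).
y_true = np.array([[1, 0, 1, 0], [0, 1, 0, 0], [1, 1, 0, 1]])
y_pred = np.array([[1, 0, 0, 0], [0, 1, 0, 1], [1, 1, 0, 1]])

# 2 wrong instance-label pairs out of 3 * 4 = 12 -> 0.1667.
print(hamming_loss(y_true, y_pred))
```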
Table 4 Micro-F1-label of the ten multi-label algorithms on different datasets
Algorithm / Dataset  enron    ohsumed  movieLens  TJ       Delicious  EURLex-4K
BR                   0.3451   0.1137   0.3308     0.4281   0.1370     0.1294
CCA-SVM              0.2622   0.1528   0.3058     0.4355   -          -
CCA-Ridge            0.2744   0.1509   0.3074     0.4344   -          -
LS_ML                0.3417   0.1531   0.3633     0.4931   -          -
PLST                 0.3638   0.1589   0.3639     0.4781   0.1911     0.1540
CPLST                0.3643   0.1577   0.3642     0.4787   0.1911     0.1534
FaIE                 0.3643   0.1593   0.3607     0.4839   0.1911     0.1539
ML_CSSP              0.3606   0.1543   0.3532     0.4850   0.1860     0.1534
Deep AE-MF           0.5475   0.1642   0.3968     0.5421   0.2757     0.4913
Deep AE-MF+neg       0.5531   0.1962   0.4122     0.5632   0.2775     0.4936
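Micro-F1-label pools true positives, false positives, and false negatives over all labels before computing F1, so performance on frequent labels dominates the score. A minimal check with scikit-learn on toy indicator matrices (not data from the paper):

```python
import numpy as np
from sklearn.metrics import f1_score

y_true = np.array([[1, 0, 1, 0], [0, 1, 0, 0], [1, 1, 0, 1]])
y_pred = np.array([[1, 0, 0, 0], [0, 1, 0, 1], [1, 1, 0, 1]])

# Pooled counts: TP = 5, FP = 1, FN = 1 -> precision = recall = 5/6.
print(f1_score(y_true, y_pred, average='micro'))  # 0.8333...
```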
Table 5 Macro-F1-label of the ten multi-label algorithms on different datasets
Algorithm / Dataset  enron    ohsumed  movieLens  TJ       Delicious  EURLex-4K
BR                   0.0923   0.0656   0.2066     0.4146   0.0338     0.0371
CCA-SVM              0.1045   0.1150   0.2572     0.4282   -          -
CCA-Ridge            0.1019   0.1134   0.2556     0.4488   -          -
LS_ML                0.1158   0.1141   0.2971     0.4832   -          -
PLST                 0.1149   0.0884   0.2742     0.4717   0.0460     0.0507
CPLST                0.1149   0.0863   0.2744     0.4725   0.0462     0.0514
FaIE                 0.1147   0.0863   0.2609     0.4647   0.0461     0.0506
ML_CSSP              0.1147   0.0793   0.2375     0.4580   0.0437     0.0492
Deep AE-MF           0.1356   0.0960   0.3394     0.5440   0.1316     0.1477
Deep AE-MF+neg       0.1384   0.1011   0.3455     0.5629   0.1324     0.1483
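Macro-F1-label instead computes F1 per label and averages the results, weighting every label equally; rare labels therefore influence it strongly, which is where exploiting label correlations tends to pay off. The same toy matrices as above illustrate the difference:

```python
import numpy as np
from sklearn.metrics import f1_score

y_true = np.array([[1, 0, 1, 0], [0, 1, 0, 0], [1, 1, 0, 1]])
y_pred = np.array([[1, 0, 0, 0], [0, 1, 0, 1], [1, 1, 0, 1]])

# Per-label F1: 1.0, 1.0, 0.0, 0.6667 -> macro average 0.6667
# (lower than the micro average because one rare label is missed entirely).
print(f1_score(y_true, y_pred, average='macro', zero_division=0))
```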
Table 6 F1 of the ten multi-label algorithms on different datasets
Algorithm / Dataset  enron    ohsumed  movieLens  TJ       Delicious  EURLex-4K
BR                   0.2885   0.1046   0.2705     0.4482   0.1280     0.2061
CCA-SVM              0.2758   0.1354   0.2982     0.4191   -          -
CCA-Ridge            0.2937   0.1344   0.2983     0.4360   -          -
LS_ML                0.3510   0.1352   0.3523     0.4821   -          -
PLST                 0.4029   0.1343   0.3158     0.4753   0.1650     0.2502
CPLST                0.4036   0.1330   0.3164     0.4758   0.1651     0.2503
FaIE                 0.4000   0.1327   0.3171     0.4738   0.1650     0.2502
ML_CSSP              0.3814   0.1318   0.2854     0.4799   0.1632     0.2419
Deep AE-MF           0.4491   0.1489   0.3307     0.4677   0.2138     0.4291
Deep AE-MF+neg       0.4582   0.1491   0.3381     0.5013   0.2310     0.4365
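If, as the separate Micro/Macro tables suggest, the F1 in Table 6 is the example-based (instance-averaged) variant, then F1 is computed per instance over that instance's predicted and true label sets and then averaged over instances; scikit-learn exposes this as average='samples'. A toy check under that assumption:

```python
import numpy as np
from sklearn.metrics import f1_score

y_true = np.array([[1, 0, 1, 0], [0, 1, 0, 0], [1, 1, 0, 1]])
y_pred = np.array([[1, 0, 0, 0], [0, 1, 0, 1], [1, 1, 0, 1]])

# Per-instance F1: 0.6667, 0.6667, 1.0 -> average 0.7778.
print(f1_score(y_true, y_pred, average='samples'))
```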
Table 7 P@K of the four multi-label algorithms on EURLex-4K and Delicious
EURLex-4K:
Metric  LEML    PD-sparse  Deep AE-MF  Deep AE-MF+neg
P@1     0.6340  0.7643     0.8078      0.8104
P@3     0.5035  0.6037     0.6821      0.6893
P@5     0.4128  0.4972     0.5764      0.5805

Delicious:
Metric  LEML    PD-sparse  Deep AE-MF  Deep AE-MF+neg
P@1     0.6567  0.5182     0.6633      0.6754
P@3     0.6055  0.4418     0.6095      0.6123
P@5     0.5608  0.5656     0.5764      0.5834
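P@K (precision at K) ranks each instance's label scores and reports the fraction of the top-K labels that are actually relevant, averaged over instances; it is the usual metric for extreme multi-label datasets such as Delicious and EURLex-4K. A minimal NumPy sketch, where precision_at_k is a hypothetical helper and the scores and labels are toy values:

```python
import numpy as np

def precision_at_k(scores, y_true, k):
    """Mean fraction of relevant labels among each instance's top-k scores."""
    topk = np.argsort(-scores, axis=1)[:, :k]        # indices of k largest scores
    hits = np.take_along_axis(y_true, topk, axis=1)  # 1 where a top-k label is relevant
    return hits.sum(axis=1).mean() / k

scores = np.array([[0.9, 0.2, 0.7, 0.1],
                   [0.1, 0.8, 0.3, 0.6]])
y_true = np.array([[1, 0, 1, 0],
                   [0, 1, 0, 0]])
print(precision_at_k(scores, y_true, k=2))  # (2/2 + 1/2) / 2 = 0.75
```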
Table 8 P values of Student's t-test results (values marked with * are greater than 0.05)
Deep AE-MF vs. each algorithm, hamming loss:
Algorithm        enron     ohsumed    movieLens  TJ        Delicious  EURLex-4K
BR               1.87E-5   1.02E-3    7.03E-6    2.94E-7   1.32E-5    9.64E-3
LS_ML            2.93E-5   1.27E-4    5.92E-7    3.28E-7   -          -
CCA-SVM          3.38E-8   2.04E-6    4.47E-7    4.55E-10  -          -
CCA-Ridge        5.34E-9   6.01E-6    2.33E-7    3.97E-7   -          -
PLST             2.41E-8   2.91E-3    8.36E-12   3.04E-9   8.04E-6    4.67E-4
CPLST            2.43E-8   3.04E-3    2.05E-5    1.32E-9   5.01E-6    9.75E-4
FaIE             3.62E-9   5.83E-4    1.25E-11   3.09E-9   1.61E-5    5.38E-4
ML_CSSP          9.35E-8   8.36E-2*   8.18E-7    7.93E-10  3.08E-6    4.29E-3
Deep AE-MF+neg   1.90E-5   7.39E-4    3.89E-7    2.73E-4   3.21E-3    1.09E-1*

Deep AE-MF vs. each algorithm, Macro-F1-label:
Algorithm        enron     ohsumed    movieLens  TJ        Delicious  EURLex-4K
BR               4.85E-10  3.01E-6    1.73E-7    3.61E-7   2.63E-9    3.12E-9
LS_ML            4.03E-10  1.25E-1*   3.26E-7    4.11E-8   -          -
CCA-SVM          3.19E-8   5.48E-2*   3.21E-7    3.37E-9   -          -
CCA-Ridge        6.06E-11  4.84E-4    1.51E-5    3.01E-6   -          -
PLST             1.51E-9   2.23E-3    1.93E-5    6.64E-7   4.38E-8    4.13E-12
CPLST            1.42E-9   5.19E-3    5.21E-5    1.03E-6   8.21E-9    1.62E-11
FaIE             1.72E-10  3.99E-2    1.83E-5    5.11E-7   2.26E-7    1.45E-10
ML_CSSP          1.64E-10  4.12E-4    4.03E-6    3.03E-7   6.63E-9    8.11E-11
Deep AE-MF+neg   1.61E-5   5.51E-7    8.11E-2*   3.09E-7   1.18E-3    2.34E-4

Deep AE-MF vs. each algorithm, Micro-F1-label:
Algorithm        enron     ohsumed    movieLens  TJ        Delicious  EURLex-4K
BR               1.62E-8   2.82E-5    2.34E-8    5.07E-11  1.35E-8    9.95E-9
LS_ML            3.90E-7   1.54E-4    2.75E-7    1.31E-10  -          -
CCA-SVM          2.74E-7   5.75E-4    4.25E-9    6.72E-9   -          -
CCA-Ridge        2.70E-7   1.84E-4    4.85E-8    1.06E-10  -          -
PLST             5.01E-6   8.47E-3    9.98E-9    2.71E-10  5.21E-8    1.02E-9
CPLST            7.08E-6   6.36E-3    4.18E-9    4.14E-11  5.08E-8    1.73E-12
FaIE             1.40E-5   5.86E-3    1.61E-9    1.08E-10  5.35E-9    4.44E-10
ML_CSSP          6.03E-5   3.01E-4    2.84E-9    6.08E-12  5.86E-7    2.21E-9
Deep AE-MF+neg   1.2E-2    3.31E-3    8.03E-5    3.45E-8   4.21E-4    2.21E-3
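Each entry in Table 8 is the P value of a Student's t-test between Deep AE-MF and the listed algorithm on one dataset; values below 0.05 indicate that the performance difference is statistically significant. Assuming paired per-run scores (the raw runs are not shown in the paper), such a test is a one-liner with SciPy; the numbers below are invented for illustration:

```python
import numpy as np
from scipy.stats import ttest_rel

# Hypothetical per-run Micro-F1 scores for two methods (toy numbers).
deep_ae_mf = np.array([0.545, 0.551, 0.548, 0.543, 0.550])
baseline   = np.array([0.362, 0.365, 0.360, 0.366, 0.364])

stat, p = ttest_rel(deep_ae_mf, baseline)  # paired Student's t-test
print(p < 0.05)  # True -> the improvement is statistically significant
```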