Abstract: Outlier detection on attributed networks has important theoretical and practical significance in network security, e-commerce, financial transactions, and many other fields, and has received increasing attention in recent years. Most existing outlier detection methods generate decisions by mining patterns in the network structure or node attributes. However, the limited attribute and structure information directly available from the given network data is rarely enough to describe abnormal objects reliably. Moreover, the entities behind network nodes are usually associated with abundant real-world domain knowledge, which has great potential value for outlier detection. To this end, this paper proposes a multi-view network outlier detection model based on knowledge fusion, which identifies abnormal nodes effectively through the complementary fusion of network data and associated knowledge under a multi-view learning mode. First, the model applies TransR to extract knowledge vector representations from a domain knowledge graph and constructs a twin network using the topology of the input network. Then, an attribute encoder and a knowledge encoder are constructed under the multi-view learning framework to embed the attributed network and its twin network into their respective representation spaces. On this basis, an aggregator integrates the network embeddings of the two views into a unified representation. Finally, the abnormal score of each node is evaluated by integrating the reconstruction errors in the two different dimensions, and the abnormal nodes in the network are identified accordingly. Extensive experiments on real network datasets demonstrate that the proposed model fuses domain knowledge effectively and achieves better outlier detection performance than the baseline approaches.
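The scoring steps named in the abstract can be illustrated with a minimal Python sketch. Only the TransR scoring function follows the published TransR definition; the abstract does not give the model's aggregation or anomaly score formulas, so the `add`/`concat` fusion (suggested by the variant names MOD-KF_Add and MOD-KF_Concat) and the weighted sum of reconstruction errors are assumptions, and all function names are hypothetical.

```python
import math


def project(rel_matrix, entity):
    """Map an entity vector into a relation space: M_r * e."""
    return [sum(m * e for m, e in zip(row, entity)) for row in rel_matrix]


def transr_score(head, relation, tail, rel_matrix):
    """TransR score ||M_r h + r - M_r t||^2; lower means a more plausible triple."""
    h_r = project(rel_matrix, head)
    t_r = project(rel_matrix, tail)
    return sum((h + r - t) ** 2 for h, r, t in zip(h_r, relation, t_r))


def aggregate(attr_emb, know_emb, mode="concat"):
    """Fuse one node's embeddings from the two views (assumed add/concat)."""
    if mode == "add":
        if len(attr_emb) != len(know_emb):
            raise ValueError("'add' needs embeddings of equal length")
        return [a + k for a, k in zip(attr_emb, know_emb)]
    if mode == "concat":
        return list(attr_emb) + list(know_emb)
    raise ValueError("mode must be 'add' or 'concat'")


def reconstruction_error(original, reconstructed):
    """Euclidean distance between a node's input row and its reconstruction."""
    return math.sqrt(sum((o - r) ** 2 for o, r in zip(original, reconstructed)))


def anomaly_score(errors, weights):
    """Assumed scoring rule: weighted sum of per-dimension reconstruction errors."""
    if len(errors) != len(weights):
        raise ValueError("one weight per error term is required")
    return sum(w * e for w, e in zip(weights, errors))
```

Under these assumptions, a node whose input is reconstructed poorly in either dimension receives a high score, and ranking the nodes by that score yields the candidate outliers.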
Key words:
- attributed networks
- outlier detection
- graph neural network
- knowledge fusion
- multi-view learning
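Table 1 Statistics of the experimental datasets

| Dataset | Network: nodes | Network: attributes | Network: edges | Network: anomaly rate | Knowledge: entities | Knowledge: relations | Knowledge: triples |
| --- | --- | --- | --- | --- | --- | --- | --- |
| AmazonBooks | 24 915 | 28 | 128 742 | 0.0247 | 124 320 | 93 | 541 853 |
| MovieLens | 2 182 | 20 | 31 573 | 0.0522 | 50 875 | 52 | 181 639 |
| Last.FM | 23 566 | 8 | 187 472 | 0.0258 | 47 986 | 12 | 325 147 |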
Table 2 AUC values of each method on different datasets

| Method | AmazonBooks | MovieLens | Last.FM |
| --- | --- | --- | --- |
| Radar | 0.7205 | 0.7586 | 0.6891 |
| GAAN | 0.8436 | 0.8203 | 0.7504 |
| Dominant | 0.7585 | 0.8246 | 0.7331 |
| SpecAE | 0.6824 | 0.7348 | 0.6706 |
| ALARM | 0.7643 | 0.8165 | 0.7729 |
| MOD-KF_Add | 0.8230 | 0.8852 | 0.8106 |
| MOD-KF_Concat | 0.8364 | 0.8743 | 0.8213 |
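For readers who want to reproduce an AUC figure of the kind reported in Table 2, the sketch below computes ROC AUC from binary ground-truth labels and per-node anomaly scores. It uses the standard pairwise (Mann-Whitney) formulation; the function name and the toy inputs are illustrative and do not come from the paper.

```python
def roc_auc(labels, scores):
    """ROC AUC as the probability that a random outlier outscores a random
    normal node, counting ties as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: label 1 marks an outlier, higher score means more anomalous.
toy_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

The quadratic pair loop is kept for readability; on networks with tens of thousands of nodes a rank-based implementation would be the more practical choice.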
Table 3 Results of different algorithms in terms of Precision@K

| Dataset | K | Radar | GAAN | Dominant | SpecAE | ALARM | MOD-KF_Add | MOD-KF_Concat |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| AmazonBooks | 50 | 0.8236 | 0.9008 | 0.8412 | 0.7784 | 0.8431 | 0.9248 | 0.9183 |
| AmazonBooks | 100 | 0.8393 | 0.9078 | 0.7927 | 0.8069 | 0.8113 | 0.9194 | 0.9146 |
| AmazonBooks | 200 | 0.7543 | 0.8851 | 0.7329 | 0.7208 | 0.7840 | 0.8742 | 0.8616 |
| AmazonBooks | 500 | 0.7340 | 0.8633 | 0.7624 | 0.6905 | 0.7517 | 0.8526 | 0.8519 |
| MovieLens | 5 | 0.8040 | 0.7960 | 0.9840 | 0.7920 | 0.9680 | 0.9800 | 0.9720 |
| MovieLens | 10 | 0.8420 | 0.8540 | 0.8940 | 0.8180 | 0.8960 | 0.9540 | 0.9440 |
| MovieLens | 50 | 0.7194 | 0.8405 | 0.8368 | 0.8177 | 0.8526 | 0.9357 | 0.9022 |
| MovieLens | 100 | 0.7271 | 0.8111 | 0.8224 | 0.7670 | 0.8365 | 0.8978 | 0.9095 |
| Last.FM | 50 | 0.8174 | 0.7579 | 0.8148 | 0.7392 | 0.8384 | 0.9002 | 0.8886 |
| Last.FM | 100 | 0.7855 | 0.7413 | 0.7969 | 0.7132 | 0.8085 | 0.8826 | 0.8871 |
| Last.FM | 200 | 0.7247 | 0.7554 | 0.7331 | 0.6807 | 0.7948 | 0.9053 | 0.8966 |
| Last.FM | 500 | 0.6716 | 0.7525 | 0.7143 | 0.6365 | 0.7743 | 0.8623 | 0.8704 |
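Table 4 Results of different algorithms in terms of Recall@K

| Dataset | K | Radar | GAAN | Dominant | SpecAE | ALARM | MOD-KF_Add | MOD-KF_Concat |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| AmazonBooks | 50 | 0.0631 | 0.0728 | 0.0662 | 0.0618 | 0.0660 | 0.0746 | 0.0743 |
| AmazonBooks | 100 | 0.1319 | 0.1471 | 0.1353 | 0.1185 | 0.1304 | 0.1486 | 0.1506 |
| AmazonBooks | 200 | 0.2425 | 0.2839 | 0.2495 | 0.2311 | 0.2351 | 0.2792 | 0.2805 |
| AmazonBooks | 500 | 0.5880 | 0.6810 | 0.6063 | 0.5211 | 0.5903 | 0.6776 | 0.6846 |
| MovieLens | 5 | 0.0322 | 0.0356 | 0.0353 | 0.0296 | 0.0423 | 0.0420 | 0.0416 |
| MovieLens | 10 | 0.0684 | 0.0752 | 0.0781 | 0.0654 | 0.0733 | 0.0813 | 0.0852 |
| MovieLens | 50 | 0.3238 | 0.3228 | 0.3611 | 0.3141 | 0.3599 | 0.4040 | 0.3961 |
| MovieLens | 100 | 0.6359 | 0.7041 | 0.7098 | 0.5811 | 0.7118 | 0.7862 | 0.7770 |
| Last.FM | 50 | 0.0659 | 0.0609 | 0.0644 | 0.0610 | 0.0656 | 0.0712 | 0.0724 |
| Last.FM | 100 | 0.1074 | 0.1086 | 0.1164 | 0.1034 | 0.1121 | 0.1221 | 0.1257 |
| Last.FM | 200 | 0.2291 | 0.2380 | 0.2475 | 0.2231 | 0.2420 | 0.2788 | 0.2771 |
| Last.FM | 500 | 0.5467 | 0.5990 | 0.5957 | 0.5131 | 0.6340 | 0.6931 | 0.7022 |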
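The Precision@K and Recall@K metrics reported in Tables 3 and 4 can be computed as in the sketch below, assuming nodes are ranked by descending anomaly score and label 1 marks a ground-truth outlier. The function names and the toy data are illustrative and are not taken from the paper.

```python
def top_k_hits(labels, scores, k):
    """Count ground-truth outliers among the k highest-scoring nodes."""
    if k <= 0 or k > len(scores):
        raise ValueError("k must be between 1 and the number of nodes")
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(labels[i] for i in ranked[:k])


def precision_at_k(labels, scores, k):
    """Fraction of the top-k ranked nodes that are true outliers."""
    return top_k_hits(labels, scores, k) / k


def recall_at_k(labels, scores, k):
    """Fraction of all true outliers that appear in the top-k ranked nodes."""
    total = sum(labels)
    if total == 0:
        raise ValueError("at least one outlier is required")
    return top_k_hits(labels, scores, k) / total


# Toy example with three outliers among six nodes.
toy_labels = [1, 0, 1, 0, 0, 1]
toy_scores = [0.9, 0.8, 0.7, 0.2, 0.1, 0.3]
```

Because the recall denominator is the total number of outliers, Recall@K stays small whenever K is far below that total, which is consistent with the low Recall@50 values on the larger datasets.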