Abstract: To address the low computational efficiency and limited detection performance of existing anomaly detection (AD) models, a multi-scale normalizing flow (MS-Flow) is proposed that achieves efficient visual anomaly recognition through multi-scale cross fusion. Specifically, a hierarchical multi-scale architecture is built inside the normalizing flow (NF) to avoid redundant cross-computation over multi-channel data while preserving the network's multi-scale representation capability. In addition, the proposed hierarchical perception module fuses multi-grained features level by level, representing multi-scale features at a fine-grained level and thereby improving the precision of distribution estimation. The method balances detection accuracy against computational efficiency. Experiments on two public datasets show that MS-Flow achieves higher detection accuracy than previous detection models, with an average area under the receiver operating characteristic curve (AUROC) of 99.7% on MVTec AD and 96.0% on BTAD. It is also more computationally efficient: its floating point operations (FLOPs) are about 1/8 of those of CS-Flow.
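As background for the distribution estimation mentioned in the abstract, the following is a minimal sketch of how a generic normalizing flow scores a sample. It uses a single Real NVP-style affine coupling layer [20] on a flat feature vector and is not the MS-Flow architecture. The function names and the toy scale and shift functions are hypothetical, chosen only for illustration.

```python
import math

def affine_coupling_forward(x, scale_fn, shift_fn):
    """Keep the first half of x and affinely transform the second half."""
    assert len(x) % 2 == 0, "this sketch assumes an even-length vector"
    half = len(x) // 2
    x1, x2 = x[:half], x[half:]
    s = scale_fn(x1)  # log-scales, one per element of x2
    t = shift_fn(x1)  # shifts, one per element of x2
    z2 = [v * math.exp(si) + ti for v, si, ti in zip(x2, s, t)]
    # The Jacobian is triangular, so log|det J| is the sum of the log-scales.
    log_det = sum(s)
    return x1 + z2, log_det

def affine_coupling_inverse(z, scale_fn, shift_fn):
    """Exact inverse of affine_coupling_forward."""
    half = len(z) // 2
    z1, z2 = z[:half], z[half:]
    s = scale_fn(z1)
    t = shift_fn(z1)
    x2 = [(v - ti) * math.exp(-si) for v, si, ti in zip(z2, s, t)]
    return z1 + x2

def standard_normal_log_prob(z):
    """Log-density of z under an isotropic standard normal."""
    return sum(-0.5 * v * v - 0.5 * math.log(2.0 * math.pi) for v in z)

def anomaly_score(x, scale_fn, shift_fn):
    """Negative log-likelihood via the change-of-variables formula."""
    z, log_det = affine_coupling_forward(x, scale_fn, shift_fn)
    return -(standard_normal_log_prob(z) + log_det)

# Hypothetical stand-ins for the learned sub-networks of a real flow.
def toy_scale(x1):
    return [0.1 * v for v in x1]

def toy_shift(x1):
    return [0.5 * v for v in x1]
```

In a trained flow the scale and shift functions are neural networks fitted on normal samples only, so that samples with a high negative log-likelihood can be flagged as anomalous.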
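Table 1 Statistical overview of the MVTec AD and BTAD datasets

| Dataset | Category | Training images | Test images (normal) | Test images (anomalous) | Anomaly types | Anomalous regions | Image size (pixels) |
|---|---|---|---|---|---|---|---|
| MVTec AD (textures) | Carpet | 280 | 28 | 89 | 5 | 97 | 1024 |
| MVTec AD (textures) | Grid | 264 | 21 | 57 | 5 | 170 | 1024 |
| MVTec AD (textures) | Leather | 245 | 32 | 92 | 5 | 99 | 1024 |
| MVTec AD (textures) | Tile | 230 | 33 | 84 | 5 | 86 | 840 |
| MVTec AD (textures) | Wood | 247 | 19 | 60 | 5 | 168 | 1024 |
| MVTec AD (objects) | Bottle | 209 | 20 | 63 | 3 | 68 | 900 |
| MVTec AD (objects) | Cable | 224 | 58 | 92 | 8 | 151 | 1024 |
| MVTec AD (objects) | Capsule | 219 | 23 | 109 | 5 | 114 | 1000 |
| MVTec AD (objects) | Hazelnut | 391 | 40 | 70 | 4 | 136 | 1024 |
| MVTec AD (objects) | Metal Nut | 220 | 22 | 93 | 4 | 132 | 700 |
| MVTec AD (objects) | Pill | 267 | 26 | 141 | 7 | 245 | 800 |
| MVTec AD (objects) | Screw | 320 | 41 | 119 | 5 | 135 | 1024 |
| MVTec AD (objects) | Toothbrush | 60 | 12 | 30 | 1 | 66 | 1024 |
| MVTec AD (objects) | Transistor | 213 | 60 | 40 | 4 | 44 | 1024 |
| MVTec AD (objects) | Zipper | 240 | 32 | 119 | 7 | 177 | 1024 |
| BTAD | 01 | 400 | 21 | 49 | 1 | - | 1600 |
| BTAD | 02 | 399 | 30 | 200 | 1 | - | 600 |
| BTAD | 03 | 1000 | 400 | 41 | 1 | - | 800 |
| Total | | 5428 | 918 | 1548 | 76 | >1888 | - |
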
Table 2 The average AUROC of different anomaly detection models on MVTec AD dataset (%)
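
| Type | Category | DifferNet [33] | CFlow-AD [34] | CS-Flow [17] | PatchCore [23] | FastFlow [24] | MS-Flow (ours) |
|---|---|---|---|---|---|---|---|
| Textures | Carpet | 92.9 | 98.7 | 100.0 | 98.7 | 100.0 | 100.0 |
| Textures | Grid | 84.0 | 99.6 | 99.0 | 98.2 | 99.7 | 100.0 |
| Textures | Leather | 97.1 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 |
| Textures | Tile | 99.4 | 99.8 | 100.0 | 98.7 | 100.0 | 100.0 |
| Textures | Wood | 99.8 | 99.1 | 100.0 | 99.2 | 100.0 | 100.0 |
| Objects | Bottle | 99.0 | 100.0 | 99.8 | 100.0 | 100.0 | 100.0 |
| Objects | Cable | 95.9 | 97.6 | 99.1 | 99.5 | 100.0 | 99.6 |
| Objects | Capsule | 86.9 | 97.7 | 97.1 | 98.1 | 100.0 | 99.4 |
| Objects | Hazelnut | 99.3 | 99.9 | 99.6 | 100.0 | 100.0 | 100.0 |
| Objects | Metal Nut | 96.1 | 99.3 | 99.1 | 100.0 | 100.0 | 100.0 |
| Objects | Pill | 88.8 | 96.8 | 98.6 | 96.6 | 99.4 | 99.5 |
| Objects | Screw | 96.3 | 91.9 | 97.6 | 98.1 | 97.8 | 97.5 |
| Objects | Toothbrush | 98.6 | 99.7 | 91.9 | 100.0 | 94.4 | 100.0 |
| Objects | Transistor | 91.1 | 95.2 | 99.3 | 100.0 | 99.8 | 100.0 |
| Objects | Zipper | 95.1 | 98.5 | 99.7 | 99.4 | 99.5 | 99.8 |
| Average | | 94.9 | 98.3 | 98.7 | 99.1 | 99.4 | 99.7 |
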
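The image-level AUROC reported in Table 2 can be read as the probability that a randomly chosen anomalous image receives a higher anomaly score than a randomly chosen normal one. The sketch below computes it by direct pairwise comparison and then checks that the MS-Flow average is consistent with an unweighted mean over the 15 categories. The function name and the choice of the pairwise formulation are illustrative assumptions, not the evaluation code used in the paper.

```python
def auroc(scores, labels):
    """AUROC by pairwise comparison; labels use 1 = anomalous, 0 = normal.

    Ties between an anomalous and a normal score count as half a win.
    The cost is quadratic in the number of samples, which is acceptable
    for test sets of the size listed in Table 1.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one normal and one anomalous sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Per-category MS-Flow results copied from Table 2 (in %).
ms_flow_auroc = [100.0, 100.0, 100.0, 100.0, 100.0,       # textures
                 100.0, 99.6, 99.4, 100.0, 100.0,          # objects
                 99.5, 97.5, 100.0, 100.0, 99.8]

# Unweighted (macro) mean over the 15 categories.
ms_flow_mean = sum(ms_flow_auroc) / len(ms_flow_auroc)
```

Rounded to one decimal place, `ms_flow_mean` agrees with the 99.7 listed in the average row.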
Table 3 The average AUROC of different anomaly detection models on BTAD dataset (%)
Table 4 Complexity of different normalizing flows

| Model | CFlow-AD | CS-Flow | FastFlow | MS-Flow (ours) |
|---|---|---|---|---|
| AUROC (%) | 98.3 | 98.7 | 99.4 | 99.7 |
| FLOPs (G) | 13.8 | 65.8 | 13.9 | 8.1 |
| Params (M) | 81.6 | 275.2 | 17.7 | 14.1 |

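The "about 1/8 of CS-Flow" figure quoted in the abstract follows directly from the FLOPs row of Table 4. A short sketch of that arithmetic is given below. The dictionary layout and the helper name `relative_cost` are illustrative assumptions, and the numbers are copied from the table.

```python
# Complexity figures copied from Table 4.
table4 = {
    "CFlow-AD": {"flops_g": 13.8, "params_m": 81.6},
    "CS-Flow": {"flops_g": 65.8, "params_m": 275.2},
    "FastFlow": {"flops_g": 13.9, "params_m": 17.7},
    "MS-Flow": {"flops_g": 8.1, "params_m": 14.1},
}

def relative_cost(model, baseline, key):
    """Cost of `model` expressed as a fraction of the cost of `baseline`."""
    return table4[model][key] / table4[baseline][key]

# FLOPs and parameter count of MS-Flow relative to CS-Flow.
flops_vs_csflow = relative_cost("MS-Flow", "CS-Flow", "flops_g")
params_vs_csflow = relative_cost("MS-Flow", "CS-Flow", "params_m")

# The same comparison against FastFlow, the closest model in AUROC.
flops_vs_fastflow = relative_cost("MS-Flow", "FastFlow", "flops_g")
```

The FLOPs ratio against CS-Flow comes out close to 0.125, and the same helper shows that MS-Flow also needs fewer FLOPs than FastFlow.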
Table 5 Adaptation study of different feature extractors
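
| Feature extractor | $d$ | AUROC (%) |
|---|---|---|
| ResNet18 | 224 $\rightarrow$ 448 $\rightarrow$ 768 | 97.1 $\rightarrow$ 97.9 $\rightarrow$ 97.2 |
| Wide-ResNet50 | 224 $\rightarrow$ 448 $\rightarrow$ 768 | 97.9 $\rightarrow$ 96.2 $\rightarrow$ 93.6 |
| Swin-B | 224 $\rightarrow$ 448 $\rightarrow$ 768 | 96.9 $\rightarrow$ 97.8 $\rightarrow$ 95.4 |
| EfficientNet-B7 | 224 $\rightarrow$ 448 $\rightarrow$ 768 | 98.7 $\rightarrow$ 99.1 $\rightarrow$ 99.5 |
| EfficientNet-B5 | 224 $\rightarrow$ 448 $\rightarrow$ 768 | 98.8 $\rightarrow$ 99.3 $\rightarrow$ 99.7 |
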
Table 6 Adaptation study of different subfeature numbers
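
| Number of sub-features | Sub-feature map size (pixels) | AUROC (%) | Params (M) |
|---|---|---|---|
| 2 | $152 \times 24 \times 24$ | 96.21 | 9.42 |
| 4 | $76 \times 24 \times 24$ | 99.72 | 14.06 |
| 6 | $51 \times 24 \times 24$ | 99.79 | 15.74 |
| 8 | $38 \times 24 \times 24$ | 99.79 | 16.43 |
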
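The channel counts in Table 6 are consistent with splitting one feature map evenly along the channel axis. The sketch below reproduces them under two assumptions that are not stated in this section: the feature map has 304 channels in total (inferred from $2 \times 152$ and $4 \times 76$), and the per-group channel count is rounded up when the split is not exact. The helper name is hypothetical.

```python
def subfeature_channels(total_channels, num_subfeatures):
    """Channels per sub-feature for an even split, rounded up (assumed)."""
    # Integer ceiling division, avoiding floating-point rounding.
    return -(-total_channels // num_subfeatures)

TOTAL_CHANNELS = 304  # assumption inferred from Table 6, not stated in the text

# Maps each number of sub-features in Table 6 to its channel count.
channels_per_split = {
    n: subfeature_channels(TOTAL_CHANNELS, n) for n in (2, 4, 6, 8)
}
```

Under these assumptions the split into 6 sub-features gives 51 channels per group, which matches the table even though 304 is not divisible by 6.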
[1] Tran T M, Vu T N, Vo N D, Nguyen T V, Nguyen K. Anomaly analysis in images and videos: A comprehensive review. ACM Computing Surveys, 2022, 55(7): 1-37
[2] Bergmann P, Fauser M, Sattlegger D, Steger C. MVTec AD - A comprehensive real-world dataset for unsupervised anomaly detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE, 2019. 9592-9600
[3] Suganyadevi S, Seethalakshmi V, Balasamy K. A review on deep learning in medical image analysis. International Journal of Multimedia Information Retrieval, 2022, 11(1): 19-38 doi: 10.1007/s13735-021-00218-1
[4] Li Y Y, Wu J, Bai X, Yang X P, Tan X, Li G B, et al. Multi-granularity tracking with modularlized components for unsupervised vehicles anomaly detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. Seattle, USA: IEEE, 2020. 586-587
[5] Akcay S, Atapour-Abarghouei A, Breckon T P. GANomaly: Semi-supervised anomaly detection via adversarial training. In: Proceedings of the 14th Asian Conference on Computer Vision. Perth, Australia: Springer International Publishing, 2019. 622-637
[6] Ma Bin, Wang Yi-Li, Xu Jian, Wang Chun-Peng, Li Jian, Zhou Lin-Na. An image perceptual hash algorithm based on bidirectional generative adversarial network. Acta Electronica Sinica, 2023, 51(5): 1405-1412
[7] Tang T W, Kuo W H, Lan J H, Ding C F, Hsu H, Young H T. Anomaly detection neural network with dual auto-encoders GAN and its industrial inspection applications. Sensors, 2020, 20(12): 3336 doi: 10.3390/s20123336
[8] Shi Y, Yang J, Qi Z. Unsupervised anomaly segmentation via deep feature reconstruction. Neurocomputing, 2021, 424: 9-22 doi: 10.1016/j.neucom.2020.11.018
[9] Wu Lin, Hao Hong-Yu, Song You. A review of metal surface defect detection based on computer vision. Acta Automatica Sinica, DOI: 10.16383/j.aas.c230039
[10] Kingma D P, Welling M. Auto-encoding variational bayes. arXiv preprint arXiv: 1312.6114, 2013.
[11] LeCun Y. Generalization and network design strategies. Connectionism in Perspective, 1989, 19(143-155): 18
[12] Rudolph M, Wandt B, Rosenhahn B. Structuring autoencoders. In: Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops. Seoul, South Korea: IEEE, 2019.
[13] Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, et al. Generative adversarial networks. Communications of the ACM, 2020, 63(11): 139-144 doi: 10.1145/3422622
[14] Lv Cheng-Kan, Shen Fei, Zhang Zheng-Tao, Zhang Feng. Review of image anomaly detection. Acta Automatica Sinica, 2022, 48(6): 1402-1428
[15] Bergman L, Hoshen Y. Classification-based anomaly detection for general data. arXiv preprint arXiv: 2005.02359, 2020.
[16] Rippel O, Mertens P, Merhof D. Modeling the distribution of normal data in pre-trained deep features for anomaly detection. In: Proceedings of the 25th International Conference on Pattern Recognition. Milan, Italy: IEEE, 2021. 6726-6733
[17] Rudolph M, Wehrbein T, Rosenhahn B, Wandt B. Fully convolutional cross-scale-flows for image-based defect detection. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. Waikoloa, USA: IEEE, 2022. 1088-1097
[18] Lei J, Hu X, Wang Y, Liu D. PyramidFlow: High-resolution defect contrastive localization using pyramid normalizing flow. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver, Canada: IEEE, 2023. 14143-14152
[19] Rezende D, Mohamed S. Variational inference with normalizing flows. In: Proceedings of the 32nd International Conference on Machine Learning. Lille, France: PMLR, 2015. 1530-1538
[20] Dinh L, Sohl-Dickstein J, Bengio S. Density estimation using real NVP. arXiv preprint arXiv: 1605.08803, 2016.
[21] He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE, 2016. 770-778
[22] Cohen N, Hoshen Y. Sub-image anomaly detection with deep pyramid correspondences. arXiv preprint arXiv: 2005.02357, 2020.
[23] Roth K, Pemula L, Zepeda J, Schölkopf B, Brox T, Gehler P. Towards total recall in industrial anomaly detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE, 2022. 14318-14328
[24] Yu J, Zheng Y, Wang X, Li W, Wu Y, Zhao R, et al. FastFlow: Unsupervised anomaly detection and localization via 2D normalizing flows. arXiv preprint arXiv: 2111.07677, 2021.
[25] Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Houlsby N. An image is worth 16×16 words: Transformers for image recognition at scale. arXiv preprint arXiv: 2010.11929, 2020.
[26] Tan M, Le Q. Efficientnet: Rethinking model scaling for convolutional neural networks. In: Proceedings of the 36th International Conference on Machine Learning. Los Angeles, USA: PMLR, 2019. 6105-6114
[27] Lee S, Lee S, Song B C. CFA: Coupled-hypersphere-based feature adaptation for target-oriented anomaly localization. IEEE Access, 2022, 10: 78446-78454 doi: 10.1109/ACCESS.2022.3193699
[28] Defard T, Setkov A, Loesch A, Audigier R. PaDiM: A patch distribution modeling framework for anomaly detection and localization. In: Proceedings of the 25th International Conference on Pattern Recognition Workshops and Challenges. Cham, Switzerland: Springer, 2021. 475-489
[29] Yi J, Yoon S. Patch SVDD: Patch-level SVDD for anomaly detection and segmentation. In: Proceedings of the 15th Asian Conference on Computer Vision. Kyoto, Japan: Springer, 2020. 375-390
[30] Li C L, Sohn K, Yoon J, Pfister T. CutPaste: Self-supervised learning for anomaly detection and localization. In: Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE, 2021. 9664-9674
[31] Napoletano P, Piccoli F, Schettini R. Anomaly detection in nanofibrous materials by CNN-based self-similarity. Sensors, 2018, 18(1): 209 doi: 10.1109/JSEN.2017.2771313
[32] Zagoruyko S, Komodakis N. Wide residual networks. arXiv preprint arXiv: 1605.07146, 2016.
[33] Rudolph M, Wandt B, Rosenhahn B. Same same but differnet: Semi-supervised defect detection with normalizing flows. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. Waikoloa, USA: IEEE, 2021. 1907-1916
[34] Gudovskiy D, Ishizaka S, Kozuka K. CFlow-AD: Real-time unsupervised anomaly detection with localization via conditional normalizing flows. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. Waikoloa, USA: IEEE, 2022. 98-107
[35] Jia D, Wei D, Socher R, Li L J, Kai L, Li F F. Imagenet: A large-scale hierarchical image database. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Miami, USA: IEEE, 2009. 248-255
[36] Mishra P, Verk R, Fornasier D, Piciarelli C, Foresti G L. VT-ADL: A vision transformer network for image anomaly detection and localization. In: Proceedings of the 30th International Symposium on Industrial Electronics. Kyoto, Japan: IEEE, 2021. 1-6
[37] Fawcett T. ROC graphs: Notes and practical considerations for researchers. Machine Learning, 2004, 31(1): 1-38
[38] Liu Z, Lin Y, Cao Y, Hu H, Wei Y, Zhang Z, et al. Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. Montreal, Canada: IEEE, 2021. 10012-10022