Research on Visual Anomaly Detection Based on Multi-scale Normalizing Flow

MAO Guo-Jun^{1, 2}, WU Xing-Zhen^{1}, XING Shu-Li^{1, 2}

doi: 10.16383/j.aas.c230476

1. College of Computer Science and Mathematics, Fujian University of Technology, Fuzhou 350118
2. Fujian Provincial Key Laboratory of Big Data Mining and Applications, Fuzhou 350118

Funds: Supported by National Key Research and Development Program of China (2019YFD0900905) and National Natural Science Foundation of China (61773415)

Author Bio:
MAO Guo-Jun: Professor at the College of Computer Science and Mathematics, Fujian University of Technology. His research interest covers artificial intelligence, big data, data mining, and distributed computing. Corresponding author of this paper. E-mail: 19662092@fjut.edu.cn

WU Xing-Zhen: Master student at the College of Computer Science and Mathematics, Fujian University of Technology. His research interest covers computer vision, image processing, and anomaly detection. E-mail: xzwu@smail.fjut.edu.cn

XING Shu-Li: Lecturer at the College of Computer Science and Mathematics, Fujian University of Technology. His research interest covers computer vision, image processing, and big data analytics. E-mail: 19892311@fjut.edu.cn

Publication history
- Received: 2023-08-02
- Accepted: 2023-08-31
- Published online: 2024-01-04
- Issue date: 2024-03-29

Abstract

Abstract: To address the low computational efficiency and limited detection performance of existing anomaly detection (AD) models, a multi-scale normalizing flow (MS-Flow) is proposed, which recognizes visual anomalies efficiently through multi-scale cross fusion. Specifically, a hierarchical multi-scale architecture is built inside the normalizing flow (NF) to avoid redundant cross-computation over multi-channel data while preserving the network's multi-scale representation capability. In addition, the proposed hierarchical perception module fuses multi-grained features level by level and represents multi-scale features at a fine-grained level, which effectively improves the precision of distribution estimation. The method thus balances detection accuracy against computational efficiency. Experiments on two public datasets show that MS-Flow achieves higher detection accuracy than previous detection models, with an average AUROC (area under the receiver operating characteristic curve) of 99.7% on MVTec AD and 96.0% on BTAD, together with higher computational efficiency: its FLOPs (floating point operations) are about 1/8 of those of CS-Flow.

Keywords:
- Anomaly detection (AD)
- Normalizing flow (NF)
- Hierarchical perception
- Multi-scale features
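
The abstract describes MS-Flow as a normalizing flow used for distribution estimation. As a generic illustration only, the sketch below shows one RealNVP-style affine coupling step in the sense of [20]; it is not the MS-Flow architecture or its hierarchical perception module, whose details are not given on this page. It shows why a coupling layer gives an exact log-determinant, and therefore a negative log-likelihood that can act as an anomaly score. The functions `scale_net` and `shift_net` are hypothetical stand-ins for learned subnetworks, and the input is assumed to have even length.

```python
import math

def scale_net(x1):
    # Hypothetical stand-in for a learned subnetwork: bounded log-scales.
    return [math.tanh(0.5 * v) for v in x1]

def shift_net(x1):
    # Hypothetical stand-in for a learned subnetwork: shifts.
    return [0.1 * v for v in x1]

def coupling_forward(x):
    """Affine coupling: the first half passes through, the second half is scaled and shifted."""
    half = len(x) // 2  # assumes len(x) is even
    x1, x2 = x[:half], x[half:]
    s, t = scale_net(x1), shift_net(x1)
    z2 = [b * math.exp(si) + ti for b, si, ti in zip(x2, s, t)]
    log_det = sum(s)  # triangular Jacobian: log|det J| is the sum of the log-scales
    return x1 + z2, log_det

def coupling_inverse(z):
    """Exact inverse of coupling_forward."""
    half = len(z) // 2
    z1, z2 = z[:half], z[half:]
    s, t = scale_net(z1), shift_net(z1)
    x2 = [(b - ti) * math.exp(-si) for b, si, ti in zip(z2, s, t)]
    return z1 + x2

def negative_log_likelihood(x):
    """NLL of x under a standard-normal latent prior after one coupling layer."""
    z, log_det = coupling_forward(x)
    d = len(z)
    log_prior = -0.5 * sum(v * v for v in z) - 0.5 * d * math.log(2 * math.pi)
    return -(log_prior + log_det)

# Hypothetical feature vectors: one close to the origin, one far from it.
print(negative_log_likelihood([0.2, -0.1, 0.4, 0.3]))
print(negative_log_likelihood([5.0, 5.0, 5.0, 5.0]))
```

With these toy functions the vector far from the origin receives the larger negative log-likelihood, which is the behavior an anomaly score relies on.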

Fig. 1 The architecture of the proposed model

Fig. 2 The structure of the hierarchical perception module

Fig. 3 Example images for all categories of the MVTec AD and BTAD datasets

Fig. 4 Negative log-likelihood distributions of test images for different normalizing flows

Fig. 5 Adaptation study with different numbers of coupling layers

Fig. 6 Anomaly localization

Table 1 Statistical overview of the MVTec AD and BTAD datasets

Dataset	Category	Training images	Test images (normal)	Test images (anomalous)	Defect types	Defect regions	Image size (pixels)
MVTec AD (textures)	Carpet	280	28	89	5	97	1024
	Grid	264	21	57	5	170	1024
	Leather	245	32	92	5	99	1024
	Tile	230	33	84	5	86	840
	Wood	247	19	60	5	168	1024
MVTec AD (objects)	Bottle	209	20	63	3	68	900
	Cable	224	58	92	8	151	1024
	Capsule	219	23	109	5	114	1000
	Hazelnut	391	40	70	4	136	1024
	Metal Nut	220	22	93	4	132	700
	Pill	267	26	141	7	245	800
	Screw	320	41	119	5	135	1024
	Toothbrush	60	12	30	1	66	1024
	Transistor	213	60	40	4	44	1024
	Zipper	240	32	119	7	177	1024
BTAD	01	400	21	49	1	-	1600
	02	399	30	200	1	-	600
	03	1000	400	41	1	-	800
Total		5428	918	1548	76	>1888	-


Table 2 The average AUROC of different anomaly detection models on the MVTec AD dataset (%)

Type	Category	DifferNet^[33]	CFlow-AD^[34]	CS-Flow^[17]	PatchCore^[23]	FastFlow^[24]	MS-Flow (ours)
Textures	Carpet	92.9	98.7	100.0	98.7	100.0	100.0
	Grid	84.0	99.6	99.0	98.2	99.7	100.0
	Leather	97.1	100.0	100.0	100.0	100.0	100.0
	Tile	99.4	99.8	100.0	98.7	100.0	100.0
	Wood	99.8	99.1	100.0	99.2	100.0	100.0
Objects	Bottle	99.0	100.0	99.8	100.0	100.0	100.0
	Cable	95.9	97.6	99.1	99.5	100.0	99.6
	Capsule	86.9	97.7	97.1	98.1	100.0	99.4
	Hazelnut	99.3	99.9	99.6	100.0	100.0	100.0
	Metal Nut	96.1	99.3	99.1	100.0	100.0	100.0
	Pill	88.8	96.8	98.6	96.6	99.4	99.5
	Screw	96.3	91.9	97.6	98.1	97.8	97.5
	Toothbrush	98.6	99.7	91.9	100.0	94.4	100.0
	Transistor	91.1	95.2	99.3	100.0	99.8	100.0
	Zipper	95.1	98.5	99.7	99.4	99.5	99.8
Average		94.9	98.3	98.7	99.1	99.4	99.7


Table 3 The average AUROC of different anomaly detection models on the BTAD dataset (%)

Model	Class 01	Class 02	Class 03	Average
VT-ADL^[36]	97.6	71.0	82.6	83.7
SPADE^[22]	91.4	71.4	99.9	87.6
PatchCore^[23]	90.9	79.3	99.8	90.0
PaDiM^[28]	99.8	82.0	99.4	93.7
MS-Flow (ours)	99.9	88.2	100.0	96.0
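
The tables above report AUROC as the evaluation metric. As a minimal sketch of what that number measures (not the authors' evaluation code), AUROC can be computed as the fraction of (anomalous, normal) image pairs in which the anomalous image receives the higher anomaly score, with ties counted as half. The scores below are hypothetical and are not data from the paper; the last lines only re-average the MS-Flow per-class values listed in the BTAD table.

```python
def auroc(normal_scores, anomalous_scores):
    """Fraction of (anomalous, normal) pairs ranked correctly; ties count as half."""
    if not normal_scores or not anomalous_scores:
        raise ValueError("both classes need at least one score")
    wins = 0.0
    for a in anomalous_scores:
        for n in normal_scores:
            if a > n:
                wins += 1.0
            elif a == n:
                wins += 0.5
    return wins / (len(anomalous_scores) * len(normal_scores))

# Hypothetical anomaly scores (for example negative log-likelihoods).
normal = [0.8, 1.1, 1.3, 2.0]
anomalous = [1.9, 2.4, 3.1]
print(f"AUROC = {auroc(normal, anomalous):.3f}")

# Re-averaging the MS-Flow per-class AUROC values from the BTAD table.
btad_ms_flow = [99.9, 88.2, 100.0]
print(f"BTAD mean = {sum(btad_ms_flow) / len(btad_ms_flow):.1f}")
```

The pairwise loop is quadratic in the number of images, which is acceptable for test sets of the size listed in Table 1; a sort-based ranking would be the usual choice for larger sets.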


Table 4 Complexity comparison of different normalizing flows

Metric	CFlow-AD	CS-Flow	FastFlow	MS-Flow (ours)
AUROC (%)	98.3	98.7	99.4	99.7
FLOPs (G)	13.8	65.8	13.9	8.1
Params (M)	81.6	275.2	17.7	14.1
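
The abstract states that the FLOPs of MS-Flow are about 1/8 of those of CS-Flow. The short check below reproduces that ratio, and the corresponding parameter ratios, from the values in Table 4; the dictionary layout and the helper name `ratio_to_ms_flow` are illustrative choices only.

```python
# Values copied from Table 4 (FLOPs in G, parameters in M).
complexity = {
    "CFlow-AD": {"flops": 13.8, "params": 81.6},
    "CS-Flow": {"flops": 65.8, "params": 275.2},
    "FastFlow": {"flops": 13.9, "params": 17.7},
    "MS-Flow": {"flops": 8.1, "params": 14.1},
}

def ratio_to_ms_flow(model, key):
    """How many times larger `model` is than MS-Flow on the given metric."""
    return complexity[model][key] / complexity["MS-Flow"][key]

for name in ("CFlow-AD", "CS-Flow", "FastFlow"):
    print(f"{name}: {ratio_to_ms_flow(name, 'flops'):.1f}x FLOPs, "
          f"{ratio_to_ms_flow(name, 'params'):.1f}x parameters")
```

For CS-Flow the FLOPs ratio is 65.8 / 8.1, roughly 8.1, which matches the "about 1/8" figure quoted in the abstract.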


Table 5 Adaptation study of different feature extractors

Feature extractor	$d$	AUROC (%)
ResNet18	224 $\rightarrow$ 448 $\rightarrow$ 768	97.1 $\rightarrow$ 97.9 $\rightarrow$ 97.2
Wide-ResNet50	224 $\rightarrow$ 448 $\rightarrow$ 768	97.9 $\rightarrow$ 96.2 $\rightarrow$ 93.6
Swin-B	224 $\rightarrow$ 448 $\rightarrow$ 768	96.9 $\rightarrow$ 97.8 $\rightarrow$ 95.4
EfficientNet-B7	224 $\rightarrow$ 448 $\rightarrow$ 768	98.7 $\rightarrow$ 99.1 $\rightarrow$ 99.5
EfficientNet-B5	224 $\rightarrow$ 448 $\rightarrow$ 768	98.8 $\rightarrow$ 99.3 $\rightarrow$ 99.7


Table 6 Adaptation study of different numbers of subfeatures

Number of subfeatures	Subfeature map size (pixels)	AUROC (%)	Params (M)
2	$152 \times 24 \times 24$	96.21	9.42
4	$76 \times 24 \times 24$	99.72	14.06
6	$51 \times 24 \times 24$	99.79	15.74
8	$38 \times 24 \times 24$	99.79	16.43
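
The channel counts in Table 6 are consistent with splitting one feature map along the channel axis into equally sized groups, rounding up when the split is uneven. The sketch below reproduces the listed sizes under that reading. The total of 304 channels is an assumption inferred from the table (2 × 152 = 4 × 76 = 8 × 38 = 304) and is not stated on this page, and the helper name is hypothetical.

```python
import math

TOTAL_CHANNELS = 304  # assumption inferred from Table 6, not stated in the text
SPATIAL_SIZE = 24     # height and width listed in Table 6

def subfeature_channels(total, groups):
    """Channels per subfeature when `total` channels are split into `groups` chunks (rounded up)."""
    return math.ceil(total / groups)

for groups in (2, 4, 6, 8):
    c = subfeature_channels(TOTAL_CHANNELS, groups)
    print(f"{groups} subfeatures -> {c} x {SPATIAL_SIZE} x {SPATIAL_SIZE}")
```

Under this assumption the split into 6 groups cannot be exactly even, since 304 is not divisible by 6, so the last group would hold fewer than 51 channels.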


References

[1]	Tran T M, Vu T N, Vo N D, Nguyen T V, Nguyen K. Anomaly analysis in images and videos: A comprehensive review. ACM Computing Surveys, 2022, 55(7): 1-37
[2]	Bergmann P, Fauser M, Sattlegger D, Steger C. MVTec AD - A comprehensive real-world dataset for unsupervised anomaly detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE, 2019. 9592−9600
[3]	Suganyadevi S, Seethalakshmi V, Balasamy K. A review on deep learning in medical image analysis. International Journal of Multimedia Information Retrieval, 2022, 11(1): 19-38 doi: 10.1007/s13735-021-00218-1
[4]	Li Y Y, Wu J, Bai X, Yang X P, Tan X, Li G B, et al. Multi-granularity tracking with modularlized components for unsupervised vehicles anomaly detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. Seattle, USA: IEEE, 2020. 586−587
[5]	Akcay S, Atapour-Abarghouei A, Breckon T P. GANomaly: Semi-supervised anomaly detection via adversarial training. In: Proceedings of the 14th Asian Conference on Computer Vision. Perth, Australia: Springer International Publishing, 2019. 622−637
[6]	Ma Bin, Wang Yi-Li, Xu Jian, Wang Chun-Peng, Li Jian, Zhou Lin-Na. An image perceptual hash algorithm based on bidirectional generative adversarial network. Acta Electronica Sinica, 2023, 51(5): 1405-1412
[7]	Tang T W, Kuo W H, Lan J H, Ding C F, Hsu H, Young H T. Anomaly detection neural network with dual auto-encoders GAN and its industrial inspection applications. Sensors, 2020, 20(12): 3336 doi: 10.3390/s20123336
[8]	Shi Y, Yang J, Qi Z. Unsupervised anomaly segmentation via deep feature reconstruction. Neurocomputing, 2021, 424: 9-22 doi: 10.1016/j.neucom.2020.11.018
[9]	Wu Lin, Hao Hong-Yu, Song You. A review of metal surface defect detection based on computer vision. Acta Automatica Sinica, DOI: 10.16383/j.aas.c230039
[10]	Kingma D P, Welling M. Auto-encoding variational bayes. arXiv preprint arXiv: 1312.6114, 2013.
[11]	LeCun Y. Generalization and network design strategies. Connectionism in Perspective, 1989, 19(143-155): 18
[12]	Rudolph M, Wandt B, Rosenhahn B. Structuring autoencoders. In: Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops. Seoul, South Korea: IEEE, 2019.
[13]	Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, et al. Generative adversarial networks. Communications of the ACM, 2020, 63(11): 139-144 doi: 10.1145/3422622
[14]	Lv Cheng-Kan, Shen Fei, Zhang Zheng-Tao, Zhang Feng. Review of image anomaly detection. Acta Automatica Sinica, 2022, 48(6): 1402-1428
[15]	Bergman L, Hoshen Y. Classification-based anomaly detection for general data. arXiv preprint arXiv: 2005.02359, 2020.
[16]	Rippel O, Mertens P, Merhof D. Modeling the distribution of normal data in pre-trained deep features for anomaly detection. In: Proceedings of the 25th International Conference on Pattern Recognition. Milan, Italy: IEEE, 2021. 6726−6733
[17]	Rudolph M, Wehrbein T, Rosenhahn B, Wandt B. Fully convolutional cross-scale-flows for image-based defect detection. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. Waikoloa, USA: IEEE, 2022. 1088−1097
[18]	Lei J, Hu X, Wang Y, Liu D. PyramidFlow: High-resolution defect contrastive localization using pyramid normalizing flow. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver, Canada: IEEE, 2023. 14143−14152
[19]	Rezende D, Mohamed S. Variational inference with normalizing flows. In: Proceedings of the 32nd International Conference on Machine Learning. Lille, France: PMLR, 2015. 1530−1538
[20]	Dinh L, Sohl-Dickstein J, Bengio S. Density estimation using real NVP. arXiv preprint arXiv: 1605.08803, 2016.
[21]	He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE, 2016. 770−778
[22]	Cohen N, Hoshen Y. Sub-image anomaly detection with deep pyramid correspondences. arXiv preprint arXiv: 2005.02357, 2020.
[23]	Roth K, Pemula L, Zepeda J, Schölkopf B, Brox T, Gehler P. Towards total recall in industrial anomaly detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE, 2022. 14318−14328
[24]	Yu J, Zheng Y, Wang X, Li W, Wu Y, Zhao R, et al. FastFlow: Unsupervised anomaly detection and localization via 2D normalizing flows. arXiv preprint arXiv: 2111.07677, 2021.
[25]	Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Houlsby N. An image is worth 16×16 words: Transformers for image recognition at scale. arXiv preprint arXiv: 2010.11929, 2020.
[26]	Tan M, Le Q. EfficientNet: Rethinking model scaling for convolutional neural networks. In: Proceedings of the 36th International Conference on Machine Learning. Long Beach, USA: PMLR, 2019. 6105−6114
[27]	Lee S, Lee S, Song B C. CFA: Coupled-hypersphere-based feature adaptation for target-oriented anomaly localization. IEEE Access, 2022, 10: 78446-78454 doi: 10.1109/ACCESS.2022.3193699
[28]	Defard T, Setkov A, Loesch A, Audigier R. PaDiM: A patch distribution modeling framework for anomaly detection and localization. In: Proceedings of the 25th International Conference on Pattern Recognition Workshops and Challenges. Cham, Switzerland: Springer, 2021. 475−489
[29]	Yi J, Yoon S. Patch SVDD: Patch-level SVDD for anomaly detection and segmentation. In: Proceedings of the 15th Asian Conference on Computer Vision. Kyoto, Japan: Springer, 2020. 375−390
[30]	Li C L, Sohn K, Yoon J, Pfister T. CutPaste: Self-supervised learning for anomaly detection and localization. In: Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE, 2021. 9664−9674
[31]	Napoletano P, Piccoli F, Schettini R. Anomaly detection in nanofibrous materials by CNN-based self-similarity. Sensors, 2018, 18(1): 209 doi: 10.1109/JSEN.2017.2771313
[32]	Zagoruyko S, Komodakis N. Wide residual networks. arXiv preprint arXiv: 1605.07146, 2016.
[33]	Rudolph M, Wandt B, Rosenhahn B. Same same but differnet: Semi-supervised defect detection with normalizing flows. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. Waikoloa, USA: IEEE, 2021. 1907−1916
[34]	Gudovskiy D, Ishizaka S, Kozuka K. CFlow-AD: Real-time unsupervised anomaly detection with localization via conditional normalizing flows. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. Waikoloa, USA: IEEE, 2022. 98−107
[35]	Deng J, Dong W, Socher R, Li L J, Li K, Li F F. ImageNet: A large-scale hierarchical image database. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Miami, USA: IEEE, 2009. 248−255
[36]	Mishra P, Verk R, Fornasier D, Piciarelli C, Foresti G L. VT-ADL: A vision transformer network for image anomaly detection and localization. In: Proceedings of the 30th International Symposium on Industrial Electronics. Kyoto, Japan: IEEE, 2021. 1−6
[37]	Fawcett T. ROC graphs: Notes and practical considerations for researchers. Machine Learning, 2004, 31(1): 1-38
[38]	Liu Z, Lin Y, Cao Y, Hu H, Wei Y, Zhang Z, et al. Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. Montreal, Canada: IEEE, 2021. 10012−10022