Two-stage Multi-teacher Knowledge Distillation for Industrial Process Fault Detection
Abstract: Modern industrial process data are large in volume, high-dimensional, and intricately correlated, so a single multivariate statistical monitoring method can rarely handle every type of feature that needs monitoring. Multi-model fusion methods and deep learning techniques can improve fault detection performance, but the former depend on building a model library and do not yield a unified model, while the latter suffer from complex structures and redundant parameters. To address these issues, a two-stage multi-teacher knowledge distillation method for industrial process modeling and fault detection is proposed. The distillation framework internalizes the heterogeneous knowledge extracted by kernel principal component analysis (KPCA) and independent component analysis (ICA) into a student autoencoder (AE), so that nonlinear and non-Gaussian features are modeled in a unified way, and the two distillation stages jointly optimize the feature space and the reconstruction space. The first stage distills at the feature level, guiding the student to learn the teachers' feature distributions; the second stage distills at the reconstruction level, improving the student's ability to represent and reconstruct process variations. Experiments on the Tennessee Eastman (TE) simulation process and a real ammonia synthesis process show that the method effectively improves the accuracy and robustness of fault detection, and that distilling knowledge offline gives a single unified model for efficient online monitoring.
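To make the two-stage idea concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a linear student autoencoder, plain gradient descent, mean-squared-error losses, and one linear head per teacher on top of the student code. The function `two_stage_distill`, the weight `beta`, and the least-squares teacher reconstructions are hypothetical choices made only for illustration, and the two teacher feature matrices are random placeholders standing in for KPCA scores and ICA components.

```python
import numpy as np

def mse(a, b):
    return float(np.mean((a - b) ** 2))

def two_stage_distill(X, teacher_feats, k=3, beta=0.5, lr=0.05, steps=500, seed=0):
    """Illustrative two-stage distillation into a linear student autoencoder.

    X             : (n, d) standardized normal-operation data
    teacher_feats : list of (n, m_t) teacher feature matrices (assumed precomputed)
    Returns encoder W, decoder V, and (start, end) losses for each stage.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = 0.1 * rng.standard_normal((d, k))  # student encoder
    # One linear head per teacher maps the student code to that teacher's features.
    heads = [0.1 * rng.standard_normal((k, F.shape[1])) for F in teacher_feats]
    # Assumed stand-in for teacher reconstructions: least-squares map features -> X.
    recons = [F @ np.linalg.lstsq(F, X, rcond=None)[0] for F in teacher_feats]

    # Stage 1: feature-level distillation (student code should explain teacher features).
    def stage1_loss():
        Z = X @ W
        return sum(mse(Z @ A, F) for A, F in zip(heads, teacher_feats))

    s1_start = stage1_loss()
    for _ in range(steps):
        Z = X @ W
        gW = np.zeros_like(W)
        for i, (A, F) in enumerate(zip(heads, teacher_feats)):
            E = 2.0 * (Z @ A - F) / F.size   # gradient of the MSE w.r.t. the prediction
            gW += X.T @ E @ A.T
            heads[i] = A - lr * (Z.T @ E)
        W -= lr * gW
    s1_end = stage1_loss()

    # Stage 2: reconstruction-level distillation (match input and teacher reconstructions).
    V = 0.1 * rng.standard_normal((k, d))  # student decoder

    def stage2_loss():
        Xh = X @ W @ V
        return mse(Xh, X) + beta * sum(mse(Xh, R) for R in recons)

    s2_start = stage2_loss()
    for _ in range(steps):
        Z = X @ W
        Xh = Z @ V
        G = 2.0 * ((Xh - X) + beta * sum(Xh - R for R in recons)) / X.size
        gV = Z.T @ G
        gW = X.T @ G @ V.T
        V -= lr * gV
        W -= lr * gW
    s2_end = stage2_loss()
    return W, V, (s1_start, s1_end), (s2_start, s2_end)

# Synthetic demo data; the "teacher" features below are placeholders, not real KPCA/ICA output.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 5))
X = (X - X.mean(0)) / X.std(0)
F_kpca = np.tanh(X @ rng.standard_normal((5, 3)))        # placeholder for KPCA scores
F_ica = X @ rng.standard_normal((5, 3)) / np.sqrt(5)     # placeholder for ICA components
W, V, s1, s2 = two_stage_distill(X, [F_kpca, F_ica])
print("stage 1 loss (start, end):", s1)
print("stage 2 loss (start, end):", s2)
```

In a realistic setting the teacher features would come from fitted KPCA and ICA models and the student would usually be a nonlinear network; the sketch only shows how the feature-level stage and the reconstruction-level stage can be run one after the other on the same student.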
Table 1 FDR of 21 TE process faults detected by ICA, KPCA, AE, and multi-teacher distillation methods (%)
| Fault | ICA $I^{2(\tau_1)}$ | ICA $Q^{(\tau_1)}$ | KPCA $T^{2(\tau_2)}$ | KPCA $Q^{(\tau_2)}$ | AE $T^{2(S)}$ | AE $Q^{(S)}$ | MTDM $T^{2(S)}$ | MTDM $Q^{(S)}$ |
|---|---|---|---|---|---|---|---|---|
| F1 | 99.75 | 99.75 | 99.75 | 99.25 | 99.50 | 99.88 | 99.88 | 99.50 |
| F2 | 98.62 | 98.88 | 98.75 | 98.12 | 98.75 | 98.88 | 98.88 | 98.38 |
| F3 | 8.75 | 8.50 | 7.88 | 9.62 | 10.12 | 5.62 | 7.88 | 6.25 |
| F4 | 100.00 | 100.00 | 100.00 | 9.62 | 72.88 | 100.00 | 99.50 | 96.38 |
| F5 | 100.00 | 100.00 | 28.62 | 31.25 | 34.88 | 34.50 | 100.00 | 100.00 |
| F6 | 100.00 | 100.00 | 99.50 | 99.50 | 99.25 | 100.00 | 100.00 | 100.00 |
| F7 | 100.00 | 100.00 | 100.00 | 66.75 | 100.00 | 100.00 | 100.00 | 100.00 |
| F8 | 98.38 | 98.12 | 98.12 | 97.25 | 97.50 | 95.12 | 98.38 | 97.50 |
| F9 | 8.75 | 6.50 | 6.38 | 7.88 | 9.38 | 5.50 | 6.00 | 7.75 |
| F10 | 90.50 | 90.62 | 54.75 | 49.25 | 49.12 | 56.38 | 88.00 | 81.50 |
| F11 | 78.38 | 78.25 | 79.25 | 27.62 | 64.38 | 61.38 | 73.25 | 66.88 |
| F12 | 99.88 | 99.88 | 99.12 | 97.25 | 99.00 | 97.12 | 99.75 | 99.62 |
| F13 | 95.38 | 95.50 | 95.50 | 94.25 | 95.62 | 95.25 | 95.62 | 94.62 |
| F14 | 100.00 | 100.00 | 100.00 | 93.00 | 100.00 | 97.88 | 100.00 | 99.88 |
| F15 | 19.88 | 17.50 | 13.25 | 15.62 | 12.38 | 8.75 | 14.00 | 8.75 |
| F16 | 92.12 | 93.88 | 37.00 | 34.88 | 32.00 | 51.88 | 90.62 | 83.38 |
| F17 | 96.25 | 96.25 | 96.00 | 76.12 | 85.50 | 96.50 | 95.38 | 91.62 |
| F18 | 90.38 | 91.00 | 91.25 | 89.50 | 90.38 | 90.38 | 91.00 | 91.50 |
| F19 | 86.50 | 90.75 | 18.88 | 1.75 | 25.50 | 25.37 | 86.62 | 73.88 |
| F20 | 90.38 | 90.75 | 68.38 | 41.38 | 50.25 | 61.00 | 78.38 | 75.88 |
| F21 | 64.88 | 61.62 | 54.37 | 30.12 | 48.38 | 54.87 | 57.63 | 37.00 |
| Mean | 81.85 | 81.80 | 68.89 | 55.71 | 65.47 | 68.39 | 80.04 | 76.68 |

Table 2 Data division settings for ammonia synthesis experimental cases
| Case | Training set | Validation set | Test set | Operating-condition switching point |
|---|---|---|---|---|
| Case1 | 1$\sim$300 | 301$\sim$560 | 561$\sim$2000 | 140 |
| Case2 | 1$\sim$550 | 551$\sim$800 | 801$\sim$2000 | 200 |

Table 3 FDR of 2 primary reformer cases in the ammonia synthesis process detected by ICA, KPCA, AE, and multi-teacher distillation methods (%)
| Case | ICA $I^{2(\tau_1)}$ | ICA $Q^{(\tau_1)}$ | KPCA $T^{2(\tau_2)}$ | KPCA $Q^{(\tau_2)}$ | AE $T^{2(S)}$ | AE $Q^{(S)}$ | MTDM $T^{2(S)}$ | MTDM $Q^{(S)}$ |
|---|---|---|---|---|---|---|---|---|
| Case1 | 72.77 | 91.92 | 90.77 | 96.46 | 100.00 | 14.69 | 89.54 | 86.46 |
| Case2 | 82.50 | 83.60 | 86.70 | 92.80 | 100.00 | 82.70 | 84.10 | 85.20 |
| Mean | 77.64 | 87.76 | 88.74 | 94.63 | 100.00 | 48.70 | 86.82 | 85.83 |

Table 4 FAR of 2 primary reformer cases in the ammonia synthesis process detected by ICA, KPCA, AE, and multi-teacher distillation methods (%)
| Case | ICA $I^{2(\tau_1)}$ | ICA $Q^{(\tau_1)}$ | KPCA $T^{2(\tau_2)}$ | KPCA $Q^{(\tau_2)}$ | AE $T^{2(S)}$ | AE $Q^{(S)}$ | MTDM $T^{2(S)}$ | MTDM $Q^{(S)}$ |
|---|---|---|---|---|---|---|---|---|
| Case1 | 0 | 0 | 0.70 | 0 | 99.28 | 0 | 0 | 0 |
| Case2 | 0 | 0 | 10.00 | 4.00 | 97.00 | 5.00 | 5.00 | 5.50 |
| Mean | 0 | 0 | 5.35 | 2.00 | 98.14 | 2.50 | 2.50 | 2.75 |

Table 5 FDR comparison between MTDM and MMSF on representative TE process faults (%)
| Fault | MTDM $T^{2(S)}$ | MTDM $Q^{(S)}$ | MMSF $T^{2(F)}$ | MMSF $Q^{(F)}$ |
|---|---|---|---|---|
| F4 | 99.50 | 96.38 | 99.88 | 64.62 |
| F10 | 88.00 | 81.50 | 86.00 | 81.88 |
| F19 | 86.62 | 73.88 | 73.00 | 51.50 |
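The tables above report the fault detection rate (FDR) and false alarm rate (FAR) as percentages. As a reminder of how such figures are commonly computed, here is a small sketch using the usual definitions: FDR is the share of faulty samples whose monitoring statistic exceeds the control limit, and FAR is the share of normal samples that do. The function name `detection_rates`, the toy statistic values, and the control limit of 1.0 are hypothetical; how the paper sets its control limits is not stated in this excerpt.

```python
def detection_rates(stat, limit, fault_start):
    """Return (FDR, FAR) in percent for one monitoring statistic.

    stat        : sequence of statistic values (e.g. T^2 or Q) over the test set
    limit       : control limit; a sample raises an alarm when stat > limit
    fault_start : index of the first faulty sample (earlier samples are normal)
    """
    alarms = [s > limit for s in stat]
    normal, faulty = alarms[:fault_start], alarms[fault_start:]
    far = 100.0 * sum(normal) / len(normal) if normal else 0.0
    fdr = 100.0 * sum(faulty) / len(faulty) if faulty else 0.0
    return fdr, far

# Toy example: 4 normal samples followed by 4 faulty samples.
stat = [0.2, 0.5, 1.2, 0.3, 2.0, 1.8, 0.9, 2.5]
fdr, far = detection_rates(stat, limit=1.0, fault_start=4)
print(fdr, far)  # 75.0 25.0
```

Here one of the four normal samples triggers an alarm (FAR 25%) and three of the four faulty samples are detected (FDR 75%).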