Study of Missing Value Imputation in Wind Turbine Data Based on Multivariate Spatiotemporal Integration Network
-
Abstract: The integrity of wind farm data can be compromised by severe weather, input signal loss, sensor failure, and similar causes, and large-scale data loss poses a serious challenge to the operation and maintenance of wind turbine equipment. This paper therefore proposes a multivariate spatiotemporal integration network (MSIN) to address the missing data problem. First, an MSIN structure with a missing-value localization-guidance mechanism is proposed, which reveals the latent information in the missing part of the data and ensures that the imputed data conform to the true distribution. Second, a multi-view spatiotemporal convolution module is designed within the network to capture the local spatial and global temporal correlations among multiple variables of the same wind turbine and among the same variable across multiple wind turbines, which improves the realism of the imputed data. Third, a real-time self-updating mechanism is proposed that adjusts the network online according to real-time changes in the wind farm; this improves the generalization ability of the network and avoids the high time and space costs of retraining the model. Finally, the effectiveness and superiority of the proposed network are verified on real wind turbine data. The results show that, compared with traditional data imputation methods such as MissForest, the mean absolute error (MAE), the mean absolute percentage error (MAPE), and the root mean square error (RMSE) are reduced by more than 18.54%, 41.00%, and 3.15%, respectively.
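
The three metrics named in the abstract can be illustrated with a minimal sketch. It assumes that errors are computed only over the entries that were missing (mask value 0), that MAPE is expressed in percent, and that entries with a true value of zero are skipped for MAPE; none of these conventions is stated in this excerpt, and the function names are hypothetical.

```python
import math


def imputation_errors(truth, imputed, observed_mask):
    """Return (MAE, MAPE in percent, RMSE) over the imputed entries.

    observed_mask[i] is 1 where the value was observed and 0 where it was
    missing and had to be imputed (a convention assumed for this sketch).
    """
    pairs = [(t, p) for t, p, m in zip(truth, imputed, observed_mask) if m == 0]
    if not pairs:
        raise ValueError("no missing entries to evaluate")
    n = len(pairs)
    mae = sum(abs(t - p) for t, p in pairs) / n
    rmse = math.sqrt(sum((t - p) ** 2 for t, p in pairs) / n)
    # MAPE is undefined where the true value is 0, so those entries are skipped.
    nonzero = [(t, p) for t, p in pairs if t != 0]
    if nonzero:
        mape = 100.0 * sum(abs((t - p) / t) for t, p in nonzero) / len(nonzero)
    else:
        mape = float("nan")
    return mae, mape, rmse


def relative_reduction(baseline_error, proposed_error):
    """Percentage by which the proposed error is lower than the baseline error."""
    return 100.0 * (baseline_error - proposed_error) / baseline_error
```

Under these assumptions, a statement such as "MAE is reduced by more than 18.54%" corresponds to `relative_reduction(baseline_mae, msin_mae) > 18.54` for each baseline method.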
-
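Table 1 Wind turbine variables

| No. | Variable | No. | Variable |
|---|---|---|---|
| 1 | Hub rotational speed | 14 | Wind generator stator temperature 1 |
| 2 | Blade pitch angle 1 | 15 | Wind generator stator temperature 2 |
| 3 | Blade pitch angle 2 | 16 | Wind generator stator temperature 3 |
| 4 | Blade pitch angle 3 | 17 | Wind generator stator temperature 4 |
| 5 | Node vibration value, X direction | 18 | Wind generator stator temperature 5 |
| 6 | Node vibration value, Y direction | 19 | Wind generator stator temperature 6 |
| 7 | Grid-side output power | 20 | Generator output power |
| 8 | Wind direction offset angle | 21 | Hub angle |
| 9 | Speed sensor | 22 | Generator torque |
| 10 | ISU temperature | 23 | INU RMIO temperature |
| 11 | Generator ambient temperature 1 | 24 | Gearbox front bearing temperature |
|  | Generator ambient temperature 2 |  | Gearbox rear bearing temperature |
| 12 | Nacelle temperature | 25 | INU temperature |
| 13 | Wind speed | 26 | Wind direction |
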
Table 2 Evaluation results under different hint rates

| Hint rate | MAE | MAPE | RMSE |
|---|---|---|---|
| 0.10 | 0.1549 | 3.0010 | 0.2396 |
| 0.20 | 0.1552 | 2.9599 | 0.2398 |
| 0.30 | 0.1557 | 2.3107 | 0.2384 |
| 0.40 | 0.1564 | 2.2437 | 0.2401 |
| 0.50 | 0.1552 | 3.3131 | 0.2390 |
| 0.60 | 0.1555 | 2.2019 | 0.2400 |
| 0.70 | 0.1577 | 2.2831 | 0.2398 |
| 0.80 | 0.1543 | 2.8454 | 0.2397 |
| 0.90 | 0.1541 | 1.1783 | 0.2381 |
| 0.95 | 0.1561 | 1.9770 | 0.2391 |

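
This excerpt does not define the hint rate or spell out the localization-guidance mechanism. As a rough illustration only, the sketch below assumes a GAIN-style hint, in which each entry of the 0/1 observation mask is revealed with probability equal to the hint rate and every hidden entry is replaced by 0.5 ("unknown"). The function name `make_hint` and this encoding are assumptions, not the paper's stated method.

```python
import random


def make_hint(mask, hint_rate, rng):
    """Build a hint matrix from a 0/1 observation mask (GAIN-style assumption).

    Each mask entry is revealed with probability `hint_rate`; entries that
    are not revealed are set to 0.5 so they carry no information about
    whether the value was observed or missing.
    """
    hint = []
    for row in mask:
        hint_row = []
        for m in row:
            revealed = rng.random() < hint_rate
            hint_row.append(float(m) if revealed else 0.5)
        hint.append(hint_row)
    return hint


# Usage: a hint rate of 0.9 reveals about 90% of the mask entries.
example_mask = [[1, 0, 1], [0, 1, 1]]
example_hint = make_hint(example_mask, 0.9, random.Random(0))
```

Under this reading, a higher hint rate tells the network more about where the missing values are, which is one possible interpretation of the sweep in Table 2.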
Table 3 Evaluation results under different $\alpha$

| $\alpha$ | MAE | MAPE | RMSE |
|---|---|---|---|
| 0.0001 | 0.6231 | 27135.3668 | 0.4956 |
| 0.0010 | 0.4983 | 128671.0614 | 0.6251 |
| 0.0100 | 0.4963 | 42939.8706 | 0.6236 |
| 0.1000 | 0.4967 | 167721.3201 | 0.6238 |
| 1 | 0.3625 | 229.8665 | 0.4843 |
| 10 | 0.1805 | 23.6173 | 0.2644 |
| 100 | 0.1539 | 5.4836 | 0.2321 |
| 1000 | 0.1518 | 5.7790 | 0.2488 |

Table 4 Evaluation results under different $\beta$

| $\beta$ | MAE | MAPE | RMSE |
|---|---|---|---|
| 0.0001 | 0.1532 | 1.2270 | 0.2320 |
| 0.0010 | 0.1505 | 2.3903 | 0.2290 |
| 0.0100 | 0.1507 | 2.3558 | 0.2274 |
| 0.1000 | 0.1499 | 1.9291 | 0.2268 |
| 1 | 0.1530 | 4.0830 | 0.2319 |
| 10 | 0.1801 | 23.7244 | 0.2641 |
| 100 | 0.3652 | 237.1457 | 0.4874 |
| 1000 | 0.4970 | 35792.8434 | 0.6240 |

Table 5 Evaluation results under different learning rates

| Learning rate | MAE | MAPE | RMSE |
|---|---|---|---|
| 0.0001 | 0.2121 | 1.7066 | 0.2941 |
| 0.0010 | 0.1521 | 1.4009 | 0.2295 |
| 0.0100 | 0.4272 | 4.2201 | 0.5652 |
| 0.1000 | 0.4264 | 7.0552 | 0.5648 |
| 1 | 0.4302 | 5.2400 | 0.5676 |
| 10 | 0.4269 | 7.8907 | 0.5646 |
| 100 | 0.4272 | 9.6068 | 0.5657 |
| 1000 | 0.4298 | 6.7900 | 0.5674 |

Table 6 Evaluation metrics for wind turbine data under different missing rates

| Missing rate | MAE max | MAE min | MAE avg | MAPE max | MAPE min | MAPE avg | RMSE max | RMSE min | RMSE avg |
|---|---|---|---|---|---|---|---|---|---|
| 0.1 | 0.1653 | 0.0822 | 0.1179 | 3.8283 | 1.2530 | 2.3968 | 0.2432 | 0.1556 | 0.1877 |
| 0.2 | 0.1768 | 0.1052 | 0.1298 | 3.7203 | 1.1687 | 2.4970 | 0.2656 | 0.1724 | 0.2032 |
| 0.3 | 0.1914 | 0.1127 | 0.1409 | 3.7355 | 1.2704 | 2.6702 | 0.2768 | 0.1884 | 0.2186 |
| 0.4 | 0.1791 | 0.1079 | 0.1356 | 3.5158 | 1.2851 | 2.6973 | 0.2841 | 0.1920 | 0.2244 |
| 0.5 | 0.1881 | 0.1217 | 0.1418 | 3.6810 | 1.2905 | 2.7583 | 0.2654 | 0.2068 | 0.2269 |
| 0.6 | 0.1968 | 0.1386 | 0.1544 | 3.7117 | 1.2130 | 2.7925 | 0.2823 | 0.2239 | 0.2753 |
| 0.7 | 0.1994 | 0.1789 | 0.1629 | 3.8964 | 1.2025 | 2.8347 | 0.2833 | 0.2353 | 0.2538 |
| 0.8 | 0.1999 | 0.1625 | 0.1787 | 3.9935 | 1.2148 | 2.8559 | 0.3004 | 0.2465 | 0.2734 |

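
The excerpt does not say how missing entries were generated for each missing rate. A common evaluation protocol, assumed here purely for illustration, removes entries completely at random from a complete data matrix, imputes them, and scores only the removed positions. The helper names below are hypothetical.

```python
import random


def make_missing_mask(n_rows, n_cols, missing_rate, rng):
    """Return a 0/1 mask in which round(missing_rate * n_rows * n_cols)
    randomly chosen entries are 0 (missing) and all others are 1 (observed)."""
    n = n_rows * n_cols
    n_missing = round(missing_rate * n)
    missing = set(rng.sample(range(n), n_missing))
    return [
        [0 if r * n_cols + c in missing else 1 for c in range(n_cols)]
        for r in range(n_rows)
    ]


def apply_mask(data, mask, fill=None):
    """Replace every entry whose mask value is 0 with `fill`."""
    return [
        [x if m == 1 else fill for x, m in zip(data_row, mask_row)]
        for data_row, mask_row in zip(data, mask)
    ]


# Usage: 10 time steps of the 26 variables from Table 1, 30% removed.
rng = random.Random(42)
mask = make_missing_mask(10, 26, 0.3, rng)
```

Real sensor outages tend to occur in contiguous blocks rather than at isolated random positions, so this sketch is only the simplest case.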
Table 7 Running time of the seven imputation methods for one iteration (s)

| Imputation method | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 |
|---|---|---|---|---|---|---|---|---|
| MSIN | 4.3156 | 4.7167 | 4.9595 | 5.1400 | 5.1159 | 4.9905 | 5.1656 | 5.0997 |
| TimeGAN[28] | 6.5895 | 6.6172 | 7.3519 | 8.8907 | 7.7120 | 8.4728 | 7.8757 | 8.3546 |
| M-RNN[29] | 81.1218 | 70.8649 | 69.9753 | 67.5593 | 69.0319 | 68.2631 | 71.2586 | 68.9668 |
| MIRACLE[30] | 0.2554 | 0.3761 | 0.3752 | 0.3925 | 0.3879 | 0.3692 | 0.3712 | 0.3941 |
| MICE[31] | 2.5963 | 2.1705 | 2.1164 | 2.7042 | 2.2922 | 2.3221 | 2.6145 | 2.5653 |
| MissForest[32] | 0.5963 | 0.5771 | 0.7897 | 0.7921 | 0.8396 | 0.9587 | 0.9132 | 0.8527 |
| LGDI[33] | 15.6514 | 14.0879 | 15.8731 | 16.3439 | 14.9822 | 17.3042 | 15.9346 | 17.8468 |

Column headings 0.1 to 0.8 are missing rates.

-
[1] Hu Xu-Guang, Ma Da-Zhong, Zheng Jun, Zhang Hua-Guang, Wang Rui. An operation state analysis method for integrated energy system based on correlation information adversarial learning. Acta Automatica Sinica, 2020, 46(9): 1783−1797 (in Chinese)
[2] Wang Rui, Sun Qiu-Ye, Zhang Hua-Guang. Research on current sharing/voltage recovery based adaptive dynamic programming control strategy of microgrids. Acta Automatica Sinica, 2022, 48(2): 479−491 (in Chinese)
[3] Li Yuan-Zheng, Ni Zhi-Xian, Duan Jun-Tao, Xu Lei, Yang Tao, Zeng Zhi-Gang. Demand response scheduling of major energy-consuming enterprises based on a high proportion of renewable energy power grid. Acta Automatica Sinica, 2023, 49(4): 754−768 (in Chinese)
[4] Hu X G, Zhang H G, Ma D Z, Wang R. Hierarchical pressure data recovery for pipeline network via generative adversarial networks. IEEE Transactions on Automation Science and Engineering, 2022, 19(3): 1960−1970 doi: 10.1109/TASE.2021.3069003
[5] Zhang Bo-Wei, Zheng Jian-Fei, Hu Chang-Hua, Pei Hong, Dong Qing. Missing data generation method based on flow model and its application in remaining life prediction. Acta Automatica Sinica, 2023, 49(1): 185−196 (in Chinese)
[6] Du Dang-Bo, Zhang Wei, Hu Chang-Hua, Zhou Zhi-Jie, Si Xiao-Sheng, Zhang Jian-Xun. A failure prognosis method based on wavelet-Kalman filtering with missing data. Acta Automatica Sinica, 2014, 40(10): 2115−2125 (in Chinese)
[7] Jin X H, Wang H, Kong Z Q, Xu Z W, Qiao W. Condition monitoring of wind turbine generators using SCADA data analysis. IEEE Transactions on Sustainable Energy, 2021, 12(1): 202−210 doi: 10.1109/TSTE.2020.2989220
[8] Liu Z P, Wang X F, Zhang L. Fault diagnosis of industrial wind turbine blade bearing using acoustic emission analysis. IEEE Transactions on Instrumentation and Measurement, 2020, 69(9): 6630−6639 doi: 10.1109/TIM.2020.2969062
[9] Liu Chang, Lang Jin. Wind power prediction method using hybrid kernel LSSVM with batch feature. Acta Automatica Sinica, 2020, 46(6): 1264−1273 (in Chinese)
[10] Kong Xiao-Bing, Liu Xiang-Jie. Nonlinear model predictive control for DFIG-based wind power generation. Acta Automatica Sinica, 2013, 39(5): 636−643 (in Chinese)
[11] Peng Y Y, Qiao W, Qu L Y. Compressive sensing-based missing-data-tolerant fault detection for remote condition monitoring of wind turbines. IEEE Transactions on Industrial Electronics, 2022, 69(2): 1937−1947 doi: 10.1109/TIE.2021.3057039
[12] Coville A, Siddiqui A, Vogstad K O. The effect of missing data on wind resource estimation. Energy, 2011, 36(7): 4505−4517 doi: 10.1016/j.energy.2011.03.067
[13] Liu X, Zhang Z J. A two-stage deep autoencoder-based missing data imputation method for wind farm SCADA data. IEEE Sensors Journal, 2021, 21(9): 10933−10945 doi: 10.1109/JSEN.2021.3061109
[14] Xu Mei-Ling, Xing Tong, Han Min. Spatial-temporal data interpolation based on spatial-temporal Kriging method. Acta Automatica Sinica, 2020, 46(8): 1681−1688 (in Chinese)
[15] Ma D Z, Hu X G, Zhang H G, Sun Q Y, Xie X P. A hierarchical event detection method based on spectral theory of multidimensional matrix for power system. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2021, 51(4): 2173−2186 doi: 10.1109/TSMC.2019.2931316
[16] Hu X G, Zhang H G, Ma D Z, Wang R. A tnGAN-based leak detection method for pipeline network considering incomplete sensor data. IEEE Transactions on Instrumentation and Measurement, 2020, 70: Article No. 3510610
[17] Mostafa S M. Imputing missing values using cumulative linear regression. CAAI Transactions on Intelligence Technology, 2019, 4(3): 182−200 doi: 10.1049/trit.2019.0032
[18] Razavi-Far R, Cheng B Y, Saif M, Ahmadi M. Similarity-learning information-fusion schemes for missing data imputation. Knowledge-Based Systems, 2020, 187: Article No. 104805 doi: 10.1016/j.knosys.2019.06.013
[19] Ye C, Wang H Z, Lu W B, Li J Z. Effective Bayesian-network-based missing value imputation enhanced by crowdsourcing. Knowledge-Based Systems, 2020, 190: Article No. 105199 doi: 10.1016/j.knosys.2019.105199
[20] Zhang Z H. Multiple imputation with multivariate imputation by chained equation (MICE) package. Annals of Translational Medicine, 2016, 4(2): Article No. 30
[21] Wen Cheng-Lin, Lv Fei-Ya, Bao Zhe-Jing, Liu Mei-Qin. A review of data driven-based incipient fault diagnosis. Acta Automatica Sinica, 2016, 42(9): 1285−1299 (in Chinese)
[22] Tak S, Woo S, Yeo H. Data-driven imputation method for traffic data in sectional units of road links. IEEE Transactions on Intelligent Transportation Systems, 2016, 17(6): 1762−1771 doi: 10.1109/TITS.2016.2530312
[23] Folguera L, Zupan J, Cicerone D, Magallanes J F. Self-organizing maps for imputation of missing data in incomplete data matrices. Chemometrics and Intelligent Laboratory Systems, 2015, 143: 146−151 doi: 10.1016/j.chemolab.2015.03.002
[24] Pan H, Ye Z, He Q Y, Yan C Y, Yuan J Y, Lai X D, et al. Discrete missing data imputation using multilayer perceptron and momentum gradient descent. Sensors, 2022, 22(15): Article No. 5645 doi: 10.3390/s22155645
[25] Khan H, Wang X Z, Liu H. Handling missing data through deep convolutional neural network. Information Sciences, 2022, 595: 278−293 doi: 10.1016/j.ins.2022.02.051
[26] Yu B, Yin H T, Zhu Z X. Spatio-temporal graph convolutional networks: A deep learning framework for traffic forecasting. arXiv preprint arXiv:1709.04875, 2018
[27] Zhang J B, Zheng Y, Qi D K. Deep spatio-temporal residual networks for citywide crowd flows prediction. In: Proceedings of the 31st AAAI Conference on Artificial Intelligence. San Francisco, USA: AAAI Press, 2017. 1655−1661
[28] Yoon J, Jarrett D, Schaar M V D. Time-series generative adversarial networks. In: Proceedings of the 33rd International Conference on Neural Information Processing Systems. Vancouver, Canada: Curran Associates Inc., 2019. 5508−5518
[29] Yoon J, Zame W R, Schaar M V D. Estimating missing data in temporal data streams using multi-directional recurrent neural networks. IEEE Transactions on Biomedical Engineering, 2019, 66(5): 1477−1490 doi: 10.1109/TBME.2018.2874712
[30] Kyono T, Zhang Y, Bellot A, Schaar M V D. MIRACLE: Causally-aware imputation via learning missing data mechanisms. arXiv preprint arXiv:2111.03187, 2021
[31] Zhang Y F, Thorburn P J, Xiang W, Fitch P. SSIM: A deep learning approach for recovering missing time series sensor data. IEEE Internet of Things Journal, 2019, 6(4): 6618−6628 doi: 10.1109/JIOT.2019.2909038
[32] Li Z G, He Q. Prediction of railcar remaining useful life by multiple data source fusion. IEEE Transactions on Intelligent Transportation Systems, 2015, 16(4): 2226−2235 doi: 10.1109/TITS.2015.2400424
[33] Wu R, Hamshaw S D, Yang L, Kincaid D W, Etheridge R, Ghasemkhani A. Data imputation for multivariate time series sensor data with large gaps of missing data. IEEE Sensors Journal, 2022, 22(11): 10671−10683 doi: 10.1109/JSEN.2022.3166643