Abstract: Survival analysis aims to predict the time until an event of interest occurs, and it has been widely applied to predicting the survival status of patients in clinical treatment. However, owing to the high cost of research and the influence of environmental factors, existing survival analysis methods inevitably suffer from over-fitting caused by high-dimensional, small-sample-size data and from noise sensitivity caused by complex environments. To address these challenges, we propose a noise-tolerant weakly supervised transductive matrix completion (WSTMC) model that predicts the survival status of both censored instances and new instances. Specifically, we first formulate the survival analysis problem as a multitask transductive matrix completion model. A mixture of Gaussians (MoG) distribution is then employed to fit the unknown complex noise in real data, which alleviates the noise sensitivity of the model. Meanwhile, we design a multitask transductive feature selection mechanism that adaptively selects discriminative features shared across tasks and thereby mitigates the over-fitting caused by high-dimensional, small-sample-size data. Furthermore, an efficient expectation-maximization-like optimization algorithm is designed to solve the proposed WSTMC model. Finally, experimental results on 5 real microarray gene expression datasets show that the proposed WSTMC model outperforms 18 widely used competing methods.
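To make the matrix completion view of survival analysis more concrete, the following minimal sketch encodes survival times and censoring flags as a binary status matrix with missing entries. The encoding is an assumption borrowed from common multi-task survival formulations such as [7]; it is not claimed to be the exact construction used by WSTMC, and the function name `survival_target_matrix` and the integer time grid are hypothetical.

```python
from typing import List, Optional


def survival_target_matrix(
    times: List[int], events: List[bool], horizon: int
) -> List[List[Optional[int]]]:
    """Encode each subject as one row of binary 'still alive' labels.

    Column j (1-based) stands for time point j. An entry is 1 while the
    subject is known to be alive, 0 from the observed event time onward,
    and None (missing) after a censoring time, where the status is unknown.
    This encoding is an illustrative assumption, not the paper's definition.
    """
    rows: List[List[Optional[int]]] = []
    for t, observed in zip(times, events):
        # With an observed event at t the subject is last known alive at t - 1;
        # a subject censored at t is still known to be alive at t itself.
        last_alive = t - 1 if observed else t
        row: List[Optional[int]] = []
        for j in range(1, horizon + 1):
            if j <= last_alive:
                row.append(1)
            elif observed:
                row.append(0)
            else:
                row.append(None)  # unobserved entry, to be completed
        rows.append(row)
    return rows


# Subject 0 has an event at t = 3; subject 1 is censored at t = 2.
matrix = survival_target_matrix([3, 2], [True, False], horizon=4)
```

Under this encoding, each column can be read as one binary task, and in a transductive setting the rows of new (test) instances would be entirely missing, so predicting survival status amounts to filling in the `None` entries.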
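The noise-fitting idea mentioned in the abstract can be illustrated with a small expectation-maximization loop that fits a zero-mean mixture of Gaussians to a fixed list of residuals. This is only a sketch under stated assumptions: the zero-mean restriction, the deterministic initialization, and the name `fit_zero_mean_mog` are choices made for this example, and the sketch does not reproduce the paper's EM-like algorithm, which also has to update the completion model itself.

```python
import math
from typing import List, Tuple


def fit_zero_mean_mog(
    residuals: List[float], k: int = 2, iters: int = 100, floor: float = 1e-8
) -> Tuple[List[float], List[float]]:
    """Fit mixing weights and variances of a zero-mean K-component MoG by EM."""
    n = len(residuals)
    mean_sq = sum(r * r for r in residuals) / n + floor
    weights = [1.0 / k] * k
    # Assumed initialization: distinct variances so the components can separate.
    variances = [mean_sq * (c + 1) / k for c in range(k)]
    for _ in range(iters):
        # E-step: responsibilities, computed in log space for stability.
        resp = []
        for r in residuals:
            logs = [
                math.log(w) - 0.5 * math.log(2 * math.pi * v) - r * r / (2 * v)
                for w, v in zip(weights, variances)
            ]
            top = max(logs)
            unnorm = [math.exp(value - top) for value in logs]
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])
        # M-step: re-estimate weights and variances from the responsibilities.
        for c in range(k):
            mass = sum(row[c] for row in resp)
            weighted_sq = sum(row[c] * r * r for row, r in zip(resp, residuals))
            weights[c] = max(mass / n, floor)
            variances[c] = max(weighted_sq / max(mass, floor), floor)
        total_w = sum(weights)
        weights = [w / total_w for w in weights]
    return weights, variances


# Mostly small residuals plus a few large outliers.
data = [0.1, -0.1] * 20 + [5.0, -5.0] * 2
mix_weights, mix_variances = fit_zero_mean_mog(data)
```

On data like this, one component is expected to absorb the small residuals and the other the outliers, which is the sense in which a mixture can describe noise that a single Gaussian fits poorly.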
Table 1 Time complexity comparison of the proposed WSTMC and other related models

| Model | Time complexity |
| --- | --- |
| Multi-LASSO[27] | $O(dtm_{\rm tr})$ |
| Multi-$\ell_{2,1}$[27] | $O(dtm_{\rm tr})$ |
| MTLSA[7] | $O(Ndtm_{\rm tr})$ |
| MTLSA.V2[7] | $O(Ndtm_{\rm tr})$ |
| MTMC[8] | $O(Nmd\min\{m,d\})$ |
| NLMC[28] | $O(Nmdt)$ |
| WSTMC | $O(N_{\rm EM}N_{\rm BPL}(md^{2}+m^{2}d))$ |

Note: $m$ denotes the number of samples (including training and test samples); $m_{\rm tr}$ denotes the number of training samples; $d$ denotes the feature dimension; $t$ denotes the number of tasks; $N$ denotes the number of iterations.
Table 2 Details of datasets used in this study
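| Dataset | #Instances | #Features | #Censored | #Labels | #Ratios |
| --- | --- | --- | --- | --- | --- |
| NSBCD | 115 | 549 | 77 | 188 | 0.2094 |
| DBCD | 295 | 4919 | 216 | 18 | 0.0599 |
| Lung | 86 | 7129 | 62 | 110 | 0.0120 |
| DLBCL | 240 | 7399 | 102 | 21 | 0.0137 |
| MCL | 92 | 8810 | 28 | 14 | 0.0104 |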
Table 3 Comparison of characteristics for the competing models
| Model | Noise tolerance | Transductive learning mechanism | Temporal stability | Adaptive feature selection | Multitask learning mechanism |
| --- | --- | --- | --- | --- | --- |
| COX | $\times$ | $\times$ | $\surd$ | $\times$ | $\times$ |
| LASSO-COX | $\times$ | $\times$ | $\surd$ | $\surd$ | $\times$ |
| EN-COX | $\times$ | $\times$ | $\surd$ | $\surd$ | $\times$ |
| Cox-$\ell_{2,1}$ | $\times$ | $\times$ | $\surd$ | $\surd$ | $\times$ |
| Cox-Trace | $\times$ | $\times$ | $\surd$ | $\times$ | $\times$ |
| Logistic | $\times$ | $\times$ | $\surd$ | $\times$ | $\times$ |
| Weibull | $\times$ | $\times$ | $\surd$ | $\times$ | $\times$ |
| Log-gaussian | $\times$ | $\times$ | $\surd$ | $\times$ | $\times$ |
| Log-logistic | $\times$ | $\times$ | $\surd$ | $\times$ | $\times$ |
| OLS | $\times$ | $\times$ | $\times$ | $\times$ | $\times$ |
| Tobit | $\times$ | $\times$ | $\surd$ | $\times$ | $\times$ |
| RWRSS | $\times$ | $\times$ | $\surd$ | $\surd$ | $\times$ |
| Multi-LASSO | $\times$ | $\times$ | $\times$ | $\times$ | $\surd$ |
| Multi-$\ell_{2,1}$ | $\times$ | $\times$ | $\times$ | $\surd$ | $\surd$ |
| MTLSA | $\times$ | $\times$ | $\surd$ | $\surd$ | $\surd$ |
| MTLSA.V2 | $\times$ | $\times$ | $\surd$ | $\surd$ | $\surd$ |
| NLMC | $\times$ | $\surd$ | $\times$ | $\times$ | $\surd$ |
| MTMC | $\times$ | $\surd$ | $\times$ | $\times$ | $\surd$ |
| WSTMC | $\surd$ | $\surd$ | $\surd$ | $\surd$ | $\surd$ |
Table 4 Comparison of the WSTMC and competing models using C-index (standard deviations)
| Category | Model | NSBCD | Lung | DBCD | DLBCL | MCL |
| --- | --- | --- | --- | --- | --- | --- |
| COX based | COX | 0.4411 (0.0589) | 0.5158 (0.1333) | 0.5539 (0.1233) | 0.4553 (0.0718) | 0.5773 (0.0591) |
| COX based | LASSO-COX | 0.5910 (0.1086) | 0.6698 (0.0910) | 0.6880 (0.0429) | 0.6344 (0.0421) | 0.6824 (0.0701) |
| COX based | EN-COX | 0.6046 (0.1000) | 0.6652 (0.0702) | 0.7214 (0.0306) | 0.6488 (0.0394) | 0.6734 (0.0733) |
| COX based | Cox-$\ell_{2,1}$ | 0.7453 (0.0742) | 0.7470 (0.0450) | 0.7548 (0.0640) | 0.6499 (0.0474) | 0.7229 (0.0379) |
| COX based | Cox-Trace | 0.7550 (0.0737) | 0.7348 (0.0431) | 0.6946 (0.0576) | 0.6478 (0.0387) | 0.7127 (0.0902) |
| Parametric models | Logistic | 0.3787 (0.0195) | 0.5714 (0.0596) | 0.4908 (0.0872) | 0.4840 (0.0496) | 0.4827 (0.0682) |
| Parametric models | Weibull | 0.3045 (0.1528) | 0.4287 (0.1023) | 0.4555 (0.1046) | 0.2507 (0.0627) | 0.4735 (0.0747) |
| Parametric models | Log-gaussian | 0.4435 (0.0539) | 0.4122 (0.0754) | 0.4875 (0.0553) | 0.3167 (0.0914) | 0.2564 (0.0715) |
| Parametric models | Log-logistic | 0.2378 (0.0500) | 0.5924 (0.0655) | 0.5257 (0.0232) | 0.4246 (0.1243) | 0.4802 (0.0724) |
| Linear models | OLS | 0.6333 (0.1108) | 0.5743 (0.0658) | 0.5690 (0.0744) | 0.5024 (0.1023) | 0.5007 (0.1059) |
| Linear models | Tobit | 0.3733 (0.0214) | 0.4689 (0.1358) | 0.4869 (0.0762) | 0.4969 (0.0527) | 0.4591 (0.0322) |
| Linear models | RWRSS | 0.6766 (0.1277) | 0.6969 (0.0430) | 0.7216 (0.0446) | 0.6265 (0.0657) | 0.7118 (0.0737) |
| Multi-task based | Multi-LASSO | 0.6117 (0.1493) | 0.4410 (0.1655) | 0.6256 (0.0749) | 0.6104 (0.0512) | 0.6539 (0.0140) |
| Multi-task based | Multi-$\ell_{2,1}$ | 0.6100 (0.1700) | 0.5248 (0.1130) | 0.6899 (0.0720) | 0.6115 (0.0512) | 0.6912 (0.0602) |
| Multi-task based | MTLSA.V2 | 0.6858 (0.0834) | 0.6769 (0.0271) | 0.7515 (0.0625) | 0.6545 (0.0600) | 0.7079 (0.0963) |
| Multi-task based | MTLSA | 0.6820 (0.0446) | 0.6327 (0.0753) | 0.7581 (0.0304) | 0.6527 (0.0713) | 0.7274 (0.1257) |
| Multi-task based | NLMC | 0.6827 (0.1415) | 0.6939 (0.1500) | 0.7563 (0.0565) | 0.6178 (0.0702) | 0.7232 (0.1035) |
| Multi-task based | MTMC | 0.7620 (0.0576) | 0.6958 (0.0217) | 0.4292 (0.0660) | 0.6611 (0.0491) | 0.7223 (0.0284) |
| Multi-task based | WSTMC | 0.7970 (0.0135) | 0.8153 (0.0992) | 0.7705 (0.0562) | 0.6810 (0.0571) | 0.7336 (0.0697) |
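For readers unfamiliar with the C-index reported in Table 4, the sketch below computes a Harrell-style concordance index from predicted survival times. It is a generic illustration rather than the evaluation code used in the experiments: the pairing rule, the tie handling (half credit), and the function name `concordance_index` are assumptions of this example.

```python
from typing import List


def concordance_index(
    times: List[float], events: List[bool], predicted: List[float]
) -> float:
    """Fraction of comparable pairs whose predicted order matches the observed order.

    A pair (i, j) is comparable when subject i has an observed event and
    times[i] < times[j]; censored subjects can only be the later member.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if predicted[i] < predicted[j]:
                    concordant += 1.0  # predicted order agrees with observed order
                elif predicted[i] == predicted[j]:
                    concordant += 0.5  # tied predictions get half credit
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Three subjects; the last one is censored at t = 6.
score = concordance_index([2, 4, 6], [True, True, False], [3.0, 1.0, 5.0])
```

A value of 1.0 means every comparable pair is ordered correctly, 0.5 corresponds to random ordering, and values below 0.5 (as for several baselines in Table 4) indicate orderings that are worse than chance.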
Table 5 Comparison of the WSTMC and competing models using Weighted average AUC (standard deviations)
| Category | Model | NSBCD | Lung | DBCD | DLBCL | MCL |
| --- | --- | --- | --- | --- | --- | --- |
| COX based | COX | 0.4611 (0.1893) | 0.5464 (0.1632) | 0.5334 (0.1620) | 0.4480 (0.1079) | 0.4695 (0.1701) |
| COX based | LASSO-COX | 0.5986 (0.1589) | 0.7499 (0.1780) | 0.7068 (0.0292) | 0.7104 (0.0533) | 0.7401 (0.0166) |
| COX based | EN-COX | 0.6479 (0.0970) | 0.7540 (0.1398) | 0.7494 (0.0189) | 0.7260 (0.0618) | 0.7350 (0.0025) |
| COX based | Cox-$\ell_{2,1}$ | 0.7752 (0.0450) | 0.8079 (0.0462) | 0.7545 (0.0365) | 0.7157 (0.0795) | 0.8215 (0.0737) |
| COX based | Cox-Trace | 0.6729 (0.0883) | 0.7074 (0.0455) | 0.7078 (0.0465) | 0.6768 (0.0903) | 0.7197 (0.0209) |
| Parametric models | Logistic | 0.4597 (0.1742) | 0.6301 (0.0924) | 0.4840 (0.1086) | 0.5011 (0.0489) | 0.2986 (0.0501) |
| Parametric models | Weibull | 0.4575 (0.2622) | 0.4379 (0.1018) | 0.4707 (0.0809) | 0.4320 (0.1080) | 0.3240 (0.0484) |
| Parametric models | Log-gaussian | 0.4992 (0.2378) | 0.4182 (0.0680) | 0.4742 (0.0763) | 0.4270 (0.0977) | 0.4457 (0.0161) |
| Parametric models | Log-logistic | 0.3304 (0.1057) | 0.5822 (0.1544) | 0.5302 (0.0298) | 0.4712 (0.0627) | 0.2983 (0.0505) |
| Linear models | OLS | 0.6599 (0.1042) | 0.5677 (0.1120) | 0.5998 (0.1096) | 0.4934 (0.1952) | 0.5594 (0.1191) |
| Linear models | Tobit | 0.4567 (0.1812) | 0.4708 (0.1422) | 0.4668 (0.1021) | 0.5243 (0.0691) | 0.5074 (0.0283) |
| Linear models | RWRSS | 0.7016 (0.1369) | 0.6821 (0.0840) | 0.6928 (0.0183) | 0.5622 (0.0127) | 0.7056 (0.1367) |
| Multi-task based | Multi-LASSO | 0.6495 (0.1226) | 0.4410 (0.1655) | 0.6402 (0.0572) | 0.5876 (0.1047) | 0.6079 (0.0696) |
| Multi-task based | Multi-$\ell_{2,1}$ | 0.6501 (0.1314) | 0.5589 (0.1486) | 0.7125 (0.0775) | 0.6001 (0.0528) | 0.6476 (0.0653) |
| Multi-task based | MTLSA.V2 | 0.6822 (0.0576) | 0.8076 (0.0559) | 0.7569 (0.0645) | 0.7405 (0.0719) | 0.7639 (0.0651) |
| Multi-task based | MTLSA | 0.7032 (0.0427) | 0.7169 (0.0964) | 0.8003 (0.0425) | 0.7385 (0.0638) | 0.8095 (0.0367) |
| Multi-task based | NLMC | 0.5724 (0.0705) | 0.5842 (0.0994) | 0.6212 (0.0687) | 0.6130 (0.0657) | 0.7175 (0.0664) |
| Multi-task based | MTMC | 0.8206 (0.0929) | 0.6035 (0.1422) | 0.4334 (0.0506) | 0.6989 (0.0351) | 0.8255 (0.0729) |
| Multi-task based | WSTMC | 0.8662 (0.0788) | 0.8629 (0.0519) | 0.8007 (0.0549) | 0.7064 (0.0563) | 0.8430 (0.0767) |
Table 6 Comparison of the ablation experiments using C-index and Weighted average AUC (standard deviations)
| Metric | Model | NSBCD | Lung | DBCD | DLBCL | MCL |
| --- | --- | --- | --- | --- | --- | --- |
| C-index | MTMC | 0.7620 (0.0576) | 0.6958 (0.0217) | 0.4292 (0.0660) | 0.6611 (0.0491) | 0.7223 (0.0284) |
| C-index | WSTMC-nM | 0.7633 (0.0406) | 0.7053 (0.1566) | 0.6847 (0.0454) | 0.6661 (0.0670) | 0.7241 (0.1023) |
| C-index | WSTMC-nT | 0.7642 (0.0650) | 0.7345 (0.0767) | 0.7270 (0.0422) | 0.6659 (0.0529) | 0.7234 (0.0729) |
| C-index | WSTMC-nF | 0.7664 (0.0164) | 0.7293 (0.1086) | 0.7123 (0.0586) | 0.6641 (0.0497) | 0.7273 (0.0934) |
| C-index | WSTMC | 0.7970 (0.0135) | 0.8153 (0.0992) | 0.7705 (0.0562) | 0.6810 (0.0571) | 0.7336 (0.0697) |
| Weighted average AUC | MTMC | 0.8206 (0.0929) | 0.6035 (0.1422) | 0.4334 (0.0506) | 0.6989 (0.0351) | 0.8255 (0.0729) |
| Weighted average AUC | WSTMC-nM | 0.8547 (0.0441) | 0.6674 (0.0777) | 0.6353 (0.0467) | 0.6994 (0.0526) | 0.8256 (0.1488) |
| Weighted average AUC | WSTMC-nT | 0.8557 (0.0447) | 0.7676 (0.0531) | 0.6061 (0.0726) | 0.6998 (0.0535) | 0.8334 (0.1075) |
| Weighted average AUC | WSTMC-nF | 0.8421 (0.0915) | 0.7420 (0.0433) | 0.6560 (0.0435) | 0.7053 (0.0467) | 0.8268 (0.0230) |
| Weighted average AUC | WSTMC | 0.8662 (0.0788) | 0.8629 (0.0519) | 0.8007 (0.0549) | 0.7064 (0.0563) | 0.8430 (0.0767) |
[1] Li Y, Wang L, Zhou J, Ye J. Multi-task learning based survival analysis for multi-source block-wise missing data. Neurocomputing, 2019, 364: 95-107 doi: 10.1016/j.neucom.2019.07.010
[2] Emmert-Streib F, Dehmer M. Introduction to survival analysis in practice. Machine Learning and Knowledge Extraction, 2019, 1(3): 1013-1038 doi: 10.3390/make1030058
[3] Yang W S, Huang T, Zeng J L, Tang Y, Chen L J, Michra S, Liu Y E. Purchase prediction in free online games via survival analysis. In: Proceedings of the 2019 IEEE International Conference on Big Data. Los Angeles, USA: IEEE, 2019. 4444-4449
[4] Efron B. The efficiency of Cox's likelihood function for censored data. Journal of the American Statistical Association, 1977, 72(359): 557-565 doi: 10.1080/01621459.1977.10480613
[5] Crowther M J, Lambert P C. A general framework for parametric survival analysis. Statistics in Medicine, 2014, 33(30): 5280-5297 doi: 10.1002/sim.6300
[6] Liu Hui-Ting, Leng Xin-Yang, Wang Li-Li, Zhao Peng. A joint embedded multi-label classification algorithm. Acta Automatica Sinica, 2019, 45(10): 1969-1982 (in Chinese)
[7] Li Y, Wang J P, Ye J P, Reddy C K. A multi-task learning formulation for survival analysis. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. San Francisco, USA: ACM, 2016. 1715-1724
[8] Goldberg A B, Zhu X J, Recht B, Xu J M, Nowak R. Transduction with matrix completion: Three birds with one stone. In: Proceedings of the 23rd International Conference on Neural Information Processing Systems. Vancouver, British Columbia, Canada: MIT Press, 2010. 757-765
[9] Maz'ya V, Schmidt G. On approximate approximations using Gaussian kernels. IMA Journal of Numerical Analysis, 1996, 16(1): 13-29 doi: 10.1093/imanum/16.1.13
[10] Tibshirani R. The lasso method for variable selection in the Cox model. Statistics in Medicine, 1997, 16(4): 385-395 doi: 10.1002/(SICI)1097-0258(19970228)16:4<385::AID-SIM380>3.0.CO;2-3
[11] Simon N, Hastie T, Tibshirani R. Regularization paths for Cox's proportional hazards model via coordinate descent. Journal of Statistical Software, 2011, 39(5): 1-13
[12] Vinzamuri B, Reddy C K. Cox regression with correlation based regularization for electronic health records. In: Proceedings of the 13th IEEE International Conference on Data Mining. Dallas, USA: IEEE, 2013. 757-766
[13] Wang P, Reddy C K. Machine learning for survival analysis: A survey. ACM Computing Surveys, 2019, 51(6): 1-36
[14] Tobin J. Estimation of relationships for limited dependent variables. Econometrica, 1958, 26(1): 24-36 doi: 10.2307/1907382
[15] Buckley J, James I. Linear regression with censored data. Biometrika, 1979, 66(3): 429-436 doi: 10.1093/biomet/66.3.429
[16] Boyd S. Convex optimization. IEEE Transactions on Automatic Control, 2006, 51(11): 1859-1859 doi: 10.1109/TAC.2006.884922
[17] Chen Lei, Yang Geng, Chen Zheng-Yu, Xiao Fu, Chen Song-Can. Linearized Bregman iteration algorithm for matrix completion with structural noise. Chinese Journal of Computers, 2015, 38(7): 1357-1371 (in Chinese) doi: 10.11897/SP.J.1016.2015.01357
[18] Chen Lei, Yang Geng, Chen Zheng-Yu, Xiao Fu, Xu Jian. Web services QoS prediction via matrix completion with structural noise. Journal on Communications, 2015, 36(6): 49-59 (in Chinese)
[19] Lian Qiu-Sheng, Fu Li-Peng, Chen Shu-Zhen, Shi Bao-Shun. A compressed sensing algorithm based on multiscale residual reconstruction network. Acta Automatica Sinica, 2019, 45(11): 2082-2091 (in Chinese)
[20] Wang Chuan-Yun, Qin Shi-Yin. Background modeling of infrared image in dynamic scene with Gaussian mixture model in compressed sensing domain. Acta Automatica Sinica, 2018, 44(7): 1212-1226 (in Chinese)
[21] Liu Zhou-Zhou, Li Shi-Ning, Wang Hao, Zhang Qian-Yun. A compressed sensing reconstruction algorithm based on elastic collision and gradient pursuit strategy for WSNs. Acta Automatica Sinica, 2020, 46(1): 178-192 (in Chinese)
[22] Candès E J, Recht B. Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 2009, 9(6): 717-772 doi: 10.1007/s10208-009-9045-5
[23] Fazel M. Matrix rank minimization with applications [Ph.D. dissertation], Stanford University, USA, 2002
[24] Cao X, Xu Z, Meng D. Spectral-spatial hyperspectral image classification via robust low-rank feature extraction and Markov random field. Remote Sensing, 2019, 11(13): 1-18
[25] Han Z, Wang Y, Zhao Q, Meng D, Tang Y. A generalized model for robust tensor factorization with noise modeling by mixture of Gaussians. IEEE Transactions on Neural Networks and Learning Systems, 2018, 29(11): 5380-5393 doi: 10.1109/TNNLS.2018.2796606
[26] Xu Y, Yin W. A globally convergent algorithm for nonconvex optimization based on block coordinate update. Journal of Scientific Computing, 2017, 72(2): 700-734 doi: 10.1007/s10915-017-0376-0
[27] Zhou J Y, Chen J H, Ye J P. MALSAR: Multi-task learning via structural regularization [Online], available: http://jiayuzhou.github.io/MALSAR/, March 17, 2020
[28] Alameda X, Ricci E, Yan Y, Sebe N. Recognizing emotions from abstract paintings using non-linear matrix completion. In: Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE, 2016. 5240-5248
[29] Sørlie T, Tibshirani R, Parker J. Repeated observation of breast tumor subtypes in independent gene expression data sets. Proceedings of the National Academy of Sciences of the United States of America, 2003, 100(14): 8418-8423
[30] Li Y, Xu K S, Reddy C K. Regularized parametric regression for high-dimensional survival analysis. In: Proceedings of the 2016 SIAM International Conference on Data Mining. Miami, USA: SIAM, 2016. 765-773
[31] Beer D G, Kardia S, Huang C C. Gene-expression profiles predict survival of patients with lung adenocarcinoma. Nature Medicine, 2002, 8(8): 816-824 doi: 10.1038/nm733
[32] Van Houwelingen H C, Bruinsma T, Wessels L F. Cross-validated Cox regression on microarray gene expression data. Statistics in Medicine, 2006, 25(18): 3201-3216 doi: 10.1002/sim.2353
[33] Rosenwald A, Wright G, Wiestner A. The proliferation gene expression signature is a quantitative integrator of oncogenic events that predicts survival in mantle cell lymphoma. Cancer Cell, 2003, 3(2): 185-197 doi: 10.1016/S1535-6108(03)00028-X
[34] Wang D, Wang C, Xiao J, Xiao Z, Chen W, Havyarimana V. Bayesian optimization of support vector machine for regression prediction of short-term traffic flow. Intelligent Data Analysis, 2019, 23(2): 481-497 doi: 10.3233/IDA-183832
[35] Thung K H, Yap P T, Shen D. Conversion and time-to-conversion predictions of mild cognitive impairment using low-rank affinity pursuit denoising and matrix completion. Medical Image Analysis, 2018, 45(2): 68-82
[36] Cui Jia-Xu, Yang Bo. Survey on Bayesian optimization methodology and applications. Ruan Jian Xue Bao/Journal of Software, 2018, 29(10): 3068-3090 (in Chinese)
[37] Wang L, Li Y, Zhou J Y, Zhu D X, Ye J P. Multi-task survival analysis. In: Proceedings of the 2017 IEEE International Conference on Data Mining (ICDM). New Orleans, USA: IEEE, 2017. 485-494
[38] Faraway J J. Practical regression and ANOVA using R [Online], available: https://people.bath.ac.uk/jjf23/book/pra.pdf, March 17, 2020