Identifying Candidate Disease Genes in Multilayer Heterogeneous Biological Networks

DING Cang-Feng; WANG Jun; ZHANG Zi-Yun

doi: 10.16383/j.aas.c210577 cstr: 32138.14.j.aas.c210577

1.
College of Mathematics and Computer Science, Yan'an University, Yan'an 716000

Funds: Supported by the National Natural Science Foundation of China (62262067, 62041212, 61866038, 61763046, 61962059), the Natural Science Basic Research Program of Shaanxi (2020JM-548, 2020JM-547), and the Yan'an University Foundation Program (YDZ2019-04, YDBK2018-35)

Author Bio:
DING Cang-Feng　Associate professor at the College of Mathematics and Computer Science, Yan'an University. He received his Ph.D. degree from Beijing Institute of Technology in 2018. His research interests cover multilayer complex networks, graph neural networks, and natural language processing. Corresponding author of this paper. E-mail: dcf@yau.edu.cn

WANG Jun　Master's student at the College of Mathematics and Computer Science, Yan'an University. His research interests cover knowledge graphs and their applications. E-mail: wangjun03006@163.com

ZHANG Zi-Yun　Master's student at the College of Mathematics and Computer Science, Yan'an University. Her research interests cover text summarization and its applications. E-mail: zhangziyun1202@163.com

Publication history
- Received: 2021-06-25
- Accepted: 2022-02-10
- Published online: 2022-05-09
- Issue date: 2024-06-27
Abstract: Most existing random-walk methods for identifying candidate disease genes preferentially visit highly connected genes, so lesser-known or poorly connected genes that may be relevant to known diseases are easily overlooked or hard to identify. Moreover, these methods explore only a single gene network or an aggregated network of various gene data, which leads to bias and incompleteness. Designing a candidate disease gene identification method that can control the direction of the random walk and integrate multiple data sources is therefore a pressing problem. To this end, we first construct a multilayer network and a multilayer heterogeneous gene network. We then propose a topologically biased random walk with restart (BRWR) algorithm that walks on multilayer and multilayer heterogeneous networks to identify disease genes. Experimental results show that BRWR outperforms existing algorithms for candidate disease gene identification on different types of networks. Finally, BRWR applied to the multilayer heterogeneous network is used to predict disease genes implicated in the undiagnosed neonatal progeroid syndrome.
- Multilayer heterogeneous network
- biological network
- biased random walk
- candidate gene identification
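
The abstract describes BRWR only at a high level. As a rough illustration of what biasing a random walk with restart can look like, the sketch below works on a single undirected network and assumes a degree-power bias: a step from node i to neighbor j is weighted by k_j^b, where k_j is the degree of j. The bias form, the function name `biased_rwr`, the restart probability 0.7, and the toy network are all assumptions made for illustration; this is not the authors' implementation and it does not cover the multilayer or heterogeneous case.

```python
import numpy as np

def biased_rwr(adj, seeds, b=0.0, restart=0.7, tol=1e-10, max_iter=1000):
    """Degree-biased random walk with restart on one undirected network.

    adj     : square 0/1 (or weighted) adjacency matrix
    seeds   : indices of known disease genes (restart set)
    b       : bias exponent (assumed form): b > 0 favors hubs,
              b < 0 favors poorly connected nodes, b = 0 is the plain RWR
    restart : probability of jumping back to the seeds at each step
    """
    adj = np.asarray(adj, dtype=float)
    degree = adj.sum(axis=1)

    # Assumed bias: weight of a step i -> j is a_ij * k_j**b.
    # Isolated nodes keep weight 0; they cannot be reached by a step anyway.
    bias = np.zeros_like(degree)
    connected = degree > 0
    bias[connected] = degree[connected] ** b
    weights = adj * bias[np.newaxis, :]

    # Row-normalize so that transition[i, j] is the probability of i -> j.
    row_sums = weights.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0  # avoid division by zero for isolated nodes
    transition = weights / row_sums

    # Restart vector: uniform over the seed nodes.
    p0 = np.zeros(adj.shape[0])
    p0[list(seeds)] = 1.0 / len(seeds)

    # Iterate p <- (1 - r) * P^T p + r * p0 until the L1 change is below tol.
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1.0 - restart) * (transition.T @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p

# Toy network: node 0 is a hub (neighbors 1, 2, 3); node 4 is a leaf attached to node 1.
toy = np.array([
    [0, 1, 1, 1, 0],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
])
for b in (-1.0, 0.0, 1.0):
    print(b, np.round(biased_rwr(toy, seeds=[1], b=b), 4))
```

In this toy setting a negative `b` shifts score from the hub (node 0) toward the leaf (node 4), which is the kind of control over walk direction that the abstract motivates.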
Notes:

1) http://www.proteinatlas.org

2) http://www.biocarta.com

3) http://human-phenotype-ontology.github.io/

4) http://www.omim.org/

5) https://www.ncbi.nlm.nih.gov/geo/


Fig. 1 Schematic of multilayer, heterogeneous, and multilayer heterogeneous networks, together with the random-walk paths that explore them (solid arrows)

Fig. 2 ROC curves and corresponding AUC values of different methods on the non-heterogeneous gene networks

Fig. 3 ROC curves and corresponding AUC values of different methods on the heterogeneous gene networks

Fig. 4 Cumulative distributions of the rankings as the bias parameter $ b $ varies

Fig. 5 Cumulative distributions of the rankings as the parameters vary

Fig. 6 Network representation when all bias parameters are 5

Fig. 7 Network representation when all bias parameters are −5

Fig. 8 Network representation when all bias parameters are −1

Fig. 9 Network representation when all bias parameters are 0

Fig. 10 Network representation when all bias parameters are 1

Table 1 Statistical properties of the phenotype, gene, and aggregated networks

Network	Nodes	Edges	Average degree
COEX	10 415	998 712	47.44
PPI	12 893	70 141	7.73
PATH	10 966	274 051	13.47
Aggregated network	17 611	1 342 703	25.79
Phenotype network	7 324	29 853	4.38
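
Table 1 lists an aggregated network next to the individual gene layers. As a hedged illustration of the difference between the two representations, the sketch below assumes that aggregation means taking the union of the undirected edge sets of all layers, whereas a multilayer representation keeps each layer's edges separate. The helper names and the toy gene identifiers (`G1` to `G5`) are hypothetical, and the construction actually used in the paper may differ.

```python
def undirected(edges):
    """Normalize an edge list to a set of unordered node pairs, dropping self-loops."""
    return {frozenset((u, v)) for u, v in edges if u != v}

def aggregate(layers):
    """Union the edge sets of all layers into one single-layer network (assumed definition)."""
    merged = set()
    for edges in layers.values():
        merged |= undirected(edges)
    return merged

def count_nodes(edges):
    """Number of distinct nodes that appear in at least one edge."""
    return len({node for edge in edges for node in edge})

# Toy multilayer network: the same genes, one edge set per layer.
layers = {
    "PPI": [("G1", "G2"), ("G2", "G3")],
    "COEX": [("G2", "G1"), ("G3", "G4")],  # G1-G2 also appears in PPI
    "PATH": [("G4", "G5")],
}

aggregated = aggregate(layers)
per_layer_total = sum(len(undirected(edges)) for edges in layers.values())
print("edges summed over layers:", per_layer_total)
print("edges in aggregated network:", len(aggregated))
print("nodes in aggregated network:", count_nodes(aggregated))
```

An edge shared by several layers is counted once after aggregation, so the information about which layer supports it is lost; the multilayer representation keeps that information available to the walker.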

Table 2 AUC values of different methods on different non-heterogeneous networks (%)

Method	PPI	COEX	PATH	Aggregated	Multilayer
RWR	73.35	72.84	74.43	76.53	77.98
ProDiGe	79.12	73.63	80.29	83.27	84.12
NDOS	78.27	74.78	79.86	84.49	87.95
DRS	78.93	74.94	80.87	84.78	88.45
BRIDGE	79.91	74.26	81.51	85.13	89.33
BRWR	81.15	75.20	84.18	86.73	90.17
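
The tables report AUC values in percent. As a reminder of what that number measures, the sketch below computes AUC as the probability that a randomly chosen positive (a held-out disease gene) receives a higher score than a randomly chosen negative, with ties counted as one half. The function name `auc_from_scores` and the toy scores are assumptions for illustration; the cross-validation protocol behind the reported values is not described in this section.

```python
import numpy as np

def auc_from_scores(scores, is_positive):
    """AUC via pairwise comparison of positive and negative scores.

    scores      : one score per candidate gene (higher = more likely a disease gene)
    is_positive : boolean flags marking the held-out disease genes
    """
    scores = np.asarray(scores, dtype=float)
    is_positive = np.asarray(is_positive, dtype=bool)
    pos = scores[is_positive]
    neg = scores[~is_positive]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need at least one positive and one negative")

    # Compare every positive with every negative; a tie counts as half a win.
    diff = pos[:, np.newaxis] - neg[np.newaxis, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return wins / (pos.size * neg.size)

# Toy ranking: two held-out disease genes among four candidates.
toy_scores = [0.9, 0.8, 0.3, 0.1]
toy_labels = [True, False, True, False]
print(f"AUC = {100 * auc_from_scores(toy_scores, toy_labels):.2f}%")
```

The pairwise form is quadratic in the number of candidates, which is fine for a sketch; a rank-based computation would be preferable for genome-scale candidate lists.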

Table 3 AUC values of different methods on different heterogeneous networks (%)

Method	PPIH	COEXH	PATHH	AggregatedH	MultilayerH
CIPHER	74.52	73.51	78.30	77.89	78.31
RWRH	80.37	75.34	79.47	83.67	86.53
MAXIF	80.91	76.56	80.15	84.02	88.43
LapRWRH	81.91	77.80	80.90	84.93	88.78
NRWRH	81.36	78.38	82.70	86.56	89.36
IDLP	82.08	79.25	83.37	87.79	90.16
BRWRH	82.36	80.91	85.17	89.65	91.09
