Identifying Candidate Disease Genes in Multilayer Heterogeneous Biological Networks

Ding Cang-Feng, Wang Jun, Zhang Zi-Yun

Citation: Ding Cang-Feng, Wang Jun, Zhang Zi-Yun. Identifying candidate disease genes in multilayer heterogeneous biological networks. Acta Automatica Sinica, 2024, 50(6): 1246−1260 doi: 10.16383/j.aas.c210577

doi: 10.16383/j.aas.c210577
Funds: Supported by National Natural Science Foundation of China (62262067, 62041212, 61866038, 61763046, 61962059), Natural Science Basic Research Program of Shaanxi (2020JM-548, 2020JM-547), and Yan'an University Foundation Program (YDZ2019-04, YDBK2018-35)
More Information
    Author Bio:

    DING Cang-Feng Associate professor at the College of Mathematics and Computer Science, Yan'an University. He received his Ph.D. degree from Beijing Institute of Technology in 2018. His research interests cover multilayer complex networks, graph neural networks, and natural language processing. Corresponding author of this paper. E-mail: dcf@yau.edu.cn

    WANG Jun Master student at the College of Mathematics and Computer Science, Yan'an University. His research interests cover knowledge graphs and their applications. E-mail: wangjun03006@163.com

    ZHANG Zi-Yun Master student at the College of Mathematics and Computer Science, Yan'an University. Her research interests cover text summarization and its applications. E-mail: zhangziyun1202@163.com

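  • Abstract: Most existing random-walk methods for identifying candidate disease genes preferentially visit highly connected genes, so obscure or poorly connected genes that may be associated with known diseases are easily overlooked or hard to identify. In addition, these methods walk only on a single gene network or on an aggregated network of various gene data, which leads to bias and incompleteness. Designing a candidate disease gene identification method that can control the direction of the random walk and integrate multiple data sources is therefore an urgent problem. To this end, multilayer networks and multilayer heterogeneous gene networks are first constructed. Then, a topologically biased random walk with restart (BRWR) algorithm that walks on multilayer and multilayer heterogeneous networks is proposed to identify disease genes. Experimental results show that BRWR, applied to the different types of networks, outperforms existing algorithms in identifying candidate disease genes. Finally, BRWR applied to the multilayer heterogeneous network can predict the disease genes involved in undiagnosed neonatal progeroid syndrome.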
    1)  http://www.proteinatlas.org
    2)  http://www.biocarta.com
    3)  http://human-phenotype-ontology.github.io/
    4)  http://www.omim.org/
    5)  https://www.ncbi.nlm.nih.gov/geo/
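    The abstract describes a random walk with restart whose direction is steered by network topology. The sketch below is a minimal single-network illustration of that idea, not the authors' implementation: it assumes the bias takes the common degree-based form, in which a step from node i to neighbour j has probability proportional to a_ij · k_j^b (k_j is the degree of j and b the bias parameter). The function names `biased_transition` and `brwr`, the restart probability of 0.7, and the toy graph are all hypothetical choices made for illustration. Extending the sketch to a multilayer network would require a supra-adjacency matrix with inter-layer coupling, which this excerpt does not specify.

```python
import numpy as np


def biased_transition(adj, b):
    """Column-stochastic matrix with P[j, i] proportional to adj[j, i] * deg[j]**b.

    Assumed degree-biased form: b > 0 favours hubs, b < 0 favours
    poorly connected nodes, b = 0 gives the ordinary random walk.
    """
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=0)
    bias = np.zeros_like(deg)
    nonzero = deg > 0
    bias[nonzero] = deg[nonzero] ** b      # skip isolated nodes (avoids 0 ** negative)
    weights = adj * bias[:, None]          # weight of the step i -> j is stored at [j, i]
    col_sums = weights.sum(axis=0)
    col_sums[col_sums == 0] = 1.0          # columns of isolated nodes stay all-zero
    return weights / col_sums


def brwr(adj, seeds, b=0.0, restart=0.7, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - r) * P @ p + r * p0 until the L1 change drops below tol."""
    adj = np.asarray(adj, dtype=float)
    P = biased_transition(adj, b)
    p0 = np.zeros(adj.shape[0])
    p0[list(seeds)] = 1.0 / len(seeds)     # restart distribution: uniform over seed genes
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1.0 - restart) * (P @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p


# Toy graph: seed 0 links to a hub (node 1, degree 4) and to a leaf (node 2, degree 1).
edges = [(0, 1), (0, 2), (1, 3), (1, 4), (1, 5)]
adj = np.zeros((6, 6))
for u, v in edges:
    adj[u, v] = adj[v, u] = 1.0

for b in (-2.0, 0.0, 2.0):
    scores = brwr(adj, seeds=[0], b=b)
    print(b, np.round(scores, 4))
```

    In this toy graph a negative b moves probability mass from the hub to the poorly connected neighbour of the seed, which is the behaviour the abstract motivates for recovering weakly connected genes.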
  • Fig.  1  Schematic of multilayer, heterogeneous, and multilayer heterogeneous networks, together with the random-walk paths that explore them (solid arrows)

    Fig.  2  ROC curves and AUC values of different methods on the non-heterogeneous gene networks

    Fig.  3  ROC curves and AUC values of different methods on the heterogeneous gene networks

    Fig.  4  Cumulative distributions of the ranking as the bias parameter $ b $ varies

    Fig.  5  Cumulative distributions of the ranking as the parameters vary

    Fig.  6  Network representation when all bias parameters are 5

    Fig.  7  Network representation when all bias parameters are −5

    Fig.  8  Network representation when all bias parameters are −1

    Fig.  9  Network representation when all bias parameters are 0

    Fig.  10  Network representation when all bias parameters are 1

    Table  1  Statistical properties of the phenotype, gene, and aggregated networks

    Network       Nodes     Edges        Average degree
    COEX          10,415    998,712      47.44
    PPI           12,893    70,141       7.73
    PATH          10,966    274,051      13.47
    Aggregated    17,611    1,342,703    25.79
    Phenotype     7,324     29,853       4.38

    Table  2  AUC values (%) of different methods on different non-heterogeneous networks

    Method     PPI      COEX     PATH     Aggregated    Multilayer
    RWR        73.35    72.84    74.43    76.53         77.98
    ProDiGe    79.12    73.63    80.29    83.27         84.12
    NDOS       78.27    74.78    79.86    84.49         87.95
    DRS        78.93    74.94    80.87    84.78         88.45
    BRIDGE     79.91    74.26    81.51    85.13         89.33
    BRWR       81.15    75.20    84.18    86.73         90.17

    Table  3  AUC values (%) of different methods on different heterogeneous networks

    Method     PPIH     COEXH    PATHH    AggregatedH    MultilayerH
    CIPHER     74.52    73.51    78.30    77.89          78.31
    RWRH       80.37    75.34    79.47    83.67          86.53
    MAXIF      80.91    76.56    80.15    84.02          88.43
    LapRWRH    81.91    77.80    80.90    84.93          88.78
    NRWRH      81.36    78.38    82.70    86.56          89.36
    IDLP       82.08    79.25    83.37    87.79          90.16
    BRWRH      82.36    80.91    85.17    89.65          91.09
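
    The tables above report AUC values, but this excerpt does not describe the evaluation protocol. As a hedged illustration only, the sketch below computes AUC in its standard rank-statistic form: the probability that a randomly chosen true disease gene receives a higher score than a randomly chosen non-disease gene, with ties counted as one half. The function name `auc_from_scores` and the toy scores and labels are hypothetical and are not data from the paper.

```python
def auc_from_scores(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    scores: higher means "more likely a disease gene".
    labels: truthy for known disease genes, falsy otherwise.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0      # positive correctly ranked above negative
            elif sp == sn:
                wins += 0.5      # tie counts as half a correct pair
    return wins / (len(pos) * len(neg))


# Toy example: two known disease genes, two other genes.
toy_scores = [0.9, 0.8, 0.3, 0.1]
toy_labels = [1, 0, 1, 0]
print(100 * auc_from_scores(toy_scores, toy_labels))  # percent, as in the tables
```

    The pairwise loop is quadratic in the number of genes; it is written this way for readability rather than speed.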
Publication history
  • Received:  2021-06-25
  • Accepted:  2022-02-10
  • Published online:  2022-05-09
  • Issue date:  2024-06-27
