Identifying Candidate Disease Genes in Multilayer Heterogeneous Networks

Ding Cang-Feng, Wang Jun, Zhang Zi-Yun

Citation: Ding Cang-Feng, Wang Jun, Zhang Zi-Yun. Identifying candidate disease genes in multilayer heterogeneous networks. Acta Automatica Sinica, 2022, 48(x): 1−15 doi: 10.16383/j.aas.c210577


doi: 10.16383/j.aas.c210577
Funds: Supported by the National Natural Science Foundation of China (62041212, 61866038, 61763046, 61962059), the Natural Science Basic Research Program of Shaanxi (2020JM-548, 2020JM-547), and the Yan'an University Foundation Program (YDZ2019-04, YDBK2018-35)
Author Bio:

    DING Cang-Feng  Associate professor at the School of Mathematics and Computer Science, Yan'an University. He received his Ph.D. degree from Beijing Institute of Technology in 2018. His research interests cover multilayer complex networks, graph neural networks, and natural language processing. Corresponding author of this paper. E-mail: dcf@yau.edu.cn

    WANG Jun  Master's student at the School of Mathematics and Computer Science, Yan'an University. His research interests cover knowledge graphs and their applications in fields such as red culture and literary novels. E-mail: wangjun03006@163.com

    ZHANG Zi-Yun  Master's student at the School of Mathematics and Computer Science, Yan'an University. Her research interests cover text summarization and its applications in fields such as red texts. E-mail: zhangziyun1202@163.com

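  • Abstract: Most existing random walk methods for identifying candidate disease genes preferentially visit highly connected genes, whereas obscure or poorly connected genes that may be associated with known diseases are easily overlooked or hard to identify. Moreover, these methods walk only on a single gene network or on an aggregated network of various gene data, which leads to bias and incompleteness. Designing a candidate disease gene identification method that can control the direction of the random walk and integrate multiple data sources is therefore an urgent problem. To this end, this paper first constructs multilayer networks and multilayer heterogeneous gene networks. It then proposes a topologically biased random walk with restart (BRWR) algorithm that walks on multilayer and multilayer heterogeneous networks to identify disease genes. Experimental results show that BRWR, applied to different types of networks, outperforms existing algorithms at identifying candidate disease genes. Finally, BRWR applied to the multilayer heterogeneous network can predict disease genes involved in the undiagnosed neonatal progeroid syndrome.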
    1)  http://www.proteinatlas.org
    2)  http://www.biocarta.com
    3)  http://human-phenotype-ontology.github.io/
    4)  http://www.omim.org/
    5)  https://www.ncbi.nlm.nih.gov/geo/
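
    The idea of steering a random walk with restart by node topology can be illustrated on a single network layer. The sketch below is not the authors' BRWR implementation: it assumes the bias takes the common form in which a step from node i to neighbor j is weighted by degree(j) raised to a bias exponent b, in the spirit of the biased random walks of [19, 20]. The function name `biased_rwr`, the default restart probability, and the toy graph are all illustrative assumptions, and the jumps between layers and between gene and phenotype networks described in the abstract are left out.

```python
import numpy as np

def biased_rwr(adj, seeds, b=0.0, restart=0.7, tol=1e-10, max_iter=1000):
    """Degree-biased random walk with restart on one undirected graph.

    adj     : (n, n) symmetric adjacency matrix (0/1 or weighted)
    seeds   : indices of known disease genes (the restart set)
    b       : bias exponent (assumed form); b > 0 favors hubs,
              b < 0 favors low-degree nodes, b = 0 is the ordinary walk
    restart : probability of jumping back to the seeds at each step
    """
    adj = np.asarray(adj, dtype=float)
    degree = adj.sum(axis=1)

    # Weight of landing on node j: degree(j) ** b (zero-degree nodes get 0).
    weights = np.zeros_like(degree)
    connected = degree > 0
    weights[connected] = degree[connected] ** b

    # biased[i, j] = adj[i, j] * weights[j], then normalize each row.
    biased = adj * weights[np.newaxis, :]
    row_sums = biased.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0  # isolated nodes keep an all-zero row
    trans = biased / row_sums      # trans[i, j] = P(step from i to j)

    # Restart distribution: uniform over the seed nodes.
    p0 = np.zeros(adj.shape[0])
    p0[list(seeds)] = 1.0 / len(seeds)

    # Power iteration until the L1 change falls below tol.
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * (trans.T @ p) + restart * p0
        converged = np.abs(p_next - p).sum() < tol
        p = p_next
        if converged:
            break
    return p

# Toy graph: node 0 is a hub (neighbors 1, 2, 3); node 4 hangs off node 3.
toy_adj = np.array([
    [0, 1, 1, 1, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
], dtype=float)

for bias in (-5.0, 0.0, 5.0):
    scores = biased_rwr(toy_adj, seeds=[3], b=bias)
    print(bias, np.round(scores, 4))
```

    With the seed at node 3, a negative exponent shifts probability toward the poorly connected node 4 and a positive exponent shifts it toward the hub node 0, which is the behavior the abstract relies on to reach genes that an unbiased walk tends to miss.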
  • Fig. 1  Schematic of multilayer (a), heterogeneous (b), and multilayer heterogeneous (c) networks, together with the random walk paths that explore them (solid arrows)

    Fig. 2  ROC curves and AUC values of different methods on the non-heterogeneous gene networks

    Fig. 3  ROC curves and AUC values of different methods on the heterogeneous gene networks

    Fig. 4  Cumulative distributions of the ranking as the bias parameter $ b $ varies

    Fig. 5  Cumulative distributions of the ranking as the parameters vary

    Fig. 6  Network representation when all bias parameters equal 5 (drawn with [51])

    Fig. 7  Network representation when all bias parameters equal −5

    Fig. 8  Network representation when all bias parameters equal −1

    Fig. 9  Network representation when all bias parameters equal 0

    Fig. 10  Network representation when all bias parameters equal 1

    Table 1  Statistical properties (number of nodes, number of edges, and average node degree) of the phenotype, gene, and aggregated networks

    Network         Nodes     Edges       Average degree
    co-expression   10,415    998,712     47.44
    PPI             12,893    70,141      7.73
    pathway         10,966    274,051     13.47
    Aggregated      17,611    1,342,703   25.79
    phenotype       7,324     29,853      4.38

    Table 2  AUC values (%) of different methods on different non-heterogeneous networks

    Method    PPI     COEX    PATH    Aggregated   Multilayer
    RWR       73.35   72.84   74.43   76.53        77.98
    ProDige   79.12   73.63   80.29   83.27        84.12
    NDOS      78.27   74.78   79.86   84.49        87.95
    DRS       78.93   74.94   80.87   84.78        88.45
    BRIDGE    79.91   74.26   81.51   85.13        89.33
    BRWR      81.15   75.20   84.18   86.73        90.17

    Table 3  AUC values (%) of different methods on different heterogeneous networks

    Method    PPIH    COEXH   PATHH   AggregatedH   MultilayerH
    CIPHER    74.52   73.51   78.30   77.89         78.31
    RWRH      80.37   75.34   79.47   83.67         86.53
    MAXIF     80.91   76.56   80.15   84.02         88.43
    LapRWRH   81.91   77.80   80.90   84.93         88.78
    NRWRH     81.36   78.38   82.70   86.56         89.36
    IDLP      82.08   79.25   83.37   87.79         90.16
    BRWRH     82.36   80.91   85.17   89.65         91.09
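
    The AUC values reported in the tables summarize how well each method ranks true disease genes above other candidates. As a rough illustration of what such a number measures, the sketch below computes AUC as the fraction of (positive, negative) pairs in which the positive gene receives the higher score, the interpretation given by Hanley and McNeil [43]. The evaluation protocol actually used in the paper (for example, how candidate sets are built) is not stated on this page, so the function name `auc_from_scores` and the toy data are assumptions for illustration only.

```python
def auc_from_scores(scores, labels):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve.

    scores : ranking scores, higher means more likely a disease gene
    labels : 1 for known disease genes, 0 for all other candidates
    Ties between a positive and a negative count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")

    wins = 0.0
    for sp in positives:
        for sn in negatives:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Toy example: two known disease genes among four candidates.
toy_scores = [0.9, 0.4, 0.6, 0.1]
toy_labels = [1, 1, 0, 0]
print(100 * auc_from_scores(toy_scores, toy_labels))  # AUC as a percentage
```

    The double loop is quadratic in the number of genes, which is fine for a small illustration; a rank-based formulation would be preferable for networks of the size listed in Table 1.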
  • [1] Guala D, Sonnhammer E L. A large-scale benchmark of gene prioritization methods. Scientific reports, 2017, 7(1): 1-10 doi: 10.1038/s41598-016-0028-x
    [2] Schwikowski B, Uetz P, Fields S. A network of protein-protein interactions in yeast. Nature Biotechnology, 2000, 18(12): 1257-1261 doi: 10.1038/82360
    [3] Sharma V, Ranjan T, Kumar P, Pal A K, Jha V K, Sahni S, et al. Protein-protein interaction detection: Methods and analysis. Plant Biotechnol, 2017. 391-411
    [4] Chen Y, Jiang T, Jiang R. Uncover disease genes by maximizing information flow in the phenome interactome network. Bioinformatics, 2011, 27(13): i167-i176 doi: 10.1093/bioinformatics/btr213
    [5] Zhang Y G, Wang Y, Liu J H, Liu X H, Hong Y X, Fan X, et al. IDLP: A Novel Label Propagation Framework for Disease Gene Prioritization. In: Pacific-Asia Conference on Knowledge Discovery and Data Mining, Springer, Cham, 2018. 261−272
    [6] Chen Y, Wu X B, Jiang R. Integrating human omics data to prioritize candidate genes. BMC medical genomics, 2013, 6(1): 1-12 doi: 10.1186/1755-8794-6-1
    [7] Lee J H, Zhao X M, Yoon I, Lee J Y, Kwon N H, Wang Y Y, et al. Integrative analysis of mutational and transcriptional profiles reveals driver mutations of metastatic breast cancers. Cell Discovery, 2016, 2(1): 1-14
    [8] Yang K, Zhao X Z, Waxman D, Zhao X M. Predicting drug disease associations with heterogeneous network embedding. Chaos: An Interdisciplinary Journal of Nonlinear Science, 2019, 29(12): 123109 doi: 10.1063/1.5121900
    [9] Yang A, Chen J Q, Zhao X M. nMAGMA: a network-enhanced method for inferring risk genes from GWAS summary statistics and its application to schizophrenia. Briefings in Bioinformatics, 2021, 22(4): bbaa298 doi: 10.1093/bib/bbaa298
    [10] Köhler S, Bauer S, Horn D, Robinson P N. Walking the interactome for prioritization of candidate disease genes. The American Journal of Human Genetics, 2008, 82(4): 949-958 doi: 10.1016/j.ajhg.2008.02.013
    [11] Li Y J, Patra J C. Genome-wide inferring gene phenotype relationship by walking on the heterogeneous network. Bioinformatics, 2010, 26 (9): 1219-1224 doi: 10.1093/bioinformatics/btq108
    [12] Li Y J, Patra J C. Integration of multiple data sources to prioritize candidate genes using discounted rating system. BMC Bioinformatics, 2010, 11(1): 1-10 doi: 10.1186/1471-2105-11-1
    [13] Li Y J, Li J Y. Disease gene identification by random walk on multigraphs merging heterogeneous genomic and phenotype data. BMC Genomics, 2012, 13(7): 1-12
    [14] Xie M Q, Xu Y J, Zhang Y G, Hwang T H, Kuang R. Network-based phenome-genome association prediction by bi-random walk. PloS One, 2015, 10(5): e0125138 doi: 10.1371/journal.pone.0125138
    [15] Zhao Z Q, Han G S, Yu Z G, Li J Y. Laplacian normalization and random walk on heterogeneous networks for disease gene prioritization. Computational Biology and Chemistry, 2015, 57: 21-28 doi: 10.1016/j.compbiolchem.2015.02.008
    [16] Valdeolivas A, Tichit L, Navarro C, Perrin S, Odelin G, Levy N, et al. Random Walk With Restart On Multiplex And Heterogeneous Biological Networks. Bioinformatics, 2019, 35(3): 497-505 doi: 10.1093/bioinformatics/bty637
    [17] Doncheva N T, Kacprowski T, Albrecht M. Recent approaches to the prioritization of candidate disease genes. Wiley Interdisciplinary Reviews: Systems Biology and Medicine, 2012, 4(5): 429-442 doi: 10.1002/wsbm.1177
    [18] Yang K, Lu K Z, Wu Y, Yu J, Liu B Y, Zhao Y. A network-based machine-learning framework to identify both functional modules and disease genes. BioData Mining, 2011, 140(1): 897-913
    [19] Bonaventura M, Nicosia V, Latora V. Characteristic times of biased random walks on complex networks. Physical Review E, 2014, 89(1): 012803 doi: 10.1103/PhysRevE.89.012803
    [20] Ding C F, Li K. Centrality ranking in multiplex networks using topologically biased random walks. Neurocomputing, 2018, 312: 263-275 doi: 10.1016/j.neucom.2018.05.109
    [21] Pio-Lopez L, Valdeolivas A, Tichit L, Remy L, Baudot A. MultiVERSE: a multiplex and multiplex-heterogeneous network embedding approach. Scientific Reports, 2021, 11(1): 1-20 doi: 10.1038/s41598-020-79139-8
    [22] Peng J L, Zhou Y Y, Wang K. Multiplex gene and phenotype network to characterize shared genetic pathways of epilepsy and autism. Scientific Reports, 2021, 11(1): 1-16 doi: 10.1038/s41598-020-79139-8
    [23] Novoa E M, Mezura E, Vignes M, Terezo M, Magdinier F, Tichit L, et al. A multi-objective genetic algorithm to find active modules in multiplex biological networks. PLoS computational biology, 2021, 17(8): e1009263 doi: 10.1371/journal.pcbi.1009263
    [24] Zhao B H, Hu S, Liu X E, Xiong H J, Han X, Zhang Z H, et al. A Novel Computational Approach for Identifying Essential Proteins From Multiplex Biological Networks. Frontiers in Genetics, 2020, 11: 343-357 doi: 10.3389/fgene.2020.00343
    [25] Dursun C, Smith J R, Hayman G T, Kwitek A E, Bozdag S. NECo: A node embedding algorithm for multiplex heterogeneous networks. In: 2020 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). Seoul, Korea (South): IEEE, 2020. 146−149
    [26] Bentley B, Branicky R, Barnes C L, Chew Y L, Yemini E, Bullmore E T, et al. The multilayer connectome of Caenorhabditis elegans. PLoS Computational Biology, 2016, 12(12): e1005283 doi: 10.1371/journal.pcbi.1005283
    [27] Shi C, Li Y T, Zhang J W, Sun Y Z, Philip S Y. A survey of heterogeneous information network analysis. IEEE Transactions on Knowledge and Data Engineering, 2016, 29(1): 17-37
    [28] Gómez-Gardeñes J, Latora V. Entropy rate of diffusion processes on complex networks. Physical Review E, 2008, 78(6): 065102 doi: 10.1103/PhysRevE.78.065102
    [29] Brin S, Page L. Reprint of: The anatomy of a large-scale hypertextual web search engine. Computer Networks, 2012, 56(18): 3825-3833 doi: 10.1016/j.comnet.2012.10.007
    [30] Pan J Y, Yang H J, Faloutsos C, Duygulu P. Automatic multimedia cross-modal correlation discovery. In: Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Seattle, WA, USA: ACM, 2004. 653−658
    [31] Smedley D, Köhler S, Czeschik J C, Amberger J, Bocchini C, Hamos A, et al. Walking the interactome for candidate prioritization in exome sequencing studies of Mendelian diseases. Bioinformatics, 2014, 30(22): 3215-3222 doi: 10.1093/bioinformatics/btu508
    [32] Kivelä M, Arenas A, Barthelemy M, Gleeson J P, Moreno Y, Porter M A. Multilayer networks. Journal of Complex Networks, 2014, 2(3): 203-271 doi: 10.1093/comnet/cnu016
    [33] Rolland T, Tasan M, Charloteaux B, Pevzner S, Zhong Q, Sahni N, et al. A proteome-scale map of the human interactome network. Cell, 2014, 159(5): 1212-1226 doi: 10.1016/j.cell.2014.10.050
    [34] Del-Toro N, Dumousseau M, Orchard S, Jimenez R C, Galeota E, Launay G, et al. A new reference implementation of the PSICQUIC web service. Nucleic Acids Research, 2013, 41(W1): W601-W606 doi: 10.1093/nar/gkt392
    [35] Kanehisa M, Sato Y, Kawashima M. KEGG mapping tools for uncovering hidden features in biological data. Protein Science, 2022, 31(1): 47-53 doi: 10.1002/pro.4172
    [36] Gillespie M, Jassal B, Stephan R, Milacic M, Rothfels K, Senff-Ribeiro A, et al. The reactome pathway knowledgebase 2022. Nucleic Acids Research, 2022, 50(D1): D687-D692 doi: 10.1093/nar/gkab1028
    [37] Mi H Y, Muruganujan A, Casagrande J T, Thomas P D. Large-scale gene function analysis with the PANTHER classification system. Nature Protocols, 2013, 8(8): 1551-1566 doi: 10.1038/nprot.2013.092
    [38] Schaefer C F, Anthony K, Krupa S, Buchoff J, Day M, Hannay T, et al. PID: the Pathway Interaction Database. Nucleic Acids Research, 2009, 37(1): 674-679
    [39] Köhler S, Gargano M, Matentzoglu N, Carmody L C, Lewis-Smith D, Vasilevsky N, et al. The Human Phenotype Ontology in 2021. Nucleic Acids Research, 2021, 49(D1): D1207-D1217 doi: 10.1093/nar/gkaa1043
    [40] Greene D, NIHR BioResource, Richardson S, Turro E. Phenotype similarity regression for identifying the genetic determinants of rare diseases. The American Journal of Human Genetics, 2016, 98(3): 490-499 doi: 10.1016/j.ajhg.2016.01.008
    [41] Aerts S, Lambrechts D, Maity S, Van L P, Coessens B, Smet F D, et al. Gene prioritization through genomic data fusion. Nature Biotechnology, 2006, 24(5): 537-544 doi: 10.1038/nbt1203
    [42] Piñero J, Bravo A, Queralt-Rosinach N, Gutiérrez-Sacristán A, Deu-Pons J, Centeno E, et al. DisGeNET: a comprehensive platform integrating information on human disease-associated genes and variants. Nucleic Acids Research, 2017, 45(D1): D833-D839 doi: 10.1093/nar/gkw943
    [43] Hanley J A, McNeil B J. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology, 1982, 143(1): 29-36 doi: 10.1148/radiology.143.1.7063747
    [44] Mordelet F, Vert J P. Prodige: Prioritization of disease genes with multitask machine learning from positive and unlabeled examples. BMC bioinformatics, 2011, 12(1): 1-15 doi: 10.1186/1471-2105-12-1
    [45] Wu X B, Jiang R, Zhang M Q, Li S. Network-based global inference of human disease genes. Molecular Systems Biology, 2008, 4(1): 189 doi: 10.1038/msb.2008.27
    [46] Chen X, Liu M X, Yan G Y. Drug target interaction prediction by random walk on the heterogeneous network. Molecular BioSystems, 2012, 8(7): 1970-1978 doi: 10.1039/c2mb00002d
    [47] Blatti C, Sinha S. Characterizing gene sets using discriminative random walks with restart on heterogeneous biological networks. Bioinformatics, 2016, 32(14): 2167-2175 doi: 10.1093/bioinformatics/btw151
    [48] De Domenico M, Sole-Ribalta A, Gomez S, Arenas A. Navigability of interconnected networks under random failures. Proceedings of the National Academy of Sciences, 2014, 111(23): 8351-8356 doi: 10.1073/pnas.1318469111
    [49] Pivnick E K, Angle B, Kaufman R A, Pitukcheewanont P, Hersh J H, Fowlkes J L, et al. Neonatal progeroid (Wiedemann-Rautenstrauch) syndrome: Report of five new cases and review. American Journal of Medical Genetics Part A, 2000, 90(2): 131-140 doi: 10.1002/(SICI)1096-8628(20000117)90:2<131::AID-AJMG9>3.0.CO;2-E
    [50] Kiraz A, Ozen S, Tubas F, Usta Y, Aldemir O, Alanay Y. Wiedemann Rautenstrauch syndrome: Report of a variant case. American Journal of Medical Genetics Part A, 2012, 158(6): 1434-1436
    [51] Kohl M, Wiese S, Warscheid B. Cytoscape: software for visualization and analysis of biological networks. Data mining in proteomics, 2011. 291-303
    [52] Becerra C H, Contreras-Garcia G A, Perez-Vera L A, Diaz-Martinez L A, Avendano B, Martinez H A. Wiedemann Rautenstrauch syndrome prenatal diagnosis. Journal of Perinatology, 2014, 34(12): 954-956 doi: 10.1038/jp.2014.156
    [53] Paolacci S, Bertola D, Franco J, Mohammed S, Tartaglia M, Wollnik B, et al. Wiedemann Rautenstrauch syndrome: A phenotype analysis. American Journal of Medical Genetics Part A, 2017, 173(10): 1763-1772
    [54] Navarro C L, Esteves-Vieira V, Courrier S, Boyer A, Nguyen N T, Huong L T, et al. New ZMPSTE24 (FACE1) mutations in patients affected with restrictive dermopathy or related progeroid syndromes and mutation update. European Journal of Human Genetics, 2014, 22(8): 1002-1011 doi: 10.1038/ejhg.2013.258
    [55] Beauregard B, Smrithi S, Kim H, Ehresmann S, Damours G, Gauthier J, et al. A variant of neonatal progeroid syndrome, or Wiedemann-Rautenstrauch syndrome, is associated with a nonsense variant in POLR3GL. European Journal of Human Genetics, 2020, 28(4): 461-468 doi: 10.1038/s41431-019-0539-6
    [56] Arboleda G, Ramírez N, Arboleda H. The neonatal progeroid syndrome (Wiedemann Rautenstrauch): A model for the study of human aging? Experimental Gerontology, 2007, 42(10): 939-943 doi: 10.1016/j.exger.2007.07.004
    [57] Zhao X M, Liu K Q, Zhu G H, He F, Duval B, Richer J, et al. Identifying cancer-related microRNAs based on gene expression data. Bioinformatics, 2015, 31(8): 1226-1234 doi: 10.1093/bioinformatics/btu811
    [58] He F, Zhu G H, Wang Y Y, Zhao X M, Huang D S. PCID: A novel approach for predicting disease comorbidity by integrating multi-scale data. IEEE/ACM transactions on computational biology and bioinformatics, 2016, 14(3): 678-686
    [59] Dong G Y, Feng J F, Sun F Z, Chen J Q, Zhao X M. A global overview of genetically interpretable multimorbidities among common diseases in the UK Biobank. Genome medicine, 2021, 13(1): 1-20 doi: 10.1186/s13073-020-00808-4
Publication history
  • Received:  2021-06-25
  • Accepted:  2022-02-10
  • Published online:  2022-05-09
