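Prediction of Protein Functions Based on Essential Functional Modules Mining

ZHAO Bi-Hai, LI Xue-Yong, HU Sai, ZHANG Fan, TIAN Qing-Long, YANG Pin-Hong, LIU Zhen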

Citation: ZHAO Bi-Hai, LI Xue-Yong, HU Sai, ZHANG Fan, TIAN Qing-Long, YANG Pin-Hong, LIU Zhen. Prediction of Protein Functions Based on Essential Functional Modules Mining. ACTA AUTOMATICA SINICA, 2018, 44(1): 183-192. doi: 10.16383/j.aas.2018.c160592

doi: 10.16383/j.aas.2018.c160592
Funds: 

Scientific Research Fund of Hunan Provincial Education Department 17C0133

National Natural Science Foundation of China 61772089

Natural Science Foundation of Hunan Province 2016JJ3016

Scientific Research Fund of Hunan Provincial Education Department 16A020

Scientific Research Fund of Hunan Provincial Education Department 16C0137

More Information
    Author Bio:

    ZHAO Bi-Hai  Ph.D., associate professor at the School of Computer Engineering and Applied Mathematics, Changsha University. He received his Ph.D. degree from Central South University in 2014. His research interests cover bioinformatics and data mining. E-mail: bihaizhao@163.com

    LI Xue-Yong  Professor at the School of Computer Engineering and Applied Mathematics, Changsha University. He received his Ph.D. degree from Central South University in 2016. His main research interest is bioinformatics. E-mail: xueyongli@163.com

    ZHANG Fan  Lecturer at the School of Computer Engineering and Applied Mathematics, Changsha University. She received her Ph.D. degree from Beihang University in 2014. Her main research interest is bioinformatics. E-mail: zf_ccsu@163.com

    TIAN Qing-Long  Lecturer at the School of Computer Engineering and Applied Mathematics, Changsha University. He received his master's degree from Hunan University in 2012. His research interests cover bioinformatics and machine learning. E-mail: chinatql@126.com

    YANG Pin-Hong  Ph.D., professor at the School of Life Science, Hunan University of Arts and Science. He received his Ph.D. degree in 1999. His research interests cover aquatic biological resources and their utilization. E-mail: yph098@163.com

    LIU Zhen  Ph.D., professor at the School of Biology and Environmental Engineering, Changsha University. He received his Ph.D. degree in 2010. His research interests cover molecular nutrition and regulation. E-mail: zhenliuccsu@163.com

    Corresponding author: HU Sai  Associate professor at the School of Computer Engineering and Applied Mathematics, Changsha University. She received her master's degree from Hunan University in 2003. Her research interests cover bioinformatics and statistics. Corresponding author of this paper. E-mail: husaiccsu@163.com
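  • Abstract: Accurate annotation of protein functions is key to understanding organisms at the molecular level. Because of their inherent difficulty and high cost, experimental methods for annotating protein functions can hardly keep pace with the growing volume of sequence data. Many computational methods based on protein-protein interaction (PPI) networks have therefore been proposed to predict protein functions. The current trend in protein function prediction is to integrate PPI networks with heterogeneous biological data. This paper proposes a protein function prediction algorithm based on mining essential functional modules in a multi-relationship network. An essential functional module consists of a group of tightly connected proteins that share biological functions and can be well separated from the rest of the network. The algorithm forms essential functional modules by mining subgraphs with high cohesion and low coupling from each simple network of the multi-relationship network. The functions of neighbor proteins within an essential functional module are used to annotate the protein whose functions are to be predicted. The simple networks differ in their importance for protein function prediction. Experimental results show that the proposed method outperforms existing protein function prediction methods.
    1)  Recommended by Associate Editor ZHANG Xue-Gong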
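
The abstract describes the approach only at a high level, so the following is a minimal Python sketch under stated assumptions, not the authors' implementation. The cohesion score (internal edges divided by internal plus boundary edges), the greedy seed expansion, the per-network weights, and all names and toy data (`cohesion`, `mine_module`, `predict_functions`, proteins `X` and `A` to `F`, function labels `F1` to `F3`) are hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict


def build_adj(edges):
    """Build a symmetric adjacency map from an undirected edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def cohesion(module, adj):
    """Assumed score: fraction of edges touching the module that stay inside it."""
    internal = boundary = 0
    for u in module:
        for v in adj.get(u, ()):
            if v in module:
                internal += 1  # each internal edge is seen from both endpoints
            else:
                boundary += 1
    internal //= 2
    total = internal + boundary
    return internal / total if total else 0.0


def mine_module(seed, adj):
    """Greedily grow a module around `seed` while the cohesion score improves."""
    module = {seed}
    while True:
        frontier = {v for u in module for v in adj.get(u, ())} - module
        best, best_score = None, cohesion(module, adj)
        for v in sorted(frontier):  # sorted for deterministic tie-breaking
            score = cohesion(module | {v}, adj)
            if score > best_score:
                best, best_score = v, score
        if best is None:
            return module
        module.add(best)


def predict_functions(protein, layers, annotations, layer_weights, top_k=1):
    """Vote over annotated neighbors that fall inside the protein's module."""
    votes = Counter()
    for name, adj in layers.items():
        module = mine_module(protein, adj)
        for neighbor in adj.get(protein, ()):
            if neighbor in module:
                for func in annotations.get(neighbor, ()):
                    votes[func] += layer_weights.get(name, 1.0)
    return [func for func, _ in votes.most_common(top_k)]


# Toy multi-relationship network: one simple network per relationship type.
layers = {
    "ppi": build_adj([("X", "A"), ("X", "B"), ("A", "B"), ("X", "C"),
                      ("C", "D"), ("C", "E"), ("C", "F"),
                      ("D", "E"), ("D", "F"), ("E", "F")]),
    "domain": build_adj([("X", "C"), ("C", "D")]),
}
annotations = {"A": {"F1"}, "B": {"F1", "F2"}, "C": {"F3"}}
layer_weights = {"ppi": 1.0, "domain": 0.5}  # assumed importance per network

predicted = predict_functions("X", layers, annotations, layer_weights, top_k=2)
print(predicted)
```

In this toy network, `C` is a neighbor of `X` in the `ppi` layer but belongs to a separate dense cluster, so the greedy expansion leaves it out of the module for `X` and it contributes no vote from that layer. It still votes through the lower-weighted `domain` layer, which illustrates how the simple networks can carry different importance.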
  • Fig.  1  Statistics of the relationship between domains and protein functions

    Fig.  2  Statistics of the relationship between protein complexes and functions

    Fig.  3  Visualization of a multi-relationship network

    Fig.  4  Example of essential functional module mining

    Fig.  5  Impact of different types of connection

    Fig.  6  The effect of threshold $T$

    Fig.  7  Overall performance comparison of various algorithms

    Fig.  8  Comparison of the average F-measure of various algorithms under different $K$ values

    Fig.  9  Results of leave-percent-out cross validation

    Table  1  Statistics of the number of protein functions

    Number of functions   Number of proteins   Number of proteins sharing domains   Proportion (%)
    1                     1512                 523                                  34.59
    2                     703                  318                                  45.23
    3                     317                  166                                  52.37
    4                     140                  77                                   55.00
    5                     106                  54                                   50.94
    > 5                   116                  77                                   66.38

    Table  2  Results of methods on Krogan, Gavin and BioGrid

    Dataset   Method   Specificity   Sensitivity   F-Measure
    Krogan    PEFM     0.412         0.423         0.418
    Krogan    D-PIN    0.367         0.405         0.385
    Krogan    FPM      0.317         0.342         0.329
    Krogan    Zhang    0.231         0.221         0.226
    Krogan    DCS      0.316         0.321         0.318
    Krogan    NC       0.076         0.583         0.135
    Krogan    PON      0.170         0.161         0.165
    Gavin     PEFM     0.466         0.472         0.469
    Gavin     D-PIN    0.443         0.486         0.463
    Gavin     FPM      0.401         0.404         0.403
    Gavin     Zhang    0.197         0.190         0.194
    Gavin     DCS      0.381         0.393         0.387
    Gavin     NC       0.210         0.603         0.311
    Gavin     PON      0.155         0.145         0.150
    BioGrid   PEFM     0.433         0.447         0.440
    BioGrid   D-PIN    0.392         0.445         0.417
    BioGrid   FPM      0.393         0.415         0.403
    BioGrid   Zhang    0.236         0.233         0.235
    BioGrid   DCS      0.370         0.375         0.372
    BioGrid   NC       0.076         0.583         0.135
    BioGrid   PON      0.170         0.161         0.165
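
This page does not state how the F-Measure column of Table 2 is computed. The values appear consistent with the harmonic mean of the Specificity and Sensitivity columns, which is an assumption here rather than a formula quoted from the paper. The short Python check below recomputes it for the three PEFM rows; the helper name `f_measure` and the 0.002 tolerance (to absorb rounding of the published three-digit inputs) are illustrative choices.

```python
def f_measure(specificity, sensitivity):
    """Harmonic mean of the two scores (assumed definition of F-Measure)."""
    total = specificity + sensitivity
    return 2 * specificity * sensitivity / total if total else 0.0


# (Specificity, Sensitivity, reported F-Measure) for PEFM, copied from Table 2.
pefm_rows = {
    "Krogan": (0.412, 0.423, 0.418),
    "Gavin": (0.466, 0.472, 0.469),
    "BioGrid": (0.433, 0.447, 0.440),
}

for dataset, (spec, sens, reported) in pefm_rows.items():
    recomputed = f_measure(spec, sens)
    # Inputs are rounded to three digits, so allow a small tolerance.
    consistent = abs(recomputed - reported) < 0.002
    print(f"{dataset}: recomputed={recomputed:.4f} reported={reported:.3f} consistent={consistent}")
```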
Publication History
  • Received Date:  2016-09-02
  • Accepted Date:  2017-01-16
  • Published Date:  2018-01-20