SegHMC: An Algorithm for Discovery of Cis-regulatory Modules Based on Segmental HMM

GUO Hai-Tao, HUO Hong-Wei, YU Qiang

Citation: GUO Hai-Tao, HUO Hong-Wei, YU Qiang. SegHMC: an Algorithm for Discovery of Cis-regulatory Module Based on Segmental HMM. ACTA AUTOMATICA SINICA, 2016, 42(11): 1718-1731. doi: 10.16383/j.aas.2016.c150309

doi: 10.16383/j.aas.2016.c150309
Funding:

China Postdoctoral Science Foundation 2015M582621

National Natural Science Foundation of China 61173025, 61373044, 61502366

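More Information
    Author Bio:

    GUO Hai-Tao  Ph.D. candidate at the School of Computer Science and Technology, Xidian University. His research interests cover bioinformatics algorithms and parallel algorithms. E-mail: ght_311@sina.com

    YU Qiang  Ph.D., lecturer at the School of Computer Science and Technology, Xidian University. His research interests cover bioinformatics algorithms and parallel algorithms. E-mail: qyu@mail.xidian.edu.cn

    Corresponding author:

    HUO Hong-Wei  Ph.D., professor at the School of Computer Science and Technology, Xidian University. Her research interests cover big data algorithms and compressed data structures, bioinformatics algorithms, and algorithm engineering. E-mail: hwhuo@mail.xidian.edu.cn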

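  • Abstract: Cis-regulatory modules (CRMs) play an important role in the transcriptional regulation of eukaryotic genes, and identifying them is an important problem in computational biology. Many computational methods for identifying CRMs exist, but their accuracy still needs to improve. Combining several kinds of CRM feature information helps raise identification accuracy. On this basis, this paper proposes SegHMC (Segmental HMM model for discovery of cis-regulatory module), an algorithm for identifying CRMs. The algorithm builds a segmental HMM for the CRM identification problem and extends the representation of the regulatory structure (or regulatory grammar) of CRMs: a CRM is represented not only as a combination of motifs, but the regulatory grammar also incorporates the co-occurrence frequency of motifs, motif order preference, and the distribution of distances between adjacent motifs within a CRM. Experimental results on synthetic and real biological datasets show that the proposed method identifies CRMs significantly more accurately than the current mainstream methods.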
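The regulatory-grammar features named in the abstract (motif order preference and the distance between adjacent motifs) can be illustrated with a small scoring sketch. This is not the SegHMC model itself: the table `TRANSITION`, the geometric gap parameter `GAP_P`, and the function names are hypothetical choices made only for illustration, and motif match scores are left out.

```python
import math

# Hypothetical order-preference table: TRANSITION[a][b] is the assumed
# probability that motif b is the next motif after motif a inside a CRM.
TRANSITION = {
    "A": {"A": 0.2, "B": 0.8},
    "B": {"A": 0.5, "B": 0.5},
}

# Hypothetical parameter of a geometric distribution over the gap (in bp)
# between adjacent motifs; the paper's actual distance model is not given here.
GAP_P = 0.05


def log_gap_prob(gap, p=GAP_P):
    """Log-probability of a gap of `gap` bases under Geometric(p) on 0, 1, 2, ..."""
    if gap < 0:
        return float("-inf")  # overlapping motifs are disallowed in this sketch
    return gap * math.log(1.0 - p) + math.log(p)


def grammar_log_score(hits, motif_len):
    """Score a list of (motif_name, start) hits after sorting them by start.

    For each adjacent pair, adds the log order-preference term and the
    log gap-distance term. Motif match scores are deliberately omitted.
    """
    hits = sorted(hits, key=lambda h: h[1])
    score = 0.0
    for (prev, prev_start), (cur, cur_start) in zip(hits, hits[1:]):
        gap = cur_start - (prev_start + motif_len[prev])
        score += math.log(TRANSITION[prev][cur]) + log_gap_prob(gap)
    return score
```

With these made-up parameters, `grammar_log_score([("A", 0), ("B", 10)], {"A": 6, "B": 6})` scores higher than the same layout with the motif order reversed, which is the kind of order preference a regulatory grammar is meant to capture.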
  • Fig. 1  The structure of cis-regulatory modules (A cis-regulatory module is a sequence region that contains motifs for multiple transcription factors; motif orientation, the distance between motifs, and the relationships among motifs may reflect important regulatory properties of the module.)

    Fig. 2  The state transition diagram of the segmental HMM

    Fig. 3  The prediction performance of SegHMC threshold and SegHMC Viterbi on the synthetic dataset

    Fig. 4  The prediction performance of SegHMC threshold and SegHMC Viterbi on the muscle dataset

    Fig. 5  The prediction performance of SegHMC threshold and SegHMC Viterbi on the early Drosophila development dataset

    Fig. 6  A boxplot of the variation in CC for all methods across the genes in the early Drosophila development dataset

    Fig. 7  The effect of including structural information on the prediction performance of SegHMC for all datasets
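Fig. 6 reports CC per gene. The page does not define CC; the sketch below assumes it denotes the nucleotide-level correlation coefficient (the Matthews correlation between predicted and annotated CRM positions) commonly used in motif-discovery benchmarks. The function name and the 0/1 label encoding are illustrative assumptions.

```python
import math


def correlation_coefficient(predicted, actual):
    """Correlation coefficient between two equal-length 0/1 label lists.

    1 marks a position inside a CRM, 0 a position outside.
    """
    if len(predicted) != len(actual):
        raise ValueError("label lists must have the same length")
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    tn = sum(1 for p, a in zip(predicted, actual) if not p and not a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    denom = math.sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))
    # Convention assumed here: return 0.0 when the coefficient is undefined.
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

A perfect prediction yields 1.0, a fully inverted one yields -1.0, and a prediction that labels every position the same way falls back to 0.0 under the convention chosen above.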

Publication history
  • Received:  2015-05-18
  • Accepted:  2016-06-06
  • Published:  2016-11-01
