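Towards Automatic Smart-contract Codes Classification by Means of Word Embedding Model and Transaction Information

HUANG Bu-Tian, LIU Qi, HE Qin-Ming, LIU Zhen-Guang, CHEN Jian-Hai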

Citation: HUANG Bu-Tian, LIU Qi, HE Qin-Ming, LIU Zhen-Guang, CHEN Jian-Hai. Towards Automatic Smart-contract Codes Classification by Means of Word Embedding Model and Transaction Information. ACTA AUTOMATICA SINICA, 2017, 43(9): 1532-1543. doi: 10.16383/j.aas.2017.c160655

doi: 10.16383/j.aas.2017.c160655
More Information
    Author Bio:

    LIU Qi    Master student at the College of Computer Science, National University of Singapore, Singapore. His research interests cover data mining and blockchain. E-mail: leuchine@gmail.com

    HE Qin-Ming    Professor at the College of Computer Science and Technology, Zhejiang University. His research interests cover data mining, virtualization, and blockchain. E-mail: hqm@zju.edu.cn

    LIU Zhen-Guang    Postdoctoral researcher at the College of Computer Science, National University of Singapore, Singapore. His research interests cover data mining and blockchain. E-mail: zhenguangliu@zju.edu.cn

    CHEN Jian-Hai    Lecturer at the College of Computer Science and Technology, Zhejiang University. His research interests cover virtualization, cloud computing, and blockchain. E-mail: chenjh919@zju.edu.cn

    Corresponding author:

    HUANG Bu-Tian    Ph. D. candidate at the College of Computer Science and Technology, Zhejiang University. His research interests cover virtualization, cloud computing, and blockchain. Corresponding author of this paper. E-mail: butine@zju.edu.cn
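  • Abstract: As a breakthrough extension of blockchain technology, smart contracts let users implement custom code logic on the blockchain, which makes blockchain technology simpler and easier to use. As the volume of smart-contract code grows rapidly, managing and organizing it becomes increasingly challenging. A code classification system based on artificial intelligence can automatically sort code into categories according to its textual information, and thereby help people manage and organize it. Taking smart contracts on the Ethereum platform as an example, and given that word embedding models can capture the semantic information of code, this paper proposes a smart-contract classification system based on a word embedding model. In addition, every smart contract is associated with a series of transactions, so we also use a contract's transaction information to gain a deeper understanding of its logical behavior. To the best of our knowledge, this paper is the first attempt to study the automatic classification of smart-contract code. Test results show that the system achieves fairly satisfactory classification performance.
    1)  Editorial board member responsible for this paper: YUAN Yong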
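    The idea in the abstract, combining an embedding of the contract code with features derived from its transactions, can be illustrated with a small sketch. This is not the authors' system: the results tables indicate a neural network (with LSTM units, per Fig. 3), whereas this sketch swaps in averaged token embeddings and a nearest-centroid classifier to stay short. The embedding table, the single transaction feature (assumed here to be a value in [0, 1] such as the share of transactions that carry value), the labels, and all function names are hypothetical.

```python
import math
import re
from collections import defaultdict

# Hypothetical toy embeddings; a real system would learn them from contract code.
EMBEDDINGS = {
    "transfer": [0.9, 0.1],
    "balance": [0.8, 0.2],
    "bet": [0.1, 0.9],
    "player": [0.2, 0.8],
}


def tokenize(source):
    """Split contract source into lowercase identifier-like tokens."""
    return [t.lower() for t in re.findall(r"[A-Za-z_]\w*", source)]


def embed_code(source):
    """Average the embeddings of known tokens (zero vector if none are known)."""
    dim = len(next(iter(EMBEDDINGS.values())))
    vecs = [EMBEDDINGS[t] for t in tokenize(source) if t in EMBEDDINGS]
    if not vecs:
        return [0.0] * dim
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def features(source, tx_stats):
    """Concatenate the code vector with transaction-derived features."""
    return embed_code(source) + list(tx_stats)


def train_centroids(samples):
    """samples: (source, tx_stats, label) triples -> mean feature vector per label."""
    grouped = defaultdict(list)
    for source, tx_stats, label in samples:
        grouped[label].append(features(source, tx_stats))
    return {
        label: [sum(col) / len(rows) for col in zip(*rows)]
        for label, rows in grouped.items()
    }


def classify(source, tx_stats, centroids):
    """Return the label whose centroid is closest in Euclidean distance."""
    x = features(source, tx_stats)
    return min(centroids, key=lambda label: math.dist(x, centroids[label]))


# Hypothetical labelled contracts: (source, transaction features, category).
TRAINING = [
    ("function transfer(address to) { balance[to] += 1; }", (0.9,), "finance"),
    ("function bet(uint n) { player = msg.sender; }", (0.2,), "game"),
]
CENTROIDS = train_centroids(TRAINING)

print(classify("function withdraw() { balance[msg.sender] = 0; }", (0.8,), CENTROIDS))
```

    Dropping the `tx_stats` part of the feature vector gives a code-only classifier, the kind of variant the results tables compare under "without transaction information".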
  • Fig.  1  Ethereum blockchain

    Fig.  2  System architecture

    Fig.  3  LSTM unit

    Fig.  4  Labeling process

    Fig.  5  Category statistics

    Table  1  Classification performance of the neural network

    Category                With transaction information             Without transaction information
                            Precision  Recall  Accuracy  F1 score    Precision  Recall  Accuracy  F1 score
    Finance                 0.943      0.945   0.942     0.943       0.872      0.868   0.882     0.869
    Games                   0.924      0.897   0.924     0.910       0.895      0.874   0.886     0.884
    Lottery                 0.882      0.891   0.906     0.886       0.835      0.852   0.875     0.843
    Ethereum tools          0.914      0.921   0.929     0.917       0.854      0.871   0.882     0.862
    Information management  0.862      0.842   0.883     0.852       0.805      0.813   0.829     0.809
    Currency                0.914      0.882   0.917     0.898       0.821      0.809   0.834     0.814
    Entertainment           0.873      0.889   0.893     0.881       0.783      0.763   0.792     0.773
    Internet of Things      0.861      0.845   0.882     0.853       0.796      0.771   0.809     0.783
    Other                   0.832      0.814   0.845     0.823       0.753      0.757   0.791     0.754

    Table  2  Classification performance of naive Bayes

    Category                With transaction information             Without transaction information
                            Precision  Recall  Accuracy  F1 score    Precision  Recall  Accuracy  F1 score
    Finance                 0.862      0.893   0.861     0.877       0.861      0.815   0.862     0.837
    Games                   0.866      0.879   0.883     0.872       0.815      0.826   0.837     0.820
    Lottery                 0.821      0.817   0.846     0.819       0.796      0.805   0.822     0.800
    Ethereum tools          0.884      0.854   0.896     0.868       0.825      0.847   0.861     0.835
    Information management  0.829      0.859   0.860     0.852       0.757      0.771   0.796     0.764
    Currency                0.876      0.853   0.896     0.864       0.760      0.765   0.774     0.762
    Entertainment           0.845      0.864   0.872     0.854       0.716      0.725   0.735     0.720
    Internet of Things      0.826      0.843   0.862     0.834       0.746      0.741   0.759     0.743
    Other                   0.784      0.819   0.825     0.801       0.745      0.737   0.763     0.740

    Table  3  Classification performance of the support vector machine

    Category                With transaction information             Without transaction information
                            Precision  Recall  Accuracy  F1 score    Precision  Recall  Accuracy  F1 score
    Finance                 0.875      0.897   0.906     0.885       0.815      0.831   0.842     0.822
    Games                   0.883      0.835   0.876     0.858       0.845      0.821   0.856     0.832
    Lottery                 0.879      0.846   0.887     0.862       0.855      0.793   0.814     0.822
    Ethereum tools          0.861      0.865   0.891     0.862       0.829      0.827   0.836     0.827
    Information management  0.804      0.863   0.877     0.832       0.764      0.786   0.789     0.774
    Currency                0.872      0.862   0.889     0.866       0.787      0.792   0.803     0.789
    Entertainment           0.863      0.859   0.873     0.860       0.708      0.714   0.726     0.710
    Internet of Things      0.829      0.845   0.867     0.836       0.756      0.758   0.763     0.756
    Other                   0.804      0.821   0.856     0.812       0.731      0.727   0.734     0.728
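    The four columns reported per category in the tables above are standard classification metrics. This page does not state how they were computed, so the sketch below assumes the usual one-vs-rest definitions, in which one category is treated as positive and all others as negative. The function name and the sample labels are hypothetical and are not data from the paper.

```python
def one_vs_rest_metrics(y_true, y_pred, positive):
    """Precision, recall, accuracy and F1 for one category against all others."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "accuracy": accuracy, "f1": f1}


# Hypothetical labels for five contracts.
y_true = ["finance", "finance", "game", "game", "other"]
y_pred = ["finance", "game", "game", "game", "finance"]
print(one_vs_rest_metrics(y_true, y_pred, positive="finance"))
```

    Under this reading, accuracy counts correctly rejected contracts from other categories as well, so it can exceed both precision and recall for a small category.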
Publication history
  • Received:  2016-09-14
  • Accepted:  2017-02-03
  • Published:  2017-09-20
