

A Review on Authorship Identification Research

ZHANG Yang, JIANG Ming-Hu

Citation: Zhang Yang, Jiang Ming-Hu. A review on authorship identification research. Acta Automatica Sinica, 2021, x(x): 1−20 doi: 10.16383/j.aas.c200654

Funds: Supported by National Natural Science Foundation of China (62036001)
Author Bio:

    ZHANG Yang  Ph. D. candidate at Department of Chinese Language and Literature, School of Humanities, Tsinghua University. His research interest covers authorship identification, text categorization, and sentiment analysis. E-mail: yumaoqiuq@163.com

    JIANG Ming-Hu  Professor at Department of Chinese Language and Literature, School of Humanities, Tsinghua University. His research interest covers natural language processing, brain and language cognition, pattern recognition, and artificial intelligence. Corresponding author of this paper. E-mail: jiang.mh@mail.tsinghua.edu.cn

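Abstract: Authorship identification is the discipline of inferring the author of a text of unknown authorship from texts whose authors are known. It is a long-standing, multidisciplinary field. Traditional research relied mainly on empirical knowledge from literary studies or linguistics, whereas modern research relies mainly on mathematical methods that quantify an author's writing style. This paper reviews the methods and ideas of modern authorship identification research, chiefly from the perspective of computational linguistics. First, it briefly outlines the history of authorship identification. It then describes in detail stylometric features, authorship identification methods, and multi-faceted research in the field. Next, it introduces evaluation campaigns, datasets, and evaluation metrics related to authorship identification. Finally, it identifies open problems in the field and, in light of them, analyzes and anticipates future trends.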
    Fig.  1  Flow diagram of authorship identification

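Table  1  Comparison of stylometric features

| Stylometric feature | Subtypes | Ease of extraction | Breadth of use | Other notes |
|---|---|---|---|---|
| Character | Character counts, character n-grams, character errors | Very easy; extracted directly | Very high | Topic-independent; captures writing errors; feature dimensionality easily becomes too large, causing data sparsity |
| Lexical | Word length, word frequency, vocabulary richness, word n-grams, word misspellings | Easy; extracted directly or after word segmentation | Very high | Topic-dependent; captures writing errors |
| Syntactic | Phrase or sentence structure, part-of-speech n-grams, syntactic n-grams, rewrite-rule frequencies | Fairly difficult; deep syntactic features require a syntactic parser | | Topic-independent; usually non-contiguous; parsers easily introduce noise |
| Semantic | Synonyms, semantic dependencies | Difficult; requires semantic analysis tools | Very low | Topic-dependent; usually a supplement to other features, rarely used alone |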
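To make the first two rows of Table 1 concrete, here is a minimal Python sketch, not taken from the surveyed work, that extracts overlapping character n-grams and two simple lexical features: average word length and type-token ratio (one basic measure of vocabulary richness). The function names, the regex tokenizer, and the top-k cutoff are illustrative assumptions; Chinese text would first need word segmentation, as the table notes.

```python
import re
from collections import Counter
from typing import Dict


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams, including spaces and punctuation."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def top_k_relative(counts: Counter, k: int = 50) -> Dict[str, float]:
    """Keep only the k most frequent n-grams, as frequencies relative to all n-grams.

    Truncating to k features is one simple way to curb the dimensionality and
    sparsity problem that Table 1 notes for character features.
    """
    total = sum(counts.values())
    return {g: c / total for g, c in counts.most_common(k)} if total else {}


def lexical_features(text: str) -> Dict[str, float]:
    """Average word length and type-token ratio (a simple vocabulary-richness measure)."""
    # Assumption: regex word tokenization; Chinese text needs a word segmenter instead.
    words = re.findall(r"\w+", text.lower())
    if not words:
        return {"avg_word_len": 0.0, "type_token_ratio": 0.0}
    return {
        "avg_word_len": sum(len(w) for w in words) / len(words),
        "type_token_ratio": len(set(words)) / len(words),
    }
```

For example, `char_ngrams("the cat sat on the mat")` counts the trigram `"the"` twice, and `lexical_features` on the same sentence gives a type-token ratio of 5/6 because "the" repeats.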

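Table  2  Comparison of unsupervised methods

| Method | Model | Strategy | Algorithm |
|---|---|---|---|
| k-means clustering | k-center clustering | Minimize the distance between samples and cluster centers | Iterative algorithm |
| Hierarchical clustering | Clustering tree | Minimize within-cluster sample distance | Heuristic algorithm |
| Gaussian mixture clustering | Gaussian mixture model | Maximize the likelihood function | Expectation-maximization (EM) algorithm |
| LSA | Matrix factorization model | Minimize squared loss | Singular value decomposition (SVD) |
| LDA | LDA model | Posterior probability estimation | Gibbs sampling, variational inference |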
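The k-means row of Table 2 (minimize sample-to-center distance by iteration) can be sketched as plain Lloyd's iteration over document feature vectors. This is an illustrative assumption, not the implementation used by any surveyed paper: it assumes each text has already been mapped to a numeric vector (for example, relative frequencies of function words or character n-grams), and `kmeans` and `squared_distance` are hypothetical helper names.

```python
import random
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Sequence[float]], k: int, max_iter: int = 100, seed: int = 0) -> List[int]:
    """Cluster document vectors into k groups; returns one cluster label per document."""
    rng = random.Random(seed)
    # Initialize centers with k documents chosen at random without replacement.
    centers = [list(p) for p in rng.sample(points, k)]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each document joins its nearest center.
        new_labels = [min(range(k), key=lambda c: squared_distance(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centers will not move again
        labels = new_labels
        # Update step: move each center to the mean of the documents assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

In authorship clustering, k would be the assumed number of authors. Because the result depends on the starting centers, the `seed` parameter makes runs reproducible; in practice one would try several seeds and keep the partition with the smallest total within-cluster distance.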

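Table  3  Comparison of supervised methods

| Method | Model type | Model characteristics | Learning strategy | Stability | Accuracy |
|---|---|---|---|---|---|
| NB | Generative | Joint probability distribution of features and classes; conditional independence assumption | Maximum likelihood estimation; maximum a posteriori estimation | | |
| SVM | Discriminative | Separating hyperplane; kernel trick | Minimizing regularized hinge loss; soft-margin maximization | | |
| DT | Discriminative | Classification trees, regression trees | Regularized maximum likelihood estimation | | |
| KNN | Discriminative | Feature space, sample points | | | |
| NN | Discriminative | Topology of neurons | Minimizing an objective function | | Relatively high |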
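For the NB row of Table 3, the sketch below trains a multinomial naive Bayes author classifier on character trigram counts. The conditional independence assumption appears as a simple sum of per-feature log-likelihoods, and add-one (Laplace) smoothing replaces the raw maximum-likelihood estimate so that an n-gram never seen for one author does not drive that author's probability to zero. The helper names (`char_trigrams`, `train_nb`, `predict_nb`) are hypothetical, and this is a sketch of the general technique rather than any surveyed system.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

Model = Tuple[Dict[str, float], Dict[str, Dict[str, float]], Set[str]]


def char_trigrams(text: str) -> Counter:
    """Overlapping character trigram counts for one text."""
    return Counter(text[i:i + 3] for i in range(len(text) - 2))


def train_nb(docs: List[Tuple[str, Counter]]) -> Model:
    """Estimate log P(author) and smoothed log P(feature | author) from (author, counts) pairs."""
    doc_counts = Counter(author for author, _ in docs)
    feat_counts: Dict[str, Counter] = defaultdict(Counter)
    for author, feats in docs:
        feat_counts[author].update(feats)  # pool feature counts per author
    vocab: Set[str] = set().union(*(feats for _, feats in docs))
    log_prior = {a: math.log(n / len(docs)) for a, n in doc_counts.items()}
    log_lik: Dict[str, Dict[str, float]] = {}
    for author, counts in feat_counts.items():
        denom = sum(counts.values()) + len(vocab)  # add-one smoothing over the vocabulary
        log_lik[author] = {f: math.log((counts[f] + 1) / denom) for f in vocab}
    return log_prior, log_lik, vocab


def predict_nb(model: Model, feats: Counter) -> str:
    """Return the author with the highest posterior score for a feature-count vector."""
    log_prior, log_lik, vocab = model
    scores = {
        # Conditional independence: per-feature log-likelihoods simply add up.
        # Features outside the training vocabulary are ignored.
        a: log_prior[a] + sum(c * log_lik[a][f] for f, c in feats.items() if f in vocab)
        for a in log_prior
    }
    return max(scores, key=scores.get)
```

Usage: build `docs` as `[("A", char_trigrams(text)), ...]`, call `train_nb(docs)`, then `predict_nb(model, char_trigrams(unknown_text))`. Working in log space avoids numerical underflow when many small probabilities are multiplied.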
Publication history
  • Received: 2020-08-14
  • Accepted: 2021-02-09
  • Published online: 2021-03-17
