Category Prior Guided Mixup Data Augmentation for Charge Prediction

Xian Yan-Tuan, Chen Wen-Zhong, Yu Zheng-Tao, Zhang Ya-Fei, Wang Hong-Bin

Citation: Xian Yan-Tuan, Chen Wen-Zhong, Yu Zheng-Tao, Zhang Ya-Fei, Wang Hong-Bin. Category prior guided mixup data augmentation for charge prediction. Acta Automatica Sinica, 2022, 48(8): 2097−2107 doi: 10.16383/j.aas.c200908

doi: 10.16383/j.aas.c200908
Funds: Supported by Science and Technology Plan Projects of Yunnan Province (202001AT070046), National Key Research and Development Program of China (2018YFC0830104, 2018YFC0830105, 2018YFC0830100), and National Natural Science Foundation of China (61966020)
More Information
    Author Bio:

    XIAN Yan-Tuan  Associate professor at the School of Information Engineering and Automation, Kunming University of Science and Technology. His research interests cover natural language processing, information extraction, and machine translation. E-mail: xianyt@kust.edu.cn

    CHEN Wen-Zhong  Master student at the School of Information Engineering and Automation, Kunming University of Science and Technology. His research interests cover natural language processing and information retrieval. E-mail: Chen_WenZhong@163.com

    YU Zheng-Tao  Professor at the School of Information Engineering and Automation, Kunming University of Science and Technology. His research interests cover natural language processing, information retrieval, machine translation, and machine learning. Corresponding author of this paper. E-mail: ztyu@hotmail.com

    ZHANG Ya-Fei  Associate professor at the School of Information Engineering and Automation, Kunming University of Science and Technology. Her research interests cover natural language processing and pattern recognition. E-mail: zyfeimail@163.com

    WANG Hong-Bin  Associate professor at the School of Information Engineering and Automation, Kunming University of Science and Technology. His research interests cover natural language processing and information extraction. E-mail: wanghongbin@kust.edu.cn

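  • Abstract: Charge prediction is a representative application of artificial intelligence in the judicial domain. The task is to predict the charge on which a defendant is convicted from the case description and the facts. Because the number of samples per charge is highly imbalanced, a classifier trained on such data tends to favor high-frequency charges, which leads to poor performance on low-frequency charges. To address this class imbalance, this paper proposes a charge prediction model that incorporates a category-prior-guided Mixup data augmentation strategy to improve the prediction of low-frequency charges. The model uses a bidirectional long short-term memory network and a structured self-attention mechanism to learn vector representations of the text. On top of these representations, the Mixup strategy synthesizes pseudo-samples in the representation space, and the category prior biases the labels of the synthesized samples toward low-frequency charges, thereby augmenting the training samples of low-frequency charges. Experimental results show that, compared with existing methods, the proposed method achieves substantial gains in accuracy, macro precision, macro recall, and macro F1, and the macro F1 on low-frequency charges improves by 13.5 percentage points.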
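The augmentation step described in the abstract can be illustrated with a small, hedged sketch in pure Python. Standard Mixup interpolates two representation vectors with a coefficient λ drawn from a Beta distribution; the sketch then tilts the label weight toward the rarer class. This page does not state the paper's exact formula, so the inverse-prior re-weighting in `prior_adjusted_label_weight`, and all function and parameter names, are assumptions chosen for illustration only.

```python
import random


def mix_features(h_i, h_j, lam):
    """Standard Mixup: convex combination of two representation vectors."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(h_i, h_j)]


def prior_adjusted_label_weight(lam, prior_i, prior_j):
    """Hypothetical re-weighting (an assumption, not the paper's formula):
    divide each mixing weight by its class prior and renormalise, so the
    rarer of the two classes receives the larger share of the soft label."""
    w_i = lam / prior_i
    w_j = (1.0 - lam) / prior_j
    return w_i / (w_i + w_j)


def prior_mixup(h_i, y_i, h_j, y_j, priors, alpha=0.5, rng=random):
    """Synthesise one pseudo-sample (vector, soft label) from two encoded samples.

    h_i, h_j: representation vectors produced by some encoder
    y_i, y_j: integer class indices of the two samples
    priors:   class frequencies estimated from the training set (all > 0)
    alpha:    Beta(alpha, alpha) hyperparameter (illustrative default)
    """
    lam = rng.betavariate(alpha, alpha)  # mixing coefficient in [0, 1]
    h_mix = mix_features(h_i, h_j, lam)
    lam_y = prior_adjusted_label_weight(lam, priors[y_i], priors[y_j])
    soft_label = [0.0] * len(priors)
    soft_label[y_i] += lam_y
    soft_label[y_j] += 1.0 - lam_y
    return h_mix, soft_label
```

With equal priors the label weight reduces to λ, which is plain Mixup. With priors 0.9 and 0.1 and λ = 0.5, the rare class receives a label weight of 0.9 under this assumed re-weighting, even though the features are mixed half and half.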
  • Fig. 1  Overview of the proposed charge prediction model

    Fig. 2  Charge distribution of the training set

    Fig. 3  Distribution of a subset of charges in the training set

    Fig. 4  Impact of the Beta distribution hyperparameters

    Fig. 5  Impact of the number of heads in the attention layer

    Fig. 6  Example of a low-frequency charge

    Fig. 7  Example of confusing charges
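Fig. 5 concerns the number of attention heads in the structured self-attention layer. As a rough illustration of what a "head" means there, the following pure-Python sketch uses the commonly cited formulation A = softmax(W2 tanh(W1 Hᵀ)), M = A H, where each row of A is one head's attention distribution over the tokens. Whether the paper uses exactly this form is an assumption, and the names `structured_self_attention`, `W1`, and `W2` are illustrative.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(mat, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in mat]


def structured_self_attention(H, W1, W2):
    """H: n x d hidden states (for example BiLSTM outputs);
    W1: d_a x d projection; W2: r x d_a, one row per attention head.
    Returns (A, M): r x n attention weights and r x d sentence embedding."""
    n, d, r = len(H), len(H[0]), len(W2)
    # scores[t][k]: unnormalised score of head k for token t
    scores = [matvec(W2, [math.tanh(x) for x in matvec(W1, h)]) for h in H]
    # Normalise over tokens separately for each head
    A = [softmax([scores[t][k] for t in range(n)]) for k in range(r)]
    # Each head's output is an attention-weighted sum of the hidden states
    M = [[sum(A[k][t] * H[t][j] for t in range(n)) for j in range(d)]
         for k in range(r)]
    return A, M
```

Increasing the number of rows in `W2` adds heads, each of which can focus on a different part of the case description; the resulting r × d matrix is typically flattened before classification.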

    Table 1  Statistics of the datasets

    Dataset       Criminal-S    Criminal-M    Criminal-L
    Train         61589         153521        306900
    Test          7702          19189         38368
    Validation    7755          19250         38429

    Table 2  Comparative experimental results (Acc.: accuracy; MP: macro precision; MR: macro recall; F1: macro F1; all values in %)

    Model                    Criminal-S                Criminal-M                Criminal-L
                             Acc.  MP    MR    F1      Acc.  MP    MR    F1      Acc.  MP    MR    F1
    TFIDF+SVM                85.8  49.7  41.9  43.5    89.6  58.8  50.1  52.1    91.8  67.5  54.1  57.5
    CNN                      91.9  50.5  44.9  46.1    93.5  57.6  48.1  50.5    93.9  66.0  50.3  54.7
    CNN-200                  92.6  51.1  46.3  47.3    92.8  56.2  50.0  50.8    94.1  61.9  50.0  53.1
    LSTM                     93.5  59.4  58.6  57.3    94.7  65.8  63.0  62.6    95.5  69.8  67.0  66.8
    LSTM-200                 92.7  66.0  58.4  57.0    94.4  66.5  62.4  62.7    95.1  72.8  66.7  67.6
    Fact-Law Att             92.8  57.0  53.9  53.4    94.7  66.7  60.4  61.8    95.7  73.3  67.1  68.6
    Few-Shot Attri           93.4  66.7  69.2  64.9    94.4  69.2  69.2  67.1    95.8  75.8  73.7  73.1
    SECaps                   94.8  71.3  70.3  69.4    95.4  71.3  70.2  69.6    96.0  81.9  79.7  79.5
    LSTM-Att                 95.2  75.1  74.4  73.5    95.9  75.9  76.6  75.2    96.6  86.2  79.5  80.8
    LSTM-Att-Manifold-Mixup  95.3  73.7  75.3  73.3    96.3  80.1  79.1  79.5    96.8  85.8  81.5  82.3
    LSTM-Att-Prior-Mixup     95.3  76.7  78.2  76.2    96.3  80.8  82.0  80.1    96.6  84.5  84.9  83.3
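To make the metric columns concrete, here is a small pure-Python sketch of accuracy and macro-averaged precision, recall, and F1. It assumes that the reported F1 is the mean of per-class F1 scores; this reading is consistent with rows in which F1 is lower than both MP and MR, but the page does not state it explicitly. Treating an undefined ratio as 0 is also an assumption, and the function name is illustrative.

```python
def macro_scores(y_true, y_pred, num_classes):
    """Return (accuracy, macro precision, macro recall, macro F1) as fractions.

    Each class contributes equally to the macro averages, regardless of how
    many samples it has, so rare classes weigh as much as frequent ones.
    """
    tp = [0] * num_classes  # true positives per class
    fp = [0] * num_classes  # false positives per class
    fn = [0] * num_classes  # false negatives per class
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1
            fn[t] += 1

    precisions, recalls, f1s = [], [], []
    for c in range(num_classes):
        # Assumption: an undefined ratio (zero denominator) counts as 0.0
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    accuracy = sum(tp) / len(y_true)
    return (accuracy,
            sum(precisions) / num_classes,
            sum(recalls) / num_classes,
            sum(f1s) / num_classes)


# A classifier that always predicts the frequent class on an 8:2 split
acc, mp, mr, f1 = macro_scores([0] * 8 + [1] * 2, [0] * 10, num_classes=2)
```

In this toy case accuracy is 0.8 while macro F1 is only about 0.44, which shows why the macro metrics expose weak performance on low-frequency charges that accuracy alone hides.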

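    Table 3  Macro F1 (%) for charges of different frequencies. The ↑ values equal the difference from the SECaps row, in percentage points.

    Model                    Low frequency (49 classes)    Medium frequency (51 classes)    High frequency (49 classes)
    Few-Shot Attri           49.7                          60.0                             85.2
    SECaps                   53.8                          65.5                             89.0
    LSTM-Att                 54.1                          65.0                             90.1
    LSTM-Att-Manifold-Mixup  64.2                          66.5                             89.5
    LSTM-Att-Prior-Mixup     67.3 (↑13.5%)                 67.8 (↑2.3%)                     90.0 (↑1.0%)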

    Table 4  Macro F1 (%) for confusing charges

    Model                    F1
    LSTM-200                 79.7
    Few-Shot Attri           88.1
    SECaps                   90.5
    LSTM-Att                 91.8
    LSTM-Att-Manifold-Mixup  92.3
    LSTM-Att-Prior-Mixup     92.1 (↑1.6%)

    Table 5  Comparative experimental results of different encoders on Criminal-S (%)

    Model                    Acc.  MP    MR    F1
    BERT-CLS                 93.4  65.6  63.1  63.2
    BERT-CLS-Manifold-Mixup  93.6  69.2  69.5  67.6
    BERT-CLS-Prior-Mixup     93.8  70.6  72.9  70.6
    BERT-Att                 93.6  68.5  69.7  67.2
    BERT-Att-Manifold-Mixup  94.1  70.8  73.0  70.9
    BERT-Att-Prior-Mixup     94.4  71.4  73.3  71.1
    LSTM-Att-Prior-Mixup     95.3  76.7  78.2  76.2

    Table 6  Results of ablation experiments (%)

    Model                    Criminal-S                Criminal-M                Criminal-L
                             Acc.  MP    MR    F1      Acc.  MP    MR    F1      Acc.  MP    MR    F1
    LSTM-Att-Prior-Mixup     95.3  76.7  78.2  76.2    96.3  80.8  82.0  80.1    96.6  84.5  84.9  83.3
    LSTM-Att                 95.2  75.1  74.4  73.5    95.9  75.9  76.6  75.2    96.6  86.2  79.5  80.8
    LSTM-Maxpool             93.5  44.2  41.1  41.3    95.6  58.0  54.1  54.9    96.3  71.2  64.8  65.9
Publication history
  • Received:  2020-10-31
  • Revised:  2021-03-02
  • Published online:  2021-05-19
  • Issue date:  2022-06-01
