
Defense to Adversarial Attack with Generative Adversarial Network

Kong Rui, Cai Jia-Chun, Huang Gang

Citation: Kong Rui, Cai Jia-Chun, Huang Gang. Defense to adversarial attack with generative adversarial network. Acta Automatica Sinica, 2020, 41(x): 1−17. doi: 10.16383/j.aas.2020.c200033


doi: 10.16383/j.aas.2020.c200033
Funds: Supported by the Natural Science Foundation of Guangdong Province, P. R. China (2020A151501718)
More Information
    Author biographies:

    KONG Rui Professor at the School of Intelligent Science and Engineering, Jinan University. His research interest covers machine learning and image recognition. E-mail: tkongrui@jnu.edu.cn

    CAI Jia-Chun Master student at the College of Information Science and Technology, Jinan University. Research interests cover generative adversarial networks and pattern recognition. E-mail: gptcjc1126@163.com

    HUANG Gang Master student at the College of Information Science and Technology, Jinan University. His research interest covers generative adversarial networks and pattern recognition. Corresponding author of this paper. E-mail: hhhgggpps@gmail.com

  • Abstract: Deep neural networks have achieved remarkable success on complex problems and are widely used across many areas of daily life. Recent research, however, shows that they are vulnerable to carefully crafted adversarial examples, which cause a network to output incorrect predictions; this poses a serious challenge to the security of deep learning. Adversarial attacks are a major obstacle that deep neural networks must overcome, so designing an efficient, strongly robust defense model that can withstand multiple attack algorithms is one effective direction for advancing adversarial defense, and it is worth investigating whether adversarial attacks themselves can be used to train a classifier and thereby improve its robustness. This paper combines generative adversarial networks (GAN) with existing attack algorithms and proposes a GAN-based adversarial attack defense model (AC-DefGAN). Adversarial examples produced by attack algorithms serve as training samples for the GAN, conditional constraints are added to the network to stabilize training, and the classifier's predictions on the generator's samples guide the GAN training process; the user specifies the attack algorithms the classifier must defend against, and the corresponding adversarial examples are generated to train the discriminator, yielding a classifier able to defend against multiple adversarial attacks. Experiments on MNIST, CIFAR-10 and ImageNet show that, once trained, AC-DefGAN correctly classifies both original and adversarial samples, defends well against all the tested attack algorithms, and outperforms existing methods in both defense effectiveness and robustness.
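As a minimal, hypothetical sketch (not the authors' released code), the adversarial examples used to train the discriminator could be generated along these lines; FGSM [13] is shown, and the function names, the `model`/`images`/`labels` arguments, and the contents of the attack pool standing in for ${\Omega _{attack}}$ are all illustrative assumptions:

```python
# Hypothetical sketch: generating discriminator training samples with
# FGSM plus a user-chosen attack pool (a stand-in for Omega_attack).
import torch
import torch.nn.functional as F

def fgsm(model, images, labels, eps=0.3):
    """FGSM [13]: x_adv = x + eps * sign(grad_x L(f(x), y))."""
    images = images.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(images), labels)
    grad, = torch.autograd.grad(loss, images)
    return (images + eps * grad.sign()).clamp(0.0, 1.0).detach()

def omega_attack(model, images, labels, pool):
    """Apply every attack in the user-specified pool and concatenate the
    results, mirroring the paper's idea of letting the user choose which
    attacks the classifier must learn to defend against."""
    return torch.cat([atk(model, images, labels) for atk in pool])

# e.g. pool = [fgsm]; adding BIM/DeepFool/C&W/PGD implementations would
# give the attack combinations evaluated in Tables 2 and 6.
```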
  • Fig. 1 GAN architecture diagram

    Fig. 2 AC-DefGAN architecture diagram

    Fig. 3 The structure of the generator

    Fig. 4 The structure of the discriminator

    Fig. 5 Algorithm flowchart of ${\Omega _{attack}}$

    Fig. 6 Trends of d_loss on MNIST

    Fig. 7 Trends of d_loss on CIFAR-10

    Fig. 8 Accuracy of the AC-DefGAN discriminator in identifying adversarial examples $x_{real}^{adv}$
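The figures themselves are not reproduced on this page. For reference alongside Figs. 1 and 2, the standard GAN minimax objective of Goodfellow et al. [33], together with the conditional form of Mirza and Osindero [35] that underlies the class-conditional constraint in AC-DefGAN (the exact AC-DefGAN losses are defined in the full text), is:

$$\min_G \max_D V(D,G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$

$$\min_G \max_D V(D,G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x \mid y)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z \mid y)))]$$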

    Table 1 Misclassification rates of various adversarial examples for target models and AC-DefGAN on MNIST

    Attack method | VGG11 (Target / AC-DefGAN) | ResNet-18 (Target / AC-DefGAN) | Dense-Net40 (Target / AC-DefGAN) | InceptionV3 (Target / AC-DefGAN)
    FGSM (%)      | 95.31 / 0.66               | 87.83 / 0.58                   | 79.32 / 0.56                     | 92.91 / 0.61
    BIM (%)       | 95.70 / 0.78               | 88.51 / 0.69                   | 82.01 / 0.64                     | 93.12 / 0.73
    DeepFool (%)  | 96.24 / 1.42               | 89.74 / 1.13                   | 88.61 / 1.10                     | 93.80 / 1.25
    C&W (%)       | 99.37 / 1.79               | 97.52 / 1.71                   | 96.21 / 1.68                     | 98.93 / 1.75
    PGD (%)       | 98.13 / 1.61               | 95.81 / 1.52                   | 93.26 / 1.37                     | 97.15 / 1.58

    Table 2 Misclassification rates of multiple adversarial examples for AC-DefGAN on MNIST

    Attack combination        | VGG11 | ResNet-18 | Dense-Net40 | InceptionV3
    BIM + FGSM (%)            | 0.69  | 0.64      | 0.59        | 0.67
    BIM + DeepFool (%)        | 1.11  | 0.91      | 0.87        | 1.01
    FGSM + DeepFool (%)       | 1.05  | 0.86      | 0.81        | 0.93
    BIM + FGSM + DeepFool (%) | 1.01  | 0.84      | 0.79        | 0.89

    Table 3 Misclassification rates of various adversarial examples for AC-DefGAN and other defense strategies on MNIST

    Attack method | MagNet | Adv. training | APE-GANm | Defence-GAN-Rec | AC-DefGAN
    FGSM (%)      | 80.91  | 18.40         | 2.80     | 1.11            | 0.66
    BIM (%)       | 83.09  | 19.21         | 2.91     | 1.24            | 0.78
    DeepFool (%)  | 89.93  | 23.16         | 2.43     | 1.53            | 1.42
    C&W (%)       | 93.18  | 62.23         | 1.74     | 2.29            | 1.67
    Bold indicates the best value.

    Table 4 Misclassification rates of adversarial examples generated from FGSM with different perturbation thresholds $\varepsilon $ for the target model and AC-DefGAN on MNIST (%)

    FGSM perturbation threshold $\varepsilon $ | Target model | AC-DefGAN
    $\varepsilon = 0.1$                        | 96.29        | 0.68
    $\varepsilon = 0.2$                        | 96.98        | 0.83
    $\varepsilon = 0.3$                        | 97.35        | 0.91
    $\varepsilon = 0.4$                        | 98.76        | 1.69

    Table 5 Misclassification rates of various adversarial examples for target models and AC-DefGAN on CIFAR-10

    Attack method | VGG19 (Target / AC-DefGAN) | ResNet-18 (Target / AC-DefGAN) | Dense-Net40 (Target / AC-DefGAN) | InceptionV3 (Target / AC-DefGAN)
    FGSM (%)      | 77.81 / 16.95              | 74.92 / 13.07                  | 73.37 / 13.74                    | 76.74 / 15.49
    BIM (%)       | 84.73 / 19.80              | 75.74 / 14.27                  | 76.44 / 13.83                    | 79.52 / 18.93
    DeepFool (%)  | 88.52 / 23.47              | 83.48 / 22.55                  | 86.16 / 21.79                    | 88.26 / 23.15
    C&W (%)       | 98.94 / 31.13              | 92.79 / 30.24                  | 96.68 / 29.85                    | 97.43 / 30.97
    PGD (%)       | 87.13 / 28.37              | 86.41 / 26.29                  | 86.28 / 25.91                    | 87.04 / 26.74

    Table 6 Misclassification rates of multiple adversarial examples for AC-DefGAN on CIFAR-10

    Attack combination        | VGG19 | ResNet-18 | Dense-Net40 | InceptionV3
    BIM + FGSM (%)            | 19.62 | 13.73     | 13.18       | 16.45
    BIM + DeepFool (%)        | 21.71 | 18.65     | 17.42       | 22.14
    FGSM + DeepFool (%)       | 20.95 | 15.21     | 16.35       | 19.78
    BIM + FGSM + DeepFool (%) | 21.37 | 17.56     | 16.93       | 20.81

    Table 7 Misclassification rates of various adversarial examples for AC-DefGAN and other defense strategies on CIFAR-10

    Attack method | Target model | Adv. training | APE-GANm | Defence-GAN-Rec | AC-DefGAN
    FGSM (%)      | 82.83        | 32.68         | 26.41    | 22.50           | 16.91
    BIM (%)       | 89.75        | 39.49         | 24.33    | 21.72           | 19.83
    DeepFool (%)  | 93.54        | 44.71         | 25.29    | 28.09           | 25.56
    C&W (%)       | 98.71        | 78.23         | 30.50    | 32.21           | 30.24
    Bold indicates the best value.

    Table 8 Misclassification rates of adversarial examples generated from FGSM with different perturbation thresholds $\varepsilon $ for the target model and AC-DefGAN on CIFAR-10 (%)

    FGSM perturbation threshold $\varepsilon $ | Target model | AC-DefGAN
    $\varepsilon = 0.1$                        | 77.82        | 12.92
    $\varepsilon = 0.2$                        | 80.89        | 17.47
    $\varepsilon = 0.3$                        | 82.33        | 18.86
    $\varepsilon = 0.4$                        | 84.74        | 24.13

    Table 9 Misclassification rates of various adversarial examples for target models and AC-DefGAN on ImageNet

    Attack method | VGG19 (Target / AC-DefGAN) | ResNet-18 (Target / AC-DefGAN) | Dense-Net40 (Target / AC-DefGAN) | InceptionV3 (Target / AC-DefGAN)
    FGSM (%)      | 71.21 / 39.42              | 69.14 / 38.52                  | 68.42 / 37.92                    | 69.65 / 38.77
    DeepFool (%)  | 88.45 / 44.80              | 85.73 / 42.96                  | 86.24 / 43.17                    | 87.67 / 44.63
    C&W (%)       | 97.39 / 39.13              | 96.19 / 36.75                  | 95.84 / 36.74                    | 96.43 / 38.68

    Table 10 Misclassification rates of various adversarial examples for AC-DefGAN and other defense strategies on ImageNet (%)

    Attack method | Target model | APE-GANm | AC-DefGAN
    FGSM          | 72.92        | 40.14    | 38.94
    C&W           | 97.84        | 38.70    | 36.52
    BIM           | 76.79        | 41.28    | 40.78
    DeepFool      | 94.71        | 45.93    | 44.31
    Bold indicates the best value.
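All tables report the misclassification rate, i.e. the percentage of (adversarial) inputs whose predicted label differs from the ground truth. A minimal sketch of that metric, with `model` and `loader` (batches of adversarial examples with their true labels) assumed rather than taken from the paper:

```python
# Sketch of the misclassification-rate metric reported in Tables 1-10.
import torch

@torch.no_grad()
def misclassification_rate(model, loader):
    wrong, total = 0, 0
    for x_adv, y in loader:
        pred = model(x_adv).argmax(dim=1)  # predicted class per sample
        wrong += (pred != y).sum().item()
        total += y.numel()
    return 100.0 * wrong / total           # percentage, as in the tables
```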
  • [1] LeCun Y, Bengio Y, Hinton G. Deep learning. Nature, 2015, 521(7553): 436−444 doi: 10.1038/nature14539
    [2] Hinton G E. What kind of a graphical model is the brain? In: Proceedings of the 19th International Joint Conference on Artificial Intelligence. Burlington, USA: Morgan Kaufmann, 2005. 1765−1775
    [3] Hinton G E, Salakhutdinov R R. Reducing the dimensionality of data with neural networks. Science, 2006, 313(5786): 504 −507 doi: 10.1126/science.1127647
    [4] Glorot X, Bordes A, Bengio Y. Deep sparse rectifier neural networks. In: Proceedings of the 14th International Conference on Artificial Intelligence and Statistics. Piscataway, NJ, USA: IEEE, 2011. 315−323
    [5] Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks. In: Proceedings of Annual Conference on Neural Information Processing Systems. Cambridge, USA: MIT Press, 2012. 1097−1105
    [6] Zeiler M D, Fergus R. Visualizing and understanding convolutional networks. In: Proceedings of European Conference on Computer Vision. Zurich, Switzerland: Springer, 2014. 818−833
    [7] Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv: 1409.1556v6, 2015.
    [8] Szegedy C, Liu W, Jia Y Q, Sermanet P, Reed S, Anguelov D. Going deeper with convolutions. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA: IEEE, 2015. 1−9
    [9] He K M, Zhang X Y, Ren S Q, Sun J. Deep residual learning for image recognition. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE, 2016. 770−778
    [10] Chakraborty A, Alam M, Dey V, et al. Adversarial Attacks and Defences: A Survey. arXiv preprint arXiv: 1810.00069, 2018.
    [11] Szegedy C, Zaremba W, Sutskever I, Bruna J, Erhan D, Goodfellow I J, Fergus R. Intriguing properties of neural networks. arXiv preprint arXiv: 1312.6199, 2014.
    [12] Akhtar N, Mian A. Threat of adversarial attacks on deep learning in computer vision: A survey. IEEE Access, 2018, 6: 14410−14430 doi: 10.1109/ACCESS.2018.2807385
    [13] Goodfellow I J, Shlens J, Szegedy C. Explaining and harnessing adversarial examples. arXiv preprint arXiv: 1412.6572, 2015.
    [14] Kurakin A, Goodfellow I J, Bengio S. Adversarial examples in the physical world. arXiv preprint, arXiv: 1607.02533v4, 2017.
    [15] Tramer F, Kurakin A, Papernot N, et al. Ensemble adversarial training: Attacks and defenses. arXiv preprint, arXiv: 1705.07204v5, 2020.
    [16] Moosavi-Dezfooli S M, Fawzi A, Frossard P. Deepfool: a simple and accurate method to fool deep neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, 2016. 2574−2582
    [17] Madry A, Makelov A, Schmidt L, et al. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv: 1706.06083v4, 2019.
    [18] Lyu C, Huang K Z, Liang H N. A unified gradient regularization family for adversarial examples. In: Proceedings of 2015 IEEE International Conference on Data Mining. IEEE, 2015. 301−309
    [19] Shaham U, Yamada Y, Negahban S. Understanding adversarial training: Increasing local stability of supervised models through robust optimization. Neurocomputing, 2018, 307: 195−204 doi: 10.1016/j.neucom.2018.04.027
    [20] Nayebi A, Ganguli S. Biologically inspired protection of deep networks from adversarial attacks. arXiv preprint arXiv: 1703.09202, 2017.
    [21] Dziugaite G K, Ghahramani Z, Roy D M. A study of the effect of jpg compression on adversarial images. arXiv preprint arXiv: 1608.00853, 2016.
    [22] Guo C, Rana M, Cisse M, et al. Countering adversarial images using input transformations. arXiv preprint arXiv: 1711.00117v3, 2018.
    [23] Das N, Shanbhogue M, Chen S T, et al. Keeping the bad guys out: Protecting and vaccinating deep learning with jpeg compression. arXiv preprint arXiv: 1705.02900, 2017.
    [24] Xie C, Wang J, Zhang Z, et al. Adversarial examples for semantic segmentation and object detection. In: Proceedings of the IEEE International Conference on Computer Vision, 2017. 1369−1378
    [25] Gu S, Rigazio L. Towards deep neural network architectures robust to adversarial examples. arXiv preprint arXiv: 1412.5068v4, 2015.
    [26] Papernot N, McDaniel P, Wu X, et al. Distillation as a defense to adversarial perturbations against deep neural networks. In: Proceedings of 2016 IEEE Symposium on Security and Privacy (SP). IEEE, 2016. 582−597
    [27] Ross A S, Doshi-Velez F. Improving the adversarial robustness and interpretability of deep neural networks by regularizing their input gradients. In Thirty-second AAAI conference on artificial intelligence. 2018.
    [28] Cisse M, Adi Y, Neverova N, et al. Houdini: Fooling deep structured prediction models. arXiv preprint arXiv: 1707.05373, 2017.
    [29] Akhtar N, Liu J, Mian A. Defense against universal adversarial perturbations. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 3389−3398
    [30] Samangouei P, Kabkab M, Chellappa R. Defense-gan: Protecting classifiers against adversarial attacks using generative models. arXiv preprint arXiv: 1805.06605, 2018.
    [31] Meng D, Chen H. Magnet: a two-pronged defense against adversarial examples. In: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security. 2017: 135−147
    [32] Xu W, Evans D, Qi Y. Feature squeezing: Detecting adversarial examples in deep neural networks. arXiv preprint arXiv: 1704.01155, 2017.
    [33] Goodfellow I, Pouget-Abadie J, Mirza M, et al. Generative adversarial nets. In: Proceedings of Advances in Neural Information Processing Systems. 2014: 2672−2680
    [34] Yu Y, Gong Z, Zhong P, et al. Unsupervised representation learning with deep convolutional neural network for remote sensing images. In: Proceedings of International Conference on Image and Graphics. Springer, Cham, 2017: 97−108
    [35] Mirza M, Osindero S. Conditional generative adversarial nets. arXiv preprint arXiv: 1411.1784, 2014.
    [36] Odena A, Olah C, Shlens J. Conditional image synthesis with auxiliary classifier gans. In: Proceedings of International Conference on Machine Learning. 2017: 2642−2651
    [37] Lin Yi-Lun, Dai Xing-Yuan, Li Li, Wang Xiao, Wang Fei-Yue. The new frontier of AI research: generative adversarial networks. Acta Automatica Sinica, 2018, 44(5): 775−792
    [38] Shen S, Jin G, Gao K, et al. Ape-gan: Adversarial perturbation elimination with gan. arXiv preprint arXiv: 1707.05474, 2017.
    [39] Wang Kun-Feng, Zuo Wang-Meng, Tan Ying, Qin Tao, Li Li, Wang Fei-Yue. Generative adversarial networks: from generating data to creating intelligence. Acta Automatica Sinica, 2018, 44(5): 769−774
    [40] Kong Rui, Huang Gang. Conditional generative adversarial capsule networks. Acta Automatica Sinica, 2020, 46(1): 94−107
    [41] Zhang H, Chen H, Song Z, et al. The limitations of adversarial training and the blind-spot attack. arXiv preprint arXiv: 1901.04684, 2019.
    [42] Tang Xian-Lun, Du Yi-Ming, Liu Yu-Wei, Li Jia-Xin, Ma Yi-Wei. Image recognition with conditional deep convolutional generative adversarial networks. Acta Automatica Sinica, 2018, 44(5): 855−864
    [43] Kussul E, Baidyk T. Improved method of handwritten digit recognition tested on MNIST database. Image and Vision Computing, 2004, 22(12): 971−981 doi: 10.1016/j.imavis.2004.03.008
    [44] Hinton G E, Srivastava N, Krizhevsky A, Sutskever I, Salakhutdinov R R. Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv: 1207.0580, 2012.
    [45] Deng J, Dong W, Socher R, et al. Imagenet: A large-scale hierarchical image database. In: Proceedings of 2009 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2009. 248−255
    [46] Carlini N, Wagner D. Towards evaluating the robustness of neural networks. In: Proceedings of 2017 IEEE Symposium on Security and Privacy (SP). IEEE, 2017. 39−57
Publication history
  • Received: 2020-01-16
  • Revised: 2020-05-31
