Shrink, Separate and Aggregate: A Feature Balancing Method for Long-tailed Visual Recognition

Yang Jia-Xin, Yu Miao-Miao, Li Hong-Ying, Li Shuo-Hao, Fan Ling-Yu, Zhang Jun

Citation: Yang Jia-Xin, Yu Miao-Miao, Li Hong-Ying, Li Shuo-Hao, Fan Ling-Yu, Zhang Jun. Shrink, separate and aggregate: A feature balancing method for long-tailed visual recognition. Acta Automatica Sinica, 2024, 50(5): 898−910 doi: 10.16383/j.aas.c230288

doi: 10.16383/j.aas.c230288
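Funds: Supported by National Natural Science Foundation of China (62101571) and Natural Science Foundation of Hunan Province (2021JJ40685)
    Author Bio:

    YANG Jia-Xin  Master student at the College of System Engineering, National University of Defense Technology. Main research interest: long-tailed recognition. E-mail: yangjiaxin21@nudt.edu.cn

    YU Miao-Miao  Ph.D. candidate at the College of System Engineering, National University of Defense Technology. Main research interest: face forgery detection. E-mail: yumiaomiaonudt@nudt.edu.cn

    LI Hong-Ying  Master student at the College of System Engineering, National University of Defense Technology. Main research interest: adversarial attacks. E-mail: lihongying@nudt.edu.cn

    LI Shuo-Hao  Associate professor at the College of System Engineering, National University of Defense Technology. Main research interest: scene graph generation. E-mail: lishuohao@nudt.edu.cn

    FAN Ling-Yu  Engineer at Unit 96962 of the PLA. Main research interest: camouflaged object detection and analysis. E-mail: 13810576175@139.com

    ZHANG Jun  Professor at the College of System Engineering, National University of Defense Technology. Main research interest: visual data computation and analysis. Corresponding author of this paper. E-mail: zhangjun1975@nudt.edu.cn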

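  • Abstract: Real-world data usually follow a long-tailed distribution: a few classes have many samples, while most classes have only a few. This imbalance causes a model trained on such data to overfit the tail classes, which have few samples. For long-tailed visual recognition, this paper proposes a feature balancing method that strengthens the model's ability to recognize hard samples by shrinking, separating and aggregating samples in the feature space. The method consists of two modules: a feature balancing factor and a hard sample feature constraint. The feature balancing factor uses the number of samples per class to adjust the model's output probability distribution, making the feature distances between classes more balanced and thereby improving classification accuracy. The hard sample feature constraint performs cluster analysis on sample features to enlarge the margin between classes, so that the model can find a more reasonable decision boundary. Experiments on several widely used long-tailed benchmark datasets show that the method improves overall classification accuracy on long-tailed data and significantly improves recognition of tail classes. Compared with the baseline BS, the method improves Top-1 accuracy on CIFAR100-LT, ImageNet-LT and iNaturalist 2018 by 7.40, 6.60 and 2.89 percentage points, respectively.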
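The abstract describes the feature balancing factor only at a high level (per-class sample counts adjust the output probability distribution) and does not give its formula. As a rough illustration of that general idea, the sketch below implements the Balanced Softmax (BS) baseline [12], which shifts each logit by the log of its class's training count. It is not the authors' feature balancing factor; the function names and the toy class counts are assumptions made for this example.

```python
import numpy as np

def balanced_softmax_loss(logits, labels, class_counts):
    """Mean cross-entropy on logits shifted by log class counts (BS baseline [12])."""
    logits = np.asarray(logits, dtype=np.float64)
    labels = np.asarray(labels)
    # Add log(n_c) to the logit of class c; frequent classes get a larger offset.
    adjusted = logits + np.log(np.asarray(class_counts, dtype=np.float64))
    # Numerically stable log-softmax over the class axis.
    adjusted = adjusted - adjusted.max(axis=1, keepdims=True)
    log_probs = adjusted - np.log(np.exp(adjusted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def plain_softmax_loss(logits, labels):
    """Ordinary cross-entropy: the same loss with all class counts equal to 1."""
    num_classes = np.shape(logits)[1]
    return balanced_softmax_loss(logits, labels, np.ones(num_classes))

# Toy example (hypothetical counts): one head, one middle and one tail class.
counts = [500, 50, 5]
logits = np.zeros((1, 3))          # the model is undecided between classes
tail_label = [2]
print(plain_softmax_loss(logits, tail_label))
print(balanced_softmax_loss(logits, tail_label, counts))
```

With undecided logits, the count-adjusted loss on a tail-class sample is larger than the plain cross-entropy, so training pushes the tail logit up harder than it would otherwise.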
  • Fig. 1  Long-tailed distribution

    Fig. 2  Feature visualization of long-tailed and balanced data

    Fig. 3  Class-based classifier weight $L_2$ norm

    Fig. 4  A feature balancing method for long-tailed visual recognition

    Fig. 5  Hard sample feature constraint

    Fig. 6  Comparison of feature visualization

    Fig. 7  Feature $L_2$ norm comparison

    Fig. 8  Analysis of the parameter $\psi$

    Table 1  Basic information of the datasets

    | Dataset | Classes | Training samples (images) | Test samples (images) | $IF$ |
    |---|---|---|---|---|
    | CIFAR100-LT | 100 | 10847 | 10000 | 100 |
    | ImageNet-LT | 1000 | 115846 | 50000 | 256 |
    | iNaturalist 2018 | 8142 | 437513 | 24426 | 435 |
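Table 1 lists an imbalance factor $IF$ for each dataset without defining it here. The sketch below assumes the usual definition (training count of the largest class divided by that of the smallest class) and the exponential-decay profile commonly used to build CIFAR-LT variants. The function names and the `n_max=500` setting are assumptions for illustration, not details taken from the paper.

```python
def long_tailed_counts(n_max, num_classes, imb_factor):
    """Per-class counts decaying exponentially from n_max down to n_max / imb_factor."""
    counts = []
    for i in range(num_classes):
        frac = i / (num_classes - 1)          # 0 for the first class, 1 for the last
        counts.append(int(n_max * (1.0 / imb_factor) ** frac))
    return counts

def imbalance_factor(counts):
    """Ratio between the largest and the smallest class size."""
    return max(counts) / min(counts)

# Hypothetical CIFAR100-LT-like profile: 100 classes, 500 images in the largest class.
counts = long_tailed_counts(n_max=500, num_classes=100, imb_factor=100)
print(counts[0], counts[-1], imbalance_factor(counts), sum(counts))
```

The printed total can be compared against the training-set size in Table 1 to check whether a given profile matches the benchmark split.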

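    Table 2  Basic settings of the model

    | Dataset | CIFAR100-LT | ImageNet-LT | iNaturalist 2018 |
    |---|---|---|---|
    | Backbone | ResNet-32 | ResNet-50 | ResNeXt-50 |
    | Batch size | 64 | 256 | 512 |
    | Weight decay | 0.0040 | 0.0002 | 0.0002 |
    | Initial learning rate | 0.1 | 0.2 | 0.2 |
    | Learning rate schedule | warmup | cosine | cosine |
    | Momentum | 0.9 | 0.9 | 0.9 |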

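    Table 3  Top-1 accuracy on CIFAR100-LT (%)

    | Method | Source | Year | $IF$ = 10 | $IF$ = 50 | $IF$ = 100 |
    |---|---|---|---|---|---|
    | CE[41] | - | - | 55.7 | 43.9 | 38.6 |
    | CE-DRW[41] | NeurIPS | 2022 | 57.9 | 47.9 | 41.1 |
    | LDAM-DRW[37] | NeurIPS | 2019 | 58.7 | 46.3 | 42.0 |
    | Causal Norm[42] | NeurIPS | 2020 | 59.6 | 50.3 | 44.1 |
    | BS[12] | NeurIPS | 2020 | 63.0 | - | 50.8 |
    | Remix[43] | ECCV | 2020 | 59.2 | 49.5 | 45.8 |
    | RIDE(3E)[14] | ICLR | 2020 | - | - | 48.0 |
    | MiSLAS[44] | CVPR | 2021 | 63.2 | 52.3 | 47.0 |
    | TSC[11] | CVPR | 2022 | 59.0 | 47.4 | 43.8 |
    | WD[29] | CVPR | 2022 | 68.7 | 57.7 | 53.6 |
    | KPS[9] | PAMI | 2023 | - | 49.2 | 45.0 |
    | PC[40] | IJCAI | 2023 | 69.1 | 57.8 | 53.4 |
    | SuperDisco[39] | CVPR | 2023 | 69.3 | 58.3 | 53.8 |
    | SHIKE[38] | CVPR | 2023 | - | 59.8 | 56.3 |
    | Feature balancing method | This paper | - | 73.3 | 63.0 | 58.2 |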

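    Table 4  Top-1 accuracy on ImageNet-LT (%)

    | Method | Source | Year | Backbone | Head classes | Middle classes | Tail classes | Overall |
    |---|---|---|---|---|---|---|---|
    | CE[41] | - | - | ResNet-50 | 64.0 | 33.8 | 5.8 | 41.60 |
    | CE-DRW[41] | NeurIPS | 2022 | ResNet-50 | 61.7 | 47.3 | 28.8 | 50.10 |
    | LDAM-DRW[37] | NeurIPS | 2019 | ResNet-50 | 60.4 | 46.9 | 30.7 | 49.80 |
    | Causal Norm[42] | NeurIPS | 2020 | ResNeXt-50 | 62.7 | 48.8 | 31.6 | 51.80 |
    | BS[12] | NeurIPS | 2020 | ResNet-50 | 60.9 | 48.8 | 32.1 | 51.00 |
    | Remix[43] | ECCV | 2020 | ResNet-18 | 60.4 | 46.9 | 30.7 | 48.60 |
    | RIDE(3E)[14] | ICLR | 2020 | ResNeXt-50 | 66.2 | 51.7 | 34.9 | 55.40 |
    | MiSLAS[44] | CVPR | 2021 | ResNet-50 | 61.7 | 51.3 | 35.8 | 52.70 |
    | CMO[41] | CVPR | 2022 | ResNet-50 | 66.4 | 53.9 | 35.6 | 56.20 |
    | TSC[11] | CVPR | 2022 | ResNet-50 | 63.5 | 49.7 | 30.4 | 52.40 |
    | WD[29] | CVPR | 2022 | ResNeXt-50 | 62.5 | 50.4 | 41.5 | 53.90 |
    | KPS[9] | PAMI | 2023 | ResNet-50 | - | - | - | 51.28 |
    | PC[40] | IJCAI | 2023 | ResNeXt-50 | 63.5 | 50.8 | 42.7 | 54.90 |
    | SuperDisco[39] | CVPR | 2023 | ResNeXt-50 | 66.1 | 53.3 | 37.1 | 57.10 |
    | SHIKE[38] | CVPR | 2023 | ResNet-50 | - | - | - | 59.70 |
    | Feature balancing method | This paper | - | ResNet-50 | 67.9 | 54.3 | 40.1 | 57.60 |
    | Feature balancing method | This paper | - | ResNeXt-50 | 67.6 | 55.3 | 41.7 | 58.19 |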

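    Table 5  Top-1 accuracy on iNaturalist 2018 (%)

    | Method | Source | Year | Backbone | Head classes | Middle classes | Tail classes | Overall |
    |---|---|---|---|---|---|---|---|
    | CE[41] | - | - | ResNet-50 | 73.9 | 63.5 | 55.5 | 61.00 |
    | LDAM-DRW[37] | NeurIPS | 2019 | ResNet-50 | - | - | - | 66.10 |
    | BS[12] | NeurIPS | 2020 | ResNet-50 | 70.0 | 70.2 | 69.9 | 70.00 |
    | Remix[43] | ECCV | 2020 | ResNet-50 | - | - | - | 70.50 |
    | RIDE(3E)[14] | ICLR | 2020 | ResNet-50 | 70.2 | 72.2 | 72.7 | 72.20 |
    | MiSLAS[44] | CVPR | 2021 | ResNet-50 | 73.2 | 72.4 | 70.4 | 71.60 |
    | CMO[41] | CVPR | 2022 | ResNet-50 | 68.7 | 72.6 | 73.1 | 72.80 |
    | TSC[11] | CVPR | 2022 | ResNet-50 | 72.6 | 70.6 | 67.8 | 69.70 |
    | WD[29] | CVPR | 2022 | ResNet-50 | 71.2 | 70.4 | 69.7 | 70.20 |
    | KPS[9] | PAMI | 2023 | ResNet-50 | - | - | - | 70.35 |
    | PC[40] | IJCAI | 2023 | ResNet-50 | 71.6 | 70.6 | 70.2 | 70.60 |
    | SuperDisco[39] | CVPR | 2023 | ResNet-50 | 72.3 | 72.9 | 71.3 | 73.60 |
    | SHIKE[38] | CVPR | 2023 | ResNet-50 | - | - | - | 74.50 |
    | Feature balancing method | This paper | - | ResNet-50 | 74.9 | 72.2 | 73.2 | 72.89 |
    | Feature balancing method | This paper | - | ResNeXt-50 | 74.6 | 72.3 | 72.2 | 72.53 |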
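Tables 4 and 5 break accuracy down into head, middle and tail classes, but this section does not state how classes are assigned to the three groups. The sketch below assumes the split that is common in the long-tailed recognition literature (more than 100 training images for head classes, 20 to 100 for middle classes, fewer than 20 for tail classes) and averages per-class accuracy within each group. The thresholds, function name and toy data are assumptions, not the paper's stated protocol.

```python
import numpy as np

def group_accuracy(y_true, y_pred, train_counts, many=100, few=20):
    """Head/middle/tail accuracy as the mean per-class accuracy inside each group."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    train_counts = np.asarray(train_counts)

    # Accuracy of each class; NaN marks classes absent from the test labels.
    per_class = np.full(len(train_counts), np.nan)
    for c in range(len(train_counts)):
        mask = y_true == c
        if mask.any():
            per_class[c] = (y_pred[mask] == c).mean()

    # Assumed grouping by training-set size.
    groups = {
        "head": train_counts > many,
        "middle": (train_counts >= few) & (train_counts <= many),
        "tail": train_counts < few,
    }
    result = {}
    for name, selected in groups.items():
        values = per_class[selected]
        values = values[~np.isnan(values)]
        result[name] = float(values.mean()) if values.size else float("nan")
    result["overall"] = float((y_true == y_pred).mean())
    return result

# Toy example: three classes with 500, 50 and 5 training images.
scores = group_accuracy(
    y_true=[0, 0, 1, 1, 2, 2],
    y_pred=[0, 0, 1, 0, 0, 2],
    train_counts=[500, 50, 5],
)
print(scores)
```

On a class-balanced test set, averaging per-class accuracy within a group gives the same number as pooling that group's samples, so either reading of the table columns is covered by this sketch.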

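    Table 6  Ablation experiments of the module

    | Data augmentation | Weight decay | Feature balancing factor | Hard sample feature constraint | CE head | CE middle | CE tail | CE overall | BS head | BS middle | BS tail | BS overall |
    |---|---|---|---|---|---|---|---|---|---|---|---|
    |  |  |  |  | - | - | - | 38.6 | - | - | - | 50.8 |
    |  |  |  |  | 76.0 | 45.6 | 16.9 | 47.6 | 72.0 | 51.5 | 28.0 | 51.6 |
    |  |  |  |  | 82.2 | 52.3 | 13.3 | 51.0 | 78.2 | 55.3 | 31.6 | 56.2 |
    |  |  |  |  | 79.9 | 56.5 | 16.1 | 52.6 | 76.1 | 56.8 | 36.8 | 57.6 |
    |  |  |  |  | 80.1 | 57.6 | 17.1 | 53.3 | 76.6 | 57.9 | 37.4 | 58.2 |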
  • [1] Zoph B, Vasudevan V, Shlens J, Le Q V. Learning transferable architectures for scalable image recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE, 2018. 8697−8710
    [2] Redmon J, Divvala S, Girshick R, Farhadi A. You only look once: Unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, USA: IEEE, 2016. 779−788
    [3] Deng J, Dong W, Socher R, Li L J, Li K, Li F F. ImageNet: A large-scale hierarchical image database. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Miami, USA: IEEE, 2009. 248−255
    [4] Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, et al. ImageNet large scale visual recognition challenge. International Journal of Computer Vision, 2015, 115: 211−252 doi: 10.1007/s11263-015-0816-y
    [5] Lin T Y, Maire M, Belongie S, Hays J, Perona P, Ramanan D, et al. Microsoft COCO: Common objects in context. In: Proceedings of the 13th European Conference on Computer Vision. Zurich, Switzerland: Springer, 2014. 740−755
    [6] Zhou B L, Lapedriza A, Khosla A, Oliva A, Torralba A. Places: A 10 million image database for scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40(6): 1452−1464 doi: 10.1109/TPAMI.2017.2723009
    [7] Anwar S M, Majid M, Qayyum A, Awais M, Alnowami M, Khan M K. Medical image analysis using convolutional neural networks: A review. Journal of Medical Systems, 2018, 42: Article No. 226 doi: 10.1007/s10916-018-1088-1
    [8] Liu Z W, Miao Z Q, Zhan X H, Wang J Y, Gong B Q, Yu S X. Large-scale long-tailed recognition in an open world. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE, 2019. 2537−2546
    [9] Li M K, Cheung Y M, Hu Z K. Key point sensitive loss for long-tailed visual recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(4): 4812−4825
    [10] Tian C Y, Wang W H, Zhu X Z, Dai J F, Qiao Y. VL-LTR: Learning class-wise visual-linguistic representation for long-tailed visual recognition. In: Proceedings of the 17th European Conference on Computer Vision. Tel Aviv, Israel: Springer, 2022. 73−91
    [11] Li T H, Cao P, Yuan Y, Fan L J, Yang Y Z, Feris R, et al. Targeted supervised contrastive learning for long-tailed recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). New Orleans, USA: IEEE, 2022. 6908−6918
    [12] Ren J W, Yu C J, Sheng S A, Ma X, Zhao H Y, Yi S, et al. Balanced meta-softmax for long-tailed visual recognition. Advances in Neural Information Processing Systems, 2020, 33: 4175−4186
    [13] Hong Y, Han S, Choi K, Seo S, Kim B, Chang B. Disentangling label distribution for long-tailed visual recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Nashville, USA: IEEE, 2021. 6622−6632
    [14] Wang X D, Lian L, Miao Z Q, Liu Z W, Yu S X. Long-tailed recognition by routing diverse distribution-aware experts. arXiv preprint arXiv: 2010.01809, 2020.
    [15] Li T H, Wang L M, Wu G S. Self supervision to distillation for long-tailed visual recognition. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). Montreal, Canada: IEEE, 2021. 610−619
    [16] Wang Y R, Gan W H, Yang J, Wu W, Yan J J. Dynamic curriculum learning for imbalanced data classification. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). Seoul, South Korea: IEEE, 2019. 5016−5025
    [17] Zang Y H, Huang C, Loy C C. FASA: Feature augmentation and sampling adaptation for long-tailed instance segmentation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). Montreal, Canada: IEEE, 2021. 3457−3466
    [18] Cui Y, Jia M L, Lin T Y, Song Y, Belongie S. Class-balanced loss based on effective number of samples. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, USA: IEEE, 2019. 9268−9277
    [19] Lin T Y, Goyal P, Girshick R, He K M, Dollár P. Focal loss for dense object detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, 42(2): 318−327 doi: 10.1109/TPAMI.2018.2858826
    [20] Tan J R, Wang C B, Li B Y, Li Q Q, Ouyang W L, Yin C Q, et al. Equalization loss for long-tailed object recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, USA: IEEE, 2020. 11659−11668
    [21] Wang J Q, Zhang W W, Zang Y H, Cao Y H, Pang J M, Gong T, et al. Seesaw loss for long-tailed instance segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Nashville, USA: IEEE, 2021. 9690−9699
    [22] Zhao Y, Chen W C, Tan X, Huang K, Zhu J H. Adaptive logit adjustment loss for long-tailed visual recognition. arXiv preprint arXiv: 2104.06094, 2021.
    [23] Guo H, Wang S. Long-tailed multi-label visual recognition by collaborative training on uniform and re-balanced samplings. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Nashville, USA: IEEE, 2021. 15084−15093
    [24] Zhou B Y, Cui Q, Wei X S, Chen Z M. BBN: Bilateral-branch network with cumulative learning for long-tailed visual recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, USA: IEEE, 2020. 9716−9725
    [25] Li Y, Wang T, Kang B Y, Tang S, Wang C F, Li J T, et al. Overcoming classifier imbalance for long-tail object detection with balanced group softmax. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, USA: IEEE, 2020. 10988−10997
    [26] Cui J Q, Liu S, Tian Z T, Zhong Z S, Jia J Y. ResLT: Residual learning for long-tailed recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(3): 3695−3706
    [27] Li J, Tan Z C, Wan J, Lei Z, Guo G D. Nested collaborative learning for long-tailed visual recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). New Orleans, USA: IEEE, 2022. 6939−6948
    [28] Cui J Q, Zhong Z S, Liu S, Yu B, Jia J Y. Parametric contrastive learning. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). Montreal, Canada: IEEE, 2021. 695−704
    [29] Alshammari S, Wang Y X, Ramanan D, Kong S. Long-tailed recognition via weight balancing. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). New Orleans, USA: IEEE, 2022. 6887−6897
    [30] Li M K, Cheung Y M, Jiang J Y. Feature-balanced loss for long-tailed visual recognition. In: Proceedings of the IEEE International Conference on Multimedia and Expo (ICME). Taiwan, China: IEEE, 2022. 1−6
    [31] Goodfellow I, Bengio Y, Courville A. Deep Learning. Massachusetts: MIT Press, 2016.
    [32] Schroff F, Kalenichenko D, Philbin J. FaceNet: A unified embedding for face recognition and clustering. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Boston, USA: IEEE, 2015. 815−823
    [33] Krizhevsky A. Learning Multiple Layers of Features From Tiny Images [Master thesis], University of Toronto, Canada, 2009.
    [34] Horn G V, Aodha O M, Song Y, Cui Y, Sun C, Shepard A, et al. The iNaturalist species classification and detection dataset. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Salt Lake City, USA: IEEE, 2018. 8769−8778
    [35] He K M, Zhang X Y, Ren S Q, Sun J. Deep residual learning for image recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, USA: IEEE, 2016. 770−778
    [36] Xie S N, Girshick R, Dollár P, Tu Z W, He K M. Aggregated residual transformations for deep neural networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA: IEEE, 2017. 5987−5995
    [37] Cao K D, Wei C, Gaidon A, Arechiga N, Ma T Y. Learning imbalanced datasets with label-distribution-aware margin loss. In: Proceedings of the 33rd International Conference on Neural Information Processing Systems. Vancouver, Canada: Curran Associates Inc., 2019. 1567−1578
    [38] Jin Y, Li M K, Lu Y, Cheung Y M, Wang H Z. Long-tailed visual recognition via self-heterogeneous integration with knowledge excavation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Vancouver, Canada: IEEE, 2023. 23695−23704
    [39] Du Y J, Shen J Y, Zhen X T, Snoek C G M. SuperDisco: Super-class discovery improves visual recognition for the long-tail. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Vancouver, Canada: IEEE, 2023. 19944−19954
    [40] Sharma S, Xian Y Q, Yu N, Singh A. Learning prototype classifiers for long-tailed recognition. In: Proceedings of the Thirty-second International Joint Conference on Artificial Intelligence. Macao, China: ACM, 2023. 1360−1368
    [41] Park S, Hong Y, Heo B, Yun S, Choi J Y. The majority can help the minority: Context-rich minority oversampling for long-tailed classification. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). New Orleans, USA: IEEE, 2022. 6877−6886
    [42] Tang K H, Huang J Q, Zhang H W. Long-tailed classification by keeping the good and removing the bad momentum causal effect. arXiv preprint arXiv: 2009.12991, 2020.
    [43] Chou H P, Chang S C, Pan J Y, Wei W, Juan D C. Remix: Rebalanced mixup. In: Proceedings of the European Conference on Computer Vision. Glasgow, UK: Springer, 2020. 95−110
    [44] Zhong Z S, Cui J Q, Liu S, Jia J Y. Improving calibration for long-tailed recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Nashville, USA: IEEE, 2021. 16484−16493
    [45] McInnes L, Healy J, Melville J. UMAP: Uniform manifold approximation and projection for dimension reduction. arXiv preprint arXiv: 1802.03426, 2020.
Publication history
  • Received:  2023-05-18
  • Accepted:  2023-11-03
  • Published online:  2024-04-23
  • Issue date:  2024-05-29
