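Visual Industrial Defect Detection: A Survey on the Evolution and Synergy From Non-foundation Model to Foundation Model

Yang Tian-Le, Chang Lu-Yao, Yan Jia-Dong, Li Jun-Tao, Zhang Ke, Zhang Min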

Citation: Yang Tian-Le, Chang Lu-Yao, Yan Jia-Dong, Li Jun-Tao, Zhang Ke, Zhang Min. Visual industrial defect detection: a survey on the evolution and synergy from non-foundation model to foundation model. Acta Automatica Sinica, xxxx, xx(x): x−xx doi: 10.16383/j.aas.c250594

doi: 10.16383/j.aas.c250594 cstr: 32138.14.j.aas.c250594
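Funds: Supported by National Natural Science Foundation of China (62506253)
    Author Bio:

    YANG Tian-Le: Master student at the School of Computer Science and Technology, Soochow University. His research interests include the application of vision and multimodal large language models. E-mail: tlyang@stu.suda.edu.cn

    CHANG Lu-Yao: Master student at the School of Computer Science, Central China Normal University. Her main research interest is computer vision. E-mail: changluyao001@163.com

    YAN Jia-Dong: Master student at the School of Computer Science and Technology, Soochow University. His main research interest is the application of multimodal large language models. E-mail: jdyan24@stu.suda.edu.cn

    LI Jun-Tao: Associate professor at the Institute of Artificial Intelligence, Soochow University. His research interests include pre-trained language models, text generation, and dialogue systems. E-mail: ljt@suda.edu.cn

    ZHANG Ke: Associate professor at the School of Computer Science and Technology, Soochow University. His research interests include the algorithmic foundations and applications of large-scale vision models and multimodal large language models. Corresponding author of this paper. E-mail: kzhang19@suda.edu.cn

    ZHANG Min: Professor at the Faculty of Computing and Intelligence, Harbin Institute of Technology, Shenzhen. His research interests include natural language processing, artificial intelligence, and large models. E-mail: zhangminmt@hotmail.com

  • CLC number: Y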


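  • Abstract: As industrial products become increasingly diverse, precise, and complex, visual industrial defect detection has attracted considerable attention. In recent years, foundation models (FMs), drawing on the broad prior knowledge acquired from massive data, have shown strong potential in generalization and in few-shot and zero-shot scenarios. However, a review of existing methods reveals a notable phenomenon: for many state-of-the-art FM methods, the marked performance gains do not come from applying the FM alone, but from strategically combining the FM's powerful general-purpose representations with mature, efficient, task-oriented principles from non-foundation model (NFM) methods, such as contrastive learning, knowledge distillation, and anomaly synthesis. To systematically analyze and reveal this synergistic paradigm, this paper first surveys NFM and FM methods separately, compares the two along multiple dimensions, and analyzes their respective strengths and limitations. On this basis, it examines how NFM strategies are decoupled from their original architectures and repurposed to improve the performance of foundation models in industrial applications, and it constructs an adaptability matrix of synergy mechanisms. It further discusses the limitations of deploying this paradigm in practical scenarios. Supported by comparative data, the analysis shows that there is great synergistic potential between the general knowledge of FMs and the task-specific strengths of NFMs, which points to an effective direction for future research.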
  • Fig.  1  Organization of surveyed methods

    Fig.  2  A summary of the comparison between FM and NFM methods

    Fig.  3  The framework of FM and NFM methods

    Fig.  4  Mainstream industrial defect detection datasets and typical defect sample examples

    Fig.  5  Representative methods along the development of NFM and FM models

    Fig.  6  Synergy enhancement paradigm of FM and NFM strategies

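    Table  1  Summary of mainstream industrial datasets, core challenges and typical application scenarios

    | Dataset | Scenario/domain | Core challenges | Typical FM method |
    |---|---|---|---|
    | MVTec AD[9] | General components/textures | Complex texture anomalies, distribution of tiny defects | WinCLIP[23] |
    | VisA[10] | Complex backgrounds/multiple objects | Multiple objects against complex backgrounds, irregular structures | SAA+[18] |
    | Real-IAD[11] | All categories/multi-view | Fine-grained logical defects, multi-view features | AnomalyGPT[35] |
    | MVTec 3D-AD[100] | 3D parts | Spatial geometric deformation, high-dimensional spatial modeling | PointAD[43] |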

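    Table  2  Multi-dimensional evaluation framework

    | Evaluation dimension | Scenario/task | Recommended metrics |
    |---|---|---|
    | Defect perception | 2D classification | I-AUROC, F1-max |
    | | 2D segmentation | P-AUROC, PRO |
    | | 3D detection | 3D-PRO, Chamfer |
    | | Logical anomaly | Acc., Conf. Matrix |
    | Industrial deployment | Real-time production line | FPS, Latency |
    | | Edge devices | Params, Memory |
    | Scenario adaptation | Few-shot | Performance Gain |
    | | Domain transfer | Domain Adapt. Acc. |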
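    Two of the image-level metrics recommended in Table 2, I-AUROC and F1-max, can be computed directly from per-image anomaly scores. The following is a minimal sketch in plain Python, assuming binary labels (1 = defective) and higher scores meaning "more anomalous"; the function names and the toy scores are illustrative and do not come from any surveyed method.

```python
def image_auroc(scores, labels):
    """AUROC via pairwise ranking (Mann-Whitney U); labels: 1 = defective, 0 = normal."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one normal and one defective sample")
    # Count (defective, normal) pairs ranked correctly; ties count as half.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f1_max(scores, labels):
    """Best F1 over all thresholds taken from the observed scores."""
    best = 0.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        if tp:
            best = max(best, 2 * tp / (2 * tp + fp + fn))
    return best


# Toy per-image anomaly scores (hypothetical values).
scores = [0.10, 0.35, 0.40, 0.80, 0.65, 0.90]
labels = [0, 0, 1, 1, 0, 1]
print(image_auroc(scores, labels), f1_max(scores, labels))
```

    The pairwise form is quadratic in the number of images, which is fine for illustration; benchmark evaluation would normally use a sorted or library-based implementation.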

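    Table  3  A brief summary and overview of different NFM and FM methods

    | Category | Subcategory | Method | Description | Venue | Performance | Dataset |
    |---|---|---|---|---|---|---|
    | Non-foundation model methods | 2D statistics | SOFS[45] | Introduces an abnormal prior map and a mixed normal Dice loss | IEEE TII 2025 | 93.3 | MVTec AD |
    | | | PNI[46] | Exploits position and neighborhood information | ICCV 2023 | 99.6 | |
    | | | REB[47] | Reduces domain bias and local density bias | KBS 2024 | 99.5 | |
    | | | BGAD[48] | Strengthens the decision boundary by pulling normal samples together while pushing anomalous samples away | CVPR 2023 | 99.3 | |
    | | | COAD[49] | Increases sensitivity to anomalies through controlled overfitting | ICLR 2025 | 99.9 | |
    | | 2D synthesis | GLASS[50] | Anomaly synthesis based on Gaussian noise and gradient ascent | ECCV 2024 | 99.9 | |
    | | | AdaBLDM[51] | Latent diffusion model with feature editing | arXiv 2024 | - | |
    | | | RealNet[52] | Diffusion-based anomaly synthesis with controllable strength | CVPR 2024 | 99.6 | |
    | | | CAGEN[53] | Text-guided controllable anomaly generation | ICASSP 2024 | 97.7 | |
    | | | AnomalyXFusion[54] | Multi-modal anomaly synthesis for higher sample fidelity | arXiv 2024 | 99.2 | |
    | | | AnomalyDiffusion[55] | Spatial anomaly embedding, adaptive attention re-weighting mechanism | AAAI 2024 | 99.2 | |
    | | | DFMGAN[56] | Defect-aware residual blocks in StyleGAN2 | AAAI 2023 | - | |
    | | | DeSTSeg[57] | Denoising student encoder-decoder, adaptive multi-level feature fusion | CVPR 2023 | 98.6 | |
    | | | CutSwap[58] | Uses saliency guidance to incorporate semantic cues | SIVP 2023 | 98.0 | |
    | | | Split Training[59] | Split training strategy that mitigates overfitting | AAAI 2024 | 98.3 | |
    | | | DFD[60] | Frequency-domain analysis with dual-path frequency discriminators | KBS 2024 | 93.3 | |
    | | | PBAS[61] | Uses the compact distribution of normal sample features to guide the direction of feature-level anomaly synthesis | IEEE TCSVT 2024 | 99.8 | |
    | | 2D RGB + 3D point cloud | Shape-Guided[62] | Collaborative expert models for localizing color and shape anomalies | ICML 2023 | 94.7 | MVTec 3D-AD |
    | | | CPMF[63] | Combines handcrafted PCD descriptors with pretrained 2D neural networks | Pattern Recognition 2023 | 92.9 | |
    | | | Back to the Feature[64] | Handcrafted 3D representations with PatchCore | CVPR 2023 | 97.8 | |
    | | | TransFusion[65] | Transparency-based diffusion to address over-generalization and loss of detail | ECCV 2024 | 98.2 | |
    | | | 3DSR[66] | Depth-aware discrete autoencoder and simulated depth generation process | WACV 2024 | 97.8 | |
    | | | M3DM[103] | Hybrid fusion scheme that reduces interference between multimodal features and encourages feature interaction | CVPR 2023 | 94.5 | |
    | | | AST[67] | Introduces a network to compensate for likelihoods misestimated by the normalizing flow | WACV 2023 | 93.7 | |
    | | 3D generation | R3D-AD[68] | Overcomes the inefficiency caused by memory bank modules and the low performance caused by incorrect MAE reconstruction | ECCV 2024 | 73.4 | Real 3D-AD |
    | | | Reg 3D-AD[104] | Dual-feature representation that preserves local and global features of training prototypes | NeurIPS 2023 | 70.4 | Real 3D-AD |
    | | | PointCore[69] | Reduces computational cost and mismatch interference at inference | arXiv 2024 | 82.9 | Real 3D-AD |
    | | | Uni-3DAD[70] | Highly adaptable to model-free industrial products | Expert Syst. Appl. 2025 | - | MVTec 3D-AD |
    | | | Group3AD[71] | Improves the resolution and accuracy of 3D anomaly detection through group-level feature contrastive learning | ACM MM 2024 | 75.1 | Real 3D-AD |
    | Foundation model methods | 2D SAM-based | ClipSAM[21] | Hierarchical mask refinement with multi-level prompts | Neurocomputing 2025 | 92.3 | MVTec AD |
    | | | UCAD[20] | Structure-based contrastive learning using SAM | AAAI 2024 | 93.0 | |
    | | | SAM-LAD[19] | Uses SAM to obtain object masks of the query and reference images and extracts object features for matching | KBS 2025 | 98.4 | |
    | | | SAA+[18] | Hybrid prompt regularization | IEEE T-CYB 2025 | - | |
    | | | STLM[17] | Uses SAM as a teacher to guide student networks | ACM TOMM 2025 | 98.3 | |
    | | | SPT[22] | Tunes SAM to better understand the relations between different image regions | AAAI 2025 | - | |
    | | 2D CLIP-based | WinCLIP[23] | Compositional prompt ensemble, reference association | CVPR 2023 | 93.1 | |
    | | | AnoCLIP[110] | Local-aware visual tokens, domain-aware prompts, test-time adaptation | arXiv 2024 | - | |
    | | | AnomalyCLIP[24] | Object-agnostic text prompt templates, global anomaly loss | ICLR 2024 | - | |
    | | | AdaCLIP[25] | Hybrid (static and dynamic) learnable prompts, hybrid semantic fusion module | ECCV 2024 | - | |
    | | | VCP-CLIP[26] | Visual context prompting model | ECCV 2024 | - | |
    | | | SimCLIP[27] | Multi-level visual adapters, implicit prompt learning, prior-aware optimization algorithm | ACM MM 2024 | 95.3 | |
    | | | CLIP-AD[28] | Text prompt distribution, alignment promoted via linear layers | IJCAI 2024 | - | |
    | | | CLIP-FSAC[29] | Two-stage training strategy, vision-driven text features, fusion-text matching task | IJCAI 2024 | 95.5 | |
    | | | ClipSAM[21] | CLIP-SAM collaboration, unified multi-scale cross-modal interaction, multi-level mask refinement | Neurocomputing 2025 | - | |
    | | | SOWA[30] | Hierarchical frozen window self-attention, dual learnable prompts | ACM MM 2024 | - | |
    | | | SAA+[18] | Hybrid prompts, domain expert knowledge and target image context | IEEE T-CYB 2025 | - | |
    | | | APRIL-GAN[31] | Combination of state and template ensembles, memory-bank-based method | arXiv 2023 | 92.0 | |
    | | | PromptAD[32] | Prompt learning, semantic concatenation, explicit anomaly margin | CVPR 2024 | 94.6 | |
    | | | FiLo[33] | Fine-grained descriptions, learnable vectors, position-enhanced high-quality localization | ACM MM 2024 | - | |
    | | | Dual-Image Enhanced CLIP[34] | Dual-image feature enhancement, test-time adaptation with pseudo-anomaly synthesis | arXiv 2024 | - | |
    | | 2D GPT-based | AnomalyGPT[35] | Lightweight decoder based on visual-textual feature matching, prompt embeddings | AAAI 2024 | 94.1 | |
    | | | Myriad[38] | Applies vision experts, vision expert tokenizer | arXiv 2023 | 94.1 | |
    | | | ALFA[39] | Run-time prompt adaptation strategy, fine-grained aligner | ACM MM 2024 | 94.5 | |
    | | | GPT-4V-AD[40] | Visual question answering paradigm, granular region division, prompt design, Text2Segmentation | IJCAI 2024 | - | |
    | | | Customizable-VLM[37] | Integrates expert knowledge as external memory via prompts to enhance the foundation model | IEEE CSCWD 2025 | 82.9 | |
    | | | LogiCode[41] | Uses an LLM to extract image logic and generate code for logical anomaly detection | IEEE TASE 2025 | - | |
    | | 3D CLIP-based | CLIP3D-AD[42] | Handles few-shot anomaly classification and segmentation without a memory bank or large numbers of training samples | ACM MM 2024 | - | MVTec 3D-AD |
    | | | PointAD[43] | Hybrid representation learning framework | NeurIPS 2024 | 97.2 | |
    | | | M3DM-NR[44] | Denoising with suspected anomaly maps | IEEE TPAMI 2025 | 94.5 | |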

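    Table  4  Comparison of foundation model enhancement methods and their performance on MVTec AD (%)

    | FM | FM method | Enhancement | Link to NFM methods | Metric | Baseline | Enhanced | Gain |
    |---|---|---|---|---|---|---|---|
    | SAM | UCAD | Contrastive learning guided by high-fidelity SAM masks | Contrastive learning (BGAD, Group3AD): uses a contrastive learning framework in which SAM masks define positive and negative samples | p_AUROC | 69.3 | 93.0 | ↑23.7 |
    | | | | | I_AUROC | 18.3 | 45.6 | ↑27.3 |
    | | SAA+ | Hybrid prompt regularization using expert knowledge and image context | Knowledge-driven: encodes domain expert prior knowledge into hybrid prompts to guide the foundation model | F_p | 30.95 | 34.85 | ↑3.9 |
    | | | | | F_r | 29.17 | 34.07 | ↑4.9 |
    | | STLM | SAM-guided two-stream lightweight architecture | Student-teacher architecture (AST, DeSTSeg)/knowledge reconstruction (R3D-AD): one stream generates discriminative features with a student-teacher architecture; the other reconstructs anomaly-free images | p_AUROC | - | 98.26 | - |
    | | | | | I_AUROC | - | 99.05 | - |
    | CLIP | AnoCLIP | Test-time adaptation using synthetic noise perturbation | Anomaly synthesis (GLASS, CAGEN): synthetic data is used for online optimization of a lightweight adapter | p_AUROC | 88.9 | 90.6 | ↑1.7 |
    | | | | | I_AUROC | - | - | - |
    | | Dual-Image Enhanced CLIP | Test-time adaptation using synthetic anomalies | Anomaly synthesis: similar to AnoCLIP | p_AUROC | 85.3 | 92.8 | ↑7.5 |
    | | | | | I_AUROC | 91.6 | 93.2 | ↑1.6 |
    | | SimCLIP | Multi-level visual adapters and implicit prompt learning | Adapter/prompt learning (SimCLIP): inserts lightweight modules to bridge the domain gap between industrial textures and the pretraining distribution | p_AUROC | - | 95.6 | - |
    | | | | | I_AUROC | - | 95.3 | - |
    | | WinCLIP | Compositional prompt ensemble and reference association | Memory bank mechanism (PatchCore): introduces an external memory bank that stores normal features to support few-shot decisions | p_AUROC | 85.1 | 95.2 | ↑10.1 |
    | | | | | I_AUROC | 91.8 | 93.1 | ↑1.3 |
    | | APRIL-GAN | Linear adapter layers combined with a memory bank | Memory bank mechanism/contrastive learning: combines distribution storage with linear layers to improve stability in the few-shot setting | p_AUROC | - | 95.1 | - |
    | | | | | I_AUROC | - | 92.0 | - |
    | GPT | AnomalyGPT | Trained on simulated data, combining a feature-matching decoder and a prompt learner | Anomaly synthesis (AnomalyXFusion): trains the LVLM on a large number of generated "anomaly image-text description" pairs to achieve vision-language alignment | p_AUROC | - | 93.1 | - |
    | | | | | I_AUROC | - | 97.4 | - |
    | | Myriad | LoRA adaptation guided by specialized "vision experts" | Student-teacher knowledge distillation (AST): a specialized detector acts as the "teacher" and generates anomaly maps that guide the attention of the LMM (student) | Accuracy | 92.0 | 94.2 | ↑2.2 |
    | | Customizable-VLM | Multimodal prompt strategy | Knowledge-driven: encodes expert knowledge into prompts | I_AUROC | - | 82.9 | - |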
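    The improvements reported in Table 4 are simple differences between the enhanced and the baseline scores, in percentage points. As a quick sanity check, the sketch below recomputes them for the rows that list a baseline; the values are copied from the table, while the tuple layout and the helper name `gain` are illustrative choices.

```python
# (method, metric, baseline, enhanced, reported gain), copied from Table 4.
# Rows without a baseline ("-") are omitted.
rows = [
    ("UCAD", "p_AUROC", 69.3, 93.0, 23.7),
    ("UCAD", "I_AUROC", 18.3, 45.6, 27.3),
    ("SAA+", "F_p", 30.95, 34.85, 3.9),
    ("SAA+", "F_r", 29.17, 34.07, 4.9),
    ("AnoCLIP", "p_AUROC", 88.9, 90.6, 1.7),
    ("Dual-Image Enhanced CLIP", "p_AUROC", 85.3, 92.8, 7.5),
    ("Dual-Image Enhanced CLIP", "I_AUROC", 91.6, 93.2, 1.6),
    ("WinCLIP", "p_AUROC", 85.1, 95.2, 10.1),
    ("WinCLIP", "I_AUROC", 91.8, 93.1, 1.3),
    ("Myriad", "Accuracy", 92.0, 94.2, 2.2),
]


def gain(baseline, enhanced):
    """Absolute improvement in percentage points, rounded to one decimal."""
    return round(enhanced - baseline, 1)


# Collect any row whose recomputed gain differs from the reported one.
mismatches = [
    (method, metric)
    for method, metric, base, enh, reported in rows
    if abs(gain(base, enh) - reported) > 1e-9
]
print(mismatches)
```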

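    Table  5  Adaptability matrix of FM challenges and NFM synergy mechanisms

    | FM type | Main limitation/challenge | Applicable scenario/defect type | Recommended NFM synergy mechanism | Synergy principle in brief | Representative methods |
    |---|---|---|---|---|---|
    | SAM | Lacks semantic awareness (cannot recognize "defect" categories) | Semantic/object-level defects (e.g., foreign objects, missing components) | Knowledge-driven/prompt engineering | Encodes expert knowledge (e.g., "scratch") as prompts that guide SAM to focus on specific anomaly semantics | SAA+[18], ClipSAM[21] |
    | | Insufficient feature discriminability (segments based on edges only) | Tiny texture/subtle defects (e.g., micro-scratches, same-color flaws) | Contrastive learning | Uses masks to construct structured positive and negative samples, making the feature space more sensitive to small differences | UCAD[20] |
    | CLIP | Low localization accuracy (poor segmentation resolution) | Pixel-level surface defects (high-precision localization required) | Anomaly synthesis | Generates pseudo-anomalies to fine-tune adapters, forcing the model to learn pixel-level boundaries | AnoCLIP[110], Dual-Image[34] |
    | | Domain distribution gap (natural vs. industrial) | Complex industrial textures (strong background interference) | Adapter/prompt learning | Inserts lightweight modules to align the feature space, bridging the gap between natural images and industrial textures | SimCLIP[27], AdaCLIP[25] |
    | | Extremely few samples (a distribution reference is needed) | High-mix, low-volume production (few-shot/zero-shot) | Memory bank mechanism | Introduces an external memory bank that stores the normal distribution, stabilizing decisions in the few-shot setting | WinCLIP[23], APRIL-GAN[31] |
    | GPT/LVLM | Lacks domain knowledge (descriptions are not professional) | Logical/causal anomalies (reasoning and diagnosis required) | Knowledge-driven/in-context learning | Injects expert examples and multimodal prompts to activate the model's logical reasoning and generalization | Customizable-VLM[37], AnomalyGPT[35] |
    | | High inference cost (hard to run in real time) | Online real-time inspection (high FPS required) | Knowledge distillation | The large model acts as a "teacher" that generates pseudo-labels to guide a lightweight "student" network | STLM[17], Myriad[38] |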
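    The memory bank mechanism listed in Table 5 can be illustrated with a nearest-neighbor scoring sketch: features of a few normal reference images are stored, and a test patch is scored by its distance to the closest stored feature. This is a simplified illustration only, not the implementation of WinCLIP, APRIL-GAN, or PatchCore; the 2-D feature vectors and the function names are hypothetical stand-ins for real FM embeddings.

```python
import math


def score_patch(feature, bank):
    """Distance from one patch feature to its nearest stored normal feature."""
    return min(math.dist(feature, ref) for ref in bank)


def score_image(patch_features, bank):
    """Image-level score: the most anomalous patch decides."""
    return max(score_patch(f, bank) for f in patch_features)


# Hypothetical 2-D features; a real system would store FM embeddings
# (for example, patch tokens from a frozen vision encoder).
memory_bank = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # from a few normal references
normal_image = [[0.1, 0.0], [0.9, 0.1]]
defect_image = [[0.1, 0.0], [3.0, 4.0]]  # second patch lies far from every reference

print(score_image(normal_image, memory_bank))
print(score_image(defect_image, memory_bank))
```

    Because no training is involved, the bank can be rebuilt from a handful of normal images per product, which is why the mechanism suits the few-shot setting; the cost is a nearest-neighbor search at inference time that grows with the size of the bank.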
  • [1] Pang G, Shen C, Cao L, Hengel A V D. Deep learning for anomaly detection: A review. ACM Computing Surveys, 2021, 54(2): 38
    [2] Bergmann P, Löwe S, Fauser M, Sattlegger D, Steger C. Improving unsupervised defect segmentation by applying structural similarity to autoencoders. arXiv: 1807.02011, 2018.
    [3] Gong D, Liu L, Le V, Saha B, Mansour M R, Venkatesh S, et al. Memorizing normality to detect anomaly: Memory-augmented deep autoencoder for unsupervised anomaly detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. Seoul, South Korea: IEEE, 2019. 1705-1714.
    [4] Liu Y, Zhuang C, Lu F. Unsupervised two-stage anomaly detection. arXiv: 2103.11671, 2021.
    [5] Deng H, Li X. Anomaly detection via reverse distillation from one-class embedding. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE, 2022. 9727-9736.
    [6] Liu Z, Zhou Y, Xu Y, Wang Z. SimpleNet: A simple network for image anomaly detection and localization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver, Canada: IEEE, 2023. 20402-20411.
    [7] Liang Y, Hu Z, Huang J, Di D, Su A, Fan L. ToCoAD: Two-stage contrastive learning for industrial anomaly detection. IEEE Transactions on Instrumentation and Measurement, 2025, 74: 1−13 doi: 10.1109/tim.2025.3545987
    [8] Hu H, Wang X, Zhang Y, Chen Q, Guan Q. A comprehensive survey on contrastive learning. Neurocomputing, 2024, 584: 128645
    [9] Bergmann P, Fauser M, Sattlegger D, Steger C. MVTec AD — A comprehensive real-world dataset for unsupervised anomaly detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE, 2019. 9584-9592.
    [10] Zou Y, Jeong J, Pemula L, Zhang D, Dabeer O. Spot-the-difference self-supervised pre-training for anomaly detection and segmentation. In: Proceedings of the European Conference on Computer Vision. Berlin, Germany: Springer, 2022. 392-408.
    [11] Wang C, Zhu W, Gao B B, Gan Z, Zhang J, Gu Z, et al. Real-IAD: A real-world multi-view dataset for benchmarking versatile industrial anomaly detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE, 2024. 22883-22892.
    [12] Radford A, Kim J W, Hallacy C, Ramesh A, Goh G, Agarwal S, et al. Learning transferable visual models from natural language supervision. In: Proceedings of the International Conference on Machine Learning (ICML). Virtual Event: PMLR, 2021. 8748-8763.
    [13] Zhu D, Chen J, Shen X, Li X, Elhoseiny M. MiniGPT-4: Enhancing vision-language understanding with advanced large language models. arXiv: 2304.10592, 2023.
    [14] Yang Z, Li L, Lin K, Wang J, Lin C C, Liu Z, et al. The dawn of LMMs: Preliminary explorations with GPT-4V(ision). arXiv: 2309.17421, 2023.
    [15] Kirillov A, Mintun E, Ravi N, Mao H, Rolland C, Gustafson L, et al. Segment anything. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. Paris, France: IEEE, 2023. 4015-4026.
    [16] Rani A, Ortiz-Arroyo D, Durdevic P. Advancements in point cloud-based 3D defect detection and classification for industrial systems: A comprehensive survey. arXiv: 2402.12923, 2024.
    [17] Li C, Qi L, Geng X. A SAM-guided two-stream lightweight model for anomaly detection. ACM Transactions on Multimedia Computing, Communications and Applications, 2025, 21(2): 1−23
    [18] Cao Y, Xu X, Sun C, Cheng Y, Du Z, Gao L, et al. Personalizing vision-language models with hybrid prompts for zero-shot anomaly detection. IEEE Transactions on Cybernetics, 2025, DOI: 10.1109/TCYB.2025.10884560.
    [19] Peng Y, Lin X, Ma N, Du J, Liu C, Liu C, et al. SAM-LAD: Segment anything model meets zero-shot logic anomaly detection. Knowledge-Based Systems, 2025, 310: 112634 doi: 10.1016/j.knosys.2025.113176
    [20] Liu J, Wu K, Nie Q, Chen Y, Gao B B, Liu Y, et al. Unsupervised continual anomaly detection with contrastively-learned prompt. In: Proceedings of the AAAI Conference on Artificial Intelligence. Vancouver, Canada: AAAI Press, 2024, 38(4): 3639-3647.
    [21] Li S, Cao J, Ye P, Ding Y, Tu C, Chen T. Enhancing zero-shot anomaly detection: CLIP-SAM collaboration with cascaded prompts. Neurocomputing, 2025, 615: 128682 doi: 10.1007/978-981-97-8490-5_4
    [22] Yang H Y, Chen H, Wang A, Chen K, Lin Z, Tang Y, et al. Promptable anomaly segmentation with SAM through self-perception tuning. In: Proceedings of the AAAI Conference on Artificial Intelligence. Philadelphia, USA: AAAI Press, 2025.
    [23] Jeong J, Zou Y, Kim T, Zhang D, Ravichandran A, Dabeer O. WinCLIP: Zero-/few-shot anomaly classification and segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver, Canada: IEEE, 2023. 19606-19616.
    [24] Zhou Q, Pang G, Tian Y, He S, Chen J. AnomalyCLIP: Object-agnostic prompt learning for zero-shot anomaly detection. In: Proceedings of the International Conference on Learning Representations. Vienna, Austria: OpenReview, 2024.
    [25] Cao Y, Zhang J, Frittoli L, Cheng Y, Shen W, Boracchi G. AdaCLIP: Adapting CLIP with hybrid learnable prompts for zero-shot anomaly detection. In: Proceedings of the European Conference on Computer Vision. Milan, Italy: Springer, 2025. 55-72.
    [26] Qu Z, Tao X, Prasad M, Shen F, Zhang Z, Gong X, et al. VCP-CLIP: A visual context prompting model for zero-shot anomaly segmentation. In: Proceedings of the European Conference on Computer Vision. Milan, Italy: Springer, 2024.
    [27] Deng C, Xu H, Chen X, Xu H, Tu X, Ding X, et al. SimCLIP: Refining image-text alignment with simple prompts for zero-/few-shot anomaly detection. In: Proceedings of the 32nd ACM International Conference on Multimedia. Melbourne, Australia: ACM, 2024. 1761-1770.
    [28] Chen X, Zhang J, Tian G, He H, Zhang W, Wang Y, et al. CLIP-AD: A language-guided staged dual-path model for zero-shot anomaly detection. In: Proceedings of the International Joint Conference on Artificial Intelligence. Jeju, South Korea: ijcai.org, 2024. 17-33.
    [29] Zuo Z, Wu Y, Li B, Dong J, Zhou Y, Zhou L, et al. CLIP-FSAC: Boosting CLIP for few-shot anomaly classification with synthetic anomalies. In: Proceedings of the 33rd International Joint Conference on Artificial Intelligence. Jeju, South Korea: ijcai.org, 2024. 1834-1842.
    [30] Hu Z, Zhang Z. SOWA: Adapting hierarchical frozen window self-attention to visual-language models for better anomaly detection. In: Proceedings of the 32nd ACM International Conference on Multimedia. Melbourne, Australia: ACM, 2024.
    [31] Chen X, Han Y, Zhang J. APRIL-GAN: A zero-/few-shot anomaly classification and segmentation method. arXiv: 2305.17382, 2023.
    [32] Li X, Zhang Z, Tan X, Chen C, Qu Y, Xie Y, et al. PromptAD: Learning prompts with only normal samples for few-shot anomaly detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE, 2024. 16838-16848.
    [33] Gu Z, Zhu B, Zhu G, Chen Y, Li H, Tang M, et al. FiLo: Zero-shot anomaly detection by fine-grained description and high-quality localization. In: Proceedings of the 32nd ACM International Conference on Multimedia. Melbourne, Australia: ACM, 2024. 2041-2049.
    [34] Zhang Z, Deng H, Bao J, Li X. Dual-image enhanced CLIP for zero-shot anomaly detection. arXiv: 2405.04782, 2024.
    [35] Gu Z, Zhu B, Zhu G, Chen Y, Tang M, Wang J. AnomalyGPT: Detecting industrial anomalies using large vision-language models. In: Proceedings of the AAAI Conference on Artificial Intelligence. Vancouver, Canada: AAAI Press, 2024, 38(3): 1932-1940.
    [36] Cao Y, Xu X, Sun C, Huang X, Shen W. Towards generic anomaly detection and understanding: Large-scale visual-linguistic model (GPT-4V) takes the lead. arXiv: 2311.02782, 2023.
    [37] Xu X, Cao Y, Chen Y, Shen W, Huang X. Customizing visual-language foundation models for multi-modal anomaly detection and reasoning. In: Proceedings of the IEEE 27th International Conference on Computer Supported Cooperative Work in Design (CSCWD). Winnipeg, Canada: IEEE, 2025.
    [38] Li Y, Wang H, Yuan S, Liu M, Zhao D, Guo Y, et al. Myriad: Large multimodal model by applying vision experts for industrial anomaly detection. arXiv: 2310.19070, 2023.
    [39] Zhu J, Cai S, Deng F, Ooi B C, Wu J. Do LLMs understand visual anomalies? Uncovering LLM's capabilities in zero-shot anomaly detection. In: Proceedings of the 32nd ACM International Conference on Multimedia. Melbourne, Australia: ACM, 2024. 48-57.
    [40] Zhang J, He H, Chen X, Xue Z, Wang Y, Wang C, et al. GPT-4V-AD: Exploring grounding potential of VQA-oriented GPT-4V for zero-shot anomaly detection. In: Proceedings of the International Joint Conference on Artificial Intelligence. Jeju, South Korea: ijcai.org, 2024. 3-16.
    [41] Zhang Y, Cao Y, Xu X, Shen W. LogiCode: An LLM-driven framework for logical anomaly detection. IEEE Transactions on Automation Science and Engineering, 2024, DOI: 10.1109/TASE.2024.3468464.
    [42] Zuo Z, Dong J, Wu Y, Qu Y, Wu Z. CLIP3D-AD: Extending CLIP for 3D few-shot anomaly detection with multi-view images generation. In: Proceedings of the 32nd ACM International Conference on Multimedia. Melbourne, Australia: ACM, 2024.
    [43] Zhou Q, Yan J, He S, Meng W, Chen J. PointAD: Comprehending 3D anomalies from points and pixels for zero-shot 3D anomaly detection. In: Proceedings of the Annual Conference on Neural Information Processing Systems. Vancouver, Canada: Curran Associates, Inc., 2024.
    [44] Wang C, Zhu H, Peng J, Wang Y, Yi R, Wu Y, et al. M3DM-NR: RGB-3D noisy-resistant industrial anomaly detection via multimodal denoising. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025, 47(11): 11091585
    [45] Zhang Z, Niu C, Zhao Z, Zhang X, Chen X. Small object few-shot segmentation for vision-based industrial inspection. IEEE Transactions on Industrial Informatics, 2025, 21(3): 10908360
    [46] Bae J, Lee J H, Kim S. PNI: Industrial anomaly detection using position and neighborhood information. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. Paris, France: IEEE, 2023. 6350-6360.
    [47] Lyu S, Mo D, Wong W K. REB: Reducing biases in representation for industrial anomaly detection. Knowledge-Based Systems, 2024, 290: 111563 doi: 10.1016/j.knosys.2024.111563
    [48] Yao X, Li R, Zhang J, Sun J, Zhang C. Explicit boundary guided semi-push-pull contrastive learning for supervised anomaly detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver, Canada: IEEE, 2023. 24490-24499.
    [49] Qian L, Zhu B, Chen Y, Tang M, Wang J. Friend or foe? Harnessing controllable overfitting for anomaly detection. In: Proceedings of the International Conference on Learning Representations. Singapore: OpenReview, 2025.
    [50] Chen Q, Luo H, Lv C, Zhang Z. A unified anomaly synthesis strategy with gradient ascent for industrial anomaly detection and localization. In: Proceedings of the European Conference on Computer Vision. Milan, Italy: Springer, 2025. 37-54.
    [51] Li H, Zhang Z, Chen H, Wu L, Li B, Liu D, et al. A novel approach to industrial defect generation through blended latent diffusion model with online adaptation. arXiv: 2402.19330, 2024.
    [52] Zhang X, Xu M, Zhou X. RealNet: A feature selection network with realistic synthetic anomaly for anomaly detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE, 2024. 16699-16708.
    [53] Jiang B, Xie Y, Li J, Li N, Jiang Y, Xia S T. CAGEN: Controllable anomaly generator using diffusion model. In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. Seoul, South Korea: IEEE, 2024. 3110-3114.
    [54] Hu J, Huang Y, Lu Y, Xie G, Jiang G, Zheng Y, et al. AnomalyXFusion: Multi-modal anomaly synthesis with diffusion. arXiv: 2404.19444, 2024.
    [55] Hu T, Zhang J, Yi R, Du Y, Chen X, Liu L, et al. AnomalyDiffusion: Few-shot anomaly image generation with diffusion model. In: Proceedings of the AAAI Conference on Artificial Intelligence. Vancouver, Canada: AAAI Press, 2024, 38(8): 8526-8534.
    [56] Duan Y, Hong Y, Niu L, Zhang L. Few-shot defect image generation via defect-aware feature manipulation. In: Proceedings of the AAAI Conference on Artificial Intelligence. Washington, DC, USA: AAAI Press, 2023, 37(1): 571-578.
    [57] Zhang X, Li S, Li X, Huang P, Shan J, Chen T. DeSTSeg: Segmentation guided denoising student-teacher for anomaly detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver, Canada: IEEE, 2023. 3914-3923.
    [58] Qin J, Gu C, Yu J, Zhang C. Multilevel saliency-guided self-supervised learning for image anomaly detection. Signal, Image and Video Processing, 2023, 18: 6339−6351 doi: 10.1007/s11760-024-03320-z
    [59] Lin J, Yan Y. A comprehensive augmentation framework for anomaly detection. In: Proceedings of the AAAI Conference on Artificial Intelligence. Vancouver, Canada: AAAI Press, 2024, 38(8): 8742-8749.
    [60] Bai Y, Zhang J, Chen Z, Dong Y, Cao Y, Tian G. Dual-path frequency discriminators for few-shot anomaly detection. Knowledge-Based Systems, 2024, 302: 112397 doi: 10.1016/j.knosys.2024.112397
    [61] Chen Q, Luo H, Gao H, Lv C, Zhang Z. Progressive boundary guided anomaly synthesis for industrial anomaly detection. IEEE Transactions on Circuits and Systems for Video Technology, 2025, 35(2): 1193−1208 doi: 10.1109/TCSVT.2024.3479887
    [62] Chu Y M, Liu C, Hsieh T I, Chen H T, Liu T L. Shape-guided dual-memory learning for 3D anomaly detection. In: Proceedings of the International Conference on Machine Learning (ICML). Honolulu, USA: PMLR, 2023. 6185-6194.
    [63] Cao Y, Xu X, Shen W. Complementary pseudo multimodal feature for point cloud anomaly detection. Pattern Recognition, 2024, 156: 110761 doi: 10.1016/j.patcog.2024.110761
    [64] Horwitz E, Hoshen Y. Back to the feature: Classical 3D features are (almost) all you need for 3D anomaly detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver, Canada: IEEE, 2023. 2968-2977.
    [65] Fučka M, Zavrtanik V, Skočaj D. TransFusion—A transparency-based diffusion model for anomaly detection. In: Proceedings of the European Conference on Computer Vision. Milan, Italy: Springer, 2025. 91-108.
    [66] Zavrtanik V, Kristan M, Skočaj D. Cheating depth: Enhancing 3D surface anomaly detection via depth simulation. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV). Waikoloa, USA: IEEE, 2024. 2153-2161.
    [67] Rudolph M, Wehrbein T, Rosenhahn B, Wandt B. Asymmetric student-teacher networks for industrial anomaly detection. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV). Waikoloa, USA: IEEE, 2023. 2592-2602.
    [68] Zhou Z, Wang L, Fang N, Wang Z, Qiu L, Zhang S. R3D-AD: Reconstruction via diffusion for 3D anomaly detection. In: Proceedings of the European Conference on Computer Vision. Milan, Italy: Springer, 2025. 91-107.
    [69] Zhao B, Xiong Q, Zhang X, Guo J, Liu Q, Xing X, et al. PointCore: Efficient unsupervised point cloud anomaly detector using local-global features. arXiv: 2403.01804, 2024.
    [70] Liu J, Mou S, Gaw N, Wang Y. Uni-3DAD: GAN-inversion aided universal 3D anomaly detection on model-free products. Expert Systems with Applications, 2025, 265: 125862 doi: 10.1016/j.eswa.2025.126665
    [71] Zhu H, Xie G, Hou C, Dai T, Gao C, Wang J, et al. Towards high-resolution 3D anomaly detection via group-level feature contrastive learning. In: Proceedings of the 32nd ACM International Conference on Multimedia. Melbourne, Australia: ACM, 2024. 4680-4689.
    [72] Zavrtanik V, Kristan M, Skočaj D. DRÆM—A discriminatively trained reconstruction embedding for surface anomaly detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. Montreal, Canada: IEEE, 2021. 8310-8319.
    [73] Al-Fakih A, Koeshidayatullah A, Mukerji T, Kaka S I. Enhanced anomaly detection in well log data through the application of ensemble GANs. arXiv: 2411.19875, 2024.
    [74] Bhosale A, Mukherjee S, Banerjee B, Cuzzolin F. Anomaly detection using diffusion-based methods. arXiv: 2412.07539, 2024.
    [75] Kim S, Lee S Y, Bu F, Kang S, Kim K, Yoo J, et al. Rethinking reconstruction-based graph-level anomaly detection: Limitations and a simple remedy. arXiv: 2410.20366, 2024.
    [76] Yao H, Liu M, Yin Z, Yan Z, Hong X, Zuo W. GLAD: Towards better reconstruction with global and local adaptive diffusion models for unsupervised anomaly detection. In: Proceedings of the European Conference on Computer Vision. Milan, Italy: Springer, 2025. 1-17.
    [77] Zade H R, Zare H, Parsa M G, Davardoust H, Bagheri M S. DCOR: Anomaly detection in attributed networks via dual contrastive learning reconstruction. arXiv: 2412.16788, 2024.
    [78] Patra S, Taieb S B. Revisiting deep feature reconstruction for logical and structural industrial anomaly detection. arXiv: 2410.16255, 2024.
    [79] Lee K, Kim M, Jun Y, Woo S S. GDFlow: Anomaly detection with NCDE-based normalizing flow for advanced driver assistance system. arXiv: 2409.05346, 2024.
    [80] Zhou Y, Xu X, Sun Z, Song J, Cichocki A, Shen H T. VQ-Flow: Taming normalizing flows for multi-class anomaly detection via hierarchical vector quantization. arXiv: 2409.00942, 2024.
    [81] Liu X, Xing F, Zhuo J, Stone M, Prince J L, El Fakhri G, et al. Speech motion anomaly detection via cross-modal translation of 4D motion fields from tagged MRI. In: Proceedings of the Medical Imaging 2024: Image Processing. Bellingham, USA: SPIE, 2024, 12926: 129262W.
    [82] Tu Y, Zhang B, Liu L, Li Y, Zhang J, Wang Y, et al. Self-supervised feature adaptation for 3D industrial anomaly detection. In: Proceedings of the European Conference on Computer Vision. Milan, Italy: Springer, 2025. 75-91.
    [83] Li J, Wang X, Zhao H, Zhong Y. Learning a cross-modality anomaly detector for remote sensing imagery. IEEE Transactions on Image Processing, 2024, 33: 3225−3240
    [84] Arav R, Wittich D, Rottensteiner F. Evaluating saliency scores in point clouds of natural environments by learning surface anomalies. arXiv: 2408.14421, 2024.
    [85] Ye J, Zhao W, Yang X, Cheng G, Huang K. PO3AD: Predicting point offsets toward better 3D point cloud anomaly detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE, 2025.
    [86] Hao S, Fu W, Chen X, Jin C, Zhou J, Yu S, et al. Network anomaly traffic detection via multi-view feature fusion. arXiv: 2409.08020, 2024.
    [87] Dai A, Chang A X, Savva M, Halber M, Funkhouser T, Nießner M. ScanNet: Richly-annotated 3D reconstructions of indoor scenes. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE, 2017. 2432-2443.
    [88] Uy M A, Pham Q H, Hua B S, Nguyen T, Yeung S K. Revisiting point cloud classification: A new benchmark dataset and classification model on real-world data. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. Seoul, South Korea: IEEE, 2019. 1588-1597.
    [89] Zhou Q, He S, Liu H, Chen T, Chen J. Pull & push: Leveraging differential knowledge distillation for efficient unsupervised anomaly detection and localization. IEEE Transactions on Circuits and Systems for Video Technology, 2023, 33(5): 2176−2189 doi: 10.1109/TCSVT.2022.3218587
    [90] Chen Z, Luo X, Wang W, Zhao Z, Su F, Men A. Filter or compensate: Towards invariant representation from distribution shift for anomaly detection. arXiv: 2412.10115, 2024.
    [91] Liu X, Wang J, Leng B, Zhang S. Unlocking the potential of reverse distillation for anomaly detection. arXiv: 2412.07579, 2024.
    [92] Liu H, Xu X, Li E, Zhang S, Li X. Anomaly detection with representative neighbors. IEEE Transactions on Neural Networks and Learning Systems, 2023, 34(6): 2831−2841 doi: 10.1109/TNNLS.2021.3109898
    [93] Zhou J, Wu Y. Outlier-probability-based feature adaptation for robust unsupervised anomaly detection on contaminated training data. IEEE Transactions on Circuits and Systems for Video Technology, 2024, 34(10): 10023−10035 doi: 10.1109/TCSVT.2024.3408034
    [94] Xing P, Li Z. Visual anomaly detection via partition memory bank module and error estimation. IEEE Transactions on Circuits and Systems for Video Technology, 2023, 33(8): 3596−3607 doi: 10.1109/TCSVT.2023.3237562
    [95] Zhang F, Zhu H, Cen Y, Kan S, Zhang L, Vadakkepat P, et al. Low-shot unsupervised visual anomaly detection via sparse feature representation. IEEE Transactions on Neural Networks and Learning Systems, 2024, DOI: 10.1109/TNNLS.2024.3420818.
    [96] Zhou Y, Song X, Zhang Y, Liu F, Zhu C, Liu L. Feature encoding with autoencoders for weakly supervised anomaly detection. IEEE Transactions on Neural Networks and Learning Systems, 2022, 33(6): 2454−2465 doi: 10.1109/TNNLS.2021.3086137
    [97] Ramírez Rivera A, Khan A, Bekkouch I E I, Sheikh T S. Anomaly detection based on zero-shot outlier synthesis and hierarchical feature distillation. IEEE Transactions on Neural Networks and Learning Systems, 2022, 33(1): 281−291 doi: 10.1109/TNNLS.2020.3027667
    [98] Liang Y, Li X, Huang X, Zhang Z, Yao Y. An automated data mining framework using autoencoders for feature extraction and dimensionality reduction. arXiv: 2412.02211, 2024.
    [99] Chen J, Wang C, Hong Y, Mi R, Zhang L J, Wu Y, et al. A survey on anomaly detection with few-shot learning. In: Proceedings of the International Conference on Cognitive Computing. Berlin, Germany: Springer, 2024. 34-50.
    [100] Bergmann P, Jin X, Sattlegger D, Steger C. The MVTec 3D-AD dataset for unsupervised 3D anomaly detection and localization. In: Proceedings of the 17th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISAPP). Virtual Event: SCITEPRESS, 2022. 202-213.
    [101] Yu J, Zheng Y, Wang X, Li W, Wu Y, Zhao R, et al. FastFlow: Unsupervised anomaly detection and localization via 2D normalizing flows. arXiv: 2111.07677, 2021.
    [102] Roth K, Pemula L, Zepeda J, Schölkopf B, Brox T, Gehler P. Towards total recall in industrial anomaly detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE, 2022. 14298-14308.
    [103] Wang Y, Peng J, Zhang J, Yi R, Wang Y, Wang C. Multimodal industrial anomaly detection via hybrid fusion. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver, Canada: IEEE, 2023. 8032-8041.
    [104] Liu J, Xie G, Chen R, Li X, Wang J, Liu Y, et al. Real3D-AD: A dataset of point cloud anomaly detection. Advances in Neural Information Processing Systems, 2023, 36: 30402−30415 doi: 10.52202/075280-1324
    [105] Schlegl T, Seeböck P, Waldstein S M, Langs G, Schmidt-Erfurth U. f-AnoGAN: Fast unsupervised anomaly detection with generative adversarial networks. Medical Image Analysis, 2019, 54: 30−44 doi: 10.1016/j.media.2019.01.010
    [106] Zhou K, Yang J, Loy C C, Liu Z. Learning to prompt for vision-language models. International Journal of Computer Vision, 2022, 130(9): 2337−2348 doi: 10.1007/s11263-022-01653-1
    [107] Zhou K, Yang J, Loy C C, Liu Z. Conditional prompt learning for vision-language models. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE, 2022. 16816-16825.
    [108] Fu S, Hamilton M, Brandt L, Feldman A, Zhang Z, Freeman W T. FeatUp: A model-agnostic framework for features at any resolution. arXiv: 2403.10516, 2024.
    [109] Rombach R, Blattmann A, Lorenz D, Esser P, Ommer B. High-resolution image synthesis with latent diffusion models. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE, 2022. 10674-10685.
    [110] Deng H, Zhang Z, Bao J, Li X. Bootstrap fine-grained vision-language alignment for unified zero-shot anomaly localization. arXiv: 2308.15939, 2024.
    [111] Ma B, Liu Y S, Zwicker M, Han Z. Surface reconstruction from point clouds by learning predictive context priors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE, 2022. 6316-6327.
    [112] Cao Y, Xu X, Liu Z, Shen W. Collaborative discrepancy optimization for reliable image anomaly localization. IEEE Transactions on Industrial Informatics, 2023, 19(11): 10674−10683 doi: 10.1109/TII.2023.3241579
    [113] Wan Q, Gao L, Li X, Wen L. Industrial image anomaly localization based on Gaussian clustering of pretrained feature. IEEE Transactions on Industrial Electronics, 2022, 69(6): 6182−6192 doi: 10.1109/TIE.2021.3094452
    [114] Liu S, Zeng Z, Ren T, Li F, Zhang H, Yang J, et al. Grounding DINO: Marrying DINO with grounded pre-training for open-set object detection. In: Proceedings of the European Conference on Computer Vision. Milan, Italy: Springer, 2025. 38-55.
    [115] Lin T Y, Goyal P, Girshick R, He K, Dollár P. Focal loss for dense object detection. In: Proceedings of the IEEE International Conference on Computer Vision. Venice, Italy: IEEE, 2017. 2980-2988.
    [116] Milletari F, Navab N, Ahmadi S A. V-Net: Fully convolutional neural networks for volumetric medical image segmentation. In: Proceedings of the 4th International Conference on 3D Vision (3DV). Stanford, USA: IEEE, 2016. 565-571.
    [117] Wang H, Vasu P K A, Faghri F, Vemulapalli R, Farajtabar M, Mehta S, et al. SAM-CLIP: Merging vision foundation models towards semantic and spatial understanding. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE, 2024. 3635-3647.
    [118] Wang Z, Lu Y, Li Q, Tao X, Guo Y, Gong M, et al. CRIS: CLIP-driven referring image segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE, 2022. 11686-11695.
    [119] Xing Y, Wang X, Li Y, Huang H, Shi C. Less is more: On the over-globalizing problem in graph transformers. arXiv: 2405.01102, 2024.
    [120] Sun X, Hu P, Saenko K. DualCoOp: Fast adaptation to multi-label recognition with limited annotations. Advances in Neural Information Processing Systems, 2022, 35: 30569−30582 doi: 10.52202/068431-2216
    [121] Ouyang L, Wu J, Jiang X, Almeida D, Wainwright C, Mishkin P, et al. Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems, 2022, 35: 27730−27744 doi: 10.52202/068431-2011
    [122] Touvron H, Lavril T, Izacard G, Martinet X, Lachaux M A, Lacroix T, et al. LLaMA: Open and efficient foundation language models. arXiv: 2302.13971, 2023.
    [123] Zhang R, Guo Z, Zhang W, Li K, Miao X, Cui B, et al. PointCLIP: Point cloud understanding by CLIP. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE, 2022. 8552-8562.
    [124] Zhou Y, Gu J, Chiang T Y, Xiang F, Su H. Point-SAM: Promptable 3D segmentation model for point clouds. In: Proceedings of the International Conference on Learning Representations. Singapore: ICLR, 2025.
    [125] Cen J, Zhou Z, Fang J, Yang C, Shen W, Xie L, et al. Segment anything in 3D with radiance fields. International Journal of Computer Vision, 2025, 133(8): 5138−5160 doi: 10.1007/s11263-025-02421-7
    [126] Guo Z, Zhang R, Zhu X, Tang Y, Ma X, Han J, et al. Point-Bind & Point-LLM: Aligning point cloud with multi-modality for 3D understanding, generation, and instruction following. arXiv: 2309.00615, 2023.
    [127] Xu R, Wang X, Wang T, Chen Y, Pang J, Lin D. PointLLM: Empowering large language models to understand point clouds. In: Proceedings of the European Conference on Computer Vision. Milan, Italy: Springer, 2024.
    [128] Hong Y, Zhen H, Chen P, Zheng S, Du Y, Chen Z, et al. 3D-LLM: Injecting the 3D world into large language models. In: Proceedings of the Annual Conference on Neural Information Processing Systems. New Orleans, USA: Curran Associates, Inc., 2023.
Publication history
  • Received:  2025-11-03
  • Accepted:  2026-03-04
  • Published online:  2026-04-08
