The Classification, Applications, and Prospects of Prompt Learning in Computer Vision
-
摘要: 随着计算机视觉(Computer vision, CV)的快速发展, 人们对于提高视觉任务的性能和泛化能力的需求不断增长, 导致模型的复杂度与对各种资源的需求进一步提高. 提示学习(Prompt learning, PL)作为一种能有效地提升模型性能和泛化能力、重用预训练模型和降低计算量的方法, 在一系列下游视觉任务中受到了广泛的关注与研究. 然而, 现有的PL综述缺乏对PL方法全面的分类和讨论, 也缺乏对现有实验结果进行深入的研究以评估现有方法的优缺点. 因此, 本文对PL在CV领域的分类、应用和性能进行全面的概述. 首先, 介绍PL的研究背景和定义, 并简要回顾CV领域中PL研究的最新进展. 其次, 对目前CV领域中的PL方法进行分类, 包括文本提示、视觉提示和视觉—语言联合提示, 对每类PL方法进行详细阐述并探讨其优缺点. 接着, 综述PL在十个常见下游视觉任务中的最新进展. 此外, 提供三个CV应用的实验结果并进行总结和分析, 全面讨论不同PL方法在CV领域的表现. 最后, 基于上述讨论对PL在CV领域面临的挑战和机遇进行分析, 为进一步推动PL在CV领域的发展提供前瞻性的思考.

Abstract: With the rapid development of computer vision (CV), the growing demand for improved performance and generalization in visual tasks has led to a further increase in model complexity and resource requirements. Prompt learning (PL), as a method to effectively enhance model performance and generalization, reuse pre-trained models, and reduce computational costs, has gained extensive attention and research in a series of downstream visual tasks. However, existing PL surveys lack a comprehensive classification and discussion of PL methods, as well as in-depth analysis of experimental results to evaluate the strengths and weaknesses of current approaches. Therefore, this paper provides a comprehensive overview of the classification, applications, and performance of PL in the field of CV. Firstly, the research background and definition of PL are introduced, followed by a brief review of recent PL progress in CV. Secondly, PL methods in CV are categorized into text prompts, visual prompts, and vision-language joint prompts, with each category elaborated in detail and its strengths and limitations discussed. Next, recent advances of PL in ten common downstream visual tasks are reviewed. Additionally, experimental results from three CV applications are provided, summarized, and analyzed to comprehensively discuss the performance of different PL methods in CV.
Finally, based on the above discussions, the challenges and opportunities faced by PL in CV are analyzed, offering forward-looking insights to further advance the development of PL in the CV domain.
-
黑体作为红外设备的高精度定标源, 配合非接触式的辐射测温方法, 具有不破坏被测物体的优点, 但需要快速、高稳定地控制其温度[1-2]. 黑体辐射源应用于机载红外成像系统和红外制导系统的校正, 同时也是现场红外校准的主要定标源[3]. 黑体温度控制的好坏(例如温度稳定性、升降温响应速度、抗干扰能力等)直接影响红外设备的标定, 进而影响其他应用领域.
目前, 在工控领域, PID仍占据主导地位[4-5]. 采用传统的单一PID控制, 需根据不同的黑体设备由人工调节PID参数, 无法实现快速、高精度、高稳定的控制. 也有很多文献试图寻找更好的控制策略[6-10], 但基本上都是针对相关的特定场合进行控制. 黑体控制系统实质上是温度控制系统, 这类系统一般具有扰动性、滞后性, 且难以精确建模, 属于较难的控制问题. 一些先进的控制方法不断出现, 例如模型预测控制、滑模控制、自适应控制等[11-14], 但由于黑体温度控制的特殊性质, 这些方法未能得到有效应用. 下面分析一般的具有纯滞后的单回路反馈控制系统, 其闭环传递函数为
$$ \Phi(s) = \frac{Y(s)}{R(s)} = \frac{D(s)G(s)}{1+D(s)G(s){\rm{e}}^{-\tau s}}{\rm{e}}^{-\tau s} $$ (1)
式(1)分母中含有
${\rm{e}}^{-\tau s}$ 滞后环节, 降低了系统的稳定性, 从而大大降低了闭环系统的控制品质. 随着系统辨识理论的发展, 被控对象参数的精确辨识使得系统建模更加准确[15-17], 这使得Smith预估器进行补偿的方法变得更加切实可行. 但由于黑体型号众多, 且黑体温度控制系统具有非线性、时变性, 故采用单一的模型去匹配多种对象, 无法达到稳定的控制目的. 采用Smith预估器的方法也不是很理想, 它对模型的误差十分敏感, 抗干扰能力也比较差; 即便采用了Smith预估器, 仍需要工程师根据不同的黑体对象来调整PID参数. 时变系统一般会采用自适应的控制方式[18-19], 本文根据这种思想, 结合黑体加热器的特性, 对于环境温度变化较小的场合, 提出了一种自适应的方法来控制黑体温度[20-21]; 对于环境变化较大的场合, 通过自整定方式[22]获得当前参数, 提高系统的控制能力. 系统整体的控制框图如图1所示. 总体上以PID控制为核心, 进行相关扩展: 自整定模块负责PID参数的自动整定; 经过整定的参数需要与模糊算法的输出进行相关运算后, 进入PID运算环节, 模糊算法的加入进一步优化了黑体的温度控制性能; PID运算完成后, 针对黑体加热器具有加热和制冷双路输出的特点, 系统根据当前的状态进行动态分配输出.
$ {\rm{SV}} $ 为设定值, S为选通开关, 当自整定开启后, 旁路PID输出. FUZZY为模糊算法, CO为相关系数. OP对自整定参数与模糊算法的输出进行运算. ALLOC为双路分配算法, 其结果经过SCAL模块进行重映射.
1. 继电器法推演及黑体PID参数自整定
1.1 黑体温控系统建模
黑体($ G_2 $)的温度、加热器($ G_1 $)的输入与黑体($ G_2 $)本身的散热之间的关系如图2所示. $ G_1 $ 为加热器, $ G_2 $ 为黑体系统, $ i $ 为加热器驱动信号, $ Q_i $ 为黑体从加热器获得的热量, $ Q_o $ 为黑体在空气中散失的热量. 下面分别求得 $ G_1 $ 和 $ G_2 $. 一般地, 我们认为加热器是一阶系统, 其传递函数为
$$ G_{1}(s) = \frac{k_{1}}{\tau _{1}s+\alpha _{1}} $$ (2)
然后分析黑体温度与能量的关系. 由图2得知, 黑体的能量变化为
$$ Q_{i}-Q_{o} = \frac{{\rm d} {Q}}{{\rm d} {t}} $$ (3) 根据热力学中的比热容公式可得
$$ {Q_{i}-Q_{o}} = {cm}\frac{{\rm d} {T} }{{\rm d} {t}} $$ (4) 式中,
$ c $ 为黑体材料的比热容, $ m $ 为黑体的质量, $ T $ 为黑体的温度. 散失能量 $ Q_o $ 分为电磁辐射热交换和黑体空气流动热交换两部分. 由于电磁辐射热交换能量较小, 此处仅考虑空气流动热交换, 故 $ Q_o $ 近似为
$$ {Q_{o}\approx hA(T-T_{c})} $$ (5)
式中,
$ h $ 为热交换系数, $ A $ 为黑体与空气接触的表面积, $ T $ 为黑体当前温度, $ T_c $ 为空气当前温度. 将式(5)代入式(4)得到
$$ {Q_{i}-hA(T-T_{c}) = cm}\frac{{\rm d} {T}}{{\rm d} {t}} $$ (6)
两边取Laplace变换(取温度增量形式, 常值项 $ T_c $ 消去), 得
$$ {Q_{i}(s)-hAT(s) = cmsT(s)} $$ (7) 整理可得黑体传递函数为
$$ {G}_{2}{(s) = \frac{T(s)}{Q_{i}(s)} = \frac{1}{cms+hA}} $$ (8) 结合加热器, 得到整个黑体温控系统的传递函数
$G(s) = $ $ G_{1}(s)\times G_{2}(s){\rm{e}}^{-\tau{s}}$ 为$$ {G(s)} = \frac{{k}_{1}}{{\tau}_{1}{cms}^{2}+({\tau}_{1}{hA}+{\alpha}_{1}{cm}){s}+{\alpha}_{1}{hA}}{{\rm{e}}}^{{-\tau{s}}} $$ (9) 式中,
${\rm{e}}^{-\tau s}$ 为延迟环节, $ \tau $ 为延迟时间. 由此可得, 黑体温控系统为二阶迟滞系统, 其传递函数与黑体本身的比热容、质量、表面积以及加热器功率等条件相关. 为了验证上述传递函数建模是否正确, 结合实际黑体, 通过阶跃响应辨识法[23-24], 实测黑体的阶跃响应后, 估算出实验黑体的传递函数. 为了便于分析与推导, 将式(9)改写成式(10)的简单形式
$$ {G(s)} = \frac{{k}}{{s}^{2}+{as+b}}{{\rm{e}}^{-\tau s}} $$ (10)
式中,
$ k = k_{1}/(\tau_{1}cm) $, $ a = (\tau_1 hA+\alpha_1 cm)/(\tau_1 cm) $, $ b = \alpha_1hA/(\tau_1cm) $. 将式(10)写成时域形式
$$ \frac{{\rm d}^{2}{y(t)} }{{\rm d} {t}^{2}}+{a\frac{{\rm d} y(t)}{{\rm d} t}+by(t) = kx(t-\tau )+e(t)} $$ (11)
假定阶跃响应的输入信号
$ x(t) = hu(t) $ ,$ h $ 为阶跃信号的幅值,$ y(t) $ 为黑体的温度输出数据.$ \tau $ 为延时,$ e(t) $ 为噪声. 该系统初始状态为非零, 将该非零状态作为未知数, 一共引入6个未知变量, 若采用两次积分, 则无法完全求解, 故对式(11)进行三次积分[25], 令$ t>\tau $ , 得到$$ \begin{split} \int {y(t)} =\;& -b\iiint{y(t)}-a\iint{y(t)}+kh\frac{t^{3}}{6}+\\& \left(-kh\tau+\frac{{\rm d}y(0)}{{\rm d}t}+ay(0)\right)\frac{t^{2}}{2}+\\& \left (y(0)+\frac{kh\tau^{2}}{2}\right)t-kh\frac{\tau^{3}}{6}+\iiint{e(t)} \end{split} $$ (12) 令初始状态
$ c_0 = y(0) $ , $ c_1 = \dfrac{{\rm d} y(0)}{{\rm d} t} $ , 则可将式(12)改写成式(13)的形式
$$ \begin{split} \int y (t) = &\left[ {\begin{array}{*{20}{c}} { - \displaystyle\iiint {y(t)} }\;\;{ -\displaystyle \iint {y(t)} }\;\;{\dfrac{{{t^3}}}{6}}\;\;{ - \dfrac{{{t^2}}}{2}}\;\; t \;\;{ - 1} \end{array}} \right]\times\\ &\left[ {\begin{array}{*{20}{c}} b\\ a\\ {kh}\\ {kh\tau - {c_1} - a{c_0}}\\ {{c_0} + \dfrac{{kh{\tau ^2}}}{2}}\\ {kh\dfrac{{{\tau ^3}}}{6}} \end{array}} \right] \end{split} $$ (13)
式(13)中共6个未知数, 令
$$\begin{split} &{\boldsymbol{\phi}} (t) = \left[ {\begin{array}{*{20}{c}} { - \iiint {y(t)} }\;\;{ - \iint {y(t)} }\;\;{\dfrac{{{t^3}}}{6}}\;\;{ - \dfrac{{{t^2}}}{2}}\;\; t \;\; { - 1} \end{array}} \right]\\ &{\boldsymbol{\theta}} = {\left[ {\begin{array}{*{20}{c}} b \;\; a \;\; {kh} \;\; {kh\tau - {c_1} - a{c_0}} \;\; {{c_0} + \dfrac{{kh{\tau ^2}}}{2}} \;\; {kh\dfrac{{{\tau ^3}}}{6}} \end{array}}\right]^{\rm{T}}}\end{split} $$ 根据所采集的
$ N $ 个采样点组成如下线性方程组$$ {\boldsymbol \Gamma}(t) = {\boldsymbol \Phi}(t){\boldsymbol \theta} +\xi(t) $$ (14) 采样点必须满足
$ mT_s>\tau $ . 式中,$$ {\boldsymbol{\Gamma }}(t) = \left[ {\begin{array}{*{20}{c}} {\int_0^{m{T_s}} y (t){\rm{d}}t}\\ {\int_0^{(m + 1){T_s}} y (t){\rm{d}}t}\\ {\vdots}\\ {\int_0^{(m + N){T_s}} y (t){\rm{d}}t} \end{array}} \right] $$ $$ {\boldsymbol{\Phi }}(t) = \left[ {\begin{array}{*{20}{c}} {{\boldsymbol{\phi}} (m{T_s})}\\ {{\boldsymbol{\phi}} ((m + 1){T_s})}\\ {\vdots}\\ {{\boldsymbol{\phi}} ((m + N){T_s})} \end{array}} \right] \hspace{5pt}$$ $ \xi(t) $ 为零均值相关噪声, 对系统的识别影响很小, 很多情况下能满足实际需要[24]. 对式(14)进行运算, 使用最小二乘法估计参数得到$$ {\boldsymbol \theta} = ({\boldsymbol \Phi} ^{{\rm T}}(t){\boldsymbol \Phi} (t))^{-1}{\boldsymbol \Phi} ^{{\rm T}}(t){\boldsymbol \Gamma}(t) $$ (15) 在得到
$ {\boldsymbol \theta} $ 后, 便可得到系统的传递函数. 下面针对实验室的一台黑体, 进行阶跃响应(输入阶跃信号为10%)的数据采集, 将采样点代入式(15)计算, 得到黑体的传递函数为
$$ G(s) = \frac{0.0003183}{s^{2}+0.1738s-0.0004992}{\rm{e}}^{-10s} $$ (16)
式(16)为阶跃响应辨识得到的传递函数, 图3为传递函数的理论阶跃响应与实际阶跃响应曲线, 辨识得到的阶跃响应几乎与实际阶跃响应重合.
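下面给出式(12)~(15)所述三次积分最小二乘辨识的一个Python数值示意. 为便于自检, 此处用一个参数已知的二阶迟滞系统的解析阶跃响应生成数据(参数均为任意示意值, 并非文中黑体的实测数据), 再按式(13)的回归形式估计参数:

```python
import numpy as np

# 示意: 对 y'' + a*y' + b*y = k*h*u(t-tau) (零初始状态) 按式(13)回归.
a_true, b_true, kh_true, tau_true = 3.0, 2.0, 1.0, 1.0  # 示意真值

dt = 1e-3
t = np.arange(0.0, 6.0, dt)
s = np.clip(t - tau_true, 0.0, None)
# 极点为 -1, -2 时的解析阶跃响应: y = (kh/b)*(1 - exp(-s))^2
y = (kh_true / b_true) * (1.0 - np.exp(-s)) ** 2

def cumint(f, dt):
    """梯形法累积积分, 与 f 等长, 首元素为 0."""
    out = np.zeros_like(f)
    out[1:] = np.cumsum(0.5 * (f[1:] + f[:-1])) * dt
    return out

I1 = cumint(y, dt)    # ∫y
I2 = cumint(I1, dt)   # ∬y
I3 = cumint(I2, dt)   # ∭y

idx = t > 1.5 * tau_true   # 采样点须满足 t > tau
Phi = np.column_stack([-I3[idx], -I2[idx], t[idx] ** 3 / 6,
                       -t[idx] ** 2 / 2, t[idx], -np.ones(idx.sum())])
theta, *_ = np.linalg.lstsq(Phi, I1[idx], rcond=None)  # 式(15)

b_est, a_est, kh_est = theta[0], theta[1], theta[2]
tau_est = theta[3] / kh_est   # 零初始状态下 θ4 = kh*τ
```

零初始状态下, 估计值应接近所设真值; 对实测数据还需按式(14)考虑噪声项与非零初始状态.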
由式(16)得知, 实验黑体温控系统的传递函数为一个二阶低通滤波器: 输入信号为方波时, 该系统能将方波的高频分量滤除, 输出以低频分量为主的波形. 在输入方波信号的频率分量中, 基波占据较大能量, 所以继电器法仅取基波分量进行运算.
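这一滤波效应可用如下Python仿真示意(系统参数取任意稳定的示意值, 并非式(16)的实测参数): 对二阶系统输入方波, 稳态输出中三次谐波相对基波被显著衰减, 输出近似为正弦波.

```python
import numpy as np

# 二阶低通系统对方波输入的滤波示意(参数为示意值).
a_c, b_c, k_c = 3.0, 2.0, 1.0            # y'' + a*y' + b*y = k*x
T0 = 2.0                                  # 方波周期
w0 = 2.0 * np.pi / T0
dt = 1e-3
t = np.arange(0.0, 20.0, dt)
x = 0.5 * np.sign(np.sin(w0 * t))         # 式(18), 峰峰值 h = 1 的方波

# 前向欧拉数值积分
y = np.zeros_like(t)
v = 0.0
for i in range(1, len(t)):
    v += (k_c * x[i - 1] - a_c * v - b_c * y[i - 1]) * dt
    y[i] = y[i - 1] + v * dt

# 取稳态段(最后 5 个周期)用梯形法计算各次谐波幅值
m = t >= 10.0
ts, ys = t[m], y[m]

def harmonic(n):
    z = ys * np.exp(-1j * n * w0 * ts)
    integ = np.sum(0.5 * (z[1:] + z[:-1])) * dt
    return 2.0 / (ts[-1] - ts[0]) * abs(integ)

c1, c3 = harmonic(1), harmonic(3)         # 基波与三次谐波幅值
```

二阶低通在高频处增益按 $1/\omega^2$ 衰减, 而方波第 $n$ 次谐波幅值按 $1/n$ 衰减, 故输出谐波约按 $1/n^3$ 衰减, 基波显著占优.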
1.2 继电器法描述
根据上一节分析建立的系统模型, 信号通过系统时, 频域为乘积, 时域为卷积. 单脉冲信号与
$ h(t) $ 卷积, 得到一个振荡衰减的响应. 若加入一个负反馈, 使得输入变为周期信号, 则输出也会是一个周期信号. 输出与输入的基波比值, 可理解为PID中与比例相关的部分. 继电器法的实现如图4所示, $ E = SV-PV $. 根据 $ E $ 的变化, 可得到 $ Y $ 的输出:
$$ \begin{split} Y = \;&Y_{\max}\frac{1+{\rm sign}(E-\varepsilon {\rm sign}(\dot{E}))}{2}+ \\&Y_{\min}\frac{1-{\rm sign}(E-\varepsilon {\rm sign}(\dot{E}))}{2} \end{split} $$ (17)
式中,
${\rm{sign}}(x) = \left\{ {\begin{aligned}&{1,}&{x \ge 0}\\&{ - 1,}&{x < 0}\end{aligned}} \right.$. $ Y_{\max} $ 与 $ Y_{\min} $ 为硬件输出上下限, $ \varepsilon $ 为滞回区间. 结合上述黑体传递函数, 该方法相当于引入负反馈, 使系统处在一个稳定的振荡状态.
1.3 继电器法推演
假定继电器法的输入信号为图5所示.
方波的函数表达式为
$$ f(t) = \frac{h}{2}{\rm sign}\left(\sin\left(\frac{2{\text{π}} }{T}t\right)\right) $$ (18) 式中, sign为符号函数,
$ T $ 为方波周期. 分别求得傅里叶级数:$$ \left\{ {\begin{aligned} &{{a_n} = \frac{1}{{\text{π}} }\int_{ - {\text{π}} }^{\text{π}} f (x)\cos (nx){\rm{d}}x,}&{n = 0,1,2,\cdots}\\ &{{b_n} = \frac{1}{{\text{π}} }\int_{ - \pi }^{\text{π}} f (x)\sin (nx){\rm{d}}x,}&{n = 1,2,3,\cdots} \end{aligned}} \right. $$ (19) $$ \begin{split} a_{n} =\;& \frac{2}{T}\int_{-\frac{T}{2}}^{0}\left [ -\frac{h}{2} \right ]\cos(n\omega _{0}t){\rm d}t\;+\\ &\frac{2}{T}\int_{0}^{\frac{T}{2}}\left [ \frac{h}{2} \right ]\cos(n\omega _{0}t){\rm d}t = 0 \end{split} \hspace{50pt} $$ (20) $$ \begin{split} b_{n} =\;& \frac{2}{T}\int_{-\frac{T}{2}}^{0}\left [ -\frac{h}{2} \right ]\sin(n\omega _{0}t){\rm d}t\;+ \\& \frac{2}{T}\int_{0}^{\frac{T}{2}}\left [ \frac{h}{2} \right ]\sin(n\omega _{0}t){\rm d}t= \\ &\frac{h}{n\omega _{0}T}\left [ 2-2\cos(n{\text{π}} ) \right ] \end{split}\hspace{55pt} $$ (21) 即
$$ {b_n} = \left\{ \begin{aligned} &\frac{{2h}}{{n{\text{π}} }},\;\;\;\;n\text{为奇数}\\ &0,\;\;\;\;\;\;\;n\text{为偶数} \end{aligned} \right. $$ (22) 根据第1.2节描述, 输出波形为正弦波, 假定正弦波的周期是
$ T $ , 峰峰值为 $ a $ . 其傅里叶级数如下求得:$$ \left\{ {\begin{aligned} &{{a_n} = \frac{2}{T}\int_0^T {\frac{a}{2}} \sin ({\omega _0}t)\cos (n{\omega _0}t){\rm{d}}t,\;\;n = 0,1,2,\cdots}\\ &{{b_n} = \frac{2}{T}\int_0^T {\frac{a}{2}} \sin ({\omega _0}t)\sin (n{\omega _0}t){\rm{d}}t,\;\;\;n = 1,2,3,\cdots} \end{aligned}} \right. $$ (23)
根据三角函数的正交性
$$ \int_{-{\text{π}} }^{{\text{π}} }\cos(mx)\sin(nx){\rm{d}}x = 0,\;\;\;m,n = 1,2,3,\cdots $$ (24)
$$ \int_{ - {\text{π}} }^{\text{π}} {\sin } (mx)\sin (nx){\rm{d}}x = \left\{ {\begin{aligned} &0,&{m \ne n}\\ &{\text{π}},&{m = n} \end{aligned}} \right. $$ (25)
上述级数其余分量都为0, 仅
$ b_1 $ 为$$ \begin{split} b_{1} =\;& \frac{1}{T}\int_{0}^{T}a\sin(\omega_{0}t)\sin(\omega_{0}t){\rm d}t =\\ & \frac{1}{\omega _{0}T}\int_{0}^{\omega _{0}T}a\sin^{2}x{\rm d}x \end{split} $$ (26) 因为
$ \omega_0 = \dfrac{2{\text{π}} }{T} $ , 由此得到:$$ b_{1} = \frac{a}{2{\text{π}} }\int_{0 }^{2{\text{π}} }\sin^{2}(x){\rm{d}}x = \frac{a}{2} $$ (27)
输入波形的基波分量除以输出波形的基波分量即为临界增益
$$ K_{{\rm{pcrit}}} = \frac{2h}{{\text{π}} }\div \frac{a}{2} = \frac{4h}{{\text{π}} a} $$ (28) 式中,
$ h $ 为输入方波的峰峰值, $ a $ 为输出正弦波的峰峰值. 输入方波(或输出正弦波)的周期即为临界周期 $T_{{\rm{crit}}}$. 自此得到了临界增益 $ K_{{\rm{pcrit}}} $ 与临界周期 $ T_{{\rm{crit}}} $. 将此值代入表1所示的Ziegler-Nichols整定法则, 即可得到相应的PID参数.

表 1 Ziegler-Nichols整定法则
Table 1 Ziegler-Nichols setting rule
控制器类型 | Kp | Tn | Tv | Ki | Kd
P | 0.5·Kpcrit | — | — | — | —
PD | 0.8·Kpcrit | — | 0.12·Tcrit | — | Kp×Tv
PI | 0.45·Kpcrit | 0.85·Tcrit | — | Kp/Tn | —
PID | 0.6·Kpcrit | 0.5·Tcrit | 0.12·Tcrit | Kp/Tn | Kp×Tv

图6为实测黑体的数据. IN为控制板输出给黑体的信号, OUT为控制板采集的黑体温度.
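按式(28)与表1的PID行计算整定参数的过程可用如下Python片段示意(h、a、T_crit 均取示意值, 并非下文的实测数据):

```python
import numpy as np

def zn_pid(h, a, T_crit):
    """由继电实验结果按式(28)与表1的PID行整定参数.
    h: 输入方波峰峰值; a: 输出正弦波峰峰值; T_crit: 临界周期."""
    Kp_crit = 4.0 * h / (np.pi * a)      # 式(28): 临界增益
    Kp = 0.6 * Kp_crit                   # 表1 PID 行
    Tn = 0.5 * T_crit
    Tv = 0.12 * T_crit
    return {"Kp": Kp, "Ki": Kp / Tn, "Kd": Kp * Tv}

params = zn_pid(h=100.0, a=2.0, T_crit=30.0)
```

离散实现时可按文中方式换算: DKp = Kp·Tadc, DKi = Tadc·DKp/Tn, DKd = DKp·Tv/Tadc, 其中 Tadc 为采样周期.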
从原始数据中可分析得到临界状态,
$ T_{{\rm{crit}}} $ = 32.599998, $ h $ = 150.00, $ a $ = 0.518509. 将临界值代入式(28), 得到临界增益 $ K_{{\rm{pcrit}}} $ ≈ 386.3368. 根据表1中所列的Ziegler-Nichols整定法则, 本文采用PID方式控制, 得到PID的3个参数: $ K_p $ = 231.80208, $ K_i $ = 14.2210, $ K_d $ = 906.8097. 实际系统为ADC采样, 采样时间 $ T_{{\rm{adc}}} $ = 0.392 s, 离散化后 $ DK_p = K_p\cdot T_{{\rm{adc}}} $ = 90.8664, $ DK_i = T_{{\rm{adc}}}\cdot DK_p/T_n $ = 2.1853, $ DK_d = DK_p\cdot T_v/T_{{\rm{adc}}} $ = 906.8097. 以上便是实验黑体的自整定PID参数, 按照0.392 s为周期进行运算.
2. 改进黑体温度控制
经过上述实验, 已经得到PID相关参数, 但这仅仅是获得了基本参数, 为了能更快、更稳地控制黑体温度, 结合黑体系统本身的特性来改进性能.
2.1 自适应动态双路输出
实验黑体具有两路输入: 一路加热, 一路制冷. 从上述黑体传递函数的推导中可知, 黑体温度的控制与散热及环境温度相关, 是一个非线性时变系统. 为了减少环境的影响并实现快速稳定的调节, 本文描述了一种动态双路输出算法, 包括评估器与动态分配算法两部分, 其基本原理框图如图7所示.
评估因子只是影响动态分配的功率大小, 评估因子越大, 其最后动态分配的功率越大. 为了加快评估器的运行效率, 外部独立输入
$ E(t) $, $ {\rm d}E(t)/{\rm d}t $, $ SV $, $ T_b $, $ T_e $ 值, 它们分别表示PID误差输入、误差的导数、设定值、黑体温度反馈以及环境温度. 在一般温差变化不大的环境中, $ T_e $ 可以采用系统默认值, 不做动态跟随. 衰减因子由黑体加热器和制冷器的功率决定. 评估器的输出为归一化输出, 经过动态分配后, 变成符合黑体要求的信号. $ {\rm PID}_{{\rm{out}}} $ 为当前PID输出.
评估器根据所给定的误差、误差变化速度、当前设定值以及环境温度等参数, 给出归一化的输出、限定值与衰减因子. 当误差较大时, 评估器满负荷输出; 结合误差变化率, 若误差变化率大, 则评估器输出减小. 这些操作都与当前温度以及环境温度有关, 即同样的条件下, 若当前黑体温度远高于环境温度, 则需要考虑黑体散热导致的输出功率补偿. 当误差达到一定程度时, 根据评估因子的给定, 会开启双路输出, 其目的是消减环境造成的影响. 黑体温度高于环境温度越多, 所需的制冷补偿越大; 同理, 低于环境温度越多, 所需的加热补偿越大. 在环境温度附近时, 最容易被干扰, 需要提升双路输出值. 评估器包含一系列规则的总结, 给出一个合理的输出. 各因子关系为
$$ f(n) = \left\{ {\begin{aligned} &0,&&{\left| {e(n)} \right| > \theta }\\ &{y(n)},&&{\left| {e(n)} \right| < \theta\;\text{且}\;\left| {e(n - 1)} \right| > \theta }\\ &\text{保持},&&{\left| {e(n)} \right| < \theta\;\text{且}\;\left| {e(n - 1)} \right| < \theta } \end{aligned}} \right. $$ (29)
$$ \begin{split} H_{{\rm{out}}} =\;& \eta _{h}(f(n),y(n)-y(n-1))\,\times\\ &\lambda_{h}\left(SV,T_{e},e(n),\frac{\Delta e(n)}{\Delta n}\right) \end{split} $$ (30)
$$ \begin{split} C_{{\rm{out}}} =\;& \eta _{c}(f(n),y(n)-y(n-1))\,\times\\ &\lambda_{c}\left(SV,T_{e},e(n),\frac{\Delta e(n)}{\Delta n}\right) \end{split} $$ (31)
$$ H_{{\rm{out}}}+C_{{\rm{out}}} = \varepsilon (f(n),SV,T_{e}) $$ (32)
式中,
$ y(n) $ 为单次PID输出, $ \theta $ 为相关因子, $ e(n) $ 为当前归一化后的误差. 所以, 最后的动态输出与误差、误差变化率及 $ f(n) $ 有关, 总是保持着一种动态的平衡. 双路输出具有抗环境扰动的能力. 假定环境干扰因素为 $ \alpha $, 若仅采用单路加热输出, 假定输出因数为 $ \beta $, 那么环境对当前调节的影响为 $ \alpha/\beta $. 若采用双路方式, 假定当前制冷为 $\gamma$, 制冷的扰动影响为 $ \Delta \gamma $, 则最终的影响为 $ (\alpha +\Delta \gamma )/((\beta +\gamma )+\gamma ) $. 将这两个影响值之比记为 $ \mu $, 该值越小, 表明环境对黑体的影响越小. $ \mu $ 的表达式为
$$ \begin{split} \mu =\;& \frac{(\alpha +\Delta \gamma )/(\beta +2\gamma )}{\alpha /\beta }=\\ & \frac{1+\Delta \gamma /\alpha }{1+2\gamma /\beta } \propto\frac{\Delta \gamma /\alpha }{2\gamma /\beta } = \frac{\Delta \gamma /\gamma }{2\alpha /\beta } \end{split} $$ (33)
式中,
$ \Delta \gamma /\gamma $ 为制冷导致的扰动, $ \alpha /\beta $ 为环境扰动影响(包括热交换能力和环境温度). 由于制冷输出以及制冷器的设计是满足黑体精度需求的, 所以此处 $ \Delta \gamma /\gamma \ll 2\alpha /\beta $, 故 $ \mu<1 $. 可以得出, 制冷 $ \gamma $ 的加入能提高黑体温控系统的抗环境扰动性能, 且从式(33)中可以看出, 制冷量 $ \gamma $ 越大, 环境扰动影响值 $ \mu $ 越小. 同理, 若处于制冷状态, 加热器的输出可作为扰动补偿. 图8中H曲线表示加热, C曲线表示制冷. 图8(a)中, 开始是稳定状态, 当设定一个阶跃值时, 加热100%, 制冷为0%; 当快接近目标温度时, 开始动态分配; 最后稳定时, 两个输出趋于稳定.
2.2 模糊算法
模糊算法[7]的本质是通过判定误差以及误差的微分, 根据一定的规则, 动态修正当前PID参数. 模糊算法的引入, 大幅提高了系统的调节速度, 并使阶跃响应获得较小的过冲. 本文在常规模糊算法的基础上做了适当修改, 即在其给出的修正系数基础上, 结合当前加热制冷的功率以及温度区间, 乘以一个相关系数(图9), 使得系统更加稳定.
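模糊修正的基本机制(查规则表、去模糊)可用如下极简Python示意. 为简洁起见, 此处仅用 NB/ZO/PB 三档和一个示意规则表, 并非表2的完整七档规则, 也未包含本文的相关系数二次修正:

```python
import numpy as np

# 模糊修正 Kp 的最小示意: 按误差 E 与误差变化率 EC 查规则表,
# 加权平均去模糊后得到修正量(论域归一化到 [-1, 1]).
LABELS = [-1.0, 0.0, 1.0]                  # NB, ZO, PB 三档的中心
RULE_DKP = np.array([[1.0,  1.0,  0.0],    # 行: E 为 NB/ZO/PB
                     [0.5,  0.0, -0.5],    # 列: EC 为 NB/ZO/PB
                     [0.0, -1.0, -1.0]])   # 规则值仅为示意

def tri_mf(x, c, width=1.0):
    """三角隶属度函数, 中心 c, 半宽 width."""
    return max(0.0, 1.0 - abs(x - c) / width)

def fuzzy_dkp(e, ec):
    """对归一化的 e, ec 输出 Kp 修正量(加权平均去模糊)."""
    num = den = 0.0
    for i, ce in enumerate(LABELS):
        for j, cc in enumerate(LABELS):
            w = tri_mf(e, ce) * tri_mf(ec, cc)   # 规则激活强度
            num += w * RULE_DKP[i, j]
            den += w
    return num / den if den > 0 else 0.0
```

Ki、Kd 的修正量可用同样的查表结构各配一张规则表得到.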
模糊控制器的输入为误差(
$ E(t) $ )和误差的变化率(${\rm{d}} E(t)/{\rm{d}} t $ ), 并不关心当前设定值($ SV $ ). 本文结合黑体特性, 对模糊控制器的输出进行一个相关因子的运算, 二次修正模糊控制器的输出. 黑体实测温度越高时, 需要相关因子系数越大. 下面给出三个系数的模糊规则, 将比例、积分、微分的规则合并在一个表中, 如表2所示.表 2 比例积分微分模糊规则Table 2 Proportional integral differential fuzzy ruleP, I, D NB(EC) NM(EC) NS(EC) ZO(EC) PS(EC) PM(EC) PB(EC) NB(E) PB, NB, PS PB, NB, NS PM, NM, NB PM, NM, NB PS, NS, NB ZO, ZO, NM ZO, ZO, PS NM(E) PB, NB, PS PB, NB, NS PM, NM, NB PS, NS, NM PS, NS, NM ZO, ZO, NS NS, ZO, ZO NS(E) PM, NB, ZO PM, NM, NS PM, NS, NM PS, NS, NM ZO, ZO, NS NS, PS, PS NS, PS, ZO ZO(E) PM, NM, ZO PM, NM, NS PS, NS, PS ZO, ZO, NS NS, NS, NS NM, NM, NS NM, NM, ZO PS(E) PS, NM, ZO PS, NS, ZO ZO, ZO, ZO NS, PS, ZO NS, PS, ZO NM, PM, ZO NM, PB, ZO PM(E) PS, ZO, PB ZO, ZO, NS NS, PS, PS NM, PS, PS NM, PM, PS NM, PB, PS NB, PB, PB PB(E) ZO, ZO, PB ZO, ZO, PM NM, PS, PM NM, PM, PM NM, PM, PS NB, PB, PS NB, PB, PB 3. 实测数据
对上述理论进行实际验证, 主要从不同条件下的阶跃响应、抗扰度、精度三个方面的实验数据进行对比. 实验数据采用绝对误差积分 $ \left(IAE = \int_{0}^{\infty }\left | e(t) \right |{\rm d}t\right) $ 、时间乘绝对误差积分 $ \left(ITAE = \int_{0}^{\infty }t\left | e(t) \right |{\rm d}t\right) $ 、系统过冲 $ (PV = \max(y_{n})-T) $ 、输出波动总和 $\left(TV = \sum\nolimits _{n = 0}^{N}\left | y_{n+1}-y_{n} \right|\right)$ 、最大绝对误差 $ (DEL = \max(\left | y_{n} -T\right |)) $ 、绝对精度 $ (AA = DEL/T) $ 、均方差 $\left(STD = \sqrt{\frac{1}{N-1}\sum\nolimits _{n = 1}^{N}\left | y_{n} -\mu \right |^{2}}\right)$ 等指标进行比较, 上述公式中的 $ \mu $ 为均值, $ T $ 为实验目标值. 根据不同的实验, 采用上述不同的参数标准进行评判. 对比的实验条件约定如下:
1)条件S: 控制系统只有单路输出, 即无第2.1节所描述的功能.
2)条件D: 控制系统双路输出, 即增加了第2.1节所描述的功能.
3)条件SF: 在条件S的基础上增加了第2.2节所描述的模糊功能.
4)条件DF: 在条件D的基础上增加了第2.2节所描述的模糊功能.
对每个实验, 做一个简单的综合指标, 即将分立指标相加, 再对条件S取归一化运算.
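上述指标的离散计算可用如下Python函数示意(y 为等间隔采样序列, T_goal 为目标值, dt 为采样周期; 求和为积分的离散近似, 变量名均为示意):

```python
import numpy as np

def metrics(y, T_goal, dt):
    """按文中定义计算 IAE、ITAE、PV、TV、DEL、AA、STD 的离散近似."""
    y = np.asarray(y, dtype=float)
    e = y - T_goal
    tt = np.arange(len(y)) * dt
    return {
        "IAE": np.sum(np.abs(e)) * dt,        # ∫|e(t)|dt
        "ITAE": np.sum(tt * np.abs(e)) * dt,  # ∫t|e(t)|dt
        "PV": np.max(y) - T_goal,             # 系统过冲
        "TV": np.sum(np.abs(np.diff(y))),     # 输出波动总和
        "DEL": np.max(np.abs(e)),             # 最大绝对误差
        "AA": np.max(np.abs(e)) / T_goal,     # 绝对精度
        "STD": np.std(y, ddof=1),             # 均方差(样本标准差)
    }

m = metrics([0.0, 1.0, 2.0, 1.0], T_goal=1.0, dt=1.0)
```

将各条件下的指标除以条件S的对应指标, 即得文中的归一化综合比较.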
3.1 阶跃响应及抗扰度
先设定黑体温度为50 ℃, 稳定后, 再将温度设定为55 ℃, 观察输出曲线. 不同条件下的曲线, 如图10(a)所示, STEP为给定阶跃响应值. 黑体稳定运行时, 人为加入一个4 s的扰动信号, 如图10(b)所示. 从图10可以看出, 未做出任何改进的单路输出性能最差, 双路要优于单路. 同时, 模糊算法的加入使得系统超调量变小. 性能最好的DF曲线, 几乎没有过冲, 且调节振荡比其他3种情况都小. 阶跃响应的综合性能由优到劣依次为
$DF \gg D \gg SF \gg S$, 抗扰动的综合性能由优到劣依次为 ${\rm{D}}{{\rm{F}}_{{\rm{int}}}} \gg {{\rm{D}}_{{\rm{int}}}} \gg {\rm{S}}{{\rm{F}}_{{\rm{int}}}} \gg {{\rm{S}}_{{\rm{int}}}}$. 从中也可以看出, 双路算法性能优于模糊算法. 根据原始数据, 依次计算 $ IAE $, 该值反映阶跃收敛振荡的强烈程度, 为使结果显著, 从阶跃跳变点开始计算. $ ITAE $ 值反映收敛程度, $ PV $ 反映过冲强度, $ TV $ 为输出波动总和. 在同一条件下, $ TV $ 也可以作为阶跃响应的比较性能. 对4种情况进行比较, S条件性能最差. 为了有统一的比较准则, 将所有实验数据在S条件下做归一化运算. 阶跃响应(抗扰性能)如表3所示.

表 3 阶跃响应(抗干扰)性能指标
Table 3 Step response (anti-interference) performance index
条件 | IAE | ITAE | PV | TV | 综合1 (综合2)
S | 1.000000 (1.000000) | 1.000000 (1.000000) | 1.000000 (1.000000) | 1.000000 (1.000000) | 1.000000 (1.000000)
D | 0.847483 (0.723668) | 0.562693 (0.678478) | 0.442698 (0.805442) | 0.762998 (0.907009) | 0.653968 (0.778649)
SF | 0.943743 (0.992518) | 0.807751 (1.004470) | 0.633536 (0.944839) | 0.851171 (1.013720) | 0.809050 (0.988887)
DF | 0.843329 (0.520340) | 0.525302 (0.432016) | 0.042592 (0.806038) | 0.642354 (0.805883) | 0.513394 (0.641069)

3.2 稳定精度
在黑体温度控制稳定以后, 采集一段数据进行观察并分析. 图11所示为温度轴放大后的温度采集数据.
从表4的统计中可以看出, 双路的精度优于单路, 其均方差也优于单路. 模糊算法的加入使得精度变差, 但仍属于同一数量级的改变. 综合考虑, 该精度损失不足以影响实际黑体控制(即DF的精度已远远满足黑体控制的需求).
$ TV $ 表示输出波动总和, 对S项做归一化后可以看出, 双路输出的波动性能明显优于单路.

表 4 稳定精度测试(55 ℃)
Table 4 Stability accuracy testing (55 ℃)
条件 | 绝对误差 (℃) | 绝对精度 | 均方差 | TV | 综合3
S | 0.003979 | 0.0000723455 | 0.00163144 | 1.000000 | 1.000000
D | 0.002308 | 0.0000419636 | 0.000764468 | 0.846146 | 0.844462
SF | 0.003132 | 0.0000569455 | 0.00125763 | 0.954824 | 0.953850
DF | 0.002628 | 0.0000477818 | 0.000786771 | 0.885582 | 0.884021
通过上述三个方面的性能指标可以看出, 双路优于单路, 模糊算法的加入对阶跃响应和抗扰度有较好的性能提升, 但对稳定精度并无太大改进. 将上述综合指标进行加权, 得到最终的性能指标, 如表5所示.
表 5 性能指标
Table 5 Performance index
条件 | 综合1 | 综合2 | 综合3 | 性能指标
S | 1.000000 | 1.000000 | 1.000000 | 1.000000
D | 0.653968 | 0.778649 | 0.844462 | 0.759026
SF | 0.809050 | 0.988887 | 0.953850 | 0.917262
DF | 0.513394 | 0.641069 | 0.884021 | 0.679495

综合考虑各个性能指标, 并以S为参考, 值越小, 性能越好. 双路加模糊算法(DF)的性能最好, 其次是双路输出(D), 再次是单路带模糊算法(SF), 最后是单路输出(S). 实验结果与理论推导及设计思想保持一致.
4. 结论
针对黑体这种具有迟滞特性的温度控制系统, 通常的做法是通过Smith预估器来弥补系统的延时, 但对现场的被控设备无法做到精确的数学建模. 同时, 黑体温度控制系统本身就是一个非线性时变系统, 无法用单一的实验所得模型进行替代. 本文结合黑体的物理模型进行理论建模, 根据阶跃响应法得出系统传递函数, 然后分析继电器法的实现, 再结合Ziegler-Nichols整定规则, 得到不同黑体的PID参数, 降低了现场工程师手动整定PID参数的难度. 针对黑体环境因素带来的时变性, 环境变化较大时采用自整定方式解决, 环境扰动较小时采用双路动态输出法来弥补. 最后, 通过对实际黑体的实验测量与数据分析, 并对相关性能指标进行归一化, 验证了本文所述方法对黑体温度控制系统的性能有较大改进和提升.
-
图 3 文本提示((a)基于手工设计的文本提示; (b)连续提示; (c)基于梯度引导的文本提示; (d)基于视觉映射到语言空间的提示; (e)基于图像引导的文本提示; (f)基于伪标签的文本提示; (g)基于多任务的文本提示)
Fig. 3 Text prompts ((a) Hand-crafted text prompt; (b) Continuous prompt; (c) Gradient-guided text prompt; (d) Prompt based on the mapping from vision to the language space; (e) Image-guided text prompt; (f) Pseudo-label-based text prompt; (g) Multi-task text prompt)
图 4 视觉提示((a)基于像素扰动的视觉提示; (b)基于提示tokens的视觉提示; (c)基于提示模块的视觉提示; (d)基于上下文样例模板的视觉提示; (e)基于网络结构搜索的视觉提示)
Fig. 4 Visual prompts ((a) Pixel perturbation-based visual prompt; (b) Prompt tokens-based visual prompt; (c) Prompt module-based visual prompt; (d) Contextual example template-based visual prompt; (e) Network architecture search-based visual prompt)
图 5 在视觉—语言模型上引入视觉—语言联合提示的四种方法对比((a)独立训练两种模态的提示; (b)共享地训练两种模态的提示; (c)使用两个MLP层来生成提示; (d)使用一个轻量级的自注意力网络来生成提示)
Fig. 5 Comparison of four methods for introducing vision-language joint prompts in vision-language models ((a) Independently train the prompts of the two modalities; (b) Train the prompts of two modalities in a shared manner; (c) Utilizing two MLP layers to generate prompts; (d) Employing a lightweight self-attention network to generate prompts)
表 1 CV领域视觉与多模态基础大模型及其参数量
Table 1 Vision and multimodal foundational large models in CV with their parameter size
类别 | 模型 | 年份 | 参数量
视觉 | DETR | 2020 | 40M
视觉 | Vision Transformer | 2021 | 86M$\sim$632M
视觉 | DINOv2 | 2023 | 1.1B
视觉 | LVM | 2023 | 300M$\sim$3B
多模态 | CLIP | 2021 | 400M$\sim$1.6B
多模态 | SAM | 2023 | 1B
多模态 | MiniGPT-4 | 2023 | 13B
多模态 | LLaVA | 2023 | 7B$\sim$13B
多模态 | Yi-VL | 2024 | 6B$\sim$34B

表 2 图像分类任务中提示方法和非提示方法的性能对比(加粗表示性能最优, 下划线表示性能次优)
Table 2 In the task of image classification, a comparison of the performance between prompted and unprompted methods is presented (Bold indicates the best performance and underline indicates the second-best performance)
预训练模型: ViT-B-22K(前5列)与 Swin-B-22K(后5列); 其中全面微调、线性探测为非PL方法, VP、VPT、DAM-VP为PL方法.
数据集 | 全面微调 (%) | 线性探测 (%) | VP (%) | VPT (%) | DAM-VP (%) | 全面微调 (%) | 线性探测 (%) | VP (%) | VPT (%) | DAM-VP (%)
CIFAR10 | 97.4 | 96.3 | 94.2 | 96.83 | 97.3 | 98.3 | 96.3 | 94.8 | 96.9 | 97.3
CIFAR100 | 68.9 | 63.4 | 78.7 | 78.8 | 88.1 | 73.3 | 61.6 | 80.6 | 80.5 | 88.1
Food-101 | 84.9 | 84.4 | 80.5 | 83.3 | 86.9 | 91.7 | 88.2 | 83.4 | 90.1 | 90.5
DTD | 64.3 | 63.2 | 59.5 | 65.8 | 73.1 | 72.4 | 73.6 | 75.1 | 78.5 | 80.0
SVHN | 87.4 | 36.6 | 87.6 | 78.1 | 87.9 | 91.2 | 43.5 | 80.3 | 87.8 | 81.7
CUB-200 | 87.3 | 85.3 | 84.6 | 88.5 | 87.5 | 89.7 | 88.6 | 86.5 | 90.0 | 90.4
Stanford Dogs | 89.4 | 86.2 | 84.5 | 90.2 | 92.3 | 86.2 | 85.9 | 81.3 | 84.8 | 88.5
Flowers102 | 98.8 | 97.9 | 97.7 | 99.0 | 99.2 | 98.3 | 99.4 | 98.6 | 99.3 | 99.6

表 3 从基类到新类的泛化设置下CLIP、CoOp、CoCoOp和MaPLe的对比(HM代表对基类和新类的准确率取调和平均值, 加粗表示性能最优)
Table 3 Comparison of CLIP, CoOp, CoCoOp and MaPLe under the generalization setting from base class to new class (HM denotes the harmonic mean of the accuracies on both base and new classes, bold indicates the best performance)
数据集 | CLIP (Base/New/HM, %) | CoOp (Base/New/HM, %) | CoCoOp (Base/New/HM, %) | MaPLe (Base/New/HM, %)
ImageNet | 72.43 / 68.14 / 70.22 | 76.47 / 67.88 / 71.92 | 75.98 / 70.43 / 73.10 | 76.66 / 70.54 / 73.47
Caltech101 | 96.84 / 94.00 / 95.40 | 98.00 / 89.81 / 93.73 | 97.96 / 93.81 / 95.84 | 97.74 / 94.36 / 96.02
OxfordPets | 91.17 / 97.26 / 94.12 | 93.67 / 95.29 / 94.47 | 95.20 / 97.69 / 96.43 | 95.43 / 97.76 / 96.58
StanfordCars | 63.37 / 74.89 / 68.65 | 78.12 / 60.40 / 68.13 | 70.49 / 73.59 / 72.01 | 72.94 / 74.00 / 73.47
Flowers102 | 72.08 / 77.80 / 74.83 | 97.60 / 59.67 / 74.06 | 94.87 / 71.75 / 81.71 | 95.92 / 72.46 / 82.56
Food-101 | 90.10 / 91.22 / 90.66 | 88.33 / 82.26 / 85.19 | 90.70 / 91.29 / 90.99 | 90.71 / 92.05 / 91.38
FGVCAircraft | 27.19 / 36.29 / 31.09 | 40.44 / 22.30 / 28.75 | 33.41 / 23.71 / 27.74 | 37.44 / 35.61 / 36.50
SUN397 | 69.36 / 75.35 / 72.23 | 80.60 / 65.89 / 72.51 | 79.74 / 76.86 / 78.27 | 80.82 / 78.70 / 79.75
DTD | 53.24 / 59.90 / 56.37 | 79.44 / 41.18 / 54.24 | 77.01 / 56.00 / 64.85 | 80.36 / 59.18 / 68.16
EuroSAT | 56.48 / 64.05 / 60.03 | 92.19 / 54.74 / 68.69 | 87.49 / 60.04 / 71.21 | 94.07 / 73.23 / 82.35
UCF101 | 70.53 / 77.50 / 73.85 | 84.69 / 56.05 / 67.46 | 82.33 / 73.45 / 77.64 | 83.00 / 78.66 / 80.77
平均值 | 69.34 / 74.22 / 71.10 | 82.69 / 63.22 / 71.66 | 80.47 / 71.69 / 75.83 | 82.28 / 75.14 / 78.55

表 4 ADE20K数据集上提示方法和非提示方法的语义分割性能对比(加粗表示性能最优, 下划线表示性能次优)
Table 4 Comparison of semantic segmentation performance on the ADE20K dataset between prompted and unprompted methods (Bold indicates the best performance and underline indicates the second-best performance)
类别 | 方法 | 参数量 (M) | mIoU (%)
PL方法 | SPM | 14.9 | 45.05
PL方法 | VPT | 13.39 | 42.11
PL方法 | AdaptFormer | 16.31 | 44.00
PL方法 | SAM | — | 53.0
PL方法 | EfficientSAM | — | 51.8
非PL方法 | fully tuning | 317.29 | 47.53
非PL方法 | head tuning | 13.14 | 37.77

表 5 COCO数据集上提示方法和非提示方法的实例分割性能对比(加粗表示性能最优, 下划线表示性能次优)
Table 5 Comparison of instance segmentation performance on the COCO dataset between prompted and unprompted methods (Bold indicates the best performance and underline indicates the second-best performance)
类别 | 方法 | mAP (%)
PL方法 | SAM | 46.8
PL方法 | EfficientSAM | 44.4
PL方法 | HQ-SAM | 49.5
PL方法 | PA-SAM | 49.9
非PL方法 | Mask2Former | 43.7
非PL方法 | OneFormer | 45.6

表 6 多模态跟踪任务中提示方法和非提示方法的性能对比(加粗表示性能最优, 下划线表示性能次优)
Table 6 Performance comparison between prompted and unprompted methods in multimodal tracking tasks (Bold indicates the best performance and underline indicates the second-best performance)
类别 | 方法 | RGBT234 precision (%) | RGBT234 success (%) | LasHeR precision (%) | LasHeR success (%)
PL方法 | TaTrack | 87.2 | 64.4 | 85.3 | 61.8
PL方法 | MPLT | 88.4 | 65.7 | 72.0 | 57.1
PL方法 | ViPT | 83.5 | 61.7 | 65.1 | 52.5
PL方法 | ProTrack | 79.5 | 59.9 | 53.8 | 42.0
非PL方法 | OsTrack | 72.9 | 54.9 | 51.5 | 41.2
非PL方法 | FANet | 78.7 | 55.3 | 44.1 | 30.9
非PL方法 | SGT | 72.0 | 47.2 | 36.5 | 25.1
-
[1] Xu M, Yin W, Cai D, Yi R, Xu D, Wang Q, et al. A survey of resource-efficient llm and multimodal foundation models. arXiv preprint arXiv: 2401.08092, 2024. [2] Zhou J, Chen Y, Hong Z, Chen W, Yu Y, Zhang T, et al. Training and Serving System of Foundation Models: A Comprehensive Survey. IEEE Open Journal of the Computer Society, DOI: 10.1109/OJCS.2024.3380828 [3] Liu Z, Yu X, Fang Y, Zhang X. Graphprompt: Unifying pre-training and downstream tasks for graph neural networks. In: Proceedings of the ACM Web Conference. Austin, USA: 2023. 417-428 [4] Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys, 2023, 55(9): 1−35 [5] Oquab M, Darcet T, Moutakanni T, Vo H, Szafraniec M, Khalidov V, Fernandez P, Haziza D, Massa F, El-Nouby A. Dinov2: Learning robust visual features without supervision. arXiv preprint arXiv: 230407193, 2023. [6] Radford A, Kim J W, Hallacy C, Ramesh A, Goh G, Agarwal S, et al. Learning transferable visual models from natural language supervision. In: Proceedings of the International Conference on Machine Learning. Virtual Event: PMLR, 2021. 8748-8763 [7] Kirillov A, Mintun E, Ravi N, Mao H, Rolland C, Gustafson L, et al. Segment Anything. In: Proceeding of 2023 IEEE/CVF International Conference on Computer Vision (ICCV). Paris, France: IEEE, 2023. 3992-4003 [8] 廖宁, 曹敏, 严骏驰. 视觉提示学习综述. 计算机学报, 2024, 47(04): 790−820Liao Ning, Cao Min, Yan Jun-Chi. Visual prompt learning: a survey. Chinese Journal of Computers, 2024, 47(04): 790−820 [9] Zang Y, Li W, Zhou K, Huang C, Loy C C. Unified vision and language prompt learning. arXiv: 2210.07225, 2022 [10] Khattak M U, Rasheed H, Maaz M, Khan S, Khan F S. Maple: Multi-modal prompt learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver, BC, Canada: IEEE, 2023. 19113-19122 [11] Chen S, Ge C, Tong Z, Wang J, Song Y, Wang J, et al. 
Adaptformer: Adapting vision transformers for scalable visual recognition. arXiv: 2205.13535, 2022 [12] Deng J, Dong W, Socher R, Li L J, Li K, Fei-Fei L. Imagenet: A large-scale hierarchical image database. 2009 IEEE Conference on Computer Vision and Pattern Recognition. Miami, FL, USA: IEEE, 2009. 248-255 [13] Zhou K, Yang J, Loy C C, Liu Z. Learning to prompt for vision-language models. International Journal of Computer Vision, 2022, 130(9): 2337−2348 doi: 10.1007/s11263-022-01653-1 [14] Zhou K, Yang J, Loy C C, Liu Z. Conditional prompt learning for vision-language models. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, LA, USA: IEEE, 2022. 16816-16825 [15] Derakhshani M M, Sanchez E, Bulat A, da Costa V G, Snoek C G, Tzimiropoulos G, et al. Bayesian prompt learning for image-language model generalization. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. Vancouver, BC, Canada: IEEE, 2023. 15237-15246 [16] Yao H, Zhang R, Xu C. Visual-language prompt tuning with knowledge-guided context optimization. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. Vancouver, BC, Canada: IEEE, 2023. 6757-6767 [17] Bulat A, Tzimiropoulos G. Lasp: Text-to-text optimization for language-aware soft prompting of vision & language models. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver, BC, Canada: IEEE, 2023. 23232-23241 [18] Zhu B, Niu Y, Han Y, Wu Y, Zhang H. Prompt-aligned gradient for prompt tuning. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. Paris, France: IEEE, 2023. 15659-15669 [19] Huang T, Chu J, Wei F. Unsupervised prompt learning for vision-language models. arXiv preprint arXiv: 2204.03649, 2022. [20] Shen S, Yang S, Zhang T, Zhai B, Gonzalez J E, Keutzer K, Darrell T. Multitask vision-language prompt tuning. 
In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. Waikoloa, HI, USA: IEEE, 2024. 5656-5667 [21] Bahng H, Jahanian A, Sankaranarayanan S, Isola P. Exploring visual prompts for adapting large-scale models. arXiv preprint arXiv: 2203.17274, 2022. [22] Chen A, Yao Y, Chen P Y, Zhang Y, Liu S. Understanding and improving visual prompting: A label-mapping perspective. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver, BC, Canada: IEEE, 2023. 19133-19143 [23] Oh C, Hwang H, Lee H Y, Lim Y, Jung G, Jung J, Choi H, Song K. Blackvip: Black-box visual prompting for robust transfer learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver, BC, Canada: IEEE, 2023. 24224-24235 [24] Huang Q, Dong X, Chen D, Zhang W, Wang F, Hua G, Yu N. Diversity-aware meta visual prompting. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver, BC, Canada: IEEE, 2023. 10878-10887 [25] Jia M, Tang L, Chen B C, Cardie C, Belongie S, Hariharan B, et al. Visual prompt tuning. In: Proceedings of the European Conference on Computer Vision. Cham: Springer Nature Switzerland, 2022. 709-727 [26] Tu C H, Mai Z, Chao W L. Visual query tuning: Towards effective usage of intermediate representations for parameter and memory efficient transfer learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver, BC, Canada: IEEE, 2023. 7725-7735 [27] Das R, Dukler Y, Ravichandran A, Swaminathan A. Learning expressive prompting with residuals for vision transformers. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver, BC, Canada: IEEE, 2023. 3366-3377 [28] Dong B, Zhou P, Yan S, Zuo W. LPT: long-tailed prompt tuning for image classification. In: Proceedings of The Eleventh International Conference on Learning Representations. Kigali, Rwanda: ICLR, 2023. 
[29] Zhang Y, Zhou K, Liu Z. Neural prompt search. IEEE Transactions on Pattern Analysis and Machine Intelligence, DOI: 10.48550/arXiv.2206.04673
[30] Hu E J, Shen Y, Wallis P, Allen-Zhu Z, Li Y, Wang S, et al. LoRA: Low-rank adaptation of large language models. arXiv preprint arXiv:2106.09685, 2021.
[31] Houlsby N, Giurgiu A, Jastrzebski S, Morrone B, De Laroussilhe Q, Gesmundo A, et al. Parameter-efficient transfer learning for NLP. In: Proceedings of the International Conference on Machine Learning. Long Beach, CA, USA: PMLR, 2019. 2790-2799
[32] Nilsback M E, Zisserman A. Automated flower classification over a large number of classes. In: Proceedings of the Sixth Indian Conference on Computer Vision, Graphics & Image Processing. Bhubaneswar, India: IEEE, 2008. 722-729
[33] Helber P, Bischke B, Dengel A, Borth D. EuroSAT: A novel dataset and deep learning benchmark for land use and land cover classification. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2019, 12(7): 2217−2226 doi: 10.1109/JSTARS.2019.2918242
[34] Fahes M, Vu T H, Bursuc A, Pérez P, De Charette R. PODA: Prompt-driven zero-shot domain adaptation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. Paris, France: IEEE, 2023. 18623-18633
[35] Liu L, Chang J, Yu B X, Lin L, Tian Q, Chen C W. Prompt-matched semantic segmentation. arXiv preprint arXiv:2208.10159, 2022.
[36] Liu W, Shen X, Pun C M, Cun X. Explicit visual prompting for low-level structure segmentations. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver, BC, Canada: IEEE, 2023. 19434-19445
[37] Bar A, Gandelsman Y, Darrell T, Globerson A, Efros A. Visual prompting via image inpainting. arXiv preprint arXiv:2209.00647, 2022.
[38] Ma X, Wang Y, Liu H, Guo T, Wang Y. When visual prompt tuning meets source-free domain adaptive semantic segmentation. Advances in Neural Information Processing Systems, 2023, 36: 6690−6702
[39] Zhao X, Ding W, An Y, Du Y, Yu T, Li M, et al. Fast segment anything. arXiv preprint arXiv:2306.12156, 2023.
[40] Zhang C, Han D, Qiao Y, Kim J U, Bae S H, Lee S, et al. Faster segment anything: Towards lightweight SAM for mobile applications. arXiv preprint arXiv:2306.14289, 2023.
[41] Xiong Y, Varadarajan B, Wu L, Xiang X, Xiao F, Zhu C, et al. EfficientSAM: Leveraged masked image pretraining for efficient segment anything. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, WA, USA: IEEE, 2024. 16111-16121
[42] Ke L, Ye M, Danelljan M, Tai Y W, Tang C K, Yu F. Segment anything in high quality. In: Proceedings of the 37th Conference on Neural Information Processing Systems. New Orleans, LA, USA: NeurIPS, 2023.
[43] Xie Z, Guan B, Jiang W, Yi M, Ding Y, Lu H, et al. PA-SAM: Prompt adapter SAM for high-quality image segmentation. arXiv preprint arXiv:2401.13051, 2024.
[44] Wang X, Zhang X, Cao Y, Wang W, Shen C, Huang T. SegGPT: Segmenting everything in context. arXiv preprint arXiv:2304.03284, 2023.
[45] Ren T, Liu S, Zeng A, Lin J, Li K, Cao H, et al. Grounded SAM: Assembling open-world models for diverse visual tasks. arXiv preprint arXiv:2401.14159, 2024.
[46] Zou X, Yang J, Zhang H, Li F, Li L, Wang J, et al. Segment everything everywhere all at once. In: Proceedings of the 37th Conference on Neural Information Processing Systems. New Orleans, LA, USA: NeurIPS, 2023. 19769-19782
[47] Gu X, Lin T Y, Kuo W, Cui Y. Open-vocabulary object detection via vision and language knowledge distillation. arXiv preprint arXiv:2104.13921, 2021.
[48] Du Y, Wei F, Zhang Z, Shi M, Gao Y, Li G. Learning to prompt for open-vocabulary object detection with vision-language model. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, LA, USA: IEEE, 2022. 14084-14093
[49] Wu X, Zhu F, Zhao R, Li H. CORA: Adapting CLIP for open-vocabulary detection with region prompting and anchor pre-matching. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver, BC, Canada: IEEE, 2023. 7031-7040
[50] Ju C, Han T, Zheng K, Zhang Y, Xie W. Prompting visual-language models for efficient video understanding. In: Proceedings of the European Conference on Computer Vision. Tel Aviv, Israel: Springer Nature Switzerland, 2022. 105-124
[51] Wang M, Xing J, Liu Y. ActionCLIP: A new paradigm for video action recognition. arXiv preprint arXiv:2109.08472, 2021.
[52] Mokady R, Hertz A, Bermano A H. ClipCap: CLIP prefix for image captioning. arXiv preprint arXiv:2111.09734, 2021.
[53] Tewel Y, Shalev Y, Schwartz I, Wolf L. ZeroCap: Zero-shot image-to-text generation for visual-semantic arithmetic. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, LA, USA: IEEE, 2022. 17918-17928
[54] Su Y, Lan T, Liu Y, Liu F, Yogatama D, Wang Y, et al. Language models can see: Plugging visual controls in text generation. arXiv preprint arXiv:2205.02655, 2022.
[55] Wang N, Xie J, Wu J, Jia M, Li L. Controllable image captioning via prompting. In: Proceedings of the AAAI Conference on Artificial Intelligence. Washington, DC, USA: AAAI Press, 2023. 2617-2625
[56] Yang J, Li Z, Zheng F, Leonardis A, Song J. Prompting for multi-modal tracking. In: Proceedings of the 30th ACM International Conference on Multimedia. Lisbon, Portugal: Association for Computing Machinery, 2022. 3492-3500
[57] Zhu J, Lai S, Chen X, Wang D, Lu H. Visual prompt multi-modal tracking. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver, BC, Canada: IEEE, 2023. 9516-9526
[58] He K, Zhang C, Xie S, Li Z, Wang Z. Target-aware tracking with long-term context attention. In: Proceedings of the AAAI Conference on Artificial Intelligence. Washington, DC, USA: AAAI Press, 2023. 773-780
[59] Luo Y, Guo X, Feng H, Ao L. RGB-T tracking via multi-modal mutual prompt learning. arXiv preprint arXiv:2308.16386, 2023.
[60] Tsimpoukelli M, Menick J L, Cabi S, Eslami S M, Vinyals O, Hill F. Multimodal few-shot learning with frozen language models. Advances in Neural Information Processing Systems, 2021, 34: 200−212
[61] Yang Z, Gan Z, Wang J, Hu X, Lu Y, Liu Z, et al. An empirical study of GPT-3 for few-shot knowledge-based VQA. In: Proceedings of the AAAI Conference on Artificial Intelligence. Virtual Event: AAAI Press, 2022. 3081-3089
[62] Jin W, Cheng Y, Shen Y, Chen W, Ren X. A good prompt is worth millions of parameters: Low-resource prompt-based learning for vision-language models. arXiv preprint arXiv:2110.08484, 2021.
[63] Wang A J, Zhou P, Shou M Z, Yan S. Enhancing visual grounding in vision-language pre-training with position-guided text prompts. IEEE Transactions on Pattern Analysis and Machine Intelligence, DOI: 10.1109/TPAMI.2023.3343736
[64] Wu W, Liu T, Wang Y, Xu K, Yin Q, Hu Y. Dynamic multi-modal prompting for efficient visual grounding. In: Proceedings of the 6th Chinese Conference on Pattern Recognition and Computer Vision. Xiamen, China: Springer-Verlag, 2023. 359-371
[65] Hegde D, Valanarasu J M, Patel V. CLIP goes 3D: Leveraging prompt tuning for language grounded 3D recognition. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. Paris, France: IEEE, 2023. 2028-2038
[66] Zhu X, Zhang R, He B, Guo Z, Zeng Z, Qin Z, et al. PointCLIP V2: Prompting CLIP and GPT for powerful 3D open-world learning. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. Paris, France: IEEE, 2023. 2639-2650
[67] Bar-Tal O, Ofri-Amar D, Fridman R, Kasten Y, Dekel T. Text2LIVE: Text-driven layered image and video editing. In: Proceedings of the European Conference on Computer Vision. Cham, Switzerland: Springer Nature, 2022. 707-723
[68] Krizhevsky A. Learning Multiple Layers of Features from Tiny Images [Master's thesis], University of Toronto, Canada, 2009
[69] Bossard L, Guillaumin M, Van Gool L. Food-101: Mining discriminative components with random forests. In: Proceedings of the European Conference on Computer Vision. Zurich, Switzerland: Springer International Publishing, 2014. 446-461
[70] Cimpoi M, Maji S, Kokkinos I, Mohamed S, Vedaldi A. Describing textures in the wild. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Columbus, OH, USA: IEEE, 2014. 3606-3613
[71] Netzer Y, Wang T, Coates A, Bissacco A, Wu B, Ng A Y. Reading digits in natural images with unsupervised feature learning. In: Proceedings of the NIPS Workshop on Deep Learning and Unsupervised Feature Learning. Granada, Spain: NIPS, 2011. 4
[72] Wah C, Branson S, Welinder P, Perona P, Belongie S. The Caltech-UCSD Birds-200-2011 dataset, Technical Report CNS-TR-2011-001, California Institute of Technology, USA, 2011.
[73] Khosla A, Jayadevaprakash N, Yao B, Fei-Fei L. Novel dataset for fine-grained image categorization. In: Proceedings of the First Workshop on Fine-Grained Visual Categorization, IEEE Conference on Computer Vision and Pattern Recognition. Colorado Springs, CO, USA: IEEE, 2011
[74] Fei-Fei L, Fergus R, Perona P. Learning generative visual models from few training examples: An incremental Bayesian approach tested on 101 object categories. In: Proceedings of the 2004 Conference on Computer Vision and Pattern Recognition Workshop. Washington, DC, USA: IEEE, 2004. 178-178
[75] Parkhi O M, Vedaldi A, Zisserman A, Jawahar C V. Cats and dogs. In: Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition. Providence, RI, USA: IEEE, 2012. 3498-3505
[76] Krause J, Stark M, Deng J, Fei-Fei L. 3D object representations for fine-grained categorization. In: Proceedings of the IEEE International Conference on Computer Vision Workshops. Sydney, Australia: IEEE, 2013. 554-561
[77] Maji S, Rahtu E, Kannala J, Blaschko M, Vedaldi A. Fine-grained visual classification of aircraft. arXiv preprint arXiv:1306.5151, 2013.
[78] Xiao J, Hays J, Ehinger K A, Oliva A, Torralba A. SUN database: Large-scale scene recognition from abbey to zoo. In: Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. San Francisco, CA, USA: IEEE, 2010. 3485-3492
[79] Soomro K, Zamir A, Shah M. UCF101: A dataset of 101 human actions classes from videos in the wild. arXiv preprint arXiv:1212.0402, 2012.
[80] Cheng B, Misra I, Schwing A G, Kirillov A, Girdhar R. Masked-attention mask transformer for universal image segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, LA, USA: IEEE, 2022. 1290-1299
[81] Jain J, Li J, Chiu M T, Hassani A, Orlov N, Shi H. OneFormer: One transformer to rule universal image segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver, BC, Canada: IEEE, 2023. 2989-2998
[82] Zhou B, Zhao H, Puig X, Fidler S, Barriuso A, Torralba A. Scene parsing through ADE20K dataset. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI, USA: IEEE, 2017. 633-641
[83] Lin T Y, Maire M, Belongie S, Hays J, Perona P, Ramanan D, et al. Microsoft COCO: Common objects in context. In: Proceedings of the 13th European Conference on Computer Vision. Zurich, Switzerland: Springer International Publishing, 2014. 740-755
[84] Xiao Y, Yang M, Li C, Liu L, Tang J. Attribute-based progressive fusion network for RGBT tracking. In: Proceedings of the AAAI Conference on Artificial Intelligence. Virtual Event: AAAI Press, 2022. 2831-2838
[85] Li C, Xue W, Jia Y, Qu Z, Luo B, Tang J, et al. LasHeR: A large-scale high-diversity benchmark for RGBT tracking. arXiv preprint arXiv:2104.13202, 2021.