基于无锚框的目标检测方法及其在复杂场景下的应用进展

刘小波; 肖肖; 王凌; 蔡之华; 龚鑫; 郑可心

doi:10.16383/j.aas.c220115

基于无锚框的目标检测方法及其在复杂场景下的应用进展

doi: 10.16383/j.aas.c220115

刘小波^{1, 2, 3, 4,},
肖肖^{1, 2, 3,},
王凌^5,,
蔡之华^6,,
龚鑫^{1, 2, 3,},
郑可心^{1, 2, 3,}

1.
中国地质大学(武汉)自动化学院武汉 430074
2.
复杂系统先进控制与智能自动化湖北省重点实验室武汉 430074
3.
地球探测智能化技术教育部工程研究中心武汉 430074
4.
地质探测与评估教育部重点实验室, 中国地质大学 (武汉) 武汉 430074
5.
清华大学自动化系北京 100084
6.
中国地质大学(武汉)计算机学院武汉 430074

基金项目: 国家自然科学基金(61973285, 62076226, 61873249, 61773355), 地质探测与评估教育部重点实验室主任基金 (GLAB2023ZR08)资助

详细信息

作者简介:
刘小波：中国地质大学 (武汉) 自动化学院副教授. 2008年获得中国地质大学(武汉)计算机学院计算机软件与理论硕士学位. 2012年获得中国地质大学(武汉)计算机学院地学信息工程博士学位. 主要研究方向为机器学习, 演化计算和高光谱遥感图像处理. 本文通信作者. E-mail: xbliu@cug.edu.cn

肖肖：中国地质大学 (武汉) 自动化学院硕士研究生. 2020年获得江汉大学物理与信息工程学院学士学位. 主要研究方向为遥感图像处理, 目标检测. E-mail: xxiao@cug.edu.cn

王凌：清华大学自动化系教授. 1995年获得清华大学自动化系学士学位. 1999年获得清华大学自动化系控制理论与控制工程专业博士学位. 主要研究方向为智能优化理论、方法与应用, 复杂生产过程建模、优化与调度. E-mail: wangling@tsinghua.edu.cn

蔡之华：中国地质大学 (武汉) 计算机学院教授. 1986年获得武汉大学学士学位. 1992年获得北京工业大学硕士学位. 2003年获得中国地质大学(武汉) 博士学位. 主要研究方向为数据挖掘, 机器学习和演化计算. E-mail: zhcai@cug.edu.cn

龚鑫：中国地质大学 (武汉) 自动化学院硕士研究生. 2020年获得江汉大学物理与信息工程学院学士学位. 主要研究方向为遥感图像处理, 架构搜索. E-mail: xgong@cug.edu.cn

郑可心：中国地质大学(武汉)自动化学院硕士研究生. 2019年获得长江大学物理与光电工程学院学士学位. 主要研究方向为遥感图像处理. E-mail: zhengkexin@cug.edu.cn

计量
- 文章访问数: 2205
- HTML全文浏览量: 1996
- PDF下载量: 624
- 被引次数: 0
出版历程
- 收稿日期: 2022-02-24
- 录用日期: 2022-11-03
- 网络出版日期: 2022-12-22
- 刊出日期: 2023-07-20

Anchor-free Based Object Detection Methods and Its Application Progress in Complex Scenes

LIU Xiao-Bo^{1, 2, 3, 4
,},
XIAO Xiao^{1, 2, 3
,},
WANG Ling^5
,,
CAI Zhi-Hua^6
,,
GONG Xin^{1, 2, 3
,},
ZHENG Ke-Xin^{1, 2, 3
,}

1.
School of Automation, China University of Geosciences, Wuhan 430074
2.
Hubei Key Laboratory of Advanced Control and Intelligent Automation for Complex Systems, Wuhan 430074
3.
Engineering Research Center of Intelligent Technology for Geo-Exploration, Ministry of Education, Wuhan 430074
4.
Key Laboratory of Geological Survey and Evaluation of Ministry of Education, China University of Geosciences, Wuhan 430074
5.
Department of Automation, Tsinghua University, Beijing 100084
6.
School of Computer Science, China University of Geosciences, Wuhan 430074

Funds: Supported by National Natural Science Foundation of China (61973285, 62076226, 61873249, 61773355) and Opening Fund of Key Laboratory of Geological Survey and Evaluation of Ministry of Education (GLAB2023ZR08)

More Information

Author Bio:
LIU Xiao-Bo　Associate professor at the School of Automation, China University of Geosciences. He received his master degree in computer software and theory from the School of Computer Science, China University of Geosciences in 2008. He received his Ph.D. degree in geoinformation engineering from the School of Computer Science, China University of Geosciences in 2012. His research interest covers machine learning, evolutionary computation, and hyperspectral remote sensing image processes. Corresponding author of this paper

XIAO Xiao　Master student at the School of Automation, China University of Geosciences. She received her bachelor degree from the School of Physics and Information Engineering, Jianghan University in 2020. Her research interest covers remote sensing image processing and object detection

WANG Ling　Professor in the Department of Automation, Tsinghua University. He received his bachelor degree from the Department of Automation, Tsinghua University in 1995. He received his Ph.D. degree in control theory and control engineering from the Department of Automation, Tsinghua University in 1999. His research interest covers intelligent optimization theory, method and application, and complex production process modeling, optimization and scheduling

CAI Zhi-Hua　Professor at the School of Computer Science, China University of Geosciences. He received his bechelor degree from Wuhan University in 1986. He received his master degree from Beijing University of Technology in 1992. He received his Ph.D. degree from China University of Geosciences, in 2003. His research interest covers data mining, machine learning, and evolutionary computation

GONG Xin　Master student at the School of Automation, China University of Geosciences. He received his bachelor degree from the School of Physics and Information Engineering, Jianghan University in 2020. His research interest covers remote sensing image processing and neural architecture search

ZHENG Ke-Xin　Master student at the School of Automation, China University of Geosciences. He received his bachelor degree from the School of Physics and Optoelectronic Engineering, Yangtze University in 2019. His main research interest is remote sensing image processing

摘要

摘要: 基于深度学习的目标检测方法是目前计算机视觉领域的热点, 在目标识别、跟踪等领域发挥了重要的作用. 随着研究的深入开展, 基于深度学习的目标检测方法主要分为有锚框的目标检测方法和无锚框的目标检测方法, 其中无锚框的目标检测方法无需预定义大量锚框, 具有更低的模型复杂度和更稳定的检测性能, 是目前目标检测领域中较前沿的方法. 在调研国内外相关文献的基础上, 梳理基于无锚框的目标检测方法及各场景下的常用数据集, 根据样本分配方式不同, 分别从基于关键点组合、中心点回归、Transformer、锚框和无锚框融合等4个方面进行整体结构分析和总结, 并结合COCO (Common objects in context)数据集上的性能指标进一步对比. 在此基础上, 介绍了无锚框目标检测方法在重叠目标、小目标和旋转目标等复杂场景情况下的应用, 聚焦目标遮挡、尺寸过小和角度多等关键问题, 综述现有方法的优缺点及难点. 最后对无锚框目标检测方法中仍存在的问题进行总结并对未来发展的应用趋势进行展望.
- 无锚框 /
- 关键点 /
- 中心点 /
- Transformer /
- 复杂场景 /
- 目标检测
Abstract: The object detection method based on deep learning is a hot spot in the field of computer vision currently, which plays an important role in object recognition and tracking. With the in-depth development of research, object detection method are mainly divided into anchor-based object detection method and anchor-free object detection method, the anchor-free object detection method is state-of-the-art method currently, which can reduces model complexity and reaches higher detection ability, owing to the fact that there is no need to predefine a large number of anchor boxes for it. Firstly, this paper summarizes the anchor-free object detection method and analyzes the performance on the common objects in context (COCO) dataset for further comparison in recent years, which are divided into keypoints combination, center point regression, Transformer, anchor-based fusing with anchor-free methods respectively, according to different sampling methods of anchor points. On the basis, in order to solve the above key problems, this paper focuses the applications of anchor-free object detection method in complex scenes. Finally, discusses the problems and the application trends of the anchor-free object detection method in the future development.
- Anchor-free /
- keypoints /
- center point /
- Transformer /
- complex scenes /
- object detection

HTML全文

图 1 基于锚框的目标检测方法整体框架

Fig. 1 The overall framework of anchor-based object detection method

下载: 全尺寸图片幻灯片

图 2 基于无锚框的目标检测方法整体框架

Fig. 2 The overall framework of anchor-free object detection method

下载: 全尺寸图片幻灯片

图 3 基于角点组合的CornerNet目标检测方法

Fig. 3 CornerNet framework of object detection method based on corner points combination

下载: 全尺寸图片幻灯片

图 4 预测框采样方法

Fig. 4 The sampling methods of prediction box

下载: 全尺寸图片幻灯片

图 5 基于中心点回归的无锚框目标检测方法整体框架

Fig. 5 The overall framework of anchor-free object detection method based on center point regression

下载: 全尺寸图片幻灯片

图 6 DETR整体框架

Fig. 6 The overall architecture of DETR

下载: 全尺寸图片幻灯片

图 7 基于优化标签分配算法的关系

Fig. 7 The relationship between label assignment optimization algorithms

下载: 全尺寸图片幻灯片

图 8 重叠目标检测问题

Fig. 8 The detection problems of overlapping object

下载: 全尺寸图片幻灯片

图 9 小目标示例

Fig. 9 The object example of too few pixels

下载: 全尺寸图片幻灯片

图 10 RepPoints系列点集表示示例

Fig. 10 The example of RepPoints series point set

下载: 全尺寸图片幻灯片

图 11 多角度目标检测结果示例

Fig. 11 The detection result of arbitrary rotation objects

下载: 全尺寸图片幻灯片

表 1 目标检测公共数据集对比

Table 1 Comparison of public datasets for object detection

数据集	类别数	图片数量	实例数量	图片尺寸 (像素)	标注方式	使用场景	发表年份
Pascal VOC^[10]	20	~23 k	~55 k	800 × 800	水平框	综合	2010
COCO^[11]	80	~123 k	~896 k	—	水平框	综合	2014
DOTA^[12]	15	~2.8 k	~188 k	800 ~ 4000	水平框/旋转框	综合	2018
UCAS-AOD^[13]	2	~1 k	~6 k	1280 × 1280	旋转框	汽车、飞机	2015
ICDAR2015^[14]	1	1.5 k	—	720 × 1280	旋转框	文本	2015
CUHK-SYSU^[15]	1	~18 k	~96 k	50 ~ 4000	水平框	行人	2017
PRW^[16]	1	~12 k	~43 k	—	水平框	行人	2017
CrowdHuman^[17]	1	~24 k	~470 k	608 × 608	水平框	行人	2018
HRSC2016^[18]	1	~1.1 k	~3 k	~1000 × 1000	旋转框	船舰	2017
SSDD^[19]	1	1.16 k	~2.5 k	500 × 500	水平框	船舰	2017
HRSID^[20]	1	~5.6 k	~17 k	800 × 800	水平框	船舰	2020

下载: 导出CSV

表 2 基于无锚框的目标检测方法对比

Table 2 Comparison of anchor-free object detection method

方法类型	基于关键点组合	基于中心点回归	基于Transformer	基于锚框和无锚框融合
方法动机	无需设计锚框, 减少锚框带来的超参数, 简化模型
方法思想	组合关键点并检测	中心点回归预测框位置	Transformer的编码和解码直接预测	优化样本标签分配策略
方法优点	充分利用边界和内部信息	减少回归超参数数量	实现端到端, 简化流程	缓解正负样本不均衡
方法难点	不同类别关键点的误配对	中心点重叠目标的漏检	小目标检测性能较差	自适应标签分配不连续
计算速度	检测速度相对较慢	检测速度相对较快	收敛速度相对较慢	检测速度相对较慢

下载: 导出CSV

表 3 基于关键点组合的无锚框目标检测算法在COCO数据集上的性能及优缺点对比

Table 3 Comparison of the keypoints combination based anchor-free object detection methods on the COCO dataset

算法	特征提取网络	输入尺寸 (像素)	处理器配置及检测速度(帧/s)	mAP (%)	优点	缺点	收录来源	发表年份
PLN^[21]	Inception-V2	512 × 512	GTX 1080 —	28.9	重叠及特殊形状目标的检测效果好	感受野范围较小	arXiv	2017
CornerNet^[22]	Hourglass-104	511 × 511	TitanX × 10 4.1	42.1	使用角池化来精确定位目标	同类别的角点匹配易出错	ECCV	2018
CornerNet-Saccade^[23]	Hourglass-54	255 × 255	GTX 1080Ti × 4 5.2	42.6	无需对每个像素点进行类别检测	小目标的误检率较高	BMVC	2020
CornerNet-Squeeze^[23]	Hourglass-54	255 × 255	GTX 1080Ti × 4 33	34.4	大幅提升检测速度	角点类别的判断较易出错	BMVC	2020
ExtremeNet^[24]	Hourglass-104	511 × 511	TitanX × 10 3.1	43.7	极值点和中心点充分获取目标信息	容易产生假阳性样本	CVPR	2019
CenterNet-Triplets^[25]	Hourglass-104	511 × 511	Tesla V100 × 8 2.94	47.0	用角点和中心点获取充分目标信息	中心点遗漏时位置偏移量大	ICCV	2019
CentripetalNet^[26]	Hourglass-104	511 × 511	Tesla V100 × 16 —	48.0	改进CornerNet的角点误匹配问题	中心区域的缩放依赖超参数	CVPR	2020
SaccadeNet^[27]	DLA-34-DCN	512 × 512	RTX 2080Ti 28	40.4	获取局部和整体特征, 提高特征利用率	需要平衡检测精度与速度	CVPR	2020
CPNDet^[28]	Hourglass-104	511 × 511	Tesla V100 × 8 —	49.2	多种分类器提升角点类别判断准确率	检测头计算效率较低	ECCV	2020

下载: 导出CSV

表 4 基于中心点回归的无锚框目标检测算法在COCO数据集上的性能及优缺点对比

Table 4 Comparison of the center point regression based anchor-free object detection methods on the COCO dataset

算法	特征提取网络	输入尺寸 (像素)	处理器配置及检测速度(帧/s)	mAP (%)	优点	缺点	收录来源	发表年份
YOLO v1^[31]	—	—	—	—	用网格划分法提高中心点搜寻效率	目标中心点在同网格内的漏检	CVPR	2016
FCOS^[33]	ResNet-101	800 × $\le 1333$	— 9.3	41.5	用中心度降低远离中心点的预测框得分	同尺度特征层中出现目标误检	ICCV	2019
CenterNet^[35]	Hourglass-104	511 × 511	Titan X 7.8	45.1	用中心点定位目标减少角点匹配操作	目标中心点重合, 产生漏检	arXiv	2019
Grid R-CNN^[40]	ResNet-101	800 × 800	Titan Xp × 32 3.45	41.5	用网格定位机制精准定位边界框	特征采样区域范围过于广泛	CVPR	2019
Grid R-CNN Plus^[41]	ResNet-101	800 × 800	Titan Xp × 32 7.69	42.0	缩小特征表达区域尺寸, 减少计算量	非代表性特征区域存在遗漏	arXiv	2019
HoughNet^[37]	Hourglass-104	512 × 512	Tesla V100 × 4 —	46.4	用投票机制改进全局信息缺失的问题	投票机制使计算量增大	ECCV	2020
YOLOX^[32]	Darknet53	640 × 640	Tesla V100 × 8 90.1	47.4	解耦分类和回归分支, 提升收敛速度	难分类样本的检测精度较低	arXiv	2021
OneNet^[34]	ResNet-101	512 × $\le 853$	Tesla V100 × 8 50	37.7	用最小匹配损失提升预测框和标签的匹配	单像素点检测单目标, 产生漏检	ICML	2021
CenterNet2^[36]	Res2Net-101-DCN-BiFPN	1280 × 1280	Titan Xp —	56.4	清晰区分目标特征和背景区域的特征	分步分类、回归的效率较低	arXiv	2021

下载: 导出CSV

表 5 基于Transformer的无锚框目标检测算法在COCO数据集上的性能及优缺点对比

Table 5 Comparison of the Transformer based anchor-free object detection methods on the COCO dataset

算法	特征提取网络	输入尺寸 (像素)	处理器配置及检测速度(帧/s)	mAP (%)	浮点计算量(FLOPs/G)	优点	缺点	收录来源	发表年份
DETR^[42]	ResNet-50	(480, 800)× (800, 1333)	Tesla V100 × 16 28	42.0	86	用Transformer减少手工设计参数数量	收敛速度慢, 小目标检测性能较差	ECCV	2020
TSP-FCOS^[43]	ResNet-50	(640, 800)× (800, 1333)	Tesla V100 × 8 15	43.1	189	添加辅助子网来提高多尺度特征的提取	模型计算量、复杂度较高	ICCV	2021
Deformable DETR^[44]	ResNet-50	(480, 800)× (800, 1333)	Tesla V100 19	43.8	173	有效关注稀疏空间的目标位置	模型计算量、复杂度较高	ICLR	2021
Dynamic DETR^[45]	ResNet-50	—	Tesla V100 × 8 —	47.2	—	用动态注意力机制加速收敛	未说明模型的计算量、复杂度	ICCV	2021
YOLOS^[47]	DeiT-base	(480, 800)× (800, 1333)	— 2.7	42.0	538	不依赖卷积骨干网络, 性能良好	检测速度较低, 计算量较高	NeurlPS	2021
SAM-DETR^[46]	ResNet-50	(480, 800)× (800, 1333)	Tesla V100 × 8 —	41.8	100	利用语义对齐加速模型收敛速度	检测精度有待进一步提升	CVPR	2022
ViDT^[49]	Swin-base	(480, 800)× (800, 1333)	Tesla V100 × 8 11.6	49.2	—	用新的骨干网络和检测颈减少计算开销	浅层难以直接获取目标的有用信息	ICLR	2022
DN-DETR^[50]	ResNet-50	—	Tesla A100 × 8 —	44.1	94	利用去噪训练法大幅提升检测性能	仅使用均匀分布的噪声	CVPR	2022

下载: 导出CSV

表 6 基于锚框和无锚框融合的目标检测算法在COCO数据集上的性能及优缺点对比

Table 6 Comparison of the anchor-based and anchor-free fusion object detection methods on the COCO dataset

算法	特征提取网络	输入尺寸 (像素)	处理器配置及检测速度(帧/s)	mAP (%)	优点	缺点	收录来源	发表年份
FSAF^[52]	ResNeXt-101	800 × 800	Tesla V100 × 8 2.76	44.6	动态选择最适合目标的特征层	未区分不同特征的关注程度	CVPR	2019
SAPD^[54]	ResNeXt-101	800 × 800	GTX 1080Ti 4.5	47.4	能筛选出有代表性的目标特征	未能真正将有锚框和无锚框分支融合	ECCV	2020
ATSS^[56]	ResNeXt-101	800 × (800, 1333)	Tesla V100 —	50.7	能根据统计特性自动训练样本	未完全实现无需参数调节的样本分配	CVPR	2020
AutoAssign^[57]	ResNeXt-101	800 × 800	—	52.1	无需手动调节的动态样本分配	样本的的权重分配机制相对较复杂	arXiv	2020
LSNet^[58]	ResNeXt-101	800 × (800, 1333)	Tesla V100 × 8 5.1	50.4	用位置敏感网络大幅提高定位精度	小目标的定位和分类精度较低	arXiv	2021
DW^[59]	ResNeXt-101	800 × 800	GPU × 8 —	49.8	有效获取分类和回归置信度高的框	小目标的检测性能仍需进一步提升	CVPR	2022

下载: 导出CSV

表 7 解决目标重叠排列问题的不同检测方法的性能对比

Table 7 Performance comparison of detection methods to solve the problem that objects are densely arranged

问题	算法	数据集	输入尺寸 (像素)	骨干网络	处理器配置	检测速度 (帧/s)	mAP (%)	收录来源	发表年份
目标重叠排列	VarifocalNet^[75]	COCO	(480, 960)× 1333	ResNeXt-101	Tesla V100 × 8	6.7	50.8	TMI	2019
	WSMA-Seg^[77]	COCO	—	MSP-Seg	—	—	38.1	arXiv	2019
	FCOS v2^[73]	COCO CrowdHuman	800×$\le$1333	ResNeXt-101 ResNet-50	GTX 1080Ti	—	50.4 87.3	TPAMI	2022
	BorderDet^[76]	COCO	800×$\le$1333	ResNeXt-101	GPU × 8	—	50.3	ECCV	2020
	AlignPS^[71]	CUHK-SYSU PRW	900 × 1500	ResNet-50	Tesla V100	16.4	94.0 46.1	CVPR	2021
	OTA-FCOS^[78]	COCO CrowdHuman	(640, 800) ×$\le$ 1333	ResNeXt-101 ResNet-50	GPU × 8	—	51.5 88.4	CVPR	2021
	LLA-FCOS^[79]	CrowdHuman	800×$\le$1400	ResNet-50	GPU × 8	—	88.1	Neuro- computing	2021
	LTM^[80]	COCO	800×$\le$1333	ResNeXt-101	Tesla V100 × 8	1.7	46.3	TPAMI	2022
	Efficient DETR^[81]	COCO CrowdHuman	—	ResNet-101 ResNet-50	—	—	45.7 90.8	arXiv	2021
	PSTR^[72]	CUHK-SYSU PRW	900×1500	ResNet-50	Tesla V100	—	94.2 50.1	CVPR	2022
	COAT^[82]	CUHK-SYSU PRW	900×1500	ResNet-50	Tesla A100	11.1	94.2 53.3	CVPR	2022
	Progressive DETR^[83]	COCO CrowdHuman	(480, 800)×$\le$1333	ResNet-50	GPU × 8	—	46.7 92.1	CVPR	2022

下载: 导出CSV

表 8 解决目标重叠排列问题的不同检测方法优缺点对比

Table 8 Feature comparison of detection methods to solve the problem that objects are densely arranged

问题	算法	方法	优点	缺点/难点
目标重叠排列	CSP^[70]	增加中心点偏移量预测分支和尺度预测分支	解决行人检测任务中漏检问题	特征与框间的关联度较低
	VarifocalNet^[75]	预测IACS分类得分、提出Varifocal Loss损失函数	有效抑制同目标重叠框	小目标检测效果需提升
	WSMA-Seg^[77]	利用分割模型构建无需NMS后处理的目标检测模型	准确利用重叠目标边缘特征	分割算法的模型复杂度较高
	FCOS v2^[73]	将中心度子分支加入回归分支, 并修正中心度计算方式	减少类别判断错误数量	针对不同尺度特征仅使用相同检测头, 限制模型性能
	BorderDet^[76]	用边界对齐的特征提取操作自适应地提取边界特征	高效获取预测框的位置	边界点选取数量较多
	AlignPS^[71]	使用特征对齐和聚合模块	解决区域、尺度不对齐的问题	未扩展到通用目标检测任务
	OTA-FCOS^[78]	用最优传输理论寻找全局高置信度样本分配方式	有助于选择信息丰富区域	模型的计算复杂度较高
	LLA-FCOS^[79]	使用基于损失感知的样本分配策略	锚点和真实框对应性更好	仅在密集人群中的效果较好
	LTM^[80]	目标与特征的匹配定义为极大似然估计问题	提高目标遮挡和不对齐的精度	检测速度有待进一步提高
	Efficient DETR^[81]	用密集先验知识初始化来简化模型结构	减少编码器和解码器数量	检测精度有待进一步提升
	PSTR^[72]	使用Transformer构成首个行人搜索网络	提高特征的可判别性和关联性	未扩展到通用目标检测任务
	COAT^[82]	用三段级联设计来检测和完善目标的检测和重识别	更清晰地区分目标和背景特征	部分阶段过度关注ReID特征, 牺牲部分检测性能
	Progressive DETR^[83]	设计关联信息提取模块和队列更新模块	加强低置信点的复用	检测精度有待进一步提升

下载: 导出CSV

表 9 解决目标尺寸过小问题的不同检测方法性能对比

Table 9 Performance comparison of detection methods to solve the problem that object pixels are too few

问题	算法	数据集	输入尺寸 (像素)	骨干网络	处理器配置	检测速度 (帧/s)	mAP (%)	收录来源	发表年份
目标尺寸过小	RepPoints^[89]	COCO	(480, 960) ×$\le$960	ResNet-101	GPU × 4	—	46.5	ICCV	2019
	DuBox^[92]	COCO VOC 2012	800 × 800 500 × 500	ResNet-101 VGG-16	NVIDIA P40 × 8	—	39.5 82.0	arXiv	2019
	PPDet^[87]	COCO	800 × 1300	ResNet-101	Tesla V100 × 4	—	45.2	BMVC	2020
	RepPoints v2^[90]	COCO	(800, 1333) × $\le$1333	ResNet-101	GPU × 8	—	48.1	NeurlPS	2020
	FoveaBox^[93]	COCO VOC 2012	800 × 800	ResNet-101 ResNet-50	GPU × 4	— 16.4	42.1 76.6	TIP	2020
	FBR-Net^[94]	SSDD	448 × 448	ResNet-50	RTX 2080Ti	25.0	92.8	TGRS	2021
	FCOS (AFE-GDH)^[88]	HRSID SSDD	800 × 800	ResNet-50	NVIDIA Titan Xp	15.2 28.5	67.4 56.2	Remote Sensing	2022
	Oriented RepPoints ^[91]	DOTA HRSC2016	1024 × 1024 (300, 900)× (300, 1500)	ResNet-101 ResNet-50	RTX 2080Ti × 4	—	76.5 97.3	CVPR	2022
	QueryDet^[95]	COCO	—	ResNet-50	RTX 2080Ti × 8	14.4	39.5	CVPR	2022

下载: 导出CSV

表 10 解决目标尺寸过小问题的不同检测方法优缺点对比

Table 10 Feature comparison of detection methods to solve the problem that object pixels are too few

问题	算法	方法	优点	缺点/难点
目标尺寸过小	RepPoints^[89]	使用点集形式表征目标的特征	自适应地学习极值点和语义信息	过度依赖回归分支
	DuBox^[92]	使用有多尺度特性的双尺度残差单元	减少小目标边缘和内部信息的漏检	分割模型的复杂度较高
	PPDet^[87]	使用框内部为正样本点的新标记策略	提高判别性目标特征的贡献程度	小目标特征信息不足
	RepPoints v2^[90]	增加角点验证分支来判断特征映射点	获得更具目标内部和边缘信息的特征	预测框定位准确度低
	FoveaBox^[93]	在多层特征图上检测多尺度目标特征	对目标形状和分布有很强的适应能力	难以区分目标和背景区域
	FBR-Net^[94]	用多尺度注意力机制选择特征重要性	减少背景区域与小目标间的强关联性	检测精度仍需进一步提升
	FCOS (AFE-GDH)^[88]	使用自适应特征编码策略(AFE)和构造高斯引导检测头	有效增强小目标表达能力	仅说明船舰目标有效性
	Oriented RepPoints^[91]	提出质量评估、样本分配方案和空间约束	提升非轴对齐小目标特征的捕获能力	仅涉及空域小目标检测
	QueryDet^[95]	使用基于级联稀疏查询机制进行动态预测	减少检测头计算开销、提高小目标的位置精确度	提高分辨率导致误判概率提高

下载: 导出CSV

表 11 解决目标方向变化问题的不同检测方法性能对比

Table 11 Performance comparison of detection methods to solve the problem that object direction changeable

问题	算法	数据集	输入尺寸 (像素)	骨干网络	处理器配置	检测速度 (帧/s)	mAP (%)	收录来源	发表年份
目标方向变化	SARD^[102]	DOTA HRSC2016	800 × 800	ResNet-101	Tesla P100	— 1.5	72.9 85.4	IEEE Access	2019
	P-RSDet^[97]	DOTA UCAS-AOD	512 × 512	ResNet-101	Tesla V100 × 2	—	72.3 90.0	IEEE Access	2020
	O²-DNet^[101]	DOTA ICDAR2015	800 × 800	ResNet-101	Tesla V100 × 2	—	71.0 85.6	P&RS	2020
	DRN^[106]	DOTA HRSC2016	1024 × 1024 768 × 768	Hourglass-104	Tesla V100	—	73.2 92.7	CVPR	2020
	BBAVectors^[96]	DOTA HRSC2016	608 × 608	ResNet-101	GTX 1080Ti × 4	— 11.7	75.4 88.6	WACV	2021
	FCOSR^[98]	DOTA HRSC2016	1024 × 1024 800 × 800	ResNeXt-101	Tesla V100 × 4	7.9 —	77.4 95.7	arXiv	2021
	DARDet^[103]	DOTA HRSC2016	1024 × 1024	ResNet-50	RTX 2080Ti	12.6 —	71.2 78.9	GRSL	2021
	DAFNe^[105]	DOTA HRSC2016	1024 × 1024	ResNet-101	Tesla V100 × 4	—	76.9 89.5	arXiv	2021
	CHPDet^[107]	UCAS-AOD HRSC2016	1024 × 1024	DLA-34	RTX 2080Ti	—	89.6 88.8	TGRS	2021
	AOPG^[99]	DOTA HRSC2016	1024 × 1024 (800, 1333) × (800, 1333)	ResNet-101 ResNet-50	RTX 2080Ti	10.8 —	80.2 96.2	TGRS	2022
	GGHL^[100]	DOTA SSDD+	800 × 800 —	Darknet53	RTX 3090 × 2	42.3 44.1	76.9 90.2	TIP	2022

下载: 导出CSV

表 12 解决目标方向变化问题的不同检测方法优缺点对比

Table 12 Feature comparison of detection methods to solve the problem that object direction changeable

问题	算法	方法	优点	缺点/难点
目标方向变化	SARD^[102]	用尺度感知方法融合深层和浅层特征信息	对大尺度变化和多角度变化目标适应度好	整体检测效率较低
	P-RSDet^[97]	回归一个极半径和两个极角, 实现多角度物体的检测	避免角度周期性及预测框的顶点排序问题	极坐标的后处理操作相关复杂度较高
	O²-DNet^[101]	用横纵比感知方向中心度的方法, 学习判别性特征	网络从复杂背景中学习更具判别性的特征	特征融合方法的实际融合效果较差
	DRN^[106]	使用自适应的特征选择模块和动态优化的检测头	缓解目标特征和坐标轴之间的不对齐问题	检测精度有待进一步提升
	BBAVectors^[96]	使用边缘感知向量来替代原回归参数	在同坐标系中回归所有参数, 减少计算量	向量的类型转化过程处理较复杂
	FCOSR^[98]	使用基于高斯分布的椭圆中心采样策略	修正样本分配方法在航空场景下漏检问题	未实现标签分配方案的自适应
	DARDet^[103]	设计高效对齐卷积模块来提取对齐特征	可一次性预测出所有的预测框相关参数	损失函数的角度预测偏移量较大
	DAFNe^[105]	使用基于方向感知的边界框中心度函数	降低低质量框的权重并且提高定位精度	损失函数仍存在部分旋转敏感度误差
	CHPDet^[107]	使用方位不变模块OIM生成方位不变特征映射	确定旋转目标的朝向(如车头、船头等)	存在目标预测框的位置偏移量
	AOPG^[99]	使用区域标签分配模块和粗定位模块	缓解标签分配不均衡、目标特征不对齐的问题	未实现标签分配方案的自适应
	GGHL^[100]	使用二维定向高斯热力图进行定向框标签分配	实现动态标签分配对齐回归和分类任务	检测精度有待进一步提升

下载: 导出CSV

参考文献(116)

[1]	聂光涛, 黄华. 光学遥感图像目标检测算法综述. 自动化学报, 2021, 47(8): 1749-1768 doi: 10.16383/j.aas.c200596 Nie Guang-Tao, Huang Hua. A survey of object detection in optical remote sensing images. Acta Automatica Sinica, 2021, 47(8): 1749-1768 doi: 10.16383/j.aas.c200596
[2]	Neubeck A, Van Gool L. Efficient non-maximum suppression. In: Proceedings of the 18th International Conference on Pattern Recognition (ICPR＇06). Hong Kong, China: IEEE, 2006. 850−855
[3]	Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu C Y, et al. SSD: Single shot multibox detector. In: Proceedings of the 14th European Conference on Computer Vision. Amsterdam, The Netherlands: Springer, 2016. 21−37
[4]	Girshick R. Fast R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision. Santiago, Chile: IEEE, 2015. 1440−1448
[5]	Ren S Q, He K M, Girshick R, Sun J. Faster R-CNN: Towards real-time object detection with region proposal networks. In: Proceedings of the 29th International Conference on Neural Information Processing Systems. Montreal, Canada: 2015. 91−99
[6]	Redmon J, Farhadi A. YOLO9000: Better, faster, stronger. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE, 2017. 6517−6525
[7]	Redmon J, Farhadi A. YOLOv3: An incremental improvement. arXiv preprint arXiv: 1804.02767, 2018.
[8]	肖雨晴, 杨慧敏. 目标检测算法在交通场景中应用综述. 计算机工程与应用, 2021, 57(6): 30-41 doi: 10.3778/j.issn.1002-8331.2011-0361 Xiao Yu-Qing, Yang Hui-Min. Research on application of object detection algorithm in traffic scene. Computer Engineering and Applications, 2021, 57(6): 30-41 doi: 10.3778/j.issn.1002-8331.2011-0361
[9]	Huang L C, Yang Y, Deng Y F, Yu Y N. DenseBox: Unifying landmark localization with end to end object detection. arXiv preprint arXiv: 1509.04874, 2015.
[10]	Everingham M, Van Gool L, Williams C K I, Winn J, Zisserman A. The PASCAL visual object classes (VOC) challenge. International Journal of Computer Vision, 2010, 88(2): 303-338 doi: 10.1007/s11263-009-0275-4
[11]	Lin T Y, Maire M, Belongie S, Hays J, Perona P, Ramanan D, et al. Microsoft COCO: Common objects in context. In: Proceedings of the 13th European Conference on Computer Vision. Zurich, Switzerland: Springer, 2014. 740−755
[12]	Xia G S, Bai X, Ding J, Zhu Z, Belongie S, Luo J B, et al. DOTA: A large-scale dataset for object detection in aerial images. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE, 2018. 3974−3983
[13]	Zhu H G, Chen X G, Dai W Q, Fu K, Ye Q X, Jiao J B. Orientation robust object detection in aerial images using deep convolutional neural network. In: Proceedings of the IEEE International Conference on Image Processing (ICIP). Quebec City, Canada: IEEE, 2015. 3735−3739
[14]	Karatzas D, Gomez-Bigorda L, Nicolaou A, Ghosh S, Bagdanov A, Iwamura M, et al. ICDAR 2015 competition on robust reading. In: Proceedings of the 13th International Conference on Document Analysis and Recognition (ICDAR). Tunis, Tunisia: IEEE, 2015. 1156−1160
[15]	Xiao T, Li S, Wang B C, Lin L, Wang X G. Joint detection and identification feature learning for person search. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE, 2017. 3376−3385
[16]	Zheng L, Zhang H H, Sun S Y, Chandraker M, Yang Y, Tian Q. Person re-identification in the wild. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE, 2017. 3346−3355
[17]	Shao S, Zhao Z J, Li B X, Xiao T T, Yu G, Zhang X Y, et al. CrowdHuman: A benchmark for detecting human in a crowd. arXiv preprint arXiv: 1805.00123, 2018.
[18]	Liu Z K, Yuan L, Weng L B, Yang Y P. A high resolution optical satellite image dataset for ship recognition and some new baselines. In: Proceedings of the 6th International Conference on Pattern Recognition Applications and Methods. Porto, Portugal: SciTePress, 2017. 324−331
[19]	Li J W, Qu C W, Shao J Q. Ship detection in SAR images based on an improved faster R-CNN. In: Proceedings of the SAR in Big Data Era: Models, Methods and Applications (BIGSARDATA). Beijing, China: IEEE, 2017. 1−6
[20]	Wei S J, Zeng X F, Qu Q Z, Wang M, Su H, Shi J. HRSID: A high-resolution SAR images dataset for ship detection and instance segmentation. IEEE Access, 2020, 8: 120234-120254 doi: 10.1109/ACCESS.2020.3005861
[21]	Wang X G, Chen K B, Huang Z L, Yao C, Liu W Y. Point linking network for object detection. arXiv preprint arXiv: 1706.03646, 2017.
[22]	Law H, Deng J. CornerNet: Detecting objects as paired keypoints. In: Proceedings of the 15th European Conference on Computer Vision (ECCV). Munich, Germany: Springer, 2018. 765−781
[23]	Law H, Teng Y, Russakovsky O, Deng J. Cornernet-lite: Efficient keypoint based object detection. In: Proceedings of the 31st British Machine Vision Conference. BMVC, 2020.
[24]	Zhou X Y, Zhuo J C, Krähenbühl P. Bottom-up object detection by grouping extreme and center points. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE, 2019. 850−859
[25]	Duan K W, Bai S, Xie L X, Qi H G, Huang Q M, Tian Q. CenterNet: Keypoint triplets for object detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. Seoul, South Korea: IEEE, 2019. 6568−6577
[26]	Dong Z W, Li G X, Liao Y, Wang F, Ren P J, Qian C. CentripetalNet: Pursuing high-quality keypoint pairs for object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE, 2020. 10516−10525
[27]	Lan S Y, Ren Z, Wu Y, Davis L S, Hua G. SaccadeNet: A fast and accurate object detector. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE, 2020. 10394−10403
[28]	Duan K W, Xie L X, Qi H G, Bai S, Huang Q M, Tian Q. Corner proposal network for anchor-free, two-stage object detection. In: Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer, 2020. 399−416
[29]	王彦情, 马雷, 田原. 光学遥感图像舰船目标检测与识别综述. 自动化学报, 2011, 37(9): 1029-1039 Wang Yan-Qing, Ma Lei, Tian Yuan. State-of-the-art of ship detection and recognition in optical remotely sensed imagery. Acta Automatica Sinica, 2011, 37(9): 1029-1039
[30]	Yu J H, Jiang Y N, Wang Z Y, Cao Z M, Huang T. UnitBox: An advanced object detection network. In: Proceedings of the 24th ACM International Conference on Multimedia. Amsterdam, The Netherlands: ACM, 2016. 516−520
[31]	Redmon J, Divvala S, Girshick R, Farhadi A. You only look once: Unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE, 2016. 779−788
[32]	Ge Z, Liu S T, Wang F, Li Z M, Sun J. YOLOX: Exceeding YOLO series in 2021. arXiv preprint arXiv: 2107.08430, 2021.
[33]	Tian Z, Shen C H, Chen H, He T. FCOS: Fully convolutional one-stage object detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. Seoul, South Korea: IEEE, 2019. 9626−9635
[34]	Sun P Z, Jiang Y, Xie E Z, Shao W Q, Yuan Z H, Wang C H, et al. What makes for end-to-end object detection? In: Proceedings of the 38th International Conference on Machine Learning. PMLR, 2021. 9934−9944
[35]	Zhou X Y, Wang D Q, Krahenbuhl P. Objects as points. arXiv preprint arXiv: 1904.07850, 2019.
[36]	Zhou X Y, Koltun V, Krähenbühl P. Probabilistic two-stage detection. arXiv preprint arXiv: 2103.07461, 2021.
[37]	Samet N, Hicsonmez S, Akbas E. HoughNet: Integrating near and long-range evidence for bottom-up object detection. In: Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer, 2020. 406−423
[38]	Chu C, Zhmoginov A, Sandler M. CycleGAN, a master of steganography. arXiv preprint arXiv: 1712.02950, 2017.
[39]	Isola P, Zhu J Y, Zhou T H, Efros A A. Image-to-image translation with conditional adversarial networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE, 2017. 5967−5976
[40]	Lu X, Li B Y, Yue Y X, Li Q Q, Yan J J. Grid R-CNN. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE, 2019. 7355−7364
[41]	Lu X, Li B Y, Yue Y X, Li Q Q, Yan J J. Grid R-CNN plus: Faster and better. arXiv preprint arXiv: 1906.05688, 2019.
[42]	Carion N, Massa F, Synnaeve G, Usunier N, Kirillov A, Zagoruyko S. End-to-end object detection with transformers. In: Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer, 2020. 213−229
[43]	Sun Z Q, Cao S C, Yang Y M, Kitani K. Rethinking transformer-based set prediction for object detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. Montreal, Canada: IEEE, 2021. 3591−3600
[44]	Zhu X Z, Su W J, Lu L W, Li B, Wang X G, Dai J F. Deformable DETR: Deformable transformers for end-to-end object detection. In: Proceedings of the 9th International Conference on Learning Representations. ICLR, 2021.
[45]	Dai X Y, Chen Y P, Yang J W, Zhang P C, Yuan L, Zhang L. Dynamic DETR: End-to-end object detection with dynamic attention. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. Montreal, Canada: IEEE, 2021. 2968−2977
[46]	Zhang G J, Luo Z P, Yu Y C, Cui K W, Lu S J. Accelerating DETR convergence via semantic-aligned matching. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE, 2022. 939−948
[47]	Fang Y X, Liao B C, Wang X G, Fang J M, Qi J Y, Wu R, et al. You only look at one sequence: Rethinking transformer in vision through object detection. In: Proceedings of the 35th International Conference on Neural Information Processing Systems. 2021. 26183−26197
[48]	Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X H, Unterthiner T, et al. An image is worth 16×16 words: Transformers for image recognition at scale. In: Proceedings of the 9th International Conference on Learning Representations. ICLR, 2021.
[49]	Song H, Sun D Q, Chun S, Jampani V, Han D, Heo B, et al. ViDT: An efficient and effective fully transformer-based object detector. In: Proceedings of the 10th International Conference on Learning Representations. ICLR, 2022.
[50]	Li F, Zhang H, Liu S L, Guo J, Ni L M, Zhang L. DN-DETR: Accelerate DETR training by introducing query DeNoising. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE, 2022. 13609−13617
[51]	Wang J F, Yuan Y, Li B X, Yu G, Jian S. SFace: An efficient network for face detection in large scale variations. arXiv preprint arXiv: 1804.06559, 2018.
[52]	Zhu C C, He Y H, Savvides M. Feature selective anchor-free module for single-shot object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE, 2019. 840−849
[53]	Lin T Y, Goyal P, Girshick R, He K M, Dollár P. Focal loss for dense object detection. In: Proceedings of the IEEE International Conference on Computer Vision. Venice, Italy: IEEE, 2017. 2999−3001
[54]	Zhu C C, Chen F Y, Shen Z Q, Savvides M. Soft anchor-point object detection. In: Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer, 2020. 91−107
[55]	Zhang X S, Wan F, Liu C, Ji R R, Ye Q X. FreeAnchor: Learning to match anchors for visual object detection. In: Proceedings of the 33rd International Conference on Neural Information Processing Systems. Vancouver, Canada: Curran Associates Inc., 2019. Article No. 14
[56]	Zhang S F, Chi C, Yao Y Q, Lei Z, Li S Z. Bridging the gap between anchor-based and anchor-free detection via adaptive training sample selection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE, 2020. 9756−9765
[57]	Zhu B J, Wang J F, Jiang Z K, Zong F H, Liu S T, Li Z M, et al. AutoAssign: Differentiable label assignment for dense object detection. arXiv preprint arXiv: 2007.03496, 2020.
[58]	Duan K W, Xie L X, Qi H G, Bai S, Huang Q M, Tian Q. Location-sensitive visual recognition with cross-IOU loss. arXiv preprint arXiv: 2104.04899, 2021.
[59]	Li S, He C H, Li R H, Zhang L. A dual weighting label assignment scheme for object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE, 2022. 9377−9386
[60]	刘小波, 刘鹏, 蔡之华, 乔禹霖, 王凌, 汪敏. 基于深度学习的光学遥感图像目标检测研究进展. 自动化学报, 2021, 47(9): 2078-2089 doi: 10.16383/j.aas.c190455 Liu Xiao-Bo, Liu Peng, Cai Zhi-Hua, Qiao Yu-Lin, Wang Ling, Wang Min. Research progress of optical remote sensing image object detection based on deep learning. Acta Automatica Sinica, 2021, 47(9): 2078-2089 doi: 10.16383/j.aas.c190455
[61]	龚浩田, 张萌. 基于关键点检测的无锚框轻量级目标检测算法. 计算机科学, 2021, 48(8): 106-110 doi: 10.11896/jsjkx.200700161 Gong Hao-Tian, Zhang Meng. Lightweight anchor-free object detection algorithm based on KeyPoint detection. Computer Science, 2021, 48(8): 106-110 doi: 10.11896/jsjkx.200700161
[62]	邵晓雯, 帅惠, 刘青山. 融合属性特征的行人重识别方法. 自动化学报, 2022, 48(2): 564-571 Shao Xiao-Wen, Shuai Hui, Liu Qing-Shan. Person re-identification based on fused attribute features. Acta Automatica Sinica, 2022, 48(2): 564-571
[63]	刘洋, 战荫伟. 基于深度学习的小目标检测算法综述. 计算机工程与应用, 2021, 57(2): 37-48 doi: 10.3778/j.issn.1002-8331.2009-0047 Liu Yang, Zhan Yin-Wei. Survey of small object detection algorithms based on deep learning. Computer Engineering and Applications, 2021, 57(2): 37-48 doi: 10.3778/j.issn.1002-8331.2009-0047
[64]	Bodla N, Singh B, Chellappa R, Davis L S. Soft-NMS: Improving object detection with one line of code. In: Proceedings of the IEEE International Conference on Computer Vision. Venice, Italy: IEEE, 2017. 5562−5570
[65]	Liu S T, Huang D, Wang Y H. Adaptive NMS: Refining pedestrian detection in a crowd. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE, 2019. 6452−6461
[66]	Huang X, Ge Z, Jie Z Q, Yoshie O. NMS by representative region: Towards crowded pedestrian detection by proposal pairing. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE, 2020. 10747−10756
[67]	Zhang S F, Wen L Y, Bian X, Lei Z, Li S Z. Occlusion-aware R-CNN: Detecting pedestrians in a crowd. In: Proceedings of the 15th European Conference on Computer Vision (ECCV). Munich, Germany: Springer, 2018. 657−674
[68]	阳珊, 王建, 胡莉, 刘波, 赵皓. 改进RetinaNet的遮挡目标检测算法研究. 计算机工程与应用, 2022, 58(11): 209-214 doi: 10.3778/j.issn.1002-8331.2107-0277 Yang Shan, Wang Jian, Hu Li, Liu Bo, Zhao Hao. Research on occluded object detection by improved RetinaNet. Computer Engineering and Applications, 2022, 58(11): 209-214 doi: 10.3778/j.issn.1002-8331.2107-0277
[69]	Luo Z K, Fang Z, Zheng S X, Wang Y B, Fu Y W. NMS-Loss: Learning with non-maximum suppression for crowded pedestrian detection. In: Proceedings of the International Conference on Multimedia Retrieval. Taipei, China: ACM, 2021. 481−485
[70]	Liu W, Hasan I, Liao S C. Center and scale prediction: A box-free approach for pedestrian and face detection. arXiv preprint arXiv: 1904.02948, 2019.
[71]	Yan Y C, Li J P, Qin J, Bai S, Liao S C, Liu L, et al. Anchor-free person search. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE, 2021. 7686−7695
[72]	Cao J L, Pang Y W, Anwer R M, Cholakkal H, Xie J, Shah M, et al. PSTR: End-to-end one-step person search with transformers. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE, 2022. 9448−9457
[73]	Tian Z, Shen C H, Chen H, He T. FCOS: A simple and strong anchor-free object detector. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44(4): 1922-1933
[74]	Rezatofighi H, Tsoi N, Gwak J Y, Sadeghian A, Reid I, Savarese S. Generalized intersection over union: A metric and a loss for bounding box regression. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE, 2019. 658−666
[75]	Qin Y L, Wen J, Zheng H, Huang X L, Yang J, Song N, et al. Varifocal-Net: A chromosome classification approach using deep convolutional networks. IEEE Transactions on Medical Imaging, 2019, 38(11): 2569-2581 doi: 10.1109/TMI.2019.2905841
[76]	Qiu H, Ma Y C, Li Z M, Liu S T, Sun J. BorderDet: Border feature for dense object detection. In: Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer, 2020. 549−564
[77]	Cheng Z H, Wu Y X, Xu Z H, Lukasiewicz T, Wang W Y. Segmentation is all you need. arXiv preprint arXiv: 1904.13300, 2019.
[78]	Ge Z, Liu S T, Li Z M, Yoshie O, Sun J. OTA: Optimal transport assignment for object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE, 2021. 303−312
[79]	Ge Z, Wang J F, Huang X, Liu S T, Yoshie O. LLA: Loss-aware label assignment for dense pedestrian detection. Neurocomputing, 2021, 462: 272-281 doi: 10.1016/j.neucom.2021.07.094
[80]	Zhang X S, Wan F, Liu C, Ji X Y, Ye Q X. Learning to match anchors for visual object detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44(6): 3096-3109 doi: 10.1109/TPAMI.2021.3050494
[81]	Yao Z Y, Ai J B, Li B X, Zhang C. Efficient DETR: Improving end-to-end object detector with dense prior. arXiv preprint arXiv: 2104.01318, 2021.
[82]	Yu R, Du D W, LaLonde R, Davila D, Funk C, Hoogs A, et al. Cascade transformers for end-to-end person search. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE, 2022. 7257−7266
[83]	Zheng A L, Zhang Y, Zhang X Y, Qi X J, Sun J. Progressive end-to-end object detection in crowded scenes. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE, 2022. 847−856
[84]	Zhu Y S, Zhao C Y, Wang J Q, Zhao X, Wu Y, Lu H Q. CoupleNet: Coupling global structure with local parts for object detection. In: Proceedings of the IEEE International Conference on Computer Vision. Venice, Italy: IEEE, 2017. 4146−4154
[85]	Li Y Z, Pang Y W, Shen J B, Cao J L, Shao L. NETNet: Neighbor erasing and transferring network for better single shot object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE, 2020. 13346−13355
[86]	Zhong Z Y, Sun L, Huo Q. An anchor-free region proposal network for Faster R-CNN-based text detection approaches. International Journal on Document Analysis and Recognition (IJDAR), 2019, 22(3): 315-327 doi: 10.1007/s10032-019-00335-y
[87]	Samet N, Hicsonmez S, Akbas E. Reducing label noise in anchor-free object detection. In: Proceedings of the 31st British Machine Vision Conference. BMVC, 2020.
[88]	He B K, Zhang Q Y, Tong M, He C. An anchor-free method based on adaptive feature encoding and Gaussian-guided sampling optimization for ship detection in SAR imagery. Remote Sensing, 2022, 14(7): 1738 doi: 10.3390/rs14071738
[89]	Yang Z, Liu S H, Hu H, Wang L W, Lin S. RepPoints: Point set representation for object detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. Seoul, South Korea: IEEE, 2019. 9656−9665
[90]	Chen Y H, Zhang Z, Cao Y, Wang L W, Lin S, Hu H. RepPoints v2: Verification meets regression for object detection. In: Proceedings of the 34th International Conference on Neural Information Processing Systems. 2020. Article No. 33
[91]	Li W T, Chen Y J, Hu K X, Zhu J K. Oriented RepPoints for aerial object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE, 2022. 1819−1828
[92]	Chen S, Li J P, Yao C Q, Hou W B, Qin S, Jin W Y, et al. DuBox: No-prior box objection detection via residual dual scale detectors. arXiv preprint arXiv: 1904.06883, 2019.
[93]	Kong T, Sun F C, Liu H P, Jiang Y N, Li L, Shi J B. FoveaBox: Beyound anchor-based object detection. IEEE Transactions on Image Processing, 2020, 29: 7389-7398 doi: 10.1109/TIP.2020.3002345
[94]	Fu J M, Sun X, Wang Z R, Fu K. An anchor-free method based on feature balancing and refinement network for multiscale ship detection in SAR images. IEEE Transactions on Geoscience and Remote Sensing, 2021, 59(2): 1331-1344 doi: 10.1109/TGRS.2020.3005151
[95]	Yang C, Huang Z H, Wang N Y. QueryDet: Cascaded sparse query for accelerating high-resolution small object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE, 2022. 13658−13667
[96]	Yi J R, Wu P X, Liu B, Huang Q Y, Qu H, Metaxas D. Oriented object detection in aerial images with box boundary-aware vectors. In: Proceedings of the IEEE Winter Conference on Applications of Computer Vision. Waikoloa, USA: IEEE, 2021. 2149−2158
[97]	Zhou L, Wei H R, Li H, Zhao W Z, Zhang Y, Zhang Y. Arbitrary-oriented object detection in remote sensing images based on polar coordinates. IEEE Access, 2020, 8: 223373-223384 doi: 10.1109/ACCESS.2020.3041025
[98]	Li Z H, Hou B, Wu Z T, Jiao L C, Ren B, Yang C. FCOSR: A simple anchor-free rotated detector for aerial object detection. arXiv preprint arXiv: 2111.10780, 2021.
[99]	Cheng G, Wang J B, Li K, Xie X X, Lang C B, Yao Y Q, et al. Anchor-free oriented proposal generator for object detection. IEEE Transactions on Geoscience and Remote Sensing, 2022, 60: Article No. 5625411
[100]	Huang Z C, Li W, Xia X G, Tao R. A general Gaussian heatmap label assignment for arbitrary-oriented object detection. IEEE Transactions on Image Processing, 2022, 31: 1895-1910 doi: 10.1109/TIP.2022.3148874
[101]	Wei H R, Zhang Y, Chang Z H, Li H, Wang H Q, Sun X. Oriented objects as pairs of middle lines. ISPRS Journal of Photogrammetry and Remote Sensing, 2020, 169: 268-279 doi: 10.1016/j.isprsjprs.2020.09.022
[102]	Wang Y S, Zhang Y, Zhang Y, Zhao L J, Sun X, Guo Z. SARD: Towards scale-aware rotated object detection in aerial imagery. IEEE Access, 2019, 7: 173855-173865 doi: 10.1109/ACCESS.2019.2956569
[103]	Zhang F, Wang X Y, Zhou S L, Wang Y Q. DARDet: A dense anchor-free rotated object detector in aerial images. IEEE Geoscience and Remote Sensing Letters, 2021, 19: Article No. 8024305
[104]	Chen Z M, Chen K, Lin W Y, See J, Yu H, Ke Y, et al. PIoU loss: Towards accurate oriented object detection in complex environments. In: Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer, 2020. 195−211
[105]	Lang S, Ventola F, Kersting K. DAFNe: A one-stage anchor-free approach for oriented object detection. arXiv preprint arXiv: 2109.06148, 2021.
[106]	Pan X J, Ren Y Q, Sheng K K, Dong W M, Yuan H L, Guo X W, et al. Dynamic refinement network for oriented and densely packed object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE, 2020. 11204−11213
[107]	Zhang F, Wang X Y, Zhou S L, Wang Y Q, Hou Y. Arbitrary-oriented ship detection through center-head point extraction. IEEE Transactions on Geoscience and Remote Sensing, 2021, 60: Article No. 5612414
[108]	Yang X, Yan J C, Ming Q, Wang W T, Zhang X P, Tian Q. Rethinking rotated object detection with Gaussian wasserstein distance loss. In: Proceedings of the 38th International Conference on Machine Learning. PMLR, 2021. 11830−11841
[109]	Yang X, Yang X J, Yang J R, Ming Q, Wang W T, Tian Q, et al. Learning high-precision bounding box for rotated object detection via Kullback-Leibler divergence. In: Proceedings of the 35th International Conference on Neural Information Processing Systems. 2021. 18381−18394
[110]	Qian W, Yang X, Peng S L, Zhang X J, Yan J C. RSDet++: Point-based modulated loss for more accurate rotated object detection. IEEE Transactions on Circuits and Systems for Video Technology, 2022, 32(11): 7869-7879 doi: 10.1109/TCSVT.2022.3186070
[111]	Yang X, Yang J R, Yan J C, Zhang Y, Zhang T F, Guo Z, et al. SCRDet: Towards more robust detection for small, cluttered and rotated objects. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. Seoul, South Korea: IEEE, 2019. 8231−8240
[112]	Milan A, Leal-Taixé L, Reid I, Roth S, Schindler K. MOT16: A benchmark for multi-object tracking. arXiv preprint arXiv: 1603.00831, 2016.
[113]	Zhang Y F, Wang C Y, Wang X G, Zeng W J, Liu W Y. FairMOT: On the fairness of detection and re-identification in multiple object tracking. International Journal of Computer Vision, 2021, 129(11): 3069-3087 doi: 10.1007/s11263-021-01513-4
[114]	Howard A G, Zhu M L, Chen B, Kalenichenko D, Wang W J, Weyand T, et al. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv: 1704.04861, 2017.
[115]	Zhang X Y, Zhou X Y, Lin M X, Sun J. ShuffleNet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE, 2018. 6848−6856
[116]	Wang R J, Li X, Ling C X. Pelee: A real-time object detection system on mobile devices. In: Proceedings of the 32nd International Conference on Neural Information Processing Systems. Montréal, Canada: Curran Associates Inc., 2018. 1967−1976