Object Detection Method for Multi-scale Full-scene Surveillance Based on Attention Mechanism
Citation: ZHANG Dexiang, WANG Jun, YUAN Peicheng. Object Detection Method for Multi-scale Full-scene Surveillance Based on Attention Mechanism[J]. Journal of Electronics & Information Technology, 2022, 44(9): 3249-3257.
Authors: ZHANG Dexiang, WANG Jun, YUAN Peicheng
Affiliation: 1. School of Electrical Engineering and Automation, Anhui University, Hefei 230601, China; 2. School of Electronic and Electrical Engineering, Anhui Sanlian University, Hefei 230601, China
Fund Project: National Key Research and Development Program of China (2018YFB0504604)
Abstract: To address the problem that object features are indistinct in complex urban surveillance scenes, owing to large variations in object size, occlusion, and weather conditions, a multi-scale full-scene surveillance object detection method based on an attention mechanism is proposed. A multi-scale detection network structure built on the Yolov5s model is designed to improve the network's adaptability to changes in object size. A feature extraction module based on an attention mechanism is also constructed: channel-level feature weights are learned by the network, which enhances object features, suppresses background features, and improves the network's feature extraction capability. The initial anchor box sizes for the full-scene surveillance dataset are computed with the K-means clustering algorithm, which accelerates model convergence while improving detection accuracy. On the COCO dataset, compared with the baseline network, the mean Average Precision (mAP) improves by 3.7% and mAP50 by 4.7%, with a model inference time of only 3.8 ms. On the full-scene surveillance dataset, mAP50 reaches 89.6% and surveillance video is processed at 154 frames per second, meeting the real-time detection requirements of surveillance scenes.
Keywords: object detection; full-scene surveillance; multi-scale detection; attention mechanism
Received: 2021-07-02
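
Note: the multi-scale detection structure is described here only at a high level (a Yolov5s-based network adapted to large variations in object scale). As a purely illustrative sketch of the general idea, the Python code below runs a small prediction head over feature maps at several strides, so that small and large objects are each predicted from a feature map of suitable resolution; the class name, channel counts, and strides are placeholder assumptions, not the paper's configuration.

# Illustrative multi-scale detection heads: one 1x1 prediction conv per feature
# level (placeholder strides 8/16/32). Not the paper's exact network.
import torch
import torch.nn as nn

class MultiScaleHeads(nn.Module):
    def __init__(self, in_channels=(128, 256, 512), num_anchors=3, num_classes=80):
        super().__init__()
        out = num_anchors * (num_classes + 5)  # (x, y, w, h, objectness) + class scores
        self.heads = nn.ModuleList(nn.Conv2d(c, out, kernel_size=1) for c in in_channels)

    def forward(self, features):
        # features: list of maps from shallow (high resolution, small objects)
        # to deep (low resolution, large objects)
        return [head(f) for head, f in zip(self.heads, features)]

# Usage with dummy feature maps at strides 8, 16, 32 for a 640x640 input:
# feats = [torch.randn(1, 128, 80, 80), torch.randn(1, 256, 40, 40), torch.randn(1, 512, 20, 20)]
# outputs = MultiScaleHeads()(feats)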

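Note: the abstract's feature extraction module learns channel-level weights that enhance object features and suppress background. The exact module is not specified on this page; the following is a minimal sketch of a standard squeeze-and-excitation style channel attention block, with the ChannelAttention name and the reduction ratio chosen only for illustration.

# Minimal channel-attention sketch (SE-style); not the paper's exact module.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Learns a per-channel weight in [0, 1] and rescales the feature map."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)      # squeeze: global spatial average
        self.fc = nn.Sequential(                 # excitation: two-layer bottleneck MLP
            nn.Linear(channels, channels // reduction, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels, bias=False),
            nn.Sigmoid(),                        # channel weights in [0, 1]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                             # enhance informative channels, suppress background

# Usage: attach after a backbone stage, e.g. ChannelAttention(256)(features)
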
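Note: the initial anchor boxes are obtained by K-means clustering over the dataset's bounding-box sizes. The distance metric used in the paper is not stated here; the sketch below follows the common practice of clustering (width, height) pairs with IoU-based assignment, and the function names, anchor count of 9, and iteration limit are assumptions.

# K-means anchor clustering sketch; boxes are (width, height) pairs.
import numpy as np

def iou_wh(boxes: np.ndarray, anchors: np.ndarray) -> np.ndarray:
    """IoU between boxes (N, 2) and anchors (K, 2), assuming a shared top-left corner."""
    inter = np.minimum(boxes[:, None, 0], anchors[None, :, 0]) * \
            np.minimum(boxes[:, None, 1], anchors[None, :, 1])
    union = boxes[:, 0:1] * boxes[:, 1:2] + anchors[None, :, 0] * anchors[None, :, 1] - inter
    return inter / union

def kmeans_anchors(boxes: np.ndarray, k: int = 9, iters: int = 300, seed: int = 0) -> np.ndarray:
    rng = np.random.default_rng(seed)
    anchors = boxes[rng.choice(len(boxes), k, replace=False)]   # random initial anchors
    for _ in range(iters):
        assign = np.argmax(iou_wh(boxes, anchors), axis=1)      # assign each box to its best anchor
        new = np.array([boxes[assign == i].mean(axis=0) if np.any(assign == i)
                        else anchors[i] for i in range(k)])     # update each anchor to cluster mean
        if np.allclose(new, anchors):
            break
        anchors = new
    return anchors[np.argsort(anchors.prod(axis=1))]            # sort anchors by area

# Example: anchors = kmeans_anchors(np.array(all_box_wh), k=9)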