Funding: National Natural Science Foundation of China (62071303, 61871269); Guangdong Basic and Applied Basic Research Foundation (2019A1515011861); Shenzhen Science and Technology Program (JCYJ20190808151615540).

Received: 2021-03-02

Pedestrian Sequence Attribute Recognition Method with Multi-feature Fusion Combined with Temporal Attention Mechanism
HUANG Chen,PEI Jihong,ZHAO Yang. Pedestrian Sequence Attribute Recognition Method with Multi-feature Fusion Combined with Temporal Attention Mechanism[J]. Signal Processing(China), 2022, 38(1): 64-73. DOI: 10.16798/j.issn.1003-0530.2022.01.008
Authors: HUANG Chen, PEI Jihong, ZHAO Yang
Affiliation: School of Electronic and Information Engineering, Shenzhen University, Shenzhen, Guangdong 518060, China
Abstract: The majority of pedestrian attribute recognition methods are based on a single image. The information contained in a single image is limited, whereas an image sequence contains rich useful information and temporal features, so exploiting sequence information is an important way to improve the performance of pedestrian attribute recognition. This paper proposes a multi-feature fusion pedestrian sequence attribute recognition network based on a temporal attention mechanism. In addition to the common spatial-temporal quadratic average pooling and spatial-temporal mean-max pooling feature aggregations, the network adds a spatial-temporal attention-factor-weighted feature aggregation branch to further extract sequence features. By fusing the sequence features of these three branches, the network obtains richer information. In the attention-weighted branch, a full-channel spatial-temporal attention factor generation sub-network based on 3D convolution is designed to better capture the spatial-temporal features of a sequence. On top of the cross-entropy loss, the Tversky loss, which constrains the numbers of false positives (FP) and false negatives (FN), is added to form the overall loss function, giving the network a better trade-off between precision and recall. Experimental results show that the proposed method outperforms single-image methods and other common feature fusion and temporal modeling methods on every performance metric.
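The three-branch temporal aggregation and the Tversky term described in the abstract can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the feature shapes, the concatenation-based fusion, and the names `seq_feats`, `attn_scores`, `alpha`, and `beta` are all assumptions made for the example, and the per-channel softmax stands in for the paper's 3D-convolutional attention factor generation sub-network.

```python
import numpy as np

def softmax(x, axis=0):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def aggregate(seq_feats, attn_scores):
    """Fuse three temporal aggregations of per-frame features.

    seq_feats:   (T, C) per-frame feature vectors (illustrative shape)
    attn_scores: (T, C) per-frame, per-channel attention factors; in the
                 paper these come from a 3D-conv sub-network
    """
    # Branch 1: temporal average pooling (stand-in for the paper's
    # spatial-temporal quadratic average pooling)
    avg = seq_feats.mean(axis=0)
    # Branch 2: temporal mean-max pooling
    mean_max = 0.5 * (seq_feats.mean(axis=0) + seq_feats.max(axis=0))
    # Branch 3: attention-factor-weighted aggregation over time
    w = softmax(attn_scores, axis=0)       # weights sum to 1 per channel
    attended = (w * seq_feats).sum(axis=0)
    # Fuse the three branch outputs (concatenation is an assumption)
    return np.concatenate([avg, mean_max, attended])

def tversky_loss(p, y, alpha=0.5, beta=0.5, eps=1e-6):
    """Tversky loss on predicted probabilities p and binary labels y.

    alpha weights false positives and beta weights false negatives,
    so the objective can trade precision against recall.
    """
    tp = (p * y).sum()
    fp = (p * (1 - y)).sum()
    fn = ((1 - p) * y).sum()
    return 1.0 - tp / (tp + alpha * fp + beta * fn + eps)
```

In training, the Tversky term would be added to the cross-entropy loss, e.g. `loss = bce + lam * tversky_loss(p, y)` with `lam` an assumed weighting hyperparameter.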
Keywords: pedestrian attribute recognition; temporal attention mechanism; feature fusion; temporal modeling
This article is indexed in databases including 维普 (CQVIP).