1.

Siamese visual tracking with enriched semantics and dynamic template

王汇三, 张红颖《Optoelectronics Letters》2021, 17(4): 241-246

Siamese tracking methods have recently drawn extensive attention due to their balance of accuracy and efficiency. However, most Siamese-based trackers use shallow backbone networks, from which high-level semantic features are difficult to extract. When distractors closely resemble the target in appearance, these methods may drift or even fail. To address this deficiency, we propose a Siamese network with enriched semantics, named ESDT. First, a semantic enrichment module (SEM) comprising dilated convolution layers is designed to improve the classification capability of the Siamese tracker. In addition, the target template is updated adaptively to cope with changes in target texture caused by illumination and blur, further improving tracking performance. Finally, exhaustive experimental analysis on public datasets shows that the proposed algorithm outperforms several state-of-the-art algorithms and tracks the target stably despite disturbances.
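
The abstract above names dilated convolution as the building block of its semantic enrichment module but gives no layer details. As a minimal sketch of the general operation only (not the paper's SEM), the function below applies a 1-D dilated convolution in pure Python; the name `dilated_conv1d` and the sample values are illustrative assumptions.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Valid (no padding) 1-D dilated convolution.

    Uses the cross-correlation form common in deep learning frameworks,
    i.e. the kernel is not flipped. Hypothetical helper for illustration.
    """
    # Number of input samples covered by one output value.
    span = (len(kernel) - 1) * dilation + 1
    return [
        sum(kernel[k] * signal[i + k * dilation] for k in range(len(kernel)))
        for i in range(len(signal) - span + 1)
    ]


signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 0, -1]

dense = dilated_conv1d(signal, kernel, dilation=1)    # each output spans 3 samples
dilated = dilated_conv1d(signal, kernel, dilation=2)  # each output spans 5 samples
```

With the same three kernel weights, raising the dilation from 1 to 2 widens the span of each output from 3 to 5 input samples, which is why dilation is a common way to enlarge the receptive field without adding parameters.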

2.

Discriminative descriptors for object tracking

《Journal of Visual Communication and Image Representation》2016

3.

Object semantic-guided graph attention feature fusion network for Siamese visual tracking

《Journal of Visual Communication and Image Representation》2023

Similarity matching between the template and the search area plays a key role in Siamese-based trackers. Most Siamese-based trackers adopt a correlation operation to fuse the features of the template branch and the search branch for similarity matching. However, the correlation operation slides the template feature directly over the search area feature without distinguishing the discriminative parts of the target from background noise, which blurs the spatial information of the response feature. To address this issue, this work proposes a novel object semantic-guided graph attention feature fusion network that both removes background information and focuses on the discriminative parts of the object. The proposed network removes background noise by using an adaptive template instead of the fixed-size template used by the correlation operation. The network also models the contextual semantic relations of the target and uses them to guide the feature fusion process in a part-based manner, thereby accurately highlighting the discriminative parts of the target. The blurred response features caused by the correlation operation are thus effectively resolved. Furthermore, we propose an object-aware prediction network that learns object-aware features for the classification and regression tasks, which improves the discriminative ability of the prediction network. Experiments on challenging benchmarks including OTB-100, LaSOT, TColor-128, GOT-10k, and VOT2019 show that our method achieves excellent performance.
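
The correlation operation described above, in which the template feature slides over the search feature, can be sketched as follows. This is a single-channel, pure-Python illustration of the baseline operation the abstract criticizes, not the proposed graph attention fusion; `cross_correlate` and the toy arrays are assumptions made for the example.

```python
def cross_correlate(search, template):
    """Slide `template` over `search` (both 2-D lists) and return the
    response map of inner products at every valid offset."""
    sh, sw = len(search), len(search[0])
    th, tw = len(template), len(template[0])
    response = []
    for i in range(sh - th + 1):
        row = []
        for j in range(sw - tw + 1):
            # Every template element contributes equally, whether it lies
            # on the target or on background inside the template window.
            score = sum(
                search[i + di][j + dj] * template[di][dj]
                for di in range(th)
                for dj in range(tw)
            )
            row.append(score)
        response.append(row)
    return response


search = [
    [0, 0, 0, 0],
    [0, 1, 2, 0],
    [0, 3, 4, 0],
    [0, 0, 0, 0],
]
template = [[1, 2], [3, 4]]
response = cross_correlate(search, template)
```

The response peaks where the template aligns with the matching region of the search area. Because the whole template window is multiplied in, any background it contains feeds into every score, which is the limitation the abstract points to.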

4.

End-to-end DeepNCC framework for robust visual tracking

《Journal of Visual Communication and Image Representation》2020

In this paper, we propose an NCC-based deep framework for object tracking that can be well initialized with the limited target samples available in the first frame. The proposed framework contains a pretrained model, online feature fine-tuning layers, and tracking processes. The pretrained model provides rich feature representations, while the online feature fine-tuning layers select discriminative and generic features for the tracked object. We use normalized cross-correlation as a template tracking layer to perform the tracking process. To align the learned feature representation closely with the tracked target, we jointly train the feature representation network and the tracking processes. In online tracking, an adaptive template and a fixed template are fused to find the optimal tracking result. Scale estimation and a high-confidence model update scheme are integrated into the framework to adapt to changes in target appearance. Extensive experiments demonstrate that the proposed tracker achieves superior performance compared with other state-of-the-art trackers.
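
For readers unfamiliar with normalized cross-correlation, the score between a candidate patch and a template can be sketched as below. This is the textbook zero-mean NCC on raw values, not the paper's trainable layer; the helper name `ncc` and the sample patches are assumptions.

```python
import math


def ncc(patch, template):
    """Zero-mean normalized cross-correlation of two equally sized 2-D lists.

    Returns a value in [-1, 1]; returns 0.0 when either input is constant,
    since the normalization is undefined in that case.
    """
    a = [v for row in patch for v in row]
    b = [v for row in template for v in row]
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    numerator = sum(x * y for x, y in zip(da, db))
    denominator = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return numerator / denominator if denominator > 0 else 0.0


template = [[1.0, 2.0], [3.0, 4.0]]
brighter = [[2.0 * v + 5.0 for v in row] for row in template]  # gain and offset change
inverted = [[-v for v in row] for row in template]

score_same = ncc(template, template)
score_brighter = ncc(brighter, template)
score_inverted = ncc(inverted, template)
```

Subtracting the means and dividing by the norms makes the score insensitive to a uniform gain or offset in brightness, so `score_brighter` stays at the maximum while `score_inverted` falls to the minimum.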

5.

SiamMBFAN: Siamese tracker with multi-branch feature aggregation network

《Journal of Visual Communication and Image Representation》2022

Siamese trackers have attracted considerable attention in the field of object tracking because of their high precision and speed. However, one of their main disadvantages is that the choice of feature extraction network offers little variety: they usually adopt AlexNet or ResNet50 as the backbone. AlexNet is shallow and thus cannot easily extract abundant semantic information, whereas ResNet50 has many convolutional layers, which reduces the real-time performance of Siamese trackers. We propose a multi-branch feature aggregation network with different designs in the shallow and deep convolutional layers. We use the residual module to build the shallow convolutional layers, which extract textural and edge features. The deep convolutional layers, designed as two independent branches, are built with residual and parallel modules to extract different semantic features. The proposed network has a depth of only nine modules, making it simple and effective. We then apply the network to a Siamese tracker to form SiamMBFAN. We design multi-layer classification and regression subnetworks in the Siamese tracker by aggregating the last three modules of the two branches, which improves the localization ability of the tracker. Our tracker achieves a better balance between performance and speed. Finally, SiamMBFAN is tested on four challenging benchmarks: OTB100, VOT2016, VOT2018, and UAV123. Compared with other trackers, our tracker improves by 7% on OTB100.

6.

ACSiam: Asymmetric convolution structures for visual tracking with Siamese network

《Journal of Visual Communication and Image Representation》2022

Object trackers based on Siamese networks usually transform the tracking task into a matching problem between candidate samples and the target template. However, although backbone networks have grown deeper and wider, research on backbone networks for Siamese trackers has not kept pace. It is therefore necessary to investigate the characteristics of the backbone network further. In fact, the ability of the backbone network to extract features directly determines the performance of the object tracker. Given this, we first propose an asymmetric convolutional network to improve the representational capability of the backbone network. Then, strip convolution is employed to enhance the square-kernel convolution in the backbone network. In addition, we construct a novel module named Feature Dropblock (FD) that simulates occlusion in the hidden space, with the goal of improving the performance of the backbone network when tracking targets under occlusion. To demonstrate the effectiveness of the proposed tracker, extensive ablation studies are conducted. Better results than other state-of-the-art trackers are obtained on the tracking benchmarks OTB100 and VOT2018.

7.

Learning complementary Siamese networks for real-time high-performance visual tracking

《Journal of Visual Communication and Image Representation》2021

Recently, Siamese-based methods have made a breakthrough in the visual tracking field. However, existing trackers still cannot take full advantage of deep features. In this work, we improve the performance of Siamese trackers through complementary learning with different types of matching features. Specifically, a Matching Activation Network (MAN) is first designed to highlight the matching regions of the search image given a template. Since only sparse parts of the feature maps contribute to the matching result, an important design choice is to emphasize the weak-matching features by erasing the strong-matching ones and to learn complementary classifiers from both types of features. We then propose a novel complementary region proposal network (CoRPN) that takes the complementary features as inputs and produces outputs that complement each other, which are fused to improve performance. Experiments show that the proposed tracker achieves leading performance on five tracking datasets while retaining real-time speed.

8.

Cross-layer progressive attention bilinear fusion method for fine-grained visual classification

《Journal of Visual Communication and Image Representation》2022

Fine-grained visual classification (FGVC) is a critical task in computer vision. However, FGVC is challenging because of the large intra-class variation and small inter-class variation among the classes to be distinguished. The key to the problem is to capture subtle visual differences in the image and represent the discriminative features effectively. Existing methods are often limited by insufficient localization accuracy and insufficient feature representation capability. In this paper, we propose a cross-layer progressive attention bilinear fusion method (CPABF for short), which efficiently expresses the characteristics of discriminative regions. The CPABF method involves three components: 1) Cross-Layer Attention (CLA) locates and reinforces the discriminative region at low computational cost; 2) the Cross-Layer Bilinear Fusion Module (CBFM) integrates semantic information from low-level to high-level layers; 3) Progressive Training optimizes the network parameters stage by stage. CPABF shows excellent performance on four FGVC datasets and outperforms some state-of-the-art methods.

9.

Siamese visual tracking with multilayer feature fusion and corner distance IoU loss

《Journal of Visual Communication and Image Representation》2022

Trackers based on the Siamese network treat tracking as a similarity problem between the target template and the search area. Using shallow networks and offline training, these trackers perform well in simple scenarios. However, because they lack semantic information, they have difficulty meeting accuracy requirements when faced with complex backgrounds and other challenging scenarios. In response to this problem, we propose a new model that uses an improved ResNet-22 network to extract deep features with more semantic information. Multilayer feature fusion is used to obtain a high-quality score map and reduce the influence of interference from complex backgrounds on the tracker. In addition, we propose a more powerful Corner Distance IoU (intersection over union) loss function so that the algorithm can better regress the bounding box. In the experiments, the tracker was extensively evaluated on the object tracking benchmark datasets OTB2013 and OTB2015 and the visual object tracking datasets VOT2016 and VOT2017, and achieved competitive performance, demonstrating the effectiveness of the method.
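
The abstract does not define the corner-distance term of its loss, so the sketch below shows only the plain IoU that such losses build on, together with the basic `1 - IoU` loss. The `(x1, y1, x2, y2)` box convention and the function names are assumptions for illustration, and this is not the paper's Corner Distance IoU loss.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes given as
    (x1, y1, x2, y2) with x1 < x2 and y1 < y2."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap extent along each axis, clamped at zero for disjoint boxes.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def iou_loss(pred, target):
    """Basic IoU loss: 0 for a perfect match, 1 for no overlap."""
    return 1.0 - iou(pred, target)


overlap = iou((0, 0, 2, 2), (1, 1, 3, 3))          # partial overlap
loss_disjoint = iou_loss((0, 0, 1, 1), (2, 2, 3, 3))  # boxes do not touch
```

The plain loss is the same for every pair of non-overlapping boxes, regardless of how far apart they are, so it carries no distance signal in that case. Distance-augmented IoU losses add a penalty term to address this.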

10.

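RGBT dual-modal Siamese tracking network based on feature fusion

申亚丽《Infrared and Laser Engineering》2021, 50(3): 20200459-1-20200459-7

Thermal infrared imaging is widely used for target tracking in military, remote sensing, and security applications, but thermal infrared images perform only moderately in tracking scenarios with low contrast or blurred targets. Fusing thermal infrared images with visible images to improve tracking performance is therefore of great significance. Compared with single-modal tracking algorithms based on visible or thermal infrared images alone, dual-modal tracking algorithms based on visible/thermal infrared (RGB/Thermal, RGBT) images are more robust to illumination changes and to occlusion by cloud and fog. An RGBT dual-modal Siamese tracking network architecture based on feature fusion is proposed. The network fuses the deep features extracted from the two modalities to improve the discriminative power of the target appearance features, and it can be trained offline end to end on training data. Experimental results on the public dataset RGBT234 show that the proposed RGBT dual-modal Siamese feature fusion tracking network achieves robust, continuous target tracking in complex scenes.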

11.

Learning discriminative and meaningful samples for generalized zero shot classification

《Signal Processing: Image Communication》2020

Generalized zero-shot classification, which aims to recognize both seen and unseen samples in test sets, has gained great attention. Recently, many works have used generative adversarial networks to generate unseen samples for the generalized zero-shot classification problem. In this paper, we study how to generate discriminative and meaningful samples. We propose a method to learn discriminative and meaningful samples for generalized zero-shot classification tasks (LDMS) using a generative adversarial network regularized by class consistency and semantic consistency. To make the generated samples discriminative, class consistency is used, so that generated samples of the same class are close together and those of different classes are far apart. To make the generated samples meaningful, semantic consistency is used, so that the semantic representations of the generated samples are close to their class prototypes. Together, these constraints encode discriminative and semantic information into the generator. To alleviate the bias problem, we select some confident unseen samples. We use the seen samples, the generated unseen samples, and the selected confident unseen samples to train the final classifier. Extensive experiments on all evaluated datasets demonstrate that the proposed method can outperform state-of-the-art models on generalized zero-shot classification tasks.

12.

Network in network based weakly supervised learning for visual tracking

《Journal of Visual Communication and Image Representation》2016

One of the key limitations of many existing visual tracking methods is that they are built upon low-level visual features and have limited power to predict data semantics. To fill the semantic gap of visual data in visual tracking with little supervision, we propose a tracking method that constructs a robust object appearance model by learning and transferring mid-level image representations using a deep network, namely Network in Network (NIN). First, we design a simple yet effective method to transfer the mid-level features learned by NIN on source tasks with large-scale training data to tracking tasks with limited training data. Then, to address the drifting problem, we simultaneously use the samples collected in the initial frame and the most recent frames. Finally, a heuristic scheme is used to decide whether to update the object appearance model. Extensive experiments show the robustness of our method.

13.

Cross-level reinforced attention network for person re-identification

《Journal of Visual Communication and Image Representation》2020

The attention mechanism is a simple and effective way to enhance the discriminative performance of person re-identification (Re-ID). Most previous attention-based works have difficulty eliminating the negative effects of meaningless information. In this paper, a universal module named Cross-level Reinforced Attention (CLRA) is proposed to alleviate this issue. First, we fuse features of different semantic levels using adaptive weights. The fused features, which contain richer spatial and semantic information, can better guide the subsequent attention module. Then, we combine hard and soft attention to improve the ability to extract important information in the spatial and channel domains. Through CLRA, the network can aggregate and propagate more discriminative semantic information. Finally, we integrate CLRA with Harmonious Attention CNN (HA-CNN) to form a novel Cross-level Reinforced Attention CNN (CLRA-CNN) for person Re-ID. Experimental results on several public benchmarks show that the proposed method achieves state-of-the-art performance.

14.

Object tracking using discriminative sparse appearance model

《Signal Processing: Image Communication》2015

Object tracking based on sparse representation formulates tracking as a search for the candidate with minimal reconstruction error in the target template subspace. The key problem lies in modeling the target robustly under varying appearances. The appearance model in most sparsity-based trackers has two main problems. The first is that global structural information and local features are insufficiently combined, because the appearance is modeled separately by holistic and local sparse representations. The second is that the discriminative information between the target and the background is not fully utilized, because the background is rarely considered in modeling. In this study, we develop a robust visual tracking algorithm that models the target with a discriminative sparse appearance model. A discriminative dictionary is trained from local target patches and the background. The patches capture the local features, while their position distribution implies the global structure of the target. Thus, the learned dictionary can fully represent the target. Incorporating the background into dictionary learning also enhances its discriminative capability. With the target modeled as a sparse coding histogram based on this learned dictionary, our tracker is embedded into a Bayesian state inference framework to locate the target. We also present a model update scheme in which the update rate is adjusted automatically. In conjunction with the update strategy, the proposed tracker can handle occlusion and alleviate drifting. Comparative results on challenging benchmark image sequences show that the tracking method performs favorably against several state-of-the-art algorithms.

15.

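Siamese network object tracking algorithm based on enhanced RPN

张长弓, 杨海涛, 冯博迪, 王晋宇, 李高源《Telecommunication Engineering》2022, 62(10)

Siamese network trackers already perform fairly well, but they still do not make good use of the characteristics of the features extracted by convolutional neural networks. In addition, because Siamese networks track through similarity learning, the trackers fall short in accuracy and robustness. A pyramid-style feature fusion method is proposed that improves the network's ability to represent the target by exploiting the different emphases of backbone feature extraction layers at different depths. An attention mechanism is then used to enhance the region proposal network (RPN), ultimately achieving more accurate and more robust tracking. In experiments on the OTB100 dataset, the proposed SiamERPN (Siamese Enhanced RPN) algorithm achieves a success rate of 0.668 and a precision of 0.876, outperforming the baseline algorithm and the other compared algorithms.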

16.

Edge-aware object pixel-level representation tracking

《Journal of Visual Communication and Image Representation》2023

Recently, there has been a trend in tracking toward representing the target object with a refined segmentation mask instead of a coarse bounding box. Some trackers add segmentation branches to the tracking framework while maintaining real-time speed. However, those trackers use a simple FCN structure and lack edge information modeling, which leaves performance quite unsatisfactory. In this paper, we propose an edge-aware segmentation network that uses the complementarity between target information and edge information to provide a more refined representation of the target. First, we fuse the high-level features of the tracking backbone network with the correlation features of the classification branch of the tracking framework, and supervise them simultaneously with the target edge and the target segmentation mask to obtain optimized high-level features carrying rough edge information and target information. Second, we use the optimized high-level features to guide the low-level features of the tracking backbone network to generate more refined edge features. Finally, we fuse the refined edge features with the target features of each layer to generate the final mask. Our approach achieves leading performance on the recent pixel-wise object tracking benchmark VOT2020 and the segmentation datasets DAVIS2016 and DAVIS2017 while running at 47 fps. Code is available at https://github.com/TJUMMG/EATtracker.

17.

Learning discriminative update adaptive spatial-temporal regularized correlation filter for RGB-T tracking

《Journal of Visual Communication and Image Representation》2020

RGB-T trackers based on the correlation filter framework have been extensively investigated because they can track targets more accurately in most complex scenes. However, the performance of these trackers is limited in some specific challenging scenarios, such as occlusion and background clutter. Most of these trackers apply the same fixed regularization constraint to every tracking target when building the filter model, which cannot effectively represent the appearance changes and characteristics of a specific target. In addition, they adopt a simple model update mechanism based on linear interpolation, which can easily lead to model degradation in challenging scenarios and cause tracker drift. To solve these problems, we propose a novel adaptive spatial-temporal regularized correlation filter model that learns an appropriate regularization for robust tracking, together with a relative peak discriminative method for model updating that avoids model degradation. In addition, to better integrate the unique advantages of the two modalities and to adapt to the changing appearance of the target, an adaptive weighting ensemble scheme and a multi-scale search mechanism are adopted, respectively. To optimize the proposed model, we design an efficient ADMM algorithm, which greatly improves efficiency. Extensive experiments on two available datasets, RGBT234 and RGBT210, indicate that the proposed tracker performs favorably in both accuracy and robustness against state-of-the-art RGB-T trackers.
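
The linear-interpolation update that the abstract identifies as a source of model degradation can be sketched in a few lines. This illustrates the generic baseline rule, not the paper's relative peak discriminative method; `update_model`, the learning rate values, and the toy vectors are assumptions.

```python
def update_model(model, observation, learning_rate=0.02):
    """Blend the current model with the newest observation:
    new_model = (1 - learning_rate) * model + learning_rate * observation.
    Both inputs are flat lists of equal length."""
    return [
        (1.0 - learning_rate) * m + learning_rate * o
        for m, o in zip(model, observation)
    ]


model = [1.0, 2.0]
clean_frame = [3.0, 4.0]

# The update is applied unconditionally, so an occluded or cluttered
# observation would be blended into the model in exactly the same way.
updated = update_model(model, clean_frame, learning_rate=0.5)
frozen = update_model(model, clean_frame, learning_rate=0.0)
```

Because every frame contributes at the same rate, corrupted observations accumulate in the model over time. Update schemes that first check a confidence measure, and skip or down-weight unreliable frames, are one common remedy.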

18.

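Object tracking based on cost-sensitive structured SVM

袁广林, 孙子文, 秦晓燕, 夏良, 朱虹《Journal of Electronics & Information Technology》2021, 43(11): 3335-3341

Object tracking based on structured SVM has attracted wide attention for its excellent performance, but existing methods suffer from an imbalance between positive and negative samples. To address this problem, this paper first proposes a cost-sensitive structured SVM model for object tracking, then designs a solver for the model based on the dual coordinate descent principle, and finally implements a multi-scale object tracking method using the proposed cost-sensitive structured SVM. Experiments on the OTB100 and VOT2019 datasets show that the proposed method achieves higher tracking accuracy than correlation filter tracking methods and has a speed advantage over deep tracking methods.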

19.

Hyperspectral image classification algorithm based on spatial-spectral attention mechanism and pre-activation residual network

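袁芊芊, 谢维信《Journal of Signal Processing》2022, 38(12): 2594-2605

In many deep learning algorithms for hyperspectral image classification, the extracted spatial-spectral feature representations are not discriminative enough, so classification performance leaves room for improvement. To address this problem, this paper proposes a hyperspectral image classification algorithm based on a spatial-spectral attention mechanism and a pre-activation residual network. First, a spatial-spectral feature extraction module based on the spatial-spectral attention mechanism is designed to recalibrate the spatial-spectral features, ensuring that subsequent joint learning can focus on the more discriminative channels and spatial positions. Second, a joint spatial-spectral feature learning module based on the pre-activation residual network is designed. The pre-activation residual network improves the structure of the original residual building block, so that joint learning on the attention-recalibrated features captures more discriminative deep spatial-spectral features and improves the performance of the classifier. Experimental results show that the proposed algorithm achieves higher classification accuracy than several existing hyperspectral image classification algorithms, indicating that it effectively obtains more discriminative spatial-spectral feature representations.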

20.

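Few-shot image classification algorithm based on semantic feature propagation graph neural network

姜威, 汪洋, 尹晶, 朱超然《Laser & Infrared》2023, 53(12): 1944-1952

The ability to learn and generalize from a small number of samples is a major difference between artificial intelligence and humans. In few-shot learning, most graph neural networks focus on passing labeled sample information to unlabeled query samples and overlook the important role of semantic features in classification. A semantic feature propagation graph neural network is therefore constructed. First, semantic features are embedded into the graph neural network, which addresses the low classification accuracy caused by the similarity of fine-grained image features. Then, an attention mechanism is combined with the backbone network to strengthen the foreground and improve the quality of feature extraction, and the Mahalanobis distance is used to compute class similarity for better classification performance. Finally, the Funnel ReLU function is used as the activation function to further improve classification accuracy. Experiments on benchmark datasets show that, compared with the baseline algorithm, the proposed algorithm improves accuracy on 5-way 1/2/5-shot tasks by 9.03%, 4.56%, and 4.15%, respectively.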
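
The Mahalanobis distance mentioned in the abstract can be illustrated with a small two-dimensional example. The sketch below assumes a known 2x2 class covariance and inverts it in closed form; how the paper estimates covariances or applies the distance inside its graph network is not stated, so `mahalanobis_2d` and the sample values are purely illustrative.

```python
import math


def mahalanobis_2d(x, mean, cov):
    """Mahalanobis distance between point `x` and a class with the given
    `mean` and 2x2 covariance `cov` (nested lists or tuples)."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    # Closed-form inverse of a 2x2 matrix.
    inv = [[d / det, -b / det], [-c / det, a / det]]
    dx = [x[0] - mean[0], x[1] - mean[1]]
    quad = (
        dx[0] * (inv[0][0] * dx[0] + inv[0][1] * dx[1])
        + dx[1] * (inv[1][0] * dx[0] + inv[1][1] * dx[1])
    )
    return math.sqrt(quad)


identity = [[1.0, 0.0], [0.0, 1.0]]
stretched = [[4.0, 0.0], [0.0, 1.0]]  # variance 4 along the first axis

d_euclid = mahalanobis_2d((3.0, 4.0), (0.0, 0.0), identity)
d_scaled = mahalanobis_2d((2.0, 0.0), (0.0, 0.0), stretched)
```

With an identity covariance the distance reduces to the Euclidean distance. With a stretched covariance, an offset along the high-variance axis counts for less, which is the property that makes the measure useful for comparing a sample against class statistics.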