Similar Documents
Found 20 similar documents
1.
Xie Lin, Li Feifei, Chen Qiu. Electronic Science and Technology, 2019, 32(1): 38-41
To bridge the semantic gap between low-level features and high-level concepts in scene recognition, this paper proposes a scene recognition method based on a sparse autoencoder, combining sparse autoencoding with spatial pyramid pooling for feature encoding. Local HOG features are first extracted from the scene image; an improved sparse autoencoder then encodes the HOG features into sparse features; spatial pyramid pooling and local normalization produce the representation of the whole image; finally, a linear SVM performs classification. Experiments on the standard Scene-15 dataset show that the algorithm raises recognition accuracy to 81.97%.
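The pooling stage described above, turning per-patch sparse codes into one image-level vector, can be sketched as follows. This is a minimal illustration: the pyramid grid sizes, max-pooling operator, and L2 normalization are assumptions, not the paper's exact configuration.

```python
import numpy as np

def spatial_pyramid_max_pool(codes, coords, img_w, img_h, levels=(1, 2, 4)):
    """Max-pool local sparse codes over a spatial pyramid.

    codes  : (n, d) array, one sparse code per local descriptor (e.g. encoded HOG).
    coords : (n, 2) array of (x, y) patch centers.
    Returns the concatenated, L2-normalized image-level vector.
    """
    pooled = []
    for g in levels:
        cell_w, cell_h = img_w / g, img_h / g
        for i in range(g):
            for j in range(g):
                # select the codes whose patch center falls in cell (i, j)
                mask = ((coords[:, 0] // cell_w).astype(int) == i) & \
                       ((coords[:, 1] // cell_h).astype(int) == j)
                cell = codes[mask]
                pooled.append(cell.max(axis=0) if len(cell) else np.zeros(codes.shape[1]))
    v = np.concatenate(pooled)
    n = np.linalg.norm(v)
    return v / n if n > 0 else v
```

The resulting fixed-length vector is what a linear SVM would then classify.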

2.
Sparse coding, which encodes a natural visual signal into a sparse space for visual codebook generation and feature quantization, has been successfully utilized in many image classification applications. However, it has seldom been explored for video analysis tasks. In particular, the increased complexity of characterizing the visual patterns of diverse human actions, with both spatial and temporal variations, imposes more challenges on the conventional sparse coding scheme. In this paper, we propose an enhanced sparse coding scheme through learning a discriminative dictionary and optimizing the local pooling strategy. Localizing when and where a specific action happens in realistic videos is another challenging task. By utilizing sparse coding based representations of human actions, this paper further presents a novel coarse-to-fine framework to localize the Volumes of Interest (VOIs) for the actions. Firstly, local visual features are transformed into the sparse signal domain through our enhanced sparse coding scheme. Secondly, in order to avoid an exhaustive scan of entire videos for VOI localization, we extend Spatial Pyramid Matching into the temporal domain, namely Spatial Temporal Pyramid Matching, to obtain VOI candidates. Finally, a multi-level branch-and-bound approach is developed to refine the VOI candidates. The proposed framework also avoids prohibitive computations in local similarity matching (e.g., nearest neighbors voting). Experimental results on two popular benchmark datasets (KTH and YouTube UCF) and the widely used localization dataset (MSR) demonstrate that our approach reduces computational cost significantly while maintaining classification accuracy comparable to that of state-of-the-art methods.
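The temporal extension of Spatial Pyramid Matching mentioned above can be sketched by pooling per-frame codes over progressively finer temporal segments. The segment counts and max-pooling operator below are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np

def temporal_pyramid_pool(frame_codes, levels=(1, 2, 4)):
    """Max-pool per-frame sparse codes over a temporal pyramid.

    frame_codes : (T, d) array, one sparse code per frame (or time slice).
    At level g the T frames are split into g contiguous segments and each
    segment is max-pooled, mirroring how Spatial Pyramid Matching splits an
    image into spatial cells.
    """
    T, d = frame_codes.shape
    pooled = []
    for g in levels:
        bounds = np.linspace(0, T, g + 1).astype(int)
        for s in range(g):
            seg = frame_codes[bounds[s]:bounds[s + 1]]
            pooled.append(seg.max(axis=0) if len(seg) else np.zeros(d))
    return np.concatenate(pooled)
```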

3.
This paper addresses the problem of efficient representation of scenes captured by distributed omnidirectional vision sensors. We propose a novel geometric model to describe the correlation between different views of a 3-D scene. We first approximate the camera images by sparse expansions over a dictionary of geometric atoms. Since the most important visual features are likely to be equivalently dominant in images from multiple cameras, we model the correlation between corresponding features in different views by local geometric transforms. For the particular case of omnidirectional images, we define the multiview transforms between corresponding features based on shape and epipolar geometry constraints. We apply this geometric framework in the design of a distributed coding scheme with side information, which builds an efficient representation of the scene without communication between cameras. The Wyner-Ziv encoder partitions the dictionary into cosets of dissimilar atoms with respect to shape and position in the image. The joint decoder then determines pairwise correspondences between atoms in the reference image and atoms in the cosets of the Wyner-Ziv image in order to identify the most likely atoms to decode under epipolar geometry constraints. Experiments demonstrate that the proposed method leads to reliable estimation of the geometric transforms between views. In particular, the distributed coding scheme offers rate-distortion performance similar to that of joint encoding at low bit rates and outperforms methods based on independent decoding of the different images.
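The coset-partitioning idea, grouping dissimilar atoms together so the decoder can disambiguate them with side information, can be sketched with a simple round-robin rule. This is a simplified stand-in for the paper's shape/position-based partitioning, not its exact rule.

```python
import numpy as np

def partition_into_cosets(atom_params, n_cosets):
    """Partition dictionary atoms into cosets of mutually dissimilar atoms.

    atom_params : (n, p) array of atom descriptors (e.g. position/scale).
    Atoms are sorted along their first parameter and dealt round-robin, so
    each coset spreads across the parameter range (neighbouring, similar
    atoms land in different cosets).
    """
    order = np.argsort(atom_params[:, 0])
    cosets = [[] for _ in range(n_cosets)]
    for rank, idx in enumerate(order):
        cosets[rank % n_cosets].append(int(idx))
    return cosets
```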

4.
This paper deals with the use of the segmentation tools and principles presented in [10] and [13] to enable content-based functionalities. In this framework, means for supervised selection of objects in the scene are proposed. In addition, a technique for object tracking in the context of segmentation-based video coding is presented. The technique is independent of the type of segmentation approach used in the coding scheme. The algorithm relies on a double partition of the image that yields spatially homogeneous regions. This double partition makes it possible to obtain the position and shape of the previous object in the current image while computing the projected partition. To demonstrate the potential of this algorithm, it is applied in a specific coding scheme so that content-based functionalities, such as selective coding, are allowed.

5.
This paper presents a joint scene and signal modeling for the design of an adaptive quantization scheme applied to the wavelet coefficients in subband video coding applications. The joint modeling includes two integrated components: the scene modeling characterized by the neighborhood binding with a Gibbs random field, and the signal modeling characterized by the matching of the wavelet coefficient distribution. With this joint modeling, the quantization becomes adaptive not only to the wavelet coefficient signal distribution but also to the prominent image scene structures. The proposed quantization scheme based on the joint scene and signal modeling is accomplished through adaptive clustering with spatial neighborhood constraints. This spatial constraint allows the quantization to shift its bit allocation, if necessary, to those perceptually more important coefficients so that the scene structure is preserved. This joint modeling enables the quantization to reach beyond the limit of traditional statistical signal modeling-based approaches, which often lack scene adaptivity. Furthermore, the dynamically enforced spatial constraints of the Gibbs random field are able to overcome the shortcomings of artificial block division, which is usually the major source of distortion when video is coded by block-based approaches at low bit rates. In addition, we introduce a cellular neural network architecture for the hardware implementation of the proposed adaptive quantization. We prove that this cellular neural network converges to the desired steady state with the suggested update scheme. The adaptive quantization scheme based on the joint scene and signal modeling has been successfully applied to videoconferencing, and very favorable results have been obtained.
We believe that this joint modeling-based video coding will have an impact on many other applications because it is able to simultaneously perform signal-adaptive and scene-adaptive quantization.
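The "adaptive clustering with spatial neighborhood constraints" step can be sketched with iterated conditional modes (ICM) under a Potts-style smoothness prior, a common way to enforce Gibbs-random-field neighborhood binding. The 4-neighborhood, the penalty beta, and ICM itself are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def icm_quantize(img, centers, beta=0.5, iters=5):
    """Assign each coefficient to a quantizer level under a spatial prior.

    Minimizes (x - c_k)^2 + beta * (# of 4-neighbours with a different
    label) by iterated conditional modes, so labels stay spatially coherent.
    """
    H, W = img.shape
    labels = np.abs(img[..., None] - centers).argmin(axis=-1)  # data-only init
    for _ in range(iters):
        for y in range(H):
            for x in range(W):
                best_k, best_e = labels[y, x], np.inf
                for k in range(len(centers)):
                    e = (img[y, x] - centers[k]) ** 2
                    for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < H and 0 <= nx < W and labels[ny, nx] != k:
                            e += beta  # penalize disagreement with neighbours
                    if e < best_e:
                        best_k, best_e = k, e
                labels[y, x] = best_k
    return labels
```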

6.
Transferring visual prior for online object tracking
Visual prior from generic real-world images can be learned and transferred for representing objects in a scene. Motivated by this, we propose an algorithm that transfers visual prior learned offline for online object tracking. From a collection of real-world images, we learn an overcomplete dictionary to represent visual prior. The prior knowledge of objects is generic, and the training image set does not necessarily contain any observation of the target object. During the tracking process, the learned visual prior is transferred to construct an object representation by sparse coding and multiscale max pooling. With this representation, a linear classifier is learned online to distinguish the target from the background and to account for the target and background appearance variations over time. Tracking is then carried out within a Bayesian inference framework, in which the learned classifier is used to construct the observation model and a particle filter is used to estimate the tracking result sequentially. Experiments on a variety of challenging sequences with comparisons to several state-of-the-art methods demonstrate that more robust object tracking can be achieved by transferring visual prior.
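The sparse-coding step over the learned overcomplete dictionary can be sketched with a generic orthogonal matching pursuit solver. The solver choice and sparsity level are assumptions; the paper does not specify them here.

```python
import numpy as np

def omp(D, x, n_nonzero=3):
    """Orthogonal matching pursuit: code x over dictionary D (columns = atoms).

    Greedily picks the atom most correlated with the residual, re-fits the
    selected atoms by least squares, and returns a sparse coefficient vector
    with at most n_nonzero entries.
    """
    residual, support = x.astype(float).copy(), []
    coef = np.zeros(D.shape[1])
    for _ in range(n_nonzero):
        k = int(np.abs(D.T @ residual).argmax())
        if k not in support:
            support.append(k)
        sol, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ sol
    coef[support] = sol
    return coef
```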

7.
An image classification method based on sparse coding and multiple kernel learning
Qi Xiaozhen, Wang Qing. Acta Electronica Sinica, 2012, 40(4): 773-779
This paper proposes an image classification method based on sparse coding and multiple kernel learning. Conventional sparse coding discards spatial information when classifying images; here, a spatial pyramid partition of the image injects spatial constraints into the features. When classifying with a nonlinear SVM, each level of the spatial pyramid forms its own kernel matrix; multiple kernel learning is used to solve for the weight of each kernel matrix, and a linear combination of the kernel matrices yields the kernel with the strongest discriminative power over the whole classification set. Experimental results demonstrate the effectiveness and robustness of the proposed method: it reaches 83.10% classification accuracy on the Scene Categories dataset, the highest accuracy reported on that dataset at the time.
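The kernel-combination step, one Gram matrix per pyramid level merged into a single kernel, can be sketched as below. In the paper the weights come from multiple kernel learning; the fixed, normalized weights here are only a placeholder for that optimization.

```python
import numpy as np

def combine_pyramid_kernels(kernels, weights):
    """Linearly combine per-level kernel matrices: K = sum_l w_l * K_l.

    kernels : list of (n, n) Gram matrices, one per spatial-pyramid level.
    weights : nonnegative weights, normalized here to sum to 1.
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return sum(wi * K for wi, K in zip(w, kernels))
```

The combined matrix can be fed to any kernel SVM as a precomputed kernel.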

8.
The bag of visual words (BOW) model is an efficient image representation technique for image categorization and annotation tasks. Building good visual vocabularies from automatically extracted image feature vectors produces discriminative visual words, which can improve the accuracy of image categorization tasks. Most approaches that use the BOW model for categorizing images ignore useful information that can be obtained from image classes when building visual vocabularies. Moreover, most BOW models use intensity features extracted from local regions and disregard colour information, an important characteristic of any natural scene image. In this paper, we show that integrating visual vocabularies generated from each image category improves the BOW image representation and the accuracy of natural scene image classification. We use a keypoint density-based weighting method to combine the BOW representation with image colour information on a spatial pyramid layout. In addition, we show that visual vocabularies generated from training images of one scene image dataset can plausibly represent another scene image dataset in the same domain, which helps reduce the time and effort needed to build new visual vocabularies. The proposed approach is evaluated over three well-known scene classification datasets with 6, 8 and 15 scene categories, respectively, using 10-fold cross-validation. The experimental results, using support vector machines with a histogram intersection kernel, show that the proposed approach outperforms baseline methods such as Gist features, rgbSIFT features and different configurations of the BOW model.
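Integrating per-class vocabularies into one BOW representation can be sketched by concatenating the class vocabularies into a single codebook and histogramming nearest-word assignments. This is a minimal sketch; the paper's keypoint-density weighting and colour channel are omitted.

```python
import numpy as np

def bow_histogram(descriptors, vocabularies):
    """Quantize descriptors against a codebook merged from per-class vocabularies.

    vocabularies : list of (k_c, d) arrays, one visual vocabulary per image
    class. They are stacked into one codebook; each descriptor votes for its
    nearest visual word, and the normalized histogram is returned.
    """
    codebook = np.vstack(vocabularies)
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()
```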

9.
To effectively describe the visual content of an image from multiple perspectives, this paper proposes a new method that maps a set of heterogeneous local features into a global sparse image representation via sparse learning. Overcomplete visual dictionaries are learned from different training feature sets; the complementary information of the various local features is then fused through local sparse coding, max pooling, weighted concatenation, and normalization, yielding a single high-dimensional sparse vector that describes the image's multi-perspective visual content. Applied to content-based image retrieval (CBIR), experimental results show that this global sparse representation learned from heterogeneous local features overcomes both the limitations of describing an image with a single local feature set and the high time and space complexity of similarity measurement over high-dimensional local feature sets.

10.
To address the shortcomings of data acquisition in fluorescence molecular tomography, an imaging method based on frequency modulation and spatial encoding is proposed to improve the acquisition scheme and shorten acquisition time. In this method, the excitation beam is split into several sub-beams used as multi-point excitation sources. The sub-beams are first modulated to different frequencies and then directed simultaneously onto different points of the target surface. On the detection side, light emitted from the target first passes through a spatially coded mask and is then guided to a single photomultiplier tube. Following compressed sensing theory, the mask pattern is varied and sparse reconstruction recovers the distribution of the fluorescence signal on the target surface. Simulation experiments designed to validate the proposed method show that it recovers the original image well, demonstrating its feasibility.
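The compressed-sensing recovery step, reconstructing a sparse signal from coded-mask measurements on a single detector, can be sketched with a generic iterative soft-thresholding (ISTA) solver. The solver, step size, and threshold lam are illustrative assumptions standing in for the paper's actual sparse reconstruction.

```python
import numpy as np

def ista(A, y, lam=0.1, iters=200):
    """Recover a sparse signal x from y = A x via iterative soft thresholding.

    A : (m, n) measurement matrix (rows = spatially coded mask patterns).
    y : (m,) measurements from the single detector.
    """
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        g = x - (A.T @ (A @ x - y)) / L    # gradient step on the data term
        x = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)  # soft threshold
    return x
```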

11.
Sparse representation-based classification (SRC) is a method for face recognition that represents a test image as a sparse linear combination of training samples, with representation fidelity measured by the L2- or L1-norm of the residual. This model assumes the coding residual follows a Gaussian or Laplacian distribution, which in practice does not describe the coding error accurately. This paper proposes a new sparse coding method formulated as a constrained regression problem: maximum-likelihood sparse coding (MSC) seeks the maximum-likelihood estimate of the model parameters and is highly robust to outliers. Experimental results on the Yale and ORL face databases demonstrate the method's effectiveness and robustness to blur, illumination, and expression variations.
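Maximum-likelihood robust coding of this kind typically reduces to iteratively reweighted fitting, where pixels with large coding residuals get small weights. Below is one plausible logistic weight function for that step; the functional form and the parameters mu and delta are illustrative assumptions, not the paper's fitted values.

```python
import numpy as np

def robust_residual_weights(residual, mu=8.0, delta=0.5):
    """Logistic weights that down-weight large coding residuals.

    w_i = 1 / (1 + exp(mu * (r_i^2 - delta))): pixels coded well get weight
    near 1, outliers (occlusion, corruption) get weight near 0. The weights
    would multiply each pixel's contribution in the next coding iteration.
    """
    r2 = np.asarray(residual, dtype=float) ** 2
    return 1.0 / (1.0 + np.exp(mu * (r2 - delta)))
```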

12.
To address the poor fit between sparse representation classifiers and multi-feature frameworks, this paper proposes a spatially constrained multi-feature joint sparse coding model and applies it to automatic annotation of remote sensing images. The method regularizes the multi-feature coding coefficients with an l1,2 mixed norm, constraining them to share the same sparsity pattern, which preserves the association among features without imposing overly strict constraints. Dictionary learning is also extended to the multi-feature framework: by constraining the transform matrix of the dictionary update, the method avoids losing the multi-feature association during learning. In addition, since spatial relations in remote sensing images are often ignored or underused, a classification criterion combining spatial consistency with multi-feature joint sparse coding is proposed, improving annotation performance. Experiments on public remote sensing datasets and large satellite images demonstrate the effectiveness of the method.

13.
Nonlocal means (NLM) filtering and sparse-representation-based denoising have each achieved remarkable performance. To integrate the advantages of the two methods into a unified framework, we propose an image denoising algorithm that combines NLM and sparse representation techniques to remove Gaussian noise mixed with random-valued impulse noise. In this non-Gaussian setting, we propose a customized blockwise NLM (CBNLM) filter to generate an initial denoised image. Based on it, we classify the noisy pixels according to the three-sigma rule. An overcomplete dictionary is then trained on the initial denoised image, and a complementary sparse coding technique finds the sparse vector for each input noisy patch over that dictionary. By solving a more reasonable variational denoising model, we reconstruct the clean image. Experimental results verify that the proposed algorithm obtains the best denoising performance compared with several typical methods.
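The three-sigma classification step can be sketched directly: pixels whose residual against the initial denoised image lies outside three standard deviations are flagged as likely impulse-corrupted. This is a sketch of the classification step only, with the statistics estimated globally for simplicity.

```python
import numpy as np

def classify_pixels_three_sigma(noisy, denoised):
    """Split pixels by the three-sigma rule on the residual.

    Returns a boolean mask: True marks pixels whose residual against the
    initial (e.g. CBNLM) denoised image deviates by more than 3 sigma,
    i.e. suspected impulse-noise pixels; the rest are treated as
    Gaussian-noise pixels.
    """
    r = noisy.astype(float) - denoised.astype(float)
    sigma = r.std()
    return np.abs(r - r.mean()) > 3.0 * sigma
```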

14.
To improve the decoded quality and visual appearance of images at low bit rates, this paper proposes a new low-bit-rate image compression method based on sparse decomposition. A two-dimensional, non-separable Mexican-hat wavelet with anisotropic scaling serves as the generating function; the redundant dictionary built from it captures image edge contours effectively. To reduce the redundancy of the atoms' projection coefficients and the bits needed to code them, the coefficients of the sparse decomposition are fitted piecewise. The resulting compressed bitstream is progressive, meeting the scalability requirements of modern wireless communication. Experimental results show that, at low bit rates, the method achieves higher PSNR than JPEG 2000 and conventional sparse-decomposition coding; the decompressed images are free of ringing artifacts and subjectively better.

15.
The emerging international standard for high efficiency video coding (HEVC) based 3D video coding (3D-HEVC) is an extension of HEVC. In the 3D-HEVC test model, variable-size motion estimation (ME) and disparity estimation (DE) are both employed to select the best coding mode for each treeblock in the encoding process. This technique achieves the highest possible coding efficiency, but it brings extremely high computational complexity, which keeps 3D-HEVC from practical applications. In this paper, a fast ME/DE algorithm based on inter-view and spatial correlations is proposed to reduce 3D-HEVC computational complexity. Since the multi-view videos represent the same scene with similar characteristics, there is a high correlation among the coding information from inter-view prediction. Moreover, homogeneous regions in the texture video have strong spatial correlation, so spatially neighboring treeblocks have similar coding information. We can therefore determine the ME search range and skip ME and DE modes rarely used in the previously coded view frames and spatially neighboring coding units. Experimental results demonstrate that the proposed algorithm significantly reduces the computational complexity of 3D-HEVC encoding while maintaining almost the same rate-distortion performance.
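Deriving a search range from already-coded neighbors can be sketched with a simple heuristic: adapt the window to the largest neighboring displacement, so homogeneous regions get a small window. This is an illustrative heuristic only, not the 3D-HEVC test-model rule; the base and cap values are assumptions.

```python
def motion_search_range(neighbor_mvs, base=4, cap=32):
    """Derive a motion-search range from already-coded neighbouring blocks.

    neighbor_mvs : list of (dx, dy) motion vectors from spatial neighbours
    and the co-located inter-view block. Returns a window half-size in pixels.
    """
    if not neighbor_mvs:
        return cap                       # no context: fall back to the full range
    peak = max(max(abs(dx), abs(dy)) for dx, dy in neighbor_mvs)
    return min(cap, max(base, 2 * peak))
```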

16.
Recently, many researchers have started to challenge a long-standing practice of digital photography, oversampling followed by compression, and to pursue more intelligent sparse sampling techniques. In this paper, we propose a practical approach of uniform down-sampling in image space, made adaptive by spatially varying, directional low-pass prefiltering. The resulting down-sampled prefiltered image remains a conventional square sample grid and can thus be compressed and transmitted without any change to current image coding standards and systems. The decoder first decompresses the low-resolution image and then upconverts it to the original resolution in a constrained least squares restoration process, using a 2-D piecewise autoregressive model and the knowledge of the directional low-pass prefiltering. The proposed compression approach of collaborative adaptive down-sampling and upconversion (CADU) outperforms JPEG 2000 in PSNR at low to medium bit rates and achieves superior visual quality as well. The superior low-bit-rate performance of the CADU approach suggests that oversampling not only wastes hardware resources and energy but can also be counterproductive to image quality given a tight bit budget.
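The encoder side, prefilter then uniformly down-sample onto a square grid, can be sketched as follows. A fixed separable 3-tap binomial filter stands in for the paper's spatially varying, directional prefilter, which is the part this sketch deliberately simplifies.

```python
import numpy as np

def prefilter_downsample(img):
    """Low-pass prefilter then 2x uniform down-sampling.

    Applies a separable [1, 2, 1]/4 binomial filter (edge-padded) along rows
    and columns, then keeps every second sample. The output stays on a
    square grid, so any standard codec can compress it unchanged.
    """
    k = np.array([1.0, 2.0, 1.0]) / 4.0
    pad = np.pad(img.astype(float), 1, mode="edge")
    rows = k[0] * pad[:-2, 1:-1] + k[1] * pad[1:-1, 1:-1] + k[2] * pad[2:, 1:-1]
    pad2 = np.pad(rows, ((0, 0), (1, 1)), mode="edge")
    smooth = k[0] * pad2[:, :-2] + k[1] * pad2[:, 1:-1] + k[2] * pad2[:, 2:]
    return smooth[::2, ::2]
```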

17.
Among sparse signal representation methods, conventional sparse approximation over transform bases cannot adaptively capture image texture features, while sparse approximation over an overcomplete dictionary has excessive algorithmic complexity. To address this, the paper proposes an image sparse representation method based on optimizing a sparse dictionary in the wavelet domain. The algorithm builds an overcomplete dictionary on the image's wavelet transform and, exploiting the internal and external texture similarity of the wavelet transforms of images of the same scene, classifies the overcomplete dictionary by grey relational degree, effectively improving the sparsity of the image representation. The new algorithm is applied to sparse representation of image signals and to compressed-sensing-based image sampling and reconstruction; results show that it improves the PSNR and structural similarity of reconstructed images overall and effectively shortens image reconstruction time.

18.
Principles and algorithms of inverse synthetic aperture radar imaging of maneuvering targets
For a non-cooperative maneuvering target, the attitude and rotation rate relative to the radar line of sight are difficult to measure and vary with time, which makes inverse synthetic aperture radar (ISAR) imaging difficult. This paper discusses the general principles of imaging in this situation and, for targets whose maneuvering is mild enough that the Doppler variation of each scatterer's sub-echo satisfies a first-order approximation, proposes a practical algorithm. Processing results on measured data show that the new algorithm is feasible.
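Under the first-order approximation mentioned above, each scatterer's Doppler is effectively constant over the dwell, so classical range-Doppler processing applies: an FFT across pulses in every range cell separates scatterers in cross-range. The sketch below shows that classical step only, not the paper's refined algorithm for maneuvering targets.

```python
import numpy as np

def range_doppler_image(echoes):
    """Form an ISAR image from motion-compensated pulse echoes.

    echoes : (n_pulses, n_range) complex array, one range profile per pulse.
    An FFT along the pulse (slow-time) axis maps each range cell's constant
    Doppler to a cross-range bin; fftshift centers zero Doppler.
    """
    return np.abs(np.fft.fftshift(np.fft.fft(echoes, axis=0), axes=0))
```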

19.
This paper presents a 3D structure extraction coding scheme that first computes the 3D structural properties such as 3D shape, motion, and location of objects and then codes image sequences by utilizing such 3D information. The goal is to achieve efficient and flexible coding while still avoiding the visual distortions through the use of 3D scene characteristics inherent in image sequences. To accomplish this, we present two multiframe algorithms for the robust estimation of such 3D structural properties, one from motion and one from stereo. The approach taken in these algorithms is to successively estimate 3D information from a longer sequence for a significant reduction in error. Three variations of 3D structure extraction coding are then presented — 3D motion interpolative coding, 3D motion compensation coding, and “viewpoint” compensation stereo image coding — to suggest that the approach can be viable for high-quality visual communications.

20.
Hu Chao, Li Chunguo, Yang Lüxi. Journal of Signal Processing, 2021, 37(7): 1153-1163
To improve the performance of face feature extraction networks and thereby the accuracy of face recognition algorithms, this paper studies convolutional-neural-network-based face feature extraction and proposes SFRNet (Sparse Feature Reuse Network). The network architecture is built on three innovations: sparse feature reuse, hybrid feature fusion, and center-Gaussian pooling. Experiments on the ImageNet image classification dataset and on the LFW (Labeled Faces in the Wild) and MegaFace face recognition datasets verify SFRNet's feature extraction ability in general scenes and in the specific scenario of face recognition. The experiments show that SFRNet is small in both computation and parameter count, extracts face features effectively, and generalizes well to general scenes.
