20 similar documents retrieved (search time: 421 ms)
1.
2.
This paper presents a novel approach to automatic image annotation that combines global, regional, and contextual features through an extended cross-media relevance model. Unlike typical annotation methods, which use either global or regional features exclusively and neglect the textual context among the annotated words, the proposed approach incorporates all three kinds of information, each helpful for describing image semantics, and annotates images by estimating their joint probability. Specifically, we describe the global features as a distribution vector of visual topics and model the textual context as a multinomial distribution. The global features capture the distribution of visual topics over an image, while the textual context relaxes the assumption of mutual independence among annotated words commonly adopted in most existing methods. Both the global features and the textual context are learned from the training data by probabilistic latent semantic analysis. Experiments on 5k Corel images show that combining these three kinds of information is beneficial for image annotation.
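The joint-probability idea above can be illustrated with a toy sketch (not the paper's actual model): a candidate word is scored by the product of its probability under the global visual topics, the image's regions, and the textual context of co-occurring annotation words. All probability tables below are invented for illustration.

```python
def annotate(topic_dist, p_word_topic, p_word_region, regions,
             p_word_context, context, vocab, top_k=2):
    scores = {}
    for w in vocab:
        # global term: marginalize the word over the visual-topic distribution
        p_global = sum(topic_dist[t] * p_word_topic[t].get(w, 1e-6)
                       for t in topic_dist)
        # regional term: average word probability over the image's regions
        p_region = sum(p_word_region[r].get(w, 1e-6)
                       for r in regions) / len(regions)
        # contextual term: relax the word-independence assumption via
        # pairwise context probabilities
        p_ctx = 1.0
        for c in context:
            p_ctx *= p_word_context.get((c, w), 1e-6)
        scores[w] = p_global * p_region * p_ctx
    return sorted(vocab, key=lambda w: -scores[w])[:top_k]

# invented toy distributions
topic_dist = {"t0": 0.7, "t1": 0.3}
p_word_topic = {"t0": {"tiger": 0.6, "grass": 0.3, "car": 0.1},
                "t1": {"tiger": 0.1, "grass": 0.2, "car": 0.7}}
p_word_region = {"r0": {"tiger": 0.5, "grass": 0.4, "car": 0.1}}
p_word_context = {("jungle", "tiger"): 0.8, ("jungle", "grass"): 0.5,
                  ("jungle", "car"): 0.01}
labels = annotate(topic_dist, p_word_topic, p_word_region, ["r0"],
                  p_word_context, ["jungle"], ["tiger", "grass", "car"])
```

With these toy tables, the contextual term strongly demotes "car" given the context word "jungle", even though "car" has non-trivial global probability.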
3.
4.
Automatic image annotation by fusing semantic topics  Cited by: 7 (0 self-citations, 7 by others)
Because of the semantic gap, automatic image annotation has become an important research topic. Building on probabilistic latent semantic analysis, this paper proposes a method that fuses semantic topics for image annotation and retrieval. First, to model the training data more accurately, the visual features of each image are represented as a "bag" of visual words. A probabilistic model is then designed to capture latent semantic topics from the visual and textual modalities separately, and an adaptive asymmetric learning method is proposed to fuse the two kinds of semantic topics. For each image document, its topic distributions over the modalities are fused by weighting, with the weights determined by the entropy of the document's visual-word distribution. The fused probabilistic model thus properly correlates the information in the visual and textual modalities and can accurately predict semantic annotations for unseen images. The proposed method is compared with several state-of-the-art annotation methods on a standard Corel image dataset; experimental results show that it achieves better annotation and retrieval performance.
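The entropy-weighted fusion described above can be sketched in a few lines. This is a minimal illustration, assuming natural-log entropy normalized by its maximum so the textual weight grows as the visual-word distribution becomes more uncertain; the exact mapping from entropy to weight is an assumption, not the paper's formula.

```python
import math

def entropy(dist):
    # Shannon entropy of a discrete distribution (natural log)
    return -sum(p * math.log(p) for p in dist if p > 0)

def fuse_topics(visual_topics, text_topics, visual_word_dist):
    # weight on the textual modality grows with the entropy of the
    # document's visual-word distribution: an uncertain visual
    # description defers to text (mapping is an assumption)
    max_h = math.log(len(visual_word_dist))
    alpha = entropy(visual_word_dist) / max_h
    return [(1 - alpha) * v + alpha * t
            for v, t in zip(visual_topics, text_topics)]

# a maximally uncertain (uniform) visual-word distribution puts all
# weight on the textual topic distribution
fused = fuse_topics([0.8, 0.2], [0.4, 0.6], [0.25, 0.25, 0.25, 0.25])
```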
5.
6.
With the rapid development of location-based social networks (LBSNs), more and more media data are continually uploaded by users. The asynchrony between visual and textual information makes it extremely difficult to manage multimodal information for manual-annotation-free retrieval and personalized recommendation. Automated semantic discovery of images in multimedia location-related user-generated content (UGC) has therefore become essential. Most existing work leverages single-modality data or correlated multimedia data for image semantic detection. However, the intrinsically heterogeneous UGC in LBSNs is usually independent and uncorrelated, which makes it hard to build correlations between textual and visual information. In this paper, we propose a cross-domain semantic modeling method for automatic annotation of images from social network platforms. First, we extract a set of hot topics from the collected textual information to prepare the image dataset. Then the proposed noisy-sample filtering is applied to remove low-relevance photos. Finally, we leverage cross-domain datasets to discover the common knowledge of each semantic concept in the UGC and boost annotation performance by semantic transfer. Comparison experiments on cross-domain datasets demonstrate the superiority of the proposed method.
7.
Xing Xu Atsushi Shimada Hajime Nagahara Rin-ichiro Taniguchi 《Multimedia Tools and Applications》2016,75(4):2203-2231
The goal of image annotation is to automatically assign a set of textual labels to an image to describe its visual contents. Recently, with the rapid increase in the number of web images, nearest neighbor (NN) based methods have become more attractive and have shown exciting results for image annotation. One of the key challenges of these methods is to define an appropriate similarity measure between images for neighbor selection. Several distance metric learning (DML) algorithms derived from traditional image classification problems have been applied to annotation tasks. However, a fundamental limitation of applying DML to image annotation is that it learns a single global distance metric over the entire image collection and measures the distance between image pairs at the image level. For multi-label annotation problems, it may be more reasonable to measure the similarity of image pairs at the label level. In this paper, we develop a novel label prediction scheme that utilizes multiple label-specific local metrics for label-level similarity measurement, and propose two different local metric learning methods in a multi-task learning (MTL) framework. Extensive experimental results on two challenging annotation datasets demonstrate that 1) utilizing multiple local distance metrics to learn label-level distances is superior to using a single global metric in label prediction, and 2) the proposed methods, which use the MTL framework to learn multiple local metrics simultaneously, can model the commonalities of labels, enabling label prediction to achieve state-of-the-art annotation performance.
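The label-level idea above can be sketched with per-label diagonal (Mahalanobis-style) metrics: each label scores the query under its own metric rather than a single shared one. This is a toy sketch with invented feature vectors and metric weights, not the paper's learned metrics.

```python
def label_distance(x, y, w):
    # diagonal Mahalanobis-style distance under one label's metric w
    return sum(wi * (xi - yi) ** 2 for wi, xi, yi in zip(w, x, y))

def predict_label(query, train, metrics):
    # score each label by the query's distance, measured under that
    # label's own metric, to the nearest training image carrying it
    best, best_d = None, float("inf")
    for lab, w in metrics.items():
        d = min(label_distance(query, x, w)
                for x, labs in train if lab in labs)
        if d < best_d:
            best, best_d = lab, d
    return best

# invented 2-D features and per-label metric weights
train = [([0.0, 0.0], {"sky"}), ([1.0, 1.0], {"sea"})]
metrics = {"sky": [1.0, 1.0], "sea": [1.0, 1.0]}
pred = predict_label([0.1, 0.1], train, metrics)
```

In a real system each label's weight vector would be learned jointly in the MTL framework; here both are identity weights purely to keep the example small.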
8.
Ruofei Zhang Zhongfei Zhang Mingjing Li Wei-Ying Ma Hong-Jiang Zhang 《Multimedia Systems》2006,12(1):27-33
This paper addresses the automatic image annotation problem and its application to multi-modal image retrieval. The contribution of our work is three-fold. (1) We propose a probabilistic semantic model in which the visual features and the textual words are connected via a hidden layer constituting the semantic concepts to be discovered, explicitly exploiting the synergy between the modalities. (2) The association of visual features and textual words is determined in a Bayesian framework, so that the confidence of the association can be provided. (3) Extensive evaluation on a large-scale, visually and semantically diverse image collection crawled from the Web is reported for the prototype system based on the model. In the proposed probabilistic model, a hidden concept layer connecting the visual-feature and word layers is discovered by fitting a generative model to the training images and annotation words through an Expectation-Maximization (EM) based iterative learning procedure. Evaluation of the prototype system on 17,000 images and 7,736 annotation words automatically extracted from crawled Web pages for multi-modal image retrieval indicates that the proposed semantic model and the developed Bayesian framework are superior to a state-of-the-art peer system in the literature.
9.
This paper presents a unified annotation and retrieval framework that integrates region annotation with image retrieval for mutual performance reinforcement. To integrate semantic annotation with region-based image retrieval, visual and textual fusion is proposed for both soft matching and Bayesian probabilistic formulations. To address sample insufficiency and sample asymmetry in the annotation-classifier training phase, we present a region-level multi-label image annotation scheme based on pair-wise coupling support vector machine (SVM) learning. In the retrieval phase, to achieve semantic-level region matching, we present a novel retrieval scheme that differs from earlier work: the query example uploaded by the user is automatically annotated online, and the user can judge its annotation quality. Based on this judgment, two novel schemes are deployed for semantic retrieval: (1) if the user judges the photo to be well annotated, Semantically Supervised Integrated Region Matching, a keyword-integrated soft region matching method, is adopted; (2) if the user judges the photo to be poorly annotated, Keyword Integrated Bayesian Reasoning, a natural integration of a visual dictionary into online content-based search, is adopted. In the relevance-feedback phase, we conduct both visual and textual learning to capture the user's retrieval target. Better annotation and retrieval performance than current methods is reported on both the COREL 10,000 and the Flickr web image database (25,000 images), demonstrating the effectiveness of the proposed framework.
10.
Modeling semantic aspects for cross-media image indexing  Cited by: 3 (0 self-citations, 3 by others)
Monay F Gatica-Perez D 《IEEE transactions on pattern analysis and machine intelligence》2007,29(10):1802-1817
11.
For many applications in graphics, design, and human-computer interaction, it is essential to reliably estimate the visual saliency of images. In this paper, we propose a visual saliency detection method that combines the respective merits of color saliency boosting and global region-based contrast schemes to achieve more accurate saliency maps. Our method is compared with existing saliency detection methods on four publicly available datasets. Experimental results show that our method consistently outperforms current state-of-the-art methods in predicting human fixations. We also demonstrate how the extracted saliency map can be used for image classification.
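The global-contrast half of such a scheme can be sketched as a histogram-based contrast computation: a color is salient in proportion to its distance from the colors that dominate the image. This is a minimal sketch over a quantized color histogram, with an invented toy "image"; the paper's actual pipeline (including color saliency boosting) is more involved.

```python
from collections import Counter

def global_contrast_saliency(pixels, color_dist):
    # histogram-based global contrast: saliency of color c is the
    # frequency-weighted sum of its distances to all other colors
    hist = Counter(pixels)
    n = len(pixels)
    return {c: sum(hist[c2] / n * color_dist(c, c2) for c2 in hist)
            for c in hist}

def dist(c1, c2):
    # Euclidean distance in RGB (a common, simple choice)
    return sum((a - b) ** 2 for a, b in zip(c1, c2)) ** 0.5

# a mostly-black toy "image" with one red pixel: the rare, distinct
# color should receive the highest saliency
pixels = [(0, 0, 0)] * 15 + [(255, 0, 0)]
sal = global_contrast_saliency(pixels, dist)
```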
12.
Image-processing techniques depend on high-quality visual saliency maps to achieve good results, yet existing visual saliency detection methods usually produce only coarse saliency maps, which severely degrades the final image-processing outcome. This paper therefore proposes a visual saliency detection method based on Bayesian theory and statistical learning to detect the visual saliency of images. Building on Bayesian top-down saliency and overall saliency for static images, the method combines top-down knowledge with bottom-up saliency. For the feature-integration problem, the weight parameters associated with all factors are studied using both a weighted linear combination based on a linear model and a nonlinear weighting that combines a regularized neural network. A quantitative ROC-curve evaluation of the bottom-up visual saliency model on two standard datasets shows that the nonlinear combination outperforms the linear one.
13.
To address the complex training and large space-time cost of image annotation models based on deep features, this paper proposes an annotation method in which image visual features are represented by intermediate-layer features of a deep network and each semantic concept is represented by the mean vector of its positive samples. First, the convolution outputs of an intermediate layer of a pre-trained deep model are used directly as low-level visual features, and images are represented by sparse coding. Then a visual feature vector is constructed for each textual word by the positive-sample mean-vector method, yielding a library of visual feature vectors for the vocabulary. Finally, the similarity between a test image and the visual feature vectors of all words is computed, and the words with the highest similarity are taken as annotations. Experiments on several datasets demonstrate the effectiveness of the proposed method: in terms of F1 score on the IAPR TC-12 dataset, its annotation performance improves on 2PKNN and JEC with end-to-end deep features by 32% and 60%, respectively.
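The positive-sample mean-vector step can be sketched as follows: each word's prototype is the elementwise mean of the feature vectors of images annotated with that word, and a test image is labeled by its most similar prototypes. The 2-D feature vectors below are invented stand-ins for real deep intermediate-layer features.

```python
import math

def mean_vector(vectors):
    # concept prototype: elementwise mean of the concept's positive samples
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

def annotate(image_vec, prototypes, top_k=1):
    # rank vocabulary words by the image's similarity to each prototype
    ranked = sorted(prototypes,
                    key=lambda w: -cosine(image_vec, prototypes[w]))
    return ranked[:top_k]

# invented toy features for two concepts
prototypes = {"tiger": mean_vector([[0.9, 0.1], [0.8, 0.2]]),
              "ocean": mean_vector([[0.1, 0.9], [0.2, 0.8]])}
words = annotate([0.85, 0.15], prototypes)
```

Cosine similarity is used here as one reasonable similarity choice; the paper's exact measure may differ.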
14.
15.
Automatic image annotation has become an important and challenging problem due to the existence of the semantic gap. In this paper, we first extend probabilistic latent semantic analysis (PLSA) to model continuous quantities, and derive the corresponding Expectation-Maximization (EM) algorithm to determine the model parameters. Furthermore, to handle data of different modalities according to their characteristics, we present a semantic annotation model that employs continuous PLSA and standard PLSA to model visual features and textual words, respectively. The model learns the correlation between the two modalities by an asymmetric learning approach and can then precisely predict semantic annotations for unseen images. Finally, we compare our approach with several state-of-the-art approaches on the Corel5k and Corel30k datasets. The experimental results show that our approach performs more effectively and accurately.
16.
Automatic image annotation (AIA) is an effective technology for improving the performance of image retrieval. In this paper, we propose a novel AIA scheme based on a hidden Markov model (HMM). In contrast to previous HMM-based annotation methods, SVM-based semi-supervised learning, i.e. the transductive SVM (TSVM), is introduced to remarkably boost the reliability of the HMM with less user labeling effort (the combination is denoted TSVM-HMM). This guarantees that the proposed TSVM-HMM annotation scheme integrates a discriminative classifier with a generative model so that their advantages complement each other. In addition, the proposed AIA scheme exploits not only the relevance model between the visual content of images and the textual keywords but also keyword correlation. In particular, to establish an enhanced correlation network among keywords, both co-occurrence-based and WordNet-based correlation techniques are fused so as to benefit from each other. The final experimental results reveal that better annotation performance can be achieved with fewer labeled training images.
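The co-occurrence half of the keyword correlation network can be sketched directly from training annotations. This minimal sketch uses a common normalization, co-occurrence count divided by the smaller of the two keyword counts, which is an assumption; the WordNet-based similarity the paper fuses in is omitted here.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_correlation(annotations):
    # corr(a, b) = co-occurrence count / min(count(a), count(b));
    # the WordNet-based component of the fused network is omitted
    count, co = Counter(), Counter()
    for words in annotations:
        ws = set(words)
        count.update(ws)
        co.update(frozenset(p) for p in combinations(sorted(ws), 2))
    return {pair: co[pair] / min(count[w] for w in pair) for pair in co}

# invented toy training annotations
corr = cooccurrence_correlation([["sky", "sea"], ["sky", "cloud"],
                                 ["sky", "sea"], ["sea", "cloud"]])
```

With these toy annotations, "sky" and "sea" co-occur twice out of three appearances each, giving them the strongest correlation.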
17.
In this paper, we present a video coding scheme which applies the technique of visual saliency computation to adjust image fidelity before compression. To extract visually salient features, we construct a spatio-temporal saliency map by analyzing the video using a combined bottom-up and top-down visual saliency model. We then use an extended bilateral filter, in which the local intensity and spatial scales are adjusted according to visual saliency, to adaptively alter the image fidelity. Our implementation is based on the H.264 video encoder JM12.0. Besides evaluating our scheme with the H.264 reference software, we also compare it to a more traditional foreground-background segmentation-based method and a foveation-based approach which employs Gaussian blurring. Our results show that the proposed algorithm can improve the compression ratio significantly while effectively preserving perceptual visual quality.
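The saliency-adaptive bilateral filtering step can be illustrated in one dimension: the spatial and range scales shrink where saliency is high, so salient regions keep detail while non-salient regions are smoothed more aggressively. The particular mapping from saliency to filter scale below is an assumption, not the paper's formula.

```python
import math

def saliency_bilateral_1d(signal, saliency, sigma_s=2.0, sigma_r=20.0):
    # 1-D bilateral filter whose spatial (sigma_s) and range (sigma_r)
    # scales shrink linearly with per-sample saliency in [0, 1]
    out = []
    for i, v in enumerate(signal):
        ss = sigma_s * (1.0 - 0.5 * saliency[i])
        sr = sigma_r * (1.0 - 0.5 * saliency[i])
        num = den = 0.0
        for j, u in enumerate(signal):
            w = math.exp(-(i - j) ** 2 / (2 * ss ** 2)
                         - (v - u) ** 2 / (2 * sr ** 2))
            num += w * u
            den += w
        out.append(num / den)
    return out

# a constant signal is a fixed point of the filter regardless of saliency
flat = saliency_bilateral_1d([5.0] * 5, [0.0, 0.5, 1.0, 0.5, 0.0])
```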
18.
Kong Chao, Zhang Huaxiang, Sheng Haidi 《数据采集与处理》(Journal of Data Acquisition and Processing) 2017,32(2):399-407
To meet the needs of visual dictionaries in image representation and retrieval, this paper proposes an image retrieval method that combines multiple visual dictionaries with saliency weighting, achieving a saliency-aware sparse representation of multiple image features. The method first partitions an image into patches and extracts multiple low-level features from each patch. These features serve as input vectors for learning, via non-negative sparse coding, a visual dictionary for each feature type. The resulting patch sparse vectors are aggregated by a saliency pooling method that incorporates spatial information and are saliency-weighted to form a sparse representation of the whole image; finally, the proposed SDD distance measure is used for retrieval. Simulation experiments on the Corel and Caltech benchmark image sets show that, compared with single-dictionary methods, the proposed method effectively improves retrieval accuracy.
19.
Visual saliency detection is an important cue in the human visual system and can offer efficient solutions for both biological and artificial vision systems. Although many saliency detection models achieve good results on public datasets, the accuracy and reliability of salient object detection models still remain a challenge. For this reason, a novel and effective salient region detection model is presented in this paper. Based on the principle that combining global statistics with surrounding-contrast saliency operators can yield better results than using either alone, we use a histogram-based contrast method to calculate global saliency values in an opponent color space. At the same time, we partition the input image into a set of regions, and regional saliency is detected by considering color isolation with spatial information and textural distinctness simultaneously. The final saliency is obtained by a weighted fusion of the two saliency results. Experimental results on three widely used databases validate the efficacy of the proposed method in comparison with fourteen state-of-the-art methods.
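The final fusion step described above can be sketched as a pixelwise weighted combination of the two maps followed by min-max normalization. The fixed weight and the normalization choice here are illustrative assumptions; the paper's weighting may be derived differently.

```python
def fuse_saliency(global_map, regional_map, w=0.5):
    # pixelwise weighted fusion of the global-statistics map and the
    # regional-contrast map, then min-max normalization to [0, 1]
    fused = [w * g + (1 - w) * r for g, r in zip(global_map, regional_map)]
    lo, hi = min(fused), max(fused)
    return [(f - lo) / (hi - lo) if hi > lo else 0.0 for f in fused]

# tiny 3-pixel example: the middle pixel is salient under both maps,
# so it dominates the fused, normalized result
final = fuse_saliency([0.2, 0.8, 0.4], [0.4, 0.6, 0.2])
```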
20.
Annotating images by mining image search results  Cited by: 3 (0 self-citations, 3 by others)
Xin-Jing Wang Lei Zhang Xirong Li Wei-Ying Ma 《IEEE transactions on pattern analysis and machine intelligence》2008,30(11):1919-1932