Similar Documents
1.
Automatic image annotation (AIA) is very important to image retrieval and image understanding. Two key issues in AIA are explored in detail in this paper: structured visual feature selection and the use of hierarchical correlated structures among multiple tags to boost the performance of image annotation. This paper simultaneously introduces input and output structural grouping sparsity into a regularized regression model for image annotation. For high-dimensional heterogeneous input features such as color, texture, and shape, different kinds (groups) of features have different intrinsic discriminative power for the recognition of certain concepts. The proposed structured feature selection by structural grouping sparsity can be used not only to select groups of features but also to conduct within-group selection. Hierarchical correlations among output labels are well represented by a tree structure, so the proposed tree-structured grouping sparsity can be used to boost the performance of multitag image annotation. To solve the proposed regression model efficiently, we relax the solving process into a bilayer regression framework for multilabel boosting by selecting heterogeneous features with structural grouping sparsity (Bi-MtBGS). The first-layer regression selects the discriminative features for each label. The second-layer regression refines the feature selection model learned from the first layer, which can be viewed as a multilabel boosting process. Extensive experiments on public benchmark image data sets and real-world image data sets demonstrate that the proposed approach achieves better multitag image annotation performance and leads to a readily interpretable model for image understanding.
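The structural grouping sparsity in the first-layer regression amounts to a group-lasso penalty over blocks of heterogeneous features. Below is a minimal numpy sketch of the core building block, the block soft-thresholding (proximal) operator used in proximal-gradient solvers for group-lasso regression; the group layout, solver loop, and parameters are illustrative assumptions, not the authors' exact Bi-MtBGS implementation, and only group-level (not within-group) selection is shown.

```python
import numpy as np

def group_soft_threshold(w, groups, threshold):
    """Block soft-thresholding: the proximal operator of the group-lasso
    penalty sum_g ||w_g||_2. `groups` maps each group name to an index array."""
    out = w.copy()
    for idx in groups.values():
        norm = np.linalg.norm(w[idx])
        # Shrink the whole group toward zero; drop it entirely if its norm
        # falls below the threshold (group-level feature selection).
        out[idx] = 0.0 if norm <= threshold else (1.0 - threshold / norm) * w[idx]
    return out

def group_lasso_regression(X, y, groups, lam=0.5, lr=0.1, n_iter=1000):
    """Illustrative proximal-gradient (ISTA-style) solver for least squares + group lasso."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / len(y)          # gradient of the squared loss
        w = group_soft_threshold(w - lr * grad, groups, lr * lam)
    return w

# Toy usage: color / texture / shape feature blocks for one label.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 30))
y = X[:, :10] @ rng.standard_normal(10)            # only the "color" block matters
groups = {"color": np.arange(0, 10), "texture": np.arange(10, 20), "shape": np.arange(20, 30)}
w = group_lasso_regression(X, y, groups)
print({g: float(np.linalg.norm(w[i])) for g, i in groups.items()})
```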

2.
A key step in designing a robust automatic image annotation system is extracting visual features that can effectively describe image semantics. Since heterogeneous visual features such as color, texture, and shape differ in how much they contribute to representing a particular image semantic, and features of the same kind are correlated with one another, this paper proposes a Graph Regularized Non-negative Group Sparsity (GRNGS) model for image annotation and computes its model parameters with a non-negative matrix factorization method. The model combines graph regularization with an l2,1-norm constraint, so that the groups of features selected during annotation reflect both visual similarity and semantic relatedness. Experimental results on image data sets such as Corel5K and ESP Game show that, compared with several recent image annotation models, the GRNGS model is more robust and produces more accurate annotations.
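For reference, here is a small numpy sketch of the quantities the abstract refers to: the l2,1 norm of a coefficient matrix (which drives row-wise, i.e. group-wise, sparsity) and a graph-regularization term built from a Laplacian. The objective form shown is a generic illustration of "reconstruction + graph regularization + l2,1 penalty" under non-negativity, not the exact GRNGS formulation or its NMF update rules.

```python
import numpy as np

def l21_norm(W):
    """l2,1 norm: sum of the Euclidean norms of the rows of W.
    Penalizing it drives whole rows (feature groups) to zero."""
    return np.sum(np.linalg.norm(W, axis=1))

def graph_laplacian(S):
    """Unnormalized Laplacian L = D - S of a similarity matrix S."""
    return np.diag(S.sum(axis=1)) - S

def grngs_style_objective(X, Y, W, S, alpha=1.0, beta=0.1):
    """Illustrative objective: ||Y - XW||_F^2 + alpha * tr((XW)^T L (XW)) + beta * ||W||_{2,1},
    with W non-negative (enforced here by clipping); not the paper's exact model."""
    W = np.maximum(W, 0.0)                              # non-negativity constraint
    L = graph_laplacian(S)
    recon = np.linalg.norm(Y - X @ W, "fro") ** 2
    smooth = alpha * np.trace(W.T @ X.T @ L @ X @ W)    # predictions vary smoothly over the graph
    sparsity = beta * l21_norm(W)
    return recon + smooth + sparsity

# Toy usage with an RBF similarity graph over the images.
rng = np.random.default_rng(0)
X, Y = rng.random((50, 30)), rng.integers(0, 2, (50, 5)).astype(float)
S = np.exp(-np.square(np.linalg.norm(X[:, None] - X[None, :], axis=2)))
W = rng.random((30, 5))
print(grngs_style_objective(X, Y, W, S))
```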

3.
With the explosive growth of multimedia data on the web, multi-label image annotation has attracted more and more attention. Although the amount of available data is large and growing, the amount of labeled data is quite small. This paper proposes an approach that utilizes both unlabeled data in the target domain and labeled data in an auxiliary domain to boost the performance of image annotation. Moreover, since different kinds of heterogeneous features in images have different intrinsic discriminative power for image understanding, group sparsity is introduced in our approach to effectively utilize those heterogeneous visual features with data from the target and auxiliary domains. We call this approach semi-supervised cross-domain learning with group sparsity (S2CLGS). The strength of the proposed S2CLGS method for multi-label image annotation is that it integrates semi-supervised discriminant analysis, cross-domain learning, and sparse coding. Experiments demonstrate the effectiveness of S2CLGS in comparison with other image annotation algorithms.

4.
Multi-label recognition is a fundamental yet challenging task in computer vision. Recently, deep learning models have made great progress in learning discriminative features from input images. However, conventional approaches cannot model the inter-class discrepancies among features in multi-label images, since they are designed for image-level feature discrimination. In this paper, we propose a unified deep network to learn discriminative features for the multi-label task. Given a multi-label image, the proposed method first disentangles features corresponding to different classes. Then, it discriminates between these classes by increasing the inter-class distance while decreasing the intra-class differences in the output space. By regularizing the whole network with the proposed loss, the performance of the well-known ResNet-101 is improved significantly. Extensive experiments have been performed on the COCO-2014, VOC2007, and VOC2012 datasets, demonstrating that the proposed method outperforms state-of-the-art approaches by a significant margin of 3.5% on the large-scale COCO dataset. Moreover, analysis of the discriminative feature learning approach shows that it can be plugged into various types of multi-label methods as a general module.
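The loss described, pushing class-wise feature centers apart while pulling features of the same class together, can be illustrated with a short numpy sketch. This is a generic inter-/intra-class margin loss under assumed shapes (per-class disentangled features), not the paper's exact formulation.

```python
import numpy as np

def discriminative_loss(features, margin=1.0):
    """features: array of shape (num_classes, num_samples, dim), holding the
    per-class (disentangled) feature vectors of a batch of multi-label images.
    Returns intra-class spread plus a hinge on inter-class center distances.
    Illustrative form only; the paper's loss may differ."""
    centers = features.mean(axis=1)                              # (C, dim) class centers
    intra = np.mean(np.linalg.norm(features - centers[:, None, :], axis=2))
    # Pairwise distances between class centers; penalize pairs closer than the margin.
    diff = centers[:, None, :] - centers[None, :, :]
    dist = np.linalg.norm(diff, axis=2)
    iu = np.triu_indices(len(centers), k=1)
    inter = np.mean(np.maximum(0.0, margin - dist[iu]))
    return intra + inter

# Toy usage: 4 classes, 8 samples each, 16-D features.
rng = np.random.default_rng(0)
feats = rng.standard_normal((4, 8, 16)) + 3.0 * rng.standard_normal((4, 1, 16))
print(discriminative_loss(feats))
```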

5.
6.
Hierarchical structure histogram and its applications
余旺盛  李卫华  侯志强 《电子学报》2017,45(11):2617-2624
To address the limited discriminative power of traditional histogram features, a hierarchical structure histogram for images is proposed. The feature first partitions the image into layers according to intensity magnitude, then computes structure-primitive-based histograms for all layers using a set of pre-designed structural primitives, and finally integrates the resulting histograms to obtain the final hierarchical structure histogram. Taking image matching and visual tracking as example applications, extensive simulation experiments show that the proposed feature has stronger discriminative power and better local description ability than the reference features. The similarity maps obtained when matching images with the hierarchical structure histogram exhibit a more pronounced single peak, which significantly reduces the tracking error of visual tracking algorithms.
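A minimal numpy sketch of the idea, layering the image by intensity, computing a histogram of simple local structure primitives per layer, and concatenating the results, is shown below. The 2x2 binary patterns used as structure primitives are an illustrative stand-in for the pre-designed primitives of the paper.

```python
import numpy as np

def hierarchical_structure_histogram(img, n_layers=4):
    """img: 2-D uint8 grayscale image. Returns the concatenated per-layer
    histograms of 2x2 binary structure patterns (16 patterns per layer).
    The 2x2 patterns are illustrative, not the paper's exact primitives."""
    edges = np.linspace(0, 256, n_layers + 1)
    hists = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        layer = ((img >= lo) & (img < hi)).astype(np.uint8)      # binary layer mask
        # Encode every 2x2 neighbourhood as a 4-bit structure primitive (0..15).
        codes = (layer[:-1, :-1] * 1 + layer[:-1, 1:] * 2 +
                 layer[1:, :-1] * 4 + layer[1:, 1:] * 8)
        hist = np.bincount(codes.ravel(), minlength=16).astype(float)
        hists.append(hist / max(hist.sum(), 1.0))                # normalize per layer
    return np.concatenate(hists)

# Toy usage: compare two images with a histogram-intersection similarity.
rng = np.random.default_rng(0)
a = rng.integers(0, 256, (64, 64), dtype=np.uint8)
b = rng.integers(0, 256, (64, 64), dtype=np.uint8)
ha, hb = hierarchical_structure_histogram(a), hierarchical_structure_histogram(b)
print(np.minimum(ha, hb).sum())
```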

7.
Dictionaries have recently attracted a great deal of interest as a new powerful representation scheme that can describe the visual content of an image. Most existing approaches, nevertheless, neglect dictionary statistics. In this work, we explore the linguistic and statistical properties of dictionaries in an image retrieval task, representing the dictionary as a multiset. This multiset is extracted by means of the LZW data compressor, which encodes the visual patterns of an image; for this reason, the image is first quantized and then transformed into a 1D string of characters. Based on the multiset notion, we also introduce the Normalized Multiset Distance (NMD), a new dictionary-based dissimilarity measure that enables the user to retrieve images with content similar to a given query. Experimental results demonstrate a significant improvement in retrieval performance compared to related dictionary-based techniques and to several other image indexing methods that utilize classical low-level image features.
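To make the pipeline concrete, here is a small sketch of the dictionary-extraction step: the quantized image is flattened into a 1-D sequence of symbols, an LZW-style dictionary of visual patterns is built from it, and two dictionaries are compared as multisets. The multiset dissimilarity at the end is a simple Jaccard-style illustration, not the exact Normalized Multiset Distance of the paper.

```python
from collections import Counter
import numpy as np

def lzw_dictionary(symbols):
    """Build the LZW phrase dictionary (as a multiset of emitted phrases) from a
    sequence of symbols, e.g. the row-major scan of a quantized image."""
    phrases = Counter()
    dictionary = {(s,): True for s in set(symbols)}
    current = ()
    for s in symbols:
        candidate = current + (s,)
        if candidate in dictionary:
            current = candidate
        else:
            dictionary[candidate] = True        # new phrase learned
            phrases[current] += 1               # emit the longest known phrase
            current = (s,)
    if current:
        phrases[current] += 1
    return phrases

def multiset_dissimilarity(a, b):
    """Illustrative multiset dissimilarity (not the paper's NMD):
    1 - |a ∩ b| / |a ∪ b| over phrase counts."""
    inter = sum((a & b).values())
    union = sum((a | b).values())
    return 1.0 - inter / union if union else 0.0

# Toy usage: quantize two images to 16 gray levels and compare their dictionaries.
rng = np.random.default_rng(0)
img1, img2 = rng.integers(0, 256, (32, 32)), rng.integers(0, 256, (32, 32))
d1 = lzw_dictionary(tuple((img1 // 16).ravel()))
d2 = lzw_dictionary(tuple((img2 // 16).ravel()))
print(multiset_dissimilarity(d1, d2))
```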

8.
In face super-resolution reconstruction algorithms based on neighborhood embedding, both training and reconstruction are carried out in a feature space, so feature selection has a large impact on the performance of the algorithm. In addition, since the reconstruction weights are not constrained in the model, negative weights appear and cause overfitting, degrading the quality of the reconstructed face images. Considering the important roles of feature selection and of constraining the sign of the weights, this paper proposes a face super-resolution reconstruction algorithm based on two-dimensional principal component analysis (2D-PCA) feature description and non-negative-weight neighborhood embedding. First, face images are divided into patches, local visual primitives of the patches are obtained by K-means clustering, and the patches are classified using these primitives. Then, 2D-PCA is applied to extract features from each class of face image patches, and high- and low-resolution training sets are built. Finally, a new non-negative weight solver is used to compute the reconstruction weights. Simulation results show that, compared with other neighborhood-embedding-based face super-resolution methods, the proposed algorithm effectively improves weight stability, reduces overfitting, and produces reconstructed face images with better subjective and objective quality.
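The non-negative weight solving step can be illustrated with scipy's non-negative least squares: the low-resolution patch is reconstructed from its K nearest low-resolution training patches under a non-negativity constraint, and the same weights are applied to the high-resolution counterparts. This sketch assumes 2D-PCA features have already been extracted and is not the authors' exact solver.

```python
import numpy as np
from scipy.optimize import nnls

def nonneg_embedding_weights(lr_patch, lr_neighbors):
    """Solve min_w ||lr_patch - lr_neighbors^T w||_2 with w >= 0.
    lr_patch: (d,), lr_neighbors: (K, d) features of the K nearest training patches."""
    w, _ = nnls(lr_neighbors.T, lr_patch)       # non-negative least squares
    s = w.sum()
    return w / s if s > 0 else w                # normalize so the weights sum to one

def reconstruct_hr_patch(w, hr_neighbors):
    """Apply the same weights to the high-resolution counterparts."""
    return hr_neighbors.T @ w

# Toy usage with K=5 neighbors and (assumed) 2D-PCA features of dimension 20.
rng = np.random.default_rng(0)
lr_neighbors = rng.standard_normal((5, 20))
hr_neighbors = rng.standard_normal((5, 80))
lr_patch = 0.4 * lr_neighbors[0] + 0.6 * lr_neighbors[1]
w = nonneg_embedding_weights(lr_patch, lr_neighbors)
hr_patch = reconstruct_hr_patch(w, hr_neighbors)
print(w)
```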

9.
Zero-shot learning (ZSL) aims to recognize unseen image classes without requiring any training samples of those specific classes. The ZSL problem is typically addressed by building a semantic embedding space, such as attributes, to bridge the visual features and class labels of images. Currently, most ZSL approaches focus on learning a visual-semantic alignment from seen classes using only human-designed attributes, and the ZSL problem is then solved by transferring semantic knowledge from seen classes to unseen classes. However, few works examine whether the human-designed attributes are discriminative enough for image class prediction. To address this issue, we propose a semantic-aware dictionary learning (SADL) framework to explore discriminative visual attributes across seen and unseen classes. Furthermore, the semantic cues are elegantly integrated into the feature representations via the learned visual attributes for the recognition task. Experiments conducted on two challenging benchmark datasets show that our approach outperforms other state-of-the-art ZSL methods.

10.
In many computer vision applications, color-to-grayscale conversion algorithms are required to preserve the salient features of color images, such as brightness, contrast, and structure. Traditional color-to-grayscale conversion algorithms such as the National Television Standards Committee (NTSC) method may produce mediocre images for visual observation. Moreover, these NTSC grayscale images are not tailored for classification purposes, because the objective of NTSC is not to obtain discriminative images. For image classification problems, we present a novel color-to-grayscale conversion method based on a genetic algorithm (GA). Using the GA, the color conversion coefficients are optimized to generate more discriminative grayscale images and thereby decrease the error in image classification problems. To analyze the effectiveness of the proposed method, all experimental results are compared with the traditional NTSC, equal-weight, and Karhunen–Loeve-based color-to-grayscale optimization methods. It is observed that the proposed method converges to more discriminative grayscale images compared to the traditional methods.
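A compact numpy sketch of the idea: a small genetic algorithm searches over non-negative RGB mixing coefficients (normalized to sum to one), with a fitness that rewards class separability of the resulting grayscale values, here a simple two-class Fisher ratio. The fitness, population size, and operators are illustrative choices, not the paper's exact GA configuration.

```python
import numpy as np

def to_gray(imgs, coeffs):
    """imgs: (N, H, W, 3) float RGB images; coeffs: (3,) mixing coefficients."""
    return imgs @ coeffs

def fisher_fitness(imgs, labels, coeffs):
    """Two-class Fisher ratio of the mean grayscale value per image (illustrative
    separability criterion): larger means more discriminative grayscale images."""
    g = to_gray(imgs, coeffs).mean(axis=(1, 2))
    g0, g1 = g[labels == 0], g[labels == 1]
    return (g0.mean() - g1.mean()) ** 2 / (g0.var() + g1.var() + 1e-9)

def ga_gray_coeffs(imgs, labels, pop=30, gens=50, sigma=0.05, seed=0):
    """Select-the-fittest-half GA with Gaussian mutation over normalized coefficients."""
    rng = np.random.default_rng(seed)
    population = rng.random((pop, 3))
    population /= population.sum(axis=1, keepdims=True)          # coefficients sum to one
    for _ in range(gens):
        fitness = np.array([fisher_fitness(imgs, labels, c) for c in population])
        parents = population[np.argsort(fitness)[-pop // 2:]]     # keep the fittest half
        children = parents + rng.normal(0, sigma, parents.shape)  # Gaussian mutation
        children = np.clip(children, 0, None)
        children /= children.sum(axis=1, keepdims=True) + 1e-12
        population = np.vstack([parents, children])
    fitness = np.array([fisher_fitness(imgs, labels, c) for c in population])
    return population[np.argmax(fitness)]

# Toy usage: two synthetic classes that differ mainly in the red channel.
rng = np.random.default_rng(1)
imgs = rng.random((40, 8, 8, 3))
labels = np.array([0] * 20 + [1] * 20)
imgs[labels == 1, :, :, 0] += 0.3
print(ga_gray_coeffs(imgs, labels))
```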

11.
A Multi-Directional Search technique for image annotation propagation
Image annotation has attracted much attention due to its importance in image understanding and search. In this paper, we propose a novel Multi-Directional Search framework for semi-automatic annotation propagation. In this system, the user interacts with the system to provide example images and the corresponding annotations during the annotation propagation process. In each iteration, the example images are clustered and the corresponding annotations are propagated separately to each cluster: images in the local neighborhood are annotated. Furthermore, some of those images are returned to the user for further annotation. As the user marks more images, the annotation process proceeds in multiple directions in the feature space. The query movements can be treated as navigation along multiple paths, and each path can be further split based on the user's input. In this manner, the system provides accurate annotation assistance to the user: images with the same semantic meaning but different visual characteristics can be handled effectively. Comprehensive experiments on the Corel and University of Washington image databases show that the proposed technique annotates image databases accurately and efficiently.
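A rough sketch of one propagation iteration follows: the user-provided example images are clustered, each cluster's tags are copied to the unlabeled images in its local neighborhood, and one image per cluster is handed back to the user for review. The clustering choice (scikit-learn KMeans), the fixed-radius neighborhood, and the review rule are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np
from sklearn.cluster import KMeans

def propagate_annotations(example_feats, example_tags, pool_feats, n_clusters=3, radius=5.0):
    """One illustrative propagation iteration. example_tags is a list of tag sets,
    one per example image. Returns the propagated tags for pool images and the
    indices of pool images suggested for further user annotation."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(example_feats)
    propagated, to_review = {}, []
    for c in range(n_clusters):
        members_of_cluster = np.where(km.labels_ == c)[0]
        tags = set().union(*(example_tags[i] for i in members_of_cluster))
        d = np.linalg.norm(pool_feats - km.cluster_centers_[c], axis=1)
        neighborhood = np.where(d < radius)[0]          # local neighborhood of the cluster
        for i in neighborhood:
            propagated.setdefault(int(i), set()).update(tags)
        if neighborhood.size:
            # Return the least certain (farthest) neighbor to the user for review.
            to_review.append(int(neighborhood[np.argmax(d[neighborhood])]))
    return propagated, to_review

# Toy usage with random features and per-example tag sets.
rng = np.random.default_rng(0)
ex, pool = rng.standard_normal((9, 16)), rng.standard_normal((40, 16))
tags = [{"sky"}, {"sky"}, {"sea"}, {"sea"}, {"sand"}, {"sand"}, {"sky", "sea"}, {"sand"}, {"sea"}]
print(propagate_annotations(ex, tags, pool)[1])
```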

12.
Content-based image retrieval systems are meant to retrieve, from a collection, the images most similar to a query image. One of the best-known models widely applied to this task is the bag of visual words (BoVW) model. In this paper, we present a study of different information gain models used for the construction of a visual vocabulary. In the proposed framework, information gain models are used as discriminative information to index image features and select the ones with the highest information gain values. We introduce some extensions to further improve the performance of the proposed framework: mixing different vocabularies and extending the BoVW to a bag of visual phrases. Exhaustive experiments show the benefit of information gain models in our retrieval framework.
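The information gain of a visual word with respect to the class labels can be computed directly from word-occurrence counts; a short numpy sketch is below. The binarized "word present in image" representation and the way words are ranked are illustrative assumptions about how such a criterion plugs into vocabulary construction, not the paper's exact models.

```python
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def information_gain(presence, labels):
    """presence: (n_images,) boolean, word occurs in the image or not.
    labels: (n_images,) integer class labels. IG = H(Y) - H(Y | word)."""
    _, counts = np.unique(labels, return_counts=True)
    h_y = entropy(counts / counts.sum())
    h_cond = 0.0
    for mask in (presence, ~presence):
        if mask.any():
            _, c = np.unique(labels[mask], return_counts=True)
            h_cond += mask.mean() * entropy(c / c.sum())
    return h_y - h_cond

def rank_visual_words(bow, labels):
    """bow: (n_images, vocab_size) visual-word counts. Returns word indices
    sorted by decreasing information gain (most discriminative first)."""
    gains = np.array([information_gain(bow[:, j] > 0, labels) for j in range(bow.shape[1])])
    return np.argsort(gains)[::-1], gains

# Toy usage: 100 images, 50-word vocabulary, 3 classes.
rng = np.random.default_rng(0)
bow = rng.poisson(1.0, (100, 50))
labels = rng.integers(0, 3, 100)
order, gains = rank_visual_words(bow, labels)
print(order[:5], gains[order[:5]])
```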

13.
A common problem during bad weather conditions is degraded image quality due to haze or fog. In the basic gamma correction method there is always uncertainty regarding the choice of a particular exponential factor that improves the quality of the input image, because of the nonlinearity involved in the process. This issue is addressed in this study by proposing a modified gamma correction method, in which the exponential correction factor is varied incrementally to generate images. We also propose an automatic image selection criterion for fusion, which helps choose images with varied and distinct features. The multi-exposure fusion framework is implemented in the hue-saturation-value color space, which closely resembles human vision. The intensity channel of the selected images is fused in the gradient domain, which captures minute details and gives the method an edge over conventional fusion-based methods. The fused saturation channel is obtained by averaging fusion followed by enhancement with a non-linear sigmoid function. The hue channel of the input hazy image is left unprocessed to avoid color distortion. The experimental analysis demonstrates that the proposed method outperforms most single-image dehazing methods.
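The modified gamma correction step, sweeping the exponent over a range to generate a stack of differently exposed versions of the value channel, and a simple rule that keeps only mutually distinct ones, can be sketched as follows. The selection criterion (mean-brightness spread) and the gamma range are simplified stand-ins for the paper's selection criterion and are not its gradient-domain fusion.

```python
import numpy as np

def gamma_stack(v, gammas=np.arange(0.4, 2.1, 0.2)):
    """v: HSV value (intensity) channel scaled to [0, 1].
    Returns one gamma-corrected version of the channel per exponent."""
    return np.stack([np.power(v, g) for g in gammas]), gammas

def select_distinct(stack, min_gap=0.08):
    """Greedy selection: keep an image only if its mean brightness differs from
    every already-selected image by at least `min_gap` (an illustrative rule)."""
    kept, means = [], []
    for i, img in enumerate(stack):
        m = img.mean()
        if all(abs(m - km) >= min_gap for km in means):
            kept.append(i)
            means.append(m)
    return kept

# Toy usage on a synthetic hazy-looking value channel.
rng = np.random.default_rng(0)
v = np.clip(0.6 + 0.1 * rng.standard_normal((64, 64)), 0, 1)
stack, gammas = gamma_stack(v)
idx = select_distinct(stack)
print([round(float(gammas[i]), 1) for i in idx])
```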

14.
Since there is a semantic gap between low-level visual features and high-level image semantics, the performance of many existing content-based image annotation algorithms is not satisfactory. In order to bridge the gap and improve image annotation performance, a novel automatic image annotation (AIA) approach using a neighborhood set (NS) based on an image distance metric learning (IDML) algorithm is proposed in this paper. With IDML, we can easily obtain the neighborhood set of each image, since the learned image distance effectively measures the distance between images for the AIA task. By introducing the NS, the proposed AIA approach can predict all possible labels of an image without a caption. The experimental results confirm that introducing the NS based on IDML improves the efficiency of AIA approaches and achieves better annotation performance than existing AIA approaches.
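The neighborhood-set step can be sketched as follows: with a learned (Mahalanobis-style) distance, the K nearest annotated images form the neighborhood set of a query image, and the query inherits the labels that occur most frequently in that set. The matrix M is assumed to come from some metric-learning stage, and the voting rule is an illustrative choice rather than the paper's exact prediction rule.

```python
import numpy as np

def mahalanobis_distances(query, X, M):
    """Squared learned distances d(q, x) = (q - x)^T M (q - x) for all rows of X."""
    diff = X - query
    return np.einsum("ij,jk,ik->i", diff, M, diff)

def annotate_from_neighborhood(query, X, tag_matrix, M, k=5, n_tags=3):
    """tag_matrix: (n_images, n_labels) binary annotations of the training images.
    Returns the indices of the labels most frequent in the K-neighborhood (illustrative)."""
    d = mahalanobis_distances(query, X, M)
    neighborhood = np.argsort(d)[:k]                  # the neighborhood set (NS)
    votes = tag_matrix[neighborhood].sum(axis=0)
    return np.argsort(votes)[::-1][:n_tags]

# Toy usage: identity M (plain Euclidean) standing in for a learned metric.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 8))
tags = rng.integers(0, 2, (50, 12))
query = X[0] + 0.01 * rng.standard_normal(8)
print(annotate_from_neighborhood(query, X, tags, np.eye(8)))
```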

15.
The bag of visual words (BOW) model is an efficient image representation technique for image categorization and annotation tasks. Building good visual vocabularies from automatically extracted image feature vectors produces discriminative visual words, which can improve the accuracy of image categorization tasks. Most approaches that use the BOW model to categorize images ignore useful information that can be obtained from image classes to build visual vocabularies. Moreover, most BOW models use intensity features extracted from local regions and disregard colour information, which is an important characteristic of any natural scene image. In this paper, we show that integrating visual vocabularies generated from each image category improves the BOW image representation and improves accuracy in natural scene image classification. We use a keypoint density-based weighting method to combine the BOW representation with image colour information on a spatial pyramid layout. In addition, we show that visual vocabularies generated from the training images of one scene image dataset can plausibly represent another scene image dataset in the same domain, which helps reduce the time and effort needed to build new visual vocabularies. The proposed approach is evaluated on three well-known scene classification datasets with 6, 8 and 15 scene categories, respectively, using 10-fold cross-validation. The experimental results, using support vector machines with a histogram intersection kernel, show that the proposed approach outperforms baseline methods such as Gist features, rgbSIFT features and different configurations of the BOW model.

16.
An image clustering algorithm based on visual and annotation correlation information
于林森  张田文 《电子学报》2006,34(7):1265-1269
The algorithm first scores the annotation words according to their degree of visual relevance; the score of an annotation word reflects how visually coherent the semantically consistent images are. Exploiting the inherent linguistic descriptiveness of image semantic categories, annotation words with clear visual coherence are extracted from the image annotations and used as the semantic categories of the images, which reduces the tedious manual cataloguing work of database designers. Classifying images semantically according to the annotation-word information improves the semantic consistency of image clustering. Clustering results on 4500 annotated Corel images confirm the effectiveness of the algorithm.

17.
The one-dimensional orthogonal polynomial transform (OPT) is extended to two dimensions, and on this basis an image fusion algorithm based on the orthogonal polynomial transform is proposed. The orthogonal polynomial transform maps the principal features of an image into the time-domain feature space while mapping the image's detail features to white noise, and the fusion is carried out in this feature space. Experimental results show that the algorithm outperforms the discrete wavelet transform, Laplacian pyramid, and morphological pyramid methods in terms of noise reduction.

18.
Similarity-based online feature selection in content-based image retrieval.
Content-based image retrieval (CBIR) has become more and more important in the last decade, and the gap between high-level semantic concepts and low-level visual features hinders further performance improvement. The problem of online feature selection is critical to really bridging this gap. In this paper, we investigate online feature selection in the relevance feedback learning process to improve the retrieval performance of a region-based image retrieval system. Our contributions are mainly in three areas. 1) A novel feature selection criterion is proposed, based on the psychological similarity between the positive and negative training sets. 2) An effective online feature selection algorithm is implemented in a boosting manner to select the most representative features for the current query concept and to combine classifiers constructed over the selected features to retrieve images. 3) To apply the proposed feature selection method in region-based image retrieval systems, we propose a novel region-based representation to describe images in a uniform feature space with real-valued fuzzy features. Our system is suitable for online relevance feedback learning in CBIR because it meets three requirements: learning with a small training set, handling the intrinsic asymmetry of the training samples, and responding quickly. Extensive experiments, including comparisons with many state-of-the-art methods, show the effectiveness of our algorithm in improving retrieval performance and saving processing time.
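The feature selection criterion, scoring each feature by how well it separates the positive from the negative relevance-feedback images, can be illustrated with a simple per-feature separability score; the boosting-style combination of classifiers over the selected features is omitted. The Fisher-style score below is a generic illustration, not the paper's psychological-similarity criterion.

```python
import numpy as np

def feature_separability(pos, neg):
    """pos, neg: (n_pos, d) and (n_neg, d) feature matrices from relevance feedback.
    Returns one separability score per feature (higher = more useful for this query).
    Fisher-style score used as an illustrative stand-in for the paper's criterion."""
    mu_p, mu_n = pos.mean(axis=0), neg.mean(axis=0)
    var_p, var_n = pos.var(axis=0), neg.var(axis=0)
    return (mu_p - mu_n) ** 2 / (var_p + var_n + 1e-9)

def select_features(pos, neg, n_select=10):
    """Pick the indices of the most discriminative features for the current query concept."""
    return np.argsort(feature_separability(pos, neg))[::-1][:n_select]

# Toy usage: 100-D fuzzy region features; the first 5 dimensions carry the concept.
rng = np.random.default_rng(0)
pos = rng.random((8, 100))
pos[:, :5] += 1.0
neg = rng.random((20, 100))
print(select_features(pos, neg, n_select=5))
```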

19.
This paper presents a new framework for capturing the intrinsic visual search behavior of different observers in image understanding by analysing saccadic eye movements in feature space. The method is based on information theory and identifies the salient image features on which visual search is performed. We demonstrate how to obtain feature-space fixation density functions that are normalized to the image content along the scan paths. This allows a reliable identification of salient image features that can be mapped back to the spatial domain for highlighting regions of interest and attention selection. A two-color conjunction search experiment illustrates the theoretical framework of the proposed method, including feature selection, hot spot detection, and back-projection. The practical value of the method is demonstrated on a computed tomography image of centrilobular emphysema, and we discuss how the proposed framework can be used as a basis for decision support in medical image understanding.

20.
Image quality assessment (IQA) is of great importance to numerous image processing applications, and various methods have been proposed for it. In this paper, a Multi-Level Similarity (MLSIM) index for full-reference IQA is proposed. The proposed metric is based on the fact that the human visual system (HVS) distinguishes the quality of an image mainly according to the details conveyed by low-level gradient information. In the proposed metric, the Prewitt operator is first used to obtain gradient information for both the reference and distorted images; the gradient information of the reference image is then segmented into three levels (3LSIM) or two levels (2LSIM), and the gradient information of the distorted image is segmented by the corresponding regions of the reference image, yielding multi-level information for the two images. The Riesz transform is used to extract features for the different levels, and the corresponding 1st-order and 2nd-order coefficients are combined by regional mutual information (RMI) and weighted to obtain a single quality score. Experimental results demonstrate that the proposed metric is highly consistent with human subjective evaluations and achieves good performance.
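The first two stages of the metric, Prewitt gradient extraction and segmentation of the reference gradient map into levels (with the distorted image segmented by the same regions), are straightforward to sketch in numpy/scipy; the quantile-based level split is an illustrative assumption, and the Riesz-transform features and RMI pooling are not reproduced here.

```python
import numpy as np
from scipy.ndimage import convolve

def prewitt_gradient(img):
    """Gradient magnitude of a grayscale image using the Prewitt operator."""
    kx = np.array([[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]], dtype=float)
    gx = convolve(img.astype(float), kx)
    gy = convolve(img.astype(float), kx.T)
    return np.hypot(gx, gy)

def segment_levels(ref_grad, n_levels=3):
    """Split the reference gradient map into `n_levels` regions by quantiles
    (illustrative split). Returns an integer label map; the distorted image
    reuses these same regions."""
    qs = np.quantile(ref_grad, np.linspace(0, 1, n_levels + 1)[1:-1])
    return np.digitize(ref_grad, qs)

# Toy usage: the same level map indexes both reference and distorted gradients.
rng = np.random.default_rng(0)
ref = rng.random((64, 64))
dist = np.clip(ref + 0.05 * rng.standard_normal((64, 64)), 0, 1)
g_ref, g_dist = prewitt_gradient(ref), prewitt_gradient(dist)
levels = segment_levels(g_ref)
for lv in np.unique(levels):
    mask = levels == lv
    print(lv, float(np.abs(g_ref[mask] - g_dist[mask]).mean()))
```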
