Similar Documents
20 similar documents found (search time: 15 ms)
1.
Cui  Zheng  Hu  Yongli  Sun  Yanfeng  Gao  Junbin  Yin  Baocai 《Multimedia Tools and Applications》2022,81(17):23615-23632

Image-text retrieval has received much attention in modern artificial intelligence research. It remains challenging because image and text are heterogeneous cross-modal data. The key issue in image-text retrieval is how to learn a common feature space while preserving the semantic correspondence between image and text. Existing works cannot obtain fine cross-modal feature representations because the semantic relations between local features are not effectively utilized and noise is not suppressed. To address these issues, we propose a Cross-modal Alignment with Graph Reasoning (CAGR) model, which learns refined cross-modal features in the common feature space and then applies a fine-grained cross-modal alignment method. Specifically, we introduce a graph reasoning module that explores semantic connections between local elements in each modality and measures their importance with a self-attention mechanism. Through multi-step reasoning, the visual and textual semantic graphs are learned effectively and refined visual and textual features are obtained. Finally, to measure the similarity between image and text, a novel alignment approach named cross-modal attentional fine-grained alignment computes a similarity score between the two sets of features. Our model achieves competitive performance compared with state-of-the-art methods on the Flickr30K and MS-COCO datasets, and extensive experiments demonstrate its effectiveness.
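The attentional fine-grained alignment step scores an image-text pair from two sets of local features. The sketch below is a generic attention-weighted similarity in that spirit, not the paper's exact CAGR formulation; the temperature value and the averaging over words are assumptions.

```python
import numpy as np

def cosine_sim(a, b):
    # row-wise L2-normalize, then pairwise cosine similarity -> (n, m)
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def cross_modal_score(regions, words, temp=4.0):
    """Attend over image regions for each word, then compare each word
    with its attended visual context and average over words."""
    s = cosine_sim(words, regions)               # (n_words, n_regions)
    attn = np.exp(temp * s)
    attn /= attn.sum(axis=1, keepdims=True)      # softmax over regions
    attended = attn @ regions                    # (n_words, d)
    w = words / np.linalg.norm(words, axis=1, keepdims=True)
    v = attended / np.linalg.norm(attended, axis=1, keepdims=True)
    return float((w * v).sum(axis=1).mean())     # mean word-context cosine
```

With perfectly matched feature sets the score approaches 1; mismatched sets score lower, which is what a retrieval ranking needs.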


2.
3.
Mo Hongwei, Tian Peng 《控制与决策》2021,36(12):2881-2890
Visual scene understanding includes detecting and recognizing objects, reasoning about the visual relations between detected objects, and describing image regions with sentences. To achieve a more comprehensive and accurate understanding of scene images, object detection, visual relationship detection, and image captioning are treated as three visual tasks at different semantic levels of scene understanding, and an image understanding model based on multi-level semantic features is proposed in which the three semantic levels are interconnected to solve the scene understanding task jointly. The model iterates and updates the semantic features of objects, relation phrases, and image captions simultaneously through a message-passing graph; the updated semantic features are used to classify objects and visual relations and to generate scene graphs and captions, and a fused attention mechanism is introduced to improve captioning accuracy. Experimental results on the Visual Genome and COCO datasets show that the proposed method outperforms existing methods on scene-graph generation and image captioning.

4.
Tian  Peng  Mo  Hongwei  Jiang  Laihao 《Applied Intelligence》2021,51(11):7781-7793

Understanding a scene image involves detecting and recognizing objects, estimating the interaction relationships among the detected objects, and describing image regions with sentences. However, owing to the complexity and variety of scene images, existing methods take only object detection or visual relationship estimation as the research target in scene understanding, and the results are not satisfactory. In this work, we propose a Multi-level Semantic Tasks Generation Network (MSTG) that leverages the mutual connections across object detection, visual relationship detection, and image captioning to solve the three vision tasks jointly, improve their accuracy, and achieve a more comprehensive and accurate understanding of the scene image. The model uses a message-passing graph to connect and iteratively update the different semantic features, improving the accuracy of scene-graph generation, and introduces a fused attention mechanism to improve the accuracy of image captioning, while the mutual connection and refinement of the different semantic features also improve object detection and scene-graph generation. Experiments on the Visual Genome and COCO datasets indicate that the proposed method can jointly learn the three vision tasks and improve their accuracy.


5.
Semantic processing methods in content-based image retrieval    Cited by: 8 (self-citations: 4, others: 4)
The goal of a content-based image retrieval system is to minimize the "semantic gap" between the simple visual features of images and the rich semantics of user queries, so image semantic processing has become the key to further progress in content-based image retrieval. To give an overview of semantic processing methods in content-based image retrieval, this paper first surveys research on semantic image retrieval from two aspects: image semantic models and image semantic extraction methods. Image semantic models are summarized into three main components: image semantic knowledge, image semantic hierarchy models, and semantic extraction models. Image semantic extraction methods are then classified into categories such as user interaction, using the query as a semantic template, objects and their spatial relations, scene and behavior semantics, and affective semantics; representative methods in each category are analyzed in detail and their limitations are pointed out. Finally, the open problems in image semantic processing are discussed from three aspects, namely object modeling and recognition, semantic extraction rules, and user retrieval models, and some preliminary solutions are proposed.

6.
To address the interference of irrelevant background information and the difficulty of extracting discriminative subcategory features in fine-grained image classification, a fine-grained classification method combining foreground feature enhancement and region-mask self-attention is proposed. First, ResNet50 extracts global features of the input image. A foreground feature enhancement network then locates the foreground object in the input image, suppressing background interference while enhancing the foreground features so that the foreground object is effectively highlighted. Finally, a region-mask self-attention network learns rich, diverse features of the enhanced foreground object that distinguish it from other subcategories. Throughout training, a multi-branch loss function constrains feature learning. Experiments show that the model reaches accuracies of 88.0%, 95.3%, and 93.6% on the fine-grained datasets CUB-200-2011, Stanford Cars, and FGVC-Aircraft respectively, outperforming other mainstream methods.

7.
This paper presents a new visual aggregation model for representing visual information about moving objects in video data. Based on available automatic scene segmentation and object tracking algorithms, the proposed model provides eight operations to calculate object motions at various levels of semantic granularity. It represents the trajectory, color, and dimensions of a single moving object, as well as the directional and topological relations among multiple objects over a time interval. Each representation of a motion can be normalized to improve computational cost and storage utilization. To facilitate query processing, two optimal approximate matching algorithms are designed to match time-series visual features of moving objects. Experimental results indicate that the proposed algorithms substantially outperform conventional subsequence matching methods in measuring the similarity between two trajectories. Finally, the visual aggregation model is integrated into a relational database system, and a prototype content-based video retrieval system has been implemented as well.
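Matching a normalized query trajectory against subsequences of stored trajectories can be sketched as below. The centroid/scale normalization and the sliding-window mean point-wise distance are illustrative assumptions, not the paper's eight operations or its optimal matching algorithms.

```python
import numpy as np

def normalize_traj(points):
    """Translate a trajectory to its centroid and scale it to unit RMS
    radius, so that matching is invariant to position and size (the exact
    normalization used in the paper is an assumption here)."""
    p = np.asarray(points, dtype=float)
    p = p - p.mean(axis=0)
    scale = np.sqrt((p ** 2).sum(axis=1).mean())
    return p / scale if scale > 0 else p

def best_subsequence_dist(query, target):
    """Slide the query over the target trajectory and return the smallest
    mean point-wise distance -- a simple stand-in for approximate
    subsequence matching."""
    q = normalize_traj(query)
    t = np.asarray(target, dtype=float)
    best = float("inf")
    for i in range(len(t) - len(q) + 1):
        w = normalize_traj(t[i:i + len(q)])     # normalize each window too
        best = min(best, float(np.linalg.norm(q - w, axis=1).mean()))
    return best
```

A query that is an exact segment of a stored trajectory yields distance 0 regardless of where the segment sits in the frame.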

8.
To retrieve images that violate traffic regulations from massive road-traffic imagery, a semantic image retrieval method with automatic recognition of specific targets is proposed. First, domain experts build a traffic-domain ontology and descriptions of road-traffic rules. Then a convolutional neural network (CNN) extracts features from traffic images, combined with an improved support vector machine decision tree (SVM-DT) algorithm to classify those features, so that specific targets in the traffic images and the spatial relations between them are automatically recognized and mapped to ontology instances and the associations between their objects (rule instances). Finally, semantic retrieval results are obtained by reasoning over the ontology and rule instances. Experimental results show that, compared with keyword-based and ontology-based traffic image semantic retrieval, the proposed method achieves higher precision, recall, and retrieval efficiency.

9.
In order to improve the retrieval accuracy of content-based image retrieval systems, research focus has been shifted from designing sophisticated low-level feature extraction algorithms to reducing the ‘semantic gap’ between the visual features and the richness of human semantics. This paper attempts to provide a comprehensive survey of the recent technical achievements in high-level semantic-based image retrieval. Major recent publications are included in this survey covering different aspects of the research in this area, including low-level image feature extraction, similarity measurement, and deriving high-level semantic features. We identify five major categories of the state-of-the-art techniques in narrowing down the ‘semantic gap’: (1) using object ontology to define high-level concepts; (2) using machine learning methods to associate low-level features with query concepts; (3) using relevance feedback to learn users’ intention; (4) generating semantic template to support high-level image retrieval; (5) fusing the evidences from HTML text and the visual content of images for WWW image retrieval. In addition, some other related issues such as image test bed and retrieval performance evaluation are also discussed. Finally, based on existing technology and the demand from real-world applications, a few promising future research directions are suggested.  相似文献   

10.
Finding an object inside a target image by querying multimedia data is desirable but remains a challenge. The effectiveness of region-based representation for content-based image retrieval has been extensively studied in the literature. One common weakness of region-based approaches is that they perform detection using low-level visual features within the region, and homogeneous image regions have little correspondence to semantic objects; the retrieval results are therefore often far from satisfactory. In addition, performance is significantly affected by the consistency of the segmented regions of the target object between the query and database images. Instead of solving these problems independently, this paper proposes region-based object retrieval using the generalized Hough transform (GHT) and adaptive image segmentation. The proposed approach has two phases. First, a learning phase identifies and stores stable parameters for segmenting each database image. In the retrieval phase, the adaptive image segmentation process segments the query image into regions, and visual objects inside database images are retrieved through the GHT with a modified voting scheme that locates the target visual object under a certain affine transformation. The learned parameters make the segmentation results of query and database images more stable and consistent. Computer simulation results show that the proposed method performs well in terms of retrieval accuracy, robustness, and execution speed.
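A translation-only GHT illustrates the voting idea the retrieval phase builds on: model boundary points define displacement vectors to a reference point (the R-table), and each point in the target casts votes for candidate reference locations. This toy sketch omits orientation indexing, the affine handling, and the paper's modified voting scheme.

```python
from collections import defaultdict

def build_r_table(model_pts, ref):
    # displacement from each model boundary point to the reference point
    return [(ref[0] - p[0], ref[1] - p[1]) for p in model_pts]

def ght_vote(target_pts, r_table):
    """Each target point votes for every reference location consistent
    with some R-table entry; the accumulator cell with the most votes
    is the detected object position."""
    votes = defaultdict(int)
    for (x, y) in target_pts:
        for (dx, dy) in r_table:
            votes[(x + dx, y + dy)] += 1
    return max(votes, key=votes.get)
```

For a model shape translated rigidly inside the target, all true boundary points vote for the same cell, so the peak recovers the translation exactly.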

11.
In computer vision, semantic segmentation is a key task for scene parsing and behavior recognition, and image semantic segmentation methods based on deep convolutional neural networks have made breakthrough progress. The task of semantic segmentation is to assign a class label to every pixel in an image, i.e., pixel-level image understanding. Object detection only localizes the bounding boxes of targets, whereas semantic segmentation must segment the targets out of the image. This paper first analyzes and describes the difficulties and challenges in semantic segmentation and introduces the common datasets and objective metrics for evaluating segmentation algorithms. It then reviews the current mainstream deep-CNN-based image semantic segmentation methods at home and abroad; according to whether network training requires pixel-level annotated images, existing methods are divided into fully supervised and weakly supervised semantic segmentation, and the respective advantages and drawbacks of the two classes are analyzed in detail. Several supervised and weakly supervised segmentation models are compared on the PASCAL VOC (pattern analysis, statistical modelling and computational learning visual object classes) 2012 dataset, and the best methods in each class are given along with the corresponding MIoU (mean intersection-over-union). Finally, promising future directions in image semantic segmentation are pointed out.
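The MIoU metric used for the comparison above is the standard one: build a confusion matrix over all pixels, compute intersection-over-union per class, then average. A minimal sketch:

```python
import numpy as np

def mean_iou(pred, gt, num_classes):
    """Mean intersection-over-union: per-class IoU from a confusion
    matrix, averaged over classes that appear in prediction or ground
    truth."""
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    for p, g in zip(pred.ravel(), gt.ravel()):
        cm[g, p] += 1                      # rows: ground truth, cols: prediction
    inter = np.diag(cm).astype(float)      # correctly labeled pixels per class
    union = cm.sum(0) + cm.sum(1) - np.diag(cm)
    ious = inter[union > 0] / union[union > 0]
    return float(ious.mean())
```

For example, four pixels predicted `[0,0,1,1]` against ground truth `[0,1,1,1]` give class IoUs of 1/2 and 2/3, hence MIoU 7/12.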

12.
Salient object detection introduces the human visual attention mechanism so that a computer can detect the regions or objects of greatest interest in a visual scene. To address unclear detection edges, incomplete detected objects, and missed small objects in salient object detection, this paper proposes a fusion network based on progressively nested features. The network adopts a progressive compression module that passes deeper features downward for fusion, reducing the number of model parameters while fully exploiting high-level semantic information. A weighted feature fusion module first aggregates the encoder's multi-scale features into feature maps that can access both high-level and low-level information; the aggregated features are then distributed to the other layers to fully capture image context and attend to small objects in the image. An asymmetric convolution module is also introduced to further improve detection accuracy. Experiments on six public datasets show that the network achieves superior detection results.

13.
The ‘semantic gap’ is the main challenge in content-based image retrieval. To overcome this challenge, we propose a semantic-based model that retrieves images efficiently while taking user-interested concepts into account. In this model, an interactive image segmentation algorithm is applied to the query image to extract the user-interested regions, and a neural network classifier is used to recognize image objects from the regions. To keep the system general-purpose, no a priori assumptions should be made about the nature of the images when extracting features, so a large number of features must be extracted covering all aspects of the image. The resulting high-dimensional feature space not only increases complexity and memory requirements but may also reduce efficiency and accuracy. Hence, the ant colony optimization algorithm is employed to eliminate irrelevant and redundant features. To find the images most similar to the query image, similarity between images is measured based on their semantic objects, which are defined according to a predefined ontology.

14.
Object-based image annotation has long been an important problem in image processing and computer vision, and the multi-scale, deformable nature of image objects makes annotation difficult. Object segmentation and object recognition are the two key problems in object image annotation. This paper proposes an object image annotation method based on formal concept analysis (FCA) and semantic association rules. To deal with the heavy overlap among the image patches generated by object-proposal algorithms, the idea of concept lattices in FCA is borrowed: patches are grouped into image clusters by their commonality to mine category patterns, and object-noise patches and background-noise patches are removed by a category probability distribution test and a flatness test respectively, finally yielding object semantic clusters. For semantic object discrimination, the features of valid image clusters are first fused into a common description and passed through a classifier for category decision, producing the initial object annotation; semantic association rules are then mined from the image annotation vocabulary to supplement the annotation semantically, avoiding the loss of smaller semantic objects when mining category patterns. Experiments show that the proposed annotation algorithm ensures both the accuracy and the completeness of semantic annotation and achieves good annotation performance.

15.
16.
Emotional semantic classification of images based on feature fusion    Cited by: 1 (self-citations: 0, others: 1)
Image classification methods based on color or color-space information do not consider the shape features of the objects contained in an image, so their classification results are unsatisfactory. Taking clothing images as the data source, this paper proposes and designs a two-dimensional color/edge-orientation histogram that fuses the color and shape features of an image for classification. Low-level visual features are closely correlated with high-level emotional concepts; the correlation between the fused color-and-shape features of clothing images and emotions is analyzed, and a probabilistic neural network is used as the classification algorithm to perform emotional semantic classification. Experimental results show that the classification accuracy is clearly improved.

17.
Objective: A scene graph describes an image concisely and in a structured way. Existing scene-graph generation methods focus on visual features and ignore the rich semantic information in the dataset. Affected by the long-tailed distribution of the data, most methods cannot reason well about low-frequency triplets and instead tend to produce high-frequency ones. Moreover, most existing methods use the same network structure to infer object and relation categories, lacking task-specific design. To solve these problems, a scene-graph generation algorithm that extracts global semantic information is proposed. Method: The network consists of four modules: semantic encoding, feature encoding, object inference, and relation reasoning. The semantic encoding module extracts semantic information from image region descriptions and computes global statistical knowledge, fusing them into robust global semantic information that assists the inference of uncommon triplets. The feature encoding module extracts visual features of the image. The object inference and relation reasoning modules use different feature-fusion methods, learning features with a gated graph neural network and a gated recurrent unit respectively; on this basis, object and relation categories are inferred with the aid of the global statistical knowledge. Finally, a parser constructs the scene graph, describing the image in a structured way. Result: Compared with ten other methods on the public Visual Genome dataset across three tasks (relation classification, scene-graph element classification, and scene-graph generation), the mean recall reaches 44.2% and 55.3% under the settings where each object pair is or is not restricted to a single relation. In visualization experiments, compared with the second-best method, the proposed method strengthens the inference of uncommon relation categories while also improving the inference of object categories and common relations. Conclusion: The algorithm improves the inference of uncommon triplets while retaining good inference of common ones, and generates scene graphs effectively.

18.
Visual question answering (VQA) is a task combining computer vision and natural language processing; it requires understanding the scene in an image, and especially the interactions between different objects. Research on VQA has advanced greatly in recent years, but traditional methods use holistic feature representations that largely ignore the structure of the given image and cannot effectively locate targets in the scene. Graph networks rely on high-level image representations and can capture semantic and spatial relations, but previous graph-network VQA methods have ignored the role that the association between relations and the question plays in answering. Accordingly, an equal-attention graph network VQA model, EAGN, is proposed: an equal-attention mechanism gives relation edges the same importance as object nodes, and combining the two provides a more sufficient basis for answering questions. Experiments show that the EAGN model performs well and is more competitive than other related methods, providing a foundation for subsequent research.

19.

In recent years, image scene classification based on low- and high-level features has been considered one of the most important and challenging problems in image processing research. High-level features based on semantic concepts provide a more accurate model, closer to human perception of image scene content. This paper presents a new multi-stage approach for image scene classification based on high-level semantic features extracted from image content. In the first stage, the object boundaries and the labels that represent the content are extracted; for this purpose, a combination of a fully convolutional deep network and a combined two-class SVM-fuzzy and SVR network is used. Topic modeling is used to represent the latent relationships between the objects, so in the second stage a new combination of the bag of visual words and a supervised document neural autoregressive distribution estimator is used to extract the latent topics in the image. Finally, classification is performed with a Bayesian method according to the features extracted by the deep network, the object labels, and the latent topics in the image. The proposed method has been evaluated on three datasets: Scene15, UIUC Sports, and MIT-67 Indoor. The experimental results show that the proposed approach achieves average improvements of 12%, 11%, and 14% in object-detection accuracy, and of 0.5%, 0.6%, and 1.8% in the mean average precision of image scene classification, compared with previous state-of-the-art methods on these three datasets.


20.
Objective: Few-shot learning aims to learn entirely new classes from one or a few images. Many current few-shot methods are based on global image representations and handle ordinary few-shot classification well. However, fine-grained image classification depends on local image features, which global representations cannot capture effectively, so many few-shot methods handle fine-grained few-shot classification poorly. A fine-grained few-shot learning method incorporating weakly supervised object localization is therefore proposed. Method: When data are limited, object localization is an effective approach because it directly provides the most discriminative regions. Inspired by this, a self-attention-based complementary localization module is proposed to achieve weakly supervised object localization and generate a selection mask for filtering feature descriptors. Based on the selected descriptors, a semantic alignment distance is designed to measure the correlation of the most discriminative image regions, completing fine-grained few-shot classification. Result: On the miniImageNet dataset, the method exceeds the second-best method by 0.56% and 5.02% in 1-shot and 5-shot accuracy. On the fine-grained Stanford Dogs and Stanford Cars datasets, it improves 1-shot and 5-shot accuracy over the second-best method by 4.18% and 7.49%, and by 16.13% and 5.17%, respectively. On the CUB-200-2011 (Caltech-UCSD Birds) dataset, it improves 5-shot accuracy by 1.82% over the second-best method. Generalization experiments also show that the method handles both ordinary and fine-grained few-shot learning well, and visualization results show that the proposed weakly supervised localization module localizes objects more completely. Conclusion: The fine-grained few-shot learning method incorporating weakly supervised object localization significantly improves fine-grained few-shot classification while also handling ordinary few-shot image classification.
