Similar Literature
 Found 20 similar articles; search took 15 milliseconds
1.
Objective: Scene graphs describe images concisely and in a structured way. Existing scene graph generation methods focus on the visual features of images and neglect the rich semantic information in datasets. Meanwhile, affected by the long-tailed distribution of datasets, most methods cannot reason well about low-frequency triplets and instead tend to output high-frequency ones. In addition, most existing methods use the same network structure to infer object and relationship categories, lacking task-specific design. To address these problems, this paper proposes a scene graph generation algorithm that extracts global semantic information. Method: The network consists of four modules: semantic encoding, feature encoding, object inference, and relationship reasoning. The semantic encoding module extracts semantic information from image region descriptions and computes global statistical knowledge, fusing them into robust global semantic information that assists reasoning about uncommon triplets. The feature encoding module extracts the visual features of the image. The object inference and relationship reasoning modules adopt different feature fusion methods and perform feature learning with a gated graph neural network and a gated recurrent unit, respectively. On this basis, object and relationship categories are inferred with the aid of the global statistical knowledge. Finally, a parser constructs the scene graph, yielding a structured description of the image. Results: Compared with 10 other methods on the public Visual Genome dataset across three tasks (relationship classification, scene graph element classification, and scene graph generation), the mean recall reaches 44.2% and 55.3% under the constrained setting (one relationship per object pair) and the unconstrained setting, respectively. In visualization experiments, compared with the second-best method, our method strengthens reasoning about uncommon relationship categories while also improving the inference of object categories and common relationships. Conclusion: The proposed algorithm improves reasoning about uncommon triplets while retaining good performance on common ones, and generates scene graphs effectively.
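The gated graph neural network mentioned above updates each node's state with GRU-style gates after aggregating messages from its neighbors. Below is a minimal numpy sketch of one such propagation step; the function name, dimensions, and toy adjacency are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def ggnn_step(h, A, Wz, Uz, Wr, Ur, Wh, Uh):
    """One gated-graph-network propagation step: each node aggregates
    neighbor states, then updates its state through GRU gates."""
    m = A @ h                                  # message aggregation from neighbors
    z = sigmoid(m @ Wz + h @ Uz)               # update gate
    r = sigmoid(m @ Wr + h @ Ur)               # reset gate
    h_tilde = np.tanh(m @ Wh + (r * h) @ Uh)   # candidate state
    return (1 - z) * h + z * h_tilde           # gated interpolation

rng = np.random.default_rng(0)
n, d = 4, 8                                    # 4 nodes, 8-dim states (toy sizes)
h = rng.standard_normal((n, d))
A = np.ones((n, n)) / n                        # toy normalized adjacency
params = [rng.standard_normal((d, d)) * 0.1 for _ in range(6)]
h_next = ggnn_step(h, A, *params)
print(h_next.shape)                            # (4, 8)
```

Stacking several such steps lets object and relationship nodes exchange context before classification.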

2.
姚伟凡, 马力. 《计算机应用研究》 (Application Research of Computers), 2021, 38(7): 2091-2095, 2102
Knowledge graph completion aims to predict the missing part of a triplet so that the knowledge graph becomes more complete. Link prediction methods based on neural networks and similar models ignore the association information between entities, so they fail to cover the hidden information inherent in the local neighborhood around a triplet. To address this, a method combining a graph attention mechanism with predicate awareness is proposed. First, a relation embedding matrix is defined via the graph attention mechanism to describe the relationships among entities within the neighborhood of any given entity. Second, predicates are introduced to deepen semantic understanding between entities, and an attention-value formula based on predicate embedding vectors is constructed to effectively measure the strength of semantic connections between entities. In addition, the edge relationships among an entity's neighbors are used to predict direct relationships between multi-hop entities, thereby completing the knowledge graph. Experimental results on the WN18RR, Kinship, and FB15K datasets show that the method effectively improves triplet prediction accuracy.
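The graph attention mechanism referred to above scores each neighbor of an entity and normalizes the scores with a softmax. A minimal GAT-style sketch in numpy follows; the center-node choice, weight shapes, and LeakyReLU slope are illustrative assumptions rather than the paper's exact formulation (which additionally incorporates predicate embeddings).

```python
import numpy as np

def gat_attention(h, W, a, center, neighbors):
    """Attention weights of one node over its neighbors (GAT-style):
    score = LeakyReLU(a^T [W h_i || W h_j]), normalized by softmax."""
    Wh = h @ W
    scores = []
    for j in neighbors:
        s = np.concatenate([Wh[center], Wh[j]]) @ a
        scores.append(s if s > 0 else 0.2 * s)   # LeakyReLU, slope 0.2
    scores = np.array(scores, dtype=float)
    e = np.exp(scores - scores.max())            # numerically stable softmax
    return e / e.sum()

rng = np.random.default_rng(1)
h = rng.standard_normal((5, 4))                  # 5 entities, 4-dim features (toy)
W = rng.standard_normal((4, 4))
a = rng.standard_normal(8)
alpha = gat_attention(h, W, a, center=0, neighbors=[1, 2, 3])
print(alpha)                                     # three non-negative weights summing to 1
```

The resulting weights determine how strongly each neighboring entity contributes to the center entity's updated embedding.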

3.
We present a new approach aimed at understanding the structure of connections in edge‐bundling layouts. We combine the advantages of edge bundles with a bundle‐centric simplified visual representation of a graph's structure. For this, we first compute a hierarchical edge clustering of a given graph layout which groups similar edges together. Next, we render clusters at a user‐selected level of detail using a new image‐based technique that combines distance‐based splatting and shape skeletonization. The overall result displays a given graph as a small set of overlapping shaded edge bundles. Luminance, saturation, hue, and shading encode edge density, edge types, and edge similarity. Finally, we add brushing and a new type of semantic lens to help navigation where local structures overlap. We illustrate the proposed method on several real‐world graph datasets.

4.
Objective: Figure question answering is an important research task in multimodal learning for computer vision. The simple pairwise matching of the traditional relation network (RN) model covers the relationships among all pixels and therefore achieves good results, but it includes redundant information, and the quadratically growing number of relation-pair features imposes a heavy computational and parameter burden on the subsequent reasoning network. To address this problem, a guided-weight-driven relocated relation network based on fused semantic feature extraction is proposed. Method: First, richer chart semantic information is extracted by fusing the low-level and high-level image features of the scene task, and an attention-based text encoder is proposed to achieve semantics-fused feature extraction; the guided weights are then sorted to further reconstruct the image positions, yielding the relocated relation network model. Results: Experiments on two datasets show that on the FigureQA (an annotated figure dataset for visual reasoning) dataset, overall accuracy improves by 26.4%, 8.1%, and 0.46% over IMG+QUES (image+questions), RN, and ARN (appearance and relation networks), respectively; on a single validation set, compared with LEA...

5.
莫宏伟, 田朋. 《控制与决策》 (Control and Decision), 2021, 36(12): 2881-2890
Visual scene understanding includes detecting and recognizing objects, reasoning about the visual relationships between detected objects, and describing image regions with sentences. To achieve a more comprehensive and accurate understanding of scene images, object detection, visual relationship detection, and image captioning are treated as three visual tasks at different semantic levels of scene understanding, and an image understanding model based on multi-level semantic features is proposed in which the three semantic levels are interconnected to solve the scene understanding task jointly. The model iterates and updates the semantic features of objects, relationship phrases, and image captions simultaneously through a message-passing graph; the updated semantic features are used to classify objects and visual relationships and to generate scene graphs and captions, and a fused attention mechanism is introduced to improve captioning accuracy. Experimental results on the Visual Genome and COCO datasets show that the proposed method outperforms existing methods on scene graph generation and image captioning.

6.
Distributed scene graphs are important in virtual reality, both in collaborative virtual environments and in cluster rendering. Modern scalable visualization systems have high local throughput, but collaborative virtual environments (VEs) over a wide‐area network (WAN) share data at much lower rates. This complicates the use of one scene graph across the whole application. Myriad is an extension of the Syzygy VR toolkit in which individual scene graphs form a peer‐to‐peer network. Myriad connections filter scene graph updates and create flexible relationships between nodes of the scene graph. Myriad's sharing is fine‐grained: the properties of individual scene graph nodes to share are dynamically specified (in C++ or Python). Myriad permits transient inconsistency, relaxing resource requirements in collaborative VEs. A test application, WorldWideCrowd, demonstrates collaborative prototyping of a 300‐avatar crowd animation viewed on two PC‐cluster displays and edited on low‐powered laptops, desktops, and over a WAN. We have further used our framework to facilitate collaborative educational experiences and as a vehicle for undergraduates to experiment with shared virtual worlds. Copyright © 2006 John Wiley & Sons, Ltd.

7.
Objective: Current text-to-image generation models perform well only on image datasets with a single object; when an image involves multiple objects and relationships, the generated image becomes cluttered. An existing solution converts the text description into a scene graph structure that better represents the relationships in the scene and then generates the image from the scene graph, but the images produced by existing scene-graph-to-image models are not sharp enough and lack object detail. This paper therefore proposes a scene-graph-to-image generation model based on a graph attention network to generate higher-quality images. Method: The model consists of a graph attention network that extracts scene graph features, an object layout network that composes the scene layout, a cascaded refinement network that converts the scene layout into the generated image, and a discriminator network that improves the quality of the generated image. The graph attention network passes output object feature vectors with stronger expressive power to an improved object layout network, composing scene layouts closer to the ground-truth labels. Meanwhile, the image loss is computed by feature matching, making the final generated image semantically more similar to the real image. Results: Trained on the COCO-Stuff dataset of multi-object images to generate 64 × 64-pixel images, the model can generate complex scene images containing multiple objects and relationships, with an Inception Score of about 7.8, an improvement of 0.5 over the original scene-graph-to-image model. Conclusion: The proposed graph-attention-based scene-graph-to-image model not only generates complex scene images containing multiple objects and relationships, but also produces images of higher quality with clearer detail.

8.
Deep learning, a branch of artificial intelligence, has developed rapidly, with research data consisting mainly of speech, images, and video — regularly structured data usually represented in Euclidean space. Many learning tasks, however, must handle data generated in non-Euclidean spaces, whose features and relational structure can be defined by graphs. Graph convolutional networks (GCNs), which apply the convolution theorem to graphs to propagate and aggregate information between nodes, have become an effective way to model graph data. Despite their great success, for the node classification problem in graph tasks, most existing models have only shallow two- or three-layer architectures because of the particular difficulty of optimizing deep graph structures: the over-smoothing phenomenon. In theory, deeper GCN structures can capture more node-representation information, motivating the study of their layer-wise information; the core of transferring hierarchical-structure algorithms to graph data analysis lies in constructing layer-wise graph convolution operators and fusing information across layers. This paper surveys algorithms for mining hierarchical information in graph networks, introduces the background and open problems of graph neural networks and the development of hierarchical GCN algorithms, and divides existing algorithms into regularization methods and architecture-adjustment methods according to how they process layer-wise graph convolution information: regularization methods reconstruct the graph convolution operator to aggregate neighborhood information better, while architecture-adjustment methods fuse hierarchical information to enrich node representations. Experiments on the hierarchical properties of GCNs show that graph structures contain hierarchically characteristic nodes whose graph information existing hierarchical-information-mining algorithms have not yet fully explored. Finally, the main application areas of hierarchical-information-mining models for GCNs are summarized, and further research directions are proposed regarding computational efficiency, large-scale data, dynamic graphs, and application scenarios.
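The over-smoothing phenomenon described above can be illustrated with a linear GCN: repeatedly applying the symmetric-normalized propagation matrix drives all node features toward the same direction. The sketch below uses a toy 4-node graph and omits learned weights and nonlinearities; it is a didactic illustration, not a model from the survey.

```python
import numpy as np

# Symmetric-normalized propagation A_hat = D^{-1/2}(A + I)D^{-1/2},
# applied repeatedly (a linear GCN without weights or nonlinearity).
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
A_self = A + np.eye(4)                         # add self-loops
deg = A_self.sum(axis=1)
A_norm = A_self / np.sqrt(np.outer(deg, deg))  # D^{-1/2} A_self D^{-1/2}

X = np.random.default_rng(2).standard_normal((4, 3))

def spread(Z):
    """Max per-dimension range of direction-normalized node features."""
    return np.ptp(Z / np.linalg.norm(Z, axis=1, keepdims=True), axis=0).max()

X_deep = X.copy()
for _ in range(50):                            # 50 "layers" of propagation
    X_deep = A_norm @ X_deep
print(spread(X), spread(X_deep))               # node directions collapse together
```

After many propagation steps the normalized node features become nearly indistinguishable, which is why deep GCNs need the regularization or architecture-adjustment remedies the survey categorizes.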

9.
A knowledge graph describes real-world relationships and head/tail entities in the form of RDF triplets, i.e., (head entity, relation, tail entity) or (subject, predicate, object). To complete missing fact triplets in a knowledge graph, quaternions are incorporated into a capsule neural network model to predict the missing knowledge, forming a new knowledge graph completion model. Hypercomplex embeddings replace traditional real-valued embeddings to encode triplet structure information and capture the global characteristics of triplets as fully as possible; the quaternion embeddings of entities and relations serve as the input to the capsule network, and the combination of quaternions with the optimized capsule network model effectively completes missing triplets and improves prediction accuracy. Link prediction experiments show that compared with the CapsE model, the completion model improves Hit@10 and the mean reciprocal rank of correct entities by 3.2 percentage points and 5.5% on the WN18RR dataset, and by 2.5 percentage points and 4.4% on the FB15K-237 dataset, effectively predicting missing fact triplets.
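Quaternion embeddings interact through the Hamilton product, the non-commutative multiplication that gives hypercomplex models their extra expressiveness. A minimal numpy sketch of the product follows; it shows the algebra only, not the paper's scoring function or capsule architecture.

```python
import numpy as np

def hamilton(p, q):
    """Hamilton product of quaternions p, q = (a, b, c, d) ~ a + bi + cj + dk."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return np.array([
        a1*a2 - b1*b2 - c1*c2 - d1*d2,   # real part
        a1*b2 + b1*a2 + c1*d2 - d1*c2,   # i component
        a1*c2 - b1*d2 + c1*a2 + d1*b2,   # j component
        a1*d2 + b1*c2 - c1*b2 + d1*a2,   # k component
    ])

i = np.array([0., 1., 0., 0.])
j = np.array([0., 0., 1., 0.])
print(hamilton(i, j))   # i * j = k  ->  [0. 0. 0. 1.]
```

Because `hamilton(i, j) != hamilton(j, i)`, relation-entity interactions are direction-sensitive, which real-valued dot products cannot express.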

10.
Objective: Existing research on visual question answering (VQA) models mainly starts from attention mechanisms and multimodal fusion; it fails to explicitly model the semantic connections between objects in the image scene and rarely emphasizes the spatial position relationships of objects, leading to weak spatial relational reasoning. For VQA questions that require spatial reasoning, this paper structures the image via the spatial relation attributes between visual objects and proposes a question-guided spatial relation graph reasoning VQA model. Method: Saliency attention with Faster R-CNN (region-based convolutional neural network) extracts the salient visual objects and visual features of the image; the visual objects and their spatial relationships are structured into a spatial relation graph; question-guided focused attention then performs question-based spatial relational reasoning. The focused attention divides into node attention and edge attention, used to discover question-relevant visual objects and spatial relationships, respectively; the node and edge attention weights construct a gated graph reasoning network whose message-passing mechanism and controlled aggregation of feature information yield deep node-interaction information and spatially aware visual feature representations, achieving question-based spatial relational reasoning; finally, the spatially aware image features and question features are fused multimodally to predict the correct answer. Results: The model is trained, validated, and tested on the VQA (visual question answering) v2 dataset. Experimental results show clear accuracy improvements over the Prior, Language only, MCB (multimodal compact bilinear), ReasonNet, and Bottom-Up models. Compared with ReasonNet, overall accuracy improves by 2.73%, yes/no questions by 4.41%, counting questions by 5.37%, and other questions by 0.65%. Ablation experiments verify the effectiveness of the method. Conclusion: The proposed question-guided spatial relation graph reasoning VQA model matches question text with image target regions and object relationships well, and shows strong reasoning ability especially for questions requiring spatial relational reasoning.
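The final step above fuses spatially aware image features with question features to score candidate answers. The numpy sketch below shows one simple fusion scheme (projection, elementwise product, softmax over answers); the projection shapes and fusion operator are illustrative assumptions, not the paper's exact fusion method.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def answer_scores(img_feat, q_feat, W_fuse, W_out):
    """Toy multimodal fusion: project both modalities into a shared space,
    combine by elementwise product, then score candidate answers."""
    fused = (img_feat @ W_fuse) * (q_feat @ W_fuse)
    return softmax(fused @ W_out)

rng = np.random.default_rng(4)
img = rng.standard_normal(12)                  # spatially aware image feature (toy)
q = rng.standard_normal(12)                    # question feature (toy)
W_fuse = rng.standard_normal((12, 8)) * 0.1
W_out = rng.standard_normal((8, 5)) * 0.1      # 5 candidate answers
p = answer_scores(img, q, W_fuse, W_out)
print(p)                                       # probability over the 5 answers
```

The predicted answer is simply `p.argmax()`; richer fusions (e.g., bilinear pooling, as in MCB) replace the elementwise product.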

11.
Tian Peng, Mo Hongwei, Jiang Laihao. Applied Intelligence, 2021, 51(11): 7781-7793

Understanding a scene image includes detecting and recognizing objects, estimating the interaction relationships of the detected objects, and describing image regions with sentences. However, owing to the complexity and variety of scene images, existing methods take object detection or visual relationship estimation alone as the research target in scene understanding, and the obtained results are not satisfactory. In this work, we propose a Multi-level Semantic Tasks Generation Network (MSTG) to leverage the mutual connections across object detection, visual relationship detection, and image captioning, to solve the three vision tasks jointly, improve their accuracy, and achieve a more comprehensive and accurate understanding of the scene image. The model uses a message-passing graph to mutually connect and iteratively update the different semantic features to improve the accuracy of scene graph generation, and introduces a fused attention mechanism to improve the accuracy of image captioning, while the mutual connection and refinement of the different semantic features improve the accuracy of object detection and scene graph generation. Experiments on the Visual Genome and COCO datasets indicate that the proposed method can jointly learn the three vision tasks and improve their accuracy.


12.
Automatic indoor furniture placement is of clear significance for interior design, dynamic scene generation, and other applications. Traditional algorithms understand the internal structure of a scene through explicit spatial, semantic, and functional relationships between objects, which in turn assist indoor scene generation. With the emergence of large-scale indoor scene datasets, this work proposes encoding the scattered input furniture into a graph structure and using iterative message passing in a graph neural network to implicitly learn the scene's distribution prior. To allow diverse furniture placements, the graph neural network is integrated into a conditional variational autoencoder: an encoder embeds the input scene into a latent variable following a Gaussian distribution, and a generator uses scene priors sampled from the latent variable for conditional generation of new scenes. Experimental results on the Fu-floor dataset show that, compared with the baseline algorithm, the proposed algorithm performs better on the minimum matching distance metric of the generated results. The algorithm also has clear significance and value for future practical applications such as scene completion and scene-graph-based indoor furniture placement.
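The conditional variational autoencoder described above hinges on the reparameterization trick: the encoder outputs Gaussian parameters, and a latent scene code is sampled as mu + sigma * eps. Below is a minimal numpy sketch of that step; the linear "encoder", feature sizes, and variable names are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

def encode(scene, W_mu, W_logvar):
    """Toy linear encoder: map an encoded scene vector to Gaussian parameters."""
    return scene @ W_mu, scene @ W_logvar

def reparameterize(mu, logvar, rng):
    """Sample z ~ N(mu, exp(logvar)) differentiably: z = mu + sigma * eps."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

rng = np.random.default_rng(3)
scene = rng.standard_normal(16)            # hypothetical GNN-encoded furniture graph
W_mu = rng.standard_normal((16, 4)) * 0.1
W_logvar = rng.standard_normal((16, 4)) * 0.1
mu, logvar = encode(scene, W_mu, W_logvar)
z = reparameterize(mu, logvar, rng)
print(z.shape)                             # (4,)
```

At generation time, sampling different `z` values from the prior yields the diverse layouts the paper targets, conditioned on the input furniture set.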

13.
Mining social relationships between people from images plays an important role in criminal investigation, privacy protection, and other fields. Existing graph modeling methods learn person relationships by creating interpersonal relationship graphs or building knowledge graphs, with good results, but methods based on graph convolutional networks (GCN) to some extent ignore the differing importance of different features to specific relationships. To address this problem, a dual-branch social relation recognition model based on graph attention (GAT-DBSR) is proposed. The first branch extracts person regions and global image features as nodes; its core is updating these nodes through a graph attention network and a gating mechanism to learn feature representations of person relationships. The second branch extracts scene features through a convolutional neural network to strengthen relation recognition. The features of the two branches are finally fused and classified to obtain all social relationships. The model achieves 74.4% mAP on the fine-grained relation recognition task of the PISC dataset, 1.2% higher than the baseline model, and also improves relation recognition accuracy on the PIPA dataset. Experimental results show that the model is more effective.

14.
15.
VLSI technology has recently received increasing attention due to its high performance and high reliability. Designing a VLSI structure systematically for a given task becomes a very important problem to many computer engineers. In this paper, we present a method to transform a recursive computation task into a VLSI structure systematically. The main advantages of this approach are its simplicity and completeness. Several examples, such as vector inner product, matrix multiplication, convolution, comparison operations in relational database and fast Fourier transformation (FFT), are given to demonstrate the transformation procedure. Finally, we apply the proposed method to hierarchical scene matching. Scene matching refers to the process of locating or matching a region of an image with a corresponding region of another view of the same image taken from a different viewing angle or at a different time. We first present a constant threshold estimation for hierarchical scene matching. The VLSI implementation of the hierarchical scene matching is then described in detail.

16.
The need to find related images from big data streams is shared by many professionals, such as architects, engineers, designers, journalists, and ordinary people. Users need to quickly find the relevant images from data streams generated from a variety of domains. The challenges in image retrieval are widely recognized, and the research aiming to address them led to content‐based image retrieval becoming a "hot" area. In this paper, we propose a novel computationally efficient approach, which provides a high visual quality result based on the use of local recursive density estimation between a given query image of interest and data clouds/clusters which have a hierarchical, dynamically nested, evolving structure. The proposed approach makes use of a combination of multiple features. The results on a data set of 65,000 images organized in two layers of a hierarchy demonstrate its computational efficiency. Moreover, the proposed Look‐a‐like approach is self‐evolving and updating, adding new images by crawling and from the queries made.

17.
Recent neural style transfer frameworks have obtained astonishing visual quality and flexibility in Single‐style Transfer (SST), but little attention has been paid to Multi‐style Transfer (MST) which refers to simultaneously transferring multiple styles to the same image. Compared to SST, MST has the potential to create more diverse and visually pleasing stylization results. In this paper, we propose the first MST framework to automatically incorporate multiple styles into one result based on regional semantics. We first improve the existing SST backbone network by introducing a novel multi‐level feature fusion module and a patch attention module to achieve better semantic correspondences and preserve richer style details. For MST, we designed a conceptually simple yet effective region‐based style fusion module to insert into the backbone. It assigns corresponding styles to content regions based on semantic matching, and then seamlessly combines multiple styles together. Comprehensive evaluations demonstrate that our framework outperforms existing works of SST and MST.

18.

In recent years, image scene classification based on low/high-level features has been considered as one of the most important and challenging problems faced in image processing research. The high-level features based on semantic concepts present a more accurate and closer model to the human perception of the image scene content. This paper presents a new multi-stage approach for image scene classification based on high-level semantic features extracted from image content. In the first stage, the object boundaries and their labels that represent the content are extracted. For this purpose, a combined method of a fully convolutional deep network and a combined network of a two-class SVM-fuzzy and SVR are used. Topic modeling is used to represent the latent relationships between the objects. Hence in the second stage, a new combination of methods consisting of the bag of visual words, and supervised document neural autoregressive distribution estimator is used to extract the latent topics (topic modeling) in the image. Finally, classification based on Bayesian method is performed according to the extracted features of the deep network, objects labels and the latent topics in the image. The proposed method has been evaluated on three datasets: Scene15, UIUC Sports, and MIT-67 Indoor. The experimental results show that the proposed approach achieves average performance improvement of 12%, 11% and 14% in the accuracy of object detection, and 0.5%, 0.6% and 1.8% in the mean average precision criteria of the image scene classification, compared to the previous state-of-the-art methods on these three datasets.


19.
For noise image processing in real scenes, this paper proposes a denoising algorithm based on a convolutional neural network, built on a noise model closer to the real scene. The algorithm uses multiple convolution layers in the convolutional neural network to learn the data characteristics of noise images in the real scene, continuously optimizing its own parameters. The simulation results show that the convolutional-neural-network-based denoising algorithm has a good denoising effect on noise images in real scenes: the denoised image is clearer, the visual effect is better, and the edge details in the image are well preserved.

20.
Scene classification is a complicated task, because it covers much content whose distribution is difficult to capture. A novel hierarchical serial scene classification framework is presented in this paper. First, we use hierarchical features to represent both the global scene and local patches containing specific objects. Hierarchy is introduced via spatial pyramid matching, and our own codebook is built from two different types of words. Second, we train the visual words by generative and discriminative methods respectively, based on spatial pyramid matching, which obtains the local patch labels efficiently. Then, we use a neural network to simulate the human decision process, which leads to the final scene category from the local labels. Experiments show that the hierarchical serial scene image representation and classification model obtains superior results with respect to accuracy.


Copyright©北京勤云科技发展有限公司    京ICP备09084417号-23
