Similar literature
 20 similar documents found; search took 171 ms
1.
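张建明  李梅  李广翠 《计算机工程》 (Computer Engineering) 2011,37(15):212-214
Most current research on video semantic concept extraction does not consider the correlated co-occurrence among a video's multiple modalities, and samples are annotated with self-defined concepts, which lowers the accuracy of semantic concept extraction. To address these problems, the paper proposes extracting video semantic concepts by combining the Simfusion algorithm with sample annotation based on an ontology knowledge base. The method extracts key frames according to changes in shot content; when extracting shot content, it exploits the temporal correlated co-occurrence among the shot's modalities; and it uses concepts from the ontology knowledge base to annotate samples and train classifiers, compensating for the subjectivity and lack of standardization of traditional sample annotation. Experimental results show that the method achieves high accuracy in video semantic concept extraction and is easy to operate.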

2.
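Transductive multimodal video semantic concept detection based on tensor representation   Total citations: 4, self-citations: 0, citations by others: 4
吴飞  刘亚楠  庄越挺 《软件学报》 (Journal of Software) 2008,19(11):2853-2868
A framework for video semantic analysis and understanding based on higher-order tensor representation is proposed. In this framework, a video shot is first represented as a third-order tensor built from the multimodal data contained in the video, such as text, visual and audio data. Next, based on this third-order tensor representation and the temporal correlated co-occurrence of video, a subspace-embedding dimensionality reduction method called TensorShot is designed. Because transductive learning can learn and recognize specific unknown samples starting from known samples, a transductive support tensor machine algorithm based on TensorShot is finally proposed within this framework. It not only preserves the intrinsic structure of the manifold space in which the tensor shots lie, but can also map data outside the training set directly into the manifold subspace, while making full use of unlabeled samples to improve the learning performance of the classifier. Experimental results show that the method detects semantic concepts in video shots effectively.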

3.
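Highlight event detection has high academic research value and broad market prospects in sports video semantic analysis. Exploiting the strength of the hidden conditional random field (HCRF) model in representing and recognizing semantic events, the paper proposes a highlight detection method that fuses HCRF with an affective arousal model (AAM). First, through semantic analysis of the video structure of highlight events, 13 multimodal semantic cues are defined to accurately describe the semantic information that highlight events carry. Second, on the basis of concept-lattice-based clustering of the multimodal semantic cues, temporal feature information is added to build an affective arousal model with weighted feature values, yielding affective arousal values for each type of highlight event. Finally, with small-scale training samples, HCRF models for detecting each type of highlight event are built; based on the mapping among video semantic shot sequences, affective arousal value sequences and highlight events, the latent regularities of highlight events are mined along several dimensions, including multimodal semantic cues, video structure semantics and affective semantics, so that all types of highlight events are detected simultaneously under a single HCRF model. Experiments demonstrate the effectiveness of the method.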

4.
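To segment video scenes quickly and effectively, the paper proposes a multimodal video scene segmentation algorithm based on shot competitiveness. The algorithm takes full account of the temporal correlated co-occurrence among the modalities in a video, computes inter-shot similarity by extracting and fusing the video's physical features, and segments scenes using a shot-competitiveness decision criterion. Experimental results show that the algorithm segments video scenes fairly efficiently, with recall and precision reaching 82.1% and 86.7%, respectively.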

5.
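Multimodal entity alignment based on joint knowledge representation learning   Total citations: 1, self-citations: 0, citations by others: 1
王会勇  论兵  张晓明  孙晓领 《控制与决策》 (Control and Decision) 2020,35(12):2855-2864
Entity alignment methods based on knowledge representation learning embed multiple knowledge graphs into a low-dimensional semantic space and align entities by computing the similarity between entity vectors. Existing methods tend to focus on textual information and ignore image information, so entity features contained in images are not used effectively. To address this, a multimodal entity alignment method based on joint knowledge representation learning (ITMEA) is proposed. The method combines multimodal (image and text) data and uses a knowledge representation learning model that combines TransE and TransD, so that multimodal data can be embedded into a unified low-dimensional semantic space. Relations between already-aligned multimodal entities are learned iteratively in this space, thereby achieving entity alignment for multimodal data. Experimental results show that ITMEA achieves good multimodal entity alignment on the WN18-IMG dataset.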
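
The abstract above mentions TransE and alignment by similarity between entity vectors. As a rough illustration only, the sketch below shows the standard TransE translation score and a cosine-similarity lookup on toy vectors. It is not the ITMEA model: the TransD component, the image features and the iterative training are omitted, and the function names `transe_score` and `most_similar` are hypothetical.

```python
import math


def transe_score(head, relation, tail):
    """Standard TransE score: the L2 norm of (head + relation - tail).

    A smaller score means the triple (head, relation, tail) is more plausible.
    """
    return math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(query, candidates):
    """Return the name of the candidate embedding closest to `query`.

    `candidates` maps entity names to vectors assumed to live in the same
    embedding space as `query` (an assumption of this sketch).
    """
    return max(candidates, key=lambda name: cosine(query, candidates[name]))
```

For example, `most_similar([1.0, 0.0], {"a": [2.0, 0.1], "b": [0.0, 1.0]})` picks `"a"`, the candidate pointing in nearly the same direction as the query.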

6.
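To mine the rich semantic information in video, a semantic concept detection method based on concept lattice rules reduced with negative samples is proposed. The paper analyzes concept-lattice-based semantic analysis systems, takes into account the information carried by negative samples in the training data, and proposes a semantic rule extraction algorithm that uses negative samples for reduction, which is then applied to video semantic detection. Low-level features of video shots are first mapped to low-level semantic features, and the algorithm is then used to generate semantic classification rules for video semantic concept detection. Experimental results show that the method is effective and feasible...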

7.
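To mine the different modal information in video, a video captioning algorithm based on multimodal information is proposed. Building on the basic encoder-decoder network, it pays more attention to the video's multimodal information and high-level semantic attributes. In the encoder stage, static features, optical flow features and video clip features are extracted, and a semantic attribute detection network is designed to obtain high-level semantic features of the video. To avoid exposure bias in the decoder stage and the mismatch between the training loss and the evaluation metrics, a reinforcement-learning-based training algorithm is used that takes the objective evaluation metrics directly as the optimization target. The proposed algorithm achieves good experimental results on the public video captioning dataset MSVD.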

8.
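Deep learning methods have driven rapid progress in multimodal fake news detection. Existing detection models usually learn the cross-modal semantic association between news images and text from a global perspective and use the shared semantic content to obtain the key information for detection. However, local semantic differences within a news item may limit a model's ability to exploit cross-modal semantic associations, and the latent non-shared semantic content is an important cue that can effectively reveal the tampering intent and purpose behind fake news. To address these problems, this paper proposes a multimodal fake news detection model with dual-branch deep cue perception and adaptive collaborative optimization. The model first extracts fine-grained news features from salient image regions and semantic words in the text, and uses a cross-modal weighted residual network to learn shared semantic cues from them. Meanwhile, based on the semantic correlation between all image regions and text words, a dual-branch image-text cue perception module explicitly models the semantic association between shared and non-shared semantic content. The cue association optimization branch iteratively refines the association boundary between the two types of semantic content, so that the model accurately distinguishes non-shared semantic cues; the cue association analysis branch characterizes the credibility of the two types of semantic content and, on that basis, guides the model to fuse cues autonomously. With this adaptive collaborative optimization framework, the proposed model can perceive and fuse cues in depth in complex news contexts, achieving more accurate and more interpretable multimodal fake news detection. Experimental results on widely used real-world Chinese and English datasets show that the proposed model clearly outperforms the baseline methods in accuracy and...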

9.
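In educational settings, recommending educational resources is a key and fundamental task. Educational resources are markedly multi-source, heterogeneous and multimodal, which poses great challenges for understanding and using them. To address this, the paper proposes an exercise recommendation method based on multimodal semantic analysis. First, features are extracted from multimodal educational resources and semantic associations are established between data of different modalities, building a framework for understanding and representing multimodal educational resources. Next, tasks from the same domain are used to pre-train multimodal video and exercise features for associated-knowledge modeling. Finally, data collected online are used to fine-tune the video-exercise association features, yielding more robust feature representations for recommending exercises relevant to multimodal teaching videos. Experimental results on an education-domain dataset show that the proposed method effectively improves on existing methods and has good application value.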

10.
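Video content has very strong temporal correlation and logical structure, and shot semantics are the basic unit of video content understanding. From the perspective of how humans perceive and understand video content, shot semantics implicitly carry various kinds of contextual association in time, semantics and structure. Describing this contextual information properly is essential. To this end, a label tree with context labels is first adopted as the representation model for the hierarchical structure of shot semantic context: the serialized shot semantic sequence forms the bottom-level leaf nodes, and the context labels of internal nodes represent the contextual associations between shot semantics. The tree structure is consistent with the hierarchical representation of video content and can provide significant information gain for video content understanding. Then, to transform shot semantics from a sequence structure into the hierarchical structure of the label tree, a structured support vector machine approach is adopted: a semantic-context structured function and a loss function are constructed from the joint features of the shot semantic sequence and the video semantic context label tree, realizing structured analysis of shot semantics. Experimental results show that the video semantic context label tree has good representational capability in terms of temporal order, hierarchy, domain specificity and logic, and that the structured analysis method based on the structured support vector machine performs well in precision, recall and F1 score for shot semantic context analysis.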
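
Several abstracts in this list report recall, precision and F1 figures. For readers unfamiliar with how these relate, here is a minimal sketch using the standard definitions. The function name `precision_recall_f1` and the raw-count interface are assumptions for illustration and are not taken from any of the listed papers.

```python
def precision_recall_f1(true_positives, false_positives, false_negatives):
    """Compute precision, recall and F1 from raw counts.

    Each ratio falls back to 0.0 when its denominator is zero, which is a
    convention chosen for this sketch rather than a universal rule.
    """
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    total = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

With 8 correct detections, 2 false alarms and 2 misses, precision and recall are both 0.8, so F1 is also 0.8 (up to floating-point rounding).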

11.
Semantic filtering and retrieval of multimedia content is crucial for efficient use of multimedia data repositories. Video query by semantic keywords is one of the most difficult problems in multimedia data retrieval. The difficulty lies in the mapping between low-level video representation and high-level semantics. We therefore formulate the multimedia content access problem as a multimedia pattern recognition problem. We propose a probabilistic framework for semantic video indexing, which can support filtering and retrieval and facilitate efficient content-based access. To map low-level features to high-level semantics, we propose probabilistic multimedia objects (multijects). Examples of multijects in movies include explosion, mountain, beach, outdoor, music, etc. Semantic concepts in videos interact, and to model this interaction explicitly we propose a network of multijects (multinet). Using probabilistic models for six site multijects (rocks, sky, snow, water-body, forestry/greenery and outdoor) and a Bayesian belief network as the multinet, we demonstrate the application of this framework to semantic indexing. We demonstrate how detection performance can be significantly improved by using the multinet to take interconceptual relationships into account. We also show how the multinet can fuse heterogeneous features to support detection based on inference and reasoning.

12.
13.
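Video retrieval based on multimodal information mining and fusion   Total citations: 1, self-citations: 0, citations by others: 1
In content-based multimedia retrieval, and video retrieval in particular, the complex semantics of multimedia data make retrieval considerably harder. The approach focuses on mining as much information as possible from the video itself and fusing that information to improve video retrieval performance. Based on this idea, a multimodal video retrieval model is proposed together with corresponding algorithms for manual search and interactive search. The search strategy achieved good results in the TRECVID video retrieval evaluation.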

14.

Since its invention, the Web has evolved into the largest multimedia repository that has ever existed. This evolution is a direct result of the explosion of user-generated content, explained by the wide adoption of social network platforms. The vast amount of multimedia content requires effective management and retrieval techniques. Nevertheless, Web multimedia retrieval is a complex task because users commonly express their information needs in semantic terms, but expect multimedia content in return. This dissociation between semantics and content of multimedia is known as the semantic gap. To solve this, researchers are looking beyond content-based or text-based approaches, integrating novel data sources. New data sources can consist of any type of data extracted from the context of multimedia documents, defined as the data that is not part of the raw content of a multimedia file. The Web is an extraordinary source of context data, which can be found in explicit or implicit relation to multimedia objects, such as surrounding text, tags, hyperlinks, and even in relevance-feedback. Recent advances in Web multimedia retrieval have shown that context data has great potential to bridge the semantic gap. In this article, we present the first comprehensive survey of context-based approaches for multimedia information retrieval on the Web. We introduce a data-driven taxonomy, which we then use in our literature review of the most emblematic and important approaches that use context-based data. In addition, we identify important challenges and opportunities, which had not been previously addressed in this area.


15.
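To address the shortcomings of content-based retrieval techniques in high-level semantic retrieval, the paper proposes a method for multimedia annotation and retrieval guided by an XML schema: the structure of the multimedia data is first defined with a schema, the multimedia content is then annotated and retrieved in a schema-directed manner, and SMIL-based techniques are tried for hypermedia composition of the retrieval results. For generality, a general algorithm for generating the schema-directed tree is proposed, along with a method for converting retrieval conditions into XQuery statements. Finally, a concrete implementation of the method is described: SDMMRS, a schema-directed multimedia annotation and retrieval system.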

16.
Multimedia content has been growing quickly, and video retrieval is regarded as one of the most prominent issues in multimedia research. To retrieve a desired video, users express their needs as queries. Queries can be on object, motion, texture, color, audio, etc. Low-level representations of video differ from the higher-level concepts that a user associates with video. Therefore, querying by semantics is more realistic and tangible for end users. Comprehending the semantics of a query has opened new insight into video retrieval and into bridging the semantic gap. However, the video needs to be manually annotated in order to support queries expressed in terms of semantic concepts. Annotating the semantic concepts that appear in video shots is a challenging and time-consuming task. Moreover, it is not possible to provide annotations for every concept in the real world. In this study, an integrated semantic-based approach to similarity computation is proposed to enhance retrieval effectiveness in concept-based video retrieval. The proposed method integrates knowledge-based and corpus-based semantic word similarity measures in order to retrieve video shots for concepts whose annotations are not available to the system. The TRECVID 2005 dataset is used for evaluation, and the results of the proposed method are compared against the individual knowledge-based and corpus-based semantic word similarity measures used in previous studies in the same domain. The superiority of the integrated similarity method is shown and evaluated in terms of Mean Average Precision (MAP).
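
The abstract above evaluates retrieval with Mean Average Precision (MAP). The sketch below shows one common way to compute it from ranked relevance judgments. It assumes every relevant shot appears somewhere in the ranked list, so average precision is normalized by the number of relevant items retrieved; official TRECVID scoring may normalize differently. The function names are hypothetical.

```python
def average_precision(ranked_relevance):
    """Average precision for one query.

    `ranked_relevance` is a list of 0/1 flags in rank order, where 1 marks a
    relevant result. Precision is sampled at each rank holding a relevant item.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(runs):
    """Mean of the per-query average precision over a list of ranked lists."""
    if not runs:
        return 0.0
    return sum(average_precision(run) for run in runs) / len(runs)
```

For the ranked list `[1, 0, 1]`, precision is 1/1 at rank 1 and 2/3 at rank 3, so the average precision is (1 + 2/3) / 2 = 5/6.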

17.
Numerous web videos associated with rich metadata are available on the Internet today. While metadata such as video tags create opportunities for video search and multimedia content understanding, challenges also arise because video tags are usually annotated at the video level, whereas many tags actually describe only parts of the video content. Localizing the parts or frames of a web video that are relevant to given tags is key to many applications and research tasks. In this paper we propose combining a topic model with relevance filtering to localize relevant frames. Our method has three steps. First, we apply relevance filtering to assign relevance scores to video frames, and a raw relevant frame set is obtained by selecting the top-ranked frames. Then, we separate the frames into topics by mining the underlying semantics with latent Dirichlet allocation, and use the raw relevant set as a validation set to select relevant topics. Finally, the topical relevances are used to refine the raw relevant frame set and obtain the final results. Experimental results on two real web video databases validate the effectiveness of the proposed approach.

18.
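CDB: a content-based multimedia database system engine   Total citations: 3, self-citations: 0, citations by others: 3
CDB (Content-based DataBase) is a content-based multimedia database engine that can be embedded in a general-purpose object-relational database, so that the database system supports both conventional and content-based queries over multimedia data. The paper first describes the architecture of CDB, which integrates information retrieval and data retrieval into the database and supports content-based creation, manipulation and maintenance of multimedia databases. It then presents CDB's hierarchical content model, which describes the spatio-temporal structural features of multimedia content as well as information cues. Finally, it describes the content-based information retrieval techniques used in CDB and the user query and manipulation interfaces designed and implemented for it, including query by example, subjective color query, video summarization and browsing, and content queries in extended SQL.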

19.
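With the rapid development of the Internet and multimedia technology, the protection of multimedia information has attracted wide attention in areas such as broadcast monitoring, copy control, content authentication, digital fingerprinting and secure covert communication. A new video scrambling algorithm based on the generalized Fibonacci transform is proposed. The algorithm is fast, scrambles well, resists cropping and noise, and is independent of the input video format and encoding.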

20.
A survey on content-based retrieval for multimedia databases   Total citations: 8, self-citations: 0, citations by others: 8
Conventional database systems are designed for managing textual and numerical data, and retrieving such data is often based on simple comparisons of text/numerical values. However, this simple method of retrieval is no longer adequate for multimedia data, since the digitized representation of images, video, or data itself does not convey the reality of these media items. In addition, composite data consisting of heterogeneous types of data also associates with the semantic content acquired by a user's recognition. Therefore, content-based retrieval for multimedia data is realized taking such intrinsic features of multimedia data into account. Implementation of the content-based retrieval facility is not based on a single fundamental, but is closely related to an underlying data model, a priori knowledge of the area of interest, and the scheme for representing queries. This paper surveys recent studies on content-based retrieval for multimedia databases from the point of view of three fundamental issues. Throughout the discussion, we assume databases that manage only nontextual/numerical data, such as image or video, are also in the category of multimedia databases.
