Similar Documents
20 similar documents found.
1.
2.
Research on Video Retrieval Based on High-Level Semantics   (cited 1 time: 0 self-citations, 1 by others)
Semantic video retrieval is currently one of the most active research topics. Most existing video retrieval systems operate on low-level features at a non-semantic level, far removed from the high-level semantic concepts that humans understand, which seriously limits the practical effectiveness of video retrieval. How to bridge the gap between low-level features and high-level semantics, and to retrieve video using high-level semantic concepts, is therefore a key research focus. Through a brief overview of semantic understanding, semantic analysis, and semantic extraction of video content, this paper attempts to construct a semantic video retrieval model.

3.
Semantic filtering and retrieval of multimedia content is crucial for efficient use of multimedia data repositories. Video query by semantic keywords is one of the most difficult problems in multimedia data retrieval. The difficulty lies in the mapping between low-level video representation and high-level semantics. We therefore formulate the multimedia content access problem as a multimedia pattern recognition problem. We propose a probabilistic framework for semantic video indexing, which can support filtering and retrieval and facilitate efficient content-based access. To map low-level features to high-level semantics we propose probabilistic multimedia objects (multijects). Examples of multijects in movies include explosion, mountain, beach, outdoor, music, etc. Semantic concepts in videos interact, and to model this interaction explicitly we propose a network of multijects (multinet). Using probabilistic models for six site multijects (rocks, sky, snow, water-body, forestry/greenery, and outdoor) and using a Bayesian belief network as the multinet, we demonstrate the application of this framework to semantic indexing. We demonstrate how detection performance can be significantly improved by using the multinet to take interconceptual relationships into account. We also show how the multinet can fuse heterogeneous features to support detection based on inference and reasoning.
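The abstract does not give the multinet's equations, so the following is only a minimal Python sketch of the underlying idea: independent per-concept detector scores are refined using pairwise co-occurrence statistics so that related concepts (e.g. sky and outdoor) reinforce each other. All concept names, probabilities, and co-occurrence values are hypothetical, and the one-step smoothing below is a simplified stand-in for the Bayesian belief network described in the paper.

```python
import numpy as np

# Hypothetical site multijects and independent detector outputs P(concept | low-level features)
concepts = ["rocks", "sky", "snow", "water_body", "greenery", "outdoor"]
p_detector = np.array([0.30, 0.85, 0.10, 0.40, 0.70, 0.60])

# Hypothetical co-occurrence statistics, e.g. estimated from annotated training clips:
# cooc[i, j] ~ P(concept_i present | concept_j present)
cooc = np.array([
    [1.00, 0.35, 0.20, 0.45, 0.30, 0.40],
    [0.35, 1.00, 0.25, 0.50, 0.60, 0.90],
    [0.20, 0.25, 1.00, 0.15, 0.05, 0.55],
    [0.45, 0.50, 0.15, 1.00, 0.40, 0.65],
    [0.30, 0.60, 0.05, 0.40, 1.00, 0.85],
    [0.40, 0.90, 0.55, 0.65, 0.85, 1.00],
])

def rescore(p, cooc, alpha=0.5):
    """Blend each detector score with evidence propagated from the other concepts.

    This is not the paper's Bayesian belief network; it is a one-step,
    co-occurrence-weighted smoothing that only illustrates how inter-concept
    relationships can raise or lower individual detection probabilities.
    """
    support = (cooc @ p - np.diag(cooc) * p) / (cooc.sum(axis=1) - np.diag(cooc))
    return (1 - alpha) * p + alpha * support

p_refined = rescore(p_detector, cooc)
for name, before, after in zip(concepts, p_detector, p_refined):
    print(f"{name:12s} {before:.2f} -> {after:.2f}")
```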

4.
To support effective multimedia information retrieval, video annotation has become an important topic in video content analysis. Existing video annotation methods focus on either the analysis of low-level features or simple semantic concepts, and they cannot reduce the gap between low-level features and high-level concepts. In this paper, we propose an innovative method for semantic video annotation through integrated mining of visual features, speech features, and frequent semantic patterns existing in the video. The proposed method consists of two main phases: 1) construction of four kinds of predictive annotation models, namely speech-association, visual-association, visual-sequential, and statistical models, from annotated videos; 2) fusion of these models for annotating un-annotated videos automatically. The main advantage of the proposed method lies in that all visual features, speech features, and semantic patterns are considered simultaneously. Moreover, the utilization of high-level rules can effectively complement the insufficiency of statistics-based methods in dealing with complex and broad keyword identification in video annotation. Through empirical evaluation on NIST TRECVID video datasets, the proposed approach is shown to enhance the performance of annotation substantially in terms of precision, recall, and F-measure.
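The abstract describes fusing several annotation models but not the exact fusion rule. As a rough illustration, here is a minimal weighted late-fusion sketch in Python; the model names follow the abstract, while the scores, weights, and threshold are hypothetical and not taken from the paper.

```python
from collections import defaultdict

# Hypothetical per-keyword confidence scores produced by the four annotation
# models for one un-annotated video (values are made up for illustration).
model_scores = {
    "speech_association": {"interview": 0.7, "studio": 0.4},
    "visual_association": {"studio": 0.6, "crowd": 0.3},
    "visual_sequential":  {"interview": 0.5, "crowd": 0.2},
    "statistical":        {"interview": 0.6, "studio": 0.5, "outdoor": 0.1},
}

# Hypothetical fusion weights, e.g. estimated from each model's precision on a
# validation set; the paper's actual fusion strategy is not reproduced here.
weights = {
    "speech_association": 0.30,
    "visual_association": 0.25,
    "visual_sequential":  0.20,
    "statistical":        0.25,
}

def fuse(model_scores, weights, threshold=0.3):
    """Weighted late fusion: sum each keyword's weighted scores across models
    and keep keywords whose fused score clears a threshold."""
    fused = defaultdict(float)
    for model, scores in model_scores.items():
        for keyword, score in scores.items():
            fused[keyword] += weights[model] * score
    return sorted((k for k, v in fused.items() if v >= threshold),
                  key=lambda k: -fused[k])

print(fuse(model_scores, weights))   # -> ['interview', 'studio'] with these toy numbers
```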

5.
6.
We present a system for multimedia event detection. The developed system characterizes complex multimedia events based on a large array of multimodal features, and classifies unseen videos by effectively fusing diverse responses. We present three major technical innovations. First, we explore novel visual and audio features across multiple semantic granularities, including building, often in an unsupervised manner, mid-level and high-level features upon low-level features to enable semantic understanding. Second, we present a novel Latent SVM model that learns and localizes discriminative high-level concepts in cluttered video sequences. In addition to improving detection accuracy beyond existing approaches, it enables a unique summary for every retrieval through its use of high-level concepts and temporal evidence localization. The resulting summary provides some transparency into why the system classified the video as it did. Finally, we present novel fusion learning algorithms and our methodology to improve fusion learning under limited training data conditions. Thorough evaluation on a large TRECVID MED 2011 dataset showcases the benefits of the presented system.
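The abstract mentions fusion learning over diverse detector responses without specifying the algorithm. As a generic illustration only, here is a minimal stacking-style sketch in Python: a logistic regression learns how to weight per-modality scores. The modality names, synthetic scores, and use of logistic regression are assumptions, not the paper's method.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Hypothetical per-video scores from three modality-specific event detectors
# (visual, audio, high-level concepts); in practice these would come from the
# individual classifiers described in the abstract.
n_videos = 200
labels = rng.integers(0, 2, n_videos)            # 1 = event present
scores = np.column_stack([
    labels + rng.normal(0, 0.8, n_videos),       # visual detector: informative but noisy
    labels + rng.normal(0, 1.2, n_videos),       # audio detector: weaker
    labels + rng.normal(0, 0.6, n_videos),       # concept detector: strongest
])

# Stacking-style fusion: a logistic regression over the detector scores.
fusion = LogisticRegression()
print("fused accuracy:      ",
      cross_val_score(fusion, scores, labels, cv=5).mean())
print("visual-only accuracy:",
      cross_val_score(LogisticRegression(), scores[:, [0]], labels, cv=5).mean())
```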

7.
8.
In this work we are concerned with detecting non-collaborative videos in video-sharing social networks. Specifically, we investigate how much visual content-based analysis can aid in detecting ballot stuffing and spam videos in threads of video responses. This is a very challenging task because of the high-level semantic concepts involved, the assorted nature of social networks, which prevents the use of constrained a priori information, and, above all, the context-dependent nature of non-collaborative videos. Content filtering for social networks is an increasingly demanded task: because of their popularity, the number of abuses also tends to increase, annoying users and disrupting the services. We propose two approaches, each one better adapted to a specific non-collaborative action: ballot stuffing, which tries to inflate the popularity of a given video by posting “fake” responses to it, and spamming, which tries to insert an unrelated video as a response to popular videos. We advocate combining low-level features into higher-level feature representations, such as bag-of-visual-features and latent semantic analysis. Our experiments show the feasibility of the proposed approaches.
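To make the bag-of-visual-features plus latent semantic analysis pipeline concrete, here is a minimal sketch, assuming synthetic visual-word histograms and arbitrary parameter choices (vocabulary size, number of latent components, classifier); it only illustrates the representation, not the paper's actual experimental setup.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

rng = np.random.default_rng(1)

# Hypothetical bag-of-visual-features histograms: each row counts how often each
# of 500 visual words occurs in one video response (random stand-in data; real
# histograms would come from quantized local descriptors).
n_videos, vocab_size = 300, 500
X = rng.poisson(1.0, size=(n_videos, vocab_size)).astype(float)
y = rng.integers(0, 2, n_videos)          # 1 = spam / ballot-stuffing response

# Latent semantic analysis over the visual-word histograms (TruncatedSVD is the
# usual LSA implementation for count data), followed by a linear classifier.
model = make_pipeline(TruncatedSVD(n_components=50, random_state=0),
                      LinearSVC())
model.fit(X, y)
print("training accuracy:", model.score(X, y))
```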

9.
Methods based on Bag-of-Visual-Words (BoW) derived from local keypoints have recently shown promise for video annotation. The visual word weighting scheme has a critical impact on the performance of BoW methods. In this paper, we propose a new visual word weighting scheme, referred to as emerging patterns weighting (EP-weighting). The EP-weighting scheme can efficiently capture the co-occurrence relationships of visual words and improve the effectiveness of video annotation. The proposed scheme first finds emerging patterns (EPs) of visual keywords in the training dataset, and then performs an adaptive weight assignment for each visual word according to the EPs. The adjusted BoW features are used to train classifiers for video annotation. A systematic performance study on the TRECVID corpus containing 20 semantic concepts shows that the proposed scheme is more effective than other popular existing weighting schemes.
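The abstract does not spell out the EP mining or the weighting formula, so the sketch below is only a crude stand-in: it keeps visual-word pairs whose support grows sharply from the negative to the positive class and boosts the histogram weights of the words involved. The toy shot data, support and growth thresholds, and boost factor are all hypothetical.

```python
from itertools import combinations
from collections import Counter
import numpy as np

# Hypothetical shot-level data: each shot is the set of visual word ids that
# occur in it (toy data; real data would come from quantized keypoint descriptors).
positive_shots = [{1, 2, 7}, {1, 2, 5}, {1, 2, 9}, {2, 7, 8}]   # shots labelled with a concept
negative_shots = [{3, 4}, {4, 5, 6}, {3, 6, 9}, {5, 8}]         # shots without the concept

def pair_support(shots):
    """Fraction of shots in which each visual-word pair co-occurs."""
    counts = Counter(p for s in shots for p in combinations(sorted(s), 2))
    return {p: c / len(shots) for p, c in counts.items()}

# Simplified stand-in for emerging-pattern mining: keep pairs whose support is
# high in the positive class and much higher than in the negative class.
pos, neg = pair_support(positive_shots), pair_support(negative_shots)
emerging = {p for p, s in pos.items() if s >= 0.5 and s > 4 * neg.get(p, 1e-6)}

vocab_size = 10
weights = np.ones(vocab_size)
for a, b in emerging:                 # boost words that participate in EPs
    weights[a] *= 1.5
    weights[b] *= 1.5

bow = np.array([0, 3, 2, 0, 0, 1, 0, 1, 0, 0], dtype=float)   # one shot's histogram
print("re-weighted BoW:", bow * weights)
```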

10.
While people compare images using semantic concepts, computers compare images using low-level visual features that sometimes have little to do with these semantics. To reduce the gap between the high-level semantics of visual objects and the low-level features extracted from them, in this paper we develop a framework for learning pseudo metrics (LPM) using neural networks for semantic image classification and retrieval. Performance analysis and comparative studies, through experiments on an image database, show that the LPM has potential applications in multimedia information processing.
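The abstract does not describe the network or the training objective, so the following is a deliberately simplified sketch of the general idea of a learned pseudo metric: fit a linear map W so that d(a, b) = ||Wa - Wb||^2 is small for same-class pairs and large for cross-class pairs. The features, loss, margin, and learning rate are all assumptions; the paper uses neural networks rather than this linear toy.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy image features from two semantic classes (stand-ins for low-level visual features).
n, dim = 100, 20
X = rng.normal(size=(n, dim))
y = rng.integers(0, 2, n)
X[y == 1, :3] += 2.0                     # class 1 differs in a few dimensions

W = rng.normal(scale=0.1, size=(5, dim))
margin, lr = 4.0, 0.01

for _ in range(200):
    i, j = rng.integers(0, n, 2)
    diff = X[i] - X[j]
    z = W @ diff
    d = z @ z
    if y[i] == y[j]:                     # pull same-class pairs together
        grad = 2 * np.outer(z, diff)
    elif d < margin:                     # push different-class pairs apart
        grad = -2 * np.outer(z, diff)
    else:
        continue
    W -= lr * grad

def pdist(a, b):
    return float(np.sum((W @ a - W @ b) ** 2))

same = [pdist(X[i], X[j]) for i in range(20) for j in range(i + 1, 20) if y[i] == y[j]]
cross = [pdist(X[i], X[j]) for i in range(20) for j in range(i + 1, 20) if y[i] != y[j]]
print("mean same-class distance: ", np.mean(same))
print("mean cross-class distance:", np.mean(cross))
```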

11.
To overcome the semantic gap between low-level image features and high-level semantics, and to reduce the dependence of top-down saliency detection methods on object-specific priors, a saliency detection method based on high-level color semantic features is proposed. First, structured color features are extracted from the color image and, within a multiple kernel learning framework, color naming is performed to obtain a color semantic name for each pixel. Next, high-level color semantic features are computed from the distribution of color names in the image and fused with low-level Gist features, and a linear support vector machine is trained to produce a saliency classifier for pixel-level saliency detection. Experimental results show that the proposed algorithm detects human visual attention points more accurately, and that, compared with traditional low-level color features, the proposed color semantic features yield better saliency detection results.
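To make the color-naming idea concrete, here is a minimal Python sketch assuming fixed RGB prototypes for a few color names and a color-name histogram as the high-level feature; the paper instead learns color naming with multiple kernel learning and also fuses Gist features, neither of which is reproduced here. The images, patches, and labels below are toy data.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Hypothetical RGB prototypes for a handful of color names (illustrative only).
color_names = ["black", "white", "red", "green", "blue", "yellow"]
prototypes = np.array([
    [0, 0, 0], [255, 255, 255], [220, 30, 30],
    [40, 170, 60], [40, 70, 200], [230, 210, 40],
], dtype=float)

def color_name_map(image):
    """Assign each pixel the index of its nearest color-name prototype."""
    flat = image.reshape(-1, 3).astype(float)
    dists = np.linalg.norm(flat[:, None, :] - prototypes[None, :, :], axis=2)
    return dists.argmin(axis=1).reshape(image.shape[:2])

def semantic_feature(image, patch):
    """Color-semantic feature for a patch: the normalized histogram of color names inside it."""
    r0, r1, c0, c1 = patch
    names = color_name_map(image)[r0:r1, c0:c1]
    hist = np.bincount(names.ravel(), minlength=len(color_names))
    return hist / hist.sum()

# Toy training data: random images whose "salient" central patch is biased toward red.
def toy_sample(salient):
    img = rng.integers(0, 256, (32, 32, 3))
    if salient:
        img[8:24, 8:24] = [210, 40, 40]
    return semantic_feature(img, (8, 24, 8, 24))

X = np.array([toy_sample(s) for s in [True] * 30 + [False] * 30])
y = np.array([1] * 30 + [0] * 30)
clf = LinearSVC().fit(X, y)           # linear SVM saliency classifier, as in the abstract
print("training accuracy:", clf.score(X, y))
```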

12.
13.
14.
15.
To bridge the semantic gap between low-level image features and high-level semantics, a method for constructing color semantic features is proposed to establish a new semantic mapping and improve image classification accuracy. Low-level color features are extracted, a semantic network containing color concepts is constructed, color semantic feature triples are built, and machine learning classification algorithms are then used for image classification. Experimental results show that classifying images with the semantic feature vectors constructed by the proposed method not only achieves excellent classification results but is also robust across different classification algorithms.

16.
Extracting Emotional Semantics from Image Texture Based on FRD   (cited 1 time: 0 self-citations, 1 by others)
王莉 《计算机工程》2009,35(20):212-215
The low-level visual features of an image (color, texture, and shape) carry a large amount of emotional semantic information that humans can perceive. Using texture features, a new indexing method, fuzzy recognition degree (FRD) clustering, is proposed to describe semantic images associated with emotions. FRD clustering allows image retrieval to start from high-level emotional concepts. The index uses three perceptual texture features, directionality, contrast, and coarseness, to generate the FRD values. Experiments on interior decoration images show that the method performs well.

17.
Image classification is an essential task in content-based image retrieval. However, due to the semantic gap between low-level visual features and high-level semantic concepts, and the diversification of Web images, the performance of traditional classification approaches is far from users’ expectations. In an attempt to reduce the semantic gap and satisfy the urgent requirements for dimensionality reduction, high-quality retrieval results, and batch-based processing, we propose a hierarchical image manifold with novel distance measures for calculation. Assuming that the images in an image set describe the same or similar object but have various scenes, we formulate two kinds of manifolds, object manifold and scene manifold, at different levels of semantic granularity. Object manifold is developed for object-level classification using an algorithm named extended locally linear embedding (ELLE) based on intra- and inter-object difference measures. Scene manifold is built for scene-level classification using an algorithm named locally linear submanifold extraction (LLSE) by combining linear perturbation and region growing. Experimental results show that our method is effective in improving the performance of classifying Web images.
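The abstract does not define ELLE or LLSE, so as a reference point only, here is a minimal sketch of standard locally linear embedding (which both algorithms extend) using scikit-learn on synthetic data standing in for image feature vectors; nothing here reflects the paper's actual distance measures.

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

# Synthetic 3-D points on a curved surface, standing in for image feature vectors.
X, _ = make_swiss_roll(n_samples=800, random_state=0)

# Plain LLE: each point is reconstructed from its neighbours, and a low-dimensional
# embedding preserving those reconstruction weights is found.
lle = LocallyLinearEmbedding(n_neighbors=12, n_components=2, random_state=0)
Z = lle.fit_transform(X)

print("embedded shape:", Z.shape)
print("reconstruction error:", lle.reconstruction_error_)
```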

18.
SVM for Content-Based Natural Image Classification and Retrieval   (cited 26 times: 0 self-citations, 26 by others)
付岩  王耀威  王伟强  高文 《计算机学报》2003,26(10):1261-1265
In traditional content-based image retrieval methods, because the image domain is broad, there is a large semantic gap between low-level visual features and high-level concepts, which leads to poor retrieval performance. This paper argues that a more practical approach is to narrow the image domain so as to reduce the semantic gap between low-level features and high-level concepts, and to use machine learning methods to automatically build models of image classes, thereby offering users a conceptual way to query images. Taking the natural image domain as an example, support vector machines (SVMs) are used to learn natural image categories, and the learned models are applied to natural image classification and retrieval. Experimental results show that the proposed method is feasible.
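As a minimal illustration of training an SVM on low-level features for image classification, here is a sketch assuming simple RGB color histograms as features and toy generated images in place of real natural-image categories; the feature choice and SVM parameters are assumptions, not those of the paper.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

def color_histogram(image, bins=8):
    """Simple low-level feature: a joint RGB histogram, normalized to sum to 1."""
    hist, _ = np.histogramdd(image.reshape(-1, 3), bins=(bins,) * 3,
                             range=((0, 256),) * 3)
    return hist.ravel() / hist.sum()

# Toy stand-in for two natural-image categories (e.g. "sunset" vs "forest"):
# random images biased toward warm or green tones.
def toy_image(category):
    img = rng.integers(0, 256, (64, 64, 3))
    img[..., 0 if category == 0 else 1] = rng.integers(150, 256, (64, 64))
    return img

X = np.array([color_histogram(toy_image(c)) for c in [0] * 50 + [1] * 50])
y = np.array([0] * 50 + [1] * 50)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = SVC(kernel="rbf", C=10, gamma="scale").fit(X_tr, y_tr)   # SVM image-class model
print("test accuracy:", clf.score(X_te, y_te))
```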

19.
20.
In order to improve the retrieval accuracy of content-based image retrieval systems, the research focus has shifted from designing sophisticated low-level feature extraction algorithms to reducing the ‘semantic gap’ between the visual features and the richness of human semantics. This paper attempts to provide a comprehensive survey of recent technical achievements in high-level semantic-based image retrieval. Major recent publications are included, covering different aspects of the research in this area, including low-level image feature extraction, similarity measurement, and deriving high-level semantic features. We identify five major categories of state-of-the-art techniques for narrowing down the ‘semantic gap’: (1) using object ontology to define high-level concepts; (2) using machine learning methods to associate low-level features with query concepts; (3) using relevance feedback to learn users’ intentions; (4) generating semantic templates to support high-level image retrieval; (5) fusing the evidence from HTML text and the visual content of images for WWW image retrieval. In addition, some other related issues such as image test beds and retrieval performance evaluation are also discussed. Finally, based on existing technology and the demand from real-world applications, a few promising future research directions are suggested.
