Similar Documents
20 similar documents found (search time: 31 ms)
1.
With the growing demand for visual information of rich content, effective and efficient manipulation of large video databases is increasingly desired. Many investigations have been made into content-based video retrieval. However, despite its importance, video subsequence identification, which is to find content similar to a short query clip within a long video sequence, has not been well addressed. This paper presents a graph transformation and matching approach to this problem, with an extension to identify occurrences whose ordering or length may differ due to content editing. With a novel batch query algorithm to retrieve similar frames, the mapping relationship between the query and the database video is first represented by a bipartite graph. The densely matched parts along the long sequence are then extracted, followed by a filter-and-refine search strategy to prune irrelevant subsequences. During the filtering stage, maximum size matching is deployed for each subgraph constructed from the query and a candidate subsequence to obtain a smaller set of candidates. During the refinement stage, sub-maximum similarity matching is devised to identify the subsequence with the highest aggregate score among all candidates, according to a robust video similarity model that incorporates visual content, temporal order, and frame alignment information. Performance studies conducted on a 50-hour video recording validate that the approach is promising in terms of both search accuracy and speed.
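The filtering stage above relies on maximum size matching between query frames and candidate frames. As a minimal sketch (not the paper's implementation), the classic augmenting-path algorithm computes such a matching; the toy adjacency data below is illustrative only:

```python
def max_bipartite_matching(adj, n_left, n_right):
    """Maximum-size matching in a bipartite graph via augmenting paths.

    adj[u] lists the right-side vertices similar enough to left vertex u
    (e.g. query frame u vs. candidate-subsequence frames).
    Returns the number of matched pairs.
    """
    match_right = [-1] * n_right  # right vertex -> matched left vertex

    def try_augment(u, seen):
        for v in adj[u]:
            if v in seen:
                continue
            seen.add(v)
            # v is free, or its current partner can be re-matched elsewhere
            if match_right[v] == -1 or try_augment(match_right[v], seen):
                match_right[v] = u
                return True
        return False

    matched = 0
    for u in range(n_left):
        if try_augment(u, set()):
            matched += 1
    return matched

# Toy example: 3 query frames, 3 candidate frames; edges mean "visually similar"
adj = {0: [0, 1], 1: [0], 2: [2]}
print(max_bipartite_matching(adj, 3, 3))  # → 3
```

A large matching size indicates that most query frames find a distinct counterpart in the candidate, which is what makes it useful as a cheap filter before the more expensive refinement step.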

2.
Video clip retrieval is a hot research topic in the video domain. To improve query efficiency, the high-dimensional index structure Vector-Approximation File (VA-File) is used to organize video sub-clips, and retrieval is performed with a new similarity model and an efficient algorithm based on a constrained sliding window. The proposed sub-clip segmentation algorithm distinguishes fine motion details well, and the similarity model fully accounts for both the visual similarity between corresponding sub-clips and their temporal order, making it highly effective for retrieving sports video. Experiments show that, for sports video clip retrieval, the approach achieves not only high query efficiency but also high recall and precision.

3.
Video indexing requires the efficient segmentation of video into scenes. The video is first segmented into shots and a set of key-frames is extracted for each shot. Typical scene detection algorithms incorporate time distance in a shot similarity metric. In the method we propose, to overcome the difficulty of having prior knowledge of the scene duration, the shots are clustered into groups based only on their visual similarity and a label is assigned to each shot according to the group that it belongs to. Then, a sequence alignment algorithm is applied to detect when the pattern of shot labels changes, providing the final scene segmentation result. In this way shot similarity is computed based only on visual features, while ordering of shots is taken into account during sequence alignment. To cluster the shots into groups we propose an improved spectral clustering method that both estimates the number of clusters and employs the fast global k-means algorithm in the clustering stage after the eigenvector computation of the similarity matrix. The same spectral clustering method is applied to extract the key-frames of each shot, and numerical experiments indicate that the content of each shot is efficiently summarized using the method we propose herein. Experiments on TV series and movies also indicate that the proposed scene detection method accurately detects most of the scene boundaries while preserving a good tradeoff between recall and precision.
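The core idea above, detecting where the pattern of shot labels changes, can be sketched very simply. This is not the paper's alignment algorithm; it is a hedged illustration where the window size `w`, the Jaccard threshold, and the toy label sequence are all assumptions:

```python
def scene_boundaries(labels, w=3, threshold=0.5):
    """Mark a scene boundary where the shot-label pattern changes.

    Compares the label sets of the w shots before and after each position;
    low Jaccard overlap suggests the label pattern (and hence the scene)
    changed. w and threshold are illustrative choices, not the paper's.
    """
    boundaries = []
    for i in range(w, len(labels) - w + 1):
        before = set(labels[i - w:i])
        after = set(labels[i:i + w])
        overlap = len(before & after) / len(before | after)  # Jaccard index
        if overlap < threshold:
            boundaries.append(i)
    return boundaries

# Two label patterns: shots alternate A/B, then switch to C/D at index 4
shots = ['A', 'B', 'A', 'B', 'C', 'D', 'C', 'D']
print(scene_boundaries(shots))  # → [3, 4, 5]
```

Consecutive detections cluster around the true change (index 4 here); in practice one would merge adjacent hits into a single boundary.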

4.
The spatio-temporal features of anomalous events in video are strongly correlated. To address the impact of this spatio-temporal correlation on detection performance, a video anomaly detection method based on spatio-temporal fusion graph network learning is proposed. For the features of the video segments, a spatial similarity graph and a temporal continuity graph are constructed, with each segment corresponding to a graph node. Edge weights of the spatial similarity graph are formed dynamically from the Top-k similarity between each node's features and those of the other nodes; edge weights of the temporal continuity graph are formed from each node's continuity over m time intervals. The two graphs are fused with adaptive weights into a spatio-temporal fusion graph convolutional network, which learns and generates video features. A graph sparsity constraint is added to the ranking loss to reduce the over-smoothing effect of the graph model and improve detection performance. Experiments were conducted on the UCF-Crime and ShanghaiTech video anomaly datasets, with the receiver operating characteristic (ROC) curve and the area under the curve (AUC) as performance metrics. On UCF-Crime the proposed method reaches 80.76% AUC, 5.35% above the baseline; on ShanghaiTech it reaches 89.88% AUC, 5.44% above the best comparable method. The experimental results show that the proposed method effectively improves the performance of video anomaly event detection.

5.
Video corpus moment retrieval (VCMR) is a recently proposed cross-modal task whose goal is to retrieve, from an unsegmented video corpus, the short video moment corresponding to a query sentence. Existing cross-modal video-text retrieval work centers on aligning and fusing features of different modalities. However, simply performing cross-modal alignment and fusion neither ensures that semantically similar data from the same modality stay close in the joint feature space, nor takes the semantics of the query into account. To address these problems, this paper proposes a query-aware cross-modal dual contrastive learning network (QACLN) for multimodal video moment retrieval, which obtains unified semantic representations of data across modalities by combining inter-modal and intra-modal contrastive learning. Specifically, a query-aware cross-modal semantic fusion strategy is proposed that adaptively fuses the video's multimodal features, such as visual features and subtitle features, according to the perceived query semantics, yielding a query-aware multimodal joint representation of the video. In addition, an inter-modal and intra-modal dual contrastive learning mechanism for videos and query sentences is proposed to strengthen semantic alignment and fusion across modalities, improving the discriminability and semantic consistency of the representations. Finally, 1D convolutional boundary regression and cross-modal semantic similarity computation are employed for moment localization and video retrieval. Extensive experiments show that the proposed QACLN outperforms the baseline methods.

6.
A video color transfer method based on hierarchical structure is proposed. Hierarchical segmentation divides each video frame into regions, and the organizational relations among the regions are described as a tree, producing a hierarchical tree structure representing the image's constituent regions. By defining a similarity measure over these hierarchical tree structures, local region features of the images are compared to find the best regions for local transfer between the target and reference images. On this basis, color probability distribution transfer is applied to the individual regions to perform local color transfer while preserving the visual characteristics of the target image.

7.
In heterogeneous networks, different modalities coexist. For example, video sources of a certain length usually carry abundant time-varying audiovisual data. From the users' perspective, different video segments trigger different kinds of emotions. In order to better interact with users in heterogeneous networks and improve their experience, affective video content analysis to predict users' emotions is essential. Academically, users' emotions can be evaluated by arousal and valence values and by fear degree, which provides an approach to quantifying how accurately the reactions of audiences and users to videos are predicted. In this paper, we propose a multimodal data fusion method that integrates visual and audio data to perform affective video content analysis. Specifically, to align the visual and audio data, temporal attention filters are proposed to obtain time-span features of entire video segments. Then, using a two-branch network structure, the matched visual and audio features are integrated in a common space. Finally, the fused audiovisual feature is employed for the regression and classification subtasks in order to measure the emotional responses of users. Simulation results show that the proposed method can accurately predict the subjective feelings of users towards video content, which provides a way to predict users' preferences and recommend videos according to their own demand.

8.
Scene extraction is the first step toward semantic understanding of a video. It also provides improved browsing and retrieval facilities to users of video databases. This paper presents an effective approach to movie scene extraction based on the analysis of background images. Our approach exploits the fact that shots belonging to one particular scene often have similar backgrounds. Although part of the video frame is covered by foreground objects, the background scene can still be reconstructed by a mosaic technique. The proposed scene extraction algorithm consists of two main components: determination of the shot similarity measure and a shot grouping process. In our approach, several low-level visual features are integrated to compute the similarity measure between two shots. On the other hand, the rules of film-making are used to guide the shot grouping process. Experimental results show that our approach is promising and outperforms some existing techniques.

9.
A Fast Method for Similar-Video Retrieval   (cited by: 1; self-citations: 0; citations by others: 1)
曹政, 卢宝丰, 朱明. 《信息与控制》 (Information and Control), 2010, 39(5): 635-639
To tackle the two difficulties in similar-video retrieval, similarity measurement and fast search, this paper proposes a new fast similar-video retrieval method. Starting from visual similarity, a compressed video signature is computed from statistics of the video's spatio-temporal distribution, and video similarity is measured by the distance between signatures. To meet the need for scalable computation, a retrieval method based on a clustered index table is proposed. Query tests on a large-scale database show that the similarity retrieval algorithm is fast and effective.
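The signature-and-distance idea above can be illustrated with a minimal sketch. This is not the paper's signature scheme; the histogram construction, bin count, and per-frame feature values below are all assumptions made for illustration:

```python
from collections import Counter

def video_signature(frames, bins=4):
    """Compress a frame sequence into a coarse histogram signature.

    frames: per-frame feature values in [0, 1) (e.g. mean intensity).
    Returns a normalized bin-occupancy tuple; visually similar videos
    should yield nearby signatures.
    """
    counts = Counter(min(int(f * bins), bins - 1) for f in frames)
    total = len(frames)
    return tuple(counts.get(b, 0) / total for b in range(bins))

def signature_distance(s1, s2):
    """L1 distance between two signatures; small distance = similar videos."""
    return sum(abs(a - b) for a, b in zip(s1, s2))

a = video_signature([0.10, 0.15, 0.60, 0.62])
b = video_signature([0.12, 0.18, 0.58, 0.65])
print(signature_distance(a, b))  # identical bin occupancy → 0.0
```

Because the signature is tiny compared with the raw video, distances can be computed quickly over a clustered index, which is the point of the compressed-signature design.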

10.
Cui  Zheng  Hu  Yongli  Sun  Yanfeng  Gao  Junbin  Yin  Baocai 《Multimedia Tools and Applications》2022,81(17):23615-23632

Image-text retrieval task has received a lot of attention in the modern research field of artificial intelligence. It still remains challenging since image and text are heterogeneous cross-modal data. The key issue of image-text retrieval is how to learn a common feature space while semantic correspondence between image and text remains. Existing works cannot gain fine cross-modal feature representation because the semantic relation between local features is not effectively utilized and the noise information is not suppressed. In order to address these issues, we propose a Cross-modal Alignment with Graph Reasoning (CAGR) model, in which the refined cross-modal features in the common feature space are learned and then a fine-grained cross-modal alignment method is implemented. Specifically, we introduce a graph reasoning module to explore semantic connection for local elements in each modality and measure their importance by self-attention mechanism. In a multi-step reasoning manner, the visual semantic graph and textual semantic graph can be effectively learned and the refined visual and textual features can be obtained. Finally, to measure the similarity between image and text, a novel alignment approach named cross-modal attentional fine-grained alignment is used to compute similarity score between two sets of features. Our model achieves the competitive performance compared with the state-of-the-art methods on Flickr30K dataset and MS-COCO dataset. Extensive experiments demonstrate the effectiveness of our model.


11.
An Information-Theoretic Co-clustering Algorithm and Its Application to Video Shot Clustering   (cited by: 2; self-citations: 0; citations by others: 2)
Automatic video shot clustering is an important research topic in content-based indexing and retrieval. Previous work has neglected the correlations among the features that describe shot content, as well as the impact of those correlated features on shot similarity measurement and shot clustering performance. To provide a more reasonable shot similarity measure, this paper uses an information-theoretic co-clustering algorithm to formulate feature-correlation mining and shot clustering as two interdependent, jointly optimized processes. To estimate the number of shot classes in a video automatically, a class-number estimation algorithm based on the Bayesian information criterion is also proposed.

12.
A News Video Structuring Model Combining Audio and Visual Features   (cited by: 5; self-citations: 1; citations by others: 5)
Structured representation of video, and similarity comparison based on that representation, are prerequisites for and the most fundamental work in video retrieval. Inspired by the video production process, and through analysis of the content structure of news video, this paper proposes a five-level table-of-contents video structuring model based on multiple levels of semantic abstraction. On this basis, a layered processing approach that combines audio and visual features realizes table-of-contents-based structuring of news video, overcoming the difficulty that visual features alone cannot handle scene segmentation. Experiments verify the effectiveness of the proposed ideas and the corresponding algorithms.

13.
Grouping video content into semantic segments and classifying semantic scenes into different types are the crucial processes to content-based video organization, management and retrieval. In this paper, a novel approach to automatically segment scenes and semantically represent scenes is proposed. Firstly, video shots are detected using a rough-to-fine algorithm. Secondly, key-frames within each shot are selected adaptively with hybrid features, and redundant key-frames are removed by template matching. Thirdly, spatio-temporal coherent shots are clustered into the same scene based on the temporal constraint of video content and visual similarity between shot activities. Finally, under the full analysis of typical characters on continuously recorded videos, scene content is semantically represented to satisfy human demand on video retrieval. The proposed algorithm has been performed on various genres of films and TV program. Promising experimental results show that the proposed method makes sense to efficient retrieval of interesting video content.
Yuncai Liu

14.
The need for content-based access to image and video information from media archives has captured the attention of researchers in recent years. Research efforts have led to the development of methods that provide access to image and video data. These methods have their roots in pattern recognition. The methods are used to determine the similarity in the visual information content extracted from low-level features. These features are then clustered for the generation of database indices. This paper presents a comprehensive survey on the use of these pattern recognition methods, which enable image and video retrieval by content.

15.
Browsing video scenes is the process of unfolding the story scenarios of a long video archive, which can help users locate their desired video segments quickly and efficiently. Automatic scene detection in a long video stream is hence the first and crucial step toward a concise and comprehensive content-based representation for indexing, browsing, and retrieval purposes. In this paper, we present a novel scene detection scheme for various video types. We first detect video shots using a coarse-to-fine algorithm. Key frames without useful information are detected and removed using template matching. Spatio-temporally coherent shots are then grouped into the same scene based on the temporal constraint of video content and the visual similarity of shot activity. The proposed algorithm has been applied to various types of videos, including movies and TV programs. Promising experimental results show that the proposed method supports efficient retrieval of video content of interest.

16.
Accurate Localization of Commercial Segments in News Video   (cited by: 1; self-citations: 0; citations by others: 1)
Because of the diverse production styles of commercials, detecting and locating commercial segments in news video is a very challenging problem. This paper proposes a new method for locating commercials in news video. First, to handle the imbalance between shot-cut and non-shot-cut samples, a new classification method is designed for shot detection. Cluster analysis is then used to roughly mark the commercial blocks. Finally, the commercial boundaries are located accurately by analyzing the average duration of adjacent shots and the visual features of shot key frames. Experimental results show that the method achieves high localization accuracy.

17.
Video Tampering Detection and Multi-granularity Localization Based on Video Perceptual Hashing   (cited by: 1; self-citations: 0; citations by others: 1)
To detect and locate tampering in videos accurately and quickly, a computable model of human vision is introduced and a multi-level, multi-granularity algorithm for fast video tampering detection and localization is proposed. Random block sampling is used to extract structural perceptual features of the video and temporal perceptual features of the video frames; the one-way digest property of hashing is then used to quantize these perceptual features into a video digest hash. Tampered regions are detected and located at multiple granularities and levels using a similarity matrix. Experimental results show that the similarity fitting map reflects both the strength and the location of tampering attacks, and that the algorithm achieves better tampering detection accuracy and localization precision.

18.
With the advances in multimedia databases and the popularization of the Internet, it is now possible to access large image and video repositories distributed throughout the world. One of the challenging problems in such access is how the information in the respective databases can be summarized to enable an intelligent selection of relevant database sites based on visual queries. This paper presents an approach to solve this problem based on image content-based indexing of a metadatabase at a query distribution server. The metadatabase records a summary of the visual content of the images in each database through image templates and statistical features characterizing the similarity distributions of the images. The selection of the databases is done by searching the metadatabase using a ranking algorithm that uses the query's similarity to a template and the features of the databases associated with the template. Two selection approaches, termed mean-based and histogram-based approaches, are presented. The database selection mechanisms have been implemented in a metaserver, and extensive experiments have been performed to demonstrate the effectiveness of the database selection approaches.
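The template-based ranking described above can be sketched in a few lines. This is a hedged illustration of the general idea, not the paper's mean-based algorithm: the data layout (one summary template per database plus a mean similarity statistic) and all names and numbers below are assumptions:

```python
def rank_databases(query_template_sim, db_stats):
    """Rank databases for a visual query using a metadatabase summary.

    query_template_sim: {template: similarity of the query to that template}.
    db_stats: {db: (template, mean_sim)} — each database summarized by a
    template and the mean similarity of its images to that template.
    Score = query-to-template similarity scaled by the recorded mean,
    so databases whose content clusters tightly around a query-like
    template rank first.
    """
    scores = {}
    for db, (template, mean_sim) in db_stats.items():
        scores[db] = query_template_sim.get(template, 0.0) * mean_sim
    return sorted(scores, key=scores.get, reverse=True)

# Illustrative metadatabase: three sites summarized by two templates
sims = {'sunset': 0.9, 'city': 0.2}
dbs = {'dbA': ('sunset', 0.8), 'dbB': ('city', 0.9), 'dbC': ('sunset', 0.3)}
print(rank_databases(sims, dbs))  # → ['dbA', 'dbC', 'dbB']
```

The histogram-based variant in the paper would replace the single mean statistic with a similarity histogram per template, at the cost of a larger metadatabase.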

19.
This paper presents a new visual aggregation model for representing visual information about moving objects in video data. Based on available automatic scene segmentation and object tracking algorithms, the proposed model provides eight operations to calculate object motions at various levels of semantic granularity. It represents the trajectory, color, and dimensions of a single moving object, as well as the directional and topological relations among multiple objects over a time interval. Each representation of a motion can be normalized to improve computational cost and storage utilization. To facilitate query processing, two optimal approximate matching algorithms are designed to match time-series visual features of moving objects. Experimental results indicate that the proposed algorithms substantially outperform conventional subsequence matching methods in measuring the similarity between two trajectories. Finally, the visual aggregation model is integrated into a relational database system, and a prototype content-based video retrieval system has been implemented as well.

20.
Traditional passive video forensics based on the similarity between adjacent frames produces many false detections on videos with intense motion. To address this problem, a video tampering detection method is proposed that fuses spatial constraints with gradient structure information. First, spatial constraint criteria are used to extract low-motion regions and high-texture regions, and the two regions are merged to obtain robust, quantization-correlation-rich regions from which the video's optimal similarity features are extracted. The original feature extraction and description methods are then improved by computing the spatially constrained correlation values with the gradient-based structural similarity measure GSSIM, which matches the characteristics of the human visual system. Finally, tamper points are located using Chebyshev's inequality. Experiments show that, on videos with intense motion, the proposed algorithm achieves a lower false detection rate and higher precision.
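The final localization step above, applying Chebyshev's inequality to the inter-frame similarity sequence, can be sketched as follows. This is an illustration of the inequality-based thresholding idea, not the paper's algorithm; the threshold k and the toy similarity values are assumptions:

```python
from statistics import mean, stdev

def tamper_points(similarities, k=2.0):
    """Flag frame transitions whose similarity deviates from the mean by
    more than k standard deviations.

    By Chebyshev's inequality, for any distribution at most 1/k^2 of the
    values fall that far from the mean (25% for k=2), so extreme dips are
    likely tamper points rather than normal content variation.
    k=2 is an illustrative threshold, not the paper's.
    """
    mu, sigma = mean(similarities), stdev(similarities)
    return [i for i, s in enumerate(similarities) if abs(s - mu) > k * sigma]

# Adjacent-frame similarities with one sharp dip at index 4
sims = [0.95, 0.96, 0.94, 0.95, 0.40, 0.95, 0.96]
print(tamper_points(sims))  # → [4]
```

The appeal of the Chebyshev bound is that it holds for any similarity distribution, so the threshold needs no assumption that inter-frame similarities are normally distributed.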
