Similar Documents
20 similar documents found
1.
Video semantic concept detection is a prerequisite for bridging the semantic gap and enabling semantics-based video retrieval. This paper proposes a video semantic concept detection method based on evidence theory. First, block-based color moments, wavelet texture features, and edge direction histograms are extracted from shot keyframes. Then a support vector machine (SVM) is trained on each of the three feature types to build separate classifier models. Next, the generalization error of each SVM model is analyzed, and a discount-coefficient method is used to correct the classification outputs of the different SVM models. Finally, the corrected outputs are combined with the evidence fusion formula, and the fused result is taken as the final concept detection result. Experimental results show that the new method improves concept detection accuracy and outperforms traditional linear classifier fusion methods.
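A minimal sketch of the fusion step described above: the probability output of each per-feature SVM is turned into a mass function, discounted by a reliability coefficient, and combined with Dempster's rule. All numeric values below are illustrative assumptions.

```python
import numpy as np

def discount(mass, alpha):
    """Discount m = [m(concept), m(not concept), m(unknown)] by reliability
    alpha, moving the discounted belief to 'unknown'."""
    m = np.asarray(mass, dtype=float)
    return np.array([alpha * m[0], alpha * m[1], 1.0 - alpha * (m[0] + m[1])])

def dempster_combine(m1, m2):
    """Dempster's rule on the frame {concept, not-concept} with an ignorance term."""
    c, nc, u = 0, 1, 2
    k = m1[c] * m2[nc] + m1[nc] * m2[c]          # conflicting mass
    mc = m1[c] * m2[c] + m1[c] * m2[u] + m1[u] * m2[c]
    mnc = m1[nc] * m2[nc] + m1[nc] * m2[u] + m1[u] * m2[nc]
    mu = m1[u] * m2[u]
    return np.array([mc, mnc, mu]) / (1.0 - k)   # normalize away the conflict

# Hypothetical P(concept) from the color / texture / edge SVMs, and per-model
# reliabilities estimated from generalization error (assumed values).
svm_probs = [0.8, 0.6, 0.7]
reliabilities = [0.9, 0.7, 0.8]

masses = [discount([p, 1 - p, 0.0], a) for p, a in zip(svm_probs, reliabilities)]
fused = masses[0]
for m in masses[1:]:
    fused = dempster_combine(fused, m)
print("fused belief in concept:", fused[0])
```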

2.
Multimodal detection is an effective means of filtering adult videos, but existing methods lack an accurate semantic representation of the audio. This paper therefore proposes an adult video detection method that fuses audio words with visual features. First, a periodicity-based segmentation algorithm for energy envelope units (EEs) is proposed, which accurately segments the audio stream into a sequence of EEs. Second, an audio semantic representation based on EEs and bag-of-words (BoW) is proposed, which describes the features of each EE as occurrence probabilities of audio words. A composite weighting method is then used to fuse the detection results of the audio words and the visual features. Finally, a periodicity-based adult video discrimination algorithm is proposed, which works in tandem with the periodicity-based EE segmentation to fully exploit periodicity for detection. Experimental results show that, compared with methods based on visual features alone, the proposed method significantly improves detection performance: at a false detection rate of 9.76%, the detection rate reaches 94.44%.
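The EE segmentation step could look roughly like the following short-time-energy sketch; the paper's periodicity-based criterion is not reproduced here, and the window, hop, and threshold values are assumptions.

```python
import numpy as np

def energy_envelope_units(signal, sr, win=0.02, hop=0.01, thresh_ratio=0.3):
    """Segment audio into energy-envelope units (EEs) by thresholding
    short-time energy; a minimal sketch, not the periodicity-based algorithm."""
    wlen, hlen = int(win * sr), int(hop * sr)
    frames = np.lib.stride_tricks.sliding_window_view(signal, wlen)[::hlen]
    energy = (frames ** 2).mean(axis=1)
    active = energy > thresh_ratio * energy.max()
    units, start = [], None                      # contiguous active runs -> EEs
    for i, a in enumerate(active):
        if a and start is None:
            start = i
        elif not a and start is not None:
            units.append((start * hlen / sr, i * hlen / sr))
            start = None
    if start is not None:
        units.append((start * hlen / sr, len(active) * hlen / sr))
    return units

sr = 16000
t = np.arange(sr * 2) / sr
audio = np.sin(2 * np.pi * 220 * t) * (np.sin(2 * np.pi * 1.5 * t) > 0)  # bursts
print(energy_envelope_units(audio, sr))
```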

3.
A sensitive image detection method based on the visual attention model VAMAI
Content-based sensitive image detection is an effective means of filtering sensitive information on the Internet. However, detection methods based on global features suffer from high false detection rates, and existing BoW (bag-of-visual-words) based methods are slow. To detect sensitive images quickly and accurately, this paper proposes a detection method based on VAMAI (visual attention model for adult images), comprising three parts: constructing the VAMAI model for sensitive images, a visual vocabulary algorithm based on regions of interest and SURF (speeded up robust features), and global feature selection with BoW fusion. First, a saliency map model, a skin color classification model, and a face detection model are combined to construct VAMAI, which extracts regions of interest fairly accurately. Then a visual vocabulary is built from the regions of interest and SURF features, improving both the speed and the accuracy of BoW-based detection. Finally, the performance of several global features is compared; color moments are selected and late-fused with the SVM classification results of the BoW representation. Experimental results show that VAMAI detects regions of interest fairly accurately and significantly improves sensitive image detection in terms of both speed and accuracy.
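A rough illustration of how the three cue maps might be fused into an attention map and a region of interest; the weights and threshold are assumptions, and the actual VAMAI construction is more elaborate.

```python
import numpy as np

def attention_map(saliency, skin_prob, face_mask, w=(0.4, 0.4, 0.2)):
    """Fuse a saliency map, a skin-probability map and a face mask into one
    attention map; a hypothetical weighted fusion, not the exact VAMAI."""
    maps = [saliency, skin_prob, face_mask.astype(float)]
    maps = [(m - m.min()) / (m.max() - m.min() + 1e-9) for m in maps]
    return sum(wi * mi for wi, mi in zip(w, maps))

def roi_from_attention(att, thresh=0.5):
    """Bounding box (top, bottom, left, right) of the high-attention region."""
    ys, xs = np.nonzero(att > thresh * att.max())
    if ys.size == 0:                  # no salient region: fall back to full image
        return 0, att.shape[0] - 1, 0, att.shape[1] - 1
    return ys.min(), ys.max(), xs.min(), xs.max()
```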

4.
Background modeling based on Gaussian mixture models (GMM) is widely used for moving object detection, but it fails to detect moving objects correctly in video sequences with rapid illumination changes. Moreover, if moving objects are present in the frames used to initialize the GMM parameters, they contaminate the subsequent detection results and cause false detections. To address these problems, a GMM algorithm based on brightness autocorrelation is proposed: a brightness autocorrelation parameter decides whether moving objects are present in the initialization frames, a fitted value of that parameter decides whether the current frame undergoes a rapid illumination change, and the GMM is combined with brightness differencing for object detection. Simulation experiments on real captured video show that, even when moving objects are present in the GMM initialization frames, the algorithm can reliably extract moving objects from sequences with rapid illumination changes while meeting accuracy and real-time requirements.
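A minimal OpenCV sketch of the overall idea: GMM background subtraction guarded against rapid illumination changes. The simple mean-brightness jump test below stands in for the paper's autocorrelation test (an assumption), and the input path and thresholds are hypothetical.

```python
import cv2

cap = cv2.VideoCapture("input.mp4")          # hypothetical input path
mog2 = cv2.createBackgroundSubtractorMOG2(detectShadows=False)
prev_brightness = None

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    brightness = float(gray.mean())
    sudden_light = (prev_brightness is not None
                    and abs(brightness - prev_brightness) > 20)
    prev_brightness = brightness
    # On a rapid illumination change, raise the learning rate so the model
    # re-adapts instead of flagging the whole frame as foreground.
    fg = mog2.apply(frame, learningRate=0.5 if sudden_light else -1)
cap.release()
```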

5.
For detecting near-duplicate video content on the Internet, a robust video hashing algorithm based on short-term spatio-temporal variation is proposed. Feature extraction and feature quantization are its two key steps. For feature extraction, in contrast to existing methods that fuse spatial and temporal information, the novelty of this algorithm lies in fully exploiting the short-term variation of local spatial information across adjacent frames ("short-term spatio-temporal variation"). The algorithm first constructs an inscribed sphere of the video and partitions it from the center outward into a series of concentric shells, capturing the short-term variation of the spatial information of adjacent frames; the non-negative matrix factorization coefficients of the shells are then used as the feature representation of the video content. For feature quantization, an improved Manhattan quantization strategy maps the video features to a binary hash sequence, better preserving the neighborhood relations of the original space and improving quantization accuracy. Experimental results show that the algorithm performs well.
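The hashing pipeline might be sketched as follows, with a shared NMF basis and a median-threshold binarization; the inscribed-sphere shell features and the improved Manhattan quantization are not reproduced, and the data is synthetic.

```python
import numpy as np
from sklearn.decomposition import NMF

def binary_hash(coef):
    """Threshold NMF coefficients at their median to get a binary hash."""
    flat = coef.ravel()
    return (flat > np.median(flat)).astype(np.uint8)

# Non-negative shell features of two videos (hypothetical: rows = sphere
# shells, columns = per-shell measurements). A shared NMF basis is learned
# once so hashes of different videos are comparable.
rng = np.random.default_rng(0)
a = rng.random((32, 8))
b = np.clip(a + 0.01 * rng.standard_normal((32, 8)), 0, None)  # near-duplicate
model = NMF(n_components=4, init="nndsvda", max_iter=500, random_state=0).fit(a)
h_a, h_b = binary_hash(model.transform(a)), binary_hash(model.transform(b))
print("Hamming distance:", int(np.count_nonzero(h_a != h_b)))
```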

6.
Semi-supervised active learning for video annotation based on adaptive SVM
Videos with different distribution characteristics may share the same semantic concept yet exhibit different visual features, which lowers annotation accuracy. To solve this problem, a semi-supervised active learning video annotation algorithm based on adaptive support vector machines (SVM) is proposed. An existing classifier is converted into an adaptive SVM (A-SVM) classifier by introducing a Δ function and optimizing the model parameters; semi-supervised learning based on Gaussian harmonic functions is then integrated into A-SVM-based active learning to derive a relevance evaluation function, according to which the video data are annotated. Experimental results show that on the cross-domain video concept detection problem the algorithm achieves an average precision of 68.1% and an average recall of 60%, an improvement over semi-supervised active learning with standard SVM and with transductive SVM.
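A sketch of one semi-supervised active learning round, using scikit-learn's LabelSpreading as a stand-in for the Gaussian harmonic function; the A-SVM adaptation step itself is omitted, and -1 marks unlabeled samples.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.semi_supervised import LabelSpreading

def active_learning_round(X, y_partial, n_queries=5):
    """Propagate labels over an RBF graph, train an SVM on the pseudo-labels,
    and query the least-confident unlabeled samples for manual annotation."""
    lp = LabelSpreading(kernel="rbf", gamma=1.0).fit(X, y_partial)
    svm = SVC(probability=True).fit(X, lp.transduction_)
    unlabeled = np.flatnonzero(y_partial == -1)
    conf = svm.predict_proba(X[unlabeled]).max(axis=1)
    return unlabeled[np.argsort(conf)[:n_queries]]  # indices to label next
```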

7.
To mine the rich semantic information in video, a semantic concept detection method based on concept lattice rules pruned by negative samples is proposed. Building on an analysis of concept-lattice-based semantic analysis systems, and taking into account the information of the negative samples in the training data, a semantic rule extraction algorithm that uses negative samples for rule pruning is proposed and applied to video semantic detection. The low-level features of video shots are first mapped to low-level semantic features; the algorithm then generates semantic classification rules, which are used for video semantic concept detection. Experimental results show that the method is effective and feasible...
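A toy illustration of negative-sample rule pruning: candidate attribute-set rules mined from positive shots are discarded if they also fire on a negative shot. This simplification skips the concept-lattice machinery entirely, and the attribute names are invented.

```python
from itertools import combinations

def mine_rules(pos, neg, max_len=2):
    """Mine attribute sets covering positive shots, pruning any rule that
    also matches a negative shot. pos/neg are lists of attribute sets."""
    candidates = set()
    for sample in pos:
        for r in range(1, max_len + 1):
            candidates.update(frozenset(c) for c in combinations(sorted(sample), r))
    # Negative-sample pruning: keep only rules that never fire on a negative.
    return [rule for rule in candidates if not any(rule <= n for n in neg)]

pos = [{"sky", "water", "boat"}, {"sky", "water"}]
neg = [{"sky", "building"}]
for rule in mine_rules(pos, neg):
    print(set(rule), "=> concept")
```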

8.
Although the traditional bag-of-words (BoW) model remains robust for action recognition in complex scenes, hard vector quantization introduces large approximation errors and thus poor feature sets. A major challenge in action recognition is the construction of the visual vocabulary: there is no direct mapping from raw features to class labels, so high-level visual descriptors require a more precise dictionary. A human action recognition method based on structured sparse representation is therefore proposed. In its BoW model, a video is represented as a histogram of group sparse coding coefficients. Compared with the traditional BoW model, the proposed method has smaller quantization error, and the high-level feature representation reduces model parameters and storage complexity. The method is evaluated on standard human action datasets, including KTH, Weizmann, UCF-Sports, and UCF50; experimental results show significant performance improvements over existing methods.
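The coding-and-pooling step might look like this sketch, which uses plain L1 sparse coding from scikit-learn as a stand-in for group sparse coding; the dictionary and descriptors are synthetic.

```python
import numpy as np
from sklearn.decomposition import SparseCoder

def sparse_bow(descriptors, dictionary, alpha=0.5):
    """Encode local descriptors against a dictionary and pool the absolute
    coefficients into a video-level histogram."""
    coder = SparseCoder(dictionary=dictionary,
                        transform_algorithm="lasso_lars",
                        transform_alpha=alpha)
    codes = coder.transform(descriptors)         # (n_descriptors, n_atoms)
    hist = np.abs(codes).sum(axis=0)
    return hist / (hist.sum() + 1e-9)

rng = np.random.default_rng(0)
dictionary = rng.standard_normal((64, 32))       # 64 atoms, 32-D descriptors
dictionary /= np.linalg.norm(dictionary, axis=1, keepdims=True)
descriptors = rng.standard_normal((200, 32))     # e.g. motion features of one video
print(sparse_bow(descriptors, dictionary).shape) # (64,)
```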

9.
Simultaneous localization and mapping (SLAM) addresses map building and navigation for mobile robots in unknown environments and is the basis of autonomous robot movement. Loop closure detection is a key step in visual SLAM, playing an important role in building consistent maps and reducing accumulated pose error. Current loop closure detection methods usually rely on traditional features such as SIFT and SURF, which are easily affected by the environment. To improve the accuracy and robustness of loop closure detection, a feature extraction method based on an unsupervised stacked convolutional autoencoder (CAEs) model is proposed: a trained CAEs convolutional network processes the input images, and its output features are used for loop closure detection. Experimental results show that, compared with the traditional BoW method and other deep-learning-based methods, the proposed algorithm effectively reduces the dimensionality of the image features and improves the feature description, achieving better precision and robustness in the loop closure detection stage of robot SLAM.
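Assuming per-frame features from a trained autoencoder are already available, loop closure candidates can be found with a cosine-similarity search such as the following sketch; the threshold and temporal gap are assumptions.

```python
import numpy as np

def loop_closure_candidates(features, sim_thresh=0.9, min_gap=30):
    """Flag frame pairs whose cosine similarity exceeds a threshold and that
    are far apart in time (so consecutive frames are not reported)."""
    f = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-9)
    sim = f @ f.T
    pairs = []
    for i in range(len(f)):
        for j in range(i + min_gap, len(f)):
            if sim[i, j] > sim_thresh:
                pairs.append((i, j, float(sim[i, j])))
    return pairs
```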

10.
Design and implementation of a semantic-concept-based video retrieval system
A semantic-concept-based video retrieval system is designed and implemented, comprising three parts: shot segmentation with keyframe extraction, semantic concept detection, and user retrieval. The system hierarchically segments video through shot segmentation and keyframe extraction, extracts effective low-level image features from the keyframes, detects concepts with support vector machines (SVM), and finally retrieves video by concept content. For concept detection, a linear weighting method based on validation-set average precision is proposed to late-fuse the SVM classification results. Experimental results show that the method achieves high retrieval accuracy.
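The validation-AP weighted late fusion could be sketched as follows; the scores below are invented placeholders.

```python
import numpy as np
from sklearn.metrics import average_precision_score

def ap_weighted_fusion(val_labels, val_scores_per_feature, test_scores_per_feature):
    """Late-fuse per-feature SVM scores with weights proportional to each
    feature's average precision on a validation set."""
    aps = np.array([average_precision_score(val_labels, s)
                    for s in val_scores_per_feature])
    w = aps / aps.sum()
    return sum(wi * si for wi, si in zip(w, test_scores_per_feature))

# Hypothetical decision scores from two per-feature SVMs.
val_y = np.array([1, 0, 1, 0, 1])
val_scores = [np.array([0.9, 0.2, 0.8, 0.4, 0.7]),
              np.array([0.6, 0.5, 0.9, 0.1, 0.4])]
test_scores = [np.array([0.3, 0.8]), np.array([0.5, 0.7])]
print(ap_weighted_fusion(val_y, val_scores, test_scores))
```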

11.
The method based on bag-of-visual-words (BoW) derived from local keypoints has recently shown promise for video annotation, and the visual word weighting scheme has a critical impact on the performance of BoW methods. In this paper, we propose a new visual word weighting scheme, referred to as emerging patterns weighting (EP-weighting), which can efficiently capture the co-occurrence relationships of visual words and improve the effectiveness of video annotation. The proposed scheme first finds emerging patterns (EPs) of visual keywords in the training dataset and then performs an adaptive weighting assignment for each visual word according to the EPs. The adjusted BoW features are used to train classifiers for video annotation. A systematic performance study on a TRECVID corpus containing 20 semantic concepts shows that the proposed scheme is more effective than other popular weighting schemes.
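A much-simplified, single-word version of the idea (real emerging patterns are itemsets of words): words whose support grows sharply from negative to positive clips get boosted weights. The growth and boost values are assumptions.

```python
import numpy as np

def ep_weights(pos_bow, neg_bow, min_growth=2.0, boost=2.0):
    """Boost visual words whose support grows sharply from the negative to the
    positive class; a single-word simplification of EP-weighting."""
    pos_support = (pos_bow > 0).mean(axis=0) + 1e-6
    neg_support = (neg_bow > 0).mean(axis=0) + 1e-6
    growth = pos_support / neg_support
    return np.where(growth >= min_growth, boost, 1.0)

rng = np.random.default_rng(0)
pos = rng.integers(0, 3, size=(50, 100))   # BoW histograms of positive clips
neg = rng.integers(0, 2, size=(50, 100))
weighted_pos = pos * ep_weights(pos, neg)  # re-weighted features for training
```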

12.
13.
Based on local keypoints extracted as salient image patches, an image can be described as a "bag of visual words" (BoW), and this representation has proved promising for object and scene classification. The performance of BoW features in semantic concept detection for large-scale multimedia databases depends on various representation choices. In this paper, we conduct a comprehensive study of the representation choices of BoW, including vocabulary size, weighting scheme, stop word removal, feature selection, spatial information, and visual bi-grams, and offer practical insights into how to optimize the performance of BoW through appropriate representation choices. For the weighting scheme, we elaborate a soft-weighting method to assess the significance of a visual word to an image, and show experimentally that soft-weighting outperforms other popular weighting schemes such as TF-IDF by a large margin. Our extensive experiments on TRECVID data sets also indicate that the BoW feature alone, with appropriate representation choices, already produces highly competitive concept detection performance. Based on our empirical findings, we further apply our method to detect a large set of 374 semantic concepts. The detectors, as well as the features and detection scores on several recent benchmark data sets, are released to the multimedia community.
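A sketch of the soft-weighting idea as described in the abstract: each keypoint spreads weight over its top-K nearest visual words, halving the contribution at each rank. The exact similarity transform used here is an assumption.

```python
import numpy as np
from scipy.spatial.distance import cdist

def soft_weight_bow(descriptors, vocab, top_k=4):
    """Soft-weighting: each keypoint contributes sim/2^rank to its top-K
    nearest visual words instead of voting for a single word."""
    dist = cdist(descriptors, vocab)               # Euclidean distances
    sim = 1.0 / (1.0 + dist)                       # assumed similarity transform
    ranks = np.argsort(dist, axis=1)[:, :top_k]    # top-K words per keypoint
    hist = np.zeros(vocab.shape[0])
    for kp, words in enumerate(ranks):
        for i, word in enumerate(words):
            hist[word] += sim[kp, word] / (2 ** i)
    return hist / (hist.sum() + 1e-9)

rng = np.random.default_rng(0)
vocab = rng.standard_normal((500, 128))            # visual vocabulary
desc = rng.standard_normal((300, 128))             # keypoint descriptors of one image
print(soft_weight_bow(desc, vocab).shape)          # (500,)
```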

14.
15.
Abnormal visual event detection is an important subject in smart city surveillance, where much of the data can be processed locally in an edge computing environment and where real-time operation and detection effectiveness are critical. In this paper, we propose an unsupervised abnormal event detection approach for video surveillance of crowded scenes in urban public places that combines multi-instance visual feature selection with the autoregressive integrated moving average (ARIMA) model. Each video clip is modeled as a visual feature bag containing several sub-video clips, each of which is regarded as an instance. The time-transform characteristics of the optical flow within each sub-video clip are treated as a visual feature instance, and time-series modeling is carried out over the instances of all sub-video clips in a surveillance clip. The abnormal events in each surveillance clip are then detected with a multi-instance fusion method. The approach is verified on publicly available urban surveillance video datasets and compared with state-of-the-art alternatives. Experimental results demonstrate better abnormal event detection performance for crowded scenes in urban public places under an edge environment.
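The ARIMA half of the method might be sketched as follows, with multi-instance fusion reduced to a max over instance scores; the model order, threshold, and data are assumptions.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

def instance_anomaly_scores(series_per_instance, order=(2, 1, 1)):
    """Fit an ARIMA model to each instance's optical-flow magnitude series and
    score it by its largest one-step prediction residual."""
    scores = []
    for series in series_per_instance:
        fit = ARIMA(series, order=order).fit()
        scores.append(float(np.abs(fit.resid).max()))
    return scores

rng = np.random.default_rng(0)
normal = rng.normal(1.0, 0.1, 100)                   # steady crowd motion
abnormal = normal.copy(); abnormal[60] += 3.0        # sudden motion burst
scores = instance_anomaly_scores([normal, abnormal])
clip_is_abnormal = max(scores) > 1.0                 # hypothetical threshold
print(scores, clip_is_abnormal)
```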

16.
Multiple kernel learning (MKL) is widely used in visual semantic concept detection, but traditional MKL mostly adopts linear, stationary kernel combinations and thus cannot accurately characterize complex data distributions. This paper applies the exact Euclidean locality sensitive hashing (E2LSH) algorithm to clustering and, drawing on the strengths of nonlinear kernel combination, proposes a nonlinear, non-stationary kernel combination method, E2LSH-MKL. The method uses the Hadamard product to weight different kernel functions nonlinearly, fully exploiting the information obtained from interactions between kernels. At the same time, a clustering algorithm based on E2LSH hashing first partitions the original image dataset into a number of image subsets; each kernel is then assigned a different weight per subset according to its relative contribution on that subset, achieving non-stationary kernel weighting and improving learner performance. Finally, E2LSH-MKL is applied to visual semantic concept detection. Experimental results on the Caltech-256 and TRECVID 2005 datasets show that the new method outperforms several existing multiple kernel learning methods.
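A sketch of a non-stationary, nonlinear kernel combination in the spirit of E2LSH-MKL: per-sample kernel weights are looked up by cluster, and an elementwise (Hadamard) product term adds kernel interactions. This is an interpretation, not the paper's exact formula.

```python
import numpy as np

def e2lsh_mkl_kernel(kernels, cluster_ids, weights, gamma=0.1):
    """Combine base kernels non-stationarily. kernels: list of (n, n) Gram
    matrices; cluster_ids: (n,) cluster index per sample (e.g. from E2LSH);
    weights: (n_clusters, n_kernels) non-negative per-cluster kernel weights."""
    n = kernels[0].shape[0]
    w = weights[cluster_ids]                       # (n, n_kernels)
    K = np.zeros((n, n))
    for m, Km in enumerate(kernels):
        d = np.sqrt(w[:, m])
        K += (d[:, None] * Km) * d[None, :]        # stays positive semidefinite
    had = kernels[0].copy()                        # Hadamard interaction term,
    for Km in kernels[1:]:                         # PSD by the Schur product
        had = had * Km                             # theorem
    return K + gamma * had
```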

17.
A video copy detection algorithm based on trajectory behavior pattern features
To exploit the temporal motion information in video and improve the accuracy and robustness of video copy detection, a copy detection algorithm based on the behavior patterns of feature point trajectories is proposed. Behavior pattern features are first extracted from the feature point trajectories over consecutive frames; a visual keyword dictionary technique is then used to construct motion features for the video; finally, copy detection is performed based on the similarity of these motion features. The algorithm achieves high detection accuracy on the TRECVID benchmark dataset. Experimental analysis shows that the trajectory-based motion features are highly discriminative and robust to common copy transformations.
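The visual-keyword quantization of trajectory descriptors could be sketched as follows; the descriptors here are synthetic and the vocabulary size is an assumption.

```python
import numpy as np
from sklearn.cluster import KMeans

def motion_signature(trajectory_descriptors, kmeans):
    """Quantize trajectory behavior descriptors against a motion vocabulary
    and return an L2-normalized BoW histogram."""
    words = kmeans.predict(trajectory_descriptors)
    hist = np.bincount(words, minlength=kmeans.n_clusters).astype(float)
    return hist / (np.linalg.norm(hist) + 1e-9)

rng = np.random.default_rng(0)
train = rng.standard_normal((1000, 16))       # descriptors pooled from many videos
kmeans = KMeans(n_clusters=64, n_init=10, random_state=0).fit(train)
query = motion_signature(rng.standard_normal((80, 16)), kmeans)
ref = motion_signature(rng.standard_normal((90, 16)), kmeans)
print("similarity:", float(query @ ref))      # cosine similarity; threshold to decide
```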

18.
Describing the visual content of videos by semantic concepts is an effective and realistic approach for video applications such as annotation, indexing, retrieval, and ranking. In these applications, video data needs to be labelled with some known set of labels or concepts, and assigning semantic concepts manually is not feasible given the large volume of ever-growing video data; automatic semantic concept detection in videos is therefore a hot research area. Deep convolutional neural networks (CNNs) have recently shown remarkable performance in computer vision tasks. In this paper, we present a novel approach for automatic semantic video concept detection using a deep CNN and a foreground-driven concept co-occurrence matrix (FDCCM), which stores foreground-to-background concept co-occurrence values and is built by exploiting concept co-occurrence relationships in the pre-labelled TRECVID video dataset and in a collection of random images extracted from Google Images. To deal with the dataset imbalance problem, we extend this approach by fusing two asymmetrically trained deep CNNs and use the FDCCM to further improve concept detection. The performance of the proposed approach is compared with state-of-the-art approaches for video concept detection on the widely used TRECVID dataset and is found to be superior to existing approaches.
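A generic sketch of refining CNN concept scores with a co-occurrence matrix; the update rule and all values are assumptions and do not reproduce the exact FDCCM construction.

```python
import numpy as np

def refine_scores(cnn_scores, cooccurrence, alpha=0.7):
    """Refine per-concept CNN scores with a concept co-occurrence matrix:
    each concept borrows evidence from concepts it frequently co-occurs with."""
    C = cooccurrence / (cooccurrence.sum(axis=1, keepdims=True) + 1e-9)
    return alpha * cnn_scores + (1 - alpha) * cnn_scores @ C.T

scores = np.array([[0.9, 0.2, 0.05]])            # e.g. road, car, boat
co = np.array([[1.0, 0.8, 0.1],
               [0.8, 1.0, 0.1],
               [0.1, 0.1, 1.0]])
print(refine_scores(scores, co))                 # 'car' is pulled up by 'road'
```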

19.
The data imbalance problem often exists in real-life datasets, especially massive video datasets; however, traditional machine learning algorithms assume a balanced data distribution and equal misclassification costs, so they struggle to describe the true data distribution accurately, resulting in misclassification. In this paper, the data imbalance problem in semantic extraction from massive video datasets is examined, and an enhanced and hierarchical structure (EHS) algorithm is proposed. In the proposed algorithm, data sampling, filtering, and model training are integrated compactly via a hierarchical structure, so the performance of the model improves step by step and remains robust and stable as features and datasets change. Experiments on TRECVID 2010 Semantic Indexing demonstrate that the proposed algorithm substantially outperforms traditional machine learning algorithms and stays stable and robust when different kinds of features are employed. Extended experiments on TRECVID 2010 Surveillance Event Detection also show that the EHS algorithm is efficient and effective, reaching top performance in four of seven events.
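The sample/filter/train loop could be sketched as below, with logistic regression standing in for the actual models; the stage count and filter threshold are assumptions, not the paper's exact EHS algorithm.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def hierarchical_train(X_pos, X_neg, stages=3, seed=0):
    """Train a cascade on an imbalanced set: each stage samples a balanced
    subset of negatives, trains a model, then keeps only the hard negatives
    (those the model still scores high) for the next stage."""
    rng = np.random.default_rng(seed)
    models, neg = [], X_neg
    for _ in range(stages):
        idx = rng.choice(len(neg), size=min(len(neg), len(X_pos)), replace=False)
        X = np.vstack([X_pos, neg[idx]])
        y = np.concatenate([np.ones(len(X_pos)), np.zeros(len(idx))])
        clf = LogisticRegression(max_iter=1000).fit(X, y)
        models.append(clf)
        keep = clf.predict_proba(neg)[:, 1] > 0.3   # filter easy negatives
        if not keep.any():
            break
        neg = neg[keep]
    return models
```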

20.
The effectiveness of BoW-based image representations is limited mainly by the quantization error of local features. This paper proposes an image representation method based on multiple visual codebooks that improves both codebook construction and the encoding method. Specifically: 1) multiple-codebook construction, which iteratively builds several compact and complementary visual codebooks; 2) image representation, which first selects the appropriate visual words from each codebook and estimates the coding coefficients by linear regression, and then incorporates the spatial pyramid structure of the image to form the final representation. Image classification results on standard benchmarks verify the effectiveness of the method.
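The per-descriptor coding step might look like this sketch: the k nearest words are selected from each codebook, coefficients are estimated by least squares (linear regression), and the per-codebook codes are concatenated. Codebook learning and spatial pyramid pooling are assumed already done.

```python
import numpy as np

def multi_codebook_code(descriptor, codebooks, k=5):
    """Encode one local descriptor against several codebooks and concatenate
    the per-codebook coefficient vectors."""
    codes = []
    for B in codebooks:                          # B: (n_words, dim)
        d = np.linalg.norm(B - descriptor, axis=1)
        near = np.argsort(d)[:k]                 # k nearest visual words
        coef, *_ = np.linalg.lstsq(B[near].T, descriptor, rcond=None)
        full = np.zeros(len(B))
        full[near] = coef
        codes.append(full)
    return np.concatenate(codes)

rng = np.random.default_rng(0)
codebooks = [rng.standard_normal((128, 64)) for _ in range(3)]
print(multi_codebook_code(rng.standard_normal(64), codebooks).shape)  # (384,)
```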
