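To address the limited discriminative information of sparse representation under an l1-norm constraint, this paper proposes a video object tracking algorithm based on locality-sensitive kernel sparse representation. To improve the linear separability of targets, the SIFT features of each candidate target are first mapped into a high-dimensional kernel space by a Gaussian kernel function. A kernel sparse representation is then solved in that space under a locality-sensitive constraint, and multi-scale max pooling of the kernel sparse representation yields the representation of the candidate. Finally, the candidate representations are fed to online SVMs, and the candidate with the highest classifier score is taken as the tracked target position. Experimental results show that exploiting the locality information of the data under kernel sparse representation improves the robustness of the algorithm to some extent.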
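Two of the steps named in the abstract above, the Gaussian kernel and multi-scale max pooling, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the `sigma` bandwidth, and the 1-D splitting used for the pooling levels are hypothetical, and the sparse-coding and SVM steps are omitted.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel value between two feature vectors.

    `sigma` is an assumed bandwidth; the abstract does not state one.
    """
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2 * sigma ** 2))


def max_pool(codes):
    """Element-wise maximum over a list of equally long code vectors."""
    return [max(column) for column in zip(*codes)]


def multiscale_max_pool(codes, levels=(1, 2)):
    """Concatenate max-pooled vectors computed at several scales.

    At level n the ordered list of codes is split into n contiguous
    groups (a 1-D simplification assumed here for brevity).
    """
    if not codes:
        raise ValueError("codes must not be empty")
    pooled = []
    for n in levels:
        group_size = -(-len(codes) // n)  # ceiling division
        for start in range(0, len(codes), group_size):
            pooled.extend(max_pool(codes[start:start + group_size]))
    return pooled
```

With four 2-D codes and levels `(1, 2)`, the result has six entries: one pooled vector for the whole set followed by one for each half.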

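To address the tendency of correlation filter trackers to fail on in-vehicle video because of complex environments and target scale changes, this paper proposes a scale-adaptive correlation filter tracking algorithm based on background information. A background-aware correlation filter tracker fused with histogram of oriented gradients (HOG) features first predicts the target position in the next frame; an image patch is then selected at the predicted position for detection; finally, a dynamic scale-ratio pyramid model estimates the target scale. The experiments used 23 in-vehicle video sequences from the KITTI dataset and 4 annotated in-vehicle videos recorded in China. The results show that the algorithm effectively reduces interference from complex backgrounds and target scale changes in in-vehicle environments, outperforms mainstream correlation filter algorithms such as KCF, DSST, SAMF, and STAPLE overall, and tracks robustly under the complex backgrounds and scale changes typical of in-vehicle video.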

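Commentary on and analysis of soccer video is currently a hot research topic in intelligent systems. This paper proposes a player detection and tracking algorithm based on the scale-invariant feature transform (SIFT). The algorithm first extracts the pitch region using color features, then locates players by detecting scale-invariant feature points, and finally tracks players through an inter-frame feature-point association graph. The algorithm was implemented in MATLAB with the Image Processing Toolbox; in testing, the player detection and tracking accuracy reached 90%.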

张鑫  杨棉绒 《电视技术》2015,39(11):41-45
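To address the high computational cost of conventional 3D video stabilization algorithms, a warping algorithm based on content preservation in regions of interest (ROI) is proposed. First, the 3D information of the feature points in each input frame is estimated and the ROI is determined. Then, an ROI-based whole-frame image warping algorithm and a saliency-preserving image warping algorithm remove the jitter from the input frames. Finally, a crossing method removes artifacts at the ROI boundary. Simulations on five HD video sequences show that the algorithm achieves the same de-jittering performance as other state-of-the-art algorithms while reducing execution time by at least 14%.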

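Automatic brightness adjustment of images and videos   Cited by: 2 (self-citations: 0, citations by others: 2)
Adjusting the brightness of underexposed images and videos is of both theoretical and practical value. This paper proposes an automatic brightness adjustment algorithm for images and videos based on gradient-domain operations. For a still image, the algorithm first segments the image into regions of different brightness, then computes a brightness adjustment operator for each region, and finally obtains the result image by solving a gradient-constrained equation. The algorithm is then extended to video: several key frames are selected and processed with the image algorithm above; non-key frames are segmented, and an optical flow algorithm establishes the correspondence between the segmented regions of each non-key frame and the regions of the preceding and following key frames; finally, this correspondence is used, together with the brightness adjustment operators of the key-frame regions and their adjusted brightness, to guide the brightness adjustment of each region in the non-key frames and to generate the output video sequence. The algorithm effectively handles images and videos that are underexposed or unevenly exposed in space and time while preserving detail and texture well; experimental results demonstrate its effectiveness.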

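To reduce the effect of scale changes on object tracking, this paper proposes a multi-scale tracking algorithm built on a color-feature tracking algorithm. The algorithm trains two correlation filters, one for position and one for scale, to achieve scale-adaptive tracking. First, a position correlation filter is learned with a least-squares classifier, the color features are reduced in dimension by principal component analysis, and the location of the maximum response is taken as the target center in the next frame. Next, several rectangular regions of different sizes are formed around the center according to preset scale factors, and the color features of each region are computed. A scale correlation filter is learned from these features and reduced in dimension by QR decomposition. The target size is then determined from the maximum response, and finally the target position and size are updated. Tests on 13 challenging video sequences show that the algorithm adapts to target scale changes, is robust under illumination change, fast motion, motion blur, and other complex conditions, and outperforms current state-of-the-art trackers on several performance measures.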
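The scale-search step described above (build rectangles of several sizes around the estimated center, then keep the size with the largest filter response) can be sketched as below. This is only a sketch under assumptions: the geometric scale step of 1.02, the count of 5 candidates, and the function names are hypothetical, and the responses are passed in rather than computed by a learned scale filter.

```python
def scale_candidates(width, height, step=1.02, count=5):
    """Return candidate (width, height) pairs around the current size.

    Sizes follow step**n for n in -(count // 2) .. count // 2; both
    `step` and `count` are illustrative defaults, not values from the paper.
    """
    half = count // 2
    return [(width * step ** n, height * step ** n)
            for n in range(-half, half + 1)]


def pick_scale(candidates, responses):
    """Select the candidate size whose filter response is largest."""
    if len(candidates) != len(responses):
        raise ValueError("one response is required per candidate")
    best = max(range(len(responses)), key=responses.__getitem__)
    return candidates[best]
```

In a full tracker the `responses` list would come from evaluating the scale correlation filter on the features of each candidate region.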

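The study selected 30 short videos covering people, landscapes, equipment, and other scenes, and recorded them in a TV showroom on nine different television brands using three recording devices (iPhone, VIVO, and HUAWEI). To handle the displacement and deformation that arise during recording, a processing method for recorded video that combines edge detection with an improved SIFT algorithm is adopted. First, an edge detection algorithm finds the border of the playback device, and frame-by-frame processing achieves spatial consistency of the recorded video. Then the scale-invariant feature transform, an image feature matching algorithm, matches the first and last frames of the video to remove temporal redundancy. Exploiting the linearity of the edges in the recorded video, a mapped edge detection algorithm is used; to reduce the long processing time, the Manhattan distance is used to compute the similarity between the reference image and the image to be matched, which lowers the complexity of the algorithm. Finally, the processed recorded video is aligned with the original video in space and time, and quality is assessed with SSIM.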
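The Manhattan-distance similarity mentioned above is cheap because it needs only additions and absolute values. The sketch below is illustrative, not the paper's code: images are assumed to be flattened into equally long lists of pixel or feature values, and the function names are hypothetical.

```python
def manhattan_distance(a, b):
    """Sum of absolute element-wise differences (L1 distance)."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


def best_match(reference, candidates):
    """Index of the candidate closest to the reference in L1 distance."""
    return min(range(len(candidates)),
               key=lambda i: manhattan_distance(reference, candidates[i]))
```

A smaller distance means a more similar image, so matching the start or end frame reduces to taking the minimum over the candidate frames.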

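Video instance segmentation of street scenes is one of the key problems in autonomous driving research and provides a basis for decisions in environment perception and path planning for vehicles in street scenes. Existing methods sample anchor boxes of multiple aspect ratios with a single receptive field, which leads to insufficient edge feature extraction, and the high levels of the feature pyramid lack spatial detail and position information. To address these problems, this paper proposes the Anchor frame calibration and Spatial position information compensation for Video Instance Segmentation (AS-VIS) network. First, an anchor box calibration module is added to the three branches of the prediction head to sample with multiple types of receptive field matched to the anchor box aspect ratios, which addresses the insufficient extraction of target edges. Second, a multi-receptive-field downsampling module is designed to fuse the features sampled with the various receptive fields, which addresses the information loss in downsampling. Finally, the multi-receptive-field downsampling module embeds the activated feature maps of target regions from the low levels of the feature pyramid into the high levels to compensate spatial position information, which addresses the lack of spatial detail and position information in high-level features. A street-scene video dataset was extracted from the Youtube-VIS benchmark, comprising 329 training videos and 53 validation videos. A quantitative comparison of detection and segmentation accuracy with YolactEdge shows that the anchor box calibration average...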

刘内美  林宏飞 《电视技术》2015,39(17):143-146
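To address the poor robustness of video watermarking algorithms, or their need for excessive prior information to improve robustness, a semi-blind robust video watermarking algorithm is proposed that uses video frame blocks, KAZE feature matching, and the DCT. First, the user's registration information is used as the watermark and a unique key is generated for it; based on this key, a unique frame block is randomly generated and stored in a database. Then, KAZE feature matching between the frame and the frame block detects the embedding region, where the watermark is embedded. Finally, KAZE matching between the frame block and the distorted video corrects each frame of the video, and the watermark is extracted. Simulation results show that, under the semi-blind condition, the algorithm offers high robustness, synchronization of the watermark extraction process, support for resolving copyright disputes, and tracing of illegal distribution, which gives it high practical value.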

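Text in video data is an important source of information for semantic video understanding and retrieval. This paper studies the detection, localization, extraction, enhancement, and recognition of text in video. A wavelet modulus maxima algorithm is proposed to detect the position of text in video frames; a coarse-to-fine multi-level localization method together with a pyramid model extracts multi-scale static and scrolling Chinese and English text; finally, the text regions are binarized. Experiments show that the method achieves good results.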

毋立芳  汪敏贵  简萌  刘旭 《信号处理》2020,36(9):1399-1406
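Sports videos contain many different types of people. The behavior of the athletes is directly related to the progress of the game and the content of the video, so athlete detection is a key step in sports video analysis. Existing human detection algorithms perform well on generic human detection but cannot effectively distinguish athletes from non-athletes, and training a dedicated athlete detection model requires annotating a large number of athlete positions, which is costly. This paper proposes a human detection method based on multiple-instance learning. On top of generic human detection, a multiple-instance learning module is introduced that, from image-level annotations, automatically learns a feature mapping matrix in a weakly supervised manner and maps human features into the athlete feature space. Athletes are then distinguished from non-athletes by measuring the similarity between human features and athlete features. Comparative experiments show that the method makes full use of the generic human detection framework and matches the accuracy of a dedicated athlete detection model with a very small amount of annotated data.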
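The final decision step described above (map a person's feature vector with a learned matrix, then compare it with an athlete feature) can be sketched as below. The abstract does not name the similarity measure, so cosine similarity is an assumption here, as are the `threshold` value, the `prototype` vector, and all function names; the mapping matrix `W` is taken as given rather than learned.

```python
import math


def map_features(W, x):
    """Apply a mapping matrix W (list of rows) to a feature vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is zero."""
    dot = sum(p * q for p, q in zip(a, b))
    norm_a = math.sqrt(sum(p * p for p in a))
    norm_b = math.sqrt(sum(q * q for q in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_athlete(W, person, prototype, threshold=0.5):
    """Label a detection as an athlete when its mapped feature is
    similar enough to an assumed athlete prototype vector."""
    return cosine_similarity(map_features(W, person), prototype) >= threshold
```

In the weakly supervised setting the abstract describes, `W` would be learned from image-level labels; this sketch covers only how a learned matrix would be used at inference time.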

Semantic high-level event recognition in videos is one of the most interesting issues for multimedia searching and indexing. Since low-level features are semantically distinct from high-level events, a hierarchical video analysis framework is needed, i.e., one that uses mid-level features to provide clear linkages between low-level audio-visual features and high-level semantics. Therefore, this paper presents a framework for video event classification using the temporal context of mid-level interval-based multimodal features. In the framework, a co-occurrence symbol transformation method is proposed to explore the full temporal relations among multiple modalities in probabilistic HMM event classification. The results of our experiments on baseball video event classification demonstrate the superiority of the proposed approach.

For a variety of applications such as video surveillance and event annotation, the spatial–temporal boundaries between video objects are required for annotating visual content with high-level semantics. In this paper, we define spatial–temporal sampling as a unified process of extracting video objects and computing their spatial–temporal boundaries using a learnt video object model. We first provide a computational approach for learning an optimal key-object codebook sequence from a set of training video clips to characterize the semantics of the detected video objects. Then, dynamic programming with the learnt codebook sequence is used to locate the video objects with spatial–temporal boundaries in a test video clip. To verify the performance of the proposed method, a human action detection and recognition system is constructed. Experimental results show that the proposed method gives good performance on several publicly available datasets in terms of detection accuracy and recognition rate.

Semantic video analysis is a key issue in digital video applications, including video retrieval, annotation, and management. Most existing work on semantic video analysis focuses on event detection for specific video genres, while genre classification is treated as a separate, independent issue. In this paper, we present a semantic framework that performs weakly supervised video genre classification and event analysis jointly, using probabilistic models for MPEG video streams. Several computable semantic features that can accurately reflect the event attributes are derived. Based on an intensive analysis of the connection between video genres and the contextual relationship among events, as well as the statistical characteristics of the dominant event, an analysis algorithm based on a hidden Markov model (HMM) and a naive Bayesian classifier (NBC) is proposed for video genre classification. A Gaussian mixture model (GMM) is built to detect the contained events using the same semantic features, and an event adjustment strategy is proposed according to an analysis of the GMM structure and the pre-definition of video events. Subsequently, a special event is recognized from the detected events by another HMM. Simulation experiments on video genre classification and event analysis using a large number of video data sets demonstrate the promising performance of the proposed framework for semantic video analysis.

Dominant sets based movie scene detection   Cited by: 1 (self-citations: 0, citations by others: 1)
Multimedia indexing and retrieval has become a challenging topic in organizing huge amounts of multimedia data. This is not a trivial task for large visual databases; hence, segmentation into low- and high-level temporal video segments might make it easier. In this paper, we introduce a weighted undirected graph-based movie scene detection approach to detect semantically meaningful temporal video segments. The method is based on the idea of finding the dominant scene of the video according to the selected low-level feature. The proposed method starts by obtaining the most reliable solution first and exploits each solution in the subsequent steps recursively. The dominant movie scene boundary, which has the highest probability of being correct, is determined, and this scene boundary information is also exploited in the subsequent steps. We consider two partitioning strategies to determine the boundaries of the remaining scenes: a tree-based strategy and an order-based strategy. The proposed dominant sets based movie scene detection method is compared with the graph-based video scene detection methods presented in the literature.

Multimedia event detection has become a popular research topic due to the explosive growth of video data. The motion features in a video are often used to detect events because an event may contain some specific actions or moving patterns. Raw motion features are extracted from the entire video first and then aggregated to form the final video representation. However, this video-based representation approach is ineffective for realistic videos because video lengths can differ greatly and the clues for determining an event may appear in only a small segment of the entire video. In this paper, we propose a segment-based approach to video representation. Original videos are divided into segments for feature extraction and classification, while the evaluation is still kept at the video level. The experimental results on recent TRECVID Multimedia Event Detection datasets demonstrate the effectiveness of our approach.
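The segment-based idea above can be sketched in a few lines: split the video into fixed-length segments, score each segment, and aggregate the segment scores into one video-level score. The abstract does not state the aggregation rule, so taking the maximum is an assumption, as are the function names and the caller-supplied `score_segment` classifier.

```python
def split_into_segments(frames, segment_length):
    """Split a frame sequence into consecutive fixed-length segments.

    The last segment may be shorter than `segment_length`.
    """
    if segment_length <= 0:
        raise ValueError("segment_length must be positive")
    return [frames[i:i + segment_length]
            for i in range(0, len(frames), segment_length)]


def video_score(frames, segment_length, score_segment):
    """Video-level score as the maximum of per-segment scores.

    `score_segment` is a hypothetical stand-in for feature extraction
    followed by a segment-level event classifier.
    """
    segments = split_into_segments(frames, segment_length)
    if not segments:
        raise ValueError("video has no frames")
    return max(score_segment(segment) for segment in segments)
```

Max aggregation lets a single strongly matching segment decide the video-level score, which fits the observation that the evidence for an event may occupy only a small part of a long video.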

In this paper, we present an advanced news video parsing system that exploits the visual characteristics of anchorperson scenes and aims to provide personalized news services over Internet or mobile platforms. As the anchorperson shots serve as the root shots for constructing news video, the system first performs anchorperson detection, which divides the news into several segments. Because it combines multiple features with post-processing, our anchorperson detection method can be applied efficiently even to news video whose anchorperson scenes are highly challenging and complicated. Usually, the segments produced by anchorperson detection are regarded as news stories. However, observation of our database shows that this does not hold when interview scenes are present. In these interview scenes, the interviewer (anchorperson) and the interviewee appear alternately. Thus, a technique called interview clustering, based on face similarity, is carried out to merge these interview segments. Another novel aspect of our system is entity summarization of interview scenes, which we adopt as its final stage. The effectiveness and robustness of the proposed system are demonstrated by an evaluation on 19 hours of news programs from 6 different TV channels.

刘建伟  孙正康  刘泽宇  罗雄麟 《电子学报》2016,44(12):2908-2915
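This paper proposes an algorithm that uses kernel canonical correlation analysis to extract the maximally correlated features of the source and target domains and a kernel logistic regression model for domain adaptation learning; the algorithm is called KCCA-DAML (Kernel Canonical Correlation Analysis for Domain Adaptation Learning). Based on correlation analysis of feature sets, the algorithm effectively reduces the difference between the probability distributions of the source and target domains, and uses the extracted maximally correlated features in a kernel logistic regression model to perform cross-domain learning from the source domain to the target domain. The experiments compare a kernel logistic model trained on source-domain data, one trained on target-domain data, one trained on both domains, and the KCCA-DAML model; the results show that KCCA-DAML successfully achieves cross-domain learning on real datasets.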

A compressed domain video saliency detection algorithm, which employs global and local spatiotemporal (GLST) features, is proposed in this work. We first conduct partial decoding of a compressed video bitstream to obtain motion vectors and DCT coefficients, from which GLST features are extracted. More specifically, we extract the spatial features of rarity, compactness, and center prior from DC coefficients by investigating the global color distribution in a frame. We also extract the spatial feature of texture contrast from AC coefficients to identify regions whose local textures are distinct from those of neighboring regions. Moreover, we use the temporal features of motion intensity and motion contrast to detect visually important motions. Then, we generate spatial and temporal saliency maps, respectively, by linearly combining the spatial features and the temporal features. Finally, we fuse the two saliency maps into a spatiotemporal saliency map adaptively by comparing the robustness of the spatial features with that of the temporal features. Experimental results demonstrate that the proposed algorithm provides excellent saliency detection performance, while requiring low complexity and thus performing the detection in real-time.
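The two combination steps in the abstract above (linear combination of feature maps, then fusion of the spatial and temporal maps) can be sketched as follows. This is a sketch under assumptions: maps are plain 2-D lists of equal shape, the weights are supplied by the caller, and the fusion weight `alpha` stands in for the robustness comparison, whose exact rule the abstract does not give.

```python
def combine_maps(maps, weights):
    """Pixel-wise weighted sum of equally sized 2-D feature maps."""
    if len(maps) != len(weights) or not maps:
        raise ValueError("need one weight per map and at least one map")
    rows, cols = len(maps[0]), len(maps[0][0])
    return [[sum(w * m[r][c] for m, w in zip(maps, weights))
             for c in range(cols)]
            for r in range(rows)]


def fuse_saliency(spatial, temporal, alpha):
    """Blend spatial and temporal saliency maps.

    `alpha` in [0, 1] is a hypothetical weight on the spatial map; in
    the paper it would follow from comparing feature robustness.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return combine_maps([spatial, temporal], [alpha, 1.0 - alpha])
```

Setting `alpha` close to 1 trusts the spatial features more, which would be the natural choice for frames with little or unreliable motion.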
