Similar Literature
20 similar records found (search time: 46 ms)
1.
This paper presents a new attention model for detecting visual saliency in news video. In the proposed model, bottom-up (low-level) features and top-down (high-level) factors are used to compute bottom-up and top-down saliency respectively, and the two saliency maps are fused after a normalization operation. In the bottom-up attention model, we use the quaternion discrete cosine transform at multiple scales and in multiple color spaces to detect static saliency, while multi-scale local-motion and global-motion conspicuity maps are computed and integrated into a motion saliency map. To effectively suppress background motion noise, a simple histogram of average optical flow is adopted to calculate motion contrast. The bottom-up saliency map is then obtained by combining the static and motion saliency maps. In the top-down attention model, we utilize high-level stimuli in news video, such as faces, people, cars, speakers, and flashes, to generate the top-down saliency map. The proposed method has been extensively tested using three popular evaluation metrics over two widely used eye-tracking datasets. Experimental results demonstrate the effectiveness of our method for saliency detection in news videos compared to several state-of-the-art methods.
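
A minimal illustration of the static-saliency step: the paper uses a quaternion DCT at multiple scales over several color spaces, while the sketch below is a single-channel, single-scale DCT-signature stand-in that conveys the same spectral idea. The function name and parameters are illustrative, not the authors' code.

```python
import cv2
import numpy as np

def dct_signature_saliency(gray, size=64):
    """Single-channel DCT-signature saliency: a simplified stand-in for
    the paper's multi-scale quaternion DCT over several color spaces."""
    small = cv2.resize(gray, (size, size)).astype(np.float32)
    signature = np.sign(cv2.dct(small))               # keep only coefficient signs
    recon = cv2.idct(signature)                       # reconstruct from the signature
    sal = cv2.GaussianBlur(recon * recon, (9, 9), 3)  # square and smooth
    sal = cv2.resize(sal, (gray.shape[1], gray.shape[0]))
    return cv2.normalize(sal, None, 0.0, 1.0, cv2.NORM_MINMAX)

frame = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)  # illustrative input
static_map = dct_signature_saliency(frame)
```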

2.

Saliency prediction models provide a probabilistic map of the relative likelihood that an image or video region will attract the attention of the human visual system. Over the past decade, many computational saliency prediction models have been proposed for 2D images and videos. Considering that the human visual system evolved in a natural 3D environment, it is only natural to design visual attention models for 3D content. Existing monocular saliency models cannot accurately predict the attended regions of 3D image/video content because they do not incorporate depth information. This paper explores stereoscopic video saliency prediction by exploiting both low-level attributes such as brightness, color, texture, orientation, motion, and depth, as well as high-level cues such as face, person, vehicle, animal, text, and horizon. Our model starts with a rough segmentation and quantifies several intuitive observations, such as the effects of visual discomfort level, depth abruptness, motion acceleration, elements of surprise, size and compactness of the salient regions, and the emphasis on only a few salient objects in a scene. A new fovea-based model of spatial distance between image regions is adopted for local and global feature calculations. To efficiently fuse the conspicuity maps generated by our method into a single saliency map that is highly correlated with the eye-fixation data, a random-forest-based algorithm is utilized. The performance of the proposed saliency model is evaluated against the results of an eye-tracking experiment involving 24 subjects and an in-house database of 61 captured stereoscopic videos. Our stereo video database and the eye-tracking data are publicly available along with this paper. Experimental results show that the proposed saliency prediction method achieves competitive performance compared to state-of-the-art approaches.
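
The random-forest fusion stage admits a compact sketch: regress per-region conspicuity cues against eye-fixation density, then paint each prediction back into its region. The feature layout and training data below are placeholders, not the paper's actual cues or protocol.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical training data: one row per segmented region, columns are
# conspicuity cues (e.g., depth, motion, face, text, compactness, size),
# target is the eye-fixation density measured inside that region.
X_train = np.random.rand(500, 6)   # placeholder for real cue vectors
y_train = np.random.rand(500)      # placeholder for fixation densities

forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

# At test time: one saliency value per region, painted back into the
# region's pixels to form the fused saliency map.
region_cues = np.random.rand(40, 6)
region_saliency = forest.predict(region_cues)
```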


3.
To address the complexity of background modeling and the heavy computation involved in detecting moving objects under a moving camera, this paper proposes a motion-saliency-based method for moving-object detection under a moving camera that achieves accurate detection while avoiding complex background modeling. By emulating the attention mechanism of the human visual system, the method analyzes the motion characteristics of background and foreground during camera translation and computes the saliency of the video scene to detect moving objects in dynamic scenes. First, optical flow is used to extract the objects' motion features, and 2D Gaussian convolution suppresses the background's motion texture; next, histogram statistics measure the global saliency of the motion features, and foreground and background color information is extracted according to the resulting motion saliency map; finally, the motion saliency map is refined with a Bayesian method to obtain the salient moving objects. Experimental results on videos from public datasets show that the proposed method suppresses background motion noise while highlighting and accurately detecting the moving objects in the scene.
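
A sketch of the histogram-based global motion-saliency step, assuming dense Farneback optical flow supplies the motion feature; the Bayesian refinement and color-modeling stages are omitted.

```python
import cv2
import numpy as np

def motion_rarity_saliency(prev_gray, gray, bins=64):
    """Global motion saliency: pixels whose smoothed flow magnitude is
    globally rare are salient; Gaussian smoothing suppresses background
    motion texture, as described in the abstract."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag = np.linalg.norm(flow, axis=2)
    mag = cv2.GaussianBlur(mag, (15, 15), 0)        # 2D Gaussian convolution
    hist, edges = np.histogram(mag, bins=bins)
    prob = hist / hist.sum()                        # frequency of each bin
    idx = np.clip(np.digitize(mag, edges[1:-1]), 0, bins - 1)
    sal = 1.0 - prob[idx]                           # rarity = 1 - frequency
    return cv2.normalize(sal.astype(np.float32), None, 0, 1, cv2.NORM_MINMAX)
```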

4.
The segmentation of objects, and of people in particular, is an important problem in computer vision. In this paper, we focus on automatically segmenting a person from challenging video sequences in which we place no constraint on camera viewpoint, camera motion, or the movements of the person in the scene. Our approach uses the most confident predictions from a pose detector as anchor or keyframe stick-figure predictions that help guide the segmentation of other, more challenging frames in the video. Since even state-of-the-art pose detectors are unreliable on many frames (especially given that we are interested in segmentation with no camera or motion constraints), only the pose or stick-figure predictions for frames with the highest confidence in a localized temporal region anchor further processing. The stick-figure predictions within confident keyframes are used to extract color, position, and optical flow features. Multiple conditional random fields (CRFs) are used to process blocks of video in batches, using a two-dimensional CRF for detailed keyframe segmentation and 3D CRFs for propagating segmentations to the entire sequence of frames belonging to each batch. Location information derived from the pose is also used to refine the results. Importantly, no hand-labeled training data is required by our method. We discuss the use of a continuity method that reuses learnt parameters between batches of frames and show how pose predictions can also be improved by our model. We provide an extensive evaluation of our approach, comparing it with a variety of alternative GrabCut-based methods and a prior state-of-the-art method, and we release our evaluation data to the community to facilitate further experiments. We find that our approach yields state-of-the-art qualitative and quantitative performance compared to prior work and more heuristic alternatives.
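
The keyframe-anchoring rule reduces to a sliding-window argmax over per-frame detector confidence. A minimal sketch, assuming one scalar confidence per frame from any pose detector (the window size is illustrative):

```python
import numpy as np

def select_keyframes(confidences, window=15):
    """Pick frames whose pose-detector confidence is the maximum within
    a local temporal window; only these anchor the CRF segmentation."""
    conf = np.asarray(confidences, dtype=float)
    keyframes = []
    for t in range(len(conf)):
        lo, hi = max(0, t - window), min(len(conf), t + window + 1)
        if t == lo + int(np.argmax(conf[lo:hi])):   # local confidence winner
            keyframes.append(t)
    return keyframes

# e.g. select_keyframes(detector_scores) -> [12, 47, 80, ...]
```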

5.
This paper proposes a smoke-region segmentation method based on motion and luminance saliency detection, aiming to solve the problem that traditional motion detection methods are sensitive to non-salient motion such as swaying leaves and camera shake. A low-rank structured sparse decomposition extracts the foreground regions, after which smoke saliency is computed for further separation. We propose a saliency measure based on group-sparse Robust Orthonormal Subspace Learning (ROSL) with adaptive parameters. Experiments show that the method handles a wide range of smoke videos well and achieves good smoke detection results.
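
ROSL replaces the SVD inside robust PCA with orthonormal subspace learning for speed. As a stand-in, the sketch below performs the closely related low-rank + sparse decomposition via the inexact augmented Lagrangian method; the sparse term S then holds the candidate smoke foreground. This is not the paper's group-sparse, adaptive-parameter formulation.

```python
import numpy as np

def low_rank_sparse(D, tol=1e-6, max_iter=200):
    """Decompose a float data matrix D (pixels x frames) into a low-rank
    background L plus a sparse foreground S; a stand-in for ROSL."""
    lam = 1.0 / np.sqrt(max(D.shape))
    norm2 = np.linalg.norm(D, 2)
    Y = D / max(norm2, np.abs(D).max() / lam)       # dual variable init
    mu, rho = 1.25 / norm2, 1.5
    S = np.zeros_like(D)
    shrink = lambda X, t: np.sign(X) * np.maximum(np.abs(X) - t, 0)
    for _ in range(max_iter):
        U, s, Vt = np.linalg.svd(D - S + Y / mu, full_matrices=False)
        L = (U * shrink(s, 1.0 / mu)) @ Vt          # singular value thresholding
        S = shrink(D - L + Y / mu, lam / mu)        # soft-threshold the residual
        Z = D - L - S
        Y = Y + mu * Z
        mu *= rho
        if np.linalg.norm(Z) / np.linalg.norm(D) < tol:
            break
    return L, S
```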

6.
Extracting foreground objects from videos captured by a handheld camera has emerged as a new challenge. While existing approaches exploit clues such as depth and motion to extract the foreground layer, they have limitations in handling partial movement and cast shadows. In this paper, we bring a novel perspective to these two issues by utilizing the occlusion maps induced by object and camera motion and taking advantage of interactive image segmentation methods. For partial movement, we treat each video frame as an image and synthesize "seeding" user interactions (i.e., a user manually marking foreground and background) from both forward and backward occlusion maps, leveraging the advances in high-quality interactive image segmentation. For cast shadows, we utilize a paired-region-based shadow detection method to further refine the initial segmentation results by removing detected shadow regions. Both qualitative and quantitative experimental results on the Hopkins dataset demonstrate the effectiveness and efficiency of our proposed approach.
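
The occlusion maps can be approximated with a standard forward-backward flow consistency check: pixels whose forward flow is not undone by the backward flow are likely occluded or disoccluded. A minimal sketch with Farneback flow (the paper's flow method and seed-synthesis details are not specified here):

```python
import cv2
import numpy as np

def occlusion_map(f1_gray, f2_gray, thresh=1.5):
    """Mark pixels that fail the forward-backward flow consistency check."""
    fwd = cv2.calcOpticalFlowFarneback(f1_gray, f2_gray, None,
                                       0.5, 3, 15, 3, 5, 1.2, 0)
    bwd = cv2.calcOpticalFlowFarneback(f2_gray, f1_gray, None,
                                       0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = f1_gray.shape
    xs, ys = np.meshgrid(np.arange(w), np.arange(h))
    # Sample the backward flow at each pixel's forward-warped position.
    map_x = (xs + fwd[..., 0]).astype(np.float32)
    map_y = (ys + fwd[..., 1]).astype(np.float32)
    bwd_at_target = cv2.remap(bwd, map_x, map_y, cv2.INTER_LINEAR)
    err = np.linalg.norm(fwd + bwd_at_target, axis=2)  # should cancel to ~0
    return (err > thresh).astype(np.uint8)             # 1 = likely occluded
```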

7.
Obstacle detection and segmentation is an important task in environment perception for unmanned ground vehicles. To address the heavy computation and poor segmentation accuracy of traditional obstacle detection and segmentation algorithms, this paper proposes a saliency-analysis-based optimization algorithm for obstacle detection and segmentation. First, a frequency-tuned method generates the saliency map of the scene image; then, joint calibration of the monocular camera and the lidar maps the lidar returns onto the saliency map; finally, combining the information from the two sensors, an improved image-region segmentation algorithm performs obstacle detection and segmentation. To validate the proposed algorithm, multiple typical off-road scene images containing obstacles were collected for experiments and simulations, and the results confirm its effectiveness.
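
The frequency-tuned saliency step named in the abstract is short enough to show in full: in Lab space, saliency is each slightly blurred pixel's distance from the image's mean color. The camera-lidar calibration and region-segmentation stages are outside this sketch's scope.

```python
import cv2
import numpy as np

def frequency_tuned_saliency(bgr):
    """Achanta-style frequency-tuned saliency: per-pixel Lab distance
    from the image's mean Lab color."""
    lab = cv2.cvtColor(cv2.GaussianBlur(bgr, (5, 5), 0),
                       cv2.COLOR_BGR2LAB).astype(np.float32)
    mean = lab.reshape(-1, 3).mean(axis=0)          # global mean Lab color
    sal = np.linalg.norm(lab - mean, axis=2)
    return cv2.normalize(sal, None, 0.0, 1.0, cv2.NORM_MINMAX)
```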

8.
Objective: Stereoscopic video is increasingly popular for the immersive realism it provides, and visual saliency detection can automatically predict, locate, and mine important visual information, helping machines filter massive amounts of multimedia data effectively. To improve salient-region detection in stereoscopic video, this paper proposes a stereoscopic video saliency detection model that fuses multi-dimensional binocular perception characteristics. Method: Saliency is computed along three dimensions of stereoscopic video: spatial, depth, and temporal. First, a Bayesian model computes the 2D image saliency map from spatial image features; next, the depth saliency map is obtained from binocular perception features; then, the Lucas-Kanade optical flow method computes inter-frame local motion features to obtain the temporal saliency map; finally, the three saliency maps are fused by a method based on global-regional differences into the final distribution model of salient regions in the stereoscopic video. Results: Experiments on stereoscopic video sequences of different types show that the model achieves 80% precision and 72% recall at relatively low computational complexity, outperforming existing saliency detection models. Conclusion: The model effectively extracts the salient regions in stereoscopic video and can be applied to stereoscopic video/image coding, stereoscopic video/image quality assessment, and related fields.
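
A sketch of the temporal-saliency stage only, assuming pyramidal Lucas-Kanade flow on tracked corners whose motion magnitudes are splatted and smoothed into a dense map; the spatial, depth, and fusion stages are omitted.

```python
import cv2
import numpy as np

def lk_temporal_saliency(prev_gray, gray):
    """Sparse Lucas-Kanade motion magnitudes splatted into a dense map."""
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=500,
                                  qualityLevel=0.01, minDistance=7)
    sal = np.zeros(gray.shape, np.float32)
    if pts is None:
        return sal
    nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None)
    for p0, p1, ok in zip(pts.reshape(-1, 2), nxt.reshape(-1, 2),
                          status.ravel()):
        if ok:
            x, y = int(p1[0]), int(p1[1])
            if 0 <= x < sal.shape[1] and 0 <= y < sal.shape[0]:
                sal[y, x] = np.linalg.norm(p1 - p0)   # motion magnitude
    return cv2.GaussianBlur(sal, (31, 31), 0)         # densify the samples
```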

9.
In this paper, we propose a novel stereo method for registering foreground objects in a pair of thermal and visible videos of close-range scenes. In our stereo matching, we use Local Self-Similarity (LSS) as the similarity metric between thermal and visible images. In order to accurately assign disparities to depth discontinuities and occluded Regions Of Interest (ROIs), we integrate color and motion cues as soft constraints in an energy minimization framework. The optimal disparity map is approximated for image ROIs using a Belief Propagation (BP) algorithm. We tested our registration method on several challenging close-range indoor video frames of multiple people at different depths, with different clothing and different poses. We show that our global optimization algorithm significantly outperforms the existing state-of-the-art method, especially for disparity assignment of occluded people at different depths in close-range surveillance scenes and for relatively large camera baselines.

10.
王雪  李占山  陈海鹏 《软件学报》2022,33(9):3165-3179
U-Net-style encoder-decoder networks and their variants have achieved excellent performance in medical image semantic segmentation. However, these networks lose some spatial detail during feature extraction, which hurts segmentation accuracy, and their generalization and robustness in multi-modal medical image segmentation tasks are unsatisfactory. To address these problems, this paper proposes a saliency-guided, uncertainty-supervised deep convolutional encoder-decoder network for multi-modal medical image semantic segmentation. The algorithm uses an initially generated saliency map and an uncertainty probability map as supervision to optimize the segmentation network's parameters. First, a saliency detection network generates a saliency map that roughly locates the target region in the image; then, the set of pixels with uncertain classification is computed from the saliency map to produce an uncertainty probability map; finally, the saliency map and the uncertainty probability map are fed, together with the original image, into a multi-scale feature-fusion network, guiding the network to focus on learning features of the target region while strengthening its representation of uncertain regions and complex boundaries, thereby improving segmentation performance. Experimental results show that the algorithm captures richer semantic information, outperforms other semantic segmentation algorithms in multi-modal medical image tasks, and exhibits good generalization and robustness.
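
The uncertainty probability map can be illustrated directly: pixels whose preliminary foreground probability sits near the 0.5 decision boundary form the uncertain set. A minimal sketch, where prob is assumed to be the sigmoid output of any preliminary segmentation network (the band width is illustrative):

```python
import numpy as np

def uncertainty_map(prob, band=0.2):
    """Per-pixel uncertainty from a preliminary probability map: highest
    where the prediction is closest to the 0.5 decision boundary."""
    uncertainty = 1.0 - 2.0 * np.abs(prob - 0.5)   # 1 at p=0.5, 0 at p=0 or 1
    uncertain_mask = np.abs(prob - 0.5) < band     # the uncertain pixel set
    return uncertainty, uncertain_mask

# The saliency map and uncertainty map would then accompany the original
# image as inputs to the multi-scale feature-fusion network.
```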

11.
Pixel-level segmentation of objects in video is a research hotspot in computer vision, and unsupervised video segmentation without any user annotation places even higher demands on segmentation algorithms. Recent methods commonly model inter-frame motion information: optical flow or similar motion cues predict the object contour, which is then combined with color and other features to build a segmentation model. To address the foreground-background confusion and rough edges these methods produce, this paper proposes a video object segmentation method that incorporates a fully convolutional network (FCN). First, the FCN predicts the contours of salient objects in the video sequence, which are corrected using motion-saliency labels obtained from optical flow; then a temporal-spatial graph model is built and graph cuts produce the final predicted labels. Evaluations on the SegTrack v2 and DAVIS benchmark datasets show that the method clearly improves segmentation quality over methods based on inter-frame motion information alone.
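
A minimal sketch of the correction step, assuming the FCN outputs a per-pixel saliency score and optical-flow magnitude supplies the motion-saliency labels; the thresholds are illustrative, and the temporal-spatial graph cut itself is omitted.

```python
import numpy as np

def refine_with_motion(net_saliency, flow_mag, tau_net=0.5, tau_motion=1.0):
    """Correct the FCN's salient-object estimate with motion-saliency
    labels: agreement gives reliable seeds, the union gives the unknown
    region for the subsequent graph cut."""
    net_fg = net_saliency > tau_net
    motion_fg = flow_mag > tau_motion
    confident_fg = net_fg & motion_fg     # both cues agree: reliable seeds
    possible_fg = net_fg | motion_fg      # either cue fires: unknowns
    return confident_fg, possible_fg

# Seeds and unknowns would set the unary terms of the space-time graph cut.
```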

12.
A multimedia surveillance system aims to provide security and safety for people in a monitored space. However, due to the nature of surveillance, privacy-sensitive information such as face, gait, and other physical parameters can be revealed from the media captured by multiple sensors without the permission of the people who appear in the surveillance video, which has become a major concern. It is therefore desirable to have a mechanism that hides privacy-sensitive information as much as possible while still supporting effective surveillance tasks. In this article, we propose a chaos-cryptography-based data scrambling approach that can be applied to selected regions of interest (ROIs) in video camera footage that contain privacy-sensitive data. Our approach also supports multiple levels of abstraction of data hiding depending on the role of the authorized user. To evaluate the suitability of this approach, we applied our algorithm to video camera footage and observed that it is computationally efficient and can therefore be applied to real-time video surveillance tasks that must preserve privacy-sensitive information.
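
A minimal sketch of chaos-based ROI scrambling, using the logistic map as the keystream generator; the paper's exact chaotic system, key schedule, and multi-level abstraction scheme are not reproduced, and the parameters below are illustrative.

```python
import numpy as np

def logistic_keystream(n, x0=0.7, r=3.99):
    """Byte keystream from the logistic map x <- r*x*(1-x); (x0, r) act
    as the secret key, and chaotic sensitivity to x0 gives key sensitivity."""
    ks = np.empty(n, np.uint8)
    x = x0
    for i in range(n):
        x = r * x * (1.0 - x)
        ks[i] = int(x * 256) % 256
    return ks

def scramble_roi(frame, box, x0=0.7, r=3.99):
    """XOR-scramble one privacy-sensitive ROI of a uint8 frame in place;
    applying the same keystream again (same key) restores the pixels."""
    x0_, y0_, x1_, y1_ = box
    roi = frame[y0_:y1_, x0_:x1_]
    ks = logistic_keystream(roi.size, x0, r).reshape(roi.shape)
    frame[y0_:y1_, x0_:x1_] = roi ^ ks
    return frame
```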

13.
This paper presents a motion-based skin Region of Interest (ROI) detection method that uses a real-time connected-component labeling algorithm to provide real-time, adaptive skin ROI detection in video images. Skin-pixel segmentation in video images is a pre-processing step for face and hand gesture recognition, and motion is a cue for detecting foreground objects. We define skin ROIs as pixels of skin-like color where motion takes place. In the skin-color estimation phase, RGB color histograms are utilized to define the skin color distribution and specify the threshold for segmenting skin-like regions. A parallel connected-component labeling algorithm is also proposed to group the segmentation results into clusters; if a cluster covers any motion pixel, it is identified as a skin ROI. The method's results on real images are shown, and its speed is evaluated for various parameters. This technology is applicable to monitoring systems, scene understanding, and natural user interfaces.
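
The pipeline reduces to three steps: threshold skin-colored pixels, group them by connected-component labeling, and keep components that overlap motion. In the sketch below, the paper's RGB-histogram skin model and parallel labeler are replaced, for brevity, by fixed HSV bounds and OpenCV's sequential labeler:

```python
import cv2
import numpy as np

def skin_rois(bgr, motion_mask, min_area=200):
    """Connected skin-colored components that contain motion pixels."""
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
    skin = cv2.inRange(hsv, (0, 40, 60), (25, 180, 255))  # rough skin bounds
    n, labels, stats, _ = cv2.connectedComponentsWithStats(skin)
    rois = []
    for i in range(1, n):                                 # label 0 = background
        if stats[i, cv2.CC_STAT_AREA] < min_area:
            continue
        if np.any(motion_mask[labels == i]):              # motion cue check
            x, y, w, h = stats[i, :4]
            rois.append((x, y, w, h))
    return rois
```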

14.
周莺  张基宏  梁永生  柳伟 《计算机科学》2015,42(11):118-122
To extract the salient regions of a video as observed by the human eye more accurately and effectively, a spatiotemporal salient-region extraction method based on visual motion characteristics is proposed. The method first obtains the spatial saliency map by analyzing the frequency-domain log spectrum of each video frame, and the temporal saliency map via global motion estimation and block matching; it then dynamically fuses the spatiotemporal saliency maps according to the subjective perception of videos with different motion characteristics, reflecting the visual characteristics of human video viewing. The experimental analysis covers both subjective and objective criteria: visual inspection and quantitative metrics both show that, compared with other classic methods, the salient regions extracted by the proposed method more accurately reflect human visual fixation regions.
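
The frequency-domain log-spectrum analysis for the spatial map is consistent with a spectral-residual computation; the sketch below shows that reading of it (the global-motion-estimation temporal stage and the dynamic fusion are omitted):

```python
import cv2
import numpy as np

def spectral_residual_saliency(gray, size=64):
    """Spatial saliency from the frame's log amplitude spectrum: the
    residual after removing the locally averaged log spectrum."""
    img = cv2.resize(gray, (size, size)).astype(np.float32)
    f = np.fft.fft2(img)
    log_amp = np.log1p(np.abs(f))                   # log amplitude spectrum
    phase = np.angle(f)
    residual = log_amp - cv2.blur(log_amp, (3, 3))  # spectral residual
    sal = np.abs(np.fft.ifft2(np.exp(residual + 1j * phase))) ** 2
    sal = cv2.GaussianBlur(sal.astype(np.float32), (9, 9), 2.5)
    sal = cv2.resize(sal, (gray.shape[1], gray.shape[0]))
    return cv2.normalize(sal, None, 0.0, 1.0, cv2.NORM_MINMAX)
```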

15.
We present in this paper a Fuzzy Logic Controller (FLC) combined with a predictive algorithm to track an Unmanned Ground Vehicle (UGV) using an Unmanned Aerial Vehicle (UAV). The UAV is equipped with a downward-facing camera. The video stream is sent continuously to a ground station, where it is processed to extract the location of the UGV, and commands are sent back so that the UAV autonomously follows the UGV. To emulate an experienced UAV pilot, we propose a set of fuzzy-logic rules. A Double Exponential Smoothing algorithm is used to filter the measurements and give predicted values of the errors in the image plane. The FLC inputs are the filtered errors (UGV position) in the image plane and the derivatives of their predicted values; the outputs are pitch and roll commands to be sent to the UAV. We demonstrate the efficiency of the proposed controller experimentally and discuss the improvement in tracking results compared to our previous work.
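
Double Exponential Smoothing is a two-line recurrence, shown below for one error coordinate; the smoothing constants are illustrative, and in the described system the prediction and its derivative would feed the FLC.

```python
class DoubleExponentialSmoother:
    """Holt's double exponential smoothing: filters a measured UGV
    position error and extrapolates it one step ahead."""
    def __init__(self, alpha=0.5, beta=0.3):
        self.alpha, self.beta = alpha, beta   # illustrative constants
        self.level = None
        self.trend = 0.0

    def update(self, x):
        if self.level is None:                # initialize on first sample
            self.level = x
            return x
        prev = self.level
        self.level = self.alpha * x + (1 - self.alpha) * (self.level + self.trend)
        self.trend = self.beta * (self.level - prev) + (1 - self.beta) * self.trend
        return self.level + self.trend        # one-step-ahead prediction

# smoother = DoubleExponentialSmoother(); predicted_err = smoother.update(err_px)
```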

16.
Because lung-cancer PET images are of low quality and the regions to be segmented show no clear gray-level difference at their boundaries, image segmentation algorithms based on color features cannot segment them effectively. This paper proposes a lung-cancer PET image segmentation algorithm that combines pseudo-color mapping with context-aware saliency. First, the original lung-cancer PET image is converted to a pseudo-color image via a color lookup table; then an improved context-aware model produces the saliency map of the pseudo-color image, which is binarized with Otsu's method to initialize the segmentation region; finally, an improved GrabCut algorithm iteratively segments the image. Applied to lung-cancer PET image segmentation, the algorithm improves both segmentation efficiency and accuracy, eliminates the user interaction required by the GrabCut and Snake algorithms, and fully automates segmentation, offering good reliability, execution efficiency, and practical value.
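
A condensed sketch of the pipeline's skeleton: color-lookup-table pseudo-coloring, Otsu binarization, and mask-initialized GrabCut. The context-aware saliency stage is stubbed out here (we binarize raw intensities instead), and the file path and colormap choice are illustrative.

```python
import cv2
import numpy as np

# Load a grayscale PET slice (path and colormap are illustrative).
pet = cv2.imread("pet_slice.png", cv2.IMREAD_GRAYSCALE)
pseudo = cv2.applyColorMap(pet, cv2.COLORMAP_JET)    # color-lookup-table step

# Stand-in for the context-aware saliency stage: Otsu on raw intensities.
_, binary = cv2.threshold(pet, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Initialize GrabCut from the Otsu mask instead of a user rectangle,
# which is what removes the manual interaction.
mask = np.where(binary > 0, cv2.GC_PR_FGD, cv2.GC_PR_BGD).astype(np.uint8)
bgd, fgd = np.zeros((1, 65), np.float64), np.zeros((1, 65), np.float64)
cv2.grabCut(pseudo, mask, None, bgd, fgd, 5, cv2.GC_INIT_WITH_MASK)
segmented = np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD),
                     255, 0).astype(np.uint8)
```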

17.

In recent years, significant progress has been achieved in the field of visual saliency modeling. Our focus is video saliency, which differs substantially from image saliency and can be better detected by adding gaze information from eye movements while people are watching the video. In this paper we propose a novel gaze-based saliency method to predict video attention, inspired by the widespread use of mobile smart devices with cameras. It is a non-contact method for predicting visual attention and places no extra burden on the hardware. Our method first extracts bottom-up saliency maps from the video frames, then constructs the mapping from eye images, captured by the camera in synchronization with the video frames, to the screen region. Finally, the top-down gaze information and the bottom-up saliency maps are combined by point-wise multiplication to predict video attention. The proposed approach is validated on two datasets, the public MIT dataset and a dataset we collected, against four common methods, and the experimental results show that our method achieves state-of-the-art performance.
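
The fusion rule itself is literally point-wise multiplication. A minimal sketch, assuming the gaze has already been mapped to screen coordinates (the eye-image-to-screen mapping is the paper's other component) and modeling top-down attention as a Gaussian around the gaze point with an illustrative spread:

```python
import cv2
import numpy as np

def fuse_gaze_saliency(bottom_up, gaze_xy, sigma=60.0):
    """Point-wise multiplication of a bottom-up saliency map with a
    Gaussian top-down map centered on the mapped gaze point."""
    h, w = bottom_up.shape
    ys, xs = np.mgrid[0:h, 0:w]
    gx, gy = gaze_xy
    gaze_map = np.exp(-((xs - gx) ** 2 + (ys - gy) ** 2) / (2 * sigma ** 2))
    fused = bottom_up * gaze_map                 # the stated fusion rule
    return cv2.normalize(fused.astype(np.float32), None, 0, 1,
                         cv2.NORM_MINMAX)
```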


18.
A novel video saliency detection method is proposed. To extract high-confidence features from a video sequence, a simple-frame selection criterion based on each input frame and its initial saliency map is introduced; it picks out the frames of the video sequence from which foreground objects can be extracted easily and accurately, yielding robust foreground/background labels from those simple frames. The images are segmented into superpixels, and spatiotemporal features together with the foreground labels are fed into an ensemble learning model; after multi-kernel SVM ensemble learning, a pixel-level saliency map is generated and propagated to the whole video via motion features. Experimental results on various video sequences show that the algorithm outperforms traditional saliency detection algorithms both qualitatively and quantitatively.
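
The exact simple-frame criterion is not spelled out in the abstract; one plausible reading, sketched below with an illustrative threshold, scores a frame by how cleanly its initial saliency map separates foreground from background.

```python
import numpy as np

def frame_simplicity(init_sal, tau=0.5):
    """A plausible 'simple frame' score (not the paper's exact rule):
    high when the initial saliency map separates cleanly into a bright
    foreground and a dark background."""
    fg = init_sal > tau
    if fg.sum() == 0 or (~fg).sum() == 0:
        return 0.0
    return float(init_sal[fg].mean() - init_sal[~fg].mean())

# Frames with the highest scores would supply the foreground/background
# training labels for the multi-kernel SVM ensemble.
```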

19.
To improve the accuracy of salient-region detection in video under complex background and motion conditions, this paper proposes a new spatiotemporal-consistency optimization model and, building on color spatial-distribution and motion spatial-distribution features combined with that optimization, a new spatiotemporal salient-region detection model. Video frames are first segmented into superpixels; three complementary superpixel-level color spatial-distribution features and two motion spatial-distribution features are then extracted, and spatiotemporal consistency is used to fuse and optimize the spatial and temporal saliency features into a spatial saliency map and a temporal saliency map respectively. In the spatiotemporal fusion stage, the spatiotemporal-consistency model fuses spatial and temporal saliency into a superpixel-level spatiotemporal saliency map; to further improve detection accuracy and completeness, an energy-minimization model then yields a more precise pixel-level spatiotemporal saliency map. Compared with state-of-the-art video saliency models, the proposed algorithm achieves higher accuracy and is robust to complex backgrounds and motion.
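
One of the color spatial-distribution features can be sketched as follows: a color scattered widely across the frame is unlikely to be salient, so saliency falls with the spatial variance of color-similar superpixels. The SLIC parameters and similarity bandwidth are illustrative, and the motion features and consistency optimization are omitted.

```python
import numpy as np
from skimage.segmentation import slic

def color_spatial_distribution(rgb):
    """Superpixel-level color spatial-distribution cue: saliency decreases
    with the spatial variance of color-similar superpixels."""
    labels = slic(rgb, n_segments=300, compactness=10, start_label=0)
    n = labels.max() + 1
    h, w = labels.shape
    ys, xs = np.mgrid[0:h, 0:w]
    color = np.array([rgb[labels == i].mean(axis=0) for i in range(n)])
    cx = np.array([xs[labels == i].mean() for i in range(n)]) / w
    cy = np.array([ys[labels == i].mean() for i in range(n)]) / h
    sal_sp = np.empty(n)
    for i in range(n):
        # Weight all superpixels by color similarity to superpixel i, then
        # measure the spatial variance of that weighted distribution.
        wgt = np.exp(-np.linalg.norm(color - color[i], axis=1) / 30.0)
        wgt /= wgt.sum()
        mx, my = (wgt * cx).sum(), (wgt * cy).sum()
        var = (wgt * ((cx - mx) ** 2 + (cy - my) ** 2)).sum()
        sal_sp[i] = np.exp(-20.0 * var)       # low spread -> high saliency
    return sal_sp[labels]                     # paint back to pixel level
```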

20.
It is a challenging task for ordinary users to capture selfies with a good scene composition, given the limited freedom to position the camera. Creative hardware (e.g., selfie sticks) and software (e.g., panoramic selfie apps) solutions have been proposed to extend the background coverage of a selfie, but achieving a perfect composition on the spot when the selfie is captured remains difficult. In this paper, we propose a system that allows the user to shoot a selfie video by rotating the body first, then produce a final panoramic selfie image with user-guided scene composition as postprocessing. Our key technical contribution is a fully automatic, robust multi-frame segmentation and stitching framework that is tailored to the special characteristics of selfie images. We analyze the sparse feature points and employ a spatial-temporal optimization for bilayer feature segmentation, which leads to more reliable background alignment than previous image stitching techniques. The sparse classification is then propagated to all pixels to create dense foreground masks for person-background composition. Finally, based on a user-selected foreground position, our system uses content-preserving warping to produce a panoramic selfie with minimal distortion to the face region. Experimental results show that our approach reliably generates high-quality panoramic selfies, while a simple combination of previous image stitching and segmentation approaches often fails.
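
The bilayer feature segmentation can be roughed out with a RANSAC homography: sparse matches consistent with a single background homography are background, and the rest are candidate person/foreground features. The paper's spatial-temporal optimization refines this; the sketch below is only such an initialization, with illustrative parameters.

```python
import cv2
import numpy as np

def bilayer_feature_split(gray1, gray2):
    """Rough bilayer split of sparse features between two frames:
    RANSAC inliers of one homography = background, outliers = candidate
    foreground (the person)."""
    orb = cv2.ORB_create(2000)
    k1, d1 = orb.detectAndCompute(gray1, None)
    k2, d2 = orb.detectAndCompute(gray2, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(d1, d2)
    src = np.float32([k1[m.queryIdx].pt for m in matches])
    dst = np.float32([k2[m.trainIdx].pt for m in matches])
    H, inliers = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)
    bg_pts = src[inliers.ravel() == 1]   # consistent with background motion
    fg_pts = src[inliers.ravel() == 0]   # candidate foreground features
    return bg_pts, fg_pts, H
```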
