Similar Documents
20 similar documents found (search time: 93 ms)
1.
董远  张纪伟  赵楠  常晓夫  刘巍 《中国通信》2012,9(8):105-121
The rapid growth of multimedia content necessitates powerful technologies to filter, classify, index and retrieve video documents more efficiently. However, the essential bottleneck of image and video analysis is the semantic gap: low-level features extracted by computers often fail to coincide with the high-level concepts interpreted by humans. In this paper, we present a generic scheme for detecting video semantic concepts based on multiple visual features and machine learning. Various global and local low-level visual features are systematically investigated, and a kernel-based learning method equips the concept detection system to explore the potential of these features. We then combine the different features and sub-systems through both classifier-level and kernel-level fusion, which contributes to a more robust system. Our proposed system is tested on the TRECVID dataset. The resulting Mean Average Precision (MAP) score is much better than the benchmark performance, which shows that our concept detection engine provides a generic model and performs well on both object- and scene-type concepts.
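The kernel-level fusion step can be illustrated with a short sketch: two base kernels computed on different feature channels are averaged into a single precomputed kernel for an SVM. This is a minimal illustration of the general idea, not the paper's implementation; the synthetic feature matrices and the uniform 0.5/0.5 fusion weights are placeholders.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel, chi2_kernel
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Placeholder features: a global channel (e.g., color histogram) and a local one (e.g., SIFT bag-of-words).
X_global_tr, X_global_te = rng.random((100, 64)), rng.random((20, 64))
X_local_tr, X_local_te = rng.random((100, 500)), rng.random((20, 500))
y_tr = rng.integers(0, 2, 100)

# One kernel per feature channel; chi-squared is a common choice for histogram features.
K_tr = 0.5 * rbf_kernel(X_global_tr) + 0.5 * chi2_kernel(X_local_tr)
K_te = 0.5 * rbf_kernel(X_global_te, X_global_tr) + 0.5 * chi2_kernel(X_local_te, X_local_tr)

clf = SVC(kernel="precomputed").fit(K_tr, y_tr)
print(clf.predict(K_te))  # concept present / absent per test shot
```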

2.
Video semantic detection is a research hotspot in the field of human-computer interaction. In sparse representation of video features, features from videos of the same category may fail to achieve similar coding results. To address this, Locality-Sensitive Discriminant Sparse Representation (LSDSR) is developed so that video samples belonging to the same category are encoded as similar sparse codes, giving them better category discrimination. In the LSDSR, a discriminative loss function based on sparse coefficients is imposed on the locality-sensitive sparse representation, which makes the dictionary optimized for sparse representation discriminative. The LSDSR thus enhances the power of semantic discrimination in optimizing the dictionary and builds a better discriminant sparse model. Moreover, to further improve the accuracy of video semantic detection after sparse representation, a weighted K-Nearest Neighbor (KNN) classification method, whose loss function integrates reconstruction error and discrimination for the sparse representation, is adopted to detect video semantic concepts. The proposed methods are evaluated on related video databases in comparison with existing sparse representation methods. The experimental results show that the proposed methods significantly enhance the discriminative power of video features and consequently improve the accuracy of video semantic concept detection.
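A rough sketch of the classification stage: features are sparse-coded over a dictionary, and a distance-weighted KNN vote over the sparse codes is combined with a per-class reconstruction-error penalty. The random dictionary, the exponential weighting, and the trade-off weight are illustrative assumptions, not the LSDSR formulation from the paper.

```python
import numpy as np
from sklearn.decomposition import sparse_encode

rng = np.random.default_rng(1)
D = rng.standard_normal((32, 128))          # placeholder dictionary: 32 atoms, 128-dim features
D /= np.linalg.norm(D, axis=1, keepdims=True)
X_train, y_train = rng.standard_normal((60, 128)), rng.integers(0, 3, 60)
x = rng.standard_normal(128)                # query video feature

codes_train = sparse_encode(X_train, D, algorithm="omp", n_nonzero_coefs=5)
code_x = sparse_encode(x[None], D, algorithm="omp", n_nonzero_coefs=5)[0]

# Distance-weighted KNN vote in sparse-code space.
d = np.linalg.norm(codes_train - code_x, axis=1)
nn = np.argsort(d)[:7]
votes = np.zeros(3)
for i in nn:
    votes[y_train[i]] += np.exp(-d[i])

# Per-class reconstruction error using that class's neighbors' codes (huge if class absent).
recon_err = np.full(3, 1e6)
for c in range(3):
    idx = nn[y_train[nn] == c]
    if len(idx):
        recon_err[c] = np.linalg.norm(x - codes_train[idx].mean(0) @ D)

score = votes - 0.05 * recon_err            # trade-off weight is an arbitrary choice
print("predicted class:", int(np.argmax(score)))
```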

3.
Media aesthetic assessment is a key technique in computer vision that is widely applied in computer game rendering and video/image classification. Video aesthetic assessment algorithms based on fusing low-level and high-level features have achieved impressive performance, outperforming photo- and motion-based algorithms; however, these methods focus only on the aesthetic features of single frames while ignoring the inherent relationship between adjacent frames. We therefore propose a novel video aesthetic assessment framework in which structural cues among frames are explicitly encoded. Our method consists of two components: aesthetic feature extraction and structure correlation construction. More specifically, we incorporate both low-level and high-level visual features to construct aesthetic features, with salient regions extracted for content understanding. Subsequently, we develop a structure correlation-based algorithm to evaluate the relationship among adjacent frames, where frames with similar structural properties should have a strong correlation coefficient. Afterwards, a kernel multi-SVM is trained for video classification and high-aesthetic video selection. Comprehensive experiments demonstrate the effectiveness of our method.
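As an illustration of the structure-correlation idea, the sketch below scores adjacent frames by the Pearson correlation of their structural feature vectors; the feature extractor itself is assumed (random placeholders here), not taken from the paper.

```python
import numpy as np

def structure_correlation(feats: np.ndarray) -> np.ndarray:
    """Pearson correlation between consecutive frames' structural features.

    feats: (n_frames, d) array, one structural descriptor per frame.
    Returns (n_frames - 1,) correlation coefficients in [-1, 1].
    """
    a, b = feats[:-1], feats[1:]
    a = a - a.mean(axis=1, keepdims=True)
    b = b - b.mean(axis=1, keepdims=True)
    num = (a * b).sum(axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    return num / np.maximum(den, 1e-12)

feats = np.random.default_rng(2).random((10, 64))  # placeholder per-frame descriptors
print(structure_correlation(feats))  # high values = structurally similar neighbors
```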

4.
In this paper, we propose a novel and robust method for fast and accurate shot boundary detection (SBD) whose design philosophy is based on human perceptual rules and the well-known "Information Seeking Mantra". By adopting a top-down approach, redundant video processing is avoided, and high shot boundary detection accuracy is obtained at significantly low computational cost. Objects within shots are detected via local image features and used to reveal visual discontinuities between shots. The proposed method can detect all types of gradual transitions as well as abrupt changes. Another important feature is that the method is fully generic: it can be applied to any video content without requiring any training or tuning in advance. Furthermore, it allows user interaction to direct the SBD process to the user's "Region of Interest" or to stop it once satisfactory results are obtained. Experimental results demonstrate that the proposed algorithm achieves superior computational times compared to state-of-the-art methods without sacrificing performance.
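To make the discontinuity idea concrete, here is a minimal sketch that flags a shot boundary when few local-feature matches survive between consecutive frames. ORB with brute-force Hamming matching stands in for the paper's unspecified local features, and the match-count threshold and the "input.mp4" file name are assumptions.

```python
import cv2

def is_shot_boundary(frame_a, frame_b, min_matches: int = 20) -> bool:
    """Declare a boundary when consecutive frames share few local-feature matches."""
    orb = cv2.ORB_create(nfeatures=500)
    _, des_a = orb.detectAndCompute(cv2.cvtColor(frame_a, cv2.COLOR_BGR2GRAY), None)
    _, des_b = orb.detectAndCompute(cv2.cvtColor(frame_b, cv2.COLOR_BGR2GRAY), None)
    if des_a is None or des_b is None:
        return True  # no repeatable structure at all: treat as a discontinuity
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    return len(matcher.match(des_a, des_b)) < min_matches

cap = cv2.VideoCapture("input.mp4")  # hypothetical input file
ok, prev = cap.read()
idx = 1
while ok:
    ok, cur = cap.read()
    if not ok:
        break
    if is_shot_boundary(prev, cur):
        print(f"boundary at frame {idx}")
    prev, idx = cur, idx + 1
```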

5.
Many video fingerprints have been proposed to handle video transformation problems when the original content is copied and redistributed. However, most of them do not take flipping and rotation transformations into account. In this paper, we propose a novel video fingerprint based on region binary patterns, aiming to realize robust and fast video copy detection against video transformations including rotation and flipping. We extract two complementary region binary patterns from several rings in keyframes. These two kinds of binary patterns are converted into a new type of pattern for the proposed video fingerprint, which is robust against rotation and flipping. The experimental results demonstrate that the proposed video fingerprint is effective for video copy detection, particularly in the case of rotation and flipping. Furthermore, our experiments show that the proposed method allows for high storage efficiency and low computational complexity, making it suitable for a practical video copy detection system.
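The rotation robustness of ring regions is easy to see in code: pixels are pooled over concentric rings around the frame center, so rotating the frame leaves each ring's statistics unchanged. The sketch below derives a simple binary pattern by comparing each ring's mean intensity to the global mean; this is an illustrative stand-in for the paper's two complementary patterns.

```python
import numpy as np

def ring_binary_pattern(gray: np.ndarray, n_rings: int = 8) -> int:
    """Bit i is set if ring i is brighter than the frame average (rotation-invariant)."""
    h, w = gray.shape
    yy, xx = np.mgrid[0:h, 0:w]
    r = np.hypot(yy - h / 2, xx - w / 2)
    edges = np.linspace(0, r.max() + 1e-9, n_rings + 1)
    bits = 0
    mean_all = gray.mean()
    for i in range(n_rings):
        ring = gray[(r >= edges[i]) & (r < edges[i + 1])]
        if ring.size and ring.mean() > mean_all:
            bits |= 1 << i
    return bits

frame = np.random.default_rng(3).integers(0, 256, (120, 160)).astype(np.float32)
print(f"fingerprint bits: {ring_binary_pattern(frame):08b}")
```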

6.
Most semantic video search methods use text-keyword queries or example video clips and images, but such methods have limitations. To address the problems of example-based video search approaches and avoid the use of specialized models, we conduct semantic video search using a reranking method that automatically reorders the initial text search results based on visual cues and associated context. We developed two general reranking methods that explore the recurrent visual patterns in many contexts, such as the images or video shots returned by initial text queries, and video stories from multiple channels.
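A minimal sketch of visual reranking in the pseudo-relevance-feedback style: take the top results of the initial text search as noisy positives, build a visual centroid, and reorder the whole result list by similarity to it. This illustrates the general reranking idea rather than the two specific methods in the paper; the features and top-k budget are placeholders.

```python
import numpy as np

def visual_rerank(text_ranked_ids, visual_feats, top_k: int = 10):
    """Reorder text-search results by visual similarity to the top-k results' centroid."""
    centroid = visual_feats[text_ranked_ids[:top_k]].mean(axis=0)
    centroid /= np.linalg.norm(centroid) + 1e-12
    feats = visual_feats[text_ranked_ids]
    feats = feats / (np.linalg.norm(feats, axis=1, keepdims=True) + 1e-12)
    scores = feats @ centroid                 # cosine similarity to the visual consensus
    return [text_ranked_ids[i] for i in np.argsort(-scores)]

rng = np.random.default_rng(14)
feats = rng.random((100, 128))                # placeholder shot features
initial = list(range(30))                     # ids in initial text-ranking order
print(visual_rerank(initial, feats)[:5])
```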

7.
By analyzing the visual differences between cartoon and non-cartoon videos, eight groups of visual features, including MPEG-7 descriptors, are extracted from video clips to construct a feature space for cartoon video. Active relevance feedback is then introduced into the Support Vector Machine (SVM) algorithm to design an active-learning-based method for detecting and classifying cartoon video. Experiments on a large number of real video clips show that the selected features discriminate well between cartoon and non-cartoon video, and that the proposed algorithm has a clear advantage in detection performance over both the plain SVM algorithm and the combination of traditional relevance feedback with SVM.
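A minimal sketch of the active-learning loop described above: an SVM is trained on a small labeled seed set, the unlabeled clips closest to the decision boundary are queried for labels (simulated here by an oracle array), and the model is retrained. The synthetic features, seed size, and query budget are placeholders.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(4)
X = rng.standard_normal((300, 40))            # placeholder clip features
oracle = (X[:, 0] + 0.3 * rng.standard_normal(300) > 0).astype(int)  # simulated labels

labeled = list(range(20))                     # small initial seed set
pool = [i for i in range(300) if i not in labeled]

for _ in range(5):
    clf = SVC(kernel="rbf").fit(X[labeled], oracle[labeled])
    # Query the pool samples nearest the margin (most uncertain).
    margin = np.abs(clf.decision_function(X[pool]))
    ask = [pool[i] for i in np.argsort(margin)[:10]]
    labeled += ask                            # oracle supplies their labels
    pool = [i for i in pool if i not in ask]

print("final training-set accuracy:", clf.score(X[labeled], oracle[labeled]))
```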

8.
Video summarization can facilitate rapid browsing and efficient video indexing in many applications. A good summary should maintain the semantic interestingness and diversity of the original video. While many previous methods extract key frames based on low-level features, this study proposes memorability-entropy-based video summarization. The proposed method focuses on creating semantically interesting summaries based on image memorability, and image entropy is introduced to maintain the diversity of the summary. In the proposed framework, perceptual-hashing-based mutual information (MI) is used for shot segmentation. We then use a large annotated image memorability dataset to fine-tune Hybrid-AlexNet, predict the memorability score with the fine-tuned deep network, and calculate the entropy value of the images. The frame with the maximum memorability score and entropy value in each shot is selected to constitute the video summary. Finally, our method is evaluated on a benchmark dataset that comes with five human-created summaries; it generates high-quality results comparable to human-created summaries and conventional methods.
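The entropy term is straightforward to reproduce: Shannon entropy of the grayscale intensity histogram. The sketch below combines it with a memorability score (faked with random numbers here, standing in for the fine-tuned network's output) to pick one key frame per shot; the equal weighting of the two terms is an assumption.

```python
import numpy as np

def image_entropy(gray: np.ndarray) -> float:
    """Shannon entropy (bits) of an 8-bit grayscale image's intensity histogram."""
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

rng = np.random.default_rng(5)
shot = rng.integers(0, 256, (30, 90, 120))   # 30 placeholder frames of one shot
memorability = rng.random(30)                # stand-in for the network's predictions

scores = [memorability[i] + image_entropy(shot[i]) / 8.0 for i in range(30)]  # 8 bits = max entropy
print("key frame of this shot:", int(np.argmax(scores)))
```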

9.
AN HMM BASED ANALYSIS FRAMEWORK FOR SEMANTIC VIDEO EVENTS
Semantic video analysis plays an important role in the field of machine intelligence and pattern recognition. In this paper, based on the Hidden Markov Model (HMM), a semantic recognition framework for compressed videos is proposed to analyze video events according to six low-level features. After a detailed analysis of video events, the pattern of global motion and five features of the foreground (the principal parts of videos) are employed as the observations of the Hidden Markov Model to classify events in videos. Applications of the proposed framework to several video event detection tasks demonstrate its promise for semantic video analysis.
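As a sketch of HMM-based event classification, one Gaussian HMM can be trained per event class on its observation sequences, and a new sequence assigned to the class whose model gives the highest log-likelihood. This uses the third-party hmmlearn package, synthetic six-dimensional observations, and made-up event names, all assumptions on my part.

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM  # third-party: pip install hmmlearn

rng = np.random.default_rng(6)

def make_sequences(offset, n_seq=10, length=40, dim=6):
    """Synthetic 6-D observation sequences (global motion + 5 foreground features)."""
    return [offset + rng.standard_normal((length, dim)) for _ in range(n_seq)]

models = {}
for event, offset in [("goal", 0.0), ("foul", 2.0)]:   # hypothetical event classes
    seqs = make_sequences(offset)
    X = np.vstack(seqs)
    lengths = [len(s) for s in seqs]
    models[event] = GaussianHMM(n_components=3, covariance_type="diag", n_iter=50).fit(X, lengths)

test = make_sequences(2.0, n_seq=1)[0]
pred = max(models, key=lambda e: models[e].score(test))
print("predicted event:", pred)   # expected: "foul"
```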

10.
Lane detection is an important task in road environment perception for autonomous driving. Deep learning methods based on semantic segmentation have been successfully applied to lane detection, but they incur considerable computational cost due to their high complexity. Lane detection can be treated as a particular semantic segmentation task because lane markings have a long, continuous shape that provides prior structural information. Most traditional CNNs are designed for the representation learning of semantic information, so this prior structural information is not fully exploited. In this paper, we propose a recurrent slice convolution module (RSCM) to exploit the prior structural information of lane markings. The proposed RSCM is a special recurrent network structure composed of several slice convolution units (SCUs); it obtains stronger semantic representations through the propagation of prior structural information in the SCUs. Furthermore, we design a distance loss that accounts for the prior structure of lane markings. The lane detection network can be trained more stably via an overall loss function that combines the segmentation loss with this distance loss. Experimental results show the effectiveness of our method: we achieve excellent computational efficiency while keeping decent detection quality on lane detection benchmarks, with a computational cost much lower than state-of-the-art methods.
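To convey the slice-convolution idea, the sketch below propagates information row by row down a feature map: each row is updated with a shared 1-D convolution of the previous row, in the spirit of SCNN-style spatial message passing. This is my reading of the general technique, not the paper's exact RSCM/SCU definition.

```python
import torch
import torch.nn as nn

class SliceConvDown(nn.Module):
    """Top-to-bottom slice convolution: row i receives a transformed copy of row i-1."""
    def __init__(self, channels: int, ksize: int = 9):
        super().__init__()
        self.conv = nn.Conv1d(channels, channels, ksize, padding=ksize // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (N, C, H, W)
        rows = list(x.unbind(dim=2))                      # H tensors of shape (N, C, W)
        for i in range(1, len(rows)):
            rows[i] = rows[i] + torch.relu(self.conv(rows[i - 1]))
        return torch.stack(rows, dim=2)

feat = torch.randn(1, 16, 36, 100)   # placeholder backbone feature map
out = SliceConvDown(16)(feat)
print(out.shape)                     # torch.Size([1, 16, 36, 100])
```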

11.
Anomaly detection is a challenging task in the field of intelligent video surveillance. It aims to identify anomalous events by monitoring the video captured by visual sensors. The main difficulty of this task is that the definition of anomalies is ambiguous. In recent years, most anomaly detection methods have used a two-stage learning strategy, i.e., feature extraction followed by model building. In this paper, building on the idea of reconstruction, we propose an end-to-end anomaly detection framework using cycle-consistent adversarial networks (CycleGAN). Dynamic skeleton features are used as network constraints to alleviate the inaccurate feature extraction of a single generative adversarial network. In the training phase, only normal video frames and the corresponding skeleton features are used to train the generator and discriminator. In the testing phase, anomalous behaviors, which yield high reconstruction errors, are filtered out by manually set thresholds. To the best of our knowledge, this is the first time CycleGAN has been used for video anomaly detection. Experimental results on challenging datasets show that our method can accurately detect anomalous behaviors in videos collected by video surveillance systems and is comparable to current state-of-the-art methods.
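The test-time decision rule reduces to thresholding per-frame reconstruction error. Below is a minimal sketch, assuming a trained generator is available as a callable (faked here with an identity-plus-noise stand-in) and using mean squared error with an arbitrary threshold.

```python
import numpy as np

def detect_anomalies(frames: np.ndarray, generator, threshold: float = 0.05):
    """Flag frames whose reconstruction error exceeds a manually set threshold.

    frames: (n, H, W) array in [0, 1]; generator: model mapping a frame to its
    reconstruction (a placeholder stand-in below, not a real trained network).
    """
    flags, errors = [], []
    for f in frames:
        err = float(np.mean((f - generator(f)) ** 2))
        errors.append(err)
        flags.append(err > threshold)
    return np.array(flags), np.array(errors)

rng = np.random.default_rng(7)
frames = rng.random((5, 64, 64))
fake_generator = lambda f: np.clip(f + 0.01 * rng.standard_normal(f.shape), 0, 1)
flags, errors = detect_anomalies(frames, fake_generator)
print(flags, errors.round(4))
```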

12.
A compressed-domain video saliency detection algorithm, which employs global and local spatiotemporal (GLST) features, is proposed in this work. We first conduct partial decoding of a compressed video bitstream to obtain motion vectors and DCT coefficients, from which GLST features are extracted. More specifically, we extract the spatial features of rarity, compactness, and center prior from DC coefficients by investigating the global color distribution in a frame. We also extract the spatial feature of texture contrast from AC coefficients to identify regions whose local textures are distinct from those of neighboring regions. Moreover, we use the temporal features of motion intensity and motion contrast to detect visually important motions. Then, we generate spatial and temporal saliency maps, respectively, by linearly combining the spatial features and the temporal features. Finally, we fuse the two saliency maps into a spatiotemporal saliency map adaptively by comparing the robustness of the spatial features with that of the temporal features. Experimental results demonstrate that the proposed algorithm provides excellent saliency detection performance while requiring low complexity, and it thus performs detection in real time.
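The final fusion step can be sketched as a convex combination of the two maps whose weight is driven by a robustness score for each cue. How robustness is measured here (map concentration, i.e., inverse entropy) is my placeholder, not the paper's criterion.

```python
import numpy as np

def fuse_saliency(spatial: np.ndarray, temporal: np.ndarray) -> np.ndarray:
    """Adaptively blend two saliency maps by their concentration (peakier = more robust)."""
    def robustness(m):
        p = m.ravel() / max(m.sum(), 1e-12)
        entropy = -(p[p > 0] * np.log(p[p > 0])).sum()
        return 1.0 / (1.0 + entropy)        # low-entropy (focused) maps score higher

    ws, wt = robustness(spatial), robustness(temporal)
    alpha = ws / (ws + wt)
    return alpha * spatial + (1 - alpha) * temporal

rng = np.random.default_rng(8)
fused = fuse_saliency(rng.random((36, 64)), rng.random((36, 64)))
print(fused.shape, float(fused.max()))
```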

13.
Content-based image retrieval (CBIR) is a valuable computer vision technique that is increasingly being applied in the medical community for diagnosis support. However, traditional CBIR systems only deliver visual outputs, i.e., images with an appearance similar to the query, which are not directly interpretable by physicians. Our objective is to provide a system for endomicroscopy video retrieval that delivers both visual and semantic outputs that are consistent with each other. In a previous study, we developed an adapted bag-of-visual-words method for endomicroscopy retrieval, called "Dense-Sift," that computes a visual signature for each video. In this paper, we present a novel approach that complements visual similarity learning with semantic knowledge extraction in the field of in vivo endomicroscopy. We first leverage a semantic ground truth based on eight binary concepts in order to transform these visual signatures into semantic signatures that reflect how much the presence of each semantic concept is expressed by the visual words describing the videos. Using cross-validation, we demonstrate that, in terms of semantic detection, our intuitive Fisher-based method for transforming visual-word histograms into semantic estimations outperforms support vector machine (SVM) methods with statistical significance. In a second step, we propose to improve retrieval relevance by learning an adjusted similarity distance from a perceived-similarity ground truth; the resulting distance learning method statistically improves the correlation with perceived similarity. We also demonstrate that, in terms of perceived similarity, the recall performance of the semantic signatures is close to that of the visual signatures and significantly better than that of several state-of-the-art CBIR methods. The semantic signatures are thus able to communicate high-level medical knowledge while being consistent with the low-level visual signatures and much more compact. In the resulting retrieval system, we use visual signatures for perceived-similarity learning and retrieval, and semantic signatures to output additional information, expressed in the endoscopist's own language, which provides a relevant semantic translation of the visual retrieval outputs.

14.
Zero-shot learning (ZSL) aims to recognize new objects that have never been seen before by associating categories with their semantic knowledge. Existing works mainly focus on learning a better visual-semantic mapping to align the visual and semantic spaces, while the effectiveness of learning discriminative visual features is neglected. In this paper, we propose an object-centric complementary features (OCF) learning model to take full advantage of the visual information of objects under the guidance of semantic knowledge. This model can automatically discover the object region and obtain fine-scale samples without any human annotation. An attention mechanism is then used in our model to capture long-range visual features corresponding to semantic knowledge like 'four legs' and subtle visual differences between similar categories. Finally, we train our model under the guidance of semantic knowledge in an end-to-end manner. Our method is evaluated on three widely used ZSL datasets, CUB, AwA2, and FLO, and the experimental results demonstrate the efficacy of the object-centric complementary features; our proposed method outperforms the state-of-the-art methods.

15.
Understanding the scene content of a video sequence is very important for content-based indexing and retrieval in multimedia databases. Research in this area in the past several years has focused on the use of speech recognition and image analysis techniques. As a complementary effort to the prior work, we have focused on using the associated audio information (mainly the non-speech portion) for video scene analysis. As an example, we consider the problem of discriminating five types of TV programs, namely commercials, basketball games, football games, news reports, and weather forecasts. A set of low-level audio features is proposed for characterizing the semantic content of short audio clips. The linear separability of the different classes under the proposed feature space is examined using a clustering analysis. The effective features are identified by evaluating the intra-cluster and inter-cluster scattering matrices of the feature space. Using these features, a neural net classifier was successful in separating the above five types of TV programs. By evaluating the changes between the feature vectors of adjacent clips, we can also identify scene breaks in an audio sequence quite accurately. These results demonstrate the capability of the proposed audio features for characterizing the semantic content of an audio sequence.
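The feature-selection criterion mentioned above can be reproduced with the classic within-class and between-class scatter matrices; a common separability score is trace(Sw^-1 Sb), computed below on synthetic clip features. The score formula is the standard textbook choice, assumed rather than quoted from the paper.

```python
import numpy as np

def scatter_separability(X: np.ndarray, y: np.ndarray) -> float:
    """trace(Sw^-1 Sb): larger means classes are more linearly separable."""
    mean_all = X.mean(axis=0)
    d = X.shape[1]
    Sw, Sb = np.zeros((d, d)), np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)            # intra-cluster scatter
        diff = (mc - mean_all)[:, None]
        Sb += len(Xc) * (diff @ diff.T)          # inter-cluster scatter
    return float(np.trace(np.linalg.pinv(Sw) @ Sb))

rng = np.random.default_rng(9)
X = np.vstack([rng.standard_normal((50, 5)) + 3 * k for k in range(5)])  # 5 program types
y = np.repeat(np.arange(5), 50)
print(f"separability: {scatter_separability(X, y):.3f}")
```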

16.
As a state-of-the-art video compression technique, H.264/AVC has been deployed in many surveillance cameras to improve compression efficiency. However, it induces very high coding complexity, and thus high power consumption. In this paper, a difference detection algorithm is proposed to reduce the computational complexity and power consumption of surveillance video compression by automatically distributing the video data to different modules of the video encoder according to their content-similarity features. Without requiring any changes to the encoder hardware, the proposed algorithm is highly adaptable and can be integrated into existing H.264 video encoders. On average, over 82% of the overall encoding complexity can be reduced, regardless of whether the H.264 encoder itself employs fast algorithms. No loss is observed in either subjective or objective video quality.
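A simplified version of the gating idea: compare each macroblock with the co-located block of the previous frame using the sum of absolute differences (SAD), and route only changed blocks to the expensive encoding path. The block size and threshold here are illustrative, not the paper's settings.

```python
import numpy as np

def changed_blocks(prev: np.ndarray, cur: np.ndarray, bs: int = 16, thresh: float = 500.0):
    """Yield (row, col) of 16x16 blocks whose SAD vs. the previous frame exceeds thresh."""
    h, w = cur.shape
    for r in range(0, h - bs + 1, bs):
        for c in range(0, w - bs + 1, bs):
            sad = np.abs(cur[r:r+bs, c:c+bs].astype(np.int32)
                         - prev[r:r+bs, c:c+bs].astype(np.int32)).sum()
            if sad > thresh:
                yield r, c        # only these blocks go to the full encoding path

rng = np.random.default_rng(10)
prev = rng.integers(0, 256, (64, 64), dtype=np.uint8)
cur = prev.copy()
cur[16:32, 16:32] = rng.integers(0, 256, (16, 16), dtype=np.uint8)  # one moving region
print(list(changed_blocks(prev, cur)))
```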

17.
Key frame extraction based on visual attention model
Key frame extraction is an important technique for video summarization, browsing, searching and understanding. In this paper, we propose a novel approach to extract the most attractive key frames by using a saliency-based visual attention model that bridges the gap between the semantic interpretation of the video and its low-level features. First, dynamic and static conspicuity maps are constructed based on motion, color and texture features. Then, by introducing suppression-factor and motion-priority schemes, the conspicuity maps are fused into a saliency map that includes only true attention regions, from which an attention curve is produced. Finally, after a time-constrained clustering algorithm groups frames with similar content, the frame with the maximum saliency value in each cluster is selected as a key frame. Experimental results demonstrate the effectiveness of our approach for video summarization in retrieving meaningful key frames.
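The selection stage can be sketched in a few lines: frames are grouped by a simple time-constrained rule (start a new cluster when similarity to the running cluster mean drops below a threshold), and the frame with the highest saliency in each cluster becomes a key frame. The cosine similarity measure and the threshold are placeholders, not the paper's clustering algorithm.

```python
import numpy as np

def key_frames(feats: np.ndarray, saliency: np.ndarray, sim_thresh: float = 0.9):
    """Time-constrained clustering over frame features, then max-saliency pick per cluster."""
    clusters, current = [], [0]
    for i in range(1, len(feats)):
        centroid = feats[current].mean(axis=0)
        sim = feats[i] @ centroid / (np.linalg.norm(feats[i]) * np.linalg.norm(centroid) + 1e-12)
        if sim >= sim_thresh:
            current.append(i)          # similar enough: same temporal cluster
        else:
            clusters.append(current)   # content changed: close cluster, open a new one
            current = [i]
    clusters.append(current)
    return [max(c, key=lambda i: saliency[i]) for c in clusters]

rng = np.random.default_rng(11)
feats = np.repeat(rng.random((4, 32)), 10, axis=0) + 0.01 * rng.random((40, 32))  # 4 scenes
print(key_frames(feats, rng.random(40)))
```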

18.
3D High Efficiency Video Coding (3D-HEVC) achieves high coding efficiency at the cost of substantial computational complexity. To reduce this complexity, this paper proposes a fast intra-prediction algorithm for 3D-HEVC depth maps based on edge detection with a deep learning network. The algorithm first applies a holistically-nested edge detection network to the depth map, and then binarizes the resulting probability edge map with the maximum between-class variance (Otsu) method to obtain salient edge regions. For prediction units of different sizes located in different regions, different optimization strategies are designed: the complexity of intra-prediction mode selection for depth maps is reduced by skipping the depth modeling modes and certain other unnecessary modes, ultimately reducing the coding complexity of the depth maps. Simulation results show that, compared with the original encoder, the proposed algorithm reduces the average total coding time by about 35% and the average depth-map coding time by about 42%, while the average bitrate of synthesized views increases by only 0.11%. The algorithm thus reduces coding time with negligible quality loss.
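The Otsu binarization step maps directly onto OpenCV: scale the probability edge map to 8 bits and let cv2.threshold pick the maximum between-class-variance threshold. The edge-probability input here is synthetic, standing in for the edge-detection network's output.

```python
import cv2
import numpy as np

rng = np.random.default_rng(12)
prob_edges = rng.random((64, 64)).astype(np.float32)   # stand-in for the network's edge map

img8 = (prob_edges * 255).astype(np.uint8)
otsu_t, binary = cv2.threshold(img8, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
salient_edge_region = binary > 0                       # True where a salient edge lies
print(f"Otsu threshold: {otsu_t}, edge pixels: {int(salient_edge_region.sum())}")
```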

19.
20.
Robust loop-closure detection is essential for visual SLAM. Traditional methods often focus on the geometric and visual features of a scene but ignore the semantic information provided by objects. Based on this consideration, we present a strategy that models the visual scene as a semantic sub-graph by preserving only the semantic and geometric information from object detection. To align two sub-graphs efficiently, we use a sparse Kuhn-Munkres algorithm to speed up the search for correspondences among nodes. The shape similarity and the Euclidean distance between objects in 3-D space are leveraged jointly to measure image similarity through graph matching. Furthermore, the proposed approach has been analyzed and compared with state-of-the-art algorithms on several datasets as well as in two real indoor scenes; the results indicate that our semantic graph-based representation, without extracting visual features, is feasible for loop-closure detection with competitive precision.
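The sub-graph alignment step can be sketched with SciPy's Hungarian (Kuhn-Munkres) solver: build a cost matrix that mixes shape dissimilarity and 3-D Euclidean distance between detected objects, then solve the assignment. The 0.5/0.5 mixing weights and the toy object descriptors are assumptions, not the paper's sparse variant.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

rng = np.random.default_rng(13)
# Each detected object: a shape descriptor (8-D) and a 3-D position.
shape_a, pos_a = rng.random((5, 8)), rng.random((5, 3))
shape_b, pos_b = rng.random((6, 8)), rng.random((6, 3))

# Mixed cost: shape dissimilarity plus Euclidean distance in 3-D space.
cost = 0.5 * cdist(shape_a, shape_b) + 0.5 * cdist(pos_a, pos_b)
rows, cols = linear_sum_assignment(cost)          # Kuhn-Munkres optimal matching

similarity = np.exp(-cost[rows, cols]).mean()     # a simple graph-similarity score
print(list(zip(rows.tolist(), cols.tolist())), f"similarity={similarity:.3f}")
```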
