Similar Documents
20 similar documents found (search time: 15 ms)
1.
巩萍  程玉虎  王雪松 《电子学报》2015,43(12):2476-2483
Existing computer-aided diagnosis of benign versus malignant pulmonary nodules typically relies on low-level features of lung CT images, whereas clinicians diagnose from high-level semantic attributes. To overcome this inconsistency between low-level image features and high-level semantics, a semantic-attribute-based method for discriminating benign and malignant pulmonary nodules is proposed. First, nodule images are extracted with a threshold probability map method. Second, low-level features of the nodule images, such as shape, gray level, texture, size, and position, are extracted to form a sample feature set, while a nodule attribute set is compiled from expert annotations of nodule attributes. An attribute prediction model is then built from the feature set and attribute set to realize the mapping between them. Finally, the predicted attributes are used to classify nodules as benign or malignant. Experimental results on the LIDC database show that the proposed method achieves high classification accuracy and AUC.
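
As a minimal sketch of the two-stage idea above, the snippet below maps low-level features to semantic attributes with one model and classifies malignancy from the predicted attributes with another. It uses scikit-learn with synthetic stand-in data; the feature dimensions, attribute count, and model choices are illustrative assumptions, not the authors' implementation.

```python
# Sketch: low-level nodule features -> semantic attributes -> benign/malignant.
import numpy as np
from sklearn.multioutput import MultiOutputRegressor
from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier

rng = np.random.default_rng(0)
X_low = rng.normal(size=(200, 24))      # stand-in shape/gray/texture/size/position features
A = rng.random(size=(200, 9))           # stand-in expert-scored semantic attributes
y = (A.mean(axis=1) > 0.5).astype(int)  # toy benign (0) / malignant (1) labels

# Step 1: attribute prediction model (low-level features -> attributes).
attr_model = MultiOutputRegressor(RandomForestRegressor(n_estimators=50, random_state=0))
attr_model.fit(X_low, A)

# Step 2: classify malignancy from the *predicted* attributes.
A_pred = attr_model.predict(X_low)
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(A_pred, y)
print(clf.score(A_pred, y))
```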

2.
The exploitation of video data requires methods able to extract high-level information from the images. Video summarization, video retrieval, and video surveillance are examples of applications. In this paper, we tackle the challenging problem of recognizing dynamic video contents from low-level motion features. We adopt a statistical approach involving modeling, (supervised) learning, and classification issues. Because of the diversity of video content (even for a given class of events), we have to design appropriate models of visual motion and learn them from videos. We have defined original parsimonious global probabilistic motion models, both for the dominant image motion (assumed to be due to the camera motion) and the residual image motion (related to scene motion). Motion measurements include affine motion models to capture the camera motion and low-level local motion features to account for scene motion. Motion learning and recognition are solved using maximum likelihood criteria. To validate the interest of the proposed motion modeling and recognition framework, we report dynamic content recognition results on sports videos.

3.
Media aesthetic assessment is a key technique in computer vision, widely applied in computer game rendering and video/image classification. Video aesthetic assessment algorithms based on the fusion of low-level and high-level features have achieved impressive performance, outperforming photo- and motion-based algorithms; however, these methods focus only on the aesthetic features of single frames and ignore the inherent relationships between adjacent frames. Therefore, we propose a novel video aesthetic assessment framework in which structural cues among frames are well encoded. Our method consists of two components: aesthetic feature extraction and structure correlation construction. More specifically, we incorporate both low-level and high-level visual features to construct aesthetic features, where salient regions are extracted for content understanding. Subsequently, we develop a structure correlation-based algorithm to evaluate the relationship among adjacent frames, where frames with similar structure properties should have a strong correlation coefficient. Afterwards, a kernel multi-SVM is trained for video classification and high-aesthetic video selection. Comprehensive experiments demonstrate the effectiveness of our method.
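
A toy illustration of the structure-correlation step: compute a structure descriptor per frame and correlate descriptors of adjacent frames. The gradient-magnitude descriptor and Pearson coefficient below are assumptions; the paper's actual descriptor and correlation measure may differ.

```python
# Sketch: correlation between structure descriptors of adjacent frames.
# Frames with similar structure should yield a high coefficient.
import numpy as np

def structure_descriptor(frame: np.ndarray) -> np.ndarray:
    """Toy structure cue: gradient magnitudes of a grayscale frame, flattened."""
    gy, gx = np.gradient(frame.astype(float))
    return np.hypot(gx, gy).ravel()

def adjacent_correlations(frames: list[np.ndarray]) -> list[float]:
    descs = [structure_descriptor(f) for f in frames]
    return [float(np.corrcoef(a, b)[0, 1]) for a, b in zip(descs, descs[1:])]

frames = [np.random.rand(64, 64) for _ in range(5)]  # stand-in grayscale frames
print(adjacent_correlations(frames))
```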

4.
Semantic video analysis is a key issue in digital video applications, including video retrieval, annotation, and management. Most existing work on semantic video analysis is mainly focused on event detection for specific video genres, while genre classification is treated as another independent issue. In this paper, we present a semantic framework for weakly supervised video genre classification and event analysis jointly by using probabilistic models for MPEG video streams. Several computable semantic features that can accurately reflect the event attributes are derived. Based on an intensive analysis of the connection between video genres and the contextual relationships among events, as well as the statistical characteristics of the dominant event, a hidden Markov model (HMM) and naive Bayesian classifier (NBC) based analysis algorithm is proposed for video genre classification. Another Gaussian mixture model (GMM) is built to detect the contained events using the same semantic features, whilst an event adjustment strategy is proposed according to an analysis of the GMM structure and the pre-definition of video events. Subsequently, a special event is recognized from the detected events by another HMM. Simulation experiments on video genre classification and event analysis using a large number of video data sets demonstrate the promising performance of the proposed framework for semantic video analysis.
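
A compact sketch of likelihood-based genre classification with HMMs, in the spirit of the framework above: one Gaussian HMM is trained per genre, and a clip is assigned to the genre whose model scores it highest. It uses the hmmlearn package (`pip install hmmlearn`) with synthetic features; the genre set, feature dimensionality, and state count are assumptions.

```python
# Sketch: one Gaussian HMM per genre, classify a clip by maximum log-likelihood.
import numpy as np
from hmmlearn.hmm import GaussianHMM

rng = np.random.default_rng(0)
genres = ["soccer", "news"]
models = {}
for i, g in enumerate(genres):
    # Toy training data: three concatenated sequences of semantic feature vectors.
    X = rng.normal(loc=i, size=(300, 4))
    m = GaussianHMM(n_components=3, covariance_type="diag", random_state=0)
    m.fit(X, lengths=[100, 100, 100])
    models[g] = m

clip = rng.normal(loc=1, size=(80, 4))  # unseen feature sequence
print(max(genres, key=lambda g: models[g].score(clip)))
```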

5.
For a variety of applications such as video surveillance and event annotation, the spatial–temporal boundaries between video objects are required for annotating visual content with high-level semantics. In this paper, we define spatial–temporal sampling as a unified process of extracting video objects and computing their spatial–temporal boundaries using a learnt video object model. We first provide a computational approach for learning an optimal key-object codebook sequence from a set of training video clips to characterize the semantics of the detected video objects. Then, dynamic programming with the learnt codebook sequence is used to locate the video objects with spatial–temporal boundaries in a test video clip. To verify the performance of the proposed method, a human action detection and recognition system is constructed. Experimental results show that the proposed method gives good performance on several publicly available datasets in terms of detection accuracy and recognition rate.
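
The core of the dynamic-programming step can be sketched as a monotone alignment between the learnt key-object codebook sequence and a test clip's frame descriptors. The Euclidean cost and the stay-or-advance transition rule below are assumptions, and boundary recovery by backtracking is omitted.

```python
# Sketch: monotone DP alignment of an ordered codebook sequence to frames.
import numpy as np

def align(codebook: np.ndarray, frames: np.ndarray) -> float:
    """codebook: (K, d) ordered key-object prototypes; frames: (T, d) descriptors."""
    K, T = len(codebook), len(frames)
    cost = np.linalg.norm(frames[None] - codebook[:, None], axis=2)  # (K, T)
    D = np.full((K, T), np.inf)
    D[0, 0] = cost[0, 0]
    for t in range(1, T):
        D[0, t] = D[0, t - 1] + cost[0, t]          # stay on first codeword
        for k in range(1, K):
            # Either stay on codeword k or advance from codeword k-1.
            D[k, t] = cost[k, t] + min(D[k, t - 1], D[k - 1, t - 1])
    return D[K - 1, T - 1] / T                       # normalized alignment cost

rng = np.random.default_rng(0)
codebook = rng.normal(size=(4, 8))
frames = np.repeat(codebook, 5, axis=0) + 0.1 * rng.normal(size=(20, 8))
print(align(codebook, frames))
```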

6.
7.
AN HMM BASED ANALYSIS FRAMEWORK FOR SEMANTIC VIDEO EVENTS
Semantic video analysis plays an important role in the field of machine intelligence and pattern recognition. In this paper, based on the Hidden Markov Model (HMM), a semantic recognition framework for compressed videos is proposed to analyze video events according to six low-level features. After a detailed analysis of video events, the global motion pattern and five foreground features (the foreground being the principal part of a video) are employed as the observations of the Hidden Markov Model to classify events in videos. Applications of the proposed framework to several video event detection tasks demonstrate its promise for semantic video analysis.

8.
张良  周长胜 《电子科技》2011,24(10):111-114
This paper analyzes the differences between video data and text data, and the problems video data poses for video analysis and retrieval. Starting from current research hotspots in video content analysis, it analyzes and compares techniques for video semantic libraries, the low-level video features relevant to video analysis, video object segmentation and recognition, and video information description and coding, and then proposes a framework and workflow for semantic video analysis.

9.
Semantic event detection via multimodal data mining
This paper presents a novel framework for video event detection. The core of the framework is an advanced temporal analysis and multimodal data mining method that consists of three major components: low-level feature extraction, temporal pattern analysis, and multimodal data mining. One of the unique characteristics of this framework is that it offers strong generality and extensibility, with the capability of exploring representative event patterns with little human interference. The framework is presented with its application to the detection of soccer goal events over a large collection of soccer video data with various production styles.

10.
11.
The detection of near-duplicate video clips (NDVCs) is an area of current research interest and intense development. Most NDVC detection methods represent video clips with a unique set of low-level visual features, typically describing color or texture information. However, low-level visual features are sensitive to transformations of the video content. Given the observation that transformations tend to preserve the semantic information conveyed by the video content, we propose a novel approach for identifying NDVCs, making use of both low-level visual features (that is, MPEG-7 visual features) and high-level semantic features (that is, 32 semantic concepts detected using trained classifiers). Experimental results obtained for the publicly available MUSCLE-VCD-2007 and TRECVID 2008 video sets show that bimodal fusion of visual and semantic features facilitates robust NDVC detection. In particular, the proposed method is able to identify NDVCs with a low missed detection rate (3% on average) and a low false alarm rate (2% on average). In addition, the combined use of visual and semantic features outperforms the separate use of either of them in terms of NDVC detection effectiveness. Further, we demonstrate that the effectiveness of the proposed method is on par with or better than the effectiveness of three state-of-the-art NDVC detection methods making use of temporal ordinal measurement, features computed using the Scale-Invariant Feature Transform (SIFT), or bag-of-visual-words (BoVW). We also show that the influence of the effectiveness of semantic concept detection on the effectiveness of NDVC detection is limited, as long as the mean average precision (MAP) of the semantic concept detectors used is higher than 0.3. Finally, we illustrate that the computational complexity of our NDVC detection method is competitive with the computational complexity of the three aforementioned NDVC detection methods.

12.
In traditional content-based video retrieval, the wide domain of video and the large semantic gap between low-level visual features and high-level concepts often lead to poor retrieval results. A more practical approach, this paper argues, is to retrieve at the level of event-related story units, which carry more semantic information than shots: text information is extracted from event-related media, and machine learning is used to automatically build models of event classes, providing a concept-level query mechanism for story units. The paper proposes a combined feature selection method and a two-stage pruned KNN, TSP-KNN. The combined feature selection method suits event-related story unit retrieval better than the MI method. TSP-KNN first prunes the training set and then trains a KNN classifier on it, which addresses the problems of overlapping samples and multi-center class distributions. Experimental results show that the proposed methods are effective and clearly improve the retrieval performance for event-related story units.
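
One plausible reading of the two-stage pruned KNN, sketched with scikit-learn: stage one removes training samples whose nearest neighbors disagree with their own label (a Wilson-style edit, which tackles overlapping samples), and stage two fits the final KNN on the pruned set. The pruning rule and neighbor counts are assumptions, not the paper's exact TSP-KNN.

```python
# Sketch: prune the training set with a neighbour-agreement rule, then fit KNN.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=400, n_features=20, random_state=0)

# Stage 1: each sample is checked against its 3 nearest *other* samples.
probe = KNeighborsClassifier(n_neighbors=4).fit(X, y)       # neighbour 0 is the sample itself
neigh = probe.kneighbors(X, return_distance=False)[:, 1:]   # drop self
keep = np.array([np.mean(y[idx] == y[i]) >= 0.5 for i, idx in enumerate(neigh)])

# Stage 2: train the final KNN on the pruned set.
clf = KNeighborsClassifier(n_neighbors=3).fit(X[keep], y[keep])
print(keep.sum(), clf.score(X, y))
```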

13.
The information processing of sports video yields valuable semantics for content delivery over narrowband networks. Traditional image/video processing is formulated in terms of low-level features describing image/video structure and intensity, while high-level knowledge such as common sense and human perceptual knowledge is encoded in abstract and nongeometric representations. The management of semantic information in video becomes more and more difficult because of the large differences in representations, levels of knowledge, and abstract episodes. This paper proposes a semantic highlight detection scheme using a Multi-level Semantic Network (MSN) for baseball video interpretation. The probabilistic structure can be applied to highlight detection and shot classification. Experimental results illustrate better performance compared with traditional approaches.

14.
Image quality assessment is indispensable in computer vision applications such as image classification and image parsing. With the development of the Internet, image data acquisition has become more convenient. However, image distortion is inevitable due to imperfect image acquisition systems, transmission media, and recording equipment. Traditional image quality assessment algorithms focus only on low-level visual features such as color or texture and cannot encode high-level features effectively. CNN-based methods have shown satisfactory results in image quality assessment; however, existing methods suffer from incomplete feature extraction, partial image block distortion, and an inability to determine scores. In this paper, we propose a novel framework for image quality assessment based on deep learning. We incorporate both low-level visual features and high-level semantic features to better describe images, and image quality is analyzed in a parallel processing mode. Experiments conducted on the LIVE and TID2008 datasets demonstrate that the proposed model predicts the quality of distorted images well, with both SROCC and PLCC reaching 0.92 or higher.
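
The two evaluation metrics quoted above, SROCC and PLCC, can be computed directly with SciPy; `pred` and `mos` below are hypothetical predicted scores and subjective mean opinion scores.

```python
# Sketch: standard IQA evaluation metrics.
import numpy as np
from scipy.stats import spearmanr, pearsonr

mos = np.array([3.1, 4.5, 2.2, 3.8, 4.9])   # subjective mean opinion scores
pred = np.array([3.0, 4.4, 2.5, 3.6, 4.8])  # model predictions

srocc, _ = spearmanr(pred, mos)  # rank correlation: prediction monotonicity
plcc, _ = pearsonr(pred, mos)    # linear correlation: prediction accuracy
print(f"SROCC={srocc:.3f}  PLCC={plcc:.3f}")
```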

15.
Vehicles seen from a UAV viewpoint are hard to classify and localize precisely because of their small scale and low resolution. This paper designs a lightweight feature extraction network that supplies multi-scale low- and mid-level vehicle information and injects it into the backbone network, passing low- and mid-level feature information along; meanwhile, the backbone extracts high-level semantic information that helps separate vehicles from the background and other classes, and the deep high-level semantic features are fused with shallow features to pass the high-level semantics down. This bidirectional-style network thus transfers information across levels effectively and strengthens the feature representation of vehicles. In addition, multi-branch dilated convolutions are used for feature extraction, making the low- and mid-level information richer and more diverse, and a flexible and effective fusion module is designed to integrate the low- and mid-level information into the backbone and enhance the discriminative features of target vehicles. Experimental results show that the algorithm achieves good detection performance on UAV datasets while meeting real-time application requirements.
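
A minimal PyTorch sketch of a multi-branch dilated-convolution block of the kind described, with a 1x1 fusion convolution; the branch count, dilation rates, and channel sizes are illustrative assumptions, not the paper's architecture.

```python
# Sketch: multi-branch dilated convolutions fused by a 1x1 convolution.
import torch
import torch.nn as nn

class MultiDilatedBlock(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, dilations=(1, 2, 4)):
        super().__init__()
        # One 3x3 branch per dilation rate; padding=d keeps the spatial size.
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=d, dilation=d)
            for d in dilations
        )
        # 1x1 fusion conv merges the concatenated branches back to out_ch.
        self.fuse = nn.Conv2d(out_ch * len(dilations), out_ch, kernel_size=1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = torch.cat([b(x) for b in self.branches], dim=1)
        return self.act(self.fuse(feats))

block = MultiDilatedBlock(16, 32)
print(block(torch.randn(1, 16, 64, 64)).shape)  # torch.Size([1, 32, 64, 64])
```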

16.
Segmentation of moving objects in video sequences is a basic task in many applications. However, it is still challenging due to the semantic gap between low-level visual features and the high-level human interpretation of video semantics. Compared with the segmentation of fast moving objects, accurate and perceptually consistent segmentation of slowly moving objects is more difficult. In this paper, a novel hybrid algorithm is proposed for the segmentation of slowly moving objects in video sequences, aiming to acquire perceptually consistent results. Firstly, the temporal information of the differences among multiple frames is employed to detect initial moving regions. Then, the Gaussian mixture model (GMM) is employed and an improved expectation maximization (EM) algorithm is introduced to segment a spatial image into homogeneous regions. Finally, the results of motion detection and spatial segmentation are fused to extract the final moving objects. Experiments are conducted and provide convincing results.
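
The first two steps of this pipeline can be sketched as follows: multi-frame differencing yields an initial motion mask, and a GMM fitted by standard EM (via scikit-learn) partitions a frame into homogeneous intensity regions. The paper's improved EM and the final fusion step are not reproduced; thresholds and component counts are assumptions.

```python
# Sketch: frame differencing for motion, GMM/EM for spatial segmentation.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
frames = rng.random((5, 48, 48))  # stand-in grayscale frames

# Step 1: temporal differences among multiple frames -> initial moving region.
diffs = np.abs(np.diff(frames, axis=0)).mean(axis=0)
motion_mask = diffs > diffs.mean() + diffs.std()

# Step 2: GMM over pixel intensities -> homogeneous spatial regions.
pixels = frames[-1].reshape(-1, 1)
labels = GaussianMixture(n_components=3, random_state=0).fit_predict(pixels)
regions = labels.reshape(frames[-1].shape)

print(motion_mask.sum(), np.bincount(labels))
```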

17.
On the social Web, the amount of video content either originated from wireless devices or previously received from media servers has increased enormously in recent years. The astounding growth of Web videos has stimulated researchers to propose new strategies for organizing them into their respective categories. Because of the complex ontology and large variation in the content and quality of Web videos, it is difficult to get sufficient, precisely labeled training data, which hinders automatic video classification. In this paper, we propose a novel content- and context-based Web video classification framework that renders external support through category discriminative terms (CDTs) and a semantic relatedness measure (SRM). A three-step framework is proposed. Firstly, content-based video classification is proposed, where twofold use of high-level concept detectors is leveraged to classify Web videos. Initially, category classifiers induced from VIREO-374 detectors are trained to classify Web videos, and then concept detectors with high confidence for each video are mapped to CDTs through an SRM-assisted semantic content fusion function to further boost the category classifiers, which intuitively provides a more robust measure for Web video classification. Secondly, context-based video classification is proposed, where twofold use of contextual information is also harnessed. Initially, cosine similarity and then semantic similarity are measured between the text features of each video and the CDTs through vector space model (VSM)- and SRM-assisted semantic context fusion functions, respectively. Finally, classification results from content and context are fused to compensate for the shortcomings of each other, which enhances video classification performance. Experiments on a large-scale video dataset validate the effectiveness of the proposed solution.
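
The VSM step, cosine similarity between a video's text features and each category's discriminative terms, can be sketched as below; the CDT vocabularies and TF-IDF weighting are illustrative assumptions.

```python
# Sketch: cosine similarity between video text and category discriminative terms.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

cdt = {"sports": "goal match player score team",
       "music": "song concert band album guitar"}
video_text = "amazing goal in the final match, what a player"

vec = TfidfVectorizer()
M = vec.fit_transform(list(cdt.values()) + [video_text])  # CDT docs + video text
sims = cosine_similarity(M[-1], M[:-1]).ravel()           # video vs. each category
print(dict(zip(cdt, sims)))
```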

18.
In person re-identification, mid-level features that carry semantic information offer stronger discriminative power. However, because mid-level features are also matched locally, they suffer, like low-level features, from mismatches caused by visually similar local appearance regions of different pedestrians. Observing that pedestrians are almost always upright, so that the vertical appearance sequence of the same pedestrian is more similar than that of different pedestrians, this paper introduces a global vertical appearance constraint on top of mid-level features and fuses it with low-level dense-block matching. Experimental results show that the algorithm achieves higher matching rates than existing methods on the highly challenging public VIPeR and CUHK01 databases.

19.
Automatic soccer video analysis and summarization
We propose a fully automatic and computationally efficient framework for analysis and summarization of soccer videos using cinematic and object-based features. The proposed framework includes some novel low-level processing algorithms, such as dominant color region detection, robust shot boundary detection, and shot classification, as well as some higher-level algorithms for goal detection, referee detection, and penalty-box detection. The system can output three types of summaries: i) all slow-motion segments in a game; ii) all goals in a game; iii) slow-motion segments classified according to object-based features. The first two types of summaries are based on cinematic features only for speedy processing, while the summaries of the last type contain higher-level semantics. The proposed framework is efficient, effective, and robust. It is efficient in the sense that there is no need to compute object-based features when cinematic features are sufficient for the detection of certain events, e.g., goals in soccer. It is effective in the sense that the framework can also employ object-based features when needed to increase accuracy (at the expense of more computation). The efficiency, effectiveness, and robustness of the proposed framework are demonstrated over a large data set, consisting of more than 13 hours of soccer video, captured in different countries and under different conditions.
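
A rough sketch of dominant color region detection for soccer video: find the modal hue (nominally the grass) and mask pixels near it in HSV space. The OpenCV-based implementation and thresholds are assumptions, not the paper's exact algorithm.

```python
# Sketch: dominant color region detection via the modal hue in HSV space.
import cv2
import numpy as np

frame = np.random.randint(0, 256, (72, 128, 3), dtype=np.uint8)  # stand-in BGR frame
hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)

hist = cv2.calcHist([hsv], [0], None, [180], [0, 180]).ravel()   # hue histogram
dominant_hue = int(hist.argmax())

lo = np.array([max(dominant_hue - 10, 0), 40, 40], dtype=np.uint8)
hi = np.array([min(dominant_hue + 10, 179), 255, 255], dtype=np.uint8)
mask = cv2.inRange(hsv, lo, hi)                                  # dominant-color region
print("dominant-color pixel ratio:", mask.mean() / 255)
```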

20.
A challenging problem in image content extraction and classification is building a system that automatically learns high-level semantic interpretations of images. We describe a Bayesian framework for a visual grammar that aims to reduce the gap between low-level features and high-level user semantics. Our approach includes modeling image pixels using automatic fusion of their spectral, textural, and other ancillary attributes; segmentation of image regions using an iterative split-and-merge algorithm; and representing scenes by decomposing them into prototype regions and modeling the interactions between these regions in terms of their spatial relationships. Naive Bayes classifiers are used in the learning of models for region segmentation and classification using positive and negative examples for user-defined semantic land cover labels. The system also automatically learns representative region groups that can distinguish different scenes and builds visual grammar models. Experiments using Landsat scenes show that the visual grammar enables creation of high-level classes that cannot be modeled by individual pixels or regions. Furthermore, learning of the classifiers requires only a few training examples.
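
The naive Bayes learning step can be sketched with scikit-learn: region descriptors built from fused spectral/textural attributes are classified into land-cover labels from positive and negative examples. Feature dimensions and class names below are assumptions.

```python
# Sketch: naive Bayes region classification from labeled examples.
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
# Toy region descriptors: fused spectral / textural / ancillary attributes.
X = np.vstack([rng.normal(0, 1, (50, 6)), rng.normal(2, 1, (50, 6))])
y = np.array([0] * 50 + [1] * 50)  # e.g. 0 = water, 1 = urban

nb = GaussianNB().fit(X, y)
print(nb.predict(rng.normal(1, 1, (3, 6))), nb.score(X, y))
```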
