20 similar documents retrieved (search took 31 ms)
1.
2.
This paper introduces a novel interactive framework for segmenting images using probabilistic hypergraphs, which model the spatial and appearance relations among image pixels. The probabilistic hypergraph provides a means to pose image segmentation as a machine learning problem. In particular, we assume that a small set of pixels, referred to as seed pixels, are labeled as object and background. The seed pixels are used to estimate the labels of the unlabeled pixels by learning on a hypergraph: a quadratic smoothness term formed by a hypergraph Laplacian matrix is minimized subject to the known label constraints. We derive a natural probabilistic interpretation of this smoothness term, and provide a detailed discussion of the relation of our method to other hypergraph- and graph-based learning methods. We also present an end-to-end image segmentation system based on the proposed method, which achieves promising quantitative and qualitative results on the commonly used GrabCut dataset.
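On an ordinary graph, minimizing the Laplacian smoothness term with seed labels clamped has the well-known harmonic solution: each unlabeled node takes the weighted average of its neighbours. A minimal pure-Python sketch under that simplification — the graph, weights, and seeds below are illustrative, not from the paper:

```python
def propagate_labels(weights, seeds, iters=200):
    """Harmonic label propagation: each unlabeled node is repeatedly set
    to the weighted average of its neighbours, which minimises the
    Laplacian smoothness term f^T L f subject to the seed labels."""
    n = len(weights)
    f = [seeds.get(i, 0.5) for i in range(n)]
    for _ in range(iters):
        for i in range(n):
            if i in seeds:
                continue  # seed labels stay fixed
            num = sum(w * f[j] for j, w in weights[i].items())
            den = sum(weights[i].values())
            if den > 0:
                f[i] = num / den
    return f

# A 4-pixel chain: pixels 0-1 strongly connected, 2-3 strongly connected,
# weak link between 1 and 2; pixel 0 is seeded object (1), pixel 3 background (0).
w = [
    {1: 1.0},
    {0: 1.0, 2: 0.1},
    {1: 0.1, 3: 1.0},
    {2: 1.0},
]
f = propagate_labels(w, {0: 1.0, 3: 0.0})
labels = [1 if v >= 0.5 else 0 for v in f]  # -> [1, 1, 0, 0]
```

The weak edge between pixels 1 and 2 acts as the object/background boundary, so the two seed labels spread to their own sides of it.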
3.
For the application of automatically generating personalized cartoon faces, this paper proposes a method for extracting the hair region from a single face image. First, hair regions in training images are labeled by hand, and a hair-position distribution map and a probability distribution function of hair color are computed. Next, for a test face image, an energy function is built by combining the priors on hair position and hair color, and a graph-cut algorithm optimizes this energy function to obtain an initial hair region. Finally, the K-means algorithm refines the initial hair region, and image post-processing yields the final hair region. Experimental results show that the method has low computational complexity: the average processing time per image is about 300 ms, and the average detection accuracy of the hair region exceeds 90%, which basically meets the needs of automatic personalized cartoon-face generation.
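The refinement step uses K-means on the initial hair region; a toy 1-D K-means over pixel intensities gives the idea (all values and the two-cluster setup are illustrative, not the paper's implementation):

```python
import random

def kmeans_1d(values, k=2, iters=50, seed=0):
    """Plain 1-D k-means, as could be used to refine a coarse hair mask
    by clustering the intensities of the pixels inside it."""
    rng = random.Random(seed)
    centers = rng.sample(values, k)
    for _ in range(iters):
        # Assign each value to its nearest centre.
        clusters = [[] for _ in range(k)]
        for v in values:
            idx = min(range(k), key=lambda c: abs(v - centers[c]))
            clusters[idx].append(v)
        # Recompute centres (keep the old one if a cluster emptied).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return sorted(centers)

# Intensities from a coarse region: dark hair pixels (~20) mixed with
# brighter skin pixels (~200) that the refinement should split off.
pixels = [18, 22, 25, 19, 21, 198, 205, 202, 196]
centers = kmeans_1d(pixels, k=2)
```

With two well-separated modes the centres converge to roughly the two mode means, and pixels nearer the dark centre would be kept as hair.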
4.
In this research we address the problem of classification and labeling of regions given a single static natural image. Natural images exhibit strong spatial dependencies, and modeling these dependencies in a principled manner is crucial for achieving good classification accuracy. In this work, we present Discriminative Random Fields (DRFs) to model spatial interactions in images in a discriminative framework based on the concept of Conditional Random Fields proposed by Lafferty et al. (2001). The DRFs classify image regions by incorporating neighborhood spatial interactions in the labels as well as the observed data. The DRF framework offers several advantages over the conventional Markov Random Field (MRF) framework. First, DRFs allow relaxing the strong assumption of conditional independence of the observed data generally made in the MRF framework for tractability. This assumption is too restrictive for a large number of applications in computer vision. Second, DRFs derive their classification power by exploiting probabilistic discriminative models instead of the generative models used for modeling observations in the MRF framework. Third, the interaction in labels in DRFs is based on the idea of pairwise discrimination of the observed data, making it data-adaptive instead of being fixed a priori as in MRFs. Finally, all the parameters in the DRF model are estimated simultaneously from the training data, unlike the MRF framework, where the likelihood parameters are usually learned separately from the field parameters. We present preliminary experiments with man-made structure detection and binary image restoration tasks, and compare the DRF results with the MRF results.
Sanjiv Kumar is currently with Google Research, Pittsburgh, PA, USA. His contact email is: sanjivk@google.com.
5.
Minyoung Kim 《Data mining and knowledge discovery》2014,28(2):378-401
Predicting labels of structured data such as sequences or images is a very important problem in statistical machine learning and data mining. The conditional random field (CRF) is perhaps one of the most successful approaches for structured label prediction via conditional probabilistic modeling. In such models, it is traditionally assumed that each label is a random variable from a nominal category set (e.g., class categories) where all categories are symmetric and unrelated to one another. In this paper we consider a different situation of ordinal-valued labels, where each label category bears a particular meaning of preference or order. This setup fits many interesting problems and datasets for which one is interested in predicting labels that represent certain degrees of intensity or relevance. We propose a fairly intuitive and principled CRF-like model that can effectively deal with ordinal-scale labels within an underlying correlation structure. Unlike standard log-linear CRFs, learning the proposed model incurs non-convex optimization. However, the new model can be learned accurately using efficient gradient search. We demonstrate the improved prediction performance achieved by the proposed model on several intriguing sequence/image label prediction tasks.
6.
Road Detection and Tracking from Aerial Desert Imagery
We present a fast, robust road detection and tracking algorithm for aerial images taken from an Unmanned Aerial Vehicle (UAV). A histogram-based adaptive threshold algorithm is used to detect possible road regions in an image. A probabilistic Hough-transform-based line-segment detector combined with a clustering method further extracts the road. The proposed algorithm has been extensively tested on desert images obtained using a UAV: we experimentally validated it on over a thousand aerial images containing straight and curved roads in various conditions with significant changes in lighting and intensity, and our results indicate that we successfully and accurately detect roads in 96% of the images. We have also developed a road-tracking algorithm that searches a local rectangular area in successive images. Initial results are presented that show the efficacy and robustness of this algorithm; using it, we further improve road detection and achieve 98% accuracy.
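A histogram-based adaptive threshold can be illustrated with Otsu's criterion, which picks the cut that maximizes between-class variance; a self-contained sketch on a toy intensity list (the paper's exact thresholding rule, and the Hough and clustering steps, are not reproduced here):

```python
def otsu_threshold(pixels, levels=256):
    """Histogram-based adaptive threshold (Otsu's criterion): choose the
    grey level that maximises the between-class variance of the two
    resulting pixel classes."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = sum0 = 0
    for t in range(levels):
        w0 += hist[t]            # pixels in class 0 (<= t)
        sum0 += t * hist[t]
        if w0 == 0 or w0 == total:
            continue
        w1 = total - w0
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        var = w0 * w1 * (mu0 - mu1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t

# Bimodal toy "image": dark desert background (~40) and a bright road (~210).
img = [40, 42, 38, 41, 39, 43, 210, 208, 212, 209]
t = otsu_threshold(img)
road_mask = [1 if p > t else 0 for p in img]
```

On this bimodal input the threshold lands between the two modes, separating the candidate road pixels from the background in a single pass over the histogram.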
7.
8.
9.
Alexander Thomas, Vittorio Ferrari, Bastian Leibe, Tinne Tuytelaars, Luc Van Gool 《Computer Vision and Image Understanding》2009,113(12):1222-1234
Low-level cues in an image not only allow us to infer higher-level information like the presence of an object; the inverse is also true. Category-level object recognition has now reached a level of maturity and accuracy that allows its output to be fed back successfully to other processes. This is what we refer to as cognitive feedback. In this paper, we study one particular form of cognitive feedback, where the ability to recognize objects of a given category is exploited to infer different kinds of meta-data annotations for images of previously unseen object instances, in particular information on 3D shape. Meta-data can be discrete, real-, or vector-valued. Our approach builds on the Implicit Shape Model of Leibe and Schiele [B. Leibe, A. Leonardis, B. Schiele, Robust object detection with interleaved categorization and segmentation, International Journal of Computer Vision 77 (1–3) (2008) 259–289], and extends it to transfer annotations from training images to test images. We focus on the inference of approximate 3D shape information about objects in a single 2D image. In experiments, we illustrate how our method can infer depth maps, surface normals, and part labels for previously unseen object instances.
10.
Automatic content-based image categorization is a challenging research topic and has many practical applications. Images are usually represented as bags of feature vectors, and the categorization problem is studied in the Multiple-Instance Learning (MIL) framework. In this paper, we propose a novel learning technique which transforms the MIL problem into a standard supervised learning problem by defining a feature vector for each image bag. Specifically, the feature vectors of the image bags are grouped into clusters and each cluster is given a label. Using these labels, each instance of an image bag can be replaced by a corresponding label to obtain a bag of cluster labels. Data mining can then be employed to uncover common label patterns for each image category. These label patterns are converted into bags of feature vectors, and they are used to transform each image bag in the data set into a feature vector such that each vector element is the distance of the image bag to a distinct pattern bag. With this new image representation, standard supervised learning algorithms can be applied to classify the images into the pre-defined categories. Our experimental results demonstrate the superiority of the proposed technique in categorization accuracy as compared to state-of-the-art methods.
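The bag-to-vector transform — each vector element being the distance of an image bag to a pattern bag — can be sketched as follows; the minimal-instance distance and the toy 1-D bags are illustrative assumptions, not the paper's exact definitions:

```python
def bag_distance(bag_a, bag_b):
    """Minimal-instance distance between two bags of 1-D features
    (a stand-in for the paper's bag-to-pattern-bag distance)."""
    return min(abs(a - b) for a in bag_a for b in bag_b)

def bags_to_vectors(bags, pattern_bags):
    """Map each bag to a fixed-length vector whose k-th entry is its
    distance to the k-th pattern bag, so a standard supervised
    learner can then be applied to the vectors."""
    return [[bag_distance(bag, pat) for pat in pattern_bags] for bag in bags]

pattern_bags = [[0.0, 1.0], [10.0, 11.0]]  # illustrative label patterns
bags = [[0.2, 5.0], [9.5, 30.0]]           # two image bags of instances
vectors = bags_to_vectors(bags, pattern_bags)
# -> [[0.2, 5.0], [8.5, 0.5]]
```

Each bag, whatever its size, becomes a vector of fixed length equal to the number of pattern bags, which is exactly what makes standard supervised classifiers applicable.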
11.
Visual (image and video) database systems require efficient indexing to enable fast access to the images in a database. In addition, the large memory capacity and channel bandwidth requirements for the storage and transmission of visual data necessitate the use of compression techniques. We note that image/video indexing and compression are typically pursued independently. This reduces the storage efficiency and may degrade the system performance. In this paper, we present novel algorithms based on vector quantization (VQ) for indexing of compressed images and video. To start with, the images are compressed using VQ. In the first technique, for each codeword in the codebook, a histogram is generated and stored along with the codeword. We note that the superposition of the histograms of the codewords which are used to represent an image is a close approximation of the histogram of the image. This histogram is used as an index to store and retrieve the image. In the second technique, the histogram of the labels of an image is used as an index to access the image. We also propose an algorithm for indexing compressed video sequences. Here, each frame is encoded in the intraframe mode using VQ. The labels are used for the segmentation of a video sequence into shots, and for indexing the representative frame of each shot. The proposed techniques not only provide fast access to stored visual data, but also combine compression and indexing. The average retrieval rates are 95% and 94% at compression ratios of 16:1 and 64:1, respectively. The corresponding cut detection rates are 97% and 90%, respectively.
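The first indexing technique — approximating an image's histogram by superposing the stored per-codeword histograms — can be sketched like this; the codewords, block features, and per-codeword histograms are toy values, not from the paper:

```python
from collections import Counter

def encode(blocks, codebook):
    """Vector-quantise image blocks: each block is replaced by the label
    of its nearest codeword (squared Euclidean distance)."""
    def d2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(range(len(codebook)), key=lambda k: d2(b, codebook[k]))
            for b in blocks]

def index_histogram(labels, codeword_hists):
    """Approximate the image histogram as the superposition of the
    histograms stored with each codeword used to represent the image."""
    hist = Counter()
    for lab in labels:
        hist.update(codeword_hists[lab])
    return hist

codebook = [(0, 0), (10, 10)]                  # toy 2-D codewords
codeword_hists = [{"dark": 2}, {"bright": 2}]  # histogram stored per codeword
blocks = [(1, 0), (9, 10), (10, 9)]            # blocks of the image to index
labels = encode(blocks, codebook)              # -> [0, 1, 1]
hist = index_histogram(labels, codeword_hists)
```

Because the per-codeword histograms are computed once at codebook-design time, the index comes essentially for free alongside the compressed (label) representation.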
12.
Paul C. Conilione, Dianhui Wang 《Soft Computing - A Fusion of Foundations, Methodologies and Applications》2011,15(6):1231-1245
Content-based image retrieval (CBIR) systems traditionally find images within a database that are similar to a query image using low-level features, such as colour histograms. However, this requires a user to provide an image to the system. It is easier for a user to query the CBIR system using search terms, which requires the image content to be described by semantic labels. However, finding a relationship between image features and semantic labels is a challenging problem to solve. This paper aims to discover semantic labels for facial features for use in a face image retrieval system. Face image retrieval traditionally uses global face-image information to determine similarity between images, but little has been done in the field on using local face features and semantic labelling. Our work develops a clustering method for the discovery of semantic labels of face features. We also present a machine-learning-based face-feature localization mechanism which we show has promise in providing accurate localization.
13.
Objective: To address background interference in fine-grained image classification, we propose a classification model that segments images using top-down attention maps. Method: First, a convolutional neural network is trained on the fine-grained image dataset to obtain a base network model. Visualization of this model shows that only some image regions contribute to the target category. The trained base network is then used to compute the spatial support of image pixels for the relevant category, generating a top-down attention map that detects the key regions in the image. This attention map initializes the GraphCut algorithm, which segments out the key object region and thereby improves the discriminability of the image. Finally, CNN features are extracted from the segmented image for fine-grained classification. Results: Using only image-level class labels, the model was evaluated on the public fine-grained datasets Cars196 and Aircrafts100, achieving average classification accuracies of 86.74% and 84.70%, respectively. This shows that adding attention information on top of the GoogLeNet model further improves fine-grained classification accuracy. Conclusion: The semantic segmentation strategy based on top-down attention maps improves fine-grained classification performance. Since no object-box or part annotations are required, the model is general and robust, and is applicable to salient object detection, foreground segmentation, and fine-grained image classification.
14.
We address the problem of detecting irregularities in visual data, e.g., detecting suspicious behaviors in video sequences, or identifying salient patterns in images. The term “irregular” depends on the context in which the “regular” or “valid” are defined. Yet, it is not realistic to expect an explicit definition of all possible valid configurations for a given context. We pose the problem of determining the validity of visual data as a process of constructing a puzzle: we try to compose a new observed image region or a new video segment (“the query”) using chunks of data (“pieces of puzzle”) extracted from previous visual examples (“the database”). Regions in the observed data which can be composed using large contiguous chunks of data from the database are considered very likely, whereas regions which cannot be composed from the database (or can be composed, but only using small fragmented pieces) are regarded as unlikely/suspicious. The problem is posed as an inference process in a probabilistic graphical model. We show applications of this approach to identifying saliency in images and video, to detecting suspicious behaviors, and to automatic visual inspection for quality assurance.
Patent pending.
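The compose-from-database idea can be caricatured in one dimension: a query region is suspicious when its local patches cannot be found anywhere in the example database. The patch size, signals, and exact-match criterion below are illustrative simplifications of the paper's probabilistic inference:

```python
def irregularity_flags(query, database, patch=3):
    """Toy composition-based irregularity detection on 1-D signals:
    a query position is 'regular' (0) if the patch around it also
    appears in some database example, and 'suspicious' (1) otherwise."""
    def patches(seq):
        return {tuple(seq[i:i + patch]) for i in range(len(seq) - patch + 1)}
    known = set()
    for example in database:
        known |= patches(example)
    return [0 if tuple(query[i:i + patch]) in known else 1
            for i in range(len(query) - patch + 1)]

db = [[1, 2, 3, 4, 5], [5, 4, 3, 2, 1]]  # previously seen "valid" examples
query = [1, 2, 3, 9, 5]                  # the 9 breaks every patch it touches
flags = irregularity_flags(query, db)    # -> [0, 1, 1]
```

The paper's large-contiguous-chunk preference corresponds here, very loosely, to requiring whole patches rather than single samples to match.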
15.
There is an increasing need for automatic image annotation tools to enable effective image searching in digital libraries. In this paper, we present a novel probabilistic model for image annotation based on content-based image retrieval techniques and statistical analysis. One key difficulty in applying statistical methods to the annotation of images is that the number of manually labeled images available to train the methods is normally insufficient: numerous keywords cannot be correctly assigned to appropriate images due to a lack of information in the labeled image databases. To deal with this challenging problem, we also propose an enhanced model in which the annotated keywords of a new image are defined in terms of their similarity at different semantic levels, including the image level, keyword level, and concept level. To avoid missing relevant keywords, the model labels the keywords with the same concepts as the new image. Our experimental results show that the proposed models are effective for annotating images that have different qualities of training data.
16.
Crack detection is important for the inspection, diagnosis, and maintenance of concrete structures, but it is difficult to detect cracks automatically. In this paper, we propose a robust automatic crack-detection method for noisy concrete surface images. The proposed method includes two preprocessing steps and two detection steps. The first preprocessing step is a subtraction process using the median filter to remove slight variations such as shading from concrete surface images; only the original image is used in this preprocessing. In the second preprocessing step, a multi-scale line filter with the Hessian matrix is used both to emphasize cracks against blebs or stains and to adapt to the width variation of cracks. After the preprocessing, probabilistic relaxation is used to detect cracks coarsely and to suppress noise; no parameters need to be optimized in the relaxation. Finally, using the results from the relaxation process, locally adaptive thresholding is performed to detect cracks more finely. We evaluate the robustness and accuracy of the proposed method quantitatively using 60 actual noisy concrete surface images.
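The first preprocessing step (median-filter subtraction) can be sketched on a single scan line: the local median estimates the slowly varying background, and subtracting it leaves thin dark structures. The window size, intensities, and crack threshold below are illustrative; the line filter, relaxation, and thresholding stages are omitted:

```python
def median_subtract(row, k=3):
    """Shading removal by median-filter subtraction on one scan line:
    the median of a k-neighbourhood approximates the background, so
    row - median keeps only narrow deviations such as cracks."""
    half = k // 2
    out = []
    for i in range(len(row)):
        lo, hi = max(0, i - half), min(len(row), i + half + 1)
        med = sorted(row[lo:hi])[(hi - lo) // 2]
        out.append(row[i] - med)
    return out

# A bright surface with a gentle shading ramp and one dark crack pixel.
row = [100, 102, 104, 40, 108, 110, 112]
residual = median_subtract(row)
crack = [1 if r < -30 else 0 for r in residual]  # -> [0, 0, 0, 1, 0, 0, 0]
```

The smooth ramp survives the median almost unchanged (residuals near zero), while the single crack pixel produces a large negative residual that a later threshold can pick up.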
17.
M. Cabezas, A. Oliver, X. Lladó, J. Freixenet, M.B. Cuadra 《Computer methods and programs in biomedicine》2011,104(3):e158-e177
Normal and abnormal brains can be segmented by registering the target image with an atlas. Here, an atlas is defined as the combination of an intensity image (template) and its segmented image (the atlas labels). After registering the atlas template and the target image, the atlas labels are propagated to the target image. We define this process as atlas-based segmentation. In recent years, researchers have investigated registration algorithms to match atlases to query subjects, and also strategies for atlas construction. In this paper we present a review of the automated approaches for atlas-based segmentation of magnetic resonance brain images. We aim to point out the strengths and weaknesses of atlas-based methods and to suggest new research directions. We use two different criteria to present the methods. First, we refer to the algorithms according to their atlas-based strategy: label propagation, multi-atlas methods, and probabilistic techniques. Subsequently, we classify the methods according to their medical target: the brain and its internal structures, tissue segmentation in healthy subjects, tissue segmentation in fetuses, neonates, and elderly subjects, and segmentation of damaged brains. A quantitative comparison of the results reported in the literature is also presented.
18.
Kobus Barnard, Quanfu Fan, Ranjini Swaminathan, Anthony Hoogs, Roderic Collins, Pascale Rondot, John Kaufhold 《International Journal of Computer Vision》2008,77(1-3):199-217
We present a new data set of 1014 images with manual segmentations and semantic labels for each segment, together with a methodology for using this kind of data for recognition evaluation. The images and segmentations are from the UCB segmentation benchmark database (Martin et al., in International conference on computer vision, vol. II, pp. 416–421, 2001). The database is extended by manually labeling each segment with its most specific semantic concept in WordNet (Miller et al., in Int. J. Lexicogr. 3(4):235–244, 1990). The evaluation methodology establishes protocols for mapping algorithm-specific localization (e.g., segmentations) to our data, handling synonyms, scoring matches at different levels of specificity, dealing with vocabularies with sense ambiguity (the usual case), and handling ground-truth regions with multiple labels. Given these protocols, we develop two evaluation approaches. The first measures the range of semantics that an algorithm can recognize, and the second measures the frequency with which an algorithm recognizes semantics correctly. The data, the image labeling tool, and programs implementing our evaluation strategy are all available on-line (kobus.ca//research/data/IJCV_2007).
We apply this infrastructure to evaluate four algorithms which learn to label image regions from weakly labeled data. The algorithms tested include two variants of multiple instance learning (MIL) and two generative multi-modal mixture models. These experiments are on a significantly larger scale than previously reported, especially in the case of MIL methods. More specifically, we used training data sets of up to 37,000 images and training vocabularies of up to 650 words.
We found that one of the mixture models performed best on image annotation and the frequency-correct measure, and that variants of MIL gave the best semantic-range performance. We were able to substantively improve the performance of MIL methods on the other tasks (image annotation and frequency-correct region labeling) by providing an appropriate prior.
19.
This paper introduces the novel volumetric methodology “appearance-cloning” as a viable solution for improved photo-consistent scene recovery, including greatly enhanced geometric recovery, from a set of photographs taken at arbitrarily distributed multiple camera viewpoints. We do so while solving many of the problems associated with previous stereo-based and volumetric methodologies. We redesign the photo-consistency decision problem of individual voxels in volumetric space as a photo-consistent shape search problem in image space, by generalizing the concept of point correspondence search between two images in the stereo-based approach within a volumetric framework.
In detail, we introduce a self-constrained greedy-style optimization methodology, which iteratively searches for a more photo-consistent shape based on a probabilistic shape photo-consistency measure, using probabilistic competition between candidate shapes. Our new measure is designed to recover the probabilistic photo-consistency of a shape by comparing the appearances captured from multiple cameras with those rendered from that shape using the per-pixel Maxwell model in image space.
Through various scene-recovery experiments, including specular and dynamic scenes, we demonstrate that if sufficient appearances are given to reflect scene characteristics, our appearance-cloning approach can successfully recover both the geometry and photometry of a scene without any scene-dependent algorithm tuning.
20.
Objective: The performance of deep semantic segmentation networks depends heavily on large-scale, high-quality pixel-level labels. In real tasks, collecting large-scale, high-quality pixel-level water-body labels is very labor-intensive. To reduce annotation effort, this paper proposes using existing public water-cover products to create water-body labels for the corresponding remote sensing images; however, these public products have low spatial resolution and contain errors, so we instead train deep semantic segmentation networks with a weakly supervised deep learning method. Method: During training, the original dataset is divided into several non-overlapping subsets, a deep semantic segmentation network is trained on each subset, and the resulting networks collaboratively update the labels; the networks are then retrained on the updated labels, and after several iterations good deep semantic segmentation networks are obtained. At test time, the multi-source remote sensing images are predicted by the multiple networks, each representing a different view, and the final water detection result is produced by voting. Results: To validate the method, we built a multi-source remote sensing dataset for water-body detection from the original imagery; compared with a traditional water-index thresholding method and a deep semantic segmentation network trained directly on the low-quality water labels, intersection-over-union (IoU) improved by 5.5% and 7.2%, respectively. Conclusion: The experiments show that the method converges, and that fusing optical and synthetic aperture radar (SAR) imagery helps water-body detection. When trained with low-resolution, noisy water labels, the resulting multi-view models clearly outperform both the traditional water-index thresholding method and direct learning from low-quality labels.
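The test-time combination of the multiple independently trained networks reduces, per pixel, to a majority vote over their binary masks; a minimal sketch with toy masks (the networks themselves and any real imagery are of course not reproduced):

```python
def vote(masks):
    """Pixel-wise majority vote over binary water masks predicted by
    several independently trained segmentation networks."""
    n = len(masks)
    # zip(*masks) walks the masks pixel-by-pixel; a pixel is water when
    # strictly more than half of the networks say so.
    return [1 if sum(col) * 2 > n else 0 for col in zip(*masks)]

# Three "views" disagree on a few pixels; the vote keeps the consensus.
m1 = [1, 1, 0, 0, 1]
m2 = [1, 0, 0, 1, 1]
m3 = [1, 1, 0, 0, 0]
water = vote([m1, m2, m3])  # -> [1, 1, 0, 0, 1]
```

With an odd number of networks the strict-majority rule never ties, which keeps the fused mask binary without a tie-breaking convention.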