Similar literature
1.
Low-level cues in an image not only allow higher-level information, such as the presence of an object, to be inferred; the inverse is also true. Category-level object recognition has now reached a level of maturity and accuracy that allows its output to be fed back successfully to other processes. This is what we refer to as cognitive feedback. In this paper, we study one particular form of cognitive feedback, in which the ability to recognize objects of a given category is exploited to infer different kinds of meta-data annotations for images of previously unseen object instances, in particular information on 3D shape. Meta-data can be discrete, real-valued or vector-valued. Our approach builds on the Implicit Shape Model of Leibe and Schiele [B. Leibe, A. Leonardis, B. Schiele, Robust object detection with interleaved categorization and segmentation, International Journal of Computer Vision 77 (1–3) (2008) 259–289], and extends it to transfer annotations from training images to test images. We focus on inferring approximate 3D shape information about objects in a single 2D image. In experiments, we illustrate how our method can infer depth maps, surface normals and part labels for previously unseen object instances.

2.
Detecting objects in complex scenes while recovering the scene layout is a critical functionality in many vision-based applications. In this work, we advocate the importance of geometric contextual reasoning for object recognition. We start from the intuition that the location and pose of objects in 3D space are not arbitrarily distributed but are constrained by the fact that objects must lie on one or more supporting surfaces. We model such supporting surfaces by means of hidden parameters (i.e. not explicitly observed) and formulate joint scene reconstruction and object recognition as the problem of finding the set of parameters that maximizes the joint probability of having a number of detected objects on K supporting planes given the observations. As a key ingredient for solving this optimization problem, we demonstrate a novel relationship between object location and pose in the image and the scene layout parameters (i.e. the normal of one or more supporting planes in 3D, and the camera pose, location and focal length). Using a novel probabilistic formulation and the above relationship, our method is able to jointly: i) reduce the false alarm and false negative rates of object detection; ii) recover object locations and supporting planes within the 3D camera reference system; iii) infer camera parameters (viewpoint and focal length) from a single uncalibrated image. Quantitative and qualitative experimental evaluation on two datasets (desk-top dataset [1] and LabelMe [2]) supports our theoretical claims.

3.
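Dense depth information is widely used across computer vision tasks, but depth cameras often fail to measure depth on glossy, transparent, or distant object surfaces, which leaves holes of varying size in the depth image. This paper therefore proposes a hole-filling algorithm for a single depth image that propagates along normal directions. The method diffuses along the variation of the object's own surface, turning depth-image completion into a geometry-completion problem. The 2D depth image is first lifted to a 3D point cloud, and the point cloud is then contracted inward along the normal directions of the hole boundary. During contraction, a constraint function resembling a normal (Gaussian) filter is added to model the change in depth, so that the filled points better fit the structure of the whole object. Finally, the 3D point cloud is mapped back onto the 2D image. The algorithm is tested on the NYU-Depth-v2 dataset, and the experiments show that it fills holes well.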
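The first and last steps of the pipeline above (lifting the depth image to a point cloud and mapping it back) can be sketched with a pinhole camera model. This is only an illustration under stated assumptions: the intrinsics `fx`, `fy`, `cx`, `cy` and both function names are hypothetical, a depth of zero is assumed to mark a hole, and the normal-direction contraction itself is not reproduced here.

```python
from typing import List, Tuple

Point3D = Tuple[float, float, float]


def depth_to_points(depth: List[List[float]], fx: float, fy: float,
                    cx: float, cy: float) -> List[Point3D]:
    """Back-project every valid pixel (u, v, z) to a 3D point (pinhole model)."""
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:  # assumption: zero or negative depth marks a hole
                continue
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


def points_to_depth(points: List[Point3D], width: int, height: int,
                    fx: float, fy: float, cx: float, cy: float) -> List[List[float]]:
    """Project 3D points back to a depth image, keeping the nearest point per pixel."""
    depth = [[0.0] * width for _ in range(height)]
    for x, y, z in points:
        if z <= 0:
            continue
        u = int(round(fx * x / z + cx))
        v = int(round(fy * y / z + cy))
        if 0 <= u < width and 0 <= v < height:
            if depth[v][u] == 0.0 or z < depth[v][u]:
                depth[v][u] = z
    return depth


# A 2x2 depth image with one hole (0.0) survives the round trip unchanged.
depth = [[1.0, 0.0], [2.0, 1.5]]
pts = depth_to_points(depth, 2.0, 2.0, 0.5, 0.5)
restored = points_to_depth(pts, 2, 2, 2.0, 2.0, 0.5, 0.5)
```

In a hole-filling setting, new points generated in 3D would be appended to `pts` before the final projection, so that previously empty pixels receive a depth value.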

4.
This paper presents a new strategy for automatic object extraction in highly complex scenes. The proposed method provides a solution for 3D segmentation that avoids most of the restrictions imposed by other techniques. Our technique is applicable to unstructured 3D information (i.e. a cloud of points), to a single view of the scene, to scenes consisting of several objects where contact, occlusion and shadows are allowed, and to objects with uniform intensity/texture, without restrictions on shape, pose or location. To provide a fast stopping criterion for segmentation, the number of objects in the scene is taken as input. The method is based on a new distributed segmentation technique that explores the 3D data by establishing a set of suitable observation directions. For each exploration viewpoint, a strategy [3D data]-[2D projected data]-[2D segmentation]-[3D segmented data] is carried out. This strategy differs from current 3D segmentation strategies. The method has been successfully tested in our lab on a set of real complex scenes. The results of these experiments, conclusions and future improvements are also presented in the paper.

5.
This paper presents the MOUGH (mixture of uniform and Gaussian Hough) Transform for shape-based object detection and tracking. We show that the edgels of a rigid object at a given orientation are approximately distributed according to a Gaussian mixture model (GMM). A variant of the generalized Hough transform is proposed, voting using GMMs and optimized via Expectation-Maximization, that is capable of searching images for a mildly deformable shape, based on a training dataset of (possibly noisy) images with only crude estimates of the scale and centroid of the object in each image. Further modifications are proposed to optimize the algorithm for tracking. The method is able to locate and track objects reliably even against complex backgrounds such as dense moving foliage, and with a moving camera. Experimental results indicate that the algorithm is superior to previously published variants of the Hough transform and to active shape models in tracking pedestrians from a side view.

6.
This paper presents a novel vision-based global localization method that uses hybrid maps of objects and spatial layouts. We model indoor environments with a stereo camera using the following visual cues: local invariant features for object recognition and their 3D positions for object pose estimation. We also use the depth information along the horizontal centerline of the image, through which the optical axis passes; this is similar to the data from a 2D laser range finder. This allows us to build a topological node that is composed of a horizontal depth map and an object location map. The horizontal depth map describes the explicit spatial layout of each local space and provides metric information for computing the spatial relationships between adjacent spaces, while the object location map contains the pose information of the objects found in each local space and the visual features for object recognition. Based on this map representation, we suggest a coarse-to-fine strategy for global localization. The coarse pose is estimated by means of object recognition and SVD-based point cloud fitting, and is then refined by stochastic scan matching. Experimental results show that our approaches can be used for effective vision-based map representation as well as for global localization.

7.
This paper proposes a new method of detecting an object containing multiple colors with non-homogeneous distributions in complex backgrounds and subsequently estimating the depth and shape of the object using a stereo camera. To extract features for object detection, this paper proposes fuzzy color histograms (FCHs) based on the self-splitting clustering (SSC) of the hue-saturation (HS) color space. For each scanning window in a pyramid of scaled images, the FCH is obtained by accumulating the fuzzy degrees of all of the pixels belonging to each cluster. The FCH is fed to a fuzzy classifier to detect an object in the left image captured by the stereo camera. To find the matched object region in the right image, the left and right images are first segmented using the SSC-partitioned HS space. The depth of the object is then found by performing stereo matching on the segmented images. To find the shape of the object, a disparity map is built using the estimated object depth to automatically determine the stereo matching window size and disparity search range. Finally, the shape of the object is segmented from the disparity map. The experimental results of the detection of different objects with depth and shape estimations are used to verify the performance of the proposed method. Comparisons with different detection and disparity map construction methods are performed to demonstrate the advantage of the proposed method.
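The accumulation step for a fuzzy color histogram can be illustrated as follows. This is a minimal sketch rather than the paper's formulation: the abstract does not give the membership function, so normalized Gaussian memberships around fixed cluster centers are assumed here, hue wrap-around is ignored, and the names `fuzzy_color_histogram`, `centers` and `sigma` are hypothetical. The SSC clustering and the fuzzy classifier are not reproduced.

```python
import math
from typing import List, Sequence, Tuple

HS = Tuple[float, float]  # (hue, saturation), both scaled to [0, 1]


def fuzzy_color_histogram(pixels: Sequence[HS], centers: Sequence[HS],
                          sigma: float = 0.1) -> List[float]:
    """Accumulate each pixel's fuzzy membership degrees into one bin per cluster."""
    hist = [0.0] * len(centers)
    for h, s in pixels:
        # Assumed membership: Gaussian in plain Euclidean HS distance
        # (hue is treated as linear here, which ignores its circular nature).
        weights = [math.exp(-((h - ch) ** 2 + (s - cs) ** 2) / (2 * sigma ** 2))
                   for ch, cs in centers]
        total = sum(weights)
        if total == 0.0:  # pixel too far from every cluster to contribute
            continue
        for k, w in enumerate(weights):
            hist[k] += w / total  # memberships of one pixel sum to 1
    n = len(pixels)
    return [b / n for b in hist] if n else hist


centers = [(0.1, 0.5), (0.6, 0.5)]
window = [(0.1, 0.5), (0.1, 0.5), (0.6, 0.5), (0.35, 0.5)]
fch = fuzzy_color_histogram(window, centers)
```

Unlike a hard histogram, the pixel halfway between the two centers contributes about half a count to each bin, which is what makes the descriptor tolerant to small color shifts.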

8.
9.
10.
A general shape context framework is proposed for object/image retrieval in occluded and cluttered environments, with hundreds of models as the potential matches of an input. The approach is general in that it does not require input objects to be separated from a complex background. It works by first extracting consistent and structurally unique local neighborhood information from inputs or models, and then voting on the optimal matches. Its performance degrades gracefully with the amount of structural information that is occluded or lost. The local neighborhood information applicable to the system can be shape, color, texture features, etc. Currently, we employ shape information only. The voting mechanism is based on a novel hypercube-based indexing structure and is driven by dynamic programming. The proposed concepts have been tested on a database with thousands of images, with very encouraging results.

11.
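To address category-level pose estimation for 3D deformable objects, this paper proposes a category-oriented pose estimation method based on object keypoints. The method is a keypoint-based end-to-end deep learning framework that uses PointNet++ as its backbone network and estimates the pose of deformable objects through feature extraction, part segmentation, keypoint extraction, and keypoint-based pose estimation; it offers high accuracy and strong robustness. In addition, building on the ANCSH method, a normalized hierarchical keypoint representation suited to the K-AOPE network is designed, which needs only a small number of keypoints to represent the objects of a category. To validate the method, it is tested on the public shape2motion dataset. Taking the eyeglasses category as an example, the rotation errors are 2.3°, 3.1°, and 3.7°; the translation errors are 0.034, 0.030, and 0.046; the joint state errors are 2.4° and 2.5°; and the joint parameter errors are 1.2° and 0.9°, and 0.008 and 0.010. Compared with the ANCSH method, the proposed method achieves higher accuracy and robustness.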

12.
We describe an approach to category-level detection and viewpoint estimation for rigid 3D objects from single 2D images. In contrast to many existing methods, we directly integrate 3D reasoning with an appearance-based voting architecture. Our method relies on a nonparametric representation of a joint distribution of shape and appearance of the object class. Our voting method employs a novel parameterization of the joint detection and viewpoint hypothesis space, allowing efficient accumulation of evidence. We combine this with a re-scoring and refinement mechanism, using an ensemble of view-specific support vector machines. We evaluate the performance of our approach in detection and pose estimation of cars on a number of benchmark datasets. Finally, we introduce the “Weizmann Cars ViewPoint” (WCVP) dataset, a benchmark for evaluating continuous pose estimation.

13.
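Image Query Techniques Based on Object Shape   Total citations: 13, self-citations: 0, citations by others: 13
Content-based image querying retrieves images according to feature attributes of image entities (or regions), such as color, shape, texture, and spatial relationships. It combines results from three fields (image processing, image recognition, and image databases) and is a promising direction. This paper studies and implements image querying based on the shape of image entities (or regions), covering: (1) the human cognitive process for shape; (2) a set of features that characterize shape; (3) a fast and effective image matching algorithm; (4) the prototype system Photo Engine.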

14.
This paper presents a robust framework for tracking complex objects in video sequences. The multiple hypothesis tracking (MHT) algorithm reported in (IEEE Trans. Pattern Anal. Mach. Intell. 18(2) (1996)) is modified to accommodate high-level representations (2D edge maps, 3D models) of objects for tracking. The framework exploits the advantages of the MHT algorithm, which is capable of resolving data association/uncertainty, and integrates it with object matching techniques to provide robust behavior while tracking complex objects. To track objects in 2D, a 4D feature is used to represent edge/line segments, which are tracked using MHT. In many practical applications, 3D models provide more information about the object's pose (i.e., rotation information in the transformation space) that cannot be recovered using 2D edge information. Hence, a 3D model-based object tracking algorithm is also presented. A probabilistic Hausdorff image matching algorithm is incorporated into the framework in order to determine the geometric transformation that best maps the model features onto their corresponding ones in the image plane. The 3D model of the object is used to constrain the tracker to operate in a consistent manner. Experimental results on real and synthetic image sequences are presented to demonstrate the efficacy of the proposed framework.
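For readers unfamiliar with the measure behind Hausdorff matching, the classical (non-probabilistic) Hausdorff distance between two point sets can be sketched as below. This is not the probabilistic variant the abstract refers to, and it does not search over geometric transformations; the function names are illustrative only.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def directed_hausdorff(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Largest distance from any point in `a` to its nearest neighbor in `b`."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Symmetric Hausdorff distance: the larger of the two directed distances."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


model = [(0.0, 0.0), (1.0, 0.0)]
image = [(0.0, 0.0), (1.0, 0.0), (1.0, 3.0)]  # model points plus one clutter point

model_to_image = directed_hausdorff(model, image)
image_to_model = directed_hausdorff(image, model)
```

The directed distance from model to image is zero here because every model point is present in the image, while the reverse direction is dominated by the clutter point. This asymmetry is why model-to-image matching is usually scored with the directed form.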

15.
The extended Gaussian image (EGI) and complex EGI (CEGI) have been widely used as representations of 3D shapes for shape recognition and pose estimation. In this work, we extend these representations and present a new one, the enriched complex extended Gaussian image (EC-EGI). The representation follows the same framework as EGI and CEGI, which is to represent each surface patch of the target 3D shape as a weight at the associated spot on the surface of the Gaussian sphere. However, while the original CEGI uses a single complex number as the weight, the new representation uses three complex numbers, which are related to the centroid position of the surface patch in 3D. With more information included in the new representation, not only can object pose be determined more accurately, but some key ambiguities of shape representation that CEGI and EGI suffer from are also removed. The translation parameters in the pose estimation application can also be determined in a simpler and more accurate way. In addition, the Gaussian sphere partition problem of CEGI is no longer present. Experimental results on synthetic and real image data are shown to illustrate the performance of the proposed representation in pose estimation.
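The difference between the EGI and CEGI weights can be made concrete with a small sketch. It assumes the commonly cited definitions, which the abstract itself does not spell out: the EGI weight of a normal direction is the total patch area with that normal, and the CEGI weight is area times exp(j·d), where d is the signed distance of the patch plane from the origin along its normal. The crude binning by rounded normal and all names are hypothetical, and the EC-EGI weights are not reproduced.

```python
import cmath
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Vec3 = Tuple[float, float, float]
Patch = Tuple[Vec3, float, Vec3]  # (unit normal, area, centroid)


def egi_and_cegi(patches: Iterable[Patch]) -> Tuple[Dict[Vec3, float], Dict[Vec3, complex]]:
    """Accumulate EGI (real) and CEGI (complex) weights per normal direction."""
    egi: Dict[Vec3, float] = defaultdict(float)
    cegi: Dict[Vec3, complex] = defaultdict(complex)
    for normal, area, centroid in patches:
        key = tuple(round(c, 3) for c in normal)  # crude Gaussian-sphere cell
        d = sum(n * c for n, c in zip(normal, centroid))  # signed plane distance
        egi[key] += area
        cegi[key] += area * cmath.exp(1j * d)  # assumed CEGI weight: A * e^{jd}
    return dict(egi), dict(cegi)


# Unit cube centered at the origin, then the same cube shifted by 2 along x.
axes = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
cube = [(n, 1.0, tuple(0.5 * c for c in n)) for n in axes]
shifted = [(n, a, (c[0] + 2.0, c[1], c[2])) for n, a, c in cube]

egi, cegi = egi_and_cegi(cube)
egi_s, cegi_s = egi_and_cegi(shifted)
```

In this example the EGI is identical for both cubes, so it carries no translation information, while the phase of the CEGI weights on the ±x faces changes with the shift.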

16.
Robust Object Detection with Interleaved Categorization and Segmentation   Total citations: 5, self-citations: 0, citations by others: 5
This paper presents a novel method for detecting and localizing objects of a visual category in cluttered real-world scenes. Our approach considers object categorization and figure-ground segmentation as two interleaved processes that closely collaborate towards a common goal. As shown in our work, the tight coupling between those two processes allows them to benefit from each other and improve the combined performance. The core part of our approach is a highly flexible learned representation for object shape that can combine the information observed on different training examples in a probabilistic extension of the Generalized Hough Transform. The resulting approach can detect categorical objects in novel images and automatically infer a probabilistic segmentation from the recognition result. This segmentation is then in turn used to again improve recognition by allowing the system to focus its efforts on object pixels and to discard misleading influences from the background. Moreover, the information from where in the image a hypothesis draws its support is employed in an MDL based hypothesis verification stage to resolve ambiguities between overlapping hypotheses and factor out the effects of partial occlusion. An extensive evaluation on several large data sets shows that the proposed system is applicable to a range of different object categories, including both rigid and articulated objects. In addition, its flexible representation allows it to achieve competitive object detection performance already from training sets that are between one and two orders of magnitude smaller than those used in comparable systems.
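The voting idea behind a Generalized Hough Transform for object centers can be sketched in a few lines. This is a simplified illustration, not the paper's method: it assumes features have already been matched to codebook entries, uses a fixed grid of accumulator cells instead of a continuous voting space, and omits segmentation and MDL verification. The codebook contents and all names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Offset = Tuple[float, float]
Feature = Tuple[float, float, str]  # (x, y, matched codebook entry)


def hough_vote(features: List[Feature],
               codebook: Dict[str, List[Tuple[Offset, float]]],
               bin_size: float = 10.0) -> Optional[Tuple[Tuple[int, int], float]]:
    """Let each feature cast weighted votes for the object center; return the best cell."""
    votes: Dict[Tuple[int, int], float] = defaultdict(float)
    for x, y, word in features:
        # Each codebook entry stores offsets to the object center seen in training.
        for (dx, dy), weight in codebook.get(word, []):
            cell = (int((x + dx) // bin_size), int((y + dy) // bin_size))
            votes[cell] += weight
    if not votes:
        return None
    best = max(votes, key=votes.get)
    return best, votes[best]


codebook = {
    "wheel": [((20.0, -10.0), 0.5), ((-20.0, -10.0), 0.5)],  # front or rear wheel
    "roof": [((0.0, 15.0), 1.0)],
}
features = [(30.0, 60.0, "wheel"), (70.0, 60.0, "wheel"), (50.0, 35.0, "roof")]
best_cell, score = hough_vote(features, codebook)
```

Each wheel is ambiguous on its own and splits its vote between two candidate centers, but only the true center collects consistent votes from all three features. That is the sense in which voting combines information from different parts.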

17.
Estimation of human shape from images has numerous applications ranging from graphics to surveillance. A single image provides insufficient constraints (e.g. because of clothing), making human shape estimation more challenging. We propose a method to simultaneously estimate a person’s clothed and naked shapes from a single image of that person wearing clothing. The key component of our method is a deformable model of clothed human shape. We learn our deformable model, which spans variations in pose, body, and clothes, from a training dataset. These variations are derived from non-rigid surface deformation and encoded in various low-dimensional parameters. Our deformable model can be used to produce clothed 3D meshes for different people in different poses, neither of which appears in the training dataset. Given an input image, our deformable model is then initialized with a few user-specified 2D joints and contours of the person. We optimize the parameters of the deformable model by iterating between pose fitting and body fitting. The clothed and naked 3D shapes of the person can then be obtained simultaneously. We illustrate our method with applications to texture mapping and animation. The experimental results on real images demonstrate the effectiveness of our method.

18.
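In computer vision, simplifying a 3D mesh must preserve not only the object's shape and topology but also object features such as surface normals, texture, color, and edges, so that a computer vision system can effectively represent, describe, recognize, and understand objects and scenes. To this end, this paper discusses a 3D mesh simplification algorithm that is based on edge operations (edge collapse and edge split) and preserves color or gray-level texture features. The algorithm uses the asymmetric maximum distance between meshes as the measure of shape change and the maximum change in color or gray level within a neighborhood as the measure of texture change. It thereby greatly reduces the model data while effectively preserving the model's geometric shape, topology, and color or gray-level features, as well as a uniform distribution of mesh vertices.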

19.
Arbitrary-shape object detection, which is mostly related to computer vision and image processing, deals with detecting objects in an image. In this paper, we treat the detection of arbitrary-shape objects as a clustering application: images are decomposed into representative data points, and clustering is then performed on these points. Our method for arbitrary-shape object detection is based on COMUSA, which is an efficient algorithm for combining multiple clusterings. Extensive experimental evaluations on real and synthetically generated data sets demonstrate that our method is very accurate and efficient.

20.
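In visible-light remote sensing images captured by unmanned aerial vehicles (UAVs), the boundaries of ground objects are not sharp, which lowers the contrast between ground objects and the background and makes the objects hard to extract. This paper therefore proposes a ground object extraction method for UAV visible-light remote sensing images. The image features are analyzed in three respects: spectral features, texture features, and edge features. The three feature types are combined to augment the UAV visible-light remote sensing image dataset. Image encoding labels are defined for the augmented dataset in order to determine the enhancement weights of ground objects. The spectral features of ground objects are parameterized, the spectral absorption index is computed, and an extraction expression for ground objects is obtained, which achieves ground object extraction from UAV visible-light remote sensing images. Experimental results show that the proposed method preserves the sharpness of ground object boundaries and has strong ground object extraction capability.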
