Perceiving surfaces in a manner that accords with their physical properties is essential for successful behaviour. Since, however, a given retinal image could have been generated by an infinite variety of natural surfaces with different geometrical and/or physical qualities, the corresponding percepts cannot be determined by the stimulus per se. Rather, resolution of this quandary requires a strategy of vision that incorporates the statistical relationship of the information in retinal images to its sources in representative environments. To examine this probabilistic relationship with respect to the features of object surfaces, we analysed a database of range images in which the distances of all the objects in a series of natural scenes were measured with respect to the image plane by a laser range scanner. By taking any particular scene obtained in this way to be made up of a set of concatenated surface patches, we were able to explore the statistics of scene roughness, size-distance relationships, surface orientation and local curvature, as well as the independent components of natural surfaces. The relevance of these statistics to both perception and the neuronal organization of the underlying visual circuitry is discussed.

The neural mechanisms of early vision can be explained in terms of an information-theoretic optimization of the neural processing with respect to the statistical properties of the natural environment. Recent applications of this approach have been successful in the prediction of the linear filtering properties of ganglion cells and simple cells, but the relations between the environmental statistics and cortical nonlinearities, like those of end-stopped or complex cells, are not yet fully understood. Here we present extensions of our previous investigations of the exploitation of higher-order statistics by nonlinear neurons. We use multivariate wavelet statistics to demonstrate that strictly linear processing would inevitably leave substantial statistical dependencies between the outputs of the units. We then consider how the basic nonlinearities of cortical neurons--gain control and ON/OFF half-wave rectification--can exploit these higher-order statistical dependencies. We first show that gain control provides an adaptation to the polar separability of the multivariate probability density function (PDF), and, together with an output nonlinearity, enables an overcomplete sparse coding. We then consider how the remaining higher-order dependencies between different units can be exploited by a combination of basic ON/OFF point nonlinearities and subsequent weighted linear combinations. We consider two statistical optimization schemes for the computation of the optimal weights: principal component analysis (PCA) and independent component analysis (ICA). Since the intermediate nonlinearities transform some of the higher-order dependencies into second-order dependencies, even the basic PCA approach is able to exploit part of the redundancies. ICA ignores this second-order structure, but can exploit higher-order dependencies.
Both schemes yield a variety of nonlinear units which comprise the typical nonlinear processing properties, such as end-stopping, side-stopping, complex-cell properties and extra-classical receptive field properties, but the 'ideal' complex cells seem only to occur with PCA. Thus, a combination of ON/OFF nonlinearities with an integrated PCA-ICA strategy seems necessary to exploit the statistical properties of natural images.

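For scene classification in high-resolution remote-sensing imagery, and inspired by the way the human visual system extracts summary statistics from a scene for scene perception, a method for extracting scene summary-statistic features is proposed. The method extracts the mean orientation information and the visual clutter of a scene: Gabor filters are used to compute the mean orientation statistics, and clutter is measured on the basis of visual crowding. The two are then combined to form a summary-statistics-based description of complex scenes. Experiments on a 21-class remote-sensing data set show that, with 50 training images and 50 test images, the classification accuracy of the method is 6.5% higher than that of the Gist method and 3.22% higher than that of the bag-of-words (BOW) model. The method is also computationally simple and, unlike Gist, requires no manual intervention.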

A new, simple and computationally efficient approach to image segmentation via recursive region splitting and merging is presented. Unlike in other techniques, the criterion for splitting is based on a generalization of a two-class gradient relaxation method, and merging uses a test for mean gray level equivalency of adjacent regions. The technique is illustrated by providing results for both synthetic and natural scenes.
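
The split-then-merge control flow described above can be sketched as follows. This is only an illustration under stated assumptions: the abstract's gradient-relaxation splitting criterion is not reproduced, so a plain gray-level range threshold (`max_range`) stands in for it, and the statistical test for mean equivalency is replaced by a fixed tolerance on the difference of region means (`mean_tol`). All function and parameter names are hypothetical.

```python
def region_pixels(image, top, left, height, width):
    """Return the gray levels inside a rectangular region."""
    return [image[r][c]
            for r in range(top, top + height)
            for c in range(left, left + width)]


def split(image, top, left, height, width, max_range, regions):
    """Recursively quarter a region until it is homogeneous.

    Stand-in criterion (an assumption): gray-level range <= max_range.
    """
    pixels = region_pixels(image, top, left, height, width)
    if max(pixels) - min(pixels) <= max_range or height == 1 or width == 1:
        regions.append((top, left, height, width))
        return
    hh, hw = height // 2, width // 2
    for t, l, h, w in ((top, left, hh, hw),
                       (top, left + hw, hh, width - hw),
                       (top + hh, left, height - hh, hw),
                       (top + hh, left + hw, height - hh, width - hw)):
        split(image, t, l, h, w, max_range, regions)


def mean_gray(image, region):
    pixels = region_pixels(image, *region)
    return sum(pixels) / len(pixels)


def adjacent(a, b):
    """True if two rectangles share part of an edge."""
    (at, al, ah, aw), (bt, bl, bh, bw) = a, b
    rows_overlap = at < bt + bh and bt < at + ah
    cols_overlap = al < bl + bw and bl < al + aw
    side_by_side = (al + aw == bl or bl + bw == al) and rows_overlap
    stacked = (at + ah == bt or bt + bh == at) and cols_overlap
    return side_by_side or stacked


def split_and_merge(image, max_range, mean_tol):
    """Split the whole image, then group adjacent leaves with similar means."""
    regions = []
    split(image, 0, 0, len(image), len(image[0]), max_range, regions)
    parent = list(range(len(regions)))

    def find(i):
        while parent[i] != i:
            i = parent[i]
        return i

    for i in range(len(regions)):
        for j in range(i + 1, len(regions)):
            # Simplification: compares leaf means, not running group means.
            if adjacent(regions[i], regions[j]) and abs(
                    mean_gray(image, regions[i]) - mean_gray(image, regions[j])) <= mean_tol:
                parent[find(j)] = find(i)
    groups = {}
    for i, region in enumerate(regions):
        groups.setdefault(find(i), []).append(region)
    return list(groups.values())
```

On a small image whose left half is dark and right half is bright, the split step produces four quadrants and the merge step joins them into two segments, one per half.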

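For matching images taken by the same sensor from different viewpoints, a Harris-SIFT algorithm is proposed. The images are first preprocessed at multiple scales, and feature points are extracted with a Harris operator that uses a dynamic threshold. 128-dimensional SIFT feature vectors are then generated and compared for similarity. Finally, matching correspondences are established, giving a one-to-one matching of the feature vectors. Experimental results show that the algorithm is effective for matching images of objects in complex scenes.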

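Because the background of naturally photographed images of the human body is relatively complex, traditional image-processing methods based on image color space or energy gradient have difficulty identifying the body contour accurately. Neural-network methods can improve recognition accuracy. However, typical neural-network methods are hard to deploy on mobile devices because of their large computational cost and parameter count. A lightweight neural-network strategy is therefore proposed for extracting human body contours. The network adopts the MobileNet V2 and U-Net frameworks and is trained on a purpose-built data set of human bodies in specific poses to recognize the corresponding contour shapes. After subsequent processing of the contour, such as key-point extraction and fitting by regression analysis, the dimensions of the body can be estimated. The method can be used on mobile devices to measure body dimensions from photographs. Experiments show that the method accurately extracts body contours from photographs with complex backgrounds and measures body dimensions, and that it has some advantages over typical neural networks in speed and storage footprint.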

The radiosity method is particularly suitable for global illumination calculations in static environments. Nonetheless, recent applications of image synthesis such as architectural simulation or lighting design require the ability to modify environments. Previous methods have attempted to deal with dynamic environments (environments where the geometry, the material properties, etc., can change) but still suffer some limitations in the case of moving objects. One of the main problems remaining is the efficient and accurate detection of which form factors must really be recomputed, since their calculation is the most time-consuming part of the radiosity method. To correctly understand and solve this problem, we start with a method in 2D for polygonal scenes using the visibility complex. It is a powerful data structure representing the visibility relationships between objects in the plane. We have developed and implemented an algorithm which uses this structure to efficiently compute the discontinuity mesh and the form factors for static scenes. We also propose an extension to our algorithm to efficiently update only the modified form factors when an object is moving. This approach enhances our understanding and will hopefully lead to efficient solutions in 3D.

Optic flow motion patterns can be a rich source of information about our own movement and about the structure of the environment we are moving in. We investigate the information available to the brain under real operating conditions by analyzing video sequences generated by physically moving a camera through various typical human environments. We consider to what extent the motion signal maps generated by a biologically plausible, two-dimensional array of correlation-based motion detectors (2DMD) not only depend on egomotion, but also reflect the spatial setup of such environments. We analyzed the local motion outputs by extracting the relative amounts of detected directions and comparing the spatial distribution of the motion signals to that of idealized optic flow. Using a simple template matching estimation technique, we are able to extract the focus of expansion and find relatively small errors that are distributed in characteristic patterns in different scenes. This shows that all types of scenes provide suitable motion information for extracting egomotion despite the substantial levels of noise affecting the motion signal distributions, attributed to the sparse nature of optic flow and the presence of camera jitter. However, there are large differences in the shape of the direction distributions between different types of scenes; in particular, man-made office scenes are heavily dominated by directions in the cardinal axes, which is much less apparent in outdoor forest scenes. Further examination of motion magnitudes at different scales and the location of motion information in a scene revealed different patterns across different scene categories. This suggests that self-motion patterns are not only relevant for deducing heading direction and speed but also provide a rich information source for scene structure and could be important for the rapid formation of the gist of a scene under normal human locomotion.

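Color deviations in images caused by changes in the color of the light source are corrected, and a skin-color model that combines a Cb-Cr chrominance lookup table with luminance information is built in the YCbCr color space. Preprocessing is applied to remove some non-face regions and reduce the search space for face detection, and template matching is used to detect faces within the candidate face regions. Experiments show that the method effectively detects faces rotated by no more than 45° to the left or right in color images of complex environments, is unaffected by facial expression, scale or the number of faces, and has a low error rate.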
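
A chrominance-based skin test of the kind described above might look like the sketch below. It is not the paper's model: the abstract's Cb-Cr lookup table and its luminance term are not given, so a fixed rectangular Cb/Cr range that is commonly quoted in the skin-detection literature is assumed instead, and luminance is ignored. The conversion uses the full-range BT.601 (JPEG/JFIF) RGB-to-YCbCr formulas; the function names are hypothetical.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert 8-bit RGB to full-range YCbCr (BT.601 / JFIF coefficients)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def is_skin(r, g, b, cb_range=(77.0, 127.0), cr_range=(133.0, 173.0)):
    """Classify a pixel as skin if its chrominance falls in a fixed box.

    The default ranges are an assumption, standing in for the
    Cb-Cr lookup table mentioned in the abstract.
    """
    _, cb, cr = rgb_to_ycbcr(r, g, b)
    return cb_range[0] <= cb <= cb_range[1] and cr_range[0] <= cr <= cr_range[1]


def skin_mask(image):
    """Apply is_skin to every (r, g, b) pixel of a row-major image."""
    return [[is_skin(*pixel) for pixel in row] for row in image]
```

The resulting mask would serve only as a candidate-region filter; face verification by template matching, as in the abstract, is a separate step that this sketch does not cover.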

Characteristics of natural scenes related to the fractal dimension
Many objects in images of natural scenes are so complex and erratic that describing them by the familiar models of classical geometry is inadequate. In this paper, we exploit the power of fractal geometry to generate global characteristics of natural scenes. In particular we are concerned with the following two questions: 1) Can we develop a measure which can distinguish between different global backgrounds (e.g., mountains and trees)? and 2) Can we develop a measure that is sensitive to change in distance (or scale)? We present a model based on fractional Brownian motion which will allow us to recover two characteristics related to the fractal dimension from silhouettes. The first characteristic is an estimate of the fractal dimension based on a least squares linear fit. We show that this feature is stable under a variety of real image conditions and use it to distinguish silhouettes of trees from silhouettes of mountains. Next we introduce a new theoretical concept called the average Hölder constant and relate it mathematically to the fractal dimension. It is shown that this measurement is sensitive to scale in a predictable manner, and hence provides the potential for use as a range indicator. Corroborating experimental results are presented.
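
A least-squares estimate of this kind can be illustrated with a generic fractional-Brownian-motion scaling fit. The sketch below is not the paper's estimator, whose details the abstract does not give. It assumes the silhouette is available as a 1-D list of heights, uses the scaling of the mean absolute increment with lag, E|B(t+d) − B(t)| ∝ d^H, and converts the fitted Hurst exponent H to a dimension with D = 2 − H, the relation for the graph of a 1-D fBm profile. Function names and the lag set are hypothetical.

```python
import math
import random


def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line through the points (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def estimate_fractal_dimension(profile, lags=(1, 2, 4, 8, 16, 32)):
    """Estimate D = 2 - H from a 1-D height profile.

    H is the slope of log(mean |increment|) against log(lag).
    """
    log_lags, log_incs = [], []
    for lag in lags:
        diffs = [abs(profile[i + lag] - profile[i])
                 for i in range(len(profile) - lag)]
        log_lags.append(math.log(lag))
        log_incs.append(math.log(sum(diffs) / len(diffs)))
    hurst = least_squares_slope(log_lags, log_incs)
    return 2.0 - hurst


def random_walk(n, seed=0):
    """Ordinary Brownian profile (H = 0.5), used only as a test signal."""
    rng = random.Random(seed)
    height, profile = 0.0, []
    for _ in range(n):
        height += rng.gauss(0.0, 1.0)
        profile.append(height)
    return profile
```

A perfectly straight profile gives a dimension of 1, while an ordinary random walk should come out near 1.5; rougher silhouettes move the estimate toward 2.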

It is now well known that natural images display consistent statistical properties which distinguish them from random luminance distributions. However, this ecological approach to vision has so far concentrated on those second-order image statistics which are quantified by image power spectra, whereas it appears to be the image phase spectra which carry the majority of the image-intrinsic information. The present work describes how conventional nth-order statistics can be modified so that they are sensitive to image phase structure only. The modified measures are applied to an ensemble of natural images, and the results show that natural images do have consistent higher-order statistical properties which distinguish them from random-phase images with the same power spectra. An interpretation of this finding in terms of higher-order spectra suggests that these consistent properties arise from the ubiquity of edge structures in natural images, and raises the possibility that the properties of ideal relative-phase-sensitive mechanisms could be determined directly from analyses of the higher-order structure of natural scenes.
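
The random-phase comparison stimuli mentioned above can be illustrated in one dimension. The sketch below assumes a real-valued 1-D signal rather than an image, and uses a naive O(n²) discrete Fourier transform so that it needs only the standard library. It keeps every Fourier magnitude, and hence the power spectrum, while replacing the phases with random values under the conjugate symmetry that a real signal requires. The function names are hypothetical and this is not the paper's procedure for building its image ensemble.

```python
import cmath
import math
import random


def dft(signal):
    """Naive discrete Fourier transform (O(n^2))."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def inverse_dft(spectrum):
    """Naive inverse discrete Fourier transform (O(n^2))."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def phase_randomize(signal, seed=0):
    """Return a real signal with the same power spectrum but random phases."""
    rng = random.Random(seed)
    n = len(signal)
    spectrum = dft(signal)
    surrogate = list(spectrum)  # DC bin (and Nyquist bin for even n) unchanged
    for k in range(1, (n + 1) // 2):
        phase = rng.uniform(0.0, 2.0 * math.pi)
        surrogate[k] = abs(spectrum[k]) * cmath.exp(1j * phase)
        # Conjugate symmetry keeps the inverse transform real-valued.
        surrogate[n - k] = surrogate[k].conjugate()
    return [value.real for value in inverse_dft(surrogate)]
```

Applied to a step edge, the surrogate has the same power spectrum but no longer contains a localized edge, which is the kind of difference that phase-sensitive statistics are meant to detect.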

Digital landscape realism often comes from the multitude of details that are hard to model, such as fallen leaves, rock piles or entangled fallen branches. In this article, we present a method for augmenting natural scenes with a huge number of details such as grass tufts, stones, leaves or twigs. Our approach takes advantage of the observation that those details can be approximated by replications of a few similar objects and therefore relies on mass‐instancing. We propose an original structure, the Ghost Tile, that stores a huge number of overlapping candidate objects in a tile, along with a pre‐computed collision graph. Details are created by traversing the scene with the Ghost Tile and generating instances according to user‐defined density fields that make it possible to sculpt layers and piles of entangled objects while providing control over their density and distribution.

Cheap, ubiquitous, high-resolution digital cameras have led to opportunities that demand camera-based text understanding, such as wearable computing or assistive technology. Perspective distortion is one of the main challenges for text recognition in camera-captured images, since the camera may often not have a fronto-parallel view of the text. We present a method for perspective recovery of text in natural scenes, where text can appear as isolated words, short sentences or small paragraphs (as found on posters, billboards, shop and street signs etc.). It relies on the geometry of the characters themselves to estimate a rectifying homography for every line of text, irrespective of the view of the text over a large range of orientations. The horizontal perspective foreshortening is corrected by fitting two lines to the top and bottom of the text, while the vertical perspective foreshortening and shearing are estimated by performing a linear regression on the shear variation of the individual characters within the text line. The proposed method is efficient and fast. We present comparative results with improved recognition accuracy against the current state-of-the-art.
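
The step of fitting two lines to the top and bottom of the text can be sketched as follows. The abstract does not say how the lines are fitted or how they enter the homography, so ordinary least squares is assumed here, and the intersection of the two fitted lines is computed as the vanishing point of the text's horizontal direction, a standard projective-geometry construction. The function names are hypothetical, and the text line is assumed to be roughly horizontal in the image so that y can be regressed on x.

```python
def fit_line(points):
    """Least-squares fit of y = slope * x + intercept to (x, y) points.

    Assumes the x values are not all equal (text is not vertical).
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def vanishing_point(top_points, bottom_points, eps=1e-12):
    """Intersect the fitted top and bottom lines of a text line.

    Returns None when the lines are parallel, i.e. there is no
    horizontal foreshortening to correct.
    """
    top_slope, top_icpt = fit_line(top_points)
    bot_slope, bot_icpt = fit_line(bottom_points)
    if abs(top_slope - bot_slope) < eps:
        return None
    x = (bot_icpt - top_icpt) / (top_slope - bot_slope)
    return x, top_slope * x + top_icpt
```

A vanishing point far from the image indicates mild foreshortening, and a nearby one indicates a strongly oblique view. Estimating the full rectifying homography, including the shear regression the abstract describes, is beyond this sketch.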

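Text in natural-scene images provides important semantic information and is an important source of image content. To address shortcomings of current algorithms, such as low text-extraction precision, a text-extraction algorithm with accurate text localization is proposed. The original image is first decomposed into a pyramid; color-image edge extraction and binarization are then performed, followed by morphological text localization and, finally, character extraction from the text regions. Tests on images from the ICDAR database show that the method is robust to text color, size, font and layout direction, and that it achieves high precision and a high extraction rate.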

Contour extraction of moving objects in complex outdoor scenes
This paper presents a new approach to the extraction of the contour of a moving object. The method is based on the fusion of a motion segmentation technique using image subtraction, a color segmentation technique based on the split-and-merge paradigm, and edge information obtained with the Canny edge detector. The advantages of this method are the following: it can detect large moving objects, the background can be arbitrarily complicated and contain many nonmoving objects, and it requires only three image frames that need not be consecutive provided that the moving object is entirely contained in the three frames. It is assumed that there is only one moving object in the image and the objects are not blurred by their motion so that the edges in the image are sharp. The method was applied to road images containing a moving vehicle, and the results show that the contour was correctly extracted in 18 of the 20 cases. We show that this contour extraction method gives good results for other types of moving objects as well. We also describe how the extracted contour can be used to classify a given vehicle into five generic categories. In this study, 19 out of the 20 vehicles were correctly classified. These results demonstrate that integration of multiple cues obtained from relatively simple image analysis techniques leads to a robust extraction of the object of interest in complex outdoor scenes. Research supported by a grant from the U.S. Department of Transportation through the Great Lakes Center for Truck Transportation Research and by a grant from the National Science Foundation (CDA-8806599).

This paper presents a novel approach based on contextual Bayesian networks (CBN) for natural scene modeling and classification. The structure of the CBN is derived from domain knowledge, and its parameters are learned from training images. For test images, the hybrid streams of semantic features of image content and spatial information are piped into the CBN-based inference engine, which is capable of incorporating domain knowledge as well as dealing with multiple sources of input evidence, producing the category labels of the entire image. We demonstrate the promise of this approach for natural scene classification, comparing it with several state-of-the-art approaches.

