Similar literature
Found 20 similar documents (search time: 31 ms)
1.
This paper presents an algorithm for the real-time computation of disparity using video stereo images captured by a stereo webcam. This algorithm is designed to provide both real-time throughput and robust disparity estimation for real-world applications where computation is limited to a pre-defined region-of-interest (ROI). More specifically, this algorithm is used as part of a hand-pair gesture recognition application where the disparity is computed for two ROIs around a hand-pair identified by the segmentation component of the recognition application. The developed algorithm provides the required relative difference in disparity with respect to the background at high frame rates for the hand-pair gesture recognition application. The results obtained with an inexpensive commercial VGA stereo webcam show robust disparity computation at 20 ms/frame, enabling real-time hand-pair gesture recognition at 25 fps with a >90% recognition rate for a maximum hand speed of 40 cm/s and for hand distances between 30 and 150 cm from the camera.
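The abstract does not state which matching cost or search strategy the authors use. As an illustration only, the following Python sketch shows one common way to restrict disparity computation to an ROI: winner-takes-all block matching with a sum-of-absolute-differences (SAD) cost on rectified images. The function name `roi_disparity`, its parameters, and the default window size and disparity range are assumptions made for this sketch, not details from the paper.

```python
import numpy as np

def roi_disparity(left, right, roi, max_disp=16, half_win=2):
    """Winner-takes-all SAD block matching restricted to a rectangular ROI.

    Hypothetical helper, not the paper's algorithm.
    left, right: rectified grayscale images as 2-D arrays of equal shape.
    roi: (row0, row1, col0, col1), half-open, in left-image coordinates.
    Returns an integer disparity map with the shape of the ROI.
    """
    left = left.astype(np.float64)
    right = right.astype(np.float64)
    r0, r1, c0, c1 = roi
    h, w = left.shape
    disp = np.zeros((r1 - r0, c1 - c0), dtype=np.int64)
    for r in range(r0, r1):
        for c in range(c0, c1):
            # Clip the matching window to the image borders.
            ra, rb = max(r - half_win, 0), min(r + half_win + 1, h)
            ca, cb = max(c - half_win, 0), min(c + half_win + 1, w)
            patch = left[ra:rb, ca:cb]
            best_cost, best_d = np.inf, 0
            # A left-image pixel at column c appears at column c - d on the right.
            # The upper limit keeps the shifted window inside the right image.
            for d in range(0, min(max_disp, ca) + 1):
                cand = right[ra:rb, ca - d:cb - d]
                cost = np.abs(patch - cand).sum()
                if cost < best_cost:
                    best_cost, best_d = cost, d
            disp[r - r0, c - c0] = best_d
    return disp
```

Only pixels inside the ROI are visited, so the cost of this sketch scales with ROI area rather than with the full frame. A real-time implementation would likely vectorize the inner loops.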

2.
This paper presents a new multi-pass hierarchical stereo-matching approach for generation of digital terrain models (DTMs) from two overlapping aerial images. Our method consists of multiple passes which compute stereo matches with a coarse-to-fine and sparse-to-dense paradigm. An image pyramid is generated and used in the hierarchical stereo matching. Within each pass, the DTM is refined by using the image pyramid from the coarse to the fine level. At the coarsest level of the first pass, a global stereo-matching technique, the intra-/inter-scanline matching method, is used to generate a good initial DTM for the subsequent stereo matching. Thereafter, hierarchical block matching is applied to image locations where features are detected to refine the DTM incrementally. In the first pass, only the feature points near salient edge segments are considered in block matching. In the second pass, all the feature points are considered, and the DTM obtained from the first pass is used as the initial condition for local searching. For the passes after the second pass, 3D interactive manual editing can be incorporated into the automatic DTM refinement process whenever necessary. Experimental results have shown that our method can successfully provide accurate DTMs from aerial images. The success of our approach and system has also been demonstrated with flight simulation software. Received: 4 November 1996 / Accepted: 20 October 1997

3.
In this paper, we propose a novel system that strives to achieve advanced content-based image retrieval using a seamless combination of two complementary approaches: on the one hand, we propose a new color-clustering method to better capture color properties of the original images; on the other hand, expecting that image regions acquired from the original images inevitably contain many errors, we make use of the available erroneous, ill-segmented image regions to accomplish object-region-based image retrieval. We also propose an effective image-indexing scheme to facilitate fast and efficient image matching and retrieval. The carefully designed experimental evaluation shows that our proposed image retrieval system surpasses the other methods under comparison in terms of not only quantitative measures, but also image retrieval capabilities.

4.
Traditional digital particle image velocimetry (DPIV) methods have been based on area correlation. Though proven to be very time-consuming and error-prone, area correlation has been widely adopted because it is conceptually simple and easy to implement, and also because there are few alternatives. This paper provides a non-correlative, conceptually new, fast and efficient approach for DPIV which takes the nature of flow into consideration. An incompressible affine flow model (IAFM) is introduced to describe a flow that incorporates rational constraint directly into the computation. This IAFM, combined with a modified optical flow method named total optical flow computation, provides a linear system solution to DPIV. Experimental results on real images demonstrate our method to be a very promising approach for DPIV. Received: 23 March 1998 / Accepted: 1 September 1999

5.
REFLICS: Real-time flow imaging and classification system   Total citations: 1, self-citations: 0, citations by others: 1
An accurate analysis of a large dynamic system like our oceans requires spatially fine and temporally matched data collection methods. Current methods to estimate fish stock size from pelagic (marine) fish egg abundance by using ships to take point samples of fish eggs have large margins of error due to spatial and temporal undersampling. The real-time flow imaging and classification system (REFLICS) enhances fish egg sampling by obtaining continuous, accurate information on fish egg abundance as the ship cruises along in the area of interest. REFLICS images the dynamic flow with a progressive-scan area camera (60 frames/s) and a synchronized strobe in backlighting configuration. Digitization and processing occur on a dual-processor Pentium II PC and a pipeline-based image-processing board. REFLICS uses a segmentation algorithm to locate fish-egg-like objects in the image and then a classifier to determine fish egg, species, and development stage (age). We present an integrated system design of REFLICS and performance results. REFLICS can perform in real time (60 Hz), classify fish eggs with low false negative rates on real data collected from a cruise, and work in harsh conditions aboard ships at sea. REFLICS enables cost-effective, real-time assessment of pelagic fish eggs for research and management. Received: 12 April 2000 / Accepted: 6 July 2000

6.
A stereo-vision system for support of planetary surface exploration   Total citations: 2, self-citations: 0, citations by others: 2
Abstract. In this paper, we present a system that was developed for the European Space Agency (ESA) for the support of planetary exploration. The system that is sent to the planetary surface consists of a rover and a lander. The lander contains a stereo head equipped with a pan-tilt mechanism. This vision system is used both for modeling the terrain and for localization of the rover. Both tasks are necessary for the navigation of the rover. Due to the stress that occurs during the flight, a recalibration of the stereo-vision system is required once it is deployed on the planet. Practical limitations make it infeasible to use a known calibration pattern for this purpose; therefore, a new calibration procedure had to be developed that could work on images of the planetary environment. This automatic procedure recovers the relative orientation of the cameras and the pan and tilt axes, as well as the exterior orientation for all the images. The same images are subsequently used to reconstruct the 3-D structure of the terrain. For this purpose, a dense stereo-matching algorithm is used that (after rectification) computes a disparity map. Finally, all the disparity maps are merged into a single digital terrain model. In this paper, a simple and elegant procedure is proposed that achieves that goal. The fact that the same images can be used for both calibration and 3-D reconstruction is important, since, in general, the communication bandwidth is very limited. In addition to navigation and path planning, the 3-D model of the terrain is also used for virtual-reality simulations of the mission, wherein the model is texture mapped with the original images. The system has been implemented, and the first tests on the ESA planetary terrain testbed were successful.

7.
One important step in the analysis of digitized land use map images is the separation of the information into layers. In this paper we present a technique called Selective Attention Filter, which is able to extract or enhance some features of the image that correspond to conceptual layers in the map by extracting information from the results of clustering local regions on the map. Different parameters can be used to extract or enhance different information in the image. Details on the algorithm, examples of application of the filter, and results are also presented. Received: October 1, 1997 / Revised June 16, 1998

8.
Abstract. Providing a customized result set based upon a user preference is the ultimate objective of many content-based image retrieval systems. There are two main challenges in meeting this objective: First, there is a gap between the physical characteristics of digital images and the semantic meaning of the images. Second, different people may have different perceptions of the same set of images. To address both these challenges, we propose a model, named Yoda, that conceptualizes content-based querying as the task of soft classifying images into classes. These classes can overlap, and their members are different for different users. The “soft” classification is hence performed for each and every image feature, including both physical and semantic features. Subsequently, each image will be ranked based on the weighted aggregation of its classification memberships. The weights are user-dependent, and hence different users would obtain different result sets for the same query. Yoda employs a fuzzy-logic-based aggregation function for ranking images. We show that, in addition to some performance benefits, fuzzy aggregation is less sensitive to noise and can support disjunctive queries, in contrast to the weighted-average aggregation used by other content-based image retrieval systems. Finally, since Yoda heavily relies on user-dependent weights (i.e., user profiles) for the aggregation task, we utilize the users' relevance feedback to improve the profiles using genetic algorithms (GA). Our learning mechanism requires fewer user interactions, and results in a faster convergence to the user's preferences as compared to other learning techniques. Correspondence to: Y.-S. Chen (E-mail: yishinc@usc.edu) This research has been funded in part by NSF grants EEC-9529152 (IMSC ERC) and IIS-0082826, NIH-NLM R01-LM07061, DARPA and USAF under agreement nr. F30602-99-1-0524, and unrestricted cash gifts from NCR, Microsoft, and Okawa Foundation.

9.
An architecture for handwritten text recognition systems   Total citations: 1, self-citations: 1, citations by others: 0
This paper presents an end-to-end system for reading handwritten page images. Five functional modules included in the system are introduced in this paper: (i) pre-processing, which concerns introducing an image representation for easy manipulation of large page images and image handling procedures using the image representation; (ii) line separation, concerning text line detection and extracting images of lines of text from a page image; (iii) word segmentation, which concerns locating word gaps and isolating words efficiently and intelligently from a text-line image; (iv) word recognition, concerning handwritten word recognition algorithms; and (v) linguistic post-processing, which concerns the use of linguistic constraints to intelligently parse and recognize text. Key ideas employed in each functional module, which have been developed for dealing with the diversity of handwriting in its various aspects with a goal of system reliability and robustness, are described in this paper. Preliminary experiments show promising results in terms of speed and accuracy. Received October 30, 1998 / Revised January 15, 1999

10.
A model-based hand gesture recognition system   Total citations: 2, self-citations: 0, citations by others: 2
This paper introduces a model-based hand gesture recognition system, which consists of three phases: feature extraction, training, and recognition. In the feature extraction phase, a hybrid technique combines the spatial (edge) and the temporal (motion) information of each frame to extract the feature images. Then, in the training phase, we use principal component analysis (PCA) to characterize spatial shape variations and hidden Markov models (HMMs) to describe the temporal shape variations. A modified Hausdorff distance measurement is also applied to measure the similarity between the feature images and the pre-stored PCA models. The similarity measures are referred to as the possible observations for each frame. Finally, in the recognition phase, with the pre-trained PCA models and HMMs, we can generate the observation patterns from the input sequences, and then apply the Viterbi algorithm to identify the gesture. In the experiments, we show that our method can recognize 18 different continuous gestures effectively. Received: 19 May 1999 / Accepted: 4 September 2000
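The recognition step named in the abstract, Viterbi decoding over an HMM, can be sketched in a few lines of Python. This is a generic textbook formulation for discrete observations, not the authors' implementation. The helper `classify`, which assumes one HMM per gesture and picks the model with the highest Viterbi score, is a hypothetical arrangement that the abstract does not confirm.

```python
import numpy as np

def viterbi(log_start, log_trans, log_emit, observations):
    """Most likely hidden-state path for a discrete-observation HMM.

    log_start: (S,) log initial-state probabilities.
    log_trans: (S, S) log transition probabilities, row = from, column = to.
    log_emit:  (S, O) log emission probabilities.
    observations: sequence of observation indices in [0, O).
    Returns (best_path, best_log_probability).
    """
    n_states = log_start.shape[0]
    n_steps = len(observations)
    score = np.empty((n_steps, n_states))
    back = np.zeros((n_steps, n_states), dtype=np.int64)
    score[0] = log_start + log_emit[:, observations[0]]
    for t in range(1, n_steps):
        # cand[i, j]: best score ending in state i at t-1, then moving to j.
        cand = score[t - 1][:, None] + log_trans
        back[t] = cand.argmax(axis=0)
        score[t] = cand.max(axis=0) + log_emit[:, observations[t]]
    # Trace the best final state back to the start.
    path = [int(score[-1].argmax())]
    for t in range(n_steps - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    path.reverse()
    return path, float(score[-1].max())

def classify(models, observations):
    """Hypothetical: pick the gesture whose HMM gives the highest Viterbi score.

    models: dict mapping gesture name -> (log_start, log_trans, log_emit).
    """
    scores = {name: viterbi(*params, observations)[1]
              for name, params in models.items()}
    return max(scores, key=scores.get)
```

Working in log space avoids numerical underflow on long observation sequences, which matters for continuous gesture streams.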

11.
A system to navigate a robot into a ship structure   Total citations: 1, self-citations: 0, citations by others: 1
Abstract. A prototype system has been built to navigate a walking robot into a ship structure. The 8-legged robot is equipped with an active stereo head. From the CAD model of the ship, good viewpoints are selected such that the head can look at locations with sufficient edge features, which are extracted automatically for each view. The pose of the robot is estimated from the features detected by two vision approaches. One approach searches in stereo images for junctions and measures the 3-D position. The other method uses a monocular image and tracks 2-D edge features. Robust tracking is achieved with a method of edge projected integration of cues (EPIC). Two inclinometres are used to stabilise the head while the robot moves. The results of the final demonstration to navigate the robot within centimetre accuracy are given.

12.
Stereo images acquired by a stereo camera setup provide depth estimation of a scene. Numerous machine vision applications deal with retrieval of 3D information. Disparity map recovery from a stereo image pair involves computationally complex algorithms. Previous methods of disparity map computation are mainly restricted to software-based techniques on general-purpose architectures, with relatively high execution times. In this paper, a new hardware-implemented real-time disparity map computation module is realized. This enables a hardware-based, parallel-pipelined fuzzy inference system design for the overall module, implemented on a single FPGA device with a typical operating frequency of 138 MHz. This provides accurate disparity map computation at a rate of nearly 440 frames per second, given a stereo image pair with a disparity range of 80 pixels and 640 × 480 pixels spatial resolution. The proposed method allows a fast disparity map computational module to be built, enabling a suitable module for real-time stereo vision applications.
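As a quick plausibility check on the figures quoted in this abstract, the short Python calculation below relates the clock frequency to the pixel throughput. The interpretation that the pipeline emits roughly one disparity value per clock cycle is an assumption made here; the abstract gives only the clock frequency, resolution, and frame rate.

```python
WIDTH, HEIGHT = 640, 480          # spatial resolution from the abstract
FRAME_RATE = 440                  # frames per second ("nearly 440")
CLOCK_HZ = 138e6                  # typical operating frequency from the abstract

pixels_per_frame = WIDTH * HEIGHT                 # 307200
pixel_rate = pixels_per_frame * FRAME_RATE        # 135168000 pixels per second
clocks_per_pixel = CLOCK_HZ / pixel_rate          # about 1.02

# Upper bound on the frame rate under the ASSUMED one-pixel-per-clock pipeline.
ideal_fps = CLOCK_HZ / pixels_per_frame           # about 449.2
```

The quoted rate of nearly 440 frames per second is therefore consistent with a fully pipelined design that processes about one pixel per clock cycle, with a small overhead per frame.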

13.
We propose a stereo correspondence method that minimizes intensity and gradient errors simultaneously. In contrast to the conventional use of image gradients, the gradients are applied in the deformed image space. Although a uniform smoothness constraint is imposed, it is applied only to nonfeature regions. To avoid local minima in the function minimization, we propose to parameterize the disparity function by hierarchical Gaussians. Both the uniqueness and the ordering constraints can be easily imposed in our minimization framework. In addition, we propose a method to estimate the disparity map and the camera response difference parameters simultaneously. Experiments with various real stereo images show the robust performance of our algorithm.

14.
Head tracking using stereo   Total citations: 2, self-citations: 0, citations by others: 2
Head tracking is an important primitive for smart environments and perceptual user interfaces where the poses and movements of body parts need to be determined. Most previous solutions to this problem are based on intensity images and, as a result, suffer from a host of problems including sensitivity to background clutter and lighting variations. Our approach avoids these pitfalls by using stereo depth data together with a simple human-torso model to create a head-tracking system that is both fast and robust. We use stereo data to derive a depth model of the background that is then employed to provide accurate foreground segmentation. We then use directed local edge detectors on the foreground to find occluding edges that are used as features to fit to a torso model. Once we have the model parameters, the location and orientation of the head can be easily estimated. A useful side effect from using stereo data is the ability to track head movement through a room in three dimensions. Experimental results on real image sequences are given. Accepted: 13 August 2001 Note: Commercial equipment and materials are identified in order to adequately specify certain procedures. In no case does such identification imply recommendation or endorsement by the National Institute of Standards and Technology, nor does it imply that the materials or equipment identified are necessarily the best available for the purpose.

15.
In this paper, we address the analysis of 3D shape and shape change in non-rigid biological objects imaged via a stereo light microscope. We propose an integrated approach for the reconstruction of 3D structure and the motion analysis of images in which only a few informative features are available. The key components of this framework are: 1) image registration using a correlation-based approach, 2) region-of-interest extraction using motion-based segmentation, and 3) stereo and motion analysis using a cooperative spatial and temporal matching process. We describe these three stages of processing and illustrate the efficacy of the proposed approach using real images of a live frog's ventricle. The reconstructed dynamic 3D structure of the ventricle is demonstrated in our experimental results, and it agrees qualitatively with the observed images of the ventricle.

16.
This paper describes the issues involved in the design of a system for evaluating improvements in the performance of a real-time address recognition system being used by the United States Postal Service for processing mail-piece images. Evaluation of the performance of recognition systems is normally carried out by measuring the performance of the system on a representative sample of images. Designing a comprehensive and valid testing scenario is a complex task that requires careful attention. Sampling the live mail-stream to generate a deck of images representative of the general mail-stream for testing, truthing (generating reference data on a significant number of images), grading and evaluation, and designing tools to facilitate these functions are important topics that need to be addressed. This paper describes the efforts of the United States Postal Service and CEDAR towards developing an infrastructure for sampling, truthing, and testing of mail-stream images. Received: July 25, 2000 / Revised version: July 31, 2001

17.
Abstract. We exploit the gap in ability between human and machine vision systems to craft a family of automatic challenges that tell human and machine users apart via graphical interfaces including Internet browsers. Turing proposed [Tur50] a method whereby human judges might validate “artificial intelligence” by failing to distinguish between human and machine interlocutors. Stimulated by the “chat room problem” posed by Udi Manber of Yahoo!, and influenced by the CAPTCHA project [BAL00] of Manuel Blum et al. of Carnegie-Mellon Univ., we propose a variant of the Turing test using pessimal print: that is, low-quality images of machine-printed text synthesized pseudo-randomly over certain ranges of words, typefaces, and image degradations. We show experimentally that judicious choice of these ranges can ensure that the images are legible to human readers but illegible to several of the best present-day optical character recognition (OCR) machines. Our approach is motivated by a decade of research on performance evaluation of OCR machines [RJN96,RNN99] and on quantitative stochastic models of document image quality [Bai92,Kan96]. The slow pace of evolution of OCR and other species of machine vision over many decades [NS96,Pav00] suggests that pessimal print will defy automated attack for many years. Applications include 'bot' barriers and database rationing. Received: February 14, 2002 / Accepted: March 28, 2002 An expanded version of: A.L. Coates, H.S. Baird, R.J. Fateman (2001) Pessimal Print: a reverse Turing Test. In: Proc. 6th Int. Conf. on Document Analysis and Recognition, Seattle, Wash., USA, September 10–13, pp. 1154–1158 Correspondence to: H. S. Baird

18.
We present an autonomous mobile robot navigation system using stereo fish-eye lenses for navigation in an indoor structured environment and for generating a model of the imaged scene. The system estimates the three-dimensional (3D) position of significant features in the scene, and by estimating its relative position to the features, navigates through narrow passages and makes turns at corridor ends. Fish-eye lenses are used to provide a large field of view, which images objects close to the robot and helps in making smooth transitions in the direction of motion. Calibration is performed for the lens-camera setup and the distortion is corrected to obtain accurate quantitative measurements. A vision-based algorithm that uses the vanishing points of extracted segments from a scene in a few 3D orientations provides an accurate estimate of the robot orientation. This is used, in addition to 3D recovery via stereo correspondence, to maintain the robot motion in a purely translational path, as well as to remove the effects of any drifts from this path from each acquired image. Horizontal segments are used as a qualitative estimate of change in the motion direction, and correspondence of vertical segments provides precise 3D information about objects close to the robot. Assuming that detected linear edges in the scene are boundaries of planar surfaces, the 3D model of the scene is generated. The robot system is implemented and tested in a structured environment at our research center. Results from the robot navigation in real environments are presented and discussed. Received: 25 September 1996 / Accepted: 20 October 1996

19.
Due to the fuzziness of query specification and media matching, multimedia retrieval is conducted by way of exploration. It is essential to provide feedback so that users can visualize query reformulation alternatives and database content distribution. Since media matching is an expensive task, another issue is how to efficiently support exploration so that the system is not overloaded by perpetual query reformulation. In this paper, we present a uniform framework to represent statistical information of both semantics and visual metadata for images in the databases. We propose the concept of query verification, which evaluates queries using statistics and provides users with feedback, including the strictness and reformulation alternatives of each query condition as well as estimated numbers of matches. With query verification, the system increases the efficiency of multimedia database exploration for both users and the system. Such statistical information is also utilized to support progressive query processing and query relaxation. Received: 9 June 1998 / Accepted: 21 July 2000 Published online: 4 May 2001

20.
We present a scheme for reliable and accurate surface reconstruction from stereoscopic images containing only fine texture and no stable high-level features. Partial shape information is used to improve surface computation: first by fitting an approximate, global, parametric model, and then by refining this model via local correspondence processes. This scheme eliminates the window size selection problem in existing area-based stereo correspondence schemes. These ideas are integrated in a practical vision system that is being used by environmental scientists to study wind erosion of bulk material such as coal ore being transported in open rail cars. Received: 14 August 1995 / Accepted: 27 May 1997
