Similar literature
Found 20 similar documents (search time: 31 ms)
1.
A model-based computer vision system for recognizing handwritten ZIP codes   Total citations: 1, self-citations: 1, citations by others: 0
This paper describes a recognition system for handwritten ZIP Codes currently under development at the Environmental Research Institute of Michigan (ERIM). Included within this system are techniques for preprocessing address block images, locating ZIP Codes, splitting touching characters, and identifying handwritten numerals. These techniques rely on mathematical morphology-based image processing and on hierarchical matching of object models to symbolic image representations. The image processing uses adaptive filtering, thresholding, and skeletonizing to create binary and state-labeled images. The matching process uses these images and extensively developed handwritten digit models to identify ZIP Codes. The end-to-end system has been tested on 500 randomly selected address block images. The system correctly recognized a large portion of the ZIP Codes in the test images (45.0%), and incorrectly classified a very low percentage of isolated handwritten digits (0.9%). Overall performance continues to be improved through incremental digit model refinement. This work was funded by the Office of Advanced Technology, United States Postal Service under contract 104230-86-H-0042.

2.
Detection, segmentation, and classification of specific objects are the key building blocks of a computer vision system for image analysis. This paper presents a unified model-based approach to these three tasks. It is based on using unsupervised learning to find a set of templates specific to the objects being outlined by the user. The templates are formed by averaging the shapes that belong to a particular cluster, and are used to guide a probabilistic search through the space of possible objects. The main difference from previously reported methods is the use of on-line learning, ideal for highly repetitive tasks. This results in faster and more accurate object detection, as system performance improves with continued use. Further, the information gained through clustering and user feedback is used to classify the objects for problems in which shape is relevant to the classification. The effectiveness of the resulting system is demonstrated in two applications: a medical diagnosis task using cytological images, and a vehicle recognition task. Received: 5 November 2000 / Accepted: 29 June 2001 Correspondence to: K.-M. Lee

3.
This paper describes an adaptive recognition system for isolated handwritten characters and the experiments carried out with it. The characters used in our experiments are alphanumeric characters, including both the upper- and lower-case versions of the Latin alphabet and three Scandinavian diacriticals. The writers are allowed to use their own natural style of writing. The recognition system is based on the k-nearest neighbor rule. The six character similarity measures applied by the system are all based on dynamic time warping. The aim of the first experiments is to choose the best combination of the simple preprocessing and normalization operations and the dissimilarity measure for a multi-writer system. However, the main focus of the work is on online adaptation. The purpose of the adaptations is to turn a writer-independent system into a writer-dependent one and to increase recognition performance. The adaptation is carried out by modifying the prototype set of the classifier according to its recognition performance and the user's writing style. The ways of adaptation include: (1) adding new prototypes; (2) inactivating confusing prototypes; and (3) reshaping existing prototypes. The reshaping algorithm is based on Learning Vector Quantization. Four different adaptation strategies, according to which the modifications of the prototype set are performed, have been studied both offline and online. Adaptation is carried out in a self-supervised fashion during normal use and thus remains unnoticed by the user. Received June 30, 1999 / Revised September 29, 2000
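
The combination of a k-nearest neighbor rule with a dynamic time warping (DTW) dissimilarity can be sketched as follows. This is a minimal illustration, not the system described above: the point-wise Euclidean cost, the function names `dtw_distance` and `knn_classify`, and the toy prototypes are all assumptions made for the example.

```python
from collections import Counter
from math import hypot


def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of (x, y) points."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = hypot(a[i - 1][0] - b[j - 1][0], a[i - 1][1] - b[j - 1][1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def knn_classify(sample, prototypes, k=3):
    """Label `sample` by majority vote among its k DTW-nearest prototypes."""
    nearest = sorted(prototypes, key=lambda p: dtw_distance(sample, p[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy prototypes: pen trajectories as lists of (x, y) points.
prototypes = [
    ([(0, 0), (0, 1), (0, 2)], "1"),
    ([(0, 0), (0, 1), (0, 3)], "1"),
    ([(0, 0), (1, 0), (2, 0)], "-"),
]
label = knn_classify([(0, 0), (0, 2)], prototypes, k=1)
```

Because DTW tolerates sequences of different lengths, the two-point sample can still be matched against three-point prototypes; adapting to a writer then amounts to editing the `prototypes` list.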

4.
This paper describes a laser-based computer vision system used for automatic fruit recognition. It is based on an infrared laser range-finder sensor that provides range and reflectance images and is designed to detect spherical objects in non-structured environments. Image analysis algorithms integrate both range and reflectance information to generate four characteristic primitives which give evidence of the existence of spherical objects. The output of this vision system includes 3D position, radius and surface reflectivity of each spherical object. It has been applied to the AGRIBOT orange harvesting robot, where it has obtained good fruit detection rates with a low likelihood of false detections.

5.
We describe a process of word recognition that has high tolerance for poor image quality, tunability to the lexical content of the documents to which it is applied, and high speed of operation. This process relies on the transformation of text images into character shape codes, and on special lexica that contain information on the shape of words. We rely on the structure of English and the high efficiency of mapping between shape codes and the characters in the words. Remaining ambiguity is reduced by template matching using exemplars derived from surrounding text, taking advantage of the local consistency of font, face and size as well as image quality. This paper describes the effects of lexical content, structure and processing on the performance of a word recognition engine. Word recognition performance is shown to be enhanced by the application of an appropriate lexicon. Recognition speed is shown to be essentially independent of the details of lexical content provided the intersection of the occurrences of words in the document and the lexicon is high. Word recognition accuracy is dependent on both intersection and specificity of the lexicon. Received May 1, 1998 / Revised October 20, 1998
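
The idea of indexing a lexicon by word shape can be illustrated with a small sketch. The shape classes below (tall, descending, dotted, x-height) are a hypothetical scheme chosen for the example and are not taken from the paper; `shape_code` and `build_shape_lexicon` are likewise illustrative names.

```python
from collections import defaultdict

ASCENDERS = set("bdfhklt")
DESCENDERS = set("gpqy")


def shape_code(word):
    """Map each character to a coarse shape class (hypothetical scheme)."""
    codes = []
    for ch in word:
        if ch.isupper() or ch.isdigit() or ch in ASCENDERS:
            codes.append("A")   # tall: reaches above the x-height
        elif ch in DESCENDERS:
            codes.append("g")   # extends below the baseline
        elif ch in "ij":
            codes.append(ch)    # dotted characters kept distinct
        else:
            codes.append("x")   # fits within the x-height
    return "".join(codes)


def build_shape_lexicon(words):
    """Index a word list by shape code so lookup is a single dict access."""
    lexicon = defaultdict(list)
    for w in words:
        lexicon[shape_code(w)].append(w)
    return lexicon


lexicon = build_shape_lexicon(["the", "tie", "word", "ward", "shape"])
candidates = lexicon[shape_code("word")]
```

Here "word" and "ward" share one shape code, which is the kind of remaining ambiguity that a later step such as template matching would have to resolve.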

6.
Experiments comparing neural networks trained with crisp and fuzzy desired outputs are described. A handwritten word recognition algorithm using the neural networks for character level confidence assignment was tested on images of words taken from the United States Postal Service mailstream. The fuzzy outputs were defined using a fuzzy k-nearest neighbor algorithm. The crisp networks slightly outperformed the fuzzy networks at the character level but the fuzzy networks outperformed the crisp networks at the word level. This empirical result is interpreted as an example of the principle of least commitment.
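
One common way to derive fuzzy desired outputs from a fuzzy k-nearest neighbor rule is to weight each neighbor's class by inverse distance. The sketch below assumes crisp neighbor labels, a Euclidean metric, and the fuzzifier `m = 2`; these choices, the small `eps` guard, and the toy data are assumptions, and the abstract does not say which variant was used.

```python
def fuzzy_knn_memberships(sample, training, k=3, m=2.0, eps=1e-9):
    """Return soft class memberships for `sample` from its k nearest neighbours.

    `training` is a list of (feature_vector, label) pairs; neighbour labels
    are treated as crisp here, which is a simplifying assumption.
    """
    def sq_dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))

    nearest = sorted(training, key=lambda t: sq_dist(sample, t[0]))[:k]
    weights = {}
    for vec, label in nearest:
        # Inverse-distance weight: 1 / d^(2/(m-1)), written via squared distance.
        w = 1.0 / (sq_dist(sample, vec) ** (1.0 / (m - 1.0)) + eps)
        weights[label] = weights.get(label, 0.0) + w
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}


# Hypothetical 2-D features for two digit classes.
training = [((0.0, 0.0), "3"), ((1.0, 0.0), "3"), ((0.0, 2.0), "8")]
memberships = fuzzy_knn_memberships((0.0, 1.0), training, k=3)
```

A crisp target would keep only the top class, whereas the fuzzy target keeps the full membership vector (roughly 0.6 for "3" and 0.4 for "8" in this toy case), deferring the hard decision to a later stage.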

7.
Extraction of some meta-information from printed documents without carrying out optical character recognition (OCR) is considered. It can be statistically verified that important terms in technical articles are mainly printed in italic, bold, and all-capital style. A quick approach to detecting them is proposed here. This approach is based on the global shape heuristics of these styles of any font. Important words in a document are sometimes printed in larger size as well. A smart approach for the determination of font size is also presented. Detection of type styles helps in improving OCR performance, especially for reading italicized text. Another advantage to identifying word type styles and font size has been discussed in the context of extracting: (i) different logical labels; and (ii) important terms from the document. Experimental results on the performance of the approach on a large number of good quality, as well as degraded, document images are presented. Received July 12, 2000 / Revised October 1, 2000

8.
Computing systems are essential resources for both the business and public sectors. With the increasing interdependence of integrated electronic commerce and business applications within the global computing environment, performance and reliability are of great concern. Poor performance can mean lost cooperation, opportunity, and revenue. This paper describes performance challenges that these applications face over the short and long term. We present an analytic technique that can predict the performance of an e-commerce application over a given deployment period. This technique can be used to deduce performance stress testing vectors over this period and for design and capacity planning exercises. A Web-based shopping server case study is used as an example. Published online: 22 August 2001

9.
This paper describes a performance evaluation study in which some efficient classifiers are tested in handwritten digit recognition. The evaluated classifiers include a statistical classifier (modified quadratic discriminant function, MQDF), three neural classifiers, and an LVQ (learning vector quantization) classifier. They are efficient in that high accuracies can be achieved at moderate memory space and computation cost. The performance is measured in terms of classification accuracy, sensitivity to training sample size, ambiguity rejection, and outlier resistance. The outlier resistance of neural classifiers is enhanced by training with synthesized outlier data. The classifiers are tested on a large data set extracted from NIST SD19. The results show that the test accuracies of the evaluated classifiers are comparable to or higher than those of the nearest neighbor (1-NN) rule and regularized discriminant analysis (RDA). It is shown that neural classifiers are more susceptible to small sample size than MQDF, although they yield higher accuracies on large sample size. As a neural classifier, the polynomial classifier (PC) gives the highest accuracy and performs best in ambiguity rejection. On the other hand, MQDF is superior in outlier rejection even though it is not trained with outlier data. The results indicate that pattern classifiers have complementary advantages and they should be appropriately combined to achieve higher performance. Received: July 18, 2001 / Accepted: September 28, 2001
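
For readers unfamiliar with learning vector quantization, the classic LVQ1 update rule can be sketched in a few lines. This is the textbook rule, shown under the assumption of Euclidean distance and a fixed learning rate; the abstract does not state which LVQ variant was evaluated, and `lvq1_update` is an illustrative name.

```python
def lvq1_update(prototypes, sample, label, rate=0.1):
    """One LVQ1 step: move the nearest prototype toward or away from `sample`.

    `prototypes` is a list of [vector, class_label] pairs, updated in place.
    """
    def sq_dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))

    nearest = min(prototypes, key=lambda p: sq_dist(p[0], sample))
    # Attract the prototype if it has the right class, repel it otherwise.
    sign = 1.0 if nearest[1] == label else -1.0
    nearest[0] = [w + sign * rate * (x - w) for w, x in zip(nearest[0], sample)]
    return nearest


# Hypothetical two-class prototype set in a 2-D feature space.
prototypes = [[[0.0, 0.0], "0"], [[4.0, 4.0], "1"]]
lvq1_update(prototypes, [1.0, 1.0], "0", rate=0.5)   # same class: attract
lvq1_update(prototypes, [3.0, 3.0], "0", rate=0.5)   # wrong class: repel
```

After the two steps the "0" prototype has moved to (0.5, 0.5) and the "1" prototype has been pushed out to (4.5, 4.5), which is how the rule sharpens the boundary between classes.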

10.
Geometric groundtruth at the character, word, and line levels is crucial for designing and evaluating optical character recognition (OCR) algorithms. Kanungo and Haralick proposed a closed-loop methodology for generating geometric groundtruth for rescanned document images. The procedure assumed that the original image and the corresponding groundtruth were available. It automatically registered the original image to the rescanned one using four corner points and then transformed the original groundtruth using the estimated registration transformation. In this paper, we present an attributed branch-and-bound algorithm for establishing the point correspondence that uses all the data points. We group the original feature points into blobs and use corners of blobs for matching. The Euclidean distance between character centroids is used as the error metric. We conducted experiments on synthetic point sets with varying layout complexity to characterize the performance of two matching algorithms. We also report results on experiments conducted using the University of Washington dataset. Finally, we show examples of application of this methodology for generating groundtruth for microfilmed and FAXed versions of the University of Washington dataset documents. Received: July 24, 2001 / Accepted: May 20, 2002

11.
In this paper, a two-stage HMM-based recognition method allows us to compensate for the possible loss in terms of recognition performance caused by the necessary trade-off between segmentation and recognition in an implicit segmentation-based strategy. The first stage consists of an implicit segmentation process that takes into account some contextual information to provide multiple segmentation-recognition hypotheses for a given preprocessed string. These hypotheses are verified and re-ranked in a second stage by using an isolated digit classifier. This method enables the use of two sets of features and numeral models: one taking into account both the segmentation and recognition aspects in an implicit segmentation-based strategy, and the other considering just the recognition aspects of isolated digits. These two stages have been shown to be complementary, in the sense that the verification stage compensates for the loss in terms of recognition performance brought about by the necessary trade-off between segmentation and recognition carried out in the first stage. The experiments on 12,802 handwritten numeral strings of different lengths have shown that the use of a two-stage recognition strategy is a promising idea. The verification stage brought about an average improvement of 9.9% on the string recognition rates. On touching digit pairs, the method achieved a recognition rate of 89.6%. Received June 28, 2002 / Revised July 03, 2002

12.
An architecture for handwritten text recognition systems   Total citations: 1, self-citations: 1, citations by others: 0
This paper presents an end-to-end system for reading handwritten page images. Five functional modules included in the system are introduced in this paper: (i) pre-processing, which concerns introducing an image representation for easy manipulation of large page images and image handling procedures using the image representation; (ii) line separation, concerning text line detection and extracting images of lines of text from a page image; (iii) word segmentation, which concerns locating word gaps and isolating words from a line of text image obtained efficiently and in an intelligent manner; (iv) word recognition, concerning handwritten word recognition algorithms; and (v) linguistic post-processing, which concerns the use of linguistic constraints to intelligently parse and recognize text. Key ideas employed in each functional module, which have been developed for dealing with the diversity of handwriting in its various aspects with a goal of system reliability and robustness, are described in this paper. Preliminary experiments show promising results in terms of speed and accuracy. Received October 30, 1998 / Revised January 15, 1999

13.
Target recognition is a multilevel process requiring a sequence of algorithms at low, intermediate and high levels. Generally, such systems are open loop with no feedback between levels, and assuring their performance at the given probability of correct identification (PCI) and probability of false alarm (Pf) is a key challenge in computer vision and pattern recognition research. In this paper, a robust closed-loop system for recognition of SAR images based on reinforcement learning is presented. The parameters in model-based SAR target recognition are learned. The method meets performance specifications by using PCI and Pf as feedback for the learning system. It has been experimentally validated by learning the parameters of the recognition system for SAR imagery, successfully recognizing articulated targets, targets of different configuration and targets at different depression angles.

14.
Summary. In a distributed system, high-level actions can be modeled by nonatomic events. This paper proposes causality relations between distributed nonatomic events and provides efficient testing conditions for the relations. The relations provide a fine-grained granularity to specify causality relations between distributed nonatomic events. The set of relations between nonatomic events is complete in first-order predicate logic, using only the causality relation between atomic events. For a pair of distributed nonatomic events X and Y, the evaluation of any of the causality relations requires […] integer comparisons, where […] and […], respectively, are the number of nodes on which the two nonatomic events X and Y occur. In this paper, we show that this polynomial complexity of evaluation can be simplified to a linear complexity using properties of partial orders. Specifically, we show that most relations can be evaluated in […] integer comparisons, some in […] integer comparisons, and the others in […] integer comparisons. During the derivation of the efficient testing conditions, we also define special system execution prefixes associated with distributed nonatomic events and examine their knowledge-theoretic significance. Received: July 1997 / Accepted: May 1998
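
To make the notion of causality between nonatomic events concrete, the sketch below evaluates four quantifier combinations by brute force over every pair of atomic events. It assumes each atomic event carries a vector timestamp, and the relation names are illustrative labels rather than the paper's notation; the efficient linear-time tests discussed in the abstract are not reproduced here.

```python
def happened_before(u, v):
    """Atomic causality: vector timestamp u precedes v (u <= v and u != v)."""
    return all(a <= b for a, b in zip(u, v)) and u != v


def relations(X, Y):
    """Evaluate four quantifier combinations naively, in |X| * |Y| checks."""
    return {
        "all-all": all(happened_before(x, y) for x in X for y in Y),
        "all-some": all(any(happened_before(x, y) for y in Y) for x in X),
        "some-all": any(all(happened_before(x, y) for y in Y) for x in X),
        "some-some": any(happened_before(x, y) for x in X for y in Y),
    }


# Hypothetical vector timestamps on a three-node system.
X = [(1, 0, 0), (0, 1, 0)]
Y = [(2, 1, 0), (0, 0, 1)]
result = relations(X, Y)
```

In this toy run every atomic event of X precedes some event of Y, but none precedes all of them, because (0, 0, 1) is concurrent with both events of X.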

15.
This paper describes a deformable model of the human iris which forms part of a system for accurate off-line measurement of binocular three-dimensional eye movements, particularly cyclotorsion (torsion), from video image sequences. At least two existing systems measure torsion from infrared video images by pupil tracking followed by cross correlation using arcs of bandpass-filtered iris texture. Unfortunately, pupil expansion and contraction reduce the accuracy of this method unless drugs are used to constrict the pupil, which causes temporary blurred vision. A five-parameter deformable model of the iris is therefore developed for analysing images obtained without the use of drugs. This model can translate (horizontal and vertical eye motion), rotate (torsion) and scale both uniformly and radially (pupil changes). Torsion measurements obtained with the model are repeatable and accurate to within 0.1°; this performance is illustrated by analysing binocular torsion during fixation on a stationary target. Received: 27 August 1997 / Accepted: 15 January 1998

16.
Segmentation and recognition of Chinese bank check amounts   Total citations: 1, self-citations: 0, citations by others: 1
This paper describes a system for the recognition of legal amounts on bank checks written in the Chinese language. It consists of subsystems that perform preprocessing, segmentation, and recognition of the legal amount. In each step of the segmentation and recognition phases, a list of possible choices is obtained. An approach is adopted whereby a large number of choices can be processed effectively and efficiently in order to achieve the best recognition result. The contribution of this paper is the proposal of a grammar checker for Chinese bank check amounts. It is found to be very effective in reducing the substitution error rate. The recognition rate of the system is 74.0%, the error rate is 10.4%, and the reliability is 87.7%. Received June 9, 2000 / Revised January 10, 2001
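
The three figures quoted above are consistent with a common definition of reliability as the share of accepted (non-rejected) results that are correct. The short check below works through that arithmetic; the definition itself and the reading of the remainder as a rejection rate are assumptions, since the abstract does not spell them out.

```python
def reliability(recognition_rate, error_rate):
    """Share of accepted (non-rejected) results that are correct, in percent."""
    return 100.0 * recognition_rate / (recognition_rate + error_rate)


recognition, error = 74.0, 10.4           # percentages quoted in the abstract
rejection = 100.0 - recognition - error    # remainder, assumed to be rejections
print(round(rejection, 1))                        # 15.6
print(round(reliability(recognition, error), 1))  # 87.7
```

Under this reading, about 15.6% of checks are rejected rather than answered, and 74.0 / (74.0 + 10.4) gives the reported 87.7%.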

17.
While techniques for evaluating the performance of lower-level document analysis tasks such as optical character recognition have gained acceptance in the literature, attempts to formalize the problem for higher-level algorithms, while receiving a fair amount of attention in terms of theory, have generally been less successful in practice, perhaps owing to their complexity. In this paper, we introduce intuitive, easy-to-implement evaluation schemes for the related problems of table detection and table structure recognition. We also present the results of several small experiments, demonstrating how well the methodologies work and the useful sorts of feedback they provide. We first consider the table detection problem. Here algorithms can yield various classes of errors, including non-table regions improperly labeled as tables (insertion errors), tables missed completely (deletion errors), larger tables broken into a number of smaller ones (splitting errors), and groups of smaller tables combined to form larger ones (merging errors). This leads naturally to the use of an edit distance approach for assessing the results of table detection. Next we address the problem of evaluating table structure recognition. Our model is based on a directed acyclic attribute graph, or table DAG. We describe a new paradigm, “graph probing,” for comparing the results returned by the recognition system and the representation created during ground-truthing. Probing is in fact a general concept that could be applied to other document recognition tasks as well. Received July 18, 2000 / Accepted October 4, 2001
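
An edit-distance score for table detection can be sketched with the classic Levenshtein recurrence. The version below uses unit costs and models only insertion, deletion, and substitution; the splitting and merging errors named in the abstract would need extra operations that are not shown. Representing each table as a hypothetical (first_line, last_line) tuple is an assumption made for the example.

```python
def edit_distance(detected, truth):
    """Levenshtein distance between two sequences with unit costs."""
    prev = list(range(len(truth) + 1))
    for i, d in enumerate(detected, start=1):
        curr = [i]
        for j, t in enumerate(truth, start=1):
            curr.append(min(
                prev[j] + 1,             # extra detected item (insertion error)
                curr[j - 1] + 1,         # missing truth item (deletion error)
                prev[j - 1] + (d != t),  # match or substitution
            ))
        prev = curr
    return prev[-1]


# Hypothetical tables given as (first_line, last_line) spans on a page.
truth = [(3, 10), (15, 22), (30, 41)]
detected = [(3, 10), (30, 41), (50, 55)]
score = edit_distance(detected, truth)
```

The score of 2 here reflects one missed table and one spurious detection; a score of 0 would mean the detector's output matches the ground truth exactly.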

18.
Interactive performance systems allow multiple, networked users to take part in a performance either as players or audience members and influence the performance's development in real time. Among the most prominent research issues concerning the development of these systems are the provision of effective interaction capabilities and adequate means of expression to the audience. This article describes a set of methods for the detection, analysis, synchronization and rendering of audience behavior in real-time during such an event. Furthermore, it describes ways in which audience feedback is utilized to dynamically index the contents of a performance so as to embed this event in large-scale multimedia services. All these methods have been applied in the creation of MISSION, a multi-user game involving both players and audience. This article presents and analyzes the design of the system and the results of user trials with it.

19.
This paper describes a new kind of neural network – Quantum Neural Network (QNN) – and its application to the recognition of handwritten numerals. QNN combines the advantages of neural modelling and fuzzy theoretic principles. Novel experiments have been designed for in-depth studies of applying the QNN to both real data and confusing images synthesized by morphing. Tests on synthesized data examine QNN's fuzzy decision boundary with the aim of illustrating its mechanism and characteristics, while studies on real data prove its great potential as a handwritten numeral classifier and the special role it plays in multi-expert systems. An effective decision-fusion system is proposed and a high reliability of 99.10% has been achieved. Received October 26, 1998 / Revised January 9, 1999

20.
The performance of electronic commerce systems has a major impact on their acceptability to users. Different users also demand different levels of performance from the system, that is, they will have different Quality of Service (QoS) requirements. Electronic commerce systems are the integration of several different types of servers and each server must contribute to meeting the QoS demands of the users. In this paper we focus on the role, and the performance, of a database server within an electronic commerce system. We examine the characteristics of the workload placed on a database server by an electronic commerce system and suggest a range of QoS requirements for the database server based on this analysis of the workload. We argue that a database server must be able to dynamically reallocate its resources in order to meet the QoS requirements of different transactions as the workload changes. We describe Quartermaster, which is a system to support dynamic goal-oriented resource management in database management systems, and discuss how it can be used to help meet the QoS requirements of the electronic commerce database server. We provide an example of the use of Quartermaster that illustrates how the dynamic reallocation of memory resources can be used to meet the QoS requirements of a set of transactions similar to transactions found in an electronic commerce workload. We briefly describe the memory reallocation algorithms used by Quartermaster and present experiments to show the impact of the reallocations on the performance of the transactions. Published online: 22 August 2001
