1.
Transforming paper documents into XML format with WISDOM++
Oronzo Altamura Floriana Esposito Donato Malerba 《International Journal on Document Analysis and Recognition》2001,4(1):2-17
The transformation of scanned paper documents to a form suitable for an Internet browser is a complex process that requires
solutions to several problems. The application of OCR to some parts of the document image is only one of these problems.
In fact, the generation of documents in HTML format is easier when the layout structure of a page has been extracted by means
of a document analysis process. The adoption of an XML format is even better, since it can facilitate the retrieval of documents
on the Web. Nevertheless, an effective transformation of paper documents into this format requires further processing steps,
namely document image classification and understanding. WISDOM++ is a document processing system that operates in five steps:
document analysis, document classification, document understanding, text recognition with an OCR, and transformation into HTML/XML format. The innovative aspects described in the paper are: the preprocessing algorithm, the adaptive page segmentation,
the acquisition of block classification rules using techniques from machine learning, the layout analysis based on general
layout principles, and a method that uses document layout information for conversion to HTML/XML formats. A benchmarking of
the system components implementing these innovative aspects is reported.
Received June 15, 2000 / Revised November 7, 2000
2.
Pietro Parodi Roberto Fontana 《International Journal on Document Analysis and Recognition》1999,2(2-3):67-79
This paper describes a novel method for extracting text from document pages of mixed content. The method works by detecting
pieces of text lines in small overlapping columns of fixed width, shifted with respect to each other by a fixed number of image elements, and by merging these pieces in a bottom-up fashion to form complete text lines and blocks of text lines. The algorithm requires
about 1.3 s for a 300 dpi image on a PC with a 300 MHz Pentium II CPU and an Intel 440LX motherboard. The algorithm is largely independent
of the layout of the document, the shape of the text regions, and the font size and style. The main assumptions are that the
background be uniform and that the text sit approximately horizontally. For a skew of up to about 10 degrees no skew correction
mechanism is necessary. The algorithm has been tested on the UW English Document Database I of the University of Washington
and its performance has been evaluated by a suitable measure of segmentation accuracy. Also, a detailed analysis of the segmentation
accuracy achieved by the algorithm as a function of noise and skew has been carried out.
Received April 4, 1999 / Revised June 1, 1999
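A minimal numpy sketch of the scheme: text-line pieces are detected inside overlapping column strips and merged bottom-up. The column width and shift defaults below are illustrative placeholders, not the paper's values, and the merging rule is simplified.

```python
import numpy as np

def line_pieces(binary, col_width=32, shift=16):
    # Detect text-line pieces as maximal runs of ink rows inside each
    # overlapping column strip.
    h, w = binary.shape
    pieces = []  # (row_start, row_end, col_start, col_end)
    for x in range(0, max(w - col_width, 1), shift):
        ink_rows = binary[:, x:x + col_width].any(axis=1).astype(int)
        edges = np.diff(np.concatenate(([0], ink_rows, [0])))
        for s, e in zip(np.where(edges == 1)[0], np.where(edges == -1)[0]):
            pieces.append((int(s), int(e), x, x + col_width))
    return pieces

def merge_pieces(pieces):
    # Bottom-up merging: fuse pieces whose row intervals overlap and whose
    # column strips touch into growing text-line boxes.
    lines = []
    for s, e, x0, x1 in sorted(pieces, key=lambda p: p[2]):
        for ln in lines:
            if s < ln[1] and e > ln[0] and x0 <= ln[3]:
                ln[:] = [min(ln[0], s), max(ln[1], e), min(ln[2], x0), max(ln[3], x1)]
                break
        else:
            lines.append([s, e, x0, x1])
    return lines

# Toy test: two synthetic text lines spanning the page.
img = np.zeros((100, 200), dtype=bool)
img[20:30, 10:190] = True
img[60:70, 10:190] = True
print(merge_pieces(line_pieces(img)))  # two merged line boxes
```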
3.
Amer Dawoud Mohamed Kamel 《International Journal on Document Analysis and Recognition》2002,5(1):28-38
Binarization of document images with poor contrast, strong noise, complex patterns, and variable modalities in the gray-scale
histograms is a challenging problem. A new binarization algorithm has been developed to address this problem for personal
cheque images. The main contribution of this approach is optimizing the binarization of a part of the document image that
suffers from noise interference, referred to as the Target Sub-Image (TSI), using information easily extracted from another
noise-free part of the same image, referred to as the Model Sub-Image (MSI). Simple spatial features extracted from MSI are
used as a model for handwriting strokes. This model captures the underlying characteristics of the writing strokes, and is
invariant to the handwriting style or content. This model is then utilized to guide the binarization in the TSI. Another contribution
is a new technique for the structural analysis of document images, which we call “Wavelet Partial Reconstruction” (WPR). The
algorithm was tested on 4,200 cheque images and the results show significant improvement in binarization quality in comparison
with other well-established algorithms.
Received: October 10, 2001 / Accepted: May 7, 2002
This research was supported in part by NCR and NSERC's industrial postgraduate scholarship No. 239464.
A simplified version of this paper was presented at ICDAR 2001 [3].
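A minimal sketch of the MSI/TSI idea under simplifying assumptions: the stroke gray level is sampled with a crude percentile rule rather than the paper's spatial stroke features, and the threshold rule is an illustrative choice.

```python
import numpy as np

def msi_guided_binarize(image, msi_box, tsi_box):
    # Estimate stroke and background gray levels from the clean Model
    # Sub-Image, then binarize the noisy Target Sub-Image against them.
    x0, y0, x1, y1 = msi_box
    msi = image[y0:y1, x0:x1].astype(float)
    dark = msi[msi < np.percentile(msi, 5)]      # crude stroke sample
    stroke, background = np.median(dark), np.median(msi)
    threshold = (stroke + background) / 2.0
    x0, y0, x1, y1 = tsi_box
    return image[y0:y1, x0:x1] < threshold

# Toy cheque-like image: clean left half (MSI), noisy right half (TSI).
rng = np.random.default_rng(0)
img = 200.0 + rng.normal(0, 5, (100, 200))
img[40:45, 20:80] = 60                     # stroke in the clean half
img[40:45, 120:180] = 60                   # stroke in the noisy half
img[:, 100:] += rng.normal(0, 25, (100, 100))
mask = msi_guided_binarize(img, (0, 0, 100, 100), (100, 0, 200, 100))
print(mask.sum())                          # roughly the 300 stroke pixels
```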
4.
Oleg Okun Matti Pietikäinen Jaakko Sauvola 《International Journal on Document Analysis and Recognition》1999,2(2-3):132-144
Existing skew estimation techniques usually assume that the input image is of high resolution and that the detectable
angle range is limited. We present a more generic solution for this task that overcomes these restrictions. Our method is
based on determination of the first eigenvector of the data covariance matrix. The solution comprises image resolution reduction,
connected component analysis, component classification using a fuzzy approach, and skew estimation. Experiments on a large
set of various document images, and a performance comparison with two Hough transform-based methods, show the good accuracy and
robustness of our method.
Received October 10, 1998 / Revised version September 9, 1999
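The core eigenvector computation can be sketched in a few lines of numpy; this omits the paper's resolution reduction, connected component analysis, and fuzzy component classification steps.

```python
import numpy as np

def skew_angle_degrees(binary):
    # Coordinates of all foreground pixels; np.cov removes the means itself.
    ys, xs = np.nonzero(binary)
    cov = np.cov(np.stack([xs, ys]).astype(float))
    _, vecs = np.linalg.eigh(cov)        # eigenvalues in ascending order
    v = vecs[:, -1]                      # first (dominant) eigenvector
    if v[0] < 0:
        v = -v                           # resolve the sign ambiguity
    return np.degrees(np.arctan2(v[1], v[0]))

# Toy test: a single text-line-like streak at about +5 degrees.
img = np.zeros((200, 400), dtype=bool)
xs = np.arange(20, 380)
ys = (100 + np.tan(np.radians(5.0)) * (xs - 20)).astype(int)
img[ys, xs] = True
print(round(skew_angle_degrees(img), 1))   # ~5.0
```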
5.
J. Hu R.S. Kashi D. Lopresti G.T. Wilfong 《International Journal on Document Analysis and Recognition》2002,4(3):140-153
While techniques for evaluating the performance of lower-level document analysis tasks such as optical character recognition
have gained acceptance in the literature, attempts to formalize the problem for higher-level algorithms, while receiving a
fair amount of attention in terms of theory, have generally been less successful in practice, perhaps owing to their complexity.
In this paper, we introduce intuitive, easy-to-implement evaluation schemes for the related problems of table detection and
table structure recognition. We also present the results of several small experiments, demonstrating how well the methodologies
work and the useful sorts of feedback they provide. We first consider the table detection problem. Here algorithms can yield
various classes of errors, including non-table regions improperly labeled as tables (insertion errors), tables missed completely
(deletion errors), larger tables broken into a number of smaller ones (splitting errors), and groups of smaller tables combined
to form larger ones (merging errors). This leads naturally to the use of an edit distance approach for assessing the results
of table detection. Next we address the problem of evaluating table structure recognition. Our model is based on a directed
acyclic attribute graph, or table DAG. We describe a new paradigm, “graph probing,” for comparing the results returned by
the recognition system and the representation created during ground-truthing. Probing is in fact a general concept that could
be applied to other document recognition tasks as well.
Received July 18, 2000 / Accepted October 4, 2001
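A toy sketch of this error taxonomy on 1-D row spans: the overlap-based matching rule below is an assumption, and the paper develops a full edit distance rather than simple counts.

```python
def table_detection_errors(truth, detected, min_overlap=0.5):
    # A detection matching no truth table is an insertion; an unmatched
    # truth table is a deletion; a truth table matched by several detections
    # is a split; a detection matching several truth tables is a merge.
    def frac(a, b):  # fraction of interval a covered by interval b
        lo, hi = max(a[0], b[0]), min(a[1], b[1])
        return max(0, hi - lo) / (a[1] - a[0])

    matches = [(t, d) for t in truth for d in detected
               if frac(t, d) >= min_overlap or frac(d, t) >= min_overlap]
    t_deg = {t: sum(1 for m in matches if m[0] == t) for t in truth}
    d_deg = {d: sum(1 for m in matches if m[1] == d) for d in detected}
    return {"deletions":  sum(1 for t in truth if t_deg[t] == 0),
            "insertions": sum(1 for d in detected if d_deg[d] == 0),
            "splits":     sum(1 for t in truth if t_deg[t] > 1),
            "merges":     sum(1 for d in detected if d_deg[d] > 1)}

truth = [(0, 100), (150, 250)]                           # ground-truth row spans
detected = [(0, 45), (55, 100), (150, 260), (300, 320)]  # detector output
print(table_detection_errors(truth, detected))
# {'deletions': 0, 'insertions': 1, 'splits': 1, 'merges': 0}
```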
6.
Rule-based document structure understanding with a fuzzy combination of layout and textual features
Stefan Klink Thomas Kieninger 《International Journal on Document Analysis and Recognition》2001,4(1):18-26
Document image processing is a crucial process in office automation; it begins at the ‘OCR’ phase, with difficulties arising in document
‘analysis’ and ‘understanding’. This paper presents a hybrid and comprehensive approach to document structure analysis, hybrid
in the sense that it makes use of layout (geometrical) as well as textual features of a given document. These features form
the basis for potential conditions which in turn are used to express fuzzy-matched rules of an underlying rule base. Rules
can be formulated based on features which might be observed within one specific layout object. However, rules can also express
dependencies between different layout objects. In addition to its rule driven analysis, which allows an easy adaptation to
specific domains with their specific logical objects, the system contains domain-independent markup algorithms for common
objects (e.g., lists).
Received June 19, 2000 / Revised November 8, 2000
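A minimal sketch of one fuzzy-matched rule: the trapezoidal membership function is a common fuzzy-logic choice rather than the paper's, and the "invoice title" rule is invented for illustration.

```python
def trapezoid(x, a, b, c, d):
    # Trapezoidal membership function over [a, d] with plateau [b, c].
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    return (x - a) / (b - a) if x < b else (d - x) / (d - c)

# Hypothetical rule: "a block near the top of the page whose text contains
# 'invoice' is the title" -- one layout and one textual condition,
# conjunctively combined with min to yield the rule's matching degree.
def title_degree(block):
    near_top = trapezoid(block["y_rel"], -0.1, 0.0, 0.15, 0.3)
    has_keyword = 1.0 if "invoice" in block["text"].lower() else 0.0
    return min(near_top, has_keyword)

print(title_degree({"y_rel": 0.05, "text": "INVOICE No. 42"}))  # 1.0
print(title_degree({"y_rel": 0.25, "text": "Invoice total"}))   # ~0.33
```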
7.
8.
Hon-Son Don 《International Journal on Document Analysis and Recognition》2001,4(2):131-138
A new thresholding method, called the noise attribute thresholding method (NAT), for document image binarization is presented
in this paper. This method utilizes noise attribute features extracted from the images to select threshold
values for image thresholding. These features are based on the properties of noise in the images and are independent of the
strength of the signals (objects and background) in the image. A simple noise model is given to explain these noise properties.
The NAT method has been applied to the problem of removing text and figures printed on the back of the paper. Conventional
global thresholding methods cannot solve this kind of problem satisfactorily. Experimental results show that the NAT method
is very effective.
Received July 05, 1999 / Revised July 07, 2000
9.
Claudia Wenzel Heiko Maus 《International Journal on Document Analysis and Recognition》2001,3(4):248-260
Knowledge-based systems for document analysis and understanding (DAU) are quite useful whenever analysis has to deal with
changing free-form document types which require different analysis components. In this case, declarative modeling is
a good way to achieve flexibility. An important application domain for such systems is the business letter domain. Here, high
accuracy and the correct assignment to the right people and the right processes are crucial success factors. Our solution
proposes a comprehensive knowledge-centered approach: we model not only the comparatively static knowledge concerning
document properties and analysis results, but also the analysis task and the current context of the system environment,
all within the same declarative formalism. This allows an easy definition of new analysis tasks
and also an efficient and accurate analysis by using expectations about incoming documents as context information. The approach
described has been implemented within the VOPR (Virtual Office PRototype) system. This DAU system
gains the required context information from a commercial workflow management system (WfMS) by constant exchanges of expectations
and analysis tasks. Further interaction between these two systems covers the delivery of results from the DAU system to the WfMS
and, vice versa, the delivery of corrected results.
Received June 19, 1999 / Revised November 8, 2000
10.
Hwan-Chul Park Se-Young Ok Young-Jung Yu Hwan-Gue Cho 《International Journal on Document Analysis and Recognition》2001,4(2):115-130
Automatic character recognition and image understanding of a given paper document are the main objectives of the computer
vision field. For these problems, a basic step is to isolate characters and to group the isolated characters into words. In
this paper, we propose a new method for extracting characters from a mixed text/graphic machine-printed document and an algorithm
for distinguishing words from the isolated characters. For extracting characters, we exploit several features (size, elongation,
and density) of characters and propose a characteristic value for classification using the run-length frequency of the image
component. In the context of word grouping, previous works have largely been concerned with words which are placed on a horizontal
or vertical line. Our word grouping algorithm can group words which are on inclined lines, intersecting lines, and even curved
lines. To do this, we introduce the 3D neighborhood graph model which is very useful and efficient for character classification
and word grouping. In the 3D neighborhood graph model, each connected component of a text image segment is mapped onto 3D
space according to the area of the bounding box and positional information from the document. We conducted tests with more
than 20 English documents and more than 10 oriental documents scanned from books, brochures, and magazines. Experimental
results show that more than 95% of words are successfully extracted from general documents, even in very complicated oriental
documents.
Received August 3, 2001 / Accepted August 8, 2001
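A sketch of the per-component features named above; the run-length-based characteristic value is not reproduced, and the area-derived third coordinate of the 3D embedding is an assumption.

```python
import numpy as np
from scipy import ndimage

def component_features(binary):
    # Label connected components, then compute size, elongation, density,
    # and a 3-D coordinate (x, y, area term) for the neighborhood graph.
    labels, _ = ndimage.label(binary)
    feats = []
    for sl in ndimage.find_objects(labels):
        box = binary[sl]
        h, w = box.shape
        feats.append({
            "size": h * w,
            "elongation": max(h, w) / max(min(h, w), 1),
            "density": float(box.sum()) / (h * w),
            "pos3d": (sl[1].start, sl[0].start, float(np.log1p(h * w))),
        })
    return feats

img = np.zeros((60, 120), dtype=bool)
img[10:20, 10:18] = True    # a character-sized blob
img[30:32, 5:115] = True    # a long horizontal rule
for f in component_features(img):
    print(f)
```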
11.
Lixin Fan Liying Fan Chew Lim Tan 《International Journal on Document Analysis and Recognition》2003,5(2-3):88-101
For document images corrupted by various kinds of noise, directly binarized images may be severely blurred and degraded.
A common treatment for this problem is to pre-smooth input images using noise-suppressing filters. This article proposes an
image-smoothing method for prefiltering document images prior to binarization. Conceptually, we propose that the influence
range of each pixel affecting its neighbors should depend on local image statistics. Technically, we suggest using coplanar matrices to capture the structural and textural distribution of similar pixels at each site. This property adapts the smoothing process
to the contrast, orientation, and spatial size of local image structures. Experimental results demonstrate the effectiveness
of the proposed method, which compares favorably with existing methods in reducing noise and preserving image features. In
addition, due to the adaptive nature of the similar pixel definition, the proposed filter output is more robust across
different noise levels than that of existing methods.
Received: October 31, 2001 / Accepted: October 9, 2002
Correspondence to: L. Fan (e-mail: fanlixin@ieee.org)
12.
Christian Shin David Doermann Azriel Rosenfeld 《International Journal on Document Analysis and Recognition》2001,3(4):232-247
Searching for documents by their type or genre is a natural way to enhance the effectiveness of document retrieval. The layout
of a document contains a significant amount of information that can be used to classify it by type in the absence of domain-specific
models. Our approach to classification is based on “visual similarity” of layout structure and is implemented by building
a supervised classifier, given examples of each class. We use image features such as percentages of text and non-text (graphics,
images, tables, and rulings) content regions, column structures, relative point sizes of fonts, density of content area, and
statistics of features of connected components which can be derived without class knowledge. In order to obtain class labels
for training samples, we conducted a study where subjects ranked document pages with respect to their resemblance to representative
page images. Class labels can also be assigned based on known document types, or can be defined by the user. We implemented
our classification scheme using decision tree classifiers and self-organizing maps.
Received June 15, 2000 / Revised November 15, 2000
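A minimal sketch of such a supervised layout classifier with a decision tree; all feature values and class labels below are fabricated for illustration, standing in for measurements from segmented page images.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Feature rows: [text fraction, non-text fraction, column count, content density].
X = np.array([[0.85, 0.05, 1, 0.60], [0.80, 0.08, 1, 0.55],   # letters
              [0.55, 0.35, 3, 0.80], [0.50, 0.40, 3, 0.85],   # magazines
              [0.75, 0.15, 2, 0.70], [0.78, 0.12, 2, 0.72],   # journal pages
              [0.30, 0.60, 1, 0.75], [0.25, 0.65, 1, 0.78]])  # forms
y = ["letter", "letter", "magazine", "magazine",
     "journal", "journal", "form", "form"]

clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(clf.predict([[0.82, 0.07, 1, 0.58]]))   # -> ['letter']
```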
13.
Nikolaos Stamatopoulos Basilis Gatos Stavros J. Perantonis 《Pattern Recognition》2009,42(12):3158-3168
Image segmentation is a major task in handwritten document image processing. Many of the proposed segmentation techniques are complementary in the sense that each of them, using a different approach, can solve different difficult problems, such as overlapping or touching components and the influence of author or font style. In this paper, a method for combining different segmentation techniques is presented. Our goal is to exploit the segmentation results of complementary techniques, together with specific features of the initial image, so as to generate improved segmentation results. Experimental results on line segmentation methods for handwritten documents demonstrate the effectiveness of the proposed combination method.
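As a much simpler stand-in for the proposed combination method (which also exploits features of the initial image), a generic majority vote over candidate segmentation masks:

```python
import numpy as np

def combine_masks(masks, min_votes=None):
    # Keep a pixel as foreground when at least min_votes of the candidate
    # segmentations agree (default: a strict majority).
    votes = np.sum(masks, axis=0)
    if min_votes is None:
        min_votes = len(masks) // 2 + 1
    return votes >= min_votes

a = np.array([[1, 1, 0], [0, 1, 0]])
b = np.array([[1, 0, 0], [0, 1, 1]])
c = np.array([[1, 1, 0], [0, 0, 1]])
print(combine_masks([a, b, c]).astype(int))
# [[1 1 0]
#  [0 1 1]]
```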
14.
Michael Cannon Judith Hochberg Patrick Kelly 《International Journal on Document Analysis and Recognition》1999,2(2-3):80-89
We present a useful method for assessing the quality of a typewritten document image and automatically selecting an optimal
restoration method based on that assessment. We use five quality measures that assess the severity of background speckle,
touching characters, and broken characters. A linear classifier uses these measures to select a restoration method. On a 139-document
corpus, our methodology reduced the corpus OCR character error rate from 20.27% to 12.60%.
Received November 10, 1998 / Revised October 27, 1999
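A toy version of the measure-then-select idea, with two stand-in quality measures and made-up linear classifier weights (the paper uses five measures and a trained classifier):

```python
import numpy as np
from scipy import ndimage

def quality_measures(binary):
    # Share of tiny connected components (background speckle), overall ink
    # density, and a constant bias term for the linear classifier.
    labels, n = ndimage.label(binary)
    if n == 0:
        return np.array([0.0, 0.0, 1.0])
    sizes = ndimage.sum(binary, labels, np.arange(1, n + 1))
    return np.array([float(np.mean(sizes < 5)), binary.mean(), 1.0])

# One weight vector per restoration method: the numbers are made-up
# placeholders standing in for a trained linear classifier.
WEIGHTS = {"none":      np.array([-2.0,  0.0,  1.0]),
           "despeckle": np.array([ 3.0,  0.0, -1.0]),
           "fill":      np.array([ 0.0, -5.0,  0.4])}

def pick_restoration(binary):
    m = quality_measures(binary)
    return max(WEIGHTS, key=lambda k: float(WEIGHTS[k] @ m))

speckled = np.random.default_rng(1).random((100, 100)) > 0.98
print(pick_restoration(speckled))   # speckle-dominated page -> 'despeckle'
```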
15.
Hideaki Goto Hirotomo Aso 《International Journal on Document Analysis and Recognition》2002,4(4):258-268
Recent remarkable progress in computer systems and printing devices has made it easier to produce printed documents with
various designs. Text characters are often printed on colored backgrounds, and sometimes on complex backgrounds such as photographs,
computer graphics, etc. Some methods have been developed for character pattern extraction from document images and scene images
with complex backgrounds. However, the previous methods are suitable only for extracting rather large characters, and the
processes often fail to extract small characters with thin strokes. This paper proposes a new method by which character patterns
can be extracted from document images with complex backgrounds. The method is based on local multilevel thresholding, pixel
labeling, and region growing. This framework is very useful for extracting character patterns from badly illuminated document
images. The performance of extracting small character patterns has been improved by suppressing the influence of mixed-color
pixels around character edges. Experimental results show that the method is capable of extracting very small character patterns
from main text blocks in various documents, separating characters and complex backgrounds, as long as the thickness of the
character strokes is more than about 1.5 pixels.
Received July 23, 2001 / Accepted November 5, 2001
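A single-level, block-local thresholding sketch, as a simpler stand-in for the multilevel thresholding, labeling, and region-growing pipeline; block size and offset are illustrative.

```python
import numpy as np

def local_threshold(gray, block=32, offset=20):
    # Threshold each block against its own mean, so text on colored or
    # unevenly lit backgrounds is separated block by block.
    h, w = gray.shape
    out = np.zeros((h, w), dtype=bool)
    for y in range(0, h, block):
        for x in range(0, w, block):
            tile = gray[y:y + block, x:x + block]
            out[y:y + block, x:x + block] = tile < tile.mean() - offset
    return out

# Toy test: dark text over a left-to-right background gradient.
rng = np.random.default_rng(0)
img = np.tile(np.linspace(120, 230, 256), (128, 1)) + rng.normal(0, 3, (128, 256))
img[60:68, 40:220] -= 80
print(local_threshold(img).sum())   # roughly the 8 x 180 stroke pixels
```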
16.
An information retrieval system that captures both visual and textual contents from paper documents can derive maximal benefits
from DAR techniques while demanding little human assistance to achieve its goals. This article discusses technical problems,
along with solution methods, and their integration into a well-performing system. The focus of the discussion is very difficult
applications, for example, Chinese and Japanese documents. Solution methods are also highlighted, with the emphasis placed
upon some new ideas, including window-based binarization using scale measures, document layout analysis for solving the multiple
constraint problem, and full-text searching techniques capable of evading machine recognition errors.
Received May 25, 2000 / Revised November 7, 2000
17.
In this paper, we present a new approach to extract characters on a license plate of a moving vehicle, given a sequence of
perspective-distortion-corrected license plate images. Unlike many existing single-frame approaches, our method simultaneously
utilizes spatial and temporal information. We first model the extraction of characters as a Markov random field (MRF), where
the randomness is used to describe the uncertainty in pixel label assignment. With the MRF modeling, the extraction of characters
is formulated as the problem of maximizing an a posteriori probability given prior knowledge and observations. A genetic algorithm with a local greedy mutation operator is
employed to optimize the objective function. Experiments and a comparison study were conducted, and some of our experimental
results are presented in the paper. It is shown that our approach provides better performance than other single-frame methods.
Received: 13 August 1997 / Accepted: 7 October 1997
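A toy version of the MAP formulation, with greedy ICM sweeps standing in for the paper's genetic algorithm; the label means and the beta weight are illustrative assumptions.

```python
import numpy as np

MEANS = np.array([0.2, 0.8])   # assumed gray levels: background, character

def energy(labels, observed, beta=1.0):
    # Negative log-posterior up to a constant: a data term plus a Potts
    # smoothness prior over 4-neighborhoods.
    data = ((observed - MEANS[labels]) ** 2).sum()
    smooth = (labels[1:] != labels[:-1]).sum() + (labels[:, 1:] != labels[:, :-1]).sum()
    return data + beta * smooth

def icm_sweep(labels, observed, beta=1.0):
    # One greedy sweep: assign each pixel the label of lowest total energy.
    for y, x in np.ndindex(*labels.shape):
        scores = []
        for cand in (0, 1):
            labels[y, x] = cand
            scores.append(energy(labels, observed, beta))
        labels[y, x] = int(np.argmin(scores))
    return labels

rng = np.random.default_rng(0)
truth = np.zeros((8, 8), dtype=int)
truth[:, 2:5] = 1                                  # a stroke three pixels wide
observed = MEANS[truth] + rng.normal(0, 0.25, (8, 8))
labels = (observed > 0.5).astype(int)              # noisy initial labeling
for _ in range(3):
    labels = icm_sweep(labels, observed)
print(labels)                                      # should closely match truth
```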
18.
Efficient extraction of primitives from line drawings composed of horizontal and vertical lines
The performance of the algorithms for the extraction of primitives for the interpretation of line drawings is usually affected
by the degradation of the information contained in the document due to factors such as low print contrast, defocusing, skew,
etc. In this paper, we propose two algorithms for the extraction of primitives with good performance under degradation.
The application of the algorithms is restricted to line drawings composed of horizontal and vertical lines. The performance
of the algorithms has been evaluated by using a protocol described in the literature.
Received: 6 August 1996 / Accepted: 16 July 1997
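A run-length sketch of one horizontal-primitive extraction pass; the published algorithms are designed for robustness under degradation and are considerably more involved.

```python
import numpy as np

def horizontal_segments(binary, min_len=10):
    # Collect horizontal ink runs longer than min_len; vertical lines are
    # found the same way on the transposed image.
    segments = []
    for y in range(binary.shape[0]):
        edges = np.diff(np.concatenate(([0], binary[y].astype(int), [0])))
        for s, e in zip(np.where(edges == 1)[0], np.where(edges == -1)[0]):
            if e - s >= min_len:
                segments.append((y, int(s), int(e)))  # (row, col_start, col_end)
    return segments

img = np.zeros((20, 50), dtype=bool)
img[5, 3:47] = True          # a horizontal line
img[10:18, 25] = True        # a short vertical line: ignored by this pass
print(horizontal_segments(img))  # [(5, 3, 47)]
```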
19.
Henry S. Baird Allison L. Coates Richard J. Fateman 《International Journal on Document Analysis and Recognition》2003,5(2-3):158-163
We exploit the gap in ability between human and machine vision systems to craft a family of automatic challenges that tell
human and machine users apart via graphical interfaces including Internet browsers. Turing proposed [Tur50] a method whereby
human judges might validate “artificial intelligence” by failing to distinguish between human and machine interlocutors. Stimulated
by the “chat room problem” posed by Udi Manber of Yahoo!, and influenced by the CAPTCHA project [BAL00] of Manuel Blum et
al. of Carnegie-Mellon Univ., we propose a variant of the Turing test using pessimal print: that is, low-quality images of machine-printed text synthesized pseudo-randomly over certain ranges of words, typefaces,
and image degradations. We show experimentally that judicious choice of these ranges can ensure that the images are legible
to human readers but illegible to several of the best present-day optical character recognition (OCR) machines. Our approach
is motivated by a decade of research on performance evaluation of OCR machines [RJN96,RNN99] and on quantitative stochastic
models of document image quality [Bai92,Kan96]. The slow pace of evolution of OCR and other species of machine vision over
many decades [NS96,Pav00] suggests that pessimal print will defy automated attack for many years. Applications include ‘bot’
barriers and database rationing.
Received: February 14, 2002 / Accepted: March 28, 2002
An expanded version of: A.L. Coates, H.S. Baird, R.J. Fateman (2001) Pessimal Print: a reverse Turing Test. In: Proc.
6th Int. Conf. on Document Analysis and Recognition, Seattle, Wash., USA, September 10–13, pp. 1154–1158
Correspondence to: H. S. Baird
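A toy generator in the spirit of pessimal print, using PIL; the degradation parameters below are illustrative, whereas the paper draws them from calibrated document image degradation models.

```python
import numpy as np
from PIL import Image, ImageDraw, ImageFilter, ImageFont

def pessimal_print(word, blur=1.5, noise=0.35, seed=0):
    # Render the word, blur it, add pixel noise, and re-binarize: degraded
    # enough to trouble OCR while ideally staying legible to humans.
    img = Image.new("L", (160, 40), 255)
    ImageDraw.Draw(img).text((10, 10), word, font=ImageFont.load_default(), fill=0)
    img = img.filter(ImageFilter.GaussianBlur(blur))
    a = np.asarray(img, dtype=float) / 255.0
    a += np.random.default_rng(seed).normal(0, noise, a.shape)
    return Image.fromarray(((a > 0.5) * 255).astype(np.uint8))

pessimal_print("uncopyrightable").save("challenge.png")
```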
20.
This paper presents a system for automatic generation of the adjacency matrix from images of graphs. The graph, we assume,
is printed or hand-printed and available as part of a document, either separately or along with text and pictures. A morphology-based
approach is used here to separate components of the graphs: vertices, edges and labels. A novel technique is proposed to traverse
the nonplanar edges joining the vertices. The proposed method may be used for logical compression of the information contained
in the graph image in the form of an adjacency matrix. It may also be used to replace the cumbersome, error-prone, and time-consuming
manual method of generating the adjacency matrix for graphs with a large number of vertices and complex interconnections.
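A rough morphological sketch of the vertex/edge separation and adjacency assembly; the structuring-element size is illustrative, and the paper's nonplanar edge traversal and label handling are omitted.

```python
import numpy as np
from scipy import ndimage

def adjacency_from_image(binary, vertex_size=3):
    # Morphological opening keeps only the fat vertex blobs; the residue is
    # the edge strokes; each stroke is attributed to the vertices it touches.
    se = np.ones((2 * vertex_size + 1,) * 2, dtype=bool)
    vertices = ndimage.binary_opening(binary, structure=se)
    v_labels, n_v = ndimage.label(vertices)
    e_labels, n_e = ndimage.label(binary & ~vertices)
    grown = ndimage.grey_dilation(v_labels, size=(3, 3))  # 1-pixel vertex halo
    adj = np.zeros((n_v, n_v), dtype=int)
    for e in range(1, n_e + 1):
        touched = np.unique(grown[e_labels == e])
        touched = touched[touched > 0]
        if len(touched) == 2:
            i, j = touched - 1
            adj[i, j] = adj[j, i] = 1
    return adj

# Toy graph: three 9x9 vertex blobs connected by two single-pixel edges.
img = np.zeros((40, 80), dtype=bool)
img[5:14, 5:14] = img[5:14, 60:69] = img[28:37, 5:14] = True
img[9, 14:60] = True     # edge: vertex 0 -- vertex 1
img[14:28, 9] = True     # edge: vertex 0 -- vertex 2
print(adjacency_from_image(img))
```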