Similar Documents
Found 20 similar documents (search took 15 ms).
1.
Marginal noise is a common phenomenon in document analysis that results from scanning thick or skewed documents. It usually appears as a large, dark region around the margins of a document image. Marginal noise may cover meaningful document objects such as text, graphics, and forms, and this overlap makes it difficult to segment and recognize those objects. This paper proposes a novel approach to removing marginal noise, consisting of two steps: marginal noise detection and marginal noise deletion. Detection first reduces the original document image to a smaller image and then finds marginal noise regions according to the shape, length, and location of the split blocks. After the noise regions are detected, different removal methods are applied: a local thresholding method is proposed for gray-scale document images, while a region-growing method is devised for binary document images. Experiments with a wide variety of test samples demonstrate the feasibility and effectiveness of the proposed approach in removing marginal noise.
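As a rough illustration of the two-step idea, the sketch below finds large dark blocks hugging the page border on a downscaled binary image and whitens them at full resolution. It is not the paper's algorithm: the OpenCV-based `remove_marginal_noise`, the `margin_frac` value, and the size heuristic are all illustrative assumptions.

```python
# Sketch only: margin-noise detection on a downscaled binary page image.
import cv2
import numpy as np

def remove_marginal_noise(binary_img, scale=0.25, margin_frac=0.05):
    """binary_img: uint8 image, ink = 0 (black), background = 255."""
    small = cv2.resize(binary_img, None, fx=scale, fy=scale,
                       interpolation=cv2.INTER_NEAREST)
    h, w = small.shape
    # Connected components on the inverted image (black pixels become objects).
    n, labels, stats, _ = cv2.connectedComponentsWithStats(255 - small)
    cleaned = binary_img.copy()
    for i in range(1, n):
        x, y, bw, bh, area = stats[i]
        touches_margin = (x <= margin_frac * w or y <= margin_frac * h or
                          x + bw >= (1 - margin_frac) * w or
                          y + bh >= (1 - margin_frac) * h)
        # Heuristic: large, elongated blocks hugging the margin are noise.
        if touches_margin and (bw > 0.5 * w or bh > 0.5 * h):
            # Map the block back to full resolution and whiten it.
            x0, y0 = int(x / scale), int(y / scale)
            x1, y1 = int((x + bw) / scale), int((y + bh) / scale)
            cleaned[y0:y1, x0:x1] = 255
    return cleaned
```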

2.
A reading robot uses integrated opto-mechatronic technology to automate page turning, page-image capture, layout analysis with character recognition, and reading aloud. Owing to book thickness and the binding gutter, the automatic page-turning mechanism and vision system may introduce geometric distortion into the captured page-text images, which directly impairs the robot's character recognition. This paper therefore proposes a preprocessing algorithm for page-text images that performs layout analysis and image binarization and corrects warping distortion via a mathematical model, improving the quality of the captured page images, raising the OCR recognition rate, and ensuring that the reading robot reads fluently and operates stably.

3.
Skew estimation and page segmentation are two closely related processing stages in document image analysis. Skew estimation needs proper page segmentation, especially for document images with multiple skews, which are common in scans of thick bound publications in 2-up style or of postal envelopes bearing various printed labels. Even when only a single skew is of interest, the presence of minority regions with different skews, or of regions with undefined skew such as noise, can severely affect estimation of the dominant skew. Page segmentation, in turn, may need to know the exact skew angle of a page in order to work properly. This paper presents a skew estimation method with built-in, skew-independent segmentation that can handle document images containing multiple regions of different skews. It is based on the convex hulls of individual components (the smallest convex polygon that fully contains a component) and of component groups (the smallest convex polygon that fully contains all the components in a group). The proposed method first extracts the convex hulls of the components, then segments the image into groups of components according to both the spatial distances and the size similarities among the hulls. This process not only extracts hints about the alignment of text groups but also separates noise and graphical components from textual ones. To verify the proposed algorithms, the full sets of real and synthetic samples from the University of Washington English Document Image Database I (UW-I) are used, and quantitative and qualitative comparisons with some existing methods are provided.
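The sketch below illustrates the two primitives the method builds on: convex hulls of connected components, and grouping by hull proximity and size similarity. The function names and the `dist_thresh`/`size_ratio` thresholds are assumptions; the paper's grouping rules are more elaborate.

```python
# Sketch: convex hulls of connected components as grouping primitives.
import cv2

def component_hulls(binary_img):
    """Return one convex hull (point array) per connected component."""
    contours, _ = cv2.findContours(255 - binary_img, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    return [cv2.convexHull(c) for c in contours]

def group_by_hull_proximity(hulls, dist_thresh=20.0, size_ratio=3.0):
    """Greedy grouping: a hull joins a group if it is close in space and
    similar in size to some member already in the group."""
    boxes = [cv2.boundingRect(h) for h in hulls]      # (x, y, w, h)
    groups = []
    for i, (x, y, w, h) in enumerate(boxes):
        placed = False
        for g in groups:
            for j in g:
                xj, yj, wj, hj = boxes[j]
                # Gap between bounding boxes as a cheap hull-distance proxy.
                gap_x = max(0, max(x, xj) - min(x + w, xj + wj))
                gap_y = max(0, max(y, yj) - min(y + h, yj + hj))
                similar = max(h, hj) <= size_ratio * min(h, hj)
                if similar and max(gap_x, gap_y) <= dist_thresh:
                    g.append(i)
                    placed = True
                    break
            if placed:
                break
        if not placed:
            groups.append([i])
    return groups
```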

4.
Transforming paper documents into XML format with WISDOM++
The transformation of scanned paper documents into a form suitable for an Internet browser is a complex process that requires solutions to several problems. Applying OCR to parts of the document image is only one of them. In fact, generating documents in HTML format is easier once the layout structure of a page has been extracted by a document analysis process. Adopting an XML format is better still, since it facilitates retrieval of documents on the Web. Nevertheless, an effective transformation of paper documents into this format requires further processing steps, namely document image classification and understanding. WISDOM++ is a document processing system that operates in five steps: document analysis, document classification, document understanding, text recognition with OCR, and transformation into HTML/XML format. The innovative aspects described in the paper are: the preprocessing algorithm, the adaptive page segmentation, the acquisition of block classification rules using machine learning techniques, the layout analysis based on general layout principles, and a method that uses document layout information for conversion to HTML/XML formats. A benchmarking of the system components implementing these innovative aspects is reported.

5.
Projection methods have been used in the analysis of bitonal document images for different tasks such as page segmentation and skew correction for more than two decades. However, these algorithms are sensitive to the presence of border noise in document images. Border noise can appear along the page border due to scanning or photocopying. Over the years, several page segmentation algorithms have been proposed in the literature. Some of these algorithms have come into widespread use due to their high accuracy and robustness with respect to border noise. This paper addresses two important questions in this context: 1) Can existing border noise removal algorithms clean up document images to a degree required by projection methods to achieve competitive performance? 2) Can projection methods reach the performance of other state-of-the-art page segmentation algorithms (e.g., Docstrum or Voronoi) for documents where border noise has successfully been removed? We perform extensive experiments on the University of Washington (UW-III) data set with six border noise removal methods. Our results show that although projection methods can achieve the accuracy of other state-of-the-art algorithms on the cleaned document images, existing border noise removal techniques cannot clean up documents captured under a variety of scanning conditions to the degree required to achieve that accuracy.
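For reference, a minimal projection-profile routine of the kind evaluated here: on a cleaned bitonal page, row sums of ink pixels expose text lines and the whitespace gaps between them. The `gap_thresh` parameter is an illustrative assumption.

```python
# Sketch of a horizontal projection profile for text-line segmentation.
import numpy as np

def horizontal_cuts(binary_img, gap_thresh=2):
    """binary_img: uint8, ink = 0. Return (start, end) row ranges of text lines."""
    ink_per_row = np.sum(binary_img == 0, axis=1)
    in_line, lines, start = False, [], 0
    for r, ink in enumerate(ink_per_row):
        if ink > gap_thresh and not in_line:
            in_line, start = True, r
        elif ink <= gap_thresh and in_line:
            in_line = False
            lines.append((start, r))
    if in_line:
        lines.append((start, len(ink_per_row)))
    return lines
```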

6.
This paper presents a text block extraction algorithm that takes as input the set of text lines of a given document and partitions them into text blocks, where each block is associated with a set of homogeneous formatting attributes, e.g. text alignment and indentation. The algorithm is probability-based. We adopt an engineering approach to systematically characterizing text block structures over a large document image database, and develop statistical methods to extract those structures from the image. All probabilities are estimated from an extensive training set of various kinds of measurements among the text lines and among the text blocks in the training data. The off-line probabilities estimated in training then drive all decisions in the on-line text block extraction. An iterative, relaxation-like method is used to find the partition that maximizes the joint probability. To evaluate the performance of our text block extraction algorithm, we used a three-fold validation method and developed a quantitative performance measure. Evaluated on the UW-III database of some 1600 scanned document image pages, the algorithm identifies and segments 91% of text blocks correctly.

7.
In this paper, we describe an image-based document retrieval system that runs on camera-enabled mobile devices. "Mobile Retriever" aims to seamlessly link physical and digital documents by allowing users to snap a picture of the text of a document and retrieve its electronic version from a database. Experiments show that, for a database of 100,093 pages, the correct document can be retrieved in less than 4 s with a success rate over 95%. Our system extracts token pairs from the text to efficiently index and retrieve candidate pages using only a small portion of the image, and uses token triplets, which define the orientation of three corresponding tokens, to effectively prune false positives and identify the correct page to retrieve. We stress the importance of the geometric relationship between feature points and show its effectiveness in our camera-based image retrieval system.
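A toy sketch of the token-pair indexing idea (word-shape token extraction is omitted, and the function names and `top_k` are assumptions): pages are indexed under unordered token pairs, and candidates are ranked by pair votes from the query photo.

```python
# Sketch: token-pair inverted index with vote-based candidate ranking.
from collections import defaultdict
from itertools import combinations

def build_pair_index(pages):
    """pages: {page_id: [token, ...]}. Returns pair -> set of page ids."""
    index = defaultdict(set)
    for pid, tokens in pages.items():
        for a, b in combinations(sorted(set(tokens)), 2):
            index[(a, b)].add(pid)
    return index

def candidate_pages(index, query_tokens, top_k=10):
    """Vote for pages sharing token pairs with the query snippet."""
    votes = defaultdict(int)
    for a, b in combinations(sorted(set(query_tokens)), 2):
        for pid in index.get((a, b), ()):
            votes[pid] += 1
    return sorted(votes, key=votes.get, reverse=True)[:top_k]
```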

8.
To generate ground-truth index information for noisy scanned document images, the system first extracts idealized index information from the noise-free PDF document and registers it to the noisy document image using a perspective transformation model, finally producing ground-truth index information for the noisy image, which can be used to test the accuracy of character recognition and retrieval. Based on several classic image degradation models, the system also batch-generates document images containing different types of noise. Experiments show that the generated index information is highly accurate and that the degradation results closely match real noise.
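A minimal sketch of the registration step under the stated perspective model, assuming matched control points between the clean PDF rendering and the noisy scan are available; `map_ground_truth` and the RANSAC threshold are illustrative, not the system's actual code.

```python
# Sketch: estimate a perspective transform from matched points and map
# ideal ground-truth boxes from PDF coordinates onto the noisy scan.
import cv2
import numpy as np

def map_ground_truth(src_pts, dst_pts, boxes):
    """src_pts/dst_pts: Nx2 float32 matched points (PDF -> scan).
    boxes: list of 4x2 quadrilaterals in PDF coordinates."""
    H, _ = cv2.findHomography(np.float32(src_pts), np.float32(dst_pts),
                              cv2.RANSAC, 3.0)
    mapped = []
    for quad in boxes:
        pts = np.float32(quad).reshape(-1, 1, 2)
        mapped.append(cv2.perspectiveTransform(pts, H).reshape(-1, 2))
    return mapped
```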

9.
Capturing and processing printed documents is a heavy yet indispensable task in text information processing, particularly in digital library construction. Because the optical character recognition (OCR) systems in wide use today cannot automatically process scanned text images that are skewed, substantial manual intervention is required to correct the skew by hand; since scanned text collections cannot be batch-processed effectively, throughput is hard to improve. To address this problem, this paper examines the properties of the contour projections of text images and, exploiting the statistical dependence between their correlation coefficient and the text skew angle, constructs an automatic skew correction method for text images.
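One simple variant of the correlation idea, as an illustration rather than the paper's exact estimator: project the left and right halves of the page onto the vertical axis and take the shift that maximizes their correlation as a measure of the skew.

```python
# Sketch: skew from the best-correlating shift between half-page projections.
import numpy as np

def estimate_skew_deg(binary_img, max_shift=100):
    """binary_img: uint8, ink = 0. Returns a rough skew angle in degrees."""
    ink = (binary_img == 0).astype(np.float64)
    h, w = ink.shape
    left = ink[:, : w // 2].sum(axis=1)
    right = ink[:, w // 2 :].sum(axis=1)
    left -= left.mean()
    right -= right.mean()
    best_shift, best_corr = 0, -np.inf
    for s in range(-max_shift, max_shift + 1):
        # np.roll wraps around; acceptable for small shifts in a sketch.
        c = float(np.dot(left, np.roll(right, s)))
        if c > best_corr:
            best_corr, best_shift = c, s
    # The two half-page profiles are separated by ~w/2 columns.
    return np.degrees(np.arctan2(best_shift, w / 2))
```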

10.
Document representation and its application to page decomposition
Transforming a paper document into an electronic version suitable for efficient storage, retrieval, and interpretation remains a challenging problem, and an efficient representation scheme for document images is necessary to solve it. Document representation involves techniques of thresholding, skew detection, geometric layout analysis, and logical layout analysis; the derived representation can then be used for document storage and retrieval. Page segmentation is an important stage in representing document images obtained by scanning journal pages, and the performance of a document understanding system depends greatly on correct page segmentation and labeling of regions such as text, tables, images, drawings, and rulers. We use the traditional bottom-up approach based on connected component extraction to efficiently implement page segmentation and region identification. A new document model that preserves top-down generation information is proposed, based on which a document is logically represented for interactive editing, storage, retrieval, transfer, and logical analysis. Our algorithm has high accuracy and takes approximately 1.4 seconds on an SGI Indy workstation for model creation, including orientation estimation, segmentation, and labeling (text, table, image, drawing, and ruler), for a 2550×3300 image of a typical journal page scanned at 300 dpi. The method is applicable to documents from various technical journals and can accommodate moderate amounts of skew and noise.

11.
Cui Zheng, Hu Yongli, Sun Yanfeng, Gao Junbin, Yin Baocai. Multimedia Tools and Applications, 2022, 81(17): 23615-23632.

Image-text retrieval has received much attention in modern artificial intelligence research. It remains challenging because image and text are heterogeneous cross-modal data. The key issue in image-text retrieval is how to learn a common feature space while preserving the semantic correspondence between image and text. Existing works fail to obtain fine cross-modal feature representations because the semantic relations between local features are not effectively utilized and noise is not suppressed. To address these issues, we propose a Cross-modal Alignment with Graph Reasoning (CAGR) model, in which refined cross-modal features in the common feature space are learned and a fine-grained cross-modal alignment method is then applied. Specifically, we introduce a graph reasoning module that explores semantic connections between local elements in each modality and measures their importance with a self-attention mechanism. Through multi-step reasoning, the visual and textual semantic graphs can be effectively learned and refined visual and textual features obtained. Finally, to measure the similarity between image and text, a novel alignment approach named cross-modal attentional fine-grained alignment computes a similarity score between the two sets of features. Our model achieves competitive performance compared with state-of-the-art methods on the Flickr30K and MS-COCO datasets, and extensive experiments demonstrate its effectiveness.
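A minimal sketch of the set-to-set matching step, assuming two sets of already-extracted features: it computes an attention-weighted cosine score in the spirit of fine-grained alignment, not the exact CAGR formulation (PyTorch assumed; the `temperature` value is illustrative).

```python
# Sketch: fine-grained alignment score between region and word features.
import torch
import torch.nn.functional as F

def alignment_score(regions: torch.Tensor, words: torch.Tensor,
                    temperature: float = 9.0) -> torch.Tensor:
    """regions: (R, D) image-region features; words: (W, D) word features."""
    r = F.normalize(regions, dim=-1)
    w = F.normalize(words, dim=-1)
    sim = w @ r.t()                              # (W, R) cosine similarities
    attn = F.softmax(temperature * sim, dim=-1)  # each word attends to regions
    attended = attn @ r                          # (W, D) region summary per word
    return F.cosine_similarity(w, attended, dim=-1).mean()
```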


12.
A skew correction method for text images based on an improved Hough transform
The skew introduced when text images are scanned strongly affects subsequent page segmentation and optical character recognition (OCR). The traditional standard Hough transform is insensitive to noise and does not depend on line continuity, but its heavy computation and slow speed greatly limit its practical use. This paper proposes a skew correction method for text images based on an improved Hough transform. By applying different text-direction extraction algorithms at different image resolutions and choosing a sensible voting threshold, the method reduces the adverse effect of image regions and character stroke width on skew-angle estimation, and it corrects page skew quickly using an offset-based method. Experimental results show that the algorithm achieves fast, high-precision detection of text-image skew angles over a wide range and is highly practical.
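A compact sketch of the general approach — reduced-resolution Hough voting with a tuned threshold; the OpenCV pipeline and the `scale` and `vote_thresh` values below are illustrative assumptions, not the paper's implementation.

```python
# Sketch: Hough-based skew estimation on a downscaled image.
import cv2
import numpy as np

def hough_skew_deg(binary_img, scale=0.25, vote_thresh=100):
    small = cv2.resize(binary_img, None, fx=scale, fy=scale,
                       interpolation=cv2.INTER_AREA)
    edges = cv2.Canny(small, 50, 150)
    lines = cv2.HoughLines(edges, 1, np.pi / 180, vote_thresh)
    if lines is None:
        return 0.0
    angles = []
    for rho, theta in lines[:, 0]:
        deg = np.degrees(theta) - 90.0   # 0 deg = horizontal text line
        if abs(deg) < 45:                # ignore vertical rules etc.
            angles.append(deg)
    return float(np.median(angles)) if angles else 0.0
```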

13.
14.
15.
Bo, Chew Lim. Pattern Recognition, 2005, 38(12): 2333-2350.
Skew estimation for textual document images is a well-researched topic, and numerous methods have been reported in the literature. One of the major challenges is the presence of interfering non-textual objects of various types and quantities in document images. Many existing methods require proper separation of the textual objects, which are well aligned, from the non-textual objects, which are mostly non-aligned; some comparative evaluations of existing methods use only the text zones of the test image database. The object filtering or zoning stage is therefore crucial to the skew detection stage, yet it is difficult, if not impossible, to design general-purpose filters able to discriminate noise from textual components. This paper presents a robust, general-purpose skew estimation method that needs no filtering or zoning preprocessing. The method does apply filtering, but not on the input components at the beginning of the detection process; rather, it filters the output spectrum at the end. The problem of finding a textual component filter is thus transformed into finding a convolution filter on the output accumulator array. The method consists of three steps: (1) calculate the slopes of the virtual lines that pass through the centroids of all unique pairs of connected components in the image, and quantize the arctangents of the slopes into a 1-D accumulator array covering the range from −90° to +90°; (2) apply a special convolution to the resulting histogram, after which only the prominent peaks that possibly correspond to the skew angles of the image remain; (3) verify the detection result. Its computational complexity and detection precision are decoupled, unlike projection-profile-based or Hough-transform-based methods, whose speed drops when higher precision is demanded. Speedup measures on the baseline implementation are also presented. The University of Washington English Document Image Database I (UWDB-I) contains a large number of scanned document images with a significant amount of non-textual objects, and it is therefore a good image database for evaluating the proposed method.
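The accumulator at the heart of steps (1) and (2) can be sketched as follows; the uniform smoothing kernel stands in for the paper's special convolution, and `bins` is an illustrative assumption.

```python
# Sketch: 1-D accumulator of pairwise centroid slopes, smoothed to find
# the dominant skew angle.
import numpy as np

def pairwise_slope_histogram(centroids, bins=180):
    """centroids: Nx2 array of component centroids (x, y)."""
    pts = np.asarray(centroids, dtype=np.float64)
    acc = np.zeros(bins)
    for i in range(len(pts)):
        dx = pts[i + 1:, 0] - pts[i, 0]
        dy = pts[i + 1:, 1] - pts[i, 1]
        ang = np.degrees(np.arctan2(dy, dx))      # (-180, 180]
        ang = (ang + 90.0) % 180.0                # fold into [0, 180)
        idx = (ang / 180.0 * bins).astype(int) % bins
        np.add.at(acc, idx, 1)
    smoothed = np.convolve(acc, np.ones(5) / 5.0, mode="same")
    peak = int(np.argmax(smoothed))
    return smoothed, peak * 180.0 / bins - 90.0   # peak angle in degrees
```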

16.
A window-based skew detection method for OCR page images
Page images generated when documents are scanned generally exhibit some skew, and an excessive skew angle harms subsequent layout analysis and character recognition. To detect the skew angle of page images quickly and accurately while reducing computation, a window-based skew detection method is proposed. The algorithm first blurs the fine detail of text and pictures within the window and then fits straight lines to their edges, enabling fast detection of the page skew angle. Experimental results show that the method detects the skew angles of all kinds of page images quickly and accurately and adapts well.

17.
A prototype document image analysis system for technical journals
Nagy G., Seth S., Viswanathan M. Computer, 1992, 25(7): 10-22.
Gobbledoc, a system providing remote access to stored documents, which is based on syntactic document analysis and optical character recognition (OCR), is discussed. In Gobbledoc, image processing, document analysis, and OCR operations take place in batch mode when the documents are acquired. The document image acquisition process and the knowledge base that must be entered into the system to process a family of page images are described. The process by which the X-Y tree data structure converts a 2-D page-segmentation problem into a series of 1-D string-parsing problems that can be tackled using conventional compiler tools is also described. Syntactic analysis is used in Gobbledoc to divide each page into labeled rectangular blocks. Blocks labeled text are converted by OCR to obtain a secondary (ASCII) document representation. Since such symbolic files are better suited for computerized search than for human access to the document content, and because too many visual layout clues are lost in the OCR process (including some special characters), Gobbledoc preserves the original block images for human browsing. Storage, networking, and display issues specific to document images are also discussed.
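The X-Y tree idea can be sketched as a recursive cut on projection profiles, alternating axes; `min_gap` and the leaf rule are illustrative assumptions, not Gobbledoc's actual grammar-driven parser.

```python
# Sketch: recursive X-Y cut that yields leaf blocks of a page.
import numpy as np

def xy_cut(ink, x0=0, y0=0, axis=0, min_gap=10):
    """ink: 2-D bool array (True = black). Returns leaf blocks (x, y, w, h)."""
    profile = ink.any(axis=1 - axis)          # row profile if axis=0, else column
    runs, start = [], None
    for i, filled in enumerate(list(profile) + [False]):
        if filled and start is None:
            start = i
        elif not filled and start is not None:
            runs.append([start, i])
            start = None
    # Merge runs separated by whitespace gaps narrower than min_gap.
    merged = []
    for a, b in runs:
        if merged and a - merged[-1][1] < min_gap:
            merged[-1][1] = b
        else:
            merged.append([a, b])
    if len(merged) <= 1:                      # nothing to cut: this is a leaf
        return [(x0, y0, ink.shape[1], ink.shape[0])]
    blocks = []
    for a, b in merged:                       # recurse with the axis flipped
        if axis == 0:
            blocks += xy_cut(ink[a:b, :], x0, y0 + a, axis=1, min_gap=min_gap)
        else:
            blocks += xy_cut(ink[:, a:b], x0 + a, y0, axis=0, min_gap=min_gap)
    return blocks
```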

18.
19.
Imaged document text retrieval without OCR
We propose a method for text retrieval from document images without the use of OCR. Documents are segmented into character objects. Image features, namely the vertical traverse density (VTD) and horizontal traverse density (HTD), are extracted. An n-gram-based document vector is constructed for each document based on these features. Text similarity between documents is then measured by calculating the dot product of the document vectors. Testing with seven corpora of imaged textual documents in English and Chinese, as well as images from the UW1 (University of Washington 1) database, confirms the validity of the proposed method.
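A toy rendering of the feature pipeline, assuming pre-segmented character images: VTD/HTD counts, a coarse quantization into per-character codes (the quantization scheme here is an assumption), n-gram counting, and dot-product similarity.

```python
# Sketch: traverse-density features and n-gram document vectors.
import numpy as np

def traverse_densities(char_img):
    """char_img: 2-D uint8 character image, ink = 0."""
    ink = (char_img == 0).astype(np.int8)
    # Count white-to-black (0 -> 1) transitions per column / per row.
    vtd = np.sum(np.diff(ink, axis=0) == 1, axis=0)
    htd = np.sum(np.diff(ink, axis=1) == 1, axis=1)
    return vtd, htd

def char_code(char_img, levels=4):
    """Quantize mean densities into a small code per character."""
    vtd, htd = traverse_densities(char_img)
    return (min(int(vtd.mean()), levels - 1), min(int(htd.mean()), levels - 1))

def ngram_vector(char_images, n=2):
    """Document vector: counts of n-grams of per-character codes."""
    codes = [char_code(c) for c in char_images]
    vec = {}
    for i in range(len(codes) - n + 1):
        key = tuple(codes[i:i + n])
        vec[key] = vec.get(key, 0) + 1
    return vec

def similarity(v1, v2):
    """Dot product of sparse document vectors."""
    return sum(c * v2.get(k, 0) for k, c in v1.items())
```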

20.
Document layout analysis, or page segmentation, is the task of decomposing document images into regions such as text, images, separators, and tables. It remains a challenging problem due to the variety of document layouts. In this paper, we propose a novel hybrid method with three main stages. In the first stage, text and non-text elements are classified using a minimum homogeneity algorithm, which combines connected component analysis with a multilevel homogeneity structure. In the second stage, a new homogeneity structure is combined with adaptive mathematical morphology on the text layer to obtain a set of text regions, while on the non-text layer further classification yields separator regions, table regions, image regions, and so on. In the final stage, a region refinement and noise detection process refines all regions in both the text and non-text layers to eliminate noise and obtain the geometric layout of each region. The proposed method has been tested on the dataset of the ICDAR2009 page segmentation competition and on several other databases in different languages, and it achieved higher accuracy than competing methods, demonstrating its effectiveness.
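The morphology step on the text layer can be sketched as below: a closing with a horizontal structuring element fuses characters in a line into region blobs. The fixed kernel size is an illustrative assumption where the paper adapts it to the homogeneity structure.

```python
# Sketch: fuse text characters into region blobs via morphological closing.
import cv2

def text_regions(text_mask, kernel_w=25, kernel_h=5):
    """text_mask: uint8, text pixels = 255 on black background."""
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (kernel_w, kernel_h))
    fused = cv2.morphologyEx(text_mask, cv2.MORPH_CLOSE, kernel)
    contours, _ = cv2.findContours(fused, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    return [cv2.boundingRect(c) for c in contours]
```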
