1.

Parameter-free geometric document layout analysis (total citations: 1; self-citations: 0; citations by others: 1)

Seong-Whan Lee Dae-Seok Ryu 《IEEE transactions on pattern analysis and machine intelligence》2001,23(11):1240-1256

Automatic transformation of paper documents into electronic documents requires geometric document layout analysis at the first stage. However, variations in character font sizes, text line spacing, and document layout structures have made it difficult to design a general-purpose document layout analysis algorithm for many years. The use of some parameters has therefore been unavoidable in previous methods. The authors propose a parameter-free method for segmenting document images into maximal homogeneous regions and identifying them as texts, images, tables, and ruling lines. A pyramidal quadtree structure is constructed for multiscale analysis and a periodicity measure is suggested to find a periodical attribute of text regions for page segmentation. To obtain robust page segmentation results, a confirmation procedure using texture analysis is applied only to ambiguous regions. Based on the proposed periodicity measure, multiscale analysis, and confirmation procedure, the authors developed a robust method for geometric document layout analysis independent of character font sizes, text line spacing, and document layout structures. The proposed method was tested on the document database from the University of Washington and the MediaTeam Document Database. The results of these tests show that the proposed method provides more accurate results than previous ones.
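The abstract does not define its periodicity measure, so the following is only a minimal sketch of the general idea that text regions repeat at a regular line pitch. It assumes a stand-in measure (normalized autocorrelation of the row projection profile) rather than the authors' formula; the names `row_profile`, `autocorrelation`, and `dominant_period` are hypothetical.

```python
def row_profile(image):
    """Count foreground pixels (value 1) in each row of a binary image."""
    return [sum(row) for row in image]


def autocorrelation(signal, lag):
    """Normalized autocorrelation of a 1-D signal at the given lag."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [s - mean for s in signal]
    energy = sum(c * c for c in centered)
    if energy == 0:
        return 0.0  # a flat profile has no periodic structure
    return sum(centered[i] * centered[i + lag] for i in range(n - lag)) / energy


def dominant_period(image, max_lag=None):
    """Return (lag, score) for the lag with the strongest autocorrelation.

    This is an assumed stand-in for a periodicity measure, not the
    measure proposed in the paper.
    """
    profile = row_profile(image)
    max_lag = max_lag or len(profile) // 2
    scores = {lag: autocorrelation(profile, lag) for lag in range(2, max_lag + 1)}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Synthetic "text block": 2 inked rows followed by 2 blank rows, repeated.
text_block = [[1] * 8 if (r % 4) < 2 else [0] * 8 for r in range(24)]
lag, score = dominant_period(text_block)
print(lag, round(score, 2))  # prints: 4 0.83
```

A high score at a small lag suggests regularly spaced text lines; a photograph or blank region would typically give a low or zero score under this stand-in.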

2.

Document representation and its application to page decomposition (total citations: 6; self-citations: 0; citations by others: 6)

Jain A.K. Bin Yu 《IEEE transactions on pattern analysis and machine intelligence》1998,20(3):294-308

Transforming a paper document to its electronic version in a form suitable for efficient storage, retrieval, and interpretation continues to be a challenging problem. An efficient representation scheme for document images is necessary to solve this problem. Document representation involves techniques of thresholding, skew detection, geometric layout analysis, and logical layout analysis. The derived representation can then be used in document storage and retrieval. Page segmentation is an important stage in representing document images obtained by scanning journal pages. The performance of a document understanding system greatly depends on the correctness of page segmentation and labeling of different regions such as text, tables, images, drawings, and rulers. We use the traditional bottom-up approach based on connected component extraction to efficiently implement page segmentation and region identification. A new document model which preserves top-down generation information is proposed, based on which a document is logically represented for interactive editing, storage, retrieval, transfer, and logical analysis. Our algorithm has a high accuracy and takes approximately 1.4 seconds on an SGI Indy workstation for model creation, including orientation estimation, segmentation, and labeling (text, table, image, drawing, and ruler) for a 2550×3300 image of a typical journal page scanned at 300 dpi. This method is applicable to documents from various technical journals and can accommodate moderate amounts of skew and noise.
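Connected component extraction, the bottom-up building block named in this abstract, can be sketched as follows. This is a generic breadth-first flood fill over a binary image, assuming 8-connectivity and a list-of-rows image; it is not the authors' implementation, and `connected_components` is a hypothetical name.

```python
from collections import deque


def connected_components(image):
    """Return bounding boxes (top, left, bottom, right) of 8-connected
    foreground regions in a binary image given as a list of rows."""
    height, width = len(image), len(image[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if image[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            queue = deque([(y, x)])
            seen[y][x] = True
            top, left, bottom, right = y, x, y, x
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


page = [
    [1, 1, 0, 0, 0],
    [1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
]
boxes = connected_components(page)
print(boxes)  # prints: [(0, 0, 1, 1), (1, 3, 2, 4)]
```

In a bottom-up pipeline, boxes like these would then be grouped into lines and blocks and labeled by region type.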

3.

Extracting text from scanned Arabic books: a large-scale benchmark dataset and a fine-tuned Faster-R-CNN model

Elanwar Randa Qin Wenda Betke Margrit Wijaya Derry 《International Journal on Document Analysis and Recognition》2021,24(4):349-362

Datasets of documents in Arabic are urgently needed to promote computer vision and natural language processing research that addresses the specifics of the language. Unfortunately, publicly available Arabic datasets are limited in size and restricted to certain document domains. This paper presents the release of BE-Arabic-9K, a dataset of more than 9000 high-quality scanned images from over 700 Arabic books. Among these, 1500 images have been manually segmented into regions and labeled by their functionality. BE-Arabic-9K includes book pages with a wide variety of complex layouts and page contents, making it suitable for various document layout analysis and text recognition research tasks. The paper also presents a page layout segmentation and text extraction baseline model based on fine-tuned Faster R-CNN structure (FFRA). This baseline model yields cross-validation results with an average accuracy of 99.4% and F1 score of 99.1% for text versus non-text block classification on 1500 annotated images of BE-Arabic-9K. These results are remarkably better than those of the state-of-the-art Arabic book page segmentation system ECDP. FFRA also outperforms three other prior systems when tested on a competition benchmark dataset, making it an outstanding baseline model to challenge.

4.

Complex documents images segmentation based on steerable pyramid features

Mohamed Benjelil Slim Kanoun Rémy Mullot Adel M. Alimi 《International Journal on Document Analysis and Recognition》2010,13(3):209-228

5.

Accurate segmentation of complex document image using digital shearlet transform with neutrosophic set as uncertainty handling tool

《Applied Soft Computing》2017

In any image segmentation problem, there exist uncertainties. These uncertainties arise from gray level and spatial ambiguities in an image. As a result, accurate segmentation of text regions from non-text regions (graphics/images) in mixed and complex documents is a fairly difficult problem. In this paper, we propose a novel text region segmentation method based on digital shearlet transform (DST). The method is capable of handling the uncertainties arising in the segmentation process. To capture the anisotropic features of the text regions, the proposed method uses the DST coefficients as input features to a segmentation process block. This block is designed using the neutrosophic set (NS) for management of the uncertainty in the process. The proposed method is extensively verified by experiment, and its performance is compared with that of some state-of-the-art techniques both quantitatively and qualitatively using a benchmark dataset.

6.

Text segmentation using Gabor filters for automatic document processing (total citations: 24; self-citations: 0; citations by others: 24)

Anil K. Jain Sushil Bhattacharjee 《Machine Vision and Applications》1992,5(3):169-184

There is considerable interest in designing automatic systems that will scan a given paper document and store it on electronic media for easier storage, manipulation, and access. Most documents contain graphics and images in addition to text. Thus, the document image has to be segmented to identify the text regions, so that OCR techniques may be applied only to those regions. In this paper, we present a simple method for document image segmentation in which text regions in a given document image are automatically identified. The proposed segmentation method for document images is based on a multichannel filtering approach to texture segmentation. The text in the document is considered as a textured region. Nontext contents in the document, such as blank spaces, graphics, and pictures, are considered as regions with different textures. Thus, the problem of segmenting document images into text and nontext regions can be posed as a texture segmentation problem. Two-dimensional Gabor filters are used to extract texture features for each of these regions. These filters have been extensively used earlier for a variety of texture segmentation tasks. Here we apply the same filters to the document image segmentation problem. Our segmentation method does not assume any a priori knowledge about the content or font styles of the document, and is shown to work even for skewed images and handwritten text. Results of the proposed segmentation method are presented for several test images which demonstrate the robustness of this technique. This work was supported by the National Science Foundation under NSF grant CDA-88-06599 and by a grant from E. I. Du Pont De Nemours & Company.
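To make the multichannel idea concrete, here is a minimal sketch of a real (even-symmetric) two-dimensional Gabor kernel and its response to a striped patch. The parameter choices (kernel size 7, wavelength 4, sigma 2), the isotropic Gaussian envelope, and the names `gabor_kernel` and `filter_response` are assumptions for illustration; the abstract does not specify the filter bank actually used.

```python
import math


def gabor_kernel(size, wavelength, theta, sigma):
    """Real (even-symmetric) 2-D Gabor kernel as a list of rows.

    size: odd kernel width; wavelength: period of the cosine carrier in
    pixels; theta: carrier orientation in radians; sigma: spread of the
    (isotropic, assumed) Gaussian envelope.
    """
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Project onto the carrier direction so it varies along theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            envelope = math.exp(-(x * x + y * y) / (2 * sigma * sigma))
            row.append(envelope * math.cos(2 * math.pi * xr / wavelength))
        kernel.append(row)
    return kernel


def filter_response(patch, kernel):
    """Absolute response of one kernel-sized patch (a single filter tap)."""
    return abs(sum(p * k for prow, krow in zip(patch, kernel)
                   for p, k in zip(prow, krow)))


# A 7x7 patch whose intensity oscillates along x with period 4.
stripes = [[0, -1, 0, 1, 0, -1, 0] for _ in range(7)]
along_x = filter_response(stripes, gabor_kernel(7, 4, 0.0, 2.0))
along_y = filter_response(stripes, gabor_kernel(7, 4, math.pi / 2, 2.0))
print(along_x > along_y)  # prints: True
```

The channel tuned to the stripe orientation responds far more strongly than the orthogonal one, which is the kind of per-channel texture feature a classifier could use to separate text from nontext.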

7.

Geometric structure analysis of document images: a knowledge-based approach (total citations: 1; self-citations: 0; citations by others: 1)

Kyong-Ho Lee Yoon-Chul Choy Sung-Bae Cho 《IEEE transactions on pattern analysis and machine intelligence》2000,22(11):1224-1240

This paper presents a knowledge-based method for sophisticated geometric structure analysis of technical journal pages. The proposed knowledge base encodes, in the form of rules, geometric characteristics that are not only common in technical journals but also publication-specific. The method is a hybrid of top-down and bottom-up techniques and consists of two phases: region segmentation and identification. Generally, the result of the segmentation process does not have a one-to-one matching with composite layout components. Therefore, the proposed method identifies non-text objects, such as images, drawings, and tables, as well as text objects, by splitting or grouping segmented regions into composite layout components. Experimental results with 372 images scanned from the IEEE Transactions on Pattern Analysis and Machine Intelligence show that the proposed method has performed geometric structure analysis successfully on more than 99 percent of the test images.

8.

Segmentation of Page Images Using the Area Voronoi Diagram

Koichi Kise Akinori Sato Motoi Iwata 《Computer Vision and Image Understanding》1998,70(3):370-382

This paper presents a method of page segmentation based on the approximated area Voronoi diagram. The characteristics of the proposed method are as follows: (1) The Voronoi diagram enables us to obtain the candidates of boundaries of document components from page images with non-Manhattan layout and a skew. (2) The candidates are utilized to estimate the intercharacter and interline gaps without the use of domain-specific parameters to select the boundaries. From the experimental results for 128 images with non-Manhattan layout and a skew of 0° to 45°, as well as 98 images with Manhattan layout, we have confirmed that the method is effective for extraction of body text regions, and it is as efficient as other methods based on connected component analysis.

9.

Text extraction in complex color documents

C. Strouthopoulos N. Papamarkos A.E. Atsalakis 《Pattern recognition》2002,35(8):1743-1758

Text extraction in mixed-type documents is a necessary pre-processing stage for many document applications. In mixed-type color documents, text, drawings and graphics appear with millions of different colors. In many cases, text regions are overlaid onto drawings or graphics. In this paper, a new method to automatically detect and extract text in mixed-type color documents is presented. The proposed method is based on a combination of an adaptive color reduction (ACR) technique and a page layout analysis (PLA) approach. The ACR technique is used to obtain the optimal number of colors and to reduce the document to these principal colors. Then, using the principal colors, the document image is split into separable color planes. Thus, binary images are obtained, each one corresponding to a principal color. The PLA technique is applied independently to each of the color planes and identifies the text regions. A merging procedure is applied in the final stage to merge the text regions derived from the color planes and to produce the final document. Several experimental and comparative results, exhibiting the performance of the proposed technique, are also presented.

10.

Rough-fuzzy clustering and multiresolution image analysis for text-graphics segmentation

《Applied Soft Computing》2015

11.

Classification of document pages using structure-based features

Christian Shin David Doermann Azriel Rosenfeld 《International Journal on Document Analysis and Recognition》2001,3(4):232-247

Searching for documents by their type or genre is a natural way to enhance the effectiveness of document retrieval. The layout of a document contains a significant amount of information that can be used to classify it by type in the absence of domain-specific models. Our approach to classification is based on “visual similarity” of layout structure and is implemented by building a supervised classifier, given examples of each class. We use image features such as percentages of text and non-text (graphics, images, tables, and rulings) content regions, column structures, relative point sizes of fonts, density of content area, and statistics of features of connected components which can be derived without class knowledge. In order to obtain class labels for training samples, we conducted a study where subjects ranked document pages with respect to their resemblance to representative page images. Class labels can also be assigned based on known document types, or can be defined by the user. We implemented our classification scheme using decision tree classifiers and self-organizing maps. Received June 15, 2000 / Revised November 15, 2000

12.

An improved page segmentation method based on connected components

于明 郭佥 王栋壮 于洋 《计算机工程与应用》2013,49(17):195-198

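Page segmentation is an important part of layout analysis and, after extensive research, has reached a fairly mature stage. This paper improves the connected-component-based page segmentation algorithm so that it can segment relatively complex page images quickly and effectively while reducing the segmentation errors caused by thresholds in the original algorithm. The region of each individual character in the text image is first expanded, which makes the subsequent statistics of inter-component spacing more accurate and convenient; the image is then smeared and merged according to those spacing statistics, and the text image is segmented into connected regions. Experimental results show that the improved connected-component-based algorithm segments pages accurately and quickly, applies to a wide range of layouts, and is especially advantageous for relatively complex page layouts.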

13.

A robust system for document layout analysis using multilevel homogeneity structure

《Expert systems with applications》2017

One of the difficulties in the understanding of document images is document layout analysis, which is the first step in document image modeling. In this paper, a robust system that uses a multilevel-homogeneity structure within a hybrid methodology is proposed to deal with this problem. Our system consists of the following three main stages: classification, segmentation, and refinement and labeling. Unlike other page segmentation methods, the proposed system includes an efficient algorithm to detect table regions in document images. In addition, to create an effective application, the proposed system is designed to work with a variety of document languages. The proposed method was tested with the ICDAR2015 competition (RDCL-2015) and three other published datasets in different languages. The results of these tests show that the accuracy of the proposed system is superior to that of previous methods.

14.

A multi-plane approach for text segmentation of complex document images

Yen-Lin Chen 《Pattern recognition》2009,42(7):1419-1444

This study presents a new method, namely the multi-plane segmentation approach, for segmenting and extracting textual objects from various real-life complex document images. The proposed multi-plane segmentation approach first decomposes the document image into distinct object planes to extract and separate homogeneous objects including textual regions of interest, non-text objects such as graphics and pictures, and background textures. This process consists of two stages: localized histogram multilevel thresholding, and multi-plane region matching and assembling. Then a text extraction procedure is applied to the resultant planes to detect and extract textual objects with different characteristics in the respective planes. The proposed approach processes document images regionally and adaptively according to their respective local features. Hence detailed characteristics of the extracted textual objects, particularly small characters with thin strokes, as well as gradational illuminations of characters, can be well preserved. Moreover, this approach also allows background objects with uneven, gradational, and sharp variations in contrast, illumination, and texture to be handled easily and well. Experimental results on real-life complex document images demonstrate that the proposed approach is effective in extracting textual objects with various illuminations, sizes, and font styles from various types of complex document images.

15.

The document spectrum for page layout analysis (total citations: 17; self-citations: 0; citations by others: 17)

O'Gorman L. 《IEEE transactions on pattern analysis and machine intelligence》1993,15(11):1162-1173

Page layout analysis is a document processing technique used to determine the format of a page. This paper describes the document spectrum (or docstrum), which is a method for structural page layout analysis based on bottom-up, nearest-neighbor clustering of page components. The method yields an accurate measure of skew, within-line, and between-line spacings and locates text lines and text blocks. It has three main advantages over many other methods: independence from skew angle, independence from different text spacings, and the ability to process local regions of different text orientations within the same image. Results of the method shown for several different page formats and for randomly oriented subpages on the same image illustrate the versatility of the method. We also discuss the differences, advantages, and disadvantages of the docstrum with respect to other layout methods.
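The nearest-neighbor step behind the docstrum can be illustrated with a short sketch. It assumes component centroids are already available and uses the median nearest-neighbor angle as a rough skew estimate; the published method builds histograms over several neighbors per component, so this is a simplification, and `nearest_neighbor_pairs` and `estimate_skew` are hypothetical names.

```python
import math


def nearest_neighbor_pairs(centroids, k=1):
    """For each centroid, yield (distance, angle_degrees) to its k nearest
    neighbors. Angles are folded into [-90, 90) because a text line has
    no preferred direction."""
    for i, (x0, y0) in enumerate(centroids):
        others = sorted(
            (math.hypot(x1 - x0, y1 - y0), x1 - x0, y1 - y0)
            for j, (x1, y1) in enumerate(centroids) if j != i
        )
        for dist, dx, dy in others[:k]:
            angle = math.degrees(math.atan2(dy, dx))
            if angle >= 90:
                angle -= 180
            elif angle < -90:
                angle += 180
            yield dist, angle


def estimate_skew(centroids):
    """Median nearest-neighbor angle as a rough skew estimate in degrees."""
    angles = sorted(angle for _, angle in nearest_neighbor_pairs(centroids))
    return angles[len(angles) // 2]


# Two synthetic text lines: character pitch 10, line pitch 30, rotated 5 degrees.
a = math.radians(5)
centroids = [
    (c * 10 * math.cos(a) - line * 30 * math.sin(a),
     c * 10 * math.sin(a) + line * 30 * math.cos(a))
    for line in range(2) for c in range(6)
]
skew = estimate_skew(centroids)
print(round(skew, 3))  # prints: 5.0
```

Because each character's nearest neighbor lies on its own text line, the angles cluster around the line orientation, and the distances cluster around the within-line spacing.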

16.

A window-based skew detection method for OCR page images (total citations: 2; self-citations: 0; citations by others: 2)

靳从 魏之来 杨静宇 《中国图象图形学报》2004,9(11):1290-1293

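When a document is scanned, the resulting page image is usually skewed by some angle; when the skew angle is too large, it adversely affects subsequent layout analysis and character recognition. To detect the skew angle of a page image quickly and accurately while reducing computation, a page image skew detection method based on a window transform is proposed. The algorithm first blurs the details of the text and pictures within the window and then fits straight lines to their edges so that the skew angle can be detected quickly. Experimental results show that the method detects the skew angle of various kinds of page images quickly and accurately and adapts well.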

17.

Unified HMM-based layout analysis framework and algorithm

陈明 丁晓青 吴佑寿 《中国科学F辑(英文版)》2003,46(6):401-408

To address the layout analysis problem for complex or irregular document images, a Unified HMM-based Layout Analysis Framework is presented in this paper. Based on the multi-resolution wavelet analysis results of the document image, we use an HMM method in both the inner-scale image model and the trans-scale context model to classify the pixel region properties, such as text, picture or background. In each scale, an HMM direct segmentation method is used to get a better inner-scale classification result. Then another HMM method is used to fuse the inner-scale results across scales and obtain a better final segmentation result. The optimized algorithm uses a stop rule in the coarse-to-fine multi-scale segmentation process, so the speed is improved remarkably. Experiments demonstrate the efficiency of the proposed algorithm.

18.

Segmentation and classification of text page images based on pattern chain analysis

李艳玲 王加俊 《中国图象图形学报》2005,10(6):741-745

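To segment and classify text and graphics in page images with complex layouts (for example, irregularly shaped picture regions embedded in text), a new page segmentation and classification algorithm based on pattern chain analysis is proposed. The algorithm first encloses all black pixels in the image with bounding rectangles and stores them in a linked list of rectangles, then combines all adjacent rectangles to form patterns, and finally classifies the patterns by their statistical features, outputting two classes of image regions: text and picture. In addition, for the few uncertain patterns around large picture patterns, a context-based classification algorithm is used to classify them again. Experimental results show that the algorithm is fast and correctly segments and classifies text and graphics in page images with complex layouts.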
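The rectangle-merging step described in this abstract can be sketched as below. The adjacency test (a fixed `max_gap` in pixels), the union-find grouping, and the names `gap` and `merge_adjacent` are assumptions for illustration; the abstract does not say how adjacency is decided or how the rectangle list is traversed.

```python
def gap(a, b):
    """Gap between two rectangles given as (left, top, right, bottom);
    0 when they touch or overlap."""
    dx = max(a[0] - b[2], b[0] - a[2], 0)
    dy = max(a[1] - b[3], b[1] - a[3], 0)
    return max(dx, dy)


def merge_adjacent(rects, max_gap):
    """Group rectangles whose gap is at most max_gap (transitively) and
    return one enclosing rectangle per group, sorted by position."""
    parent = list(range(len(rects)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(rects)):
        for j in range(i + 1, len(rects)):
            if gap(rects[i], rects[j]) <= max_gap:
                parent[find(i)] = find(j)

    groups = {}
    for i, r in enumerate(rects):
        groups.setdefault(find(i), []).append(r)
    return sorted(
        (min(r[0] for r in g), min(r[1] for r in g),
         max(r[2] for r in g), max(r[3] for r in g))
        for g in groups.values()
    )


# Three character-sized boxes on one line plus a distant picture-sized box.
rects = [(0, 0, 4, 6), (6, 0, 10, 6), (12, 1, 16, 6), (40, 0, 60, 30)]
patterns = merge_adjacent(rects, max_gap=3)
print(patterns)  # prints: [(0, 0, 16, 6), (40, 0, 60, 30)]
```

Each merged rectangle plays the role of a "pattern"; features such as its size and aspect ratio could then feed the text versus picture classification.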

19.

Hierarchical content classification and script determination for automatic document image processing

Zheru Chi Qing Wang Wan-Chi Siu 《Pattern recognition》2003,36(11):2483-2500

20.

Convex hull based skew estimation

Bo Yuan Chew Lim Tan 《Pattern recognition》2007,40(2):456-475

Skew estimation and page segmentation are two closely related processing stages in document image analysis. Skew estimation needs proper page segmentation, especially for document images with multiple skews, which are common in scanned images from thick bound publications in 2-up style or postal envelopes with various printed labels. Even if only a single skew is of concern for a document image, the presence of minority regions of different skews or undefined skew, such as noise, may severely affect the estimation of the dominant skew. Page segmentation, on the other hand, may need to know the exact skew angle of a page in order to work properly. This paper presents a skew estimation method with built-in skew-independent segmentation functionality that is capable of handling document images with multiple regions of different skews. It is based on the convex hulls of the individual components (i.e. the smallest convex polygon that fully contains a component) and those of the component groups (i.e. the smallest convex polygon that fully contains all the components in a group) in a document image. The proposed method first extracts the convex hulls of the components, then segments the image into groups of components according to both the spatial distances and size similarities among the convex hulls of the components. This process not only extracts hints of the alignments of the text groups, but also separates noise and graphical components from textual ones. To verify the proposed algorithms, the full sets of the real and the synthetic samples of the University of Washington English Document Image Database I (UW-I) are used. Quantitative and qualitative comparisons with some existing methods are also provided.
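The convex hull of a component, as defined in this abstract, can be computed with the standard monotone chain algorithm. The sketch below assumes a component is given as a list of integer pixel coordinates; it shows only hull construction, not the grouping or skew estimation steps of the paper, and `convex_hull` is a hypothetical name.

```python
def cross(o, a, b):
    """Z-component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain: smallest convex polygon containing all
    points, returned as vertices in counter-clockwise order (for axes
    with y pointing up), collinear points dropped."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Pop while the last turn is clockwise or straight.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


# Pixel coordinates of one component: four corners plus two interior points.
component = [(0, 0), (4, 0), (4, 3), (0, 3), (2, 1), (1, 2)]
hull = convex_hull(component)
print(hull)  # prints: [(0, 0), (4, 0), (4, 3), (0, 3)]
```

The same function applied to the union of several components' points gives the hull of a component group.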