1.

Geometric rectification of camera-captured document images

Liang J DeMenthon D Doermann D 《IEEE transactions on pattern analysis and machine intelligence》2008,30(4):591-605

Compared to typical scanners, handheld cameras offer convenient, flexible, portable, and non-contact image capture, which enables many new applications and breathes new life into existing ones. However, camera-captured documents may suffer from distortions caused by non-planar document shape and perspective projection, which lead to failure of current OCR technologies. We present a geometric rectification framework for restoring the frontal-flat view of a document from a single camera-captured image. Our approach estimates 3D document shape from texture flow information obtained directly from the image without requiring additional 3D/metric data or prior camera calibration. Our framework provides a unified solution for both planar and curved documents and can be applied in many, especially mobile, camera-based document analysis applications. Experiments show that our method produces results that are significantly more OCR compatible than the original images.

2.

Script-independent text line segmentation in freestyle handwritten documents

Li Y Zheng Y Doermann D Jaeger S Li Y 《IEEE transactions on pattern analysis and machine intelligence》2008,30(8):1313-1329

3.

Text and non-text separation in offline document images: a survey

Bhowmik Showmik Sarkar Ram Nasipuri Mita Doermann David 《International Journal on Document Analysis and Recognition》2018,21(1-2):1-20

Separation of text and non-text is an essential processing step for any document analysis system. Therefore, it is important to...

4.

Document Image Coding for Processing and Retrieval

Omid E. Kia David S. Doermann 《The Journal of VLSI Signal Processing》1998,20(1-2):121-135

Document images belong to a unique class of images where the information is embedded in the language represented by a series of symbols on the page rather than in the visual objects themselves. Since these symbols tend to appear repeatedly, a domain-specific image coding strategy can be designed to facilitate enhanced compression and retrieval. In this paper we describe a coding methodology that not only exploits component-level redundancy to reduce code length but also supports efficient data access. The approach identifies and organizes symbol patterns which appear repeatedly. Similar components are represented by a single prototype stored in a library and the location of each component instance is coded along with the residual between it and its prototype. A representation is built which provides a natural information index allowing access to individual components. Compression results are competitive and compressed-domain access is superior to competing methods. Applications to network-related problems have been considered, and show promising results.
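The prototype-plus-residual idea in this abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: glyphs are tiny binary bitmaps stored as tuples, similarity is a plain Hamming distance, and the names `encode`, `decode`, and `max_dist` are hypothetical.

```python
from typing import List, Tuple

Bitmap = Tuple[Tuple[int, ...], ...]   # small binary glyph image
Position = Tuple[int, int]             # (x, y) of the glyph on the page


def same_shape(a: Bitmap, b: Bitmap) -> bool:
    return len(a) == len(b) and all(len(ra) == len(rb) for ra, rb in zip(a, b))


def hamming(a: Bitmap, b: Bitmap) -> int:
    """Count differing pixels between two bitmaps of the same shape."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def xor(a: Bitmap, b: Bitmap) -> Bitmap:
    """Pixel-wise XOR; used both to form and to undo a residual."""
    return tuple(tuple(pa ^ pb for pa, pb in zip(ra, rb)) for ra, rb in zip(a, b))


def encode(components: List[Tuple[Position, Bitmap]], max_dist: int = 2):
    """Return (library, records); each record is (prototype_id, position, residual)."""
    library: List[Bitmap] = []
    records = []
    for pos, glyph in components:
        best_id, best_dist = None, None
        for idx, proto in enumerate(library):
            if same_shape(glyph, proto):
                d = hamming(glyph, proto)
                if d <= max_dist and (best_dist is None or d < best_dist):
                    best_id, best_dist = idx, d
        if best_id is None:            # no close prototype: start a new one
            library.append(glyph)
            best_id = len(library) - 1
        records.append((best_id, pos, xor(glyph, library[best_id])))
    return library, records


def decode(library, records):
    """Rebuild every component exactly from its prototype and residual."""
    return [(pos, xor(library[idx], residual)) for idx, pos, residual in records]
```

Because each record carries its own prototype index and position, a single component can be looked up without decoding the rest of the page, which is one simple reading of the "natural information index" mentioned above. Dropping the residuals would give a lossy variant in which every instance is rendered as its prototype.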

5.

Symbolic Compression and Processing of Document Images

Omid E Kia David S Doermann Azriel Rosenfeld Rama Chellappa 《Computer Vision and Image Understanding》1998,70(3):335-349

In this paper, we describe a compression and representation scheme which exploits the component-level redundancy found within a document image. The approach identifies patterns which appear repeatedly, represents similar patterns with a single prototype, stores the location of pattern instances, and codes the residuals between the prototypes and the pattern instances. Using a novel encoding scheme, we provide a representation that facilitates scalable lossy compression and progressive transmission and supports document image analysis in the compressed domain. We motivate the approach, provide details of the encoding procedures, report compression results, and describe a class of document image understanding tasks that operate on the compressed representation.

6.

Recovery of temporal information from static images of handwriting (total citations: 3; self-citations: 0; citations by others: 3)

David S. Doermann Azriel Rosenfeld 《International Journal of Computer Vision》1995,15(1-2):143-164

The problem of off-line handwritten character recognition has eluded a satisfactory solution for several decades. Researchers working in the area of on-line recognition have had greater success, but the possibility of extracting on-line information from static images has not been fully explored. The experience of forensic document examiners assures us that in many cases, such information can be successfully recovered. We outline the design of a system for the recovery of temporal information from static handwritten images. We provide a taxonomy of local, regional and global temporal clues which are often found in handwritten samples, and describe methods for recovering these clues from the image. We show how this system can benefit from obtaining a comprehensive understanding of the handwriting signal and a detailed analysis of stroke and sub-stroke properties. We suggest that the recovery task requires that we break away from traditional thresholding and thinning techniques, and we provide a framework for such analysis. We demonstrate how isolated temporal clues can reliably be extracted from this framework and propose a control structure for integrating the partial information. We show how many seemingly ambiguous situations can be resolved by the derived clues and our knowledge of the writing process, and provide several examples to illustrate our approach. The support of this research by the Ricoh Corporation is gratefully acknowledged.

7.

Editorial

David Doermann Seong-Whan Lee Sargur Srihari Karl Tombre Azriel Rosenfeld 《International Journal on Document Analysis and Recognition》1998,1(1):1-2

8.

Language identification for handwritten document images using a shape codebook

Guangyu Zhu Xiaodong Yu Yi Li David Doermann 《Pattern recognition》2009,42(12):3184-3191

9.

VCode—Pervasive Data Transfer Using Video Barcode

Xu Liu Doermann D. Huiping Li 《Multimedia, IEEE Transactions on》2008,10(3):361-371

In this paper, we describe a novel data transfer scheme that uses the camera in a smart phone as an alternative data channel. The data is encoded as a sequence of 2-D barcode images, displayed on a flat panel display, acquired by the camera, and decoded in real time by the software embedded in the device. The decoded data is written to a file. Compared with existing data channels, such as CDMA/GPRS, cables, Bluetooth, and Infrared, our method relies on visual communication and does not require special hardware or data plans. Users only need to point the camera at a monitor displaying the VCode to download. Technical challenges to overcome include correction of perspective distortion, compensation for contrast variation, and efficient implementation of small footprint software on a mobile device. We address these challenges and present our solution in detail. We have implemented a prototype which allows users to download various types of files successfully, including pictures, ring tones and Java games onto camera phones running Symbian and Windows Mobile platforms. We discuss the limitations of our solution and outline future work to overcome these limitations.
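The framing step described in this abstract (data split into a sequence of 2-D code images, then reassembled on the receiving side) might look roughly like the sketch below. The frame layout is an assumption chosen for illustration, not the VCode format: a 3-byte header (sequence number, frame count, payload length), a fixed-size payload, and a CRC-32 trailer, all laid out as a bit matrix. Camera capture, perspective correction, and contrast compensation are not modeled, and the names `to_frames` and `from_frames` are hypothetical.

```python
import zlib


def to_frames(data: bytes, payload_size: int = 8, width: int = 16):
    """Split data into bit-matrix frames (assumed layout; at most 255 frames)."""
    chunks = [data[i:i + payload_size] for i in range(0, len(data), payload_size)]
    if len(chunks) > 255:
        raise ValueError("too much data for a one-byte frame counter")
    frames = []
    for seq, chunk in enumerate(chunks):
        header = bytes([seq, len(chunks), len(chunk)])
        body = header + chunk.ljust(payload_size, b"\0")
        packet = body + zlib.crc32(body).to_bytes(4, "big")
        bits = [(byte >> (7 - k)) & 1 for byte in packet for k in range(8)]
        bits += [0] * (-len(bits) % width)          # pad the last row
        frames.append([bits[r:r + width] for r in range(0, len(bits), width)])
    return frames


def from_frames(frames, payload_size: int = 8):
    """Reassemble data from frames seen in any order; return None if incomplete."""
    packet_len = 3 + payload_size + 4
    chunks, total = {}, None
    for frame in frames:
        bits = [b for row in frame for b in row][:packet_len * 8]
        packet = bytes(
            int("".join(map(str, bits[i:i + 8])), 2) for i in range(0, len(bits), 8)
        )
        body, crc = packet[:-4], packet[-4:]
        if zlib.crc32(body).to_bytes(4, "big") != crc:
            continue                                  # drop a misread frame
        seq, total, length = body[0], body[1], body[2]
        chunks[seq] = body[3:3 + length]
    if total is None or len(chunks) != total:
        return None
    return b"".join(chunks[i] for i in range(total))
```

Carrying a sequence number and a checksum in every frame lets the receiver tolerate repeated or misread frames, which seems a natural fit for a display that cycles through the code images while the camera samples them.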

10.

Video retrieval of near-duplicates using κ-nearest neighbor retrieval of spatio-temporal descriptors

Daniel DeMenthon David Doermann 《Multimedia Tools and Applications》2006,30(3):229-253

This paper describes a novel methodology for implementing video search functions such as retrieval of near-duplicate videos and recognition of actions in surveillance video. Videos are divided into half-second clips whose stacked frames produce 3D space-time volumes of pixels. Pixel regions with consistent color and motion properties are extracted from these 3D volumes by a threshold-free hierarchical space-time segmentation technique. Each region is then described by a high-dimensional point whose components represent the position, orientation and, when possible, color of the region. In the indexing phase for a video database, these points are assigned labels that specify their video clip of origin. All the labeled points for all the clips are stored into a single binary tree for efficient κ-nearest neighbor retrieval. The retrieval phase uses video segments as queries. Half-second clips of these queries are again segmented by space-time segmentation to produce sets of points, and for each point the labels of its nearest neighbors are retrieved. The labels that receive the largest numbers of votes correspond to the database clips that are the most similar to the query video segment. We illustrate this approach for video indexing and retrieval and for action recognition. First, we describe retrieval experiments for dynamic logos, and for video queries that differ from the indexed broadcasts by the addition of large overlays. Then we describe experiments in which office actions (such as pulling and closing drawers, taking and storing items, picking up and putting down a phone) are recognized. Color information is ignored to ensure independence of action recognition from people's appearance. One of the distinct advantages of using this approach for action recognition is that there is no need for detection or recognition of body parts.
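The label-voting step of the retrieval phase can be sketched as follows. This is a simplified illustration under stated assumptions: the abstract stores points in a binary tree, whereas the sketch swaps in a brute-force nearest-neighbor scan for brevity, and the descriptors are short toy tuples rather than real region descriptors. The function name `vote_for_clips` and the parameter `k` are hypothetical.

```python
import heapq
import math
from collections import Counter
from typing import List, Sequence, Tuple

Point = Sequence[float]


def vote_for_clips(
    database: List[Tuple[Point, str]],   # (descriptor, label of source clip)
    query_points: List[Point],
    k: int = 3,
) -> List[Tuple[str, int]]:
    """Rank database clips by the votes cast by each query point's k nearest neighbors."""
    votes: Counter = Counter()
    for q in query_points:
        # Brute-force k-NN; a tree index would replace this scan at scale.
        nearest = heapq.nsmallest(k, database, key=lambda item: math.dist(item[0], q))
        votes.update(label for _, label in nearest)
    return votes.most_common()
```

For example, with database points labeled by clip and a handful of query descriptors, the first entry of the returned list names the clip that collected the most votes, which corresponds to the most similar database clip in the scheme described above.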