Similar documents (20 results)
1.
Documents may be captured at any orientation when viewed with a hand-held camera. Here, a method of recovering fronto-parallel views of perspectively skewed text documents in single images is presented, useful for ‘point-and-click’ scanning or when generally seeking regions of text in a scene. We introduce a novel extension to the commonly used 2D projection profiles in document recognition to locate the horizontal vanishing point of the text plane. Following further analysis, we segment the lines of text to determine the style of justification of the paragraphs. The change in line spacings exhibited due to perspective is then used to locate the document's vertical vanishing point. No knowledge of the camera focal length is assumed. Using the vanishing points, a fronto-parallel view is recovered which is then suitable for OCR or other high-level recognition. We provide results demonstrating the algorithm's performance on documents over a wide range of orientations.
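Lines that are parallel on the page, such as text baselines, meet in the image at that direction's vanishing point. The sketch below shows only this generic geometric step in homogeneous coordinates. It does not reproduce the paper's projection-profile search. The function names and sample coordinates are hypothetical.

```python
import numpy as np


def line_through(p, q):
    """Homogeneous line l = p x q through two image points p = (x, y) and q."""
    return np.cross([p[0], p[1], 1.0], [q[0], q[1], 1.0])


def vanishing_point(line_a, line_b, eps=1e-12):
    """Image intersection of two lines that are parallel on the page.

    Returns (x, y), or None when the image lines are themselves parallel
    (the vanishing point then lies at infinity).
    """
    v = np.cross(line_a, line_b)
    if abs(v[2]) < eps:
        return None
    return float(v[0] / v[2]), float(v[1] / v[2])


# Two converging text baselines (made-up coordinates).
top = line_through((0, 0), (1, 0))      # the line y = 0
bottom = line_through((0, 1), (2, 0))   # the line x + 2y - 2 = 0
print(vanishing_point(top, bottom))     # approximately (2, 0)
```

Baselines give only the horizontal vanishing point. The vertical one needs another cue, which the abstract obtains from the perspective change in line spacing.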

2.
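In a camera imaging system, the relative pose of the camera, the imaging plane, and the document plane means that a captured document image is a perspective projection rather than a fronto-parallel (front) view, which harms later processing such as text extraction and recognition. Analysis of the imaging system shows that the key factor is the transformation between the camera's image coordinate system and the world coordinate system in the mathematical camera model. Building on the technique of transforming camera coordinate systems with rotation matrices, we propose a method that uses a rotation matrix to convert the real camera coordinate system into a virtual camera coordinate system that views the document head-on, and then remaps the image pixels to correct the pose of the document image. Experiments in which documents were photographed with a binocular camera show that the method is simple, practical, and effective at rectifying document images.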
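Suppose the camera rotates about its optical centre. Pixels in the real view and in the rotated virtual view are then related by the homography H = K R K^-1, where K holds the intrinsics. The sketch below makes several assumptions: a pinhole model with a known focal length f and principal point (cx, cy), and a rotation R that has already been estimated. The abstract does not say how R is obtained. Resampling uses nearest neighbours. All names are hypothetical.

```python
import numpy as np


def intrinsics(f, cx, cy):
    """Pinhole intrinsic matrix K (square pixels, no skew assumed)."""
    return np.array([[f, 0.0, cx], [0.0, f, cy], [0.0, 0.0, 1.0]])


def rotation_x(theta):
    """Rotation by theta radians about the camera x-axis (tilt)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])


def virtual_view_homography(K, R):
    """Maps real-camera pixels to a virtual camera rotated by R about the same centre."""
    return K @ R @ np.linalg.inv(K)


def map_pixel(H, x, y):
    p = H @ np.array([x, y, 1.0])
    return p[0] / p[2], p[1] / p[2]


def remap_nearest(img, H):
    """Builds the virtual view of a grayscale image.

    Each output pixel samples the source at H^-1 applied to its own position
    (nearest neighbour); pixels that map outside the source stay 0.
    """
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
    src = np.linalg.inv(H) @ pts
    sx = np.rint(src[0] / src[2]).astype(int)
    sy = np.rint(src[1] / src[2]).astype(int)
    valid = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    out = np.zeros(h * w, dtype=img.dtype)
    out[valid] = img[sy[valid], sx[valid]]
    return out.reshape(h, w)


K = intrinsics(100.0, 50.0, 40.0)
H = virtual_view_homography(K, rotation_x(np.arctan(0.1)))
print(map_pixel(H, 50.0, 40.0))  # principal point shifts by f * tan(theta) vertically
```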

3.
Common OCR (Optical Character Recognition) systems fail to detect and recognize small text strings consisting of only a few characters, particularly when the text line is not horizontal. Such text regions are typical of chart images. In this paper we present an algorithm that is able to detect small text regions regardless of string orientation and font size or style. We propose to use this algorithm as a preprocessing step for text recognition with a common OCR engine. According to our experimental results, using the proposed algorithm to detect text location, size and orientation before running an OCR system yields up to a 20 times higher text recognition rate and 15 times higher text recognition precision. Experiments have been performed on a benchmark set of 1000 chart images created with the XML/SWF Chart tool, which contain about 14000 text regions in total.

4.
In a video conferencing setting, people often use an elongated meeting table with the major axis along the camera direction. A standard wide-angle perspective image of this setting creates significant foreshortening, so the people sitting at the far end of the table appear very small relative to those nearer the camera. This has two consequences. First, it is difficult for the remote participants to see the faces of those at the far end, which degrades the video conferencing experience. Second, it wastes screen space and network bandwidth, because most of the pixels are spent on the background instead of on the faces of the meeting participants. In this paper, we present a novel technique, called Spatially-Varying-Uniform scaling functions, to warp the images to equalize the head sizes of the meeting participants without causing undue distortion. This technique works both for 180-degree views, where the camera is placed at one end of the table, and for 360-degree views, where the camera is placed at the center of the table. We have implemented this algorithm on two types of camera arrays: one with a 180-degree view, and the other with a 360-degree view. On both hardware devices, image capturing, stitching, and head-size equalization run in real time. In addition, we conducted a user study showing that people clearly prefer head-size equalized images.

5.
In this paper, we present an approach for consistently labeling people and for detecting human–object interactions using mono-camera surveillance video. The approach is based on a robust appearance-based correlogram model combined with histogram information to model color distributions of people and objects in the scene. The models are dynamically built from non-stationary objects, which are the outputs of background subtraction, and are used to identify objects on a frame-by-frame basis. We are able to detect when people merge into groups and to segment them even during partial occlusion. We can also detect when a person deposits or removes an object. The models persist when a person or object leaves the scene and are used to identify them when they reappear. Experiments show that the models are able to accommodate perspective foreshortening that occurs with overhead camera angles, as well as partial occlusion. The results show that this is an effective approach that is able to provide important information to algorithms performing higher-level analysis, such as activity recognition, where human–object interactions play an important role.
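A colour auto-correlogram records, for each quantized colour c and distance d, how likely a pixel at distance d from a c-coloured pixel is also c-coloured. It captures spatial colour layout that a plain histogram discards. The sketch below is a simplified version under stated assumptions. Colours are already quantized to integer labels, and only the four axis-aligned neighbours at each distance are examined. It is not the paper's exact model, which also combines histogram information.

```python
import numpy as np


def auto_correlogram(labels, n_colors, distances=(1, 3, 5)):
    """Simplified colour auto-correlogram.

    labels: 2-D array of non-negative ints (colour indices in [0, n_colors)).
    Entry [c, k] estimates P(neighbour at distance distances[k] has colour c | pixel has colour c),
    using only the four axis-aligned neighbours at that distance.
    """
    h, w = labels.shape
    result = np.zeros((n_colors, len(distances)))
    for k, d in enumerate(distances):
        same = np.zeros(n_colors)
        total = np.zeros(n_colors)
        for dy, dx in ((0, d), (0, -d), (d, 0), (-d, 0)):
            if abs(dy) >= h or abs(dx) >= w:
                continue  # offset falls outside the image
            # a: pixels that have a neighbour at (dy, dx); b: those neighbours
            a = labels[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
            b = labels[max(0, dy):h - max(0, -dy), max(0, dx):w - max(0, -dx)]
            total += np.bincount(a.ravel(), minlength=n_colors)
            same += np.bincount(a[a == b], minlength=n_colors)
        result[:, k] = np.divide(same, total, out=np.zeros(n_colors), where=total > 0)
    return result


checker = np.indices((4, 4)).sum(axis=0) % 2
print(auto_correlogram(checker, 2, distances=(1, 2)))
```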

6.
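Text characters in images sit on cluttered backgrounds. Differing viewpoints cause large geometric deformation of the text, and illumination changes and non-uniform character colours further complicate background separation and text recognition. We therefore propose an image clustering method based on image text regions. The method first extracts local feature descriptors from text regions already located in natural-scene images, uses random projection to map each set of local feature vectors to a fixed-dimensional feature vector, and then clusters the images that contain text regions. This avoids the difficulties introduced by image segmentation and character recognition. Experimental results show that the method effectively clusters natural-scene images containing text, with a clustering accuracy of up to 86.66%.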
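The sketch below shows one way a variable-size set of local descriptors can become a single fixed-length vector through random projection. It assumes a Gaussian projection matrix and mean pooling of the projected descriptors. The abstract does not say how the projected set is aggregated, so the pooling step and all names are assumptions.

```python
import numpy as np


def make_projection(in_dim, out_dim, seed=0):
    """Gaussian random projection matrix, scaled by 1/sqrt(out_dim) so that
    squared vector norms are preserved in expectation."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((in_dim, out_dim)) / np.sqrt(out_dim)


def embed(descriptors, P):
    """Maps a variable-size set of local descriptors (n_i x in_dim) to one
    fixed-length, unit-norm vector (out_dim,)."""
    projected = descriptors @ P          # (n_i, out_dim)
    v = projected.mean(axis=0)           # mean pooling: an assumption, not from the paper
    return v / (np.linalg.norm(v) + 1e-12)


P = make_projection(128, 64)
rng = np.random.default_rng(1)
image_a = rng.standard_normal((30, 128))   # 30 descriptors from one text region
image_b = rng.standard_normal((12, 128))   # 12 descriptors from another
print(embed(image_a, P).shape, embed(image_b, P).shape)
```

Once every image has a vector of the same length, any standard clustering algorithm, such as k-means, can group the images.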

7.
This paper presents a new Bayesian-based method for unconstrained offline handwritten Chinese text line recognition. In this method, a sample of a real character or non-character in realistic handwritten text lines is jointly recognized by a traditional isolated character recognizer and a character verifier, which requires only a moderate number of handwritten text lines for training. To improve its ability to distinguish real characters from non-characters, the isolated character recognizer is negatively trained using a linear discriminant analysis (LDA)-based strategy, which uses the outputs of a traditional MQDF classifier and the LDA transform to re-compute the posterior probability of isolated character recognition. In tests on 383 text lines from the HIT-MW database, the proposed method achieved character-level recognition rates of 71.37% without any language model and 80.15% with a bigram language model. These promising results show the effectiveness of the proposed method for unconstrained offline handwritten Chinese text line recognition.

8.
This paper presents a novel framework for Euclidean structure recovery that uses a scaled orthographic view and perspective views simultaneously. A scaled orthographic view is introduced in order to automatically obtain camera parameters such as camera position, orientation, and focal length. Scaled orthographic properties enable all camera parameters to be calculated implicitly, and perspective properties enable a Euclidean structure to be recovered. The method can recover a Euclidean structure from at least seven point correspondences across a scaled orthographic view and perspective views. Experimental results for both computed and natural images verify that the method recovers structure with sufficient accuracy to demonstrate potential utility. The proposed method can be applied to an interface for 3D modeling, recognition and tracking.

9.
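An Integrated Segmentation and Recognition Algorithm for Characters in Video (cited 3 times; self-citations: 0, by others: 3)
Yang Wuyi, Zhang Shuwu. Acta Automatica Sinica, 2010, 36(10): 1468-1476
Recognizing video text-line images is difficult mainly for two reasons: 1) segmenting and recognizing touching characters, and 2) segmenting and recognizing characters on complex backgrounds. To segment and recognize characters in both situations, an integrated character segmentation and recognition algorithm is proposed. The algorithm first binarizes the text-line image and estimates the text-line height from the horizontal projection of the binarized image. Next, depending on how strongly the character strokes touch, wide connected components in the binary image are split using either image analysis or character recognition. Connected components are then combined on the basis of character recognition to obtain candidate recognition results. Finally, a word lattice is built from the candidates and the character recognition result is selected from it with a language model. Experiments show that the integrated algorithm greatly reduces the recognition error rate for touching characters and for characters on complex backgrounds.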
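The line-height estimate from the horizontal projection can be sketched as follows: count ink pixels per row and take the extent of the rows that carry a meaningful amount of ink. The noise threshold min_fraction is a hypothetical parameter. The abstract does not say how sparse rows are handled.

```python
import numpy as np


def estimate_line_height(binary, min_fraction=0.05):
    """Text-line height from the horizontal projection profile.

    binary: 2-D array with 1 for text pixels and 0 for background.
    min_fraction: rows whose ink count is below this fraction of the peak are
    treated as noise (hypothetical parameter).
    """
    profile = binary.sum(axis=1)                   # ink pixels per row
    rows = np.flatnonzero(profile > min_fraction * profile.max())
    if rows.size == 0:
        return 0                                   # blank image
    return int(rows[-1] - rows[0] + 1)


line = np.zeros((20, 50), dtype=int)
line[5:15, 3:40] = 1     # a 10-pixel-high band of "text"
line[0, 0] = 1           # an isolated noise pixel, ignored by the threshold
print(estimate_line_height(line))
```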

10.
Dot-matrix text recognition is a difficult problem, especially when characters are broken into several disconnected components. We present a dot-matrix text recognition system which uses the fact that dot-matrix fonts are fixed-pitch, in order to overcome the difficulty of the segmentation process. After finding the most likely pitch of the text, a decision is made as to whether the text is written in a fixed-pitch or proportional font. Fixed-pitch text is segmented using a pitch-based segmentation process that can successfully segment both touching and broken characters. We report performance results for the pitch estimation, fixed-pitch decision and segmentation, and recognition processes. Received October 18, 1999 / Revised April 21, 2000
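In fixed-pitch text, the ink pattern across columns is roughly periodic, so the autocorrelation of the vertical projection profile peaks at the pitch. The sketch below uses autocorrelation as the estimator. This is an assumption, because the abstract does not say how the most likely pitch is found. The lag range and tolerance are hypothetical parameters.

```python
import numpy as np


def estimate_pitch(binary, min_pitch=4, max_pitch=64, tolerance=0.9):
    """Character pitch (pixels) from the autocorrelation of the vertical projection profile.

    binary: 2-D array, 1 = ink. Returns None when no periodic structure is found.
    """
    profile = binary.sum(axis=0).astype(float)   # ink pixels per column
    profile -= profile.mean()
    n = profile.size
    lags = list(range(min_pitch, min(max_pitch, n - 1) + 1))
    if not lags:
        return None
    scores = np.array([np.dot(profile[:-lag], profile[lag:]) / (n - lag) for lag in lags])
    best = scores.max()
    if best <= 0:
        return None
    # Multiples of the true pitch score about as high, so take the smallest near-maximal lag.
    for lag, score in zip(lags, scores):
        if score >= tolerance * best:
            return lag


text = np.zeros((12, 80), dtype=int)
for start in range(0, 80, 8):
    text[2:10, start:start + 6] = 1   # 6-pixel characters with 2-pixel gaps
print(estimate_pitch(text))
```

A weak or absent autocorrelation peak could also inform the fixed-pitch versus proportional decision, but that rule is not shown here.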

11.
We quantify the observation by Kender and Freudenstein (1987) that degenerate views occupy a significant fraction of the viewing sphere surrounding an object. For a perspective camera geometry, we introduce a computational model that can be used to estimate the probability that a view degeneracy will occur in a random view of a polyhedral object. For a typical recognition system parameterization, view degeneracies occur with a probability of about 20 percent and, depending on the parameterization, as high as 50 percent. We discuss the impact of view degeneracy on the problem of object recognition and, for a particular recognition framework, relate the cost of object disambiguation to the probability of view degeneracy. To reduce this cost, we incorporate our model of view degeneracy in an active focal length control paradigm that balances the probability of view degeneracy against the camera field of view. To validate both our view degeneracy model and our active focal length control model, we report a set of experiments using a real recognition system operating on real images.

12.
Real, 1999, 5(3): 215-230
The problem of a real-time pose estimation between a 3D scene and a single camera is a fundamental task in most 3D computer vision and robotics applications such as object tracking, visual servoing, and virtual reality. In this paper we present two fast methods for estimating the 3D pose using 2D to 3D point and line correspondences. The first method is based on the iterative use of a weak perspective camera model and forms a generalization of DeMenthon's method (1995) which consists of determining the pose from point correspondences. In this method the pose is iteratively improved with a weak perspective camera model and at convergence the computed pose corresponds to the perspective camera model. The second method is based on the iterative use of a paraperspective camera model which is a first order approximation of perspective. We describe in detail these two methods for both non-planar and planar objects. Experiments involving synthetic data as well as real range data indicate the feasibility and robustness of these two methods. We analyse the convergence of these methods and we conclude that the iterative paraperspective method has better convergence properties than the iterative weak perspective method. We also introduce a non-linear optimization method for solving the pose problem.
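The three camera models mentioned above can be compared directly on camera-frame points. Weak perspective (scaled orthography) divides every point by the object's mean depth Z0. Paraperspective is the first-order Taylor expansion of X/Z and Y/Z about the object centroid (X0, Y0, Z0), which gives X/Z0 - X0 (Z - Z0) / Z0^2. The sketch below only compares projections and does not implement the iterative pose algorithms. The function names and the test object are hypothetical.

```python
import numpy as np


def perspective(points, f=1.0):
    """Full perspective: points (n, 3) in the camera frame with Z > 0."""
    return f * points[:, :2] / points[:, 2:3]


def weak_perspective(points, f=1.0):
    """Scaled orthographic projection: every point uses the mean depth Z0."""
    z0 = points[:, 2].mean()
    return f * points[:, :2] / z0


def paraperspective(points, f=1.0):
    """First-order expansion of perspective about the centroid (X0, Y0, Z0)."""
    c = points.mean(axis=0)
    z0 = c[2]
    xy = points[:, :2] - np.outer(points[:, 2] - z0, c[:2]) / z0
    return f * xy / z0


cube = np.array([[x, y, z] for x in (-1, 1) for y in (-1, 1) for z in (-1, 1)], dtype=float)
off_axis = cube + np.array([10.0, 5.0, 20.0])   # small object, far from the optical axis
ref = perspective(off_axis)
print(np.abs(weak_perspective(off_axis) - ref).mean(), np.abs(paraperspective(off_axis) - ref).mean())
```

For an off-axis object, paraperspective stays much closer to true perspective than weak perspective does, because it keeps the first-order depth term.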

13.
Text line segmentation in handwritten documents is an important task in the recognition of historical documents. Handwritten document images contain text lines with multiple orientations, touching and overlapping characters between consecutive text lines, and different document structures, all of which make line segmentation difficult. In this paper, we present a new approach for handwritten text line segmentation that handles touching components, curvilinear text lines and horizontally overlapping components. The proposed algorithm formulates line segmentation as finding the central path in the area between two consecutive lines, which is solved as a graph traversal problem. A graph is constructed from the skeleton of the image, and a path-finding algorithm is then used to find the optimum path between text lines. The proposed algorithm has been evaluated on a comprehensive dataset consisting of five databases: ICDAR2009, ICDAR2013, UMD, the George Washington database and the Barcelona Marriages Database. The proposed method outperforms state-of-the-art methods across the different types and difficulty levels of the benchmark data.
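The path-finding step can be sketched with Dijkstra's algorithm on a weighted graph. Dijkstra is one standard shortest-path choice, not necessarily the authors' algorithm. The toy graph, node names, and edge costs below are hypothetical. In the paper the graph is built from the image skeleton, and the abstract does not give its node and cost definitions.

```python
import heapq


def shortest_path(graph, source, target):
    """Dijkstra's algorithm.

    graph: {node: [(neighbour, cost), ...]} with non-negative costs.
    Returns (path, cost), or (None, inf) when target is unreachable.
    """
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, ()):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return None, float("inf")
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1], dist[target]


# Hypothetical graph: "L" and "R" stand for the left and right page margins;
# edges that would cut through ink would carry high costs.
g = {"L": [("a", 1.0), ("b", 4.0)], "a": [("b", 1.0), ("R", 5.0)], "b": [("R", 1.0)]}
print(shortest_path(g, "L", "R"))
```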

14.
Offline handwritten Amharic word recognition (cited 1 time; self-citations: 0, by others: 1)
This paper describes two approaches for Amharic word recognition in unconstrained handwritten text using HMMs. The first approach builds word models from concatenated features of constituent characters, and in the second method HMMs of constituent characters are concatenated to form word models. In both cases, the features used for training and recognition are a set of primitive strokes and their spatial relationships. The recognition system does not require segmentation of characters, but it does require text line detection and extraction of structural features, which is done using the direction field tensor. The performance of the recognition system is tested on a dataset of unconstrained handwritten documents collected from various sources, and promising results are obtained.
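The second approach, concatenating character HMMs into a word HMM, can be sketched at the level of transition matrices. The character models are placed block-diagonally, and each character's exit probability is routed into the first state of the next character. The sketch makes several assumptions: simple left-to-right topologies and a hypothetical self-loop probability. Emission models over the stroke features are not shown.

```python
import numpy as np


def left_to_right_hmm(n_states, stay=0.6):
    """Transition matrix of a simple left-to-right character HMM.

    Each state loops with probability `stay` or advances to the next state.
    The last row sums to `stay`; the missing mass is the model's exit probability.
    """
    A = np.zeros((n_states, n_states))
    for i in range(n_states):
        A[i, i] = stay
        if i + 1 < n_states:
            A[i, i + 1] = 1.0 - stay
    return A


def concatenate(models):
    """Chain character HMMs into one word HMM (block-diagonal transitions).

    The exit probability of each character's last state is routed to the
    first state of the next character; the final character keeps its exit.
    """
    total = sum(A.shape[0] for A in models)
    W = np.zeros((total, total))
    offset = 0
    for k, A in enumerate(models):
        n = A.shape[0]
        W[offset:offset + n, offset:offset + n] = A
        if k + 1 < len(models):
            W[offset + n - 1, offset + n] = 1.0 - A[n - 1].sum()
        offset += n
    return W


word = concatenate([left_to_right_hmm(3), left_to_right_hmm(4)])
print(word.round(2))
```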

15.
Segment Based Camera Calibration (cited 5 times; self-citations: 2, by others: 3)
In previous approaches, the basic idea of calibrating a camera system is to determine camera parameters using a set of known 3D points as the calibration reference. In this paper, we present a method of camera calibration in which camera parameters are determined from a set of 3D lines. A set of constraints on the camera parameters is derived in terms of perspective line mapping. From these constraints, the same perspective transformation matrix as that for point mapping can be computed linearly. The minimum number of calibration lines is 6. This result generalizes that of Liu, Huang and Faugeras [12] for camera location determination, in which at least 8 line correspondences are required for linear computation of camera location. Since line segments in an image can be located more easily and more accurately than points, using lines as the calibration reference tends to ease the computation in image preprocessing and to improve calibration accuracy. Experimental results on calibration along with stereo reconstruction are reported.
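Each known 3D line, represented by two points P and Q on it, and its observed image line l give two linear equations in the twelve entries of the 3x4 perspective matrix M: l^T M P = 0 and l^T M Q = 0. M has 11 degrees of freedom up to scale, so six lines (twelve equations) are the minimum, consistent with the count in the abstract. The sketch solves the homogeneous system by SVD. This is one standard way to set up such constraints and may differ from the paper's derivation. No data normalization is done, and the names are hypothetical.

```python
import numpy as np


def calibrate_from_lines(lines_3d, image_lines):
    """Linear estimate of the 3x4 perspective matrix M from line correspondences.

    lines_3d: list of (P, Q), two homogeneous 4-vectors on each known 3-D line.
    image_lines: list of homogeneous 3-vectors l with l . x = 0 for image points x.
    """
    if len(image_lines) < 6:
        raise ValueError("need at least 6 lines (12 equations) for 11 unknowns")
    rows = []
    for (P, Q), l in zip(lines_3d, image_lines):
        for X in (P, Q):
            rows.append(np.kron(l, X))   # coefficient of M[i, j] is l[i] * X[j]
    _, _, vt = np.linalg.svd(np.array(rows))
    return vt[-1].reshape(3, 4)          # M up to scale, unit Frobenius norm


# Synthetic check: project random 3-D lines with a known matrix, then recover it.
rng = np.random.default_rng(1)
M_true = rng.standard_normal((3, 4))
lines, observed = [], []
for _ in range(8):
    P = np.append(rng.standard_normal(3), 1.0)
    Q = np.append(rng.standard_normal(3), 1.0)
    lines.append((P, Q))
    observed.append(np.cross(M_true @ P, M_true @ Q))  # image line through both projections
M_est = calibrate_from_lines(lines, observed)
```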

16.
Detecting and recognizing text in natural images are quite challenging and have received much attention from the computer vision community in recent years. In this paper, we propose a robust end-to-end scene text recognition method that uses tree-structured character models and normalized pictorial structured word models. For each category of characters, we build a part-based tree-structured model (TSM) to exploit character-specific structure information as well as local appearance information. A TSM detects each part of the character and recognizes its unique structure, seamlessly combining character detection and recognition. Because TSMs can accurately detect characters against complex backgrounds, for text localization we apply the TSMs of all characters to the coarse text detection regions to eliminate false positives and to search for possibly missing characters. For word recognition, we propose a normalized pictorial structure (PS) framework to deal with the bias caused by words of different lengths. Experimental results on a range of challenging public datasets (ICDAR 2003, ICDAR 2011, SVT) demonstrate that the proposed method outperforms state-of-the-art methods for both text localization and word recognition.

17.
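Wang Yintong, Zheng Hao, Chang Heyou, Li Shuo. Control and Decision, 2023, 38(7): 1825-1834
Chinese handwritten text recognition is an active research problem in pattern recognition. It is complicated by the large number of character classes, widely varying writing styles, and the difficulty of labelling training data. To address these problems, a segmentation-free, recurrence-free residual attention network is proposed for end-to-end handwritten text recognition. First, with ResNet-26 as the backbone, depthwise separable convolutions extract meaningful features, and a residual attention gating module increases the importance of key regions in the text image. Second, a batched bilinear interpolation model stretches and squeezes the input representation, upsampling the two-dimensional text representation into a one-dimensional text-line representation. Finally, connectionist temporal classification (CTC) serves as the loss function of the recognition model, establishing the correspondence between the high-level extracted representation and the character-sequence labels. Experiments on the CASIA-HWDB2.x and ICDAR2013 datasets show that the proposed method performs end-to-end handwritten text recognition effectively without any character or text-line position information and outperforms existing methods.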
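CTC scores a label sequence by summing the probability of every frame-level alignment that collapses to that sequence once repeats are merged and blanks removed. The sketch below is the textbook forward (alpha) recursion in log space for a single sequence. The blank index 0 and the function name are assumptions. The sketch is not the paper's network or training code.

```python
import numpy as np


def ctc_loss(log_probs, target, blank=0):
    """Negative log-likelihood -log p(target | input) under CTC.

    log_probs: (T, C) per-frame log-softmax outputs.
    target: list of label ids without blanks.
    """
    T = log_probs.shape[0]
    ext = [blank]
    for c in target:
        ext += [c, blank]            # blank, c1, blank, c2, ..., blank
    S = len(ext)
    alpha = np.full((T, S), -np.inf)
    alpha[0, 0] = log_probs[0, ext[0]]
    if S > 1:
        alpha[0, 1] = log_probs[0, ext[1]]
    for t in range(1, T):
        for s in range(S):
            a = alpha[t - 1, s]                                   # stay on the same symbol
            if s >= 1:
                a = np.logaddexp(a, alpha[t - 1, s - 1])          # advance by one
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                a = np.logaddexp(a, alpha[t - 1, s - 2])          # skip a blank between distinct labels
            alpha[t, s] = a + log_probs[t, ext[s]]
    final = alpha[T - 1, S - 1]
    if S > 1:
        final = np.logaddexp(final, alpha[T - 1, S - 2])          # may end on the last label or final blank
    return -final


uniform = np.log(np.full((2, 2), 0.5))   # 2 frames, classes {blank, "1"}
print(ctc_loss(uniform, [1]))            # -log(0.75): alignments "11", "-1", "1-"
```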

18.
Recognizing text captured in multiple frames by a hand-held video camera is challenging, but it makes it possible to capture and recognize a longer line of text and to improve the quality of the text image by exploiting the redundancy of the overlapping areas between frames. For this task, the video frames must be registered, i.e., mosaiced, after compensating for distortions caused by camera shake. In this paper, a mosaicing-by-recognition technique is proposed in which video mosaicing and text recognition are formulated as a single unified optimization problem and solved simultaneously and collaboratively by a dynamic programming-based optimization algorithm. Experimental results indicate that, even when the frames undergo various distortions such as rotation, scaling, translation, and nonlinear speed fluctuation of camera movement, the proposed technique produces a fine mosaic image through accurate distortion estimation (around 90% of perfect estimation) and achieves high character recognition accuracy (over 95%).

19.
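Chinese Text Recognition Using Contextual Information (cited 5 times; self-citations: 1, by others: 4)
To improve the recognition rate of Chinese character text, this paper proposes a post-processing method based on corpus statistics. The method exploits contextual information beyond the word level. A character sequence with definite boundaries, in most cases a sentence, is treated as one processing unit. Character-to-character co-occurrence probabilities obtained from corpus statistics are then combined with dynamic programming. The method gives satisfactory results.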
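The dynamic-programming step can be sketched as a Viterbi search over per-position recognition candidates. Here the character co-occurrence statistics are modelled as bigram conditional probabilities P(c_t | c_{t-1}), and unseen pairs receive a small floor probability. Both choices, the function name, and the toy numbers are assumptions rather than details from the paper.

```python
import math


def viterbi(candidates, bigram, unseen=1e-6):
    """Best character string through a lattice of recognition candidates.

    candidates: list over positions of {char: recognizer probability}.
    bigram: {(prev_char, char): P(char | prev_char)} from corpus counts.
    unseen: floor probability for pairs missing from the corpus (hypothetical).
    """
    # best[c] = (log score of the best path ending in c, that path)
    best = {c: (math.log(p), c) for c, p in candidates[0].items()}
    for position in candidates[1:]:
        new_best = {}
        for c, p in position.items():
            score, path = max(
                (s + math.log(bigram.get((prev, c), unseen)), pth)
                for prev, (s, pth) in best.items()
            )
            new_best[c] = (score + math.log(p), path + c)
        best = new_best
    return max(best.values())[1]


# The recognizer cannot separate 国 from 图, but the bigram statistics can.
cands = [{"中": 0.6, "申": 0.4}, {"国": 0.5, "图": 0.5}]
bigram = {("中", "国"): 0.5, ("申", "图"): 0.01}
print(viterbi(cands, bigram))
```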

20.
Camera-based character recognition has gained attention with the growing use of camera-equipped portable devices. One of the most challenging problems in recognizing characters with hand-held cameras is that captured images undergo motion blur due to hand vibration. Since it is difficult to remove motion blur from small characters via image restoration, we propose a recognition method that does not de-blur. The proposed method includes a generative learning method in the training step that simulates blurred images by controlling blur parameters. Recognition consists of two steps: the first recognizes the blurred characters with the subspace method, and the second reclassifies structurally similar characters using blur parameters estimated from the camera motion. We have experimentally shown that the effective use of motion blur improves the recognition accuracy of camera-captured characters.
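The generative-learning step synthesizes blurred training samples by varying blur parameters. The sketch assumes a linear motion-blur model parameterised by length (pixels) and angle (degrees), applied by zero-padded convolution. It does not reproduce the paper's actual blur model or its estimation of parameters from camera motion. The function names are hypothetical.

```python
import numpy as np


def motion_blur_kernel(length, angle_deg, size=None):
    """Linear motion-blur PSF: a normalized line segment of the given length and direction."""
    size = size or (int(np.ceil(length)) | 1)        # odd kernel size
    k = np.zeros((size, size))
    c = size // 2
    theta = np.deg2rad(angle_deg)
    for t in np.linspace(-(length - 1) / 2, (length - 1) / 2, max(int(length) * 4, 1)):
        x = int(round(c + t * np.cos(theta)))
        y = int(round(c - t * np.sin(theta)))        # image rows grow downwards
        if 0 <= x < size and 0 <= y < size:
            k[y, x] = 1.0                            # mark pixels on the segment
    return k / k.sum()


def blur(image, kernel):
    """'Same'-size 2-D convolution with zero padding (odd kernel sizes, numpy only)."""
    kh, kw = kernel.shape
    padded = np.pad(image.astype(float), ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros(image.shape, dtype=float)
    flipped = kernel[::-1, ::-1]
    for dy in range(kh):
        for dx in range(kw):
            out += flipped[dy, dx] * padded[dy:dy + image.shape[0], dx:dx + image.shape[1]]
    return out


# Hypothetical training-set augmentation: one clean glyph, several blur settings.
glyph = np.zeros((16, 16))
glyph[4:12, 7:9] = 1.0
augmented = [blur(glyph, motion_blur_kernel(L, a)) for L in (3, 5, 7) for a in (0, 45, 90)]
```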
