
1.

Guo Yusheng, Tan Nutao, Huang Lei, Liu Changping 《中文信息学报》 (Journal of Chinese Information Processing) 2008,22(4):83-87

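To locate mathematical formulas in Chinese documents that mix Chinese and English text, a formula location method based on Chinese character recognition and formula symbol recognition is proposed. The method consists of three parts: Chinese character extraction, embedded formula extraction, and isolated formula location. In Chinese character extraction, the information for each character block is extracted first (the Chinese character recognition result, the formula symbol recognition result, and the geometric features of the block), and a decision tree then separates Chinese characters from non-Chinese characters. In embedded formula extraction, the semantic information of formula symbols, the subscript and superscript relations between symbols, and the semantic information of formulas are used to locate embedded formulas among the non-Chinese characters. In isolated formula location, layout structure features are extracted from text lines that contain many embedded formula symbols and no Chinese characters, and a Gaussian mixture model distinguishes isolated formulas from ordinary text lines. On a test set of 148 document images containing 3,690 formulas, the method achieves a formula location accuracy of 91.19%.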
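The last step of the abstract above, separating isolated formulas from ordinary text lines with Gaussian mixture models, can be illustrated with a minimal sketch. Everything specific here is an assumption and not taken from the paper: the single feature (left indent as a fraction of column width), the hand-set mixture parameters, and the function names. In practice the mixtures would be fitted to training data rather than written by hand.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a 1-D normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def gmm_pdf(x, components):
    """Density of a 1-D Gaussian mixture; components = [(weight, mean, std), ...]."""
    return sum(w * gaussian_pdf(x, m, s) for w, m, s in components)

# Hypothetical, hand-set mixtures over one made-up layout feature:
# the left indent of a text line as a fraction of the column width.
FORMULA_GMM = [(0.7, 0.35, 0.08), (0.3, 0.20, 0.05)]
TEXT_GMM = [(0.8, 0.00, 0.03), (0.2, 0.06, 0.02)]

def is_isolated_formula(indent_ratio):
    """Label a line as an isolated formula when the formula mixture is more likely."""
    return gmm_pdf(indent_ratio, FORMULA_GMM) > gmm_pdf(indent_ratio, TEXT_GMM)
```

With these illustrative parameters, a strongly indented line is labelled as a formula and a flush-left line as ordinary text. The comparison assumes equal class priors.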

2.

Isolated structural error analysis of printed mathematical expressions

P. Pavan Kumar Arun Agarwal Chakravarthy Bhagvati 《Pattern Analysis & Applications》2018,21(4):1097-1107


3.

Automatic extraction of printed mathematical formulas using fuzzy logic and propagation of context (total citations: 7; self-citations: 0; citations by others: 7)

A. Kacem A. Belaïd M. Ben Ahmed 《International Journal on Document Analysis and Recognition》2001,4(2):97-108


4.

Recognition of online handwritten mathematical expressions

Garain U. Chaudhuri B.B. 《IEEE transactions on systems, man, and cybernetics. Part B, Cybernetics》2004,34(6):2366-2376

This paper aims at automatic understanding of online handwritten mathematical expressions (MEs) written on an electronic tablet. The proposed technique involves two major stages: symbol recognition and structural analysis. A combination of two different classifiers has been used to achieve high accuracy in symbol recognition. Several online and offline features are used in the structural analysis phase to identify the spatial relationships among symbols. A context-free grammar has been designed to convert the input expressions into their corresponding TeX strings, which are subsequently converted into MathML format. Contextual information has been used to correct several structure interpretation errors. A new method for evaluating the performance of the proposed system has been formulated. Experiments on a dataset of considerable size strongly support the feasibility of the proposed system.
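The step from spatial relationships to a TeX string can be pictured with a small sketch. It is not the paper's context-free grammar: it is a plain recursive traversal over a hypothetical `Symbol` structure whose fields (`tex`, `sup`, `sub`, `nxt`) are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Symbol:
    """One recognised symbol plus its spatial relations (hypothetical structure)."""
    tex: str                          # TeX token for the symbol, e.g. "x" or "\\alpha"
    sup: Optional["Symbol"] = None    # expression in the superscript position
    sub: Optional["Symbol"] = None    # expression in the subscript position
    nxt: Optional["Symbol"] = None    # next symbol on the same baseline

def to_tex(node: Optional[Symbol]) -> str:
    """Walk the baseline chain and emit a TeX string, recursing into scripts."""
    parts = []
    while node is not None:
        piece = node.tex
        if node.sub is not None:
            piece += "_{" + to_tex(node.sub) + "}"
        if node.sup is not None:
            piece += "^{" + to_tex(node.sup) + "}"
        parts.append(piece)
        node = node.nxt
    return " ".join(parts)

# Example: x with subscript i and superscript 2, followed by "+ y".
expr = Symbol("x", sup=Symbol("2"), sub=Symbol("i"), nxt=Symbol("+", nxt=Symbol("y")))
print(to_tex(expr))  # x_{i}^{2} + y
```

A grammar-based converter would additionally reject or repair layouts that do not form a valid expression; this traversal simply serialises whatever relations it is given.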

5.

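Mathematical expression location in Chinese scientific and technical documents (中文科技文档中的数学表达式定位) (total citations: 1; self-citations: 0; citations by others: 1)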

Zhang Zhiwei, Kong Fanrang, Liu Weilai, Long Qian, Liu Yongbin 《中文信息学报》 (Journal of Chinese Information Processing) 2007,21(4):86-91

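Locating mathematical expressions is a prerequisite for recognizing printed mathematical expressions. For Chinese scientific and technical documents, new methods are proposed for locating isolated expressions and embedded expressions. An adaptive neuro-fuzzy inference system (ANFIS) classifies line features to extract isolated expressions. Fuzzy clustering and dynamic programming are used to extract, in turn, Chinese characters, Chinese punctuation, and English characters from the document, and heuristic rules then merge the remaining mathematical symbols into embedded expressions. Experiments show that the proposed location methods achieve high accuracy.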

6.

Machine printed text and handwriting identification in noisy document images (total citations: 1; self-citations: 0; citations by others: 1)

Zheng Y Li H Doermann D 《IEEE transactions on pattern analysis and machine intelligence》2004,26(3):337-353

In this paper, we address the problem of identifying text in noisy document images. We focus especially on segmenting handwriting and machine printed text and distinguishing between them because: 1) handwriting in a document often indicates corrections, additions, or other supplemental information that should be treated differently from the main content, and 2) the segmentation and recognition techniques required for machine printed and handwritten text are significantly different. A novel aspect of our approach is that we treat noise as a separate class and model noise based on selected features. Trained Fisher classifiers are used to separate machine printed text and handwriting from noise, and we further exploit context to refine the classification. A Markov Random Field (MRF) based approach is used to model the geometrical structure of the printed text, handwriting, and noise to rectify misclassifications. Experimental results show that our approach is robust and can significantly improve page segmentation in noisy document collections.

7.

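Research on a system for recognizing and describing mathematical formulas in printed documents (印刷体文献中数学公式识别及描述系统研究) (total citations: 1; self-citations: 0; citations by others: 1)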

Chen Deyu, Zhu Xuefang, Su Xiaochen, Hang Yueqin 《计算机应用》 (Journal of Computer Applications) 2009,29(3):789-791

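Building a recognition system for printed mathematical formulas requires studying the structure of the formulas themselves, methods for recognizing their characters, and methods for describing the recognized result. To this end, an experimental system for formula recognition and description was built. It recognizes the structure and characters of some mathematical formulas, converts images to text, and represents the recognition results effectively in a mathematical modeling language.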

8.

An optical character recognition system for printed Telugu text

C. Vasantha Lakshmi C. Patvardhan 《Pattern Analysis & Applications》2004,7(2):190-204


9.

Distinction between handwritten and machine-printed text based on the bag of visual words model

Konstantinos Zagoris Ioannis Pratikakis Apostolos Antonacopoulos Basilis Gatos Nikos Papamarkos 《Pattern recognition》2014


10.

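Research on Parzen-window-based extraction of mathematical formulas from printed documents (基于Parzen窗的印刷文档数学公式抽取的研究) (total citations: 3; self-citations: 0; citations by others: 3)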

Yang Peng, Tian Xuedong 《计算机工程与应用》 (Computer Engineering and Applications) 2005,41(23):200-202

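Formula extraction is the first step of formula recognition, yet related research is still scarce. This work studies the extraction of mathematical formulas from printed documents and proposes an extraction method that combines Parzen windows with heuristic rules. Isolated formulas are extracted from the document with the Parzen window method, and embedded formulas are extracted from text lines with heuristic rules. Experiments show that combining the two extraction methods gives good results.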
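As a rough illustration of how a Parzen window can separate isolated formula lines from text lines, the sketch below estimates a class-conditional density for each class with a Gaussian kernel and picks the larger one. The feature, the training values, the window width `h`, and the function names are all assumptions made for this example; the abstract does not specify them.

```python
import math

def parzen_density(x, samples, h):
    """Parzen-window estimate of p(x) using a Gaussian kernel of width h."""
    norm = len(samples) * h * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples) / norm

# Made-up training values of one hypothetical line feature
# (ratio of a line's height to the page's median line height).
FORMULA_LINES = [1.6, 1.8, 2.1, 2.4, 1.9]
TEXT_LINES = [0.9, 1.0, 1.0, 1.1, 1.05]

def classify_line(feature, h=0.2):
    """Return 'formula' or 'text', whichever class density is higher (equal priors)."""
    p_formula = parzen_density(feature, FORMULA_LINES, h)
    p_text = parzen_density(feature, TEXT_LINES, h)
    return "formula" if p_formula > p_text else "text"
```

The window width `h` controls smoothing: a small value follows the training samples closely, a large value blurs the two classes together, so it would normally be tuned on held-out data.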

11.

Hidden Markov model based optical character recognition in the presence of deterministic transformations (total citations: 2; self-citations: 0; citations by others: 2)

Oscar E Agazzi Shyh-shiaw Kuo 《Pattern recognition》1993,26(12):1813-1826

A method is introduced to combine and jointly optimize recognition and image normalization in optical character recognition algorithms based on pseudo two-dimensional (2D) hidden Markov models (HMMs). The method can be combined with a previous method for joint segmentation and recognition of connected text. It also provides a maximum likelihood estimate of the transformation parameters (scaling factor, slant angle, etc.) that can be used by higher level modules in an intelligent document recognition system as an aid in the recognition process. The computational cost of this technique is modest. Experimental results on a database of distorted printed characters are presented.

12.

Automatic recognition of printed Farsi texts

B Parhami M Taraghi 《Pattern recognition》1981,14(1-6):395-403


13.

A corpus for OCR research on mathematical expressions

Utpal Garain B. B. Chaudhuri 《International Journal on Document Analysis and Recognition》2005,7(4):241-259

This paper is concerned with research on OCR (optical character recognition) of printed mathematical expressions. The construction of a representative corpus of technical and scientific documents containing expressions is discussed. A statistical investigation of the corpus is presented, and the usefulness of this analysis is demonstrated for related research problems, namely, (i) identification and segmentation of expression zones from the rest of the document, (ii) recognition of expression symbols, (iii) interpretation of expression structures, and (iv) performance evaluation of a mathematical expression recognition system. Moreover, a groundtruthing format has been proposed to facilitate automatic evaluation of expression recognition techniques. Received: 10 July 2003, Accepted: 22 November 2004, Published online: 18 March 2005. Correspondence to: Utpal Garain

14.

Text line extraction in graphical documents using background and foreground information

Partha Pratim Roy Umapada Pal Josep Lladós 《International Journal on Document Analysis and Recognition》2012,15(3):227-241

In graphical documents (e.g., maps and engineering drawings) and artistic documents, text lines are often annotated in multiple orientations or along curves to mark different locations or symbols. For optical character recognition of such documents, individual text lines need to be extracted. In this paper, we propose a novel method to segment such text lines based on the foreground and background information of the text components. To utilize the background information effectively, a water reservoir concept is used. In the proposed scheme, individual components are first detected and grouped hierarchically into character clusters using size and positional information. Next, the clusters are extended at their two extreme ends to determine potential candidate regions. Finally, with the help of these candidate regions, individual lines are extracted. Experimental results are presented on different datasets of graphical documents, camera-based warped documents, noisy images containing seals, etc. The results demonstrate that our approach is robust and invariant to the size and orientation of the text lines present in the document.

15.

Recognition of printed mathematical formulas (印刷体数学公式的识别)

Li Fenhua, Huang Xiao 《电脑开发与应用》 (Computer Development & Applications) 2007,20(3):27-29

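This paper introduces a recognition system for printed mathematical formulas that consists of two parts: formula character recognition and structural analysis. In character recognition, several special processing methods suited to formula characters are used. In structural analysis, a method that combines top-down and bottom-up strategies according to the structural layout of formulas is adopted, which makes the recognized formulas reusable. Experiments show that the method achieves good recognition results.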

16.

Extraction of type style-based meta-information from imaged documents

B.B. Chaudhuri U. Garain 《International Journal on Document Analysis and Recognition》2001,3(3):138-149

Extraction of some meta-information from printed documents without carrying out optical character recognition (OCR) is considered. It can be statistically verified that important terms in technical articles are mainly printed in italic, bold, and all-capital style. A quick approach to detecting them is proposed here. This approach is based on the global shape heuristics of these styles in any font. Important words in a document are sometimes printed in a larger size as well. A smart approach for the determination of font size is also presented. Detection of type styles helps in improving OCR performance, especially for reading italicized text. Another advantage of identifying word type styles and font size is discussed in the context of extracting: (i) different logical labels; and (ii) important terms from the document. Experimental results on the performance of the approach on a large number of good quality, as well as degraded, document images are presented. Received July 12, 2000 / Revised October 1, 2000

17.

Research on reconstruction techniques for printed mathematical formulas (印刷体数学公式重构技术的研究)

Tian Xuedong, Xu Lijuan, Li Na 《计算机应用与软件》 (Computer Applications and Software) 2008,25(5):67-69

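Formula reconstruction is an important stage of formula recognition, yet related research is still scarce. A MathML-based method for reconstructing printed mathematical formulas is proposed. Starting from the formula relation tree produced by an existing formula symbol recognition and structural analysis program, the relation tree is reconstructed as a MathML document, and a formula editor is designed so that formulas can be re-edited and reused. Experiments show that the reconstruction method adapts well to printed mathematical formulas and achieves high accuracy.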
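To make the idea of turning a formula relation tree into MathML concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The nested-tuple tree format and the function names are hypothetical; the abstract does not describe the paper's actual tree structure. Only the MathML element names (`mi`, `mn`, `mo`, `mrow`, `msup`, `mfrac`) are standard.

```python
import xml.etree.ElementTree as ET

def to_mathml(node):
    """Convert a nested-tuple relation tree (hypothetical format) to a MathML element.

    Assumed node forms:
      ("id", "x")          identifier      -> <mi>
      ("num", "2")         number          -> <mn>
      ("op", "+")          operator        -> <mo>
      ("row", [children])  horizontal run  -> <mrow>
      ("sup", base, exp)   superscript     -> <msup>
      ("frac", num, den)   fraction        -> <mfrac>
    """
    kind = node[0]
    leaf_tags = {"id": "mi", "num": "mn", "op": "mo"}
    if kind in leaf_tags:
        el = ET.Element(leaf_tags[kind])
        el.text = node[1]
        return el
    if kind == "row":
        el = ET.Element("mrow")
        el.extend([to_mathml(child) for child in node[1]])
        return el
    if kind in ("sup", "frac"):
        el = ET.Element("msup" if kind == "sup" else "mfrac")
        el.append(to_mathml(node[1]))
        el.append(to_mathml(node[2]))
        return el
    raise ValueError(f"unknown node kind: {kind!r}")

def formula_to_mathml(tree):
    """Wrap the converted tree in a <math> root and serialise it to a string."""
    root = ET.Element("math", xmlns="http://www.w3.org/1998/Math/MathML")
    root.append(to_mathml(tree))
    return ET.tostring(root, encoding="unicode")

# Example tree for: x squared plus 1
tree = ("row", [("sup", ("id", "x"), ("num", "2")), ("op", "+"), ("num", "1")])
print(formula_to_mathml(tree))
```

Because the output is ordinary MathML, it can be loaded by any MathML-aware editor or renderer, which is what makes re-editing and reuse possible.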

18.

A segmentation-free approach to recognise printed Sinhala script using linear symmetry

H.L. Premaratne J. Bigun 《Pattern recognition》2004,37(10):2081-2089


19.

Document binarisation using Kohonen SOM

Badekas E. Papamarkos N. 《Image Processing, IET》2007,1(1):67-84

An integrated system for the binarisation of normal and degraded printed documents for the purpose of visualisation and recognition of text characters is proposed. In degraded documents, where considerable background noise or variation in contrast and illumination exists, there are many pixels that cannot be easily classified as foreground or background pixels. For this reason, it is necessary to perform document binarisation by combining and taking into account the results of a set of binarisation techniques, especially for document pixels that have high vagueness. The proposed binarisation technique takes advantage of a set of selected binarisation algorithms by combining their results using a Kohonen self-organising map neural network. To further improve the binarisation results, significant improvements are proposed for two of the most powerful document binarisation techniques used, that is, the adaptive logical level technique and the improvement of integrated function algorithm. The proposed binarisation technique is extensively tested with a variety of degraded documents. Several experimental and comparative results demonstrating the performance of the proposed technique are presented.

20.

Identification of different script lines from multi-script documents (total citations: 1; self-citations: 0; citations by others: 1)

U. Pal B. B. Chaudhuri 《Image and vision computing》2002,20(13-14)
