首页 | 官方网站   微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
This paper presents a syntactic method for sophisticated logical structure analysis that transforms document images with multiple pages and hierarchical structure into an electronic document based on SGML/XML. To produce a logical structure more accurately and quickly than previous works of which the basic units are text lines, the proposed parsing method takes text regions with hierarchical structure as input. Furthermore, we define a document model that is able to describe geometric characteristics and logical structure information of documents efficiently and present its automated creation method. Experimental results with 372 images scanned from the IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) show that the method has performed logical structure analysis successfully and generated a document model automatically. Particularly, the method generates SGML/XML documents as the result of structural analysis, so that it enhances the reusability of documents and independence of platform.  相似文献   

2.
We introduce a new class of grammars called uniquely parsable grammars (UPGs). A UPG is a kind of phrase structure grammar having a restricted type of rewriting rules, where parsing can be performed without backtracking. We show that, in spite of such restriction to the rules, UPGs are universal in their generating ability. We then define three subclasses of UPGs. They are M-UPGs (monotonic UPGs), RC-UPGs (UPGs with right-terminating and context-free-like rules), and REG-UPGs (regular UPGs). It is proved that the generating abilities of the classes of M-UPGs, RC-UPGs, and REG-UPGs are exactly characterized by the classes of deterministic linear-bounded automata, deterministic pushdown automata, and deterministic finite automata, respectively. Especially, the class of RC-UPGs gives a very simple grammatical characterization of the class of deterministic context-free languages. Thus, these four classes form a deterministic counterpart of the classical Chomsky hierarchy. Received August 30, 1995 / May 13, 1996  相似文献   

3.
Two simple but powerful tools are introduced in LR theory, viz. item grammars and parsing automata. Parsing automata are used to define a large class of correct prefix parsers operating in linear time. DeRemer's and Pager's parsing methods turn out to be special cases. LL and LC tests, as well as inclusion theorems for the classes of LL, LC, and LR grammars will be based on parsing automata, too.  相似文献   

4.
Structured documents are usually processed by tree-based document transformers, which transform the document tree representing the structure of the input document into another tree structure. Event-based document transformers, by contrast, recognize the input as a stream of parsing events, i.e., lexical tokens, and process the events one by one in an event-driven manner. Event-based document transformers have advantages that they need less memory space and that they are more tolerant of large inputs, compared to tree-based transformers, which construct the intermediate tree representation.This paper proposes an algorithm which derives an event-based transformer from a given specification of a document transformation over a tree structure. The derivation of an event-based transformer is carried out in the framework of attribute grammars. We first obtain an attribute grammar which processes a stream of parsing events, by applying a deforestation method; We then derive an attribute evaluation scheme relevant to the event-based transformation. Using this algorithm, one can develop event-based document transformers in a more declarative style than directly programming over the stream of parsing events.  相似文献   

5.
This paper describes approaches for machine learning of context free grammars (CFGs) from positive and negative sample strings, which are implemented in Synapse system. The grammatical inference consists of a rule generation by “inductive CYK algorithm,” mechanisms for incremental learning, and search. Inductive CYK algorithm generates minimum production rules required for parsing positive samples, when the bottom-up parsing by CYK algorithm does not succeed. The incremental learning is used not only for synthesizing grammars by giving the system positive strings in the order of their length but also for learning grammars from other similar grammars. Synapse can synthesize fundamental ambiguous and unambiguous CFGs including nontrivial grammars such as the set of strings not of the form ww with w∈{a,b}+.  相似文献   

6.
7.
The Standard Generalized Markup Language (SGML) is a language for representing document structure. This paper discusses ways in which the SGML language might be used to represent graphic as well as textual contents of a document. By using SGML markup for both graphics and text, a document processing application can achieve a more uniform treatment and tighter coupling between these two types of materials.  相似文献   

8.
基于SGML的电子公文标准化研究与实现   总被引:3,自引:0,他引:3  
文中通过对现有国家行政机关公文处理流程的研究,提出并具体实现了一个基于SGML的电子公文标准。按照国家有关规定制定了一个电子公文的SGML文件类型定义,设计了一个电子公文的SGML模板,并开发一下带有语法制导功能的SGML公文编辑器,对生成的SGML格式公文还提供了转换器,实现了网上电子文化的标准化。  相似文献   

9.
This paper describes an approach to the problem of articulating multimedia information based on parsing and syntax-directed translation that uses Relational Grammars. This translation is followed by a constraint-solving mechanism to create the final layout. Grammatical rules provide the mechanism for mapping from a representation of the content and context of a presentation to forms that specify the media objects to be realized. These realization forms include sets of spatial and temporal constraints between elements of the presentation. Individual grammars encapsulate the “look and feel” of a presentation and can be used as generators of such a style. By making the grammars sensitive to the requirements of the output medium, parsing can introduce flexibility into the information realization process.  相似文献   

10.
A new class of context-free grammars, called dynamic context-free grammars, is introduced. These grammars have the ability to change the set of production rules dynamically during the derivation of some terminal string. The notion of LL() parsing is adapted to this grammar model. We show that dynamic LL() parsers are as powerful as LR() parsers, i.e. that they are capable to analyze every deterministic context-free language while using only one symbol of lookahead. Received: 24 August 1994 / 5 January 1996  相似文献   

11.
The combination of SGML and database technology allows to refine both declarative and navigational access mechanisms for structured document collection: with regard to declarative access, the user can formulate complex information needs without knowing a query language, the respective document type definition (DTD) or the underlying modelling. Navigational access is eased by hyperlink-rendition mechanisms going beyond plain link-integrity checking. With our approach, the database-internal representation of documents is configurable. It allows for an efficient implementation of operations, because DTD knowledge is not needed for document structure recognition. We show how the number of method invocations and the cost of parsing can be significantly reduced. Edited by Y.C. Tay. Received April 22, 1996 / Accepted March 16, 1997  相似文献   

12.
Summary A parser model is presented whose structure is a generalization of the well known LR(k) parsers. Various classes of this parser that would be both practical and efficient to use in a compiler are examined. Associated with these classes of parsers is a hierarchy of type-0 grammars, each grammatical class being defined in terms of the form and structure of derivations. In particular, parsers based on a class called deterministic regular parsable (DRP) grammars will detect any errors as soon as possible during a left to right scan of the input. LR(k) grammars are also DRP. Much research related to LR(k) grammars and parsing is also applicable to DRP grammars and their associated parsers.  相似文献   

13.
角色反演算法在问答系统中的应用   总被引:1,自引:0,他引:1  
该文介绍了如何将角色反演算法的思想用在多信息源多语种问答系统中来构建句法分析器。常用的句法分析算法由于受到语法规模大小的限制,一般都不能有效地应用到实际的自然语言处理当中。角色反演算法思想是将Chart算法的高空间效率和广义LR算法的高时间效率有效地结合起来,从而大大提高了综合的分析效率。基于多信息源多语种的问答系统,拥有大规模语法(上万条语法规则),通过引入角色反演算法思想,可以分别在问句分析模块和答句生成模块中有效地完成问句和文本答案候选文档的句法分析。  相似文献   

14.
sgmltool是作者开发的一种支持各类SGML应用系统的编译系统。sgmltool动态解释DTD和SGML文档,将后者转换成一种规范化的内部结构,并保存到面向对象数据库OSCAR中。OSCAR中规范的SGML数据结构2可作为其它SGML应用系统的基础。  相似文献   

15.
Summary Methods for the automatic construction of error handling parsers are presented. The resulting parsers are capable of correcting all syntax errors by insertion and/or deletion of terminal symbols to the right of the error location. Thus, the output of the parser always corresponds to a syntactically valid program. This contributes significantly to the reliability and robustness of a compiler. The speed of parsing correct parts of a program is not affected by the presence of the error handling capability. The correction algorithm is easy to implement. Apart from the parsing tables only one character per parser state is required to control the correction process. The method is applicable to a wide class of stack automata including LL(k), LR(k), SLR(k), and LALR(k) parsers. It is shown that for LL(k) grammars error correction can be obtained as a byproduct of the canonical LL(k) parser generation. A similar result can be obtained for LR(k) grammars if the parser generator is slightly modified. The method has been successfully added to an LALR(1) parser generator.  相似文献   

16.
A deterministic parallel LL parsing algorithm is presented. The algorithm is based on a transformation from a parsing problem to parallel reduction. First, a nondeterministic version of a parallel LL parser is introduced. Then, it is transformed into the deterministic version—the LLP parser. The deterministic LLP(q,k) parser uses two kinds of information to select the next operation — a lookahead string of length up to k symbols and a lookback string of length up to q symbols. Deterministic parsing is available for LLP grammars, a subclass of LL grammars. Since the presented deterministic and nondeterministic parallel parsers are both based on parallel reduction, they are suitable for most parallel architectures.  相似文献   

17.
词义消歧要解决如何让计算机理解多义词在上下文中的具体含义,对信息检索、机器翻译、文本分类和自动文摘等自然语言处理问题有着十分重要的作用。通过引入句法信息,提出了一种新的词义消歧方法。构造歧义词汇上下文的句法树,提取句法信息、词性信息和词形信息作为消歧特征。利用贝叶斯模型来建立词义消歧分类器,并将其应用到测试数据集上。实验结果表明:消歧的准确率有所提升,达到了65%。  相似文献   

18.
Tomita-style generalised LR (GLR) algorithms extend the standard LR algorithm to non-deterministic grammars by performing all possible choices of action. Cubic complexity is achieved if all rules are of length at most two. In this paper we shall show how to achieve cubic time bounds for all grammars by binarising the search performed whilst executing reduce actions in a GLR-style parser. We call the resulting algorithm Binary Right Nulled GLR (BRNGLR) parsing. The binarisation process generates run-time behaviour that is related to that shown by a parser which pre-processes its grammar or parse table into a binary form, but without the increase in table size and with a reduced run-time space overhead. BRNGLR parsers have worst-case cubic run time on all grammars, linear behaviour on LR(1) grammars and produce, in worst-case cubic time, a cubic size binary SPPF representation of all the derivations of a given sentence.  相似文献   

19.
Summary This paper defines a hierarchy of languages which is properly contained in the context sensitive languages and which starts with the context free family. The hierarchy is defined inductively by controlling labeled linear grammars with languages in one family to yield languages in the next larger family. The families of the hierarchy have properties analogous to those of the context free family, in particular, the new mechanism introduced is very suitable for parsing.A language in the n-th family is specified by a sequence of n — 1 labeled linear grammars and a context free grammar. By assuming that the reversals of the first n — 1 grammars and the last labeled linear grammar are precedence grammars, the concepts and parsing algorithm of Wirth and Weber extend to yield a parsing algorithm within the hierarchy. This considerably enhances the usefulness of the construction and allows much of the power of the context sensitive languages to become accessible in measured amounts for potential programming applications.  相似文献   

20.
Summary The use of context-free grammars to define the syntax of programming languages is complicated by the phenomenon of ambiguity. Ambiguity can be resolved by the specification of a unique canonical parse. A set of rules is given which defines a canonical bottom-up parse, and these rules are implemented in a left-to-right bottom-up parsing algorithm. A second set of rules is given which defines a canonical top-down parse, and these rules are similarly implemented in a left-to-right top-down parsing algorithm.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司    京ICP备09084417号-23

京公网安备 11010802026262号