Similar Documents
20 similar documents found.
1.
This paper presents a lexical choice component for complex noun phrases. We first explain why lexical choice for NPs deserves special attention within the standard pipeline architecture for a generator. The task of the lexical chooser for NPs is more complex than for clauses because the syntax of NPs is less well understood than that of clauses; consequently, syntactic realization components, while they accept a predicate-argument structure as input for clauses, require a purely syntactic tree as input for NPs. The task of mapping conceptual relations to different syntactic modifiers is therefore left to the lexical chooser for NPs. The paper focuses on the syntagmatic aspect of lexical choice, identifying a process called NP planning: it addresses a set of communicative goals that NPs can satisfy and specifies an interface between the different components of the generator and the lexical chooser. The technique presented for NP planning encapsulates rich lexical knowledge and allows for the generation of a wide variety of syntactic constructions. It also allows for a large paraphrasing power because it dynamically maps conceptual information to various syntactic slots.
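As a rough illustration of the kind of mapping such an NP planner performs, the sketch below assigns conceptual relations to syntactic slots of a noun phrase. The relation names, slot names, and preference order are invented for illustration and are not taken from the paper.

```python
# Hypothetical sketch of an NP planner that maps conceptual relations
# onto syntactic slots of a noun phrase; relation and slot names are
# invented for illustration and do not come from the paper.

# Candidate syntactic slots for each conceptual relation, in order of preference.
RELATION_TO_SLOTS = {
    "made-of":  ["premodifier-adj", "postmodifier-pp"],      # "wooden table" / "table of wood"
    "owner":    ["determiner-genitive", "postmodifier-pp"],  # "John's car" / "the car of John"
    "location": ["postmodifier-pp"],                         # "the house on the hill"
    "purpose":  ["premodifier-noun", "postmodifier-pp"],     # "coffee cup" / "cup for coffee"
}

def plan_np(head, relations):
    """Assign each conceptual relation to a free syntactic slot of the NP."""
    np = {"head": head}
    used = set()
    for relation, filler in relations:
        for slot in RELATION_TO_SLOTS.get(relation, ["postmodifier-pp"]):
            if slot not in used:          # one filler per slot in this toy planner
                np[slot] = (relation, filler)
                used.add(slot)
                break
    return np

print(plan_np("cup", [("purpose", "coffee"), ("owner", "Mary")]))
# {'head': 'cup', 'premodifier-noun': ('purpose', 'coffee'),
#  'determiner-genitive': ('owner', 'Mary')}
```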

2.
Recently, the scientific interest in addressing metonymy phenomena from a computational perspective has increased significantly. Considerable effort is invested in this, but issues addressing metonymy in the context of natural language generation have been widely ignored so far, and also comparable multilingual analyses are rather sparse. Motivated by these shortcomings, we investigate methods for representing knowledge required to express metonymic relations in several ways and in multiple languages, and we present techniques for generating these alternative verbalizations. In particular, we demonstrate how mapping schemata that enable lexical expressions on the basis of conceptual specifications to be built are derived from the Qualia Structure of Pustejovsky's Generative Lexicon. Moreover, our enterprise has led to the exposition of interesting cross-language differences, notably the use of prefixed verbs and compound nouns in German, as opposed to widely equivalent expressions entailing implicit metonymic relations, as frequently found in English. A main achievement of our approach lies in bringing computational lexical semantics and natural language generation closer together, so that the linguistic foundations of lexical choice in natural language generation are strengthened.
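The following minimal sketch illustrates how Qualia roles can license alternative verbalizations of a metonymic expression. Only the role names (telic, agentive) come from Pustejovsky's Generative Lexicon; the lexicon entries and the expansion schemata are invented for illustration.

```python
# A minimal sketch of how Qualia roles could license alternative
# verbalizations of a metonymic expression; the lexicon entries and the
# expansion schemata below are invented for illustration, not taken from the paper.

QUALIA = {
    # Pustejovsky-style Qualia roles: what the object is for (telic)
    # and how it comes into being (agentive).
    "novel":  {"telic": "read",  "agentive": "write"},
    "coffee": {"telic": "drink", "agentive": "brew"},
}

def verbalizations(subject, verb, obj):
    """Return the compact metonymic form plus expansions via Qualia roles."""
    variants = [f"{subject} {verb} the {obj}"]            # "Mary began the novel"
    for role in ("telic", "agentive"):
        event = QUALIA.get(obj, {}).get(role)
        if event:
            variants.append(f"{subject} {verb} to {event} the {obj}")
    return variants

print(verbalizations("Mary", "began", "novel"))
# ['Mary began the novel', 'Mary began to read the novel',
#  'Mary began to write the novel']
```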

3.
The process of lexical choice usually consists of determining a single way of expressing a given content. In some cases such as gerund translation, however, there is no single solution; a choice must be made among several variants which differ in their syntactic behavior. Based on a bilingual corpus analysis, this paper explains first which factors influence the availability of variants. In a second step, some criteria for deciding on one or the other variant are discussed. It will be shown that the stylistic evaluation of the syntactic structures induced by alternative lexical items is of central importance in lexical choice. Finally, an implementation of the resulting model is described.

4.
Multilingual generation in machine translation (MT) requires a knowledge organization that facilitates the task of lexical choice, i.e. selection of lexical units to be used in the generation of a target-language sentence. This paper investigates the extent to which lexicalization patterns involving the lexical aspect feature [+telic] may be used for translating events and states among languages. Telicity has been correlated syntactically with both transitivity and unaccusativity, and semantically with Talmy's path of a motion event, the representation of which characterizes languages parametrically. Taking as our starting point the syntactic/semantic classification in Levin's English Verb Classes and Alternations, we examine the relation between telicity and the syntactic contexts, or alternations, outlined in this work, identifying systematic relations between the lexical aspect features and the semantic components that potentiate these alternations. Representing lexical aspect — particularly telicity — is therefore crucial for the tasks of lexical choice and syntactic realization. Having enriched the data in Levin (by correlating the syntactic alternations (Part I) and semantic verb classes (Part II) and marking them for telicity), we assign to verbs lexical semantic templates (LSTs). We then demonstrate that it is possible from these templates to build a large-scale repository for lexical conceptual structures which encode meaning components that correspond to different values of the telicity feature. The LST framework preserves both semantic content and semantic structure (following Grimshaw) during the processes of lexical choice and syntactic realization. Application of this model identifies precisely where the Knowledge Representation component may profitably augment our rules of composition, to identify cases where the interlingua underlying the source language sentence must be either reduced or modified in order to produce an appropriate target language sentence.
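A lexical entry enriched with a telicity feature might look like the hypothetical sketch below; the verbs, class labels, and alternation names are illustrative stand-ins, not the paper's actual lexical semantic templates.

```python
# Illustrative sketch of a lexical entry carrying a telicity feature,
# loosely in the spirit of the lexical semantic templates described above;
# the entries and labels are chosen for illustration only.

from dataclasses import dataclass

@dataclass
class LexicalSemanticTemplate:
    verb: str
    levin_class: str      # a Levin-style class label
    telic: bool           # [+telic]: event has an inherent endpoint
    alternations: tuple   # syntactic contexts the verb is assumed to license

LEXICON = [
    LexicalSemanticTemplate("arrive", "inherently directed motion", True,  ("unaccusative",)),
    LexicalSemanticTemplate("run",    "manner of motion",           False, ("unergative", "induced action")),
]

def candidates(telic_required):
    """Lexical choice step: keep only verbs whose telicity matches the input event."""
    return [t.verb for t in LEXICON if t.telic == telic_required]

print(candidates(telic_required=True))   # ['arrive']
```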

5.
Natural language generation systems should choose nouns by searching for lexical units that (i) are known to the user; (ii) truthfully describe the object being lexicalized; (iii) convey sufficient information to fulfill the system's underlying communicative goals; and (iv) are maximal under a lexical preference function. This model of lexical choice allows a clean separation to be made between what the system knows about the object being lexicalized and what it wishes to communicate about this object. The model also allows lexical choice to be biased towards basic-level and other preferred lexical units.
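A minimal sketch of the four-criterion model is given below; the toy lexicon, the user model, and the numeric preference scores are all assumptions made for illustration.

```python
# A small sketch of the four-criterion noun choice described above; the
# lexicon, the user model, and the preference scores are invented for illustration.

LEXICON = {
    # lexical unit -> (properties it truthfully asserts, preference score)
    "animal":    ({"animate"},                     1),
    "dog":       ({"animate", "canine"},           3),   # basic-level term: preferred
    "chihuahua": ({"animate", "canine", "tiny"},   2),
}

def choose_noun(object_properties, must_convey, known_to_user):
    best = None
    for unit, (asserted, preference) in LEXICON.items():
        if unit not in known_to_user:             # (i) known to the user
            continue
        if not asserted <= object_properties:     # (ii) truthful of the object
            continue
        if not must_convey <= asserted:           # (iii) conveys enough for the communicative goal
            continue
        if best is None or preference > best[1]:  # (iv) maximal under the preference function
            best = (unit, preference)
    return best[0] if best else None

print(choose_noun({"animate", "canine", "tiny"}, {"canine"}, {"animal", "dog", "chihuahua"}))
# 'dog'
```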

6.
This paper addresses the issue of how natural language generation technology can contribute to less intrusive wearable devices. Based on the investigation of how humans adapt the form of their utterances to the context of their hearer, we propose a strategy to relate (physical) context to the automated generation of natural language utterances. First we emphasise that different dimensions of context need to be taken into account and illustrate this with examples of lexical choice. Then we elaborate a strategy for determining sentence structure and prosody annotation based on the context relating to focus of attention. Our approach sets up an experimental basis in the context of an advice-giving wearable device (parrot).

7.
In this paper we present a constraint satisfaction approach to the pragmatic control of the language generation process in dialogue systems. After a brief problem statement providing motivational background for a constraint satisfaction approach to context-sensitive language generation, a theoretical framework designed and implemented in part by the authors (in the context of several research and development projects) is argued for in a concise manner. In particular, the way in which constraints influence the generation of context-sensitive, relevant utterances in simultaneous interaction with several human speakers is considered in detail, providing a computational framework for handling "pragmatic" aspects such as the degree of illocutionary force, semantic ellipsis, or the lexical selection of concessive connectors. Finally, the limiting cases of our approach are pointed out, showing to what extent these limits can be extended by acting on the constraints.
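A very small generate-and-test sketch of constraint-based pragmatic filtering is shown below; the decision variables, context features, and constraints are invented for illustration and do not reproduce the authors' framework.

```python
# Rough sketch of constraint-based pragmatic filtering of candidate
# generation decisions; the variables, context features, and constraints
# are invented for illustration and are not the system described above.

from itertools import product

# Decision variables of the generator and their possible values.
VARIABLES = {
    "force":     ["command", "suggestion"],     # degree of illocutionary force
    "ellipsis":  [True, False],                 # elide material given in context?
    "connector": ["although", "even though"],   # concessive connector choice
}

def constraints(choice, context):
    """Yield one boolean per pragmatic constraint (all must hold)."""
    yield not (context["hearer_is_superior"] and choice["force"] == "command")
    yield not (choice["ellipsis"] and not context["antecedent_in_dialogue"])
    yield not (context["register"] == "formal" and choice["connector"] == "even though")

def admissible(context):
    names = list(VARIABLES)
    for values in product(*(VARIABLES[n] for n in names)):
        choice = dict(zip(names, values))
        if all(constraints(choice, context)):
            yield choice

ctx = {"hearer_is_superior": True, "antecedent_in_dialogue": True, "register": "formal"}
for c in admissible(ctx):
    print(c)
```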

8.
A Semantics-Based Keyword Extraction Algorithm   (total citations: 3, self-citations: 1, citations by others: 2)
Keywords provide a summary of a document's content and are used in many data mining applications. In current keyword extraction algorithms, we find that the gap between the lexical level (words that represent meanings) and the conceptual level (the meanings themselves) leads to inaccurate keyword extraction: different words may share the same meaning, while the same word may carry different meanings in different contexts. To address this problem, this paper proposes using word senses instead of words and improving keyword extraction performance by taking the semantic information of candidate keywords into account. Unlike existing keyword extraction methods, our method first applies a disambiguation algorithm to obtain the senses of candidate words from their context; the semantic relatedness between candidate senses is then used in the subsequent steps of word merging, feature extraction, and evaluation to improve performance. For evaluation, we adopt a more effective semantics-based evaluation method and compare against the well-known Kea system. Cross-domain experiments show that taking semantic information into account greatly improves the performance of keyword extraction; in same-domain experiments, our algorithm performs comparably to Kea. Our algorithm is not restricted to a particular domain and therefore has better application prospects.
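The core idea of scoring candidates by the semantic relatedness of their senses can be sketched as follows; NLTK's Lesk disambiguator and WordNet path similarity are used here only as stand-in components, since the abstract does not prescribe particular tools (the WordNet data must be downloaded first via nltk.download).

```python
# A rough sketch of scoring candidate keywords by the semantic relatedness
# of their senses; Lesk WSD and WordNet path similarity are stand-ins for
# the disambiguation and relatedness components, not the paper's own tools.
# Requires the WordNet data: nltk.download('wordnet')

from nltk.wsd import lesk

def sense_scores(tokens, candidates):
    """Score each candidate by how related its sense is to the other candidates' senses."""
    senses = {w: lesk(tokens, w) for w in candidates}          # context-based WSD
    scores = {}
    for w, s in senses.items():
        if s is None:
            scores[w] = 0.0
            continue
        others = [t for v, t in senses.items() if v != w and t is not None]
        scores[w] = sum((s.path_similarity(t) or 0.0) for t in others)
    return scores

tokens = "the bank raised interest rates to curb inflation".split()
print(sense_scores(tokens, ["bank", "interest", "inflation"]))
```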

9.
10.
We build a generative probabilistic model for Mongolian morphological analysis. The model describes the morphological analysis of a Mongolian sentence as a directed graph, in which nodes represent the stems, suffixes, and their corresponding tags in the analysis, and edges represent transition or generation relations between nodes. In particular, this work models stem-to-stem transition probabilities, suffix-to-suffix transition probabilities, stem-to-suffix generation probabilities, the corresponding three kinds of transition or generation probabilities between tags, and the mutual generation probabilities between stems or suffixes and their tags. Trained on a manually annotated three-level corpus of 200,000 words developed at Inner Mongolia University, the model achieves a word-level segmentation accuracy of 95.1% and a joint word-level segmentation and tagging accuracy of 93%.
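One plausible way to combine the probability tables named in the abstract when scoring a candidate analysis is sketched below; the exact factorization, smoothing, and decoding used by the authors are not given, so the forms and numbers are placeholders.

```python
# Illustrative scoring of one candidate morphological analysis using the
# probability tables named in the abstract. The factorization and the
# floor smoothing are assumptions, not the authors' exact model.

import math

def logp(table, key, floor=1e-6):
    """Probability lookup with a crude floor for unseen events (illustrative only)."""
    return math.log(table.get(key, floor))

def score_analysis(units, tables):
    """
    Score one candidate analysis of a sentence.
    units:  sequence of (form, kind, tag) with kind in {"stem", "suffix"}.
    tables: probability tables keyed by "unit-tag", "tag-tag",
            "stem-stem", "stem-suffix", "suffix-suffix".
    """
    score = 0.0
    for i, (form, kind, tag) in enumerate(units):
        score += logp(tables["unit-tag"], (form, tag))        # stem/suffix <-> tag generation
        if i == 0:
            continue
        prev_form, prev_kind, prev_tag = units[i - 1]
        score += logp(tables["tag-tag"], (prev_tag, tag))      # tag-to-tag transition
        pair_table = tables.get(f"{prev_kind}-{kind}")         # unit-level table, if modelled
        if pair_table is not None:
            score += logp(pair_table, (prev_form, form))
    return score

tables = {
    "unit-tag":    {("stemA", "N"): 0.9, ("sufB", "PL"): 0.8},
    "tag-tag":     {("N", "PL"): 0.7},
    "stem-suffix": {("stemA", "sufB"): 0.6},
}
print(score_analysis([("stemA", "stem", "N"), ("sufB", "suffix", "PL")], tables))
```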

11.
We introduce a dual-use methodology for automating the maintenance and growth of two types of knowledge sources, which are crucial for natural language text understanding—background knowledge of the underlying domain and linguistic knowledge about the lexicon and the grammar of the underlying natural language. A particularity of this approach is that learning occurs simultaneously with the on-going text understanding process. The knowledge assimilation process is centered around the linguistic and conceptual 'quality' of various forms of evidence underlying the generation, assessment and on-going refinement of lexical and concept hypotheses. On the basis of the strength of evidence, hypotheses are ranked according to qualitative plausibility criteria, and the most reasonable ones are selected for assimilation into the already given lexical class hierarchy and domain ontology.

12.
This report describes the current state of our central research thrust in the area of natural language generation. We have already reported on our text-level theory of lexical selection in natural language generation ([59, 60]), on a unification-based syntactic processor for syntactic generation ([73]) and designed a relatively flexible blackboard-oriented architecture for integrating these and other types of processing activities in generation ([60]). We have implemented these ideas in our prototype generator, Diogenes — a DIstributed, Opportunistic GENEration System — and tested our lexical selection and syntactic generation modules in a comprehensive natural language processing project — the KBMT-89 machine translation system ([15]). At this stage we are developing a more comprehensive Diogenes system, concentrating on both the theoretical and the system-building aspects of a) formulating a more comprehensive theory of distributed natural language generation; b) extending current theories of text organization as they pertain to the task of planning natural language texts; c) improving and extending the knowledge representation and the actual body of background knowledge (both domain and discourse/pragmatic) required for comprehensive text planning; d) designing and implementing algorithms for dynamic realization of text structure and integrating them into the blackboard style of communication and control; e) designing and implementing control algorithms for distributed text planning and realization. In this document we describe our ideas concerning opportunistic control for a natural language generation planner and present a research and development plan for the Diogenes project. Many people have contributed to the design and development of the Diogenes generation system over the last four years, especially Eric Nyberg, Rita McCardell, Donna Gates, Christine Defrise, John Leavitt, Scott Huffman, Ed Kenschaft and Philip Werner. Eric Nyberg and Masaru Tomita have created genkit, which is used as the syntactic component of Diogenes. A short version of this article appeared in Proceedings of IJCAI-89, co-authored with Victor Lesser and Eric Nyberg. To all the above many thanks. The remaining errors are the responsibility of this author.

13.
It is implicitly assumed that data obtained from different modalities in response time research are comparable. However, this assumption has not been tested and verified, and scholars do not really know whether their choice has any effect on data, and consequently, whether they have lost experimental control. This research compares three modes (key-press, voice and mouse) in three of the most commonly used low conflict tasks (simple reaction, lexical decision and semantic categorization) to confirm the above assumption. To gain more precision, linguistic and semantic gradients have been tested. Results show that there are no functional differences in the simple reaction task. In the lexical decision task a frequency effect for all modes has been found. Specific S–R mapping rules do not need to be reversed when using a button-press or a mouse mode, but they have effects on the voice mode. In the semantic categorization task, a gradient and a frequency effect have been found in all modes. However, word frequency can affect the data. It is recommended to reverse S–R mapping rules in the voice mode in order to avoid differences with the manual modes. In conclusion, differences in low conflict tasks exist and must be taken into consideration when comparing studies in which different devices have been used.

14.
This work studies methods for extracting part machining features from neutral files based on the STEP AP203 standard and for building machining processes from those features. It analyzes STEP AP203 files and their data structure, performs lexical analysis using VB.NET, and, based on the characteristics of electrode machining features, develops a method for extracting and recognizing simple machining features from STEP files. According to the different machining methods of parts, the "macro" facility of PowerMILL is used to store proven electrode programming resources and build a reusable machining process database; finally, a mapping between machining processes and machining features is established. The adopted feature recognition method can recognize the common machining features of mold electrodes and, combined with the reusable machining processes, enables seamless integration between the feature recognition system and the CAM system.

15.
The Centre for Irish and Celtic Studies at the University of Ulster is currently producing a digital dictionary of medieval Irish (eDIL) based on the standard Dictionary of the Irish Language published by the Royal Irish Academy, Dublin. This paper addresses some of the problems encountered in the digitization process, including data capture, processing non-standard characters, modifications to the TEI guidelines, automatic generation of tags, and the establishment of a lexical view while preserving the original format of the paper dictionary.

16.
Generation     
The structure and function of the target-language generation module for KBMT-89 is described. The lexical selection module (which includes thematic-role subcategorization, a meaning distance metric, and syntactic subcategorization) is presented. We also describe the generation mapping rules, and rule interpretation in the generation of f-structures for target language utterances.
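A hypothetical sketch of lexical selection combining the three knowledge sources mentioned above (thematic-role subcategorization, a meaning distance metric, and syntactic subcategorization) might look as follows; the candidate entries and the toy distance function are illustrative assumptions, not KBMT-89's actual lexicon or metric.

```python
# Hypothetical sketch of lexical selection that checks thematic-role
# subcategorization and minimizes a meaning distance metric; entries,
# frames, and the distance function are invented for illustration.

CANDIDATES = [
    # (lexeme, concept, required thematic roles, syntactic frame)
    ("give",   "*transfer-possession", {"agent", "theme", "recipient"}, "NP V NP NP"),
    ("donate", "*transfer-charity",    {"agent", "theme", "recipient"}, "NP V NP PP-to"),
    ("hand",   "*transfer-by-hand",    {"agent", "theme", "recipient"}, "NP V NP PP-to"),
]

def meaning_distance(concept_a, concept_b):
    """Toy metric: 0 for identical concepts, 1 otherwise; a real system
    would measure distance in a concept hierarchy."""
    return 0 if concept_a == concept_b else 1

def select_lexeme(input_concept, input_roles):
    viable = [c for c in CANDIDATES if c[2] <= input_roles]      # roles must be fillable
    return min(viable, key=lambda c: meaning_distance(c[1], input_concept), default=None)

print(select_lexeme("*transfer-possession", {"agent", "theme", "recipient"}))
# ('give', '*transfer-possession', {...}, 'NP V NP NP')
```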

17.
18.
Design of a Word Selection Model for English Generation in a Chinese-English Translation System   (total citations: 1, self-citations: 1, citations by others: 0)
This paper describes the design of an English word selection model based on example comparison, supplemented by semantic pattern matching. We first discuss the importance of word selection in English generation for a Chinese-English translation system, then compare several possible word selection strategies and propose our own word selection model, and then describe in some detail the structure of the generation dictionary and the word selection algorithm. We also briefly introduce the semantic knowledge resource we use, HowNet (知网).
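The example-comparison idea can be sketched as follows: choose the English verb of the stored example whose semantic context is most similar to the input. The example base, the semantic features, and the Jaccard similarity are illustrative stand-ins for the HowNet-based resources and matching used in the paper.

```python
# Sketch of example-based word selection: pick the English verb of the
# stored example whose semantic context is closest to the input. The
# examples, features, and similarity measure are invented; a real system
# would draw the semantic features from HowNet, as the paper does.

EXAMPLES = [
    # (source verb, semantic features of its object, chosen English verb)
    ("打", {"human"},                 "hit"),
    ("打", {"tool", "communication"}, "make"),   # 打电话 -> make a phone call
    ("打", {"artifact", "game"},      "play"),   # 打篮球 -> play basketball
]

def similarity(features_a, features_b):
    """Jaccard overlap as a stand-in for a HowNet-based semantic distance."""
    if not features_a and not features_b:
        return 0.0
    return len(features_a & features_b) / len(features_a | features_b)

def choose_translation(source_verb, object_features):
    candidates = [e for e in EXAMPLES if e[0] == source_verb]
    best = max(candidates, key=lambda e: similarity(e[1], object_features), default=None)
    return best[2] if best else None

print(choose_translation("打", {"tool", "communication"}))   # 'make'
```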

19.
This paper deals with verb-verb morphological disambiguation of two different verbs that have the same inflected form. Verb-verb morphological ambiguity (VVMA) is one of the critical issues in Korean part-of-speech (POS) tagging. The recognition of the verb base forms related to ambiguous words depends heavily on the lexical information in their surrounding contexts and the domains in which they occur. However, current probabilistic morpheme-based POS tagging systems cannot handle VVMA adequately, since most of them are limited in how broad a word-level context they can reflect, and they are trained on too small an amount of labeled training data to represent the lexical information required for VVMA disambiguation. In this study, we suggest a classifier based on a large pool of raw text that contains sufficient lexical information to handle VVMA. The underlying idea is that we automatically generate an annotated training set applicable to ambiguity problems such as VVMA resolution from unlabeled unambiguous instances that belong to the same class. This makes it possible to label ambiguous instances with knowledge induced from unambiguous instances. Since unambiguous instances have only one label, their annotated corpus can be generated automatically from unlabeled data. In our problem, since not all conjugations of irregular verbs lead to the spelling changes that cause VVMA, training data for VVMA disambiguation are generated from the unambiguous conjugations of each possible verb base form of the ambiguous words. This approach requires neither an additional annotation step for an initial training set nor a procedure for selecting good seeds with which to iteratively augment the labeled set, both of which are important issues in bootstrapping methods that use unlabeled data. This is a strength compared with previous related work using unlabeled data. Furthermore, plenty of confident seeds that are unambiguous and give enough coverage for the learning process are assured as well. We also suggest a strategy for incrementally extending the context information with web counts, applied only to selected test examples that are difficult to predict with the current classifier or that differ greatly from the pre-trained data set. As a result, automatic data generation and knowledge acquisition from unlabeled text for VVMA resolution improved the overall tagging accuracy (token level) by 0.04%. In practice, 9-10% of verb-related tagging errors are fixed by the VVMA resolution, whose accuracy was about 98% using a Naïve Bayes classifier coupled with selective web counts.
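The core of the approach, training a classifier on automatically labeled contexts of unambiguous forms and applying it to the ambiguous form, can be sketched as below. The toy data is in English for readability (the real task concerns Korean verb conjugations), and scikit-learn's Naive Bayes is used as a stand-in implementation.

```python
# Sketch of the core idea: train a Naive Bayes classifier on contexts of
# unambiguous conjugations of each candidate base form, then use it to
# resolve the ambiguous form. English toy data is used for readability;
# the actual task involves Korean verb conjugations.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Automatically labeled training contexts: each context contains an
# unambiguous inflected form, so its base-form label is known for free.
contexts = [
    "he lay on the bed all afternoon",            # 'lay': unambiguous past of lie (recline)
    "she had lain down on the sofa for a nap",    # 'lain': unambiguous participle of lie (recline)
    "they lied about their age on the form",      # 'lied': unambiguous past of lie (speak falsely)
    "he lied to the police about the accident",
]
labels = ["lie-recline", "lie-recline", "lie-falsehood", "lie-falsehood"]

vectorizer = CountVectorizer()
classifier = MultinomialNB().fit(vectorizer.fit_transform(contexts), labels)

# Resolve ambiguous occurrences of 'lie' from their surrounding words.
tests = ["she wanted to lie down on the sofa",
         "you should not lie to the police about it"]
print(classifier.predict(vectorizer.transform(tests)))
# ['lie-recline' 'lie-falsehood']
```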

20.
Translating between dissimilar languages requires an account of the use of divergent word orders when expressing the same semantic content. Reordering poses a serious problem for statistical machine translation systems and has generated a considerable body of research aimed at meeting its challenges. Direct evaluation of reordering requires automatic metrics that explicitly measure the quality of word order choices in translations. Current metrics, such as BLEU, only evaluate reordering indirectly. We analyse the ability of current metrics to capture reordering performance. We then introduce permutation distance metrics as a direct method for measuring word order similarity between translations and reference sentences. By correlating all metrics with a novel method for eliciting human judgements of reordering quality, we show that current metrics are largely influenced by lexical choice, and that they are not able to distinguish between different reordering scenarios. Also, we show that permutation distance metrics correlate very well with human judgements, and are impervious to lexical differences.
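A permutation distance metric of the kind discussed can be computed, for example, as a normalized Kendall tau distance over the word-order permutation; the formulation below is generic and not necessarily the exact metric evaluated in the paper.

```python
# Minimal sketch of a permutation distance metric: normalized Kendall tau
# similarity between the word order of a translation and that of the
# reference, given as a permutation. A generic formulation, not
# necessarily the paper's exact metric.

def kendall_tau_similarity(permutation):
    """
    permutation[i] = position in the reference of the i-th word of the
    hypothesis. Returns 1.0 for identical order and 0.0 for fully
    inverted order.
    """
    n = len(permutation)
    if n < 2:
        return 1.0
    discordant = sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if permutation[i] > permutation[j]
    )
    return 1.0 - discordant / (n * (n - 1) / 2)

print(kendall_tau_similarity([0, 1, 2, 3]))   # 1.0   (same order as reference)
print(kendall_tau_similarity([3, 2, 1, 0]))   # 0.0   (completely reversed)
print(kendall_tau_similarity([0, 2, 1, 3]))   # ~0.83 (one adjacent swap)
```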
