Similar Documents
20 similar documents retrieved.
1.
This paper addresses the problem of automatic acquisition of lexical knowledge for rapid construction of engines for machine translation and embedded multilingual applications. We describe new techniques for large-scale construction of a Chinese–English verb lexicon and we evaluate the coverage and effectiveness of the resulting lexicon. Leveraging an existing Chinese conceptual database called HowNet and a large, semantically rich English verb database, we use thematic-role information to create links between Chinese concepts and English classes. We apply the metrics of recall and precision to evaluate the coverage and effectiveness of the linguistic resources. The results of this work indicate that: (a) we are able to obtain reliable Chinese–English entries both with and without pre-existing semantic links between the two languages; (b) if we have pre-existing semantic links, we are able to produce a more robust lexical resource by merging these with our semantically rich English database; (c) in our comparisons with manual lexicon creation, our automatic techniques achieved 62% precision, compared to a much lower precision of 10% for arbitrary assignment of semantic links. This revised version was published online in November 2006 with corrections to the Cover Date.
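The recall/precision evaluation described in this abstract can be sketched as a set comparison between automatically proposed concept-to-class links and a manually built gold standard. The link pairs and class names below are invented for illustration, not taken from HowNet or the actual lexicon:

```python
def precision_recall(proposed, gold):
    """Compare proposed concept-class links against a manual gold standard."""
    proposed, gold = set(proposed), set(gold)
    correct = proposed & gold  # links present in both sets
    precision = len(correct) / len(proposed) if proposed else 0.0
    recall = len(correct) / len(gold) if gold else 0.0
    return precision, recall

# Hypothetical (Chinese concept, English verb class) links
auto_links = {("buy", "get-13.5.1"), ("sell", "give-13.1"),
              ("run", "run-51.3.2"), ("wish", "wish-62")}
gold_links = {("buy", "get-13.5.1"), ("sell", "give-13.1"),
              ("run", "run-51.3.2"), ("say", "say-37.7")}

p, r = precision_recall(auto_links, gold_links)
print(f"precision={p:.2f} recall={r:.2f}")
```

Here three of the four proposed links match the gold standard, giving precision and recall of 0.75 each.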

2.
This paper presents a methodology for evaluating Arabic Machine Translation (MT) systems. We are specifically interested in evaluating lexical coverage, grammatical coverage, semantic correctness and pronoun resolution correctness. The methodology presented is statistical and is based on earlier work on evaluating MT lexicons, in which the importance of a specific word sense to a given application domain determines how its presence in, or absence from, the lexicon affects the MT system's lexical quality, which in turn affects the overall system output quality. The same idea is used in this paper and generalized so as to apply to grammatical coverage, semantic correctness and correctness of pronoun resolution. The approach adopted in this paper has been implemented and applied to evaluating four English–Arabic commercial MT systems. The results of the evaluation of these systems are presented for the domain of the Internet and Arabization.

3.
The lexicon is a major part of any Machine Translation (MT) system. If the lexicon of an MT system is not adequate, this will affect the quality of the whole system. Building a comprehensive lexicon, i.e., one with a high lexical coverage, is a major activity in the process of developing a good MT system. As such, the evaluation of the lexicon of an MT system is clearly a pivotal issue for the process of evaluating MT systems. In this paper, we introduce a new methodology that was devised to enable developers and users of MT systems to evaluate their lexicons semi-automatically. This new methodology is based on the idea of the importance of a specific word or, more precisely, word sense, to a given application domain. This importance, or weight, determines how the presence of such a word in, or its absence from, the lexicon affects the MT system's lexical quality, which in turn will naturally affect the overall output quality. The method, which adopts a black-box approach to evaluation, was implemented and applied to evaluating the lexicons of three commercial English–Arabic MT systems. A specific domain was chosen in which the various word-sense weights were determined by feeding sample texts from the domain into a system developed specifically for that purpose. Once this database of word senses and weights was built, test suites were presented to each of the MT systems under evaluation and their output rated by a human operator as either correct or incorrect. Based on this rating, an overall automated evaluation of the lexicons of the systems was deduced.
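The weighted scoring idea in this abstract can be sketched as follows: each domain word sense carries a weight, and it contributes that weight to the lexical-quality score only if the human rater judged the system's handling of it correct. The sense names, weights, and ratings below are hypothetical, not drawn from the paper's actual Internet-domain data:

```python
def lexical_quality(sense_weights, rated_correct):
    """Weighted lexical-quality score: each sense contributes its domain weight
    only when the human rater marked the system's output for it as correct."""
    total = sum(sense_weights.values())
    earned = sum(w for sense, w in sense_weights.items()
                 if rated_correct.get(sense, False))
    return earned / total if total else 0.0

# Hypothetical sense weights derived from domain sample texts
weights = {"browse/verb": 5.0, "download/verb": 4.0,
           "surf/verb": 3.0, "post/verb": 2.0}
# Hypothetical human correct/incorrect ratings of system output
ratings = {"browse/verb": True, "download/verb": True,
           "surf/verb": False, "post/verb": True}

score = lexical_quality(weights, ratings)
print(f"lexical quality = {score:.3f}")
```

A missed high-weight sense hurts the score far more than a missed low-weight one, which is the point of domain-sensitive weighting.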

4.
This paper describes techniques for automatic construction of dictionaries for use in large-scale foreign language tutoring (FLT) and interlingual machine translation (MT) systems. The dictionaries are based on a language-independent representation called lexical conceptual structure (LCS). A primary goal of the LCS research is to demonstrate that synonymous verb senses share distributional patterns. We show how the syntax–semantics relation can be used to develop a lexical acquisition approach that contributes both toward the enrichment of existing online resources and toward the development of lexicons containing more complete information than is provided in any of these resources alone. We start by describing the structure of the LCS and showing how this representation is used in FLT and MT. We then focus on the problem of building LCS dictionaries for large-scale FLT and MT. First, we describe authoring tools for manual and semi-automatic construction of LCS dictionaries; we then present a more sophisticated approach that uses linguistic techniques for building word definitions automatically. These techniques have been implemented as part of a set of lexicon-development tools used in the MILT FLT project.

5.
Multilingual generation in machine translation (MT) requires a knowledge organization that facilitates the task of lexical choice, i.e. selection of lexical units to be used in the generation of a target-language sentence. This paper investigates the extent to which lexicalization patterns involving the lexical aspect feature [+telic] may be used for translating events and states among languages. Telicity has been correlated syntactically with both transitivity and unaccusativity, and semantically with Talmy's path of a motion event, the representation of which characterizes languages parametrically. Taking as our starting point the syntactic/semantic classification in Levin's English Verb Classes and Alternations, we examine the relation between telicity and the syntactic contexts, or alternations, outlined in this work, identifying systematic relations between the lexical aspect features and the semantic components that potentiate these alternations. Representing lexical aspect, particularly telicity, is therefore crucial for the tasks of lexical choice and syntactic realization. Having enriched the data in Levin (by correlating the syntactic alternations (Part I) and semantic verb classes (Part II) and marking them for telicity), we assign to verbs lexical semantic templates (LSTs). We then demonstrate that it is possible from these templates to build a large-scale repository for lexical conceptual structures which encode meaning components that correspond to different values of the telicity feature. The LST framework preserves both semantic content and semantic structure (following Grimshaw) during the processes of lexical choice and syntactic realization.
Application of this model identifies precisely where the Knowledge Representation component may profitably augment our rules of composition, to identify cases where the interlingua underlying the source-language sentence must be either reduced or modified in order to produce an appropriate target-language sentence.

6.
In this paper it is assumed that syntactic structure is projected from the lexicon. The lexical representation, which encodes the linguistically relevant aspects of the meanings of words, thus determines and constrains the syntax. Therefore, if semantic analysis of syntactic structures is to be possible, it is necessary to determine the content and structure of lexical semantic representations. The paper argues for a certain form of lexical representation by presenting the problem of a particular non-standard structure, the verb phrase of the form V-NP-Adj corresponding to various constructions of secondary predication in English. It is demonstrated that the solution to the semantic analysis of this structure lies in the meaning of the structure's predicators, in particular the lexical semantic representation of the verb. Verbs are classified according to the configuration of their lexical semantic representations, whether basic or derived. It is these specific configurations that restrict the possibilities of secondary predication. Given the class of a verb, its relation to the secondary predicate is predictable; and the correct interpretation of the V-NP-Adj string is therefore possible.
This work is based on papers presented to the 1988 meetings of the Canadian Linguistic Association and the Brandeis Workshop on Theoretical and Computational Issues in Lexical Semantics. I am grateful to the audiences at these two meetings for comments, and to Anna-Maria di Sciullo, Diane Massam, Yves Roberge and James Pustejovsky for helpful discussion. I also thank SSHRC for funding the research of which this work forms part.

7.
A Semantic Network of English: The Mother of All WordNets
We give a brief outline of the design and contents of the English lexical database WordNet, which serves as a model for similarly conceived wordnets in several European languages. WordNet is a semantic network, in which the meanings of nouns, verbs, adjectives, and adverbs are represented in terms of their links to other (groups of) words via conceptual-semantic and lexical relations. Each part of speech is treated differently, reflecting different semantic properties. We briefly discuss polysemy in WordNet, and focus on the case of meaning extensions in the verb lexicon. Finally, we outline the potential uses of WordNet not only for applications in natural language processing, but also for research in stylistic analyses in conjunction with a semantic concordance.
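The network structure this abstract describes can be sketched minimally as synsets linked by a hypernymy ("is-a") relation, with meaning read off a node's links rather than stored in the node itself. The synset names and links below are illustrative only, not taken from WordNet:

```python
# Toy hypernymy relation: each synset points to its more general synset.
hypernym = {
    "dog.n.01": "canine.n.01",
    "canine.n.01": "carnivore.n.01",
    "carnivore.n.01": "animal.n.01",
}

def hypernym_chain(synset):
    """Follow hypernym links from a synset up to the most general concept."""
    chain = [synset]
    while chain[-1] in hypernym:
        chain.append(hypernym[chain[-1]])
    return chain

print(hypernym_chain("dog.n.01"))
```

In a real wordnet each synset also groups its member words and carries other relations (antonymy, meronymy, etc.); the chain above shows only the conceptual-semantic backbone.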

8.
Wordnets, which are repositories of lexical semantic knowledge containing semantically linked synsets and lexically linked words, are indispensable for work on computational linguistics and natural language processing. While building wordnets for Hindi and Marathi, two major Indo-European languages, we observed that the verb hierarchy in the Princeton WordNet was rather shallow. We set about constructing a verb knowledge base for Hindi, which arranges the Hindi verbs in a hierarchy of is-a (hypernymy) relations. We realized that there are unique Indian language phenomena that bear upon the choice between lexicalized and syntactically derived forms. One such example is the occurrence of conjunct and compound verbs (called complex predicates), which are found in all Indian languages. This paper presents our experience in the construction of lexical knowledge bases for Indian languages, with special attention to Hindi. The question of storing versus deriving complex predicates has been dealt with linguistically and computationally. We have constructed empirical tests to decide if a combination of two words, the second of which is a verb, is a complex predicate or not. Such tests provide a principled way of deciding the status of complex predicates in Indian language wordnets.

9.
This paper describes the lexical-semantic basis for UNITRAN, an implemented scheme for translating Spanish, English, and German bidirectionally. Two claims made here are that the current representation handles many distinctions (or divergences) across languages without recourse to language-specific rules and that the lexical-semantic framework provides the basis for a systematic mapping between the interlingua and the syntactic structure. The representation adopted is an extended version of lexical conceptual structure which is suitable to the task of translating between divergent structures for two reasons: (1) it provides an abstraction of language-independent properties from structural idiosyncrasies; and (2) it is compositional in nature. The lexical-semantic approach addresses the divergence problem by using a linguistically grounded mapping that has access to parameter settings in the lexicon. We will examine a number of relevant issues including the problem of defining primitives, the issue of interlinguality, the cross-linguistic coverage of the system, and the mapping between the syntactic structure and the interlingua. A detailed example of lexical-semantic composition will be presented.

10.
This paper presents a lexical model dedicated to the semantic representation and interpretation of individual words in unrestricted text, where sense discrimination is difficult to assess. We discuss the need for a lexicon including local inference mechanisms and cooperating with as many other knowledge sources (about syntax, semantics and pragmatics) as possible. We suggest a minimal representation (that is, the smallest representation possible) acting as a bridge between a conceptual representation and the microscopic sense variations of lexical semantics. We describe an interpretation method providing one or many alternative candidate(s) for the word, as representatives of its meaning in the sentence (and text).

11.
This paper describes and discusses some theoretical and practical problems arising from developing a system to combine the structured but incomplete information from machine readable dictionaries (MRDs) with the unstructured but more complete information available in corpora for the creation of a bilingual lexical data base, presenting a methodology to integrate information from both sources into a single lexical data structure. The BICORD system (BIlingual CORpus-enhanced Dictionaries) involves linking entries in the Collins English–French and French–English bilingual dictionary with a large English–French and French–English bilingual corpus. We have concentrated on the class of action verbs of movement, building on earlier work on lexical correspondences specific to this verb class between languages (Klavans and Tzoukermann, 1989, 1990a, 1990b). We first examine the way prototypical verbs of movement are translated in the Collins-Robert (Atkins, Duval, and Milne, 1978) bilingual dictionary, and then analyze the behavior of some of these verbs in a large bilingual corpus. We incorporate the results of linguistic research on the theory of verb types to motivate corpus analysis coupled with data from MRDs for the purpose of establishing lexical correspondences with the full range of associated translations, and with statistical data attached to the relevant nodes.

12.
Concrete concepts are often easier to understand than abstract concepts. The notion of abstractness is thus closely tied to the organisation of our semantic memory, and more specifically our internal lexicon, which underlies our word sense disambiguation (WSD) mechanisms. State-of-the-art automatic WSD systems often draw on a variety of contextual cues and assign word senses by an optimal combination of statistical classifiers. The validity of various lexico-semantic resources as models of our internal lexicon and the cognitive aspects pertinent to the lexical sensitivity of WSD are seldom questioned. We attempt to address these issues by examining psychological evidence of the internal lexicon and its compatibility with the information available from computational lexicons. In particular, we compare the responses from a word association task against existing lexical resources, WordNet and SUMO, to explore the relation between sense abstractness and semantic activation, and thus the implications on semantic network models and the lexical sensitivity of WSD. Our results suggest that concrete senses are more readily activated than abstract senses, and broad associations are more easily triggered than narrow paradigmatic associations. The results are expected to inform the construction of lexico-semantic resources and WSD strategies.

13.
We propose a lexical account of event nouns, in particular of deverbal nominalisations, whose meaning is related to the event expressed by their base verb. The literature on nominalisations often assumes that the semantics of the base verb completely defines the structure of action nominals. We argue that the information in the base verb is not sufficient to completely determine the semantics of action nominals. We exhibit some data from different languages, especially from Romance languages, which show that nominalisations focus on some aspects of the verb semantics. The selected aspects, however, seem to be idiosyncratic and do not automatically result from the internal structure of the verb or from its interaction with the morphological suffix. We therefore propose a partially lexicalist view of deverbal nouns. It is made precise and computable by using the Montagovian generative lexicon, a type theoretical framework introduced by Bassac, Mery and Retoré in this journal in 2010. This extension of Montague semantics with a richer type system easily incorporates lexical phenomena like the semantics of action nominals, in particular deverbals, including their polysemy and (in)felicitous copredications.

14.
VerbNet—the most extensive online verb lexicon currently available for English—has proved useful in supporting a variety of NLP tasks. However, its exploitation in multilingual NLP has been limited by the fact that such classifications are available for few languages only. Since manual development of VerbNet is a major undertaking, researchers have recently translated VerbNet classes from English to other languages. However, no systematic investigation has been conducted into the applicability and accuracy of such a translation approach across different, typologically diverse languages. Our study is aimed at filling this gap. We develop a systematic method for translation of VerbNet classes from English to other languages which we first apply to Polish and subsequently to Croatian, Mandarin, Japanese, Italian, and Finnish. Our results on Polish demonstrate high translatability with all the classes (96% of English member verbs successfully translated into Polish) and strong inter-annotator agreement, revealing a promising degree of overlap in the resultant classifications. The results on other languages are equally promising. This demonstrates that VerbNet classes have strong cross-lingual potential and the proposed method could be applied to obtain gold standards for automatic verb classification in different languages. We make our annotation guidelines and the six language-specific verb classifications available with this paper.

15.
Drawing on existing research on verbs, this paper proposes principles and a design approach for developing a verb semantic dictionary, defines and describes the attribute information the dictionary covers, and documents the dictionary's overall file structure and the information held in each of its component databases. The result is an open verb semantic knowledge dictionary that integrates lexical semantics and syntactic semantics and covers multiple information parameters, including word form, part of speech, gloss, semantic class, semantic field, syntactic category information, semantic category information, and semantic sentence patterns. The dictionary can support work on ambiguity discrimination, the study of sense relations, the syntax–semantics interface, and sentence-pattern extraction.

16.
We describe a Chinese lexical semantic resource that consists of 11,765 predicates (mostly verbs and their nominalizations) analyzed with coarse-grained senses and semantic roles. We show that distinguishing senses at a coarse-grained level is a necessary part of specifying the semantic roles and describe our strategies for sense determination for purposes of predicate-argument structure specification. The semantic roles are postulated to account for syntactic variations, the different ways in which the semantic roles of a predicate are realized. The immediate purpose for this lexical semantic resource is to support the annotation of the Chinese PropBank, but we believe it can also serve as a stepping stone for higher-level semantic generalizations.
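The kind of entry this abstract describes, a predicate with coarse-grained senses, each carrying its own numbered semantic roles, can be sketched as a simple data structure. The lemma, frameset identifiers, glosses, and role labels below are invented for illustration and are not taken from the actual Chinese PropBank frame files:

```python
# Hypothetical predicate entry: coarse senses ("framesets") with numbered roles.
predicates = {
    "chu": [
        {"frameset": "chu.01", "gloss": "to exit",
         "roles": {"Arg0": "entity exiting", "Arg1": "place exited"}},
        {"frameset": "chu.02", "gloss": "to produce",
         "roles": {"Arg0": "producer", "Arg1": "product"}},
    ],
}

def roles_for(lemma, frameset):
    """Look up the semantic-role inventory of one coarse sense of a predicate."""
    for sense in predicates.get(lemma, []):
        if sense["frameset"] == frameset:
            return sense["roles"]
    return None
```

Keeping roles per frameset rather than per lemma is what lets annotation distinguish syntactic variations of the same coarse sense from genuinely different senses.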

17.
In this paper, we investigate the phenomenon of verb–particle constructions, discussing their characteristics and availability in some lexical resources. Given the limited coverage provided by these resources and the constantly growing number of verb–particle combinations, possible ways of extending their coverage are investigated, taking into account regular patterns found in some productive combinations of verbs and particles. We propose, in particular, the use of a semantic classification of verbs (such as that defined in Levin's English Verb Classes and Alternations: A Preliminary Investigation, The University of Chicago Press) as a means to obtain productive verb–particle constructions and the use of the World Wide Web to validate them, and discuss the issues involved in adopting such an approach.

18.
An algorithm for semantic interpretation that integrates the determination of the meaning of verbs, the attachment and meaning of prepositions, and the determination of thematic roles is presented. The parser does not resolve structural ambiguity, which is solely the task of the semantic interpreter. Lexical semantic information about nouns and verbs is applied to the resolution of verb polysemy and modifier attachment. Semantic interpretation is centered on the representation of the meaning of the verb, called the verbal concept. Verbal concepts are organized into a classification hierarchy. As long as the meaning of the verb remains unknown, parsing proceeds on a syntactic basis. Once the meaning of the verb is recognized, the semantic component makes sense of the syntactic relations built so far by the parser and of those still to be parsed. The algorithm has been implemented and tested on real-world texts.

19.
A large number of wording choices naturally occurring in English sentences cannot be accounted for on semantic or syntactic grounds. They represent arbitrary word usages and are termed collocations. In this paper, we show how collocations can enhance the task of lexical selection in language generation. Previous language generation systems were not able to account for collocations for two reasons: they did not have the lexical information in compiled form and the lexicon formalisms available were not able to handle the variations in collocational knowledge. We describe an implemented generator, Cook, which uses a wide range of collocations to produce sentences in the stock market domain. Cook uses a flexible lexicon containing a range of collocations, from idiomatic phrases to word pairs that were compiled automatically from text corpora using a lexicographic tool, Xtract. We show how Cook is able to merge collocations of various types to produce a wide variety of sentences.
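Collocation-aware lexical selection of the kind this abstract describes can be sketched as a lookup: given a noun, consult a compiled collocation table for verbs that conventionally combine with it, rather than choosing on semantic grounds alone. The table entries and word pairs below are invented stock-market-style examples, not taken from Cook or Xtract:

```python
# Hypothetical compiled collocation table: noun -> verbs it collocates with.
collocations = {
    "stock": ["advance", "plunge", "rally"],
    "index": ["climb", "slip"],
}

def choose_verb(noun, preferred=None):
    """Select a verb that collocates with the noun; honor a preferred verb
    only if the collocation table licenses it, else fall back to the first."""
    options = collocations.get(noun, [])
    if preferred in options:
        return preferred
    return options[0] if options else None
```

The point of the table is that "the stock plunged" is licensed while an equally grammatical but non-collocational pairing would be rejected, which is exactly the knowledge a purely syntactic or semantic lexicon cannot encode.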

20.
One may indicate the potentials of an MT system by stating what text genres it can process, e.g., weather reports and technical manuals. This approach is practical, but misleading, unless domain knowledge is highly integrated in the system. Another way to indicate which fragments of language the system can process is to state its grammatical potentials, or more formally, which languages the grammars of the system can generate. This approach is more technical and less understandable to the layman (customer), but it is less misleading, since it stresses the point that the fragments which can be translated by the grammars of a system need not necessarily coincide exactly with any particular genre. Generally, the syntactic and lexical rules of an MT system allow it to translate many sentences other than those belonging to a certain genre. On the other hand it probably cannot translate all the sentences of a particular genre. Swetra is a multilanguage MT system defined by the potentials of a formal grammar (standard referent grammar) and not by reference to a genre. Successful translation of sentences can be guaranteed if they are within a specified syntactic format based on a specified lexicon. The paper discusses the consequences of this approach (Grammatically Restricted Machine Translation, GRMT) and describes the limits set by a standard choice of grammatical rules for sentences and clauses, noun phrases, verb phrases, sentence adverbials, etc. Such rules have been set up for English, Swedish and Russian, mainly on the basis of familiarity (frequency) and computer efficiency, but restricting the grammar and making it suitable for several languages poses many problems for optimization. Sample texts (newspaper reports) illustrate the type of text that can be translated with reasonable success among Russian, English and Swedish.
