Similar Documents
1.
Within a statistical learning framework, this paper studies how to use traditional linguistic findings on anaphora resolution as a guide for mining and organizing contextual features for Chinese co-reference resolution. The main contributions are as follows. (1) To simulate the "syntactic and semantic parallelism" factor, we extract "bag of word forms and POS tags" features and a "bag of semes" feature from the contexts of entity mentions and add them to the baseline feature set. (2) Because bags of word forms, POS tags, and semes are too coarse to determine the syntactic and semantic parallelism between two entity mentions, we propose a contextual feature reconstruction method based on semantic similarity computation, so that the reconstructed features better approximate the "syntactic and semantic parallelism preference" factor in anaphora resolution. (3) To approximately simulate the "selectional restriction" factor in anaphora resolution, we replace the isolated-word contextual feature representation with one based on entity mentions and also enlarge the context window. Experiments show that these multi-level contextual features are useful for co-reference resolution, and the statistical system that incorporates them performs well on the standard ACE datasets.
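As an illustration of the "bag of word forms and POS tags" idea in point (1), here is a minimal sketch that collects word forms and POS tags from a fixed window on each side of an entity mention and compares two mentions by set overlap. The `(word, tag)` token layout, the window size of 2, the example sentence, and the Jaccard comparison are all assumptions for illustration; the paper's actual feature set and its similarity-based reconstruction from point (2) are not reproduced here.

```python
from typing import List, Set, Tuple

# A token is (word_form, pos_tag); this layout is an assumption for the sketch.
Token = Tuple[str, str]

def context_bags(tokens: List[Token], start: int, end: int,
                 window: int = 2) -> Tuple[Set[str], Set[str]]:
    """Return (bag of word forms, bag of POS tags) taken from up to `window`
    tokens on each side of the mention span tokens[start:end]."""
    left = tokens[max(0, start - window):start]
    right = tokens[end:end + window]
    context = left + right
    words = {w for w, _ in context}
    tags = {t for _, t in context}
    return words, tags

def jaccard(a: Set[str], b: Set[str]) -> float:
    """Set overlap in [0, 1]; defined as 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Hypothetical example: mentions "Zhang" (index 1) and "he" (index 6).
sent = [("yesterday", "NT"), ("Zhang", "NR"), ("visited", "VV"), ("Beijing", "NR"),
        (".", "PU"), ("today", "NT"), ("he", "PN"), ("visited", "VV"), ("Shanghai", "NR")]
w1, t1 = context_bags(sent, 1, 2)
w2, t2 = context_bags(sent, 6, 7)
print(jaccard(w1, w2), jaccard(t1, t2))
```

Overlap of raw bags is exactly the kind of coarse signal that point (2) argues is insufficient: here the POS bags overlap strongly while the word bags share only "visited".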

2.
Story understanding is an important branch of natural language understanding research in AI. In the approach based on Story Parsing Grammar (SPG), SPG is used to represent the abstraction of a story at different levels, and story understanding is recast as recognizing the story with an SPG syntactic parser; this kind of story understanding is called story parsing. This paper first defines a subclass of SPG called Weak Precedence SPG (WPSPG), then studies a syntactic parsing algorithm for WPSPG, and finally gives an example of story parsing.

3.
Because P2P (peer-to-peer) systems are self-organizing and autonomous, social-control mechanisms such as trust and reputation are essential for evaluating the trustworthiness of participating peers and for combating selfish, dishonest, and malicious peer behavior. We therefore argue that P2P systems, which are gradually becoming an important information infrastructure, should be treated as a multidisciplinary research topic and should reflect certain features of our society. From an economic and social perspective, this paper designs an incentive-compatible reputation feedback scheme based on a well-known economic model and characterizes the social features of the trust network in terms of efficiency and cost. Specifically, our framework has two distinct purposes. First, at a high level, we argue that a trust system is a special kind of social network, and that an accurate characterization of the network's structural properties is of fundamental importance for understanding the system's dynamics. Inspired by the concept of the weighted small world, we therefore propose new measures to characterize the social properties of a trust system, namely high global and local efficiency and low cost. Second, at a relatively low level, we argue that reputation feedback is a special kind of information and is not free. Based on an economic model, we propose a VCG (Vickrey-Clarke-Groves)-like reputation remuneration mechanism that motivates rational peers not only to provide reputation feedback but to provide it truthfully. Furthermore, since trust and reputation are subjective, we divide trust into functional trust and referral trust, and extend referral trust with two factors, similarity and truthfulness, which can effectively reduce trust inference errors. Preliminary simulation results show the benefits of our proposal and the emergence of certain social properties in the trust network.
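The efficiency measures above build on the weighted small-world notion of network efficiency. A minimal sketch of Latora and Marchiori's global efficiency on a weighted, undirected graph (the mean of 1/d_ij over all ordered node pairs, with unreachable pairs contributing 0) could look like this. Treating an edge's length as something like 1/trust is an assumption, since the abstract does not say how trust values map to distances; local efficiency and the paper's cost measure are not shown.

```python
import heapq
from typing import Dict, List, Tuple

# Undirected graph: node -> list of (neighbor, positive edge length).
# Deriving lengths from trust weights (e.g. 1/trust) is an assumption.
Graph = Dict[str, List[Tuple[str, float]]]

def undirected(edges: List[Tuple[str, str, float]]) -> Graph:
    """Build a symmetric adjacency list from (u, v, length) triples."""
    g: Graph = {}
    for u, v, w in edges:
        g.setdefault(u, []).append((v, w))
        g.setdefault(v, []).append((u, w))
    return g

def shortest_lengths(graph: Graph, source: str) -> Dict[str, float]:
    """Dijkstra's algorithm: shortest path length from source to each reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def global_efficiency(graph: Graph) -> float:
    """Mean of 1/d_ij over ordered pairs i != j; unreachable pairs add 0."""
    nodes = list(graph)
    n = len(nodes)
    if n < 2:
        return 0.0
    total = 0.0
    for s in nodes:
        for t, d in shortest_lengths(graph, s).items():
            if t != s:
                total += 1.0 / d
    return total / (n * (n - 1))

path = undirected([("a", "b", 1.0), ("b", "c", 1.0)])
print(global_efficiency(path))  # (1 + 1/2 + 1 + 1 + 1/2 + 1) / 6
```

Summing reciprocals rather than averaging distances lets disconnected pairs contribute 0 instead of infinity, which is why efficiency is preferred over average path length for sparse trust graphs.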

4.
In this paper we study the problem of recommending scientific articles to users in an online community from a new perspective that considers topic regression modeling and the relational structure of articles simultaneously. First, we present a novel topic regression model, topic regression matrix factorization (tr-MF), to solve the problem. The main idea of tr-MF is to extend matrix factorization with probabilistic topic modeling. In particular, tr-MF introduces a regression model that regularizes user factors through probabilistic topic modeling, under the basic hypothesis that users who rate similar sets of items share similar preferences. As a result, tr-MF provides interpretable latent factors for users and items and makes accurate predictions for community users. To incorporate relational structure into the tr-MF framework, we introduce relational matrix factorization; by combining tr-MF with relational matrix factorization, we propose the topic regression collective matrix factorization (tr-CMF) model. We also present the collaborative topic regression with relational matrix factorization (CTR-RMF) model, which combines the existing collaborative topic regression (CTR) model with relational matrix factorization (RMF); CTR-RMF can therefore serve as an appropriate baseline for tr-CMF. We demonstrate the efficacy of the proposed models on a large subset of the data from CiteULike, a bibliography-sharing service. The proposed models outperform state-of-the-art matrix factorization models by a significant margin. In particular, they make effective predictions for users with few or even no ratings, and they support field-specific tasks; neither capability has been addressed in the existing literature.
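For orientation only, the sketch below shows plain matrix factorization trained by stochastic gradient descent, the base model that the abstract says tr-MF extends. It deliberately omits the topic-regression regularizer on user factors and the relational factorization that define tr-MF and tr-CMF, and the hyperparameters (`k`, `lr`, `reg`, `epochs`) and toy ratings are arbitrary assumptions.

```python
import random
from typing import List, Tuple

Rating = Tuple[int, int, float]  # (user index, item index, rating)

def train_mf(ratings: List[Rating], n_users: int, n_items: int,
             k: int = 2, lr: float = 0.05, reg: float = 0.01,
             epochs: int = 500, seed: int = 0):
    """Learn U (n_users x k) and V (n_items x k) so that dot(U[u], V[i])
    approximates each observed rating, using SGD with L2 regularization."""
    rng = random.Random(seed)
    U = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    V = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)
        for u, i, r in data:
            err = r - sum(a * b for a, b in zip(U[u], V[i]))
            for f in range(k):
                uf, vf = U[u][f], V[i][f]
                U[u][f] += lr * (err * vf - reg * uf)
                V[i][f] += lr * (err * uf - reg * vf)
    return U, V

def rmse(ratings: List[Rating], U, V) -> float:
    """Root-mean-square error of the factor model on the given ratings."""
    se = sum((r - sum(a * b for a, b in zip(U[u], V[i]))) ** 2
             for u, i, r in ratings)
    return (se / len(ratings)) ** 0.5

toy = [(0, 0, 1.0), (0, 1, 0.0), (1, 0, 1.0), (1, 1, 0.0), (2, 0, 0.0), (2, 1, 1.0)]
U, V = train_mf(toy, n_users=3, n_items=2)
print(rmse(toy, U, V))
```

In tr-MF the penalty `reg * uf` would instead pull each user factor toward a regression on topic proportions rather than toward zero; that substitution is where the model's interpretability comes from.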

5.
This paper puts forward a text-circled semantic schema that clearly presents a flow chart of cognitive alteration and processing breakdown in a machine translation (MT) system. Based on a theoretical analysis of the textual Garden Path Phenomenon (GPP), we devise a formula to measure the dramatically changing value of textual GPP. Data from A Farewell to Arms show that textual GPP can drive the development of the plot and adjust the analyst's original horizon of expectation. Despite the incompatible, subjective, and sample-restricted aspects of the theoretical framework and formula, this computational analysis leads MT systems to pay more attention to text-circled cognitive alteration rather than focusing only on lexical or syntactic translation, and thereby aims to improve the machine translation of literary texts.

6.
Co-occurrence networks of Chinese characters are constructed from collections of essays from different periods of China (ancient Chinese; the Chinese of the Wei, Jin, and Southern-Northern Dynasties; recent Chinese; and modern Chinese), and their statistical parameters are studied. We find that 99.6% of the networks are scale-free and 95.0% exhibit the small-world effect. This study reveals some commonalities and differences among texts from different periods of China from a complex-network perspective. Linguists have debated whether the literature of the Wei, Jin, and Southern-Northern Dynasties belongs to ancient Chinese or to recent Chinese. Our work shows that the statistical parameters of the networks for the Wei, Jin, and Southern-Northern Dynasties differ clearly from those of the networks for the other periods, and it seems more reasonable to assign that literature to recent Chinese.
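A minimal sketch of how such a character co-occurrence network might be built and one small-world statistic computed, assuming that two characters are linked when they are adjacent in the text (the abstract does not state the exact co-occurrence window) and ignoring punctuation handling. Testing for the scale-free property (fitting the degree distribution) is not shown.

```python
from itertools import combinations
from typing import Dict, Set

def cooccurrence_network(text: str) -> Dict[str, Set[str]]:
    """Undirected network: each distinct character is a node, and two
    characters are linked if they appear next to each other (assumption)."""
    g: Dict[str, Set[str]] = {ch: set() for ch in text}
    for a, b in zip(text, text[1:]):
        if a != b:  # no self-loops
            g[a].add(b)
            g[b].add(a)
    return g

def average_clustering(g: Dict[str, Set[str]]) -> float:
    """Mean local clustering coefficient; nodes of degree < 2 count as 0."""
    if not g:
        return 0.0
    total = 0.0
    for nbrs in g.values():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for u, v in combinations(nbrs, 2) if v in g[u])
        total += 2.0 * links / (k * (k - 1))
    return total / len(g)

print(average_clustering(cooccurrence_network("天地人天")))    # a triangle
print(average_clustering(cooccurrence_network("学而时习之")))  # a simple path
```

A small-world network is one whose clustering coefficient is much higher than that of a random graph with the same size and density while its average path length stays comparably short; the comparison against random graphs is omitted here.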

7.
We are now heading toward integrating hundreds to thousands of cores on a single chip. However, traditional system software and middleware are not well suited to managing and providing services at such a large scale. To improve the scalability and adaptability of operating system and middleware services on future many-core platforms, we propose pinned OS/services. By porting each OS and runtime system (middleware) service to a separate core (special hardware acceleration), we expect to achieve maximal performance gain and energy efficiency in many-core environments. As a case study, we target XML (Extensible Markup Language), a widely used standard for data transfer and storage. We have implemented and evaluated a design that ports the XML parsing service onto Intel's 48-core Single-Chip Cloud Computer (SCC) platform. The results show that it provides considerable energy savings. However, we also identified heavy performance penalties on the memory side, which bloat the parsing service. As a further step, we therefore propose a memory-side hardware accelerator for XML parsing. With this dedicated hardware design, performance and energy efficiency improve further: performance increases by 20% with a 12.27% reduction in energy.

8.
Chinese-English machine translation is a significant and challenging problem in information processing. This paper presents an interlingua-based Chinese-English natural language translation system (ICENT). It introduces the mechanism used to implement Chinese language analysis, which comprises syntactic parsing and semantic analysis, and describes the design of the interlingua in detail. Experimental results and a system evaluation are given, and the results are satisfactory.

9.
This paper builds an intelligent speech production system using language information processing technology. The concept of bidirectional grammar is proposed for Chinese language information processing, and a corresponding Chinese characteristic network is constructed. Correct text can be generated through grammar parsing together with some additional rules. From the generated text, the system then uses a Chinese text-to-speech conversion system to produce speech with good naturalness and intelligibility.

10.
Using the peer-to-peer (P2P) model, which is considered a promising approach to many problems in distributed environments, we present a distributed network intrusion detection system named PeerIDS, an IDS solution that values feasibility, durability, and scalability above all. Taking a different perspective from its counterparts, PeerIDS automatically and evenly distributes the intrusion detection workload among all cooperating PeerIDS instances, so it can give a networked computing environment robust and scalable protection while remaining efficient as both the variety and the traffic volume of malicious attacks surge. Unlike many other distributed intrusion detection approaches, a group of cooperating PeerIDS instances has no single point of failure. Moreover, PeerIDS requires almost no additional administration after installation and initial setup.
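The abstract does not say how PeerIDS spreads the detection workload. One common way to assign work evenly across peers without a central coordinator is consistent hashing, sketched below as an assumption rather than as PeerIDS's documented mechanism: each traffic flow key is mapped to the first peer clockwise from its hash on a ring of virtual nodes, so removing a peer only reassigns the flows that peer owned. The class name `HashRing` and the flow-key format are hypothetical.

```python
import bisect
import hashlib
from typing import List, Tuple

def _h(key: str) -> int:
    """Stable 64-bit hash (Python's built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest()[:8], "big")

class HashRing:
    """Consistent-hash ring with `vnodes` virtual positions per peer."""

    def __init__(self, peers: List[str], vnodes: int = 100):
        self.vnodes = vnodes
        self._ring: List[Tuple[int, str]] = []  # sorted (position, peer)
        for p in peers:
            self.add(p)

    def add(self, peer: str) -> None:
        for v in range(self.vnodes):
            bisect.insort(self._ring, (_h(f"{peer}#{v}"), peer))

    def remove(self, peer: str) -> None:
        self._ring = [(h, p) for h, p in self._ring if p != peer]

    def owner(self, flow_key: str) -> str:
        """Peer at the first ring position clockwise from the key's hash."""
        if not self._ring:
            raise LookupError("no peers on the ring")
        idx = bisect.bisect(self._ring, (_h(flow_key), ""))
        return self._ring[idx % len(self._ring)][1]

ring = HashRing(["peer-a", "peer-b", "peer-c"])
flows = [f"flow-{i}" for i in range(1000)]
before = {f: ring.owner(f) for f in flows}
ring.remove("peer-b")  # e.g. a peer leaves; only its flows move
after = {f: ring.owner(f) for f in flows}
```

Virtual nodes smooth out the share each peer receives; with a single position per peer, the ring arcs (and hence workloads) can be very uneven.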

11.
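This paper reviews the disagreements among Chinese grammarians over the term "sentence pattern" (句式); analyzes, from the perspective of Chinese information processing, the current lack of sentence-pattern structure in syntactic parsing and treebank construction in this field; gives an up-to-date survey of research on formalizing Li's grammar (黎氏语法), pointing out its advantages and remaining shortcomings with respect to sentence-pattern structure; and, taking the diagramming method of Li's grammar as a prototype, designs a new diagram-based parsing scheme for Chinese, comprising a graphical representation of syntactic structure and a structured XML storage format.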

12.
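A Hierarchical Parsing Method for Long Chinese Sentences Incorporating Punctuation Processing   Total citations: 6, self-citations: 1, citations by others: 6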
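Based on an analysis of the usage and syntactic functions of Chinese punctuation marks, this paper proposes a new hierarchical parsing method for long Chinese sentences. It differs from the traditional one-pass method, which ignores punctuation, in two main respects. First, it uses the special functions of certain punctuation marks to split a complex long sentence into a sequence of clauses, so that the whole sentence is parsed in two levels. This divide-and-conquer strategy greatly reduces a difficulty of the traditional one-pass method: having to identify, at the same time, the syntactic relations between clauses or phrases and the syntactic relations among the constituents inside them. Second, grammar rules covering all punctuation marks, together with their probability distributions, are extracted from a large-scale treebank, which aids parsing and disambiguation. Experiments show that, compared with the traditional one-pass chart parsing method, our method greatly reduces time consumption and the number of ambiguous edges, and improves precision and recall on complex long sentences by about 7%.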
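The first level of this two-level strategy, splitting a long sentence into a clause sequence at punctuation marks, can be sketched as below. Which marks act as split points is an assumption here (the abstract says only that certain punctuation marks are used), and the second-level parser is represented by a caller-supplied placeholder function `parse_clause`. Combining the clause analyses into a full-sentence tree is not shown.

```python
import re
from typing import Callable, List, Tuple

# Assumed split points: full-width comma, semicolon, colon, full stop,
# question mark and exclamation mark. The enumeration comma (、) is
# deliberately NOT a split point, since it usually joins items in a phrase.
SPLIT_MARKS = "，；：。？！"
_CLAUSE = re.compile(f"([^{SPLIT_MARKS}]+)([{SPLIT_MARKS}]?)")

def split_clauses(sentence: str) -> List[Tuple[str, str]]:
    """Split into (clause, trailing_mark) pairs; the mark is kept so the
    upper level can use it when relating clauses to each other."""
    return [(c, m) for c, m in _CLAUSE.findall(sentence) if c.strip()]

def two_level_parse(sentence: str,
                    parse_clause: Callable[[str], str]) -> List[Tuple[str, str]]:
    """Level 1: split into clauses. Level 2: parse each clause separately."""
    return [(parse_clause(c), m) for c, m in split_clauses(sentence)]

print(split_clauses("虽然下雨了，我们还是出发了；大家都很高兴。"))
```

Because each clause is parsed on its own, the chart for each sub-problem is much smaller than a chart over the whole sentence, which is consistent with the reduction in time and ambiguous edges reported above.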

13.
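To address chunk boundary identification and intra-chunk structure analysis in syntactic parsing, this paper explores chunk analysis based on the conceptual compound chunk description scheme. By comparing conceptual compound chunks with the earlier base-chunk and functional-chunk description schemes, it identifies the main difficulties in analyzing conceptual compound chunks automatically, and it proposes an automatic analysis method for Chinese conceptual compound chunks based on a shift-reduce model. On a conceptual compound chunk annotation bank automatically extracted from the Tsinghua Chinese Treebank (TCT), chunk analysis performance is evaluated vertically and horizontally, at multiple levels and from multiple angles. Preliminary experimental results demonstrate the effectiveness of the method for analyzing simple conceptual compound chunks and lay a good foundation for subsequent syntactic and semantic analysis of more complex conceptual compound chunks.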
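For readers unfamiliar with the shift-reduce model, the sketch below shows the transition mechanics on which such a chunk analyzer rests. In the paper the next action is chosen by a trained model; here the action sequence is simply given, and the action names (`SHIFT`, `REDUCE-<label>-<n>`) and the example chunk structure are hypothetical.

```python
from typing import List, Tuple, Union

# A node is either a word or (label, children).
Node = Union[str, Tuple[str, list]]

def run_shift_reduce(words: List[str], actions: List[str]) -> List[Node]:
    """Apply SHIFT / REDUCE-<label>-<n> actions and return the final stack.
    Labels must not contain '-' because the action string is split on it."""
    stack: List[Node] = []
    buffer = list(words)
    for act in actions:
        if act == "SHIFT":
            if not buffer:
                raise ValueError("SHIFT on an empty buffer")
            stack.append(buffer.pop(0))
        elif act.startswith("REDUCE-"):
            _, label, n_str = act.split("-")
            n = int(n_str)
            if n > len(stack):
                raise ValueError(f"cannot reduce {n} items")
            children = stack[-n:]
            del stack[-n:]
            stack.append((label, children))  # top n items become one chunk
        else:
            raise ValueError(f"unknown action: {act}")
    return stack

def bracket(node: Node) -> str:
    """Render a node as a bracketed string, e.g. [NP w1 w2]."""
    if isinstance(node, str):
        return node
    label, children = node
    return f"[{label} " + " ".join(bracket(c) for c in children) + "]"

stack = run_shift_reduce(["中国", "经济", "发展", "迅速"],
                         ["SHIFT", "SHIFT", "SHIFT", "REDUCE-NP-2", "REDUCE-NP-2", "SHIFT"])
print([bracket(x) for x in stack])
```

Nested reductions on the stack are what let a shift-reduce analyzer recover the internal structure of a chunk as well as its boundaries, the two problems named at the start of the abstract.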

14.
A GLR-Based Shallow Parser for English-Chinese Machine Translation   Total citations: 5, self-citations: 0, citations by others: 5
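Shallow parsing is phrase-level syntactic analysis of natural language. During the development of the MatLink English-Chinese machine translation system, an extended CFG formalism was proposed to describe English phrase syntax, the GLR algorithm was improved, and an English shallow parser for English-Chinese translation was designed and implemented. The parser uses a multi-exit parsing table structure, introduces a symbol mapping function to identify phrase boundaries automatically, represents the syntactic structure of phrases with child-sibling trees, and performs phrase-level transfer from the source language to the target language through phrase transformation patterns. Finally, the design and operation of the shallow parser are illustrated by analyzing an example sentence.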
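The child-sibling representation mentioned above stores a tree of arbitrary branching factor using just two links per node: one to the first child and one to the next sibling. A minimal sketch follows; the class and function names are illustrative and not taken from MatLink.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class CSNode:
    """Child-sibling tree node: two links regardless of how many children."""
    label: str
    first_child: Optional["CSNode"] = None
    next_sibling: Optional["CSNode"] = None

    def children(self) -> List["CSNode"]:
        """Walk the sibling chain starting at first_child."""
        out, c = [], self.first_child
        while c is not None:
            out.append(c)
            c = c.next_sibling
        return out

def build(label: str, *kids: CSNode) -> CSNode:
    """Create a node whose children are `kids`, chained via next_sibling.
    Each kid should belong to only one parent, since its sibling link is set."""
    for left, right in zip(kids, kids[1:]):
        left.next_sibling = right
    return CSNode(label, first_child=kids[0] if kids else None)

def to_brackets(node: CSNode) -> str:
    """Render as (LABEL child1 child2 ...); leaves are bare labels."""
    kids = node.children()
    if not kids:
        return node.label
    return "(" + node.label + " " + " ".join(to_brackets(k) for k in kids) + ")"

tree = build("NP", build("DT", build("the")), build("JJ", build("big")),
             build("NN", build("house")))
print(to_brackets(tree))
```

The fixed two-pointer layout is convenient when a phrase transformation pattern must reorder or splice children, since that amounts to relinking sibling pointers.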

15.
A Chunking Scheme for Chinese Sentences   Total citations: 26, self-citations: 1, citations by others: 25
Zhou Qiang, Sun Maosong, Huang Changning. Chinese Journal of Computers (计算机学报), 1999, 22(11): 1158-1165
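This paper introduces a chunking scheme, a shallow syntactic knowledge description scheme whose descriptive power lies between a linear word sequence and a complete parse tree, and discusses in detail its two main components, word-boundary chunks and constituent groups, including their basic content and automatic recognition algorithms. On this basis, it proposes a new staged approach to building a Chinese treebank: first build a chunk bank, then build the treebank. A series of parsing and knowledge acquisition experiments were carried out, including (1) automatic recognition of maximal-length Chinese noun phrases and (2) automatic acquisition of Chinese syntactic knowledge. All of this work demonstrates the practicality and effectiveness of the knowledge description scheme.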
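Chunk annotations like these sit between a flat word sequence and a full tree. A common way to encode chunks for automatic recognition is per-word B/I/O tags; note that this encoding is an assumption here, since the abstract does not say how word-boundary chunks and constituent groups are encoded. The sketch converts such tags into labeled chunk spans.

```python
from typing import List, Optional, Tuple

def bio_to_chunks(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Convert tags like 'B-NP', 'I-NP', 'O' into (label, start, end) spans
    with `end` exclusive. An I- tag that does not continue a chunk of the
    same label starts a new chunk (a lenient but common convention)."""
    chunks: List[Tuple[str, int, int]] = []
    label: Optional[str] = None
    start = 0
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes the last chunk
        continues = tag.startswith("I-") and tag[2:] == label
        if not continues:
            if label is not None:
                chunks.append((label, start, i))
            label, start = (None, i) if tag == "O" else (tag[2:], i)
    return chunks

# Hypothetical example: 我们 / 研究 / 汉语 / 句法 / 分析
print(bio_to_chunks(["B-NP", "B-VP", "B-NP", "I-NP", "I-NP"]))
```

The flat span list carries less information than a tree but more than bare words, mirroring the intermediate position that the scheme occupies.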

16.
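Syntactic parsing is one of the important fundamental research problems in natural language processing. In recent years, parsing methods based on statistical learning models have attracted wide attention, and a variety of models and algorithms have been proposed. Organized by the learning models and algorithm types used, this paper systematically summarizes and classifies mainstream and cutting-edge methods, focusing on analyzing and comparing the ideas behind each class of models and algorithms, and reviews the state of research on Chinese parsing. Finally, it outlines future research directions and trends in parsing.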

17.
Data-Oriented Parsing Techniques   Total citations: 7, self-citations: 1, citations by others: 7
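Data-Oriented Parsing (DOP) was first proposed by Scha (1990). It embodies the hypothesis that human language comprehension and production rely on concrete past language experience rather than on abstract grammar rules. The DOP framework consists of: (1) building an annotated corpus of previously successful analyses; (2) extracting fragments from the corpus and combining them to construct analyses (derivations) of new input; and (3) computing the probabilities of these derivations. The DOP model is built on a corpus covering a large number of linguistic phenomena and treats the annotated corpus itself as a grammar. When new input arrives, the system builds derivations by combining corpus fragments, and the most probable analysis is estimated from the occurrence frequencies of all fragments. This paper discusses in detail corpus annotation, the definition of fragments, the combination of analyses, and probability computation.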
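Step (3) can be made concrete with the probability model of DOP1, the simplest DOP instance (the abstract does not say which DOP variant the paper adopts). A fragment's probability is its corpus count divided by the total count of fragments with the same root label, a derivation's probability is the product of its fragments' probabilities, and a parse's probability is the sum over all derivations that yield it. In the sketch, fragments are plain bracketed strings and their counts are invented for illustration.

```python
from collections import Counter
from math import prod
from typing import Dict, List

def root(fragment: str) -> str:
    """Root label of a bracketed fragment, e.g. '(S NP VP)' -> 'S'."""
    return fragment.lstrip("(").split()[0]

def fragment_probs(counts: Dict[str, int]) -> Dict[str, float]:
    """DOP1: P(f) = count(f) / total count of fragments sharing f's root."""
    root_totals: Counter = Counter()
    for f, c in counts.items():
        root_totals[root(f)] += c
    return {f: c / root_totals[root(f)] for f, c in counts.items()}

def derivation_prob(derivation: List[str], probs: Dict[str, float]) -> float:
    """Product of the probabilities of the fragments combined in one derivation."""
    return prod(probs[f] for f in derivation)

def parse_prob(derivations: List[List[str]], probs: Dict[str, float]) -> float:
    """A parse tree can have many derivations; DOP1 sums their probabilities."""
    return sum(derivation_prob(d, probs) for d in derivations)

# Invented fragment counts (a real DOP1 grammar uses all subtrees of a treebank).
counts = {"(S NP VP)": 2, "(S (NP John) VP)": 1, "(S NP (VP likes NP))": 1,
          "(NP John)": 3, "(NP Mary)": 3, "(VP likes NP)": 2}
probs = fragment_probs(counts)
# All derivations of (S (NP John) (VP likes (NP Mary))) from these fragments:
derivations = [["(S NP VP)", "(NP John)", "(VP likes NP)", "(NP Mary)"],
               ["(S (NP John) VP)", "(VP likes NP)", "(NP Mary)"],
               ["(S NP (VP likes NP))", "(NP John)", "(NP Mary)"]]
print(parse_prob(derivations, probs))
```

Summing over derivations is what distinguishes DOP1 from picking a single best derivation; computing that sum exactly is expensive for real grammars, so practical systems approximate it.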

18.
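Information-Theoretic Analysis of Feature Types in Statistical Parsing Models   Total citations: 2, self-citations: 0, citations by others: 2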
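Statistical parsing uses a probabilistic scoring model to evaluate the likelihood of each candidate parse tree and selects the candidate with the highest probability as the final result. The core of statistical parsing is therefore a probabilistic scoring model, and the essential difference among such models lies mainly in which contextual features they use to assign probabilities to parse trees. Many probabilistic scoring models have been proposed in statistical parsing, and different models use different types of features; how, then, should the contribution of these feature types to parsing be evaluated? To address this question, this study proposes a feature-type analysis model for statistical parsing that quantifies, from an information-theoretic perspective, how well different types of contextual features predict syntactic structure. The basic idea is to use entropy and conditional entropy to show whether a feature type captures the main information for predicting syntactic structure: if the uncertainty (entropy) of the current syntactic structure drops markedly after a feature type is added, that feature type is considered to capture some of the main contextual information that affects syntactic structure. The information-theoretic model uses four measures (predictive information quantity, predictive information gain, predictive information relevance, and total predictive information) to quantify, from different angles, the predictive power of individual feature types and their combinations for the current target. Using the Penn Treebank as the training set, experiments systematically quantify the predictive power of different contextual feature types for parsing rules and yield a series of conclusions about how well different feature types and their combinations predict syntactic structure.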
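The core quantity behind these measures is the drop in entropy of the target (here, the syntactic rule to be predicted) once a feature type is known, H(Y) - H(Y|X). The sketch computes it from (feature value, target) observations using maximum-likelihood estimates. Equating this difference with the paper's "predictive information gain" is an assumption, the other three measures are not reproduced, and the example observations are invented.

```python
from collections import Counter
from math import log2
from typing import Dict, Hashable, List, Tuple

def entropy(values: List[Hashable]) -> float:
    """Shannon entropy (bits) of the empirical distribution of `values`."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

def conditional_entropy(pairs: List[Tuple[Hashable, Hashable]]) -> float:
    """H(Y|X) = sum over x of p(x) * H(Y | X = x), from (x, y) observations."""
    n = len(pairs)
    by_x: Dict[Hashable, List[Hashable]] = {}
    for x, y in pairs:
        by_x.setdefault(x, []).append(y)
    return sum(len(ys) / n * entropy(ys) for ys in by_x.values())

def information_gain(pairs: List[Tuple[Hashable, Hashable]]) -> float:
    """Reduction in uncertainty about Y once feature X is observed."""
    return entropy([y for _, y in pairs]) - conditional_entropy(pairs)

# Invented observations: (parent label, rule used to expand an NP).
informative = [("S", "NP->DT NN"), ("S", "NP->DT NN"),
               ("VP", "NP->NN"), ("VP", "NP->NN")]
useless = [("S", "NP->DT NN"), ("S", "NP->NN"),
           ("VP", "NP->DT NN"), ("VP", "NP->NN")]
print(information_gain(informative), information_gain(useless))
```

In the first data set the parent label determines the rule completely, removing a full bit of uncertainty; in the second it is independent of the rule and removes none.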

19.
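A Chinese Parsing Method Combining Structural Right Context and Lexical Information   Total citations: 2, self-citations: 0, citations by others: 2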
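To address the insufficient use of sentence information by context-free grammar models in parsing, this paper incorporates structural right context and some lexical information and proposes two phrase-structure disambiguation methods based on the probabilistic context-free grammar (PCFG) model, in order to resolve structural ambiguity. It also introduces a layered parsing algorithm that, at some cost in time efficiency, improves parsing accuracy while keeping the analysis results comprehensive. Experimental results show that the Chinese parsing method incorporating structural right context and lexical information uses more sentence information and has stronger disambiguation ability than a context-free grammar.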
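The basic idea of enriching a PCFG with right context can be sketched by estimating a rule's probability conditioned not only on its left-hand side but also on the label of the constituent that immediately follows it, backing off to the plain PCFG estimate when that context was never observed. This conditioning scheme, the simple back-off, the class name `ContextPCFG`, and the toy observations are illustrative assumptions; the abstract does not give the paper's exact formulation or its lexical features.

```python
from collections import Counter
from typing import List, Tuple

Obs = Tuple[str, str, str]  # (lhs, rhs, label of the following constituent)

class ContextPCFG:
    """Relative-frequency rule probabilities with an optional right context."""

    def __init__(self, observations: List[Obs]):
        self.rule = Counter((l, r) for l, r, _ in observations)
        self.lhs = Counter(l for l, _, _ in observations)
        self.rule_ctx = Counter(observations)
        self.lhs_ctx = Counter((l, c) for l, _, c in observations)

    def prob(self, lhs: str, rhs: str, nxt: str) -> float:
        """P(lhs -> rhs | lhs, next); falls back to P(lhs -> rhs | lhs)
        when (lhs, next) was never seen in training."""
        if self.lhs_ctx[(lhs, nxt)] > 0:
            return self.rule_ctx[(lhs, rhs, nxt)] / self.lhs_ctx[(lhs, nxt)]
        if self.lhs[lhs] > 0:
            return self.rule[(lhs, rhs)] / self.lhs[lhs]
        return 0.0

obs = ([("NP", "NN NN", "VV")] * 3 + [("NP", "NN", "NN")] * 2
       + [("NP", "NN", "VV")] * 1)
model = ContextPCFG(obs)
print(model.prob("NP", "NN NN", "VV"), model.prob("NP", "NN NN", "NN"))
```

The plain PCFG would give NP -> NN NN probability 0.5 everywhere; conditioning on what follows sharpens it to 0.75 before a verb and 0.0 before another noun, which is the kind of extra evidence used for disambiguation.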

20.
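Parsing that records the analysis processes and results of a large number of correctly processed examples, and that at parse time searches for similar examples or fragments and matches similar linguistic structures and analysis processes, embodies the idea that "language analysis depends on experience". Based on this idea, this paper proposes a pattern-matching-based parsing method: the syntactic patterns implicit in a large-scale annotated treebank are extracted to build a library of patterns, sub-patterns, and their reductions, and parsing becomes a process of pattern matching and local pattern transformation. Experiments show that all parsing metrics are fairly good, and processing efficiency in particular is high, averaging 0.46 seconds per sentence (Intel dual-core 2.8 GHz CPU, 1 GB of memory).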
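The "pattern matching plus local reduction" process described above can be illustrated with a toy reducer that repeatedly looks for a known pattern (a sequence of labels) in the current label sequence and replaces it with the pattern's reduction label. The pattern library, the matching policy (longer patterns first, then library order, leftmost occurrence), and the labels are all hypothetical; the paper's patterns are extracted from a treebank and its matching procedure is not detailed in the abstract.

```python
from typing import Dict, List, Tuple

# Hypothetical pattern library: label sequence -> label it reduces to.
# All patterns have length >= 2, so every reduction shortens the sequence
# and the loop below is guaranteed to terminate.
PATTERNS: Dict[Tuple[str, ...], str] = {
    ("ADJ", "N"): "NP",
    ("N", "N"): "NP",
    ("V", "NP"): "VP",
    ("NP", "VP"): "S",
}

def reduce_by_patterns(labels: List[str],
                       patterns: Dict[Tuple[str, ...], str] = PATTERNS
                       ) -> List[Tuple[str, ...]]:
    """Repeatedly reduce the leftmost occurrence of the first matching pattern
    (longer patterns tried first); return the sequence after each step."""
    seq = list(labels)
    trace = [tuple(seq)]
    ordered = sorted(patterns, key=len, reverse=True)  # stable: keeps library order
    changed = True
    while changed:
        changed = False
        for pat in ordered:
            n = len(pat)
            for i in range(len(seq) - n + 1):
                if tuple(seq[i:i + n]) == pat:
                    seq[i:i + n] = [patterns[pat]]  # local reduction
                    trace.append(tuple(seq))
                    changed = True
                    break
            if changed:
                break
    return trace

for step in reduce_by_patterns(["ADJ", "N", "V", "N", "N"]):
    print(step)
```

Because each step only rewrites a short window of the sequence, the cost per sentence stays low, which is in line with the efficiency the abstract emphasizes, though the real system's speed depends on its actual pattern library and indexing.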
