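Research on an Entity-Action Relationship Model for Text Clustering
Cite this article: LIU Zuoguo, CHEN Xiaorong. Research on an Entity-Action Relationship Model for Text Clustering[J]. Journal of Chinese Information Processing, 2018, 32(5): 22-30
Authors: LIU Zuoguo, CHEN Xiaorong
Affiliation: College of Computer Science and Technology, Guizhou University, Guiyang, Guizhou 550025, China
Funding: National Natural Science Foundation of China (61363028)
Abstract: This paper proposes an Entity-Action Relationship Model (EARM) for text clustering and explores how to describe Chinese semantic entities and their behaviors. Chinese is a non-inflectional language: sentences carry no tense or voice inflection, and word classes do not map one-to-one onto syntactic constituents. The paper proposes a constituent recognition mechanism that identifies entities and actions from the class and position of each word. Syntactic analysis builds on this recognition step: EARM is constructed by matching sentence-pattern features and describes the behaviors and states of entities. For more complex structures such as nested sentences, the analysis applies action layer decomposition, breaking a complex sentence into simple basic patterns so that entity-action relationships can be mined. Because Chinese grammar is flexible and both omitted constituents and inversion are relatively common, the paper also proposes a mechanism for recognizing inverted sentences, which matches the closest sentence pattern, moves entities accordingly, and restores the word order. The paper discusses a strategy, based on a statistical model, for quantifying EARM weights, and uses the maximum common subgraph of syntax trees to quantify text similarity and perform clustering. An EARM entity-action analysis experiment and an EARM clustering experiment were designed and carried out. The results indicate that the EARM analysis is accurate and effective and that the clustering results are reasonable.

Keywords: text representation model; entity-action relationship; sentence pattern recognition; action layer decomposition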
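
The recognition step described in the abstract (identifying entities and actions from word class and position) can be illustrated with a minimal sketch. It is not the authors' implementation: the tag set (`"n"` for nouns, `"v"` for verbs), the `Relation` type, and the `extract_relations` function are all assumptions made for illustration, and the input is assumed to be already segmented and tagged.

```python
from typing import List, NamedTuple, Optional, Tuple


class Relation(NamedTuple):
    """One entity-action link; subject or obj is None when omitted."""
    subject: Optional[str]
    action: str
    obj: Optional[str]


def extract_relations(tagged: List[Tuple[str, str]]) -> List[Relation]:
    """Pair each verb with the nearest noun on each side.

    Hypothetical tag set: "n" marks an entity candidate, "v" an action.
    Position decides the role: a noun before the verb is read as the
    subject, a noun after it as the object.
    """
    relations = []
    for i, (word, tag) in enumerate(tagged):
        if tag != "v":
            continue
        # Nearest noun to the left of the verb, if any.
        subject = next((w for w, t in reversed(tagged[:i]) if t == "n"), None)
        # Nearest noun to the right of the verb, if any.
        obj = next((w for w, t in tagged[i + 1:] if t == "n"), None)
        relations.append(Relation(subject, word, obj))
    return relations


# "The student reads the paper", pre-segmented and tagged by hand.
sentence = [("学生", "n"), ("阅读", "v"), ("论文", "n")]
relations = extract_relations(sentence)
```

This sketch handles only a plain subject-verb-object pattern. Nested and inverted sentences, which the abstract treats through action layer decomposition and entity movement, are outside its scope.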

An Entity-Action Relationship Model for Text Clustering
LIU Zuoguo, CHEN Xiaorong. An Entity-Action Relationship Model for Text Clustering[J]. Journal of Chinese Information Processing, 2018, 32(5): 22-30
Authors: LIU Zuoguo, CHEN Xiaorong
Affiliation: College of Computer Science and Technology, Guizhou University, Guiyang, Guizhou 550025, China
Abstract: This paper presents an Entity-Action Relationship Model (EARM) for text clustering, with the purpose of describing Chinese semantic entities and their behaviors. Since Chinese is a non-inflectional language, there is no simple one-to-one correspondence between word classes and syntactic elements at the surface level. A syntactic element recognition mechanism is designed to recognize entities and actions according to word classes and positional features. EARM is then built by matching sentence patterns so as to describe the behaviors and states of entities. For complex sentences, e.g. nested sentences, action layer decomposition is applied during syntactic analysis to reduce them to simple sentences, which makes the entity-action relationships easier to mine. To handle omission and inversion, a recognition mechanism is designed that matches inverted sentences against similar sentence patterns in order to move entities and reorder the sentence. Maximum common subgraphs of syntax trees are used to calculate text similarity and perform clustering. Finally, the experiments show that EARM's analysis is accurate and effective and that the clustering results are reasonable.
Keywords: text representation model; entity-action relationship; sentence pattern recognition; action layer decomposition
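
The similarity step (maximum common subgraphs of syntax trees, followed by clustering) can be sketched as follows. This is a simplification under stated assumptions, not the method of the paper: each text is reduced to a set of labeled edges, node labels are assumed unique so that the common subgraph is just the edge-set intersection, the normalization by the larger graph is one common choice, and the greedy threshold clustering stands in for whatever clustering algorithm the authors used. The names `mcs_similarity` and `cluster` are hypothetical.

```python
from typing import FrozenSet, List, Sequence, Tuple

Edge = Tuple[str, str]      # (head label, dependent label)
Graph = FrozenSet[Edge]     # a text reduced to its labeled edges


def mcs_similarity(g1: Graph, g2: Graph) -> float:
    """Size of the shared edge set relative to the larger graph.

    With uniquely labeled nodes, the common subgraph is the edge-set
    intersection; general maximum common subgraph search is far more
    expensive and is not attempted here.
    """
    if not g1 or not g2:
        return 0.0
    return len(g1 & g2) / max(len(g1), len(g2))


def cluster(graphs: Sequence[Graph], threshold: float) -> List[List[int]]:
    """Greedy single-pass clustering (an assumed stand-in algorithm).

    Each graph joins the first cluster whose first member is at least
    `threshold` similar to it; otherwise it starts a new cluster.
    """
    clusters: List[List[int]] = []
    for idx, graph in enumerate(graphs):
        for members in clusters:
            if mcs_similarity(graphs[members[0]], graph) >= threshold:
                members.append(idx)
                break
        else:
            clusters.append([idx])
    return clusters


a = frozenset({("student", "read"), ("read", "paper")})
b = frozenset({("student", "read"), ("read", "book")})
c = frozenset({("dog", "chase"), ("chase", "cat")})
groups = cluster([a, b, c], threshold=0.5)
```

Here `a` and `b` share one of two edges, so they fall into the same group at a threshold of 0.5, while `c` shares nothing with either and forms its own group.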