
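Chinese Word Semantic Similarity Computation Based on Semantic Word-Formation
Cite this article: KANG Sichen, LIU Yang. Chinese Word Semantic Similarity Computation Based on Semantic Word-Formation [J]. Journal of Chinese Information Processing, 2017, 31(1): 94-101.
Authors: KANG Sichen  LIU Yang
Affiliations: 1. Department of Chinese Language and Literature, Peking University, Beijing 100871;
2. Institute of Computational Linguistics, Peking University, Beijing 100871;
3. Key Laboratory of Computational Linguistics (Ministry of Education), Peking University, Beijing 100871
Funding: National Social Science Fund of China (16BYY137); National Basic Research Program of China (2014CB340504); National Social Science Fund of China (12&ZD119)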
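Abstract: Computing the semantic similarity of Chinese words plays a vital role in many applications of Chinese information processing. Building on the character-based view of Chinese, we use knowledge of Chinese semantic word-formation, namely word class, word-formation structure, and morpheme senses, and take "morphemic concepts" as the basis for computing the semantic similarity of Chinese words. This representation of lexical meaning is simple, intuitive, and easy to extend, and the computational model is concise and easy to understand, using as few features and parameters as possible. Experiments show that the method performs strongly on typical "sampled word pairs", that its similarity values agree more closely with human intuition, and that it also yields a reasonable distribution over the global data.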

Keywords: word semantic similarity computation  semantic word-formation  lexical meaning representation  morphemic concepts

Semantic Word-formation Based Chinese Word Similarity Computing
KANG Sichen, LIU Yang. Semantic Word-formation Based Chinese Word Similarity Computing [J]. Journal of Chinese Information Processing, 2017, 31(1): 94-101.
Authors:KANG Sichen  LIU Yang
Affiliation:1. Department of Chinese Language and Literature, Peking University, Beijing 100871, China;
2. Institute of Computational Linguistics, Peking University, Beijing 100871, China;
3. Key Laboratory of Computational Linguistics (Ministry of Education), Peking University, Beijing 100871, China
Abstract: Chinese word similarity computing plays an important role in Chinese information processing. Based on the character-oriented view of Chinese, semantic word-formation knowledge, including word POS, word-formation patterns, and morphemic concepts, is employed to compute Chinese word similarity. This lexical knowledge representation is simple, intuitive, and easy to extend, and the model is straightforward, using as few features and parameters as possible. Experimental results show that the approach performs strongly on typical sampled word pairs. Its similarity values are also more in line with human intuition and show a reasonable distribution over the global data.
Keywords:Chinese word similarity computing  Chinese semantic word-formation  lexical knowledge representation  morphemic concepts  
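
The abstract names three kinds of word-formation knowledge (word POS, word-formation pattern, morphemic concepts) but gives no formula. The sketch below is only a toy illustration of how such features could be combined into a similarity score. It is not the authors' model: the `WordEntry` structure, the concept IDs, the pattern labels, the position-wise alignment of morphemes, and the weights are all hypothetical assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WordEntry:
    pos: str         # word class, e.g. "n" or "v" (hypothetical tag set)
    pattern: str     # word-formation pattern label (hypothetical labels)
    concepts: tuple  # one morphemic-concept ID per character (hypothetical IDs)


def concept_overlap(a: WordEntry, b: WordEntry) -> float:
    """Fraction of position-aligned morphemes that share a concept ID."""
    if not a.concepts or not b.concepts:
        return 0.0
    matches = sum(1 for x, y in zip(a.concepts, b.concepts) if x == y)
    return matches / max(len(a.concepts), len(b.concepts))


def similarity(a: WordEntry, b: WordEntry,
               w_concept: float = 0.6,
               w_pos: float = 0.2,
               w_pattern: float = 0.2) -> float:
    """Weighted sum of the three features; the weights are arbitrary placeholders."""
    score = w_concept * concept_overlap(a, b)
    score += w_pos * (1.0 if a.pos == b.pos else 0.0)
    score += w_pattern * (1.0 if a.pattern == b.pattern else 0.0)
    return score


# Toy lexicon with invented annotations, for illustration only.
LEXICON = {
    "书店": WordEntry("n", "modifier-head", ("BOOK", "SHOP")),
    "花店": WordEntry("n", "modifier-head", ("FLOWER", "SHOP")),
    "读书": WordEntry("v", "verb-object", ("READ", "BOOK")),
}

if __name__ == "__main__":
    for w1, w2 in [("书店", "花店"), ("书店", "读书")]:
        print(w1, w2, round(similarity(LEXICON[w1], LEXICON[w2]), 2))
```

In this toy setup, two words that share a morphemic concept in the same position, the same POS, and the same pattern score higher than two words that merely share a character in different roles, which is the kind of behavior a morpheme-level representation is meant to capture.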