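Semantic Similarity Computation for Unknown Chinese Words Based on HowNet
Cite this article: ZHANG Ruixia, YANG Guozeng, WU Huixin. Semantic Similarity Computation for Unknown Chinese Words Based on HowNet[J]. Journal of Chinese Information Processing, 2012, 26(1): 16-22
Authors: ZHANG Ruixia, YANG Guozeng, WU Huixin
Affiliations: 1. School of Information Engineering, North China University of Water Conservancy and Electric Power, Zhengzhou, Henan 450011; 2. Department of Mathematics, Zhengzhou Normal University, Zhengzhou, Henan 450044
Funding: Basic Research Project of the Science and Technology Department of Henan Province (082300410140)
Abstract: This paper proposes a HowNet-based method for computing the semantic similarity of unknown (out-of-vocabulary) Chinese words. The method first constructs a matching function for semantic relations, following YiHeNet theory. Next, with the semantic information of each unknown word represented as a concept graph, it classifies the nodes according to the role each plays in the semantic representation. It then applies the matching function to classify arcs, node pairs, and node pair sets. Finally, it defines how to compute the overall similarity of two unknown words, together with the similarities of the different types of node pairs and node pair sets. The method classifies the semantic information of unknown words in a principled way and makes full use of it during computation, and experimental results show that it is effective.

Keywords: HowNet; semantic similarity; unknown words; concept graphs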

A New Measure of Semantic Similarity between Unknown Chinese Words Based on HowNet
ZHANG Ruixia, YANG Guozeng, WU Huixin. A New Measure of Semantic Similarity between Unknown Chinese Words Based on HowNet[J]. Journal of Chinese Information Processing, 2012, 26(1): 16-22
Authors: ZHANG Ruixia, YANG Guozeng, WU Huixin
Affiliation: 1. School of Information Engineering, North China University of Water Conservancy and Electric Power, Zhengzhou, Henan 450011, China; 2. Department of Mathematics, Zhengzhou Normal University, Zhengzhou, Henan 450044, China
Abstract: A new measure based on HowNet is put forward to compute the semantic similarity between unknown Chinese words. First, a matching function for semantic relations is constructed according to YiHeNet theory. Second, the nodes in the concept graphs of unknown Chinese words are classified according to the roles they play in expressing semantic information. Then, arcs, node pairs, and node pair sets are classified using the matching function. Finally, measures are designed to compute the overall similarity of unknown Chinese words, the similarity of each type of node pair, and the similarity of each type of node pair set. The new measure classifies the semantic information of unknown words in a principled way and makes full use of it during computation, and experiments demonstrate its effectiveness.
Keywords: HowNet; semantic similarity; unknown words; concept graphs
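
The abstract does not give the paper's formulas, so the following Python sketch only illustrates the general shape of such a pipeline and is not the authors' method. Everything in it is an assumption made for illustration: the star-shaped `ConceptGraph`, the exact-match `sememe_similarity` and `relation_match` placeholders, the greedy best-match averaging in `pair_set_similarity`, the `head_weight` parameter, and the example sememes and relation names.

```python
from dataclasses import dataclass, field


@dataclass
class ConceptGraph:
    """Hypothetical star-shaped concept graph for one unknown word."""
    head: str                                  # sememe of the central (head) node
    arcs: list = field(default_factory=list)   # (relation, sememe) pairs leaving the head


def sememe_similarity(a: str, b: str) -> float:
    # Placeholder: exact match only. A real system would score sememes
    # by their distance in the HowNet sememe hierarchy.
    return 1.0 if a == b else 0.0


def relation_match(r1: str, r2: str) -> bool:
    # Placeholder matching function: two arcs match only if their
    # relation labels are identical.
    return r1 == r2


def pair_set_similarity(arcs_a: list, arcs_b: list) -> float:
    """Greedy best-match similarity between two sets of labeled arcs."""
    if not arcs_a and not arcs_b:
        return 1.0
    if not arcs_a or not arcs_b:
        return 0.0
    total = 0.0
    for rel_a, sem_a in arcs_a:
        # Only node pairs whose arcs match are compared.
        candidates = [
            sememe_similarity(sem_a, sem_b)
            for rel_b, sem_b in arcs_b
            if relation_match(rel_a, rel_b)
        ]
        total += max(candidates, default=0.0)
    # Normalizing by the larger set penalizes unmatched arcs.
    return total / max(len(arcs_a), len(arcs_b))


def word_similarity(g1: ConceptGraph, g2: ConceptGraph, head_weight: float = 0.6) -> float:
    """Weighted sum of head-node similarity and modifier-set similarity."""
    head = sememe_similarity(g1.head, g2.head)
    mods = pair_set_similarity(g1.arcs, g2.arcs)
    return head_weight * head + (1.0 - head_weight) * mods


# Two made-up unknown words that share a head sememe and one modifier.
g1 = ConceptGraph("computer", [("modifier", "portable"), ("purpose", "compute")])
g2 = ConceptGraph("computer", [("modifier", "portable")])
print(round(word_similarity(g1, g2), 2))
```

Separating the head node from the modifier arcs mirrors the idea of classifying nodes by their role and scoring each class separately before combining the scores. The weights and the matching rules would need to come from the paper itself.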
This article is indexed in CNKI, Wanfang Data, and other databases.