
A Model for Linguistic Knowledge Discovery from Large-Scale Corpora Based on Rough Sets

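Cite this article:	CHEN Qing-cai, WANG Xiao-long, ZHAO Jian. A model for linguistic knowledge discovery from large-scale corpora based on rough sets[J]. Computer Engineering & Science (计算机工程与科学), 2004, 26(5): 56-61.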

Authors:	CHEN Qing-cai (陈清才), WANG Xiao-long (王晓龙), ZHAO Jian (赵健)

Affiliation:	School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang 150001

Funding:	National Natural Science Foundation of China (Grant No. 60175020)

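Abstract:	This paper first uses a linguistic feature table to structure textual information, which also allows long-distance constraints to be represented. It then removes redundant information from the feature table with an object-oriented data generalization algorithm and filters out the inconsistent parts of the table with a rule extraction algorithm, thereby building a consistent and efficient rule base for the corresponding natural language processing task. Finally, the paper studies applications of the model to Chinese word sense disambiguation and pinyin-to-character conversion. With a dynamic rule smoothing algorithm, the model achieves decision precisions of 0.93 and 0.95 and coverage rates of 0.92 and 0.89 on the two tasks respectively, which indicates that the model is highly practical.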
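Keywords:	linguistic knowledge discovery; rough set; automatic disambiguation; Chinese pinyin-to-character conversion; pinyin-to-character conversion; dynamic rule smoothing algorithm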
Article ID:	1007-130X(2004)05-0056-06
Revised:	March 3, 2002
A Model for Linguistic Knowledge Discovery from Large-Scale Corpuses Based on Rough Set Techniques

CHEN Qing-cai,WANG Xiao-long,ZHAO Jian.A Model for Linguistic Knowledge Discovery from Large-Scale Corpuses Based on Rough Set Techniques[J].Computer Engineering & Science,2004,26(5):56-61.

Authors:	CHEN Qing-cai WANG Xiao-long ZHAO Jian

Abstract:	In this paper, a linguistic feature table (LFT) is first proposed to structure textual information and to represent long-distance constraints. Then, redundant information in the LFT is removed by an object-oriented data generalization algorithm, inconsistent objects are filtered out by a rule extraction algorithm, and a consistent and efficient rule base is constructed for the NLP application. Finally, applications to Chinese word sense disambiguation and Chinese pinyin-to-character conversion are presented. With a dynamic rule smoothing algorithm, the experiments achieve decision precisions of 0.93 and 0.95 and rule recall rates of 0.92 and 0.89 on these two applications respectively, which shows the good performance of the model.
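The abstract mentions filtering inconsistent objects out of the feature table before rules are extracted, and it reports results as decision precision and coverage (recall). The paper's own algorithms are not reproduced in this record. The following is only a minimal sketch of the standard rough-set idea, assuming a decision table whose rows are (condition-attribute values, decision) pairs. Objects are grouped into indiscernibility classes by their condition values. Only classes with a single decision (the positive region) are kept, and each one becomes a rule. The helper names `extract_consistent_rules` and `evaluate`, the toy features, and the precision/coverage definitions are illustrative assumptions. The sketch also omits the data generalization and dynamic rule smoothing steps.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple

# Hypothetical decision-table row: (condition-attribute values, decision label).
Row = Tuple[Tuple[Hashable, ...], Hashable]
Rules = Dict[Tuple[Hashable, ...], Hashable]


def extract_consistent_rules(table: List[Row]) -> Rules:
    """Keep only indiscernibility classes whose members share one decision.

    Rows with identical condition values fall into the same class. A class
    that maps to more than one decision is inconsistent and is dropped; each
    remaining class becomes one rule: condition values -> decision.
    """
    decisions = defaultdict(set)
    for conditions, decision in table:
        decisions[conditions].add(decision)
    return {cond: next(iter(ds)) for cond, ds in decisions.items() if len(ds) == 1}


def evaluate(rules: Rules, test: List[Row]) -> Tuple[float, float]:
    """Return (precision, coverage) of exact-match rules on a test set.

    Assumed definitions: precision = correct decisions / objects a rule fired
    on; coverage = objects a rule fired on / all test objects.
    """
    fired = correct = 0
    for conditions, gold in test:
        predicted = rules.get(conditions)
        if predicted is None:
            continue  # no matching rule; a smoothing step would back off here
        fired += 1
        if predicted == gold:
            correct += 1
    precision = correct / fired if fired else 0.0
    coverage = fired / len(test) if test else 0.0
    return precision, coverage


if __name__ == "__main__":
    # Toy word-sense table with hypothetical context features.
    train = [
        (("left=A", "right=X"), "sense1"),
        (("left=A", "right=X"), "sense1"),
        (("left=B", "right=X"), "sense2"),
        (("left=C", "right=Y"), "sense1"),
        (("left=C", "right=Y"), "sense2"),  # conflicts with the row above
    ]
    rules = extract_consistent_rules(train)
    test = [
        (("left=A", "right=X"), "sense1"),
        (("left=B", "right=X"), "sense1"),
        (("left=C", "right=Y"), "sense1"),
        (("left=D", "right=Z"), "sense2"),
    ]
    print(evaluate(rules, test))  # (0.5, 0.5)
```

Dropping the conflicting `left=C, right=Y` class keeps the rule base consistent, but it lowers coverage. That trade-off is what a rule smoothing step is meant to soften.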

Keywords:	linguistic knowledge discovery; rough set; disambiguation; Chinese pinyin-to-character conversion
This article is indexed in CNKI, VIP, Wanfang Data and other databases.