
开放式汉语自动分词的学习机制 (Learning Mechanism of Open Chinese Automatic Word Segmentation)

Cite this article:	HUANG De-gen, YUE Han, LI Li-shuang. 开放式汉语自动分词的学习机制 [J]. 小型微型计算机系统 (Mini-micro Systems), 2005, 26(8): 1406-1410.

Authors:	HUANG De-gen (黄德根), YUE Han (岳函), LI Li-shuang (李丽双)

Affiliation:	Department of Computer Science, Dalian University of Technology, Dalian, Liaoning 116024

Funding:	Supported by the National Natural Science Foundation of China (60373095)

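Abstract:	To address the low dynamic adaptability of the dictionaries used by statistical models and the high manual cost of building large-scale corpora, this paper introduces an error-driven open learning mechanism into statistics-based Chinese automatic word segmentation. By combining supervised and unsupervised learning, it builds a multi-gram segmentation model that incorporates reliability correction and partial trigram information, discusses specific issues in the segmentation algorithm and in human-computer interaction, and determines the model coefficients and thresholds experimentally. Experimental results show that after three rounds of learning, 78.44% of the segmentation errors in the closed test are corrected and segmentation accuracy reaches 99.43%; in the open test, 63.56% of the segmentation errors are corrected and accuracy reaches 98.46%. The system has high practical value.
Keywords:	automatic word segmentation; open learning mechanism; error-driven
Article ID:	1000-1220(2005)08-1406-05
Received:	2004-02-12
Revised:	2004-02-12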
Learning Mechanism of Chinese Automatic Words Segmentation Based on Open Corpus

HUANG De-gen, YUE Han, LI Li-shuang. Learning Mechanism of Chinese Automatic Words Segmentation Based on Open Corpus[J]. Mini-micro Systems, 2005, 26(8): 1406-1410.

Authors:	HUANG De-gen, YUE Han, LI Li-shuang

Abstract:	To improve dynamic adaptability and reduce manual cost, this paper presents an error-driven open learning mechanism for statistics-based Chinese automatic word segmentation. Combining supervised and unsupervised learning, it builds a word segmentation model that incorporates reliability revision and partial trigram information. Several problems that arise during system implementation, such as the segmentation algorithm and the human-computer interface, are discussed, and the model's parameters and thresholds are determined experimentally. Test results show that after three rounds of learning with the open learning model, accuracy reaches 99.43% in the closed test and 98.46% in the open test, with 78.44% of errors corrected in the closed test and 63.56% in the open test. The practical value of the system is greatly improved.
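The abstract does not give the model's equations, so the following is only a rough illustration of the general idea of error-driven learning on top of a statistical segmenter. It is not the authors' model: it swaps in a plain unigram (word-frequency) model with add-one smoothing in place of the paper's multi-gram model with reliability revision and partial trigram information, and it omits the unsupervised and human-in-the-loop parts. The class `ErrorDrivenSegmenter`, its methods, the `max_word_len` parameter, and the toy counts are all hypothetical.

```python
import math
from collections import Counter


class ErrorDrivenSegmenter:
    """Hypothetical sketch: unigram max-probability segmenter plus an
    error-driven update. Not the paper's multi-gram model."""

    def __init__(self, counts, max_word_len=4):
        # counts: assumed word -> frequency table acting as the dictionary
        self.counts = Counter(counts)
        self.max_word_len = max_word_len

    def _logp(self, word):
        # Add-one smoothed unigram log-probability. Unknown multi-character
        # strings are disallowed, so unknown text falls back to single characters.
        count = self.counts.get(word, 0)
        if count == 0 and len(word) > 1:
            return float("-inf")
        total = sum(self.counts.values())
        vocab = len(self.counts) + 1
        return math.log((count + 1) / (total + vocab))

    def segment(self, text):
        # Dynamic programming over word boundaries:
        # best[i] is the best log-probability of any segmentation of text[:i].
        n = len(text)
        best = [float("-inf")] * (n + 1)
        back = [0] * (n + 1)
        best[0] = 0.0
        for end in range(1, n + 1):
            for start in range(max(0, end - self.max_word_len), end):
                score = best[start] + self._logp(text[start:end])
                if score > best[end]:
                    best[end], back[end] = score, start
        # Follow back-pointers from the end of the sentence.
        words, end = [], n
        while end > 0:
            start = back[end]
            words.append(text[start:end])
            end = start
        return words[::-1]

    def learn(self, text, gold):
        # Error-driven step: the model changes only when its output is wrong.
        # Here the "fix" is simply to reinforce the words of the corrected
        # (gold) segmentation; returns True if an update was made.
        if self.segment(text) == gold:
            return False
        self.counts.update(gold)
        return True


seg = ErrorDrivenSegmenter({"研究": 3, "研究生": 4, "生命": 3, "起源": 5, "命": 3})
text, gold = "研究生命起源", ["研究", "生命", "起源"]
print(seg.segment(text))  # ['研究生', '命', '起源']
print(seg.learn(text, gold))  # True
print(seg.segment(text))  # ['研究', '生命', '起源']
```

With these toy counts the initial model picks 研究生/命/起源 ("graduate student / life / origin"), a classic segmentation ambiguity; one corrective update from the gold segmentation 研究/生命/起源 ("study / origin of life") is enough to flip its choice. Updating only on errors keeps correctly handled sentences from shifting the statistics, which is the intuition behind error-driven learning.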

Keywords:	automatic word segmentation; open learning mechanism; error-driven
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.
