
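Construction of a Large-Symbol-Set Chinese Character (Word) Markov Language Model

Cite this article:	Wang Xuan, Li Wei, Wang Xiaolong, Zhao Shuxiang. Construction of a Large-Symbol-Set Chinese Character (Word) Markov Language Model [J]. Journal of Harbin Institute of Technology, 1997(5).

Authors:	Wang Xuan, Li Wei, Wang Xiaolong, Zhao Shuxiang

Affiliation:	Applied Software Teaching and Research Section, Department of Computer Science, Harbin Institute of Technology

Funding:	National 863 High-Tech Program; Fok Ying Tung Foundation

Abstract:	This paper presents a counting algorithm, based on quick sort and merge sort, for building high-order Markov statistical language models over a large Chinese symbol set, and analyzes the algorithm's time and space complexity. Using this algorithm, a probability statistics system for Chinese characters (words) was designed and implemented. By counting over a corpus of more than ten million Chinese characters, unigram, bigram, and trigram Markov models of Chinese characters (words) were built, and the statistical results were analyzed.
Keywords:	Markov model; statistical language model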
Construction and Application of Large Symbol Set of Chinese Character/Word Markov Language Model

Wang Xuan, Li Wei, Wang Xiaolong, Zhao Shuxiang. Construction and Application of Large Symbol Set of Chinese Character/Word Markov Language Model [J]. Journal of Harbin Institute of Technology, 1997(5).

Authors:	Wang Xuan, Li Wei, Wang Xiaolong, Zhao Shuxiang

Affiliation:	Dept. of Computer Science and Engineering, Harbin Institute of Technology

Abstract:	This paper puts forward an algorithm that combines quick sort and merge sort to construct large-symbol-set Markov language models of Chinese characters and words. The time and space complexity of the algorithm are discussed. Based on the algorithm, a system for computing Chinese character/word probability distributions is introduced. Unigram, bigram, and trigram Chinese language models are built from more than twenty million Chinese characters, and the results are analyzed. The experimental results show that statistical language models perform well in approximating the short-distance constraint relationships of the Chinese language.

Keywords:	Markov model; statistical language model
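
The abstract names the two ingredients of the counting algorithm (quick sort and merge sort) but gives no further detail, so the following is only a minimal sketch of one plausible sort-then-merge counting scheme, not the authors' implementation. It assumes the n-gram stream is cut into fixed-size runs, each run is sorted in memory, and the sorted runs are merged so that equal n-grams become adjacent and can be counted in one pass. Python's built-in sort stands in for quick sort, runs are kept in memory rather than written to disk, and all function names and the `run_size` parameter are hypothetical.

```python
import heapq
from itertools import groupby


def ngrams(text, n):
    """Yield every run of n consecutive characters in text."""
    for i in range(len(text) - n + 1):
        yield text[i:i + n]


def sorted_runs(text, n, run_size):
    """Cut the n-gram stream into runs of at most run_size items and sort each.

    Assumption: Python's built-in sort (Timsort) stands in for the
    quick sort phase mentioned in the abstract.
    """
    run = []
    for gram in ngrams(text, n):
        run.append(gram)
        if len(run) == run_size:
            yield sorted(run)
            run = []
    if run:
        yield sorted(run)


def count_ngrams(text, n, run_size=4):
    """Merge the sorted runs and count equal n-grams, which are now adjacent.

    Assumption: all runs are held in memory; a system for a large corpus
    would presumably spill each run to disk before merging.
    """
    merged = heapq.merge(*sorted_runs(text, n, run_size))
    return {gram: sum(1 for _ in group) for gram, group in groupby(merged)}


def bigram_probability(text, first, second):
    """Estimate P(second | first) as C(first second) / C(first).

    This is the plain relative-frequency estimate with no smoothing.
    """
    unigrams = count_ngrams(text, 1)
    bigrams = count_ngrams(text, 2)
    return bigrams.get(first + second, 0) / unigrams[first]
```

On the toy string `abcab` with `run_size=2`, `count_ngrams("abcab", 2, run_size=2)` gives `{'ab': 2, 'bc': 1, 'ca': 1}`, and `bigram_probability("abcab", "b", "c")` gives `0.5`. The same functions work unchanged on Chinese text, since each character is one element of a Python string.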
This article is indexed in CNKI and other databases.
