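Building Large-Symbol-Set Markov Language Models of Chinese Characters (Words)
Cite this article: Wang Xuan, Li Wei, Wang Xiaolong, Zhao Shuxiang. Building Large-Symbol-Set Markov Language Models of Chinese Characters (Words)[J]. Journal of Harbin Institute of Technology, 1997(5).
Authors: Wang Xuan, Li Wei, Wang Xiaolong, Zhao Shuxiang
Affiliation: Applied Software Teaching and Research Section, Department of Computer Science, Harbin Institute of Technology
Funding: National 863 High-Tech Program; Fok Ying Tung Foundation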
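Abstract: This paper presents a counting algorithm, based on quicksort and merge sort, for building high-order Markov statistical language models of Chinese over a large symbol set, and analyzes the algorithm's time and space complexity. Based on this algorithm, a probability statistics system for Chinese characters (words) was designed and implemented. By counting a Chinese corpus of more than ten million characters, unigram, bigram and trigram Markov models of Chinese characters (words) were built, and the statistical results were analyzed.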

Keywords: Markov model; statistical language model

Construction and Application of Large Symbol Set of Chinese Character/Word Markov Language Model
Wang Xuan, Li Wei, Wang Xiaolong, Zhao Shuxiang. Construction and Application of Large Symbol Set of Chinese Character/Word Markov Language Model[J]. Journal of Harbin Institute of Technology, 1997(5).
Authors: Wang Xuan, Li Wei, Wang Xiaolong, Zhao Shuxiang
Affiliation: Dept. of Computer Science and Engineering
Abstract: This paper presents an algorithm that combines quicksort and merge sort to build Markov language models of Chinese characters/words over a large symbol set. The time and space complexity of the algorithm are discussed. Based on the algorithm, a system for computing Chinese character/word probability distributions is introduced. Unigram, bigram and trigram Chinese language models are built from a corpus of more than twenty million Chinese characters, and the results are analyzed. The experimental results show that statistical language models perform well at approximating the local (near-neighbor) constraint relationships of the Chinese language.
Keywords: Markov model; statistical language model
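
The abstract names quicksort and merge sort as the basis of the counting algorithm but does not spell out the procedure. The following is only a minimal sketch of one common sort-then-merge way to count n-grams, not the authors' implementation. The function names (`ngrams`, `sorted_runs`, `merge_and_count`, `conditional_prob`) and the splitting of the corpus into in-memory chunks are assumptions made for illustration, and Python's built-in sort stands in for quicksort.

```python
import heapq
from itertools import groupby


def ngrams(text, n):
    """Yield every run of n consecutive characters in text."""
    return (text[i:i + n] for i in range(len(text) - n + 1))


def sorted_runs(chunks, n):
    """Sort the n-grams of each chunk independently (in-memory sort phase).

    Assumption: each chunk fits in memory. Python's built-in sort is used
    here in place of quicksort. N-grams that span a chunk boundary are
    ignored in this sketch.
    """
    return [sorted(ngrams(chunk, n)) for chunk in chunks]


def merge_and_count(runs):
    """Merge the sorted runs and count each group of identical n-grams."""
    merged = heapq.merge(*runs)  # k-way merge of already sorted inputs
    return {gram: sum(1 for _ in group) for gram, group in groupby(merged)}


def count_ngrams(chunks, n):
    """Count n-grams over all chunks via sort-then-merge."""
    return merge_and_count(sorted_runs(chunks, n))


def conditional_prob(counts_n, counts_prev, gram):
    """Relative-frequency estimate of P(last symbol | preceding symbols)."""
    return counts_n[gram] / counts_prev[gram[:-1]]


if __name__ == "__main__":
    chunks = ["中国人中国"]  # toy corpus, hypothetical
    unigrams = count_ngrams(chunks, 1)
    bigrams = count_ngrams(chunks, 2)
    print(bigrams)
    print(conditional_prob(bigrams, unigrams, "中国"))
```

Because equal n-grams become adjacent after the merge, counting reduces to measuring run lengths, so no hash table over the full symbol set is needed; this is the usual motivation for sort-based counting when the symbol set is large.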
This article is indexed in CNKI and other databases.