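Smoothing Technique for Statistical Language Model Based on Global Discount
Cite this article: HUANG Yong-wen, HE Zhong-Shi. Smoothing Technique for Statistical Language Model Based on Global Discount[J]. Journal of Chongqing University (Natural Science Edition), 2005, 28(8): 51-55.
Authors: HUANG Yong-wen (黄永文), HE Zhong-Shi (何中市)
Affiliation: College of Computer Science, Chongqing University, Chongqing 400030 (both authors)
Abstract: Data smoothing addresses the data sparsity problem that statistical language models encounter in practical applications. Existing smoothing techniques handle data sparsity with different discounting and compensation strategies, and each has its own strengths and weaknesses in computational complexity and soundness. For the bigram model, the authors propose a smoothing technique based on global discounting (GD, Global Discount). Its basic idea is to discount the frequency of every bigram in the model to a varying degree and to compensate zero-probability events using a lower-order model; the soundness of the model is reflected through the principle of minimizing perplexity. Experimental results show that this smoothing technique outperforms the commonly used Katz smoothing technique.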

Keywords: statistical language model; smoothing technique; global discount; perplexity
Article ID: 1000-582X(2005)08-0051-05
Received: 2005-04-05
Revised: 2005-04-05

Smoothing Technique for Statistical Language Model Based on Global Discount
HUANG Yong-wen, HE Zhong-Shi. Smoothing Technique for Statistical Language Model Based on Global Discount[J]. Journal of Chongqing University (Natural Science Edition), 2005, 28(8): 51-55.
Authors: HUANG Yong-wen, HE Zhong-Shi
Abstract: Smoothing techniques are mainly used to address the data sparsity problem in statistical language models. Existing smoothing techniques handle data sparsity with different discounting and compensation strategies, and each has its own merits and shortcomings in complexity and rationality. This paper presents a new smoothing technique based on global discount for the bigram model. The model parameters, that is, the bigram probabilities, are discounted according to bigram frequency, and unseen events are compensated using a lower-order model; the rationality of the approach is supported by minimizing perplexity. Experimental results show that the technique is superior to the commonly used Katz smoothing technique.
Keywords:statistical language model  smoothing technique  global discount  perplexity
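
The abstract describes the general recipe (discount the seen bigrams, give the freed probability mass to unseen events through a lower-order model, and judge the result by perplexity) but does not state the GD formulas. The following is therefore only a minimal sketch of that recipe, not the authors' method: it assumes a single absolute discount shared by all bigrams (GD discounts each bigram to a varying degree), a maximum-likelihood unigram model as the lower-order model, a closed vocabulary (every evaluated word occurs in the training data), and a simple grid search on held-out text. The class and function names (`DiscountedBackoffBigram`, `perplexity`, `best_discount`) and the toy corpus are hypothetical.

```python
import math
from collections import Counter, defaultdict


class DiscountedBackoffBigram:
    """Bigram model that discounts every seen bigram and backs off to unigrams.

    Illustrative stand-in only: one shared absolute discount, not the paper's GD.
    """

    def __init__(self, tokens, discount):
        if not 0.0 < discount < 1.0:
            raise ValueError("discount must lie strictly between 0 and 1")
        self.discount = discount
        self.unigram = Counter(tokens)
        self.total = len(tokens)
        self.bigram = Counter(zip(tokens, tokens[1:]))
        # Number of bigrams that start with each word (the context count).
        self.context = Counter(tokens[:-1])
        # Words observed after each context word.
        self.followers = defaultdict(set)
        for w1, w2 in self.bigram:
            self.followers[w1].add(w2)

    def p_unigram(self, w):
        # Maximum-likelihood unigram estimate; assumes w is in the vocabulary.
        return self.unigram[w] / self.total

    def prob(self, w1, w2):
        c1 = self.context[w1]
        if c1 == 0:
            # Context never seen as a bigram start: use the unigram model.
            return self.p_unigram(w2)
        c12 = self.bigram[(w1, w2)]
        if c12 > 0:
            # Seen bigram: subtract the discount from its count.
            return (c12 - self.discount) / c1
        # Unseen bigram: share the reserved mass in proportion to the
        # unigram probabilities of the words not seen after w1.
        seen = self.followers[w1]
        reserved = self.discount * len(seen) / c1
        unseen_mass = 1.0 - sum(self.p_unigram(v) for v in seen)
        return reserved * self.p_unigram(w2) / unseen_mass


def perplexity(model, tokens):
    """Perplexity of the model over the bigrams of a token sequence."""
    pairs = list(zip(tokens, tokens[1:]))
    log_sum = sum(math.log(model.prob(w1, w2)) for w1, w2 in pairs)
    return math.exp(-log_sum / len(pairs))


def best_discount(train, heldout, grid):
    """Pick the discount in the grid that minimizes held-out perplexity."""
    return min(
        grid,
        key=lambda d: perplexity(DiscountedBackoffBigram(train, d), heldout),
    )


# Toy data, invented for illustration.
train = "the cat sat on the mat the dog sat on the log".split()
heldout = "the cat sat on the log the dog sat on the mat cat".split()
grid = [i / 10 for i in range(1, 10)]

d = best_discount(train, heldout, grid)
print(d, perplexity(DiscountedBackoffBigram(train, d), heldout))
```

With this construction the conditional probabilities for each context sum to one, so perplexities obtained with different discounts are comparable. Comparing against Katz smoothing, as the paper does, would additionally require Good-Turing discount estimates, which this sketch does not include.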
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.