
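Research on a Multi-Feature Collocation Translation Model
Cite this article: CHEN Yin, LU Ya-juan, LI Sheng. Research on a multi-feature collocation translation model[J]. Journal of Harbin Institute of Technology, 2007, 39(11): 1790-1795.
Authors: CHEN Yin  LU Ya-juan  LI Sheng
Affiliations: 1. Ministry of Education-Microsoft Key Laboratory, Harbin Institute of Technology, Harbin 150001
2. Microsoft Research Asia, Beijing 100080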
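Abstract: A new collocation translation method is proposed. Within a maximum entropy framework, it makes full use of information obtained from both monolingual and bilingual corpora. Unlike earlier methods that rely heavily on bilingual corpora, the new method can train the translation model on monolingual corpora, and it introduces contextual information in addition to the information internal to the collocation. The EM (Expectation Maximization) algorithm is used to estimate context-based word translation probabilities. The model can also integrate information from bilingual corpora. Experiments show that the method outperforms existing collocation translation methods based on monolingual corpora, and that it achieves still better results when a bilingual corpus is available.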

Keywords: collocation  maximum entropy  monolingual corpus  EM algorithm
Article ID: 0367-6234(2007)11-1790-06
Received: 2005-10-28
Revised: 2005-10-28

Study on the feature-rich collocation translation
CHEN Yin, LU Ya-juan, LI Sheng. Study on the feature-rich collocation translation[J]. Journal of Harbin Institute of Technology, 2007, 39(11): 1790-1795.
Authors: CHEN Yin  LU Ya-juan  LI Sheng
Abstract: This paper proposes a new method for collocation translation. We develop a collocation translation model, built in a maximum entropy framework, that makes full use of the information available from both monolingual and bilingual corpora. Instead of relying heavily on bilingual parallel corpora, our approach can train translation models using monolingual corpora. The model exploits both inside-collocation information and contextual information. The EM algorithm is applied to estimate contextual word translation probabilities from a monolingual corpus. The model can also integrate bilingually derived features when they are available. Experiments show that our approach outperforms existing monolingual-corpus-based methods for collocation translation and achieves better results still when a bilingual corpus is available.
Keywords: collocation  maximum entropy  monolingual corpora  expectation maximization algorithm
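
As a rough illustration of the maximum entropy framework named in the abstract, the sketch below scores candidate translations of a collocation with a generic log-linear model: each candidate gets a weighted sum of feature values, and the sums are normalized into probabilities. This is not the authors' implementation. The function names, the two feature functions, the weights, and every number in the toy tables are assumptions made up for illustration; the abstract does not specify the actual features or their values.

```python
import math

def loglinear_probs(candidates, feature_fns, weights):
    """Return p(e | source) for each candidate e under a log-linear model.

    p(e) is proportional to exp(sum_i weights[i] * feature_fns[i](e)).
    """
    scores = [sum(w * h(e) for w, h in zip(weights, feature_fns))
              for e in candidates]
    top = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)      # normalization constant
    return {e: x / z for e, x in zip(candidates, exps)}

# Hypothetical toy values (NOT from the paper): log-probabilities for two
# candidate English verb-object collocations.
TARGET_COLLOCATION_LOGP = {   # stands in for a monolingual-corpus feature
    ("take", "medicine"): -2.0,
    ("eat", "medicine"): -6.0,
}
WORD_TRANSLATION_LOGP = {     # stands in for a word translation feature
    ("take", "medicine"): -1.5,
    ("eat", "medicine"): -0.5,
}

def h_collocation(e):
    return TARGET_COLLOCATION_LOGP[e]

def h_lexical(e):
    return WORD_TRANSLATION_LOGP[e]

candidates = list(TARGET_COLLOCATION_LOGP)
probs = loglinear_probs(candidates, [h_collocation, h_lexical], [1.0, 1.0])
best = max(probs, key=probs.get)
```

In a trained maximum entropy model the weights would be fitted to data rather than set to 1.0 by hand, and further features (for example, ones derived from a bilingual corpus) could be appended to the feature list without changing the scoring function.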
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.