Automatic Extraction of Translational Equivalence Based on Bilingual Corpora (基于双语语料库的翻译等价对自动抽取)

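Cite this article:	Lu Yajuan, Li Sheng, Zhao Tiejun, Yang Muyun. Automatic Extraction of Translational Equivalence Based on Bilingual Corpora [J]. High Technology Letters (高技术通讯), 2003, 13(5): 19-24.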

Authors:	Lu Yajuan (吕雅娟), Li Sheng (李生), Zhao Tiejun (赵铁军), Yang Muyun (杨沐昀)

Affiliation:	School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001

Funding:	Supported by the 863 Program (grant 2001AA114101).

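Abstract:	This paper proposes a method for automatically extracting multi-word translation equivalents from a bilingual corpus. Candidate translation units are first obtained with an N-gram model. The translation probability of each candidate equivalent pair is then computed from statistical co-occurrence, and a greedy strategy carries out the automatic extraction. Three commonly used statistical co-occurrence measures are compared for computing the translation probability. Experiments show that, when the corpus is small, the log-likelihood ratio measure gives better results for extracting translation equivalents. Compared with existing methods, the approach handles multi-word unit correspondence and indirect association more effectively.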
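
The pipeline in the abstract (N-gram candidate units, then greedy selection of the best-scoring pairs) can be illustrated with a minimal sketch. This is a generic greedy linking procedure, not necessarily the authors' exact algorithm. The function names `ngrams` and `greedy_align`, the `max_n` and `threshold` parameters, and the `score` dictionary of association scores are all hypothetical and introduced only for illustration.

```python
def ngrams(tokens, max_n=2):
    """Return (start, end, unit) for every contiguous n-gram with 1 <= n <= max_n."""
    spans = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            spans.append((i, i + n, tuple(tokens[i:i + n])))
    return spans


def greedy_align(src_tokens, tgt_tokens, score, max_n=2, threshold=0.0):
    """Greedily pick the best-scoring, non-overlapping unit pairs in one sentence pair.

    `score` maps (src_unit, tgt_unit) -> association score. It is an assumed
    input here; in practice it would come from a co-occurrence measure.
    """
    candidates = []
    for s_start, s_end, s_unit in ngrams(src_tokens, max_n):
        for t_start, t_end, t_unit in ngrams(tgt_tokens, max_n):
            value = score.get((s_unit, t_unit), 0.0)
            if value > threshold:
                candidates.append((value, s_start, s_end, t_start, t_end, s_unit, t_unit))

    # Highest score first; ties are broken by position so the result is deterministic.
    candidates.sort(key=lambda c: (-c[0], c[1], c[3]))

    used_src, used_tgt, chosen = set(), set(), []
    for value, s_start, s_end, t_start, t_end, s_unit, t_unit in candidates:
        s_pos = set(range(s_start, s_end))
        t_pos = set(range(t_start, t_end))
        if s_pos & used_src or t_pos & used_tgt:
            continue  # some token is already claimed by a stronger pair
        used_src |= s_pos
        used_tgt |= t_pos
        chosen.append((s_unit, t_unit))
    return chosen


# Toy scores (made up for illustration, not taken from the paper).
toy_score = {
    (("machine", "translation"), ("机器", "翻译")): 9.0,
    (("system",), ("系统",)): 7.0,
    (("machine",), ("机器",)): 5.0,
    (("translation",), ("系统",)): 2.0,  # a spurious, indirectly associated pair
}
links = greedy_align(["machine", "translation", "system"],
                     ["机器", "翻译", "系统"], toy_score)
```

In this toy run the multi-word pair is selected first, so the weaker single-word pair and the indirectly associated pair are both blocked because their tokens are already covered. That is one simple way a greedy strategy can address multi-word correspondence and indirect association together.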
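Keywords:	bilingual corpus; automatic extraction; N-gram model; translation probability; computer knowledge acquisition; candidate translation unit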
Automatic Extraction of Translational Equivalence Based on Bilingual Corpora

Lu Yajuan, Li Sheng, Zhao Tiejun, Yang Muyun. Automatic Extraction of Translational Equivalence Based on Bilingual Corpora [J]. High Technology Letters, 2003, 13(5): 19-24.

Authors:	Lu Yajuan, Li Sheng, Zhao Tiejun, Yang Muyun

Abstract:	This paper describes a method for acquiring multi-word translational equivalences from English-Chinese parallel corpora. Translation candidates are first obtained using an N-gram model. An iterative algorithm then extracts translation equivalences according to statistical association measures. Three such measures are compared experimentally: the Dice coefficient, the phi-square coefficient, and the log-likelihood ratio. The experiments show that the log-likelihood ratio works better when the training corpus is small. Compared with previous work, the proposed method addresses both multi-word unit correspondence and the problem of indirect association. Experiments on a real corpus produced promising results.
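
The three measures named in the abstract are commonly computed from a 2x2 co-occurrence table over aligned sentence pairs. The sketch below uses the standard textbook definitions; the paper's exact formulation may differ. The function names and the cell names `a`, `b`, `c`, `d` are assumptions introduced for illustration: `a` counts sentence pairs containing both units, `b` pairs with only the source unit, `c` pairs with only the target unit, and `d` pairs with neither.

```python
import math


def contingency(n_xy, n_x, n_y, n):
    """Build the 2x2 cells from co-occurrence count, marginal counts and corpus size."""
    a = n_xy                    # both units occur
    b = n_x - n_xy              # source unit only
    c = n_y - n_xy              # target unit only
    d = n - n_x - n_y + n_xy    # neither unit occurs
    return a, b, c, d


def dice(a, b, c, d):
    """Dice coefficient: 2a / ((a + b) + (a + c))."""
    denom = 2 * a + b + c
    return 2 * a / denom if denom else 0.0


def phi_square(a, b, c, d):
    """Phi-square coefficient: (ad - bc)^2 / ((a+b)(a+c)(b+d)(c+d))."""
    denom = (a + b) * (a + c) * (b + d) * (c + d)
    return (a * d - b * c) ** 2 / denom if denom else 0.0


def log_likelihood_ratio(a, b, c, d):
    """Log-likelihood ratio (G^2): 2 * sum(O * ln(O / E)) over the four cells."""
    n = a + b + c + d
    if n == 0:
        return 0.0
    cells = [
        (a, (a + b) * (a + c) / n),
        (b, (a + b) * (b + d) / n),
        (c, (c + d) * (a + c) / n),
        (d, (c + d) * (b + d) / n),
    ]
    # Cells with an observed count of zero contribute nothing to the sum.
    return 2 * sum(o * math.log(o / e) for o, e in cells if o > 0)


# Toy counts (made up): the units co-occur in 8 of 100 sentence pairs,
# and each unit occurs in 10 pairs overall.
cells = contingency(n_xy=8, n_x=10, n_y=10, n=100)
scores = (dice(*cells), phi_square(*cells), log_likelihood_ratio(*cells))
```

Unlike the Dice coefficient, the phi-square coefficient and the log-likelihood ratio also use the `d` cell, so they take the corpus size into account.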

Keywords:	Bilingual corpora; Translational equivalence; N-gram; Knowledge acquisition
This article is indexed in databases including CNKI, VIP (维普), and Wanfang Data (万方数据).
