Mongolian-Chinese Neural Machine Translation Model Incorporating Prior Information

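Cite this article:	FAN Wenting, HOU Hongxu, WANG Hongbin, WU Jing, LI Jinting. Mongolian-Chinese neural machine translation model incorporating prior information (融合先验信息的蒙汉神经网络机器翻译模型)[J]. Journal of Chinese Information Processing (中文信息学报), 2018, 32(6): 36-43.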

Authors:	FAN Wenting (樊文婷), HOU Hongxu (侯宏旭), WANG Hongbin (王洪彬), WU Jing (武静), LI Jinting (李金廷)

Affiliation:	College of Computer Science, Inner Mongolia University, Hohhot, Inner Mongolia 010021, China

Funding:	National Natural Science Foundation of China (61362028)

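Abstract:	Neural machine translation (NMT) models have achieved good results on Mongolian-to-Chinese translation. An NMT model, however, learns its word vectors from the bilingual corpus alone, and the limited size of that corpus restricts the quality of the word representations. This paper incorporates prior information into NMT. First, word vectors trained on a large-scale monolingual corpus are used as the initial word vectors of the translation model, and part-of-speech features are added to the word vectors to alleviate the grammatical ambiguity of words. Second, the target vocabulary is usually restricted in size to reduce the computational complexity of the decoder and the training time of the model, which produces a large number of out-of-vocabulary words. To alleviate this problem, the paper uses the word vectors with part-of-speech features to compute the similarity between words and replaces each out-of-vocabulary word with the most similar word in the target vocabulary. Experiments show that the approach raises the BLEU score of the translations by 2.68 points on the Mongolian-to-Chinese translation task.
Keywords:	recurrent neural network; out-of-vocabulary words; word vectors; part-of-speech tagging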
Mongolian-Chinese Neural Machine Translation with Priori Information

FAN Wenting, HOU Hongxu, WANG Hongbin, WU Jing, LI Jinting. Mongolian-Chinese Neural Machine Translation with Priori Information[J]. Journal of Chinese Information Processing, 2018, 32(6): 36-43.

Authors:	FAN Wenting, HOU Hongxu, WANG Hongbin, WU Jing, LI Jinting

Affiliation:	College of Computer Science, Inner Mongolia University, Hohhot, Inner Mongolia 010021, China

Abstract:	Neural machine translation (NMT) has become a prominent approach to the Mongolian-Chinese translation task. We implement a neural machine translation model that incorporates prior information. On the one hand, we train word representations on a large-scale monolingual corpus and use them as the initial word vectors. On the other hand, we add part-of-speech features to the word vectors to alleviate grammatical ambiguity. To address the out-of-vocabulary problem, we use the word embeddings to calculate the similarity between words and then replace each out-of-vocabulary word with the most similar word covered by the target vocabulary. On the Mongolian-Chinese machine translation task, experimental results show that the BLEU score increases by 2.68 points.

Keywords:	recurrent neural network; out-of-vocabulary; word embedding; part-of-speech
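
The out-of-vocabulary replacement described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the part-of-speech feature is encoded or which similarity measure is used, so the one-hot encoding, the cosine similarity, the function names, and the toy English words and vectors below are all assumptions made for illustration, not the authors' implementation.

```python
import math

# Hypothetical tag set; the paper's actual part-of-speech inventory is not given.
POS_TAGS = ["NOUN", "VERB"]


def with_pos(vector, pos):
    """Append a one-hot part-of-speech feature to a word vector (assumed encoding)."""
    one_hot = [1.0 if tag == pos else 0.0 for tag in POS_TAGS]
    return list(vector) + one_hot


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def replace_oov(tokens, embeddings, vocabulary):
    """Replace each out-of-vocabulary token with its most similar in-vocabulary word."""
    candidates = [w for w in vocabulary if w in embeddings]
    result = []
    for token in tokens:
        # Keep in-vocabulary words, and words that have no vector to compare with.
        if token in vocabulary or token not in embeddings or not candidates:
            result.append(token)
            continue
        best = max(candidates, key=lambda w: cosine(embeddings[token], embeddings[w]))
        result.append(best)
    return result


# Toy vectors standing in for embeddings pretrained on a monolingual corpus.
embeddings = {
    "city": with_pos([0.9, 0.1], "NOUN"),
    "run": with_pos([0.1, 0.9], "VERB"),
    "town": with_pos([0.8, 0.2], "NOUN"),  # outside the restricted vocabulary
}
vocabulary = ["city", "run"]
replaced = replace_oov(["town", "run"], embeddings, vocabulary)
```

In this toy setup "town" lies outside the restricted vocabulary and is mapped to "city", its nearest neighbour with the same part-of-speech feature, while "run" is left unchanged.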
