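Chinese-Vietnamese Unsupervised Neural Machine Translation Incorporating an EMD-Minimized Bilingual Dictionary
Cite this article:XUE Mingya, YU Zhengtao, WEN Yonghua, YU Zhiqiang. Chinese-Vietnamese Unsupervised Neural Machine Translation Incorporating an EMD-Minimized Bilingual Dictionary[J]. Journal of Chinese Information Processing, 2021, 35(3): 43-50.
Authors:XUE Mingya  YU Zhengtao  WEN Yonghua  YU Zhiqiang
Affiliations:1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, Yunnan 650500, China;
2. Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming, Yunnan 650500, China
Funding:National Key Research and Development Program of China (2019QY1801); National Natural Science Foundation of China (61732005, 61672271, 61761026, 61762056, 61866020); Yunnan Provincial High-Tech Industry Special Project (201606)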
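Abstract:Neural machine translation achieves very good results on tasks with ample parallel corpora, but it often performs poorly on translation tasks involving low-resource languages. There is no large-scale parallel corpus between Chinese and Vietnamese. For this translation task, this paper explores using only easily obtained Chinese and Vietnamese monolingual corpora: word-level cross-lingual information is mined from the monolingual corpora and incorporated into an unsupervised translation model to improve translation performance. The paper proposes a Chinese-Vietnamese unsupervised neural machine translation method that incorporates a bilingual dictionary obtained by minimizing the Earth Mover's Distance (EMD). First, monolingual word embeddings are trained separately for Chinese and Vietnamese, and a Chinese-Vietnamese bilingual dictionary is obtained by minimizing the EMD between them. This dictionary is then used as a seed dictionary to train Chinese-Vietnamese bilingual word embeddings. Finally, an unsupervised machine translation model with a shared encoder is used to build the Chinese-Vietnamese unsupervised neural machine translation method. Experiments show that the method effectively improves the performance of Chinese-Vietnamese unsupervised neural machine translation.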

Keywords:unsupervised learning  EMD  Chinese-Vietnamese  neural machine translation  
Received:2019-12-19
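
The dictionary-induction step described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's training procedure. It assumes, for illustration only, that both vocabularies have the same size, that every word carries equal weight, and that the two embedding sets already lie in a comparable space. Under those assumptions the EMD equals the average cost of the cheapest one-to-one matching, which is found here by brute force. The function name `emd_dictionary` and the toy vectors are hypothetical.

```python
from itertools import permutations
from math import dist


def emd_dictionary(src_vecs, tgt_vecs):
    """Return (emd, pairs) for two equal-sized, uniformly weighted vector sets.

    With n source and n target words of weight 1/n each, an optimal
    transport plan is a one-to-one matching, so the EMD is the minimum
    total matching cost divided by n.
    """
    src_words = list(src_vecs)
    tgt_words = list(tgt_vecs)
    if len(src_words) != len(tgt_words):
        raise ValueError("this sketch assumes equal vocabulary sizes")
    n = len(src_words)

    # Ground cost: Euclidean distance between every source/target pair.
    cost = [[dist(src_vecs[s], tgt_vecs[t]) for t in tgt_words]
            for s in src_words]

    # Brute force over all matchings; only feasible for toy vocabularies.
    best_cost, best_perm = float("inf"), None
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_cost:
            best_cost, best_perm = total, perm

    pairs = {src_words[i]: tgt_words[best_perm[i]] for i in range(n)}
    return best_cost / n, pairs


# Made-up 2-D "embeddings" (illustrative values, not trained vectors).
zh = {"猫": (1.0, 0.0), "狗": (0.9, 0.3), "水": (0.0, 1.0)}
vi = {"nước": (0.1, 1.0), "mèo": (1.0, 0.1), "chó": (0.8, 0.4)}

emd, pairs = emd_dictionary(zh, vi)
print(pairs)
print(round(emd, 4))
```

For realistic vocabulary sizes the brute-force search would have to be replaced by an assignment or optimal-transport solver; the sketch only shows how a transport plan can be read off as a word-level dictionary.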

Chinese-Vietnamese Unsupervised Neural Machine Translation Based on EMD Minimal Bilingual Dictionary
XUE Mingya,YU Zhengtao,WEN Yonghua,YU Zhiqiang.Chinese-Vietnamese Unsupervised Neural Machine Translation Based on EMD Minimal Bilingual Dictionary[J].Journal of Chinese Information Processing,2021,35(3):43-50.
Authors:XUE Mingya  YU Zhengtao  WEN Yonghua  YU Zhiqiang
Affiliation:1.Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, Yunnan 650500, China;2.Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming, Yunnan 650500, China
Abstract:Neural machine translation (NMT) performs well when parallel corpora are plentiful but often performs poorly on low-resource language pairs. No large-scale parallel corpus exists for Chinese and Vietnamese, so we explore using only easily obtained Chinese and Vietnamese monolingual corpora, mining word-level cross-lingual information from them and incorporating it into an unsupervised translation model. We propose a Chinese-Vietnamese unsupervised NMT method that incorporates a bilingual dictionary induced by minimizing the Earth Mover's Distance (EMD). First, monolingual word embeddings for Chinese and Vietnamese are trained independently, and a Chinese-Vietnamese bilingual dictionary is obtained by minimizing the EMD between them. This dictionary then serves as a seed dictionary for training Chinese-Vietnamese bilingual word embeddings. Finally, a shared-encoder unsupervised machine translation model is used to build the Chinese-Vietnamese unsupervised NMT system. Experiments show that this method effectively improves the performance of Chinese-Vietnamese unsupervised NMT.
Keywords:unsupervised learning  Earth Mover's Distance  Chinese-Vietnamese  neural machine translation  
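
The abstract does not say how the seed dictionary is used to train the bilingual word embeddings. One common approach, assumed here purely for illustration, is to fit a mapping from one embedding space to the other using the seed pairs. The Python sketch below fits a least-squares rotation in two dimensions, which has a closed form; the names `fit_rotation` and `rotate` and all vectors are hypothetical, and the target vectors are constructed as exact rotations of the source vectors so that the recovered angle can be checked.

```python
from math import atan2, cos, sin


def rotate(vec, theta):
    """Rotate a 2-D vector counter-clockwise by theta radians."""
    x, y = vec
    return (cos(theta) * x - sin(theta) * y,
            sin(theta) * x + cos(theta) * y)


def fit_rotation(seed_pairs):
    """Angle of the rotation R minimizing sum ||R x - y||^2 over seed pairs.

    Expanding the objective shows that it is minimized by maximizing
    cos(theta) * sum(x . y) + sin(theta) * sum(x cross y),
    which gives theta = atan2(sum of cross products, sum of dot products).
    """
    dot = sum(x[0] * y[0] + x[1] * y[1] for x, y in seed_pairs)
    cross = sum(x[0] * y[1] - x[1] * y[0] for x, y in seed_pairs)
    return atan2(cross, dot)


# Toy setup: the Vietnamese space is the Chinese space rotated by 0.5 rad.
zh = {"猫": (1.0, 0.0), "狗": (0.9, 0.3), "水": (0.0, 1.0), "火": (-0.5, 0.8)}
translations = {"猫": "mèo", "狗": "chó", "水": "nước", "火": "lửa"}
vi = {translations[w]: rotate(v, 0.5) for w, v in zh.items()}

# Seed dictionary: three known pairs; "火" is held out.
seed = {"猫": "mèo", "狗": "chó", "水": "nước"}
theta = fit_rotation([(zh[s], vi[t]) for s, t in seed.items()])

# Map the held-out word into the Vietnamese space with the fitted rotation.
mapped = rotate(zh["火"], theta)
print(theta)
print(mapped, vi["lửa"])
```

In higher dimensions the same least-squares idea is usually solved with a singular value decomposition rather than a single angle; the two-dimensional case is used here only because it needs no external library.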