Funding: National Natural Science Foundation of China (61866020); Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2020AAA0107904); Yunnan Applied Basic Research Program (2019FB082)
Received: 2021-11-19
Revised: 2021-11-25

Neural machine translation method based on source language syntax enhanced decoding
Longchao GONG, Junjun GUO, Zhengtao YU. Neural machine translation method based on source language syntax enhanced decoding[J]. Journal of Computer Applications, 2022, 42(11): 3386-3394.
Authors:Longchao GONG  Junjun GUO  Zhengtao YU
Affiliation:Faculty of Information Engineering and Automation,Kunming University of Science and Technology,Kunming Yunnan 650504,China
Key Laboratory of Artificial Intelligence in Yunnan Province (Kunming University of Science and Technology),Kunming Yunnan 650504,China
Abstract: Transformer, one of the best-performing machine translation models, is based on the standard end-to-end structure and relies only on parallel sentence pairs, on the assumption that the model can automatically learn the knowledge in the corpus. However, this modeling approach lacks explicit guidance and cannot effectively mine deep linguistic knowledge; especially in low-resource environments with limited corpus size and quality, sentence decoding lacks prior constraints, which degrades translation quality. To alleviate these issues, a neural machine translation method based on Source language Syntax Enhanced Decoding (SSED) was proposed, which explicitly introduces the syntactic information of the source sentence to guide decoding. First, a syntax-aware mask mechanism was constructed from the syntactic information of the source sentence, guiding the encoder self-attention to generate an additional syntax-dependent representation. Then, this syntax-dependent representation was used as a supplement to the original sentence representation and fused into the decoding process through the attention mechanism, jointly guiding the generation of the target language and realizing prior syntactic enhancement of the model. Experimental results on several standard IWSLT (International Conference on Spoken Language Translation) and WMT (Conference on Machine Translation) machine translation evaluation task test sets show that, compared with the baseline model Transformer, the proposed method improves the BLEU score by 0.84 to 3.41, achieving state-of-the-art results among syntax-related studies. The fusion of syntactic information with the self-attention mechanism is effective: using source language syntax can guide the decoding process of a neural machine translation system and significantly improve translation quality.
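The abstract describes a syntax-aware mask that guides encoder self-attention, but gives no implementation details. The following is a minimal NumPy sketch under the assumption that the mask restricts each token's attention to its dependency-tree neighbors (itself, its head, and its children); the function names, the neighbor definition, and the single-head formulation are illustrative, not taken from the paper.

```python
import numpy as np

def syntax_mask(heads):
    """Build a boolean syntax-aware self-attention mask from dependency heads.

    heads[i] is the index of token i's head (parent), with the root pointing
    to itself. Position (i, j) is True when token i may attend to token j:
    itself, its head, or one of its children.
    """
    n = len(heads)
    mask = np.eye(n, dtype=bool)          # every token attends to itself
    for i, h in enumerate(heads):
        mask[i, h] = True                 # child attends to head
        mask[h, i] = True                 # head attends to child
    return mask

def masked_self_attention(x, mask):
    """Scaled dot-product self-attention restricted by a boolean mask.

    x: (n, d) token representations; non-neighbor positions are blocked
    before the softmax, yielding a syntax-dependent representation.
    """
    d = x.shape[-1]
    scores = (x @ x.T) / np.sqrt(d)
    scores = np.where(mask, scores, -1e9)  # block non-syntactic positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x

# Illustrative use: a 4-token sentence whose heads are [1, 1, 1, 2]
# (tokens 0 and 2 depend on token 1, the root; token 3 depends on token 2).
m = syntax_mask([1, 1, 1, 2])
out = masked_self_attention(np.random.default_rng(0).random((4, 8)), m)
```

In the method as described, a representation like `out` would supplement the ordinary encoder output and be fused into the decoder through an additional attention sub-layer; that fusion step is not sketched here.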
Keywords: natural language processing (NLP); neural machine translation; syntactic information; Transformer; enhanced decoding; external knowledge incorporation