
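A Study of an Annotation System for an English-Chinese Clause-Aligned Corpus Based on Component Sharing
Cite this article: GE Shili (葛诗利), SONG Rou (宋柔). A Study of an Annotation System for an English-Chinese Clause-Aligned Corpus Based on Component Sharing[J]. Journal of Chinese Information Processing (中文信息学报), 2020, 34(6): 27-35.
Authors: GE Shili (葛诗利)  SONG Rou (宋柔)
Affiliations: 1. Laboratory of Language and Artificial Intelligence, Guangdong University of Foreign Studies, Guangzhou, Guangdong 510420, China;
2. School of Information Science, Beijing Language and Culture University, Beijing 100083, China
Funding: National Natural Science Foundation of China (61672175); Key Project of the State Language Commission (ZDI135-30)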
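Abstract (translated from the Chinese): An English-Chinese clause-aligned corpus serves research on, and applications of, the correspondence between the grammatical structures of English and Chinese clauses, and it matters for linguistic theory and for translation, both human and machine. Earlier grammatical theories and related corpus work did not adequately define the clause complex or the clause; this theoretical gap makes them hard to use in natural language processing. This paper first lays the theoretical groundwork for building an English-Chinese clause-aligned corpus. Starting from the recently proposed theory of the Chinese clause complex, it defines the concept of component sharing and uses naming sharing and quotation sharing to define the English clause and clause complex, so that both are functionally complete and functionally unitary. On this basis, it designs an annotation system for English-Chinese clause alignment, comprising the annotation of English NT clauses and the generation and combination of their Chinese translations. Annotation of the corpus shows that, at the clause-complex level, the units involved in the structural transformations of English-Chinese translation can be restricted to English clauses together with their namings and tellings, with no need to reach into the internal structure of a naming or a telling. The resulting English-Chinese clause-aligned corpus provides structurally annotated samples for research on the languages themselves and for applications such as English-Chinese contrastive studies and English-Chinese machine translation.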

Keywords (translated from the Chinese): component sharing  naming sharing  clause  clause complex  English-Chinese machine translation

English-Chinese Clause Alignment Corpus Tagging System Based on Component Sharing
GE Shili, SONG Rou. English-Chinese Clause Alignment Corpus Tagging System Based on Component Sharing[J]. Journal of Chinese Information Processing, 2020, 34(6): 27-35.
Authors:GE Shili  SONG Rou
Affiliation: 1. Laboratory of Language and Artificial Intelligence, Guangdong University of Foreign Studies, Guangzhou, Guangdong 510420, China;
2.School of Information Science, Beijing Language and Culture University, Beijing 100083, China
Abstract: An English-Chinese clause alignment corpus serves the study and application of grammatical structure correspondences between English and Chinese clauses. It is of great significance to linguistic theory and to translation, both human and machine. Previous work on grammar theory and on corpora has not adequately defined the clause and the clause complex; this theoretical defect makes it difficult to support natural language processing applications. This paper first makes theoretical preparations for the construction of an English-Chinese clause alignment corpus. Starting from the theory of the Chinese clause complex proposed in recent years, it defines the concept of component sharing and then defines the English clause and clause complex on the basis of naming sharing and quotation sharing, which gives the clause and the clause complex functional integrity and unity. On this basis, an English-Chinese clause alignment annotation system is designed, including English NT clause tagging and the generation and combination of Chinese translations. The corpus annotation shows that, at the clause complex level, the components involved in the structural transformations of English-Chinese translation can be limited to English clauses and their namings and tellings, without involving the internal structure of namings and tellings. Based on this work, the English-Chinese clause-aligned corpus provides structurally annotated samples for linguistic research, English-Chinese language comparison, and English-Chinese machine translation.
Keywords:component sharing  naming sharing  clause  clause complex  English-Chinese machine translation  
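
As a rough illustration of naming sharing, the sketch below shows one way a clause complex with a single shared naming could be expanded into self-contained naming-telling (NT) clauses. It is not the authors' annotation format: the `ClauseComplex` class, the `expand_nt_clauses` function, and the example sentence are all hypothetical, and the sketch assumes the simplest case, in which every telling shares the same naming.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ClauseComplex:
    """Hypothetical container: one shared naming followed by several tellings."""
    naming: str          # the shared, topic-like component (assumed to be a plain string)
    tellings: List[str]  # each telling says something about the shared naming


def expand_nt_clauses(cc: ClauseComplex) -> List[str]:
    """Restore the shared naming in front of every telling.

    Each returned string is a self-contained NT clause: naming + telling.
    """
    return [f"{cc.naming} {telling}" for telling in cc.tellings]


# Invented example sentence, not taken from the corpus described above.
example = ClauseComplex(
    naming="The old house",
    tellings=["stood on a hill", "had been empty for years", "was sold last spring"],
)
nt_clauses = expand_nt_clauses(example)
for clause in nt_clauses:
    print(clause)
```

Treating the naming and each telling as opaque strings mirrors the abstract's observation that alignment at the clause-complex level need not look inside a naming or a telling. A real annotation scheme would also have to handle nested sharing and quotation sharing, which this sketch leaves out.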