
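Similar Ancient Chinese Sentence Retrieval Based on Automatic Sentence Alignment
Citation: GUO Rui, SONG Ji-hua, LIAO Min. Similar Ancient Chinese Sentence Retrieval Based on Automatic Sentence Alignment[J]. Journal of Chinese Information Processing, 2008, 22(2): 87-91, 105.
Authors: GUO Rui  SONG Ji-hua  LIAO Min
Affiliation: College of Information Science and Technology, Beijing Normal University, Beijing 100875
Funding: National Social Science Fund of China (05BYY022)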
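Abstract: With the rise of corpus linguistics, example-based machine translation (EBMT) has attracted growing research attention. Example-based translation between ancient and modern Chinese must solve two problems: how to build a large-scale parallel corpus of ancient and modern Chinese quickly and accurately, and how to retrieve, from a large set of aligned (sentence-level) examples, the source sentence most similar to an input sentence. This paper proposes a mutual translation model for ancient and modern Chinese sentences that jointly considers three factors: sentence length, Chinese character form, and punctuation. Based on a genetic algorithm and dynamic programming, it implements automatic sentence alignment of ancient and modern Chinese. It then builds a full-text index over the ancient Chinese sentences and, using the information entropy of Chinese characters, designs and implements an efficient algorithm for retrieving the most similar ancient sentence. Finally, experimental results for both automatic sentence alignment and most-similar ancient sentence retrieval are reported.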
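The abstract names the ingredients of the alignment step (a sentence-pair model over length, character form, and punctuation, searched with dynamic programming) but not the formulas. The following is a minimal Gale-Church-style dynamic-programming aligner written under stated assumptions, not the authors' model: shared characters stand in for the character-form factor, and the weights `W_LEN`, `W_CHAR`, `W_PUNCT`, the length ratio `RATIO`, the skip penalty `SKIP_COST`, and the allowed bead types are all hypothetical. The genetic algorithm the paper also uses is not reproduced; tuning weights like these is one plausible role for it, but that too is an assumption.

```python
# Hypothetical sketch of DP sentence alignment (not the paper's exact model).
PUNCT = set("，。；：？！、“”‘’（）《》,.;:?!")

# All constants below are illustrative assumptions, not values from the paper.
W_LEN, W_CHAR, W_PUNCT = 1.0, 2.0, 0.5  # weights of the three factors
RATIO = 1.5       # assumed modern/ancient length ratio (in characters)
SKIP_COST = 3.0   # assumed penalty for leaving a sentence unaligned
BEADS = [(1, 1), (1, 0), (0, 1), (2, 1), (1, 2)]  # (ancient, modern) sentence counts


def pair_cost(anc: str, mod: str) -> float:
    """Lower is better: combines length, shared-character, and punctuation costs."""
    a_chars = [c for c in anc if c not in PUNCT]
    m_chars = [c for c in mod if c not in PUNCT]
    # Length factor: relative deviation from the expected modern length.
    expected = RATIO * len(a_chars)
    len_cost = abs(len(m_chars) - expected) / (expected + 1)
    # Character factor: share of ancient characters reused in the modern text.
    m_set = set(m_chars)
    shared = sum(1 for c in a_chars if c in m_set)
    char_cost = 1 - shared / len(a_chars) if a_chars else 1.0
    # Punctuation factor: mismatch in punctuation counts.
    a_p = sum(1 for c in anc if c in PUNCT)
    m_p = sum(1 for c in mod if c in PUNCT)
    punct_cost = abs(a_p - m_p) / (max(a_p, m_p) + 1)
    return W_LEN * len_cost + W_CHAR * char_cost + W_PUNCT * punct_cost


def align(ancient: list[str], modern: list[str]) -> list[tuple[list[str], list[str]]]:
    """Return the minimum-cost monotone sequence of beads covering both texts."""
    n, m = len(ancient), len(modern)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for di, dj in BEADS:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                if di == 0 or dj == 0:
                    step = SKIP_COST
                else:
                    step = pair_cost("".join(ancient[i:ni]), "".join(modern[j:nj]))
                if cost[i][j] + step < cost[ni][nj]:
                    cost[ni][nj] = cost[i][j] + step
                    back[ni][nj] = (i, j)
    # Trace back from the end to recover the beads in order.
    beads = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((ancient[pi:i], modern[pj:j]))
        i, j = pi, pj
    beads.reverse()
    return beads
```

For example, aligning 学而时习之，不亦说乎？ and 有朋自远方来，不亦乐乎？ against modern renderings of each yields two 1-1 beads: the shared characters make the matching pairs cheap, while any path that leaves a sentence unaligned pays at least `SKIP_COST`.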

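Keywords: computer application  Chinese information processing  parallel corpus of ancient and modern Chinese  sentence alignment  similar sentences  example-based machine translation
Article ID: 1003-0077(2008)02-0087-05
Received: 2007-04-10
Revised: 2007-12-27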

Ancient Sentence Search Based on Sentence Auto-Alignment in Parallel Corpus of Ancient and Modern Chinese
GUO Rui,SONG Ji-hua,LIAO Min.Ancient Sentence Search Based on Sentence Auto-Alignment in Parallel Corpus of Ancient and Modern Chinese[J].Journal of Chinese Information Processing,2008,22(2):87-91,105.
Authors:GUO Rui  SONG Ji-hua  LIAO Min
Affiliation:Information Science and Technology College, Beijing Normal University, Beijing 100875,China
Abstract: With the rise of corpus linguistics, research on Example-Based Machine Translation (EBMT) has attracted increasing attention. Two problems must be solved in this area: 1) constructing a large-scale parallel corpus quickly and accurately; 2) retrieving, from a large set of aligned examples, the sentence most similar to the input sentence. This paper addresses EBMT between ancient and modern Chinese. First, a new translation model was built that takes sentence length, character information, and punctuation into account jointly. Then, a new approach for automatically aligning bilingual sentences was proposed based on a genetic algorithm and dynamic programming. Finally, after a full-text index was built over the ancient sentences, a new similarity retrieval method was given based on the information entropy of Chinese characters. Experimental results showed that the methods achieved good performance.
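The abstract states that retrieval uses a full-text index over the ancient sentences and a similarity based on the information entropy of Chinese characters, without giving the formula. The following is a minimal sketch under that reading, not the authors' algorithm: an inverted index maps each character to the sentences containing it, so only sentences sharing at least one character with the query are scored; each character is weighted by its self-information -log2 p(c) estimated from corpus frequencies (an assumed stand-in for the entropy measure); and candidates are ranked by a weighted Dice coefficient. The class `SimilarSentenceIndex` and its methods are hypothetical.

```python
import math
from collections import Counter, defaultdict

PUNCT = set("，。；：？！、“”‘’（）《》,.;:?!")


def chars(s: str) -> list[str]:
    """Characters of s with punctuation and whitespace removed."""
    return [c for c in s if c not in PUNCT and not c.isspace()]


class SimilarSentenceIndex:  # hypothetical name, for illustration only
    def __init__(self, sentences: list[str]):
        self.sentences = sentences
        self.postings = defaultdict(set)  # inverted index: character -> sentence ids
        freq = Counter()
        for sid, s in enumerate(sentences):
            cs = chars(s)
            freq.update(cs)
            for c in cs:
                self.postings[c].add(sid)
        total = sum(freq.values())
        # Self-information in bits (assumed stand-in for the paper's entropy
        # measure): rare characters weigh more than frequent ones.
        self.weight = {c: -math.log2(n / total) for c, n in freq.items()}

    def search(self, query: str, k: int = 1) -> list[tuple[float, str]]:
        """Return up to k (score, sentence) pairs, best first."""
        known = {c for c in chars(query) if c in self.weight}
        q_weight = sum(self.weight[c] for c in known)
        if q_weight == 0:
            return []
        # Only sentences sharing at least one character with the query are scored.
        candidates = set().union(*(self.postings[c] for c in known))
        scored = []
        for sid in candidates:
            s_chars = set(chars(self.sentences[sid]))
            shared = sum(self.weight[c] for c in known & s_chars)
            s_weight = sum(self.weight[c] for c in s_chars)
            # Weighted Dice coefficient over the two character sets.
            scored.append((2 * shared / (q_weight + s_weight), self.sentences[sid]))
        scored.sort(key=lambda t: t[0], reverse=True)
        return scored[:k]
```

With a three-sentence corpus, the query 人不知而不愠 ranks 人不知而不愠，不亦君子乎？ first. Frequent characters such as 不 and 亦 carry little weight, so a sentence that shares only those with the query scores low; the index also keeps sentences with no shared character out of the scoring loop entirely.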
Keywords:computer application  Chinese information processing  parallel corpus of ancient and modern Chinese  sentence alignment  similar sentence  EBMT
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.