Study on Japanese Word Segmentation and POS Tagging Based on Rules and Statistics


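Cite this article:	JIANG Shangpu, CHEN Qunxiu. Study on Japanese Word Segmentation and POS Tagging Based on Rules and Statistics[J]. Journal of Chinese Information Processing, 2010, 24(1): 117-123.

Authors:	JIANG Shangpu, CHEN Qunxiu

Affiliation:	1. National Laboratory for Information Science and Technology, Tsinghua University, Beijing 100084; 2. Department of Computer Science and Technology, Tsinghua University, Beijing 100084

Funding:	Key project funded by the National 863 Program (2006AA010109)

Abstract:	Japanese word segmentation and part-of-speech (POS) tagging are the first step in natural language processing tasks, such as machine translation, that take Japanese as the source language. This paper proposes a Japanese word segmentation and POS tagging method based on rules and statistics. It uses a single-perceptron-based joint word segmentation and POS tagging algorithm as the basic framework and adds rule-based adjacency attributes of words as features. Experimental results on a small-scale test set show that the method achieves an F-score of 98.2% for word segmentation and 94.8% for word segmentation plus POS tagging. The method has been successfully applied in a Japanese-Chinese machine translation system.
Keywords:	artificial intelligence; machine translation; Japanese-Chinese machine translation system; Japanese word segmentation; Japanese POS tagging; joint word segmentation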
Study on Japanese Word Segmentation and POS Tagging Based on Rules and Statistics

JIANG Shangpu,CHEN Qunxiu.Study on Japanese Word Segmentation and POS Tagging Based on Rules and Statistics[J].Journal of Chinese Information Processing,2010,24(1):117-123.

Authors:	JIANG Shangpu, CHEN Qunxiu

Affiliation:	1. National Laboratory for Information Science and Technology, Tsinghua University, Beijing 100084, China; 2. Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China

Abstract:	Word segmentation and part-of-speech (POS) tagging are the first step of Japanese natural language processing tasks, such as machine translation in which Japanese is the source language. In this paper, a Japanese word segmentation and POS tagging approach based on rules and statistics is proposed. Adopting a single-perceptron-based joint word segmentation and POS tagging algorithm as the basic framework, this method is combined with the features of adjacency attributes, which are derived by heuristic rules. The experimen...

Keywords:	artificial intelligence; machine translation; Japanese-Chinese machine translation system; Japanese word segmentation; Japanese POS tagging; joint word segmentation
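
The abstract reports F-scores for segmentation alone and for segmentation plus POS tagging. The sketch below shows one common way such scores are computed, by comparing character spans of predicted and gold words. It is an illustration only: the exact evaluation protocol of the paper is not given on this page, and the function names, example sentence, and tag labels (`N`, `P`, `AUX`, `V`) are hypothetical.

```python
def spans(tokens):
    """Turn a list of (word, tag) pairs into a set of (start, end, tag) character spans."""
    out, pos = set(), 0
    for word, tag in tokens:
        out.add((pos, pos + len(word), tag))
        pos += len(word)
    return out


def f_score(gold, pred, use_tags=False):
    """F-score over word spans; with use_tags=True the POS tag must match as well."""
    g, p = spans(gold), spans(pred)
    if not use_tags:
        # Segmentation only: ignore the tag component of each span.
        g = {(s, e) for s, e, _ in g}
        p = {(s, e) for s, e, _ in p}
    correct = len(g & p)
    if correct == 0:
        return 0.0
    precision = correct / len(p)
    recall = correct / len(g)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: the prediction over-splits one word and mis-tags another.
gold = [("私", "N"), ("は", "P"), ("学生", "N"), ("です", "AUX")]
pred = [("私", "N"), ("は", "P"), ("学", "N"), ("生", "N"), ("です", "V")]

print(round(f_score(gold, pred), 3))                 # 0.667 (segmentation only)
print(round(f_score(gold, pred, use_tags=True), 3))  # 0.444 (segmentation + POS)
```

The joint score can never exceed the segmentation score under this definition, because a word counts as correct only if both its boundaries and its tag match, which is consistent with the gap between the two figures in the abstract.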
This article is indexed in CNKI, Wanfang Data, and other databases.