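A Method for Simultaneous Recognition of Foreign and Chinese Person Names in Chinese Text
Cite this article: GAO Hong, HUANG De-gen, YANG Yuan-sheng. A method for simultaneous recognition of foreign and Chinese person names in Chinese text [J]. Mini-micro Systems (小型微型计算机系统), 2006, 27(4): 715-719.
Authors: GAO Hong  HUANG De-gen  YANG Yuan-sheng
Affiliation: Department of Computer Science and Engineering, Dalian University of Technology, Dalian, Liaoning 116024, China
Funding: Project funded by the Chinese Academy of Sciences; Specialized Research Fund for the Doctoral Program of Higher Education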
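Abstract: Candidate Chinese and foreign person names are generated from the compositional characteristics of each kind of name and then added as node words to the word-segmentation directed graph of the sentence. Context information is used to assign weights to the edges of the graph so that its shortest path corresponds to the correct segmentation of the sentence. Foreign and Chinese person names are identified at the moment the correct segmentation is determined. Because recognition does not depend on a prior segmentation result, the method avoids cases where a name cannot be recalled owing to a segmentation error, which improves the recall of person name recognition. In tests on a real corpus, the method reached a combined F-value of 97.30% for Chinese and foreign person names in the closed test.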

Keywords: Chinese automatic word segmentation  person name recognition  unknown word recognition
Article ID: 1000-1220(2006)04-0715-05
Received: 2004-12-03
Revised: 2004-12-03

Foreign Person Names and Chinese Person Names Recognition in Chinese Texts
GAO Hong,HUANG De-gen,YANG Yuan-sheng.Foreign Person Names and Chinese Person Names Recognition in Chinese Texts[J].Mini-micro Systems,2006,27(4):715-719.
Authors:GAO Hong  HUANG De-gen  YANG Yuan-sheng
Affiliation:Department of Computer Science and Engineering, Dalian University of Technology, Dalian 116024, China
Abstract: Foreign person name (FP-name) and Chinese person name (CP-name) candidates are generated according to their inherent characteristics. All candidates are then added to the segmentation digraph of a sentence as vertices, and each edge of the digraph is assigned a weight using statistics derived from the training corpus, so that the shortest path through the digraph corresponds to the correct segmentation of the sentence. FP-names and CP-names are recognized when the correct segmentation is selected. The proposed method thereby avoids person-name errors caused by segmentation. Experimental results show an F-value of 97.30% in the closed test.
Keywords:Chinese word segmentation  person name recognition  unknown words recognition
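
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the lexicon, the surname list, the transliteration character set, and all costs below are hypothetical toy values, and the graph is simplified to a lattice over character positions in which every word or name candidate is a weighted edge. In the paper, the weights come from context statistics in a training corpus.

```python
from typing import Dict, List, Tuple

# Hypothetical toy resources (assumptions, not taken from the paper).
LEXICON: Dict[str, float] = {"和": 2.0, "见面": 3.0, "了": 2.0, "见": 4.0, "面": 4.0}
SURNAMES = {"王", "李", "张"}               # triggers Chinese-name candidates
TRANSLIT_CHARS = set("克林顿布什斯尔特")     # characters typical of transliterated names
UNKNOWN_CHAR_COST = 10.0                    # cost of an out-of-lexicon single character
CP_NAME_COST = 5.0                          # cost of a Chinese person name candidate
FP_NAME_COST = 6.0                          # cost of a foreign person name candidate

Edge = Tuple[int, int, str, float]          # (start, end, tag, cost)


def build_edges(sentence: str) -> List[Edge]:
    """Collect lexicon words and name candidates as edges of one lattice."""
    n = len(sentence)
    edges: List[Edge] = []
    for i in range(n):
        ch = sentence[i]
        # Single-character fallback keeps every position reachable.
        edges.append((i, i + 1, "W", LEXICON.get(ch, UNKNOWN_CHAR_COST)))
        # Multi-character lexicon words.
        for j in range(i + 2, n + 1):
            if sentence[i:j] in LEXICON:
                edges.append((i, j, "W", LEXICON[sentence[i:j]]))
        # CP-name candidates: a surname plus a one- or two-character given name.
        if ch in SURNAMES:
            for j in (i + 2, i + 3):
                if j <= n:
                    edges.append((i, j, "CP", CP_NAME_COST))
        # FP-name candidates: runs of two or more transliteration characters.
        j = i
        while j < n and sentence[j] in TRANSLIT_CHARS:
            j += 1
            if j - i >= 2:
                edges.append((i, j, "FP", FP_NAME_COST))
    return edges


def segment(sentence: str) -> List[Tuple[str, str]]:
    """Return the minimum-cost segmentation as (token, tag) pairs."""
    n = len(sentence)
    best = [float("inf")] * (n + 1)
    back: List[Tuple[int, str]] = [(0, "")] * (n + 1)
    best[0] = 0.0
    # Edges only go forward, so sorting by start gives a topological order.
    for start, end, tag, cost in sorted(build_edges(sentence)):
        if best[start] + cost < best[end]:
            best[end] = best[start] + cost
            back[end] = (start, tag)
    tokens: List[Tuple[str, str]] = []
    pos = n
    while pos > 0:
        start, tag = back[pos]
        tokens.append((sentence[start:pos], tag))
        pos = start
    return tokens[::-1]


if __name__ == "__main__":
    for token, tag in segment("王小明和克林顿见面了"):
        print(token, tag)
```

Because the name candidates compete with ordinary words inside the same shortest-path search, a name is recognized as a by-product of choosing the segmentation rather than in a separate pass over an already segmented sentence.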
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.