
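English Base Noun Phrase Identification Based on a Hybrid Strategy: Combining Boundary Statistics with Part-of-Speech String Rule Correction
Cite this article: Liang Yinghong, Zhao Tiejun, Yao Jianmin, Yu Hao, Xu Bing. English Base Noun Phrase Identification Based on a Hybrid Strategy: Combining Boundary Statistics with Part-of-Speech String Rule Correction [J]. Computer Engineering and Applications, 2004, 40(35): 1-3, 121.
Authors: Liang Yinghong, Zhao Tiejun, Yao Jianmin, Yu Hao, Xu Bing
Affiliations: 1. College of Information and Computer Engineering, Northeast Forestry University, Harbin 150001; School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001
2. School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001
Funding: Supported by the National Natural Science Foundation of China (Nos. 60302021, 60375019); the National 863 High-Tech Research and Development Program of China (sub-project) (No. 2002AA117010-09); and the Ministry of Science and Technology Intergovernmental International Cooperation Project (No. CI-2003-03)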
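Abstract: Base noun phrase identification is a very important subtask in natural language processing. This paper surveys several representative base noun phrase identification methods, compares the results of a number of typical English base noun phrase identification approaches, and proposes and implements an English base noun phrase identification method that combines boundary statistics with part-of-speech string correction. The method divides base noun phrase identification into a primary part and a secondary part. Boundary statistics, the primary part, correctly identifies most base noun phrases. Part-of-speech string rules, the auxiliary part, check and correct the base noun phrases identified by the boundary statistics and also recover base noun phrases that the boundary statistics missed. In this method, the part-of-speech string rules compensate for the inability of boundary statistics to take the internal composition patterns of base noun phrases into account, which improves both precision and recall. With this method, base noun phrase identification reaches a precision of 96.22%, a recall of 97.59%, and F(β=1) = 96.90%; this F value exceeds the best result reported to date.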
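The two-stage idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the toy corpus `TRAIN`, the single rule `NP_RULE`, the 0.5 threshold, and all function names are assumptions made for illustration, and the abstract does not specify how the boundary statistics or the rules are actually defined. Stage 1 estimates, for each pair of adjacent part-of-speech tags, how often a base noun phrase opens or closes between them; stage 2 drops candidates whose tag string violates the rule and rescans the uncovered tokens, which both corrects bad candidates and recovers missed phrases.

```python
import re
from collections import Counter

# Hypothetical toy corpus: (POS tags, gold base-NP spans as [start, end) indices).
TRAIN = [
    (["DT", "NN", "VBD", "IN", "DT", "NN"], [(0, 2), (4, 6)]),  # the cat sat on the mat
    (["NNS", "VBP", "DT", "JJ", "NN"], [(0, 1), (2, 5)]),       # dogs chase the red ball
]

# Illustrative rule, not taken from the paper: optional determiner, adjectives, nouns.
NP_RULE = re.compile(r"(DT )?(JJ )*((NN|NNS|NNP) )+")


def train_boundaries(corpus):
    """Count how often a base NP opens or closes in the gap between two adjacent tags."""
    seen, opens, closes = Counter(), Counter(), Counter()
    for tags, spans in corpus:
        padded = ["<s>"] + tags + ["</s>"]
        starts = {s for s, _ in spans}
        ends = {e for _, e in spans}
        for gap in range(len(tags) + 1):  # gap i lies just before token i
            pair = (padded[gap], padded[gap + 1])
            seen[pair] += 1
            opens[pair] += gap in starts
            closes[pair] += gap in ends
    return seen, opens, closes


def boundary_chunks(tags, model, threshold=0.5):
    """Stage 1: bracket NPs wherever the estimated open/close frequency is high enough."""
    seen, opens, closes = model
    padded = ["<s>"] + tags + ["</s>"]
    spans, start = [], None
    for gap in range(len(tags) + 1):
        pair = (padded[gap], padded[gap + 1])
        n = seen[pair]  # 0 for tag pairs never seen in training
        if start is not None and n and closes[pair] / n >= threshold:
            spans.append((start, gap))
            start = None
        if start is None and n and opens[pair] / n >= threshold:
            start = gap
    return spans


def rule_correct(tags, spans):
    """Stage 2: drop candidates that violate the rule, then rescan uncovered tokens."""
    kept = [(s, e) for s, e in spans
            if NP_RULE.fullmatch("".join(t + " " for t in tags[s:e]))]
    covered = {i for s, e in kept for i in range(s, e)}
    i = 0
    while i < len(tags):
        if i in covered:
            i += 1
            continue
        j = i
        while j < len(tags) and j not in covered:
            j += 1
        match = NP_RULE.match("".join(t + " " for t in tags[i:j]))
        if match:
            length = match.group(0).count(" ")  # one trailing space per matched tag
            kept.append((i, i + length))
            i += length
        else:
            i += 1
    return sorted(kept)


def identify(tags, model):
    """Run boundary statistics first, then rule-based checking and recovery."""
    return rule_correct(tags, boundary_chunks(tags, model))


MODEL = train_boundaries(TRAIN)

missed = ["DT", "JJ", "NN", "VBD", "JJ", "NNS"]   # the old man saw big dogs
print(boundary_chunks(missed, MODEL))              # [(0, 3)]
print(identify(missed, MODEL))                     # [(0, 3), (4, 6)]

too_long = ["DT", "NN", "IN", "DT", "NN"]          # the cat on the mat
print(boundary_chunks(too_long, MODEL))            # [(0, 5)]
print(identify(too_long, MODEL))                   # [(0, 2), (3, 5)]
```

In the first sentence the boundary statistics never saw the tag pairs around "big dogs", so the rule pass recovers that phrase. In the second, the statistics produce one over-long bracket; the rule rejects it and the rescan splits it into two phrases. A real system would need a far larger rule set and training corpus than this toy setup.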

Keywords: base noun phrase; chunk; boundary statistics; part-of-speech string rules
Article ID: 1002-8331-(2004)35-0001-03

English Base Noun Phrase Identification Based on Hybrid Strategy--The Strategy of Combination of Boundary Statistic and the Amendment of the String of Part of Speech
Liang Yinghong, Zhao Tiejun, Yao Jianmin, Yu Hao, Xu Bing. English Base Noun Phrase Identification Based on Hybrid Strategy--The Strategy of Combination of Boundary Statistic and the Amendment of the String of Part of Speech [J]. Computer Engineering and Applications, 2004, 40(35): 1-3, 121.
Authors: Liang Yinghong, Zhao Tiejun, Yao Jianmin, Yu Hao, Xu Bing
Affiliation: Liang Yinghong 1,2; Zhao Tiejun 2; Yao Jianmin 2; Yu Hao 2; Xu Bing 2
Abstract: Base noun phrase identification is an important subtask in natural language processing. This paper summarizes representative methods of base noun phrase identification and compares and analyzes their results. It proposes a novel method of base noun phrase identification that combines boundary statistics with correction by part-of-speech string rules. The method divides the identification task into two parts. The boundary statistics method, as the primary part, correctly identifies most base noun phrases. The rules, each composed of a string of part-of-speech tags, serve as the secondary part: they correct the base noun phrases identified by the primary part and recover the base noun phrases that the primary part missed, which improves both precision and recall. The secondary part thus remedies the primary part by taking the internal composition of base noun phrases into account. The method reaches a precision of 96.22% and a recall of 97.59% in English base noun phrase identification, with F(β=1) = 96.90%. Compared with other methods, it achieves the highest F score reported to date.
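As a quick consistency check on the reported figures, the F score can be recomputed from precision and recall with the standard F-measure formula, F = (1 + β²)·P·R / (β²·P + R). The function name `f_measure` below is only illustrative.

```python
def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (standard F-measure)."""
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall as reported in the abstract, in percent.
print(f"{f_measure(96.22, 97.59):.2f}")  # 96.90
```

With β = 1 the result agrees with the 96.90% stated in the abstract.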
Keywords: base noun phrase; chunk; boundary statistics; part-of-speech string rules
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.