
Definition Extraction with Improving Re-Sampling and BRF (采用改进重采样和BRF方法的定义抽取研究)
Cite as: PAN Xu, GU Hongbin. Definition Extraction with Improving Re-Sampling and BRF[J]. Journal of Chinese Information Processing (中文信息学报), 2011, 25(3): 30-38.
Authors: PAN Xu (潘湑), GU Hongbin (顾宏斌)
Affiliation: College of Civil Aviation, Nanjing University of Aeronautics & Astronautics, Nanjing, Jiangsu 210016, China
Abstract: This paper treats term definition extraction as a classification task, with the goal of finding and extracting all term definitions in a specialized-domain (aviation) corpus. To build a balanced training set, it proposes an over-sampling method for minority-class instances based on instance distance distribution information and combines it with random under-sampling of majority-class instances. Balanced Random Forest (BRF) is then used to aggregate the classification results of C4.5 decision trees. The method achieves a best F1-measure of 65% and a best F2-measure of 78%, outperforming the baseline that uses BRF alone.

Keywords: natural language processing; term definition; definition extraction; text categorization; re-sampling
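
The abstract names two ingredients that a short sketch can make concrete: the balanced bootstrap that BRF draws for each tree, and the F-measure used for scoring. The Python below is a minimal illustration under stated assumptions, not the authors' implementation. The function names `f_beta` and `balanced_bootstrap` and the toy label list are hypothetical, and the paper's distance-distribution over-sampling step is not reproduced because the abstract does not specify it.

```python
import random


def f_beta(precision, recall, beta=1.0):
    """F-beta score: weighted harmonic mean of precision and recall.

    beta = 1 weights both equally (F1); beta = 2 weights recall more (F2).
    """
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def balanced_bootstrap(labels, minority_label, rng):
    """Return training indices for one tree of a balanced random forest.

    Draws a bootstrap sample of the minority class, then the same number
    of majority-class instances, both with replacement.
    """
    minority = [i for i, y in enumerate(labels) if y == minority_label]
    majority = [i for i, y in enumerate(labels) if y != minority_label]
    n = len(minority)
    return ([rng.choice(minority) for _ in range(n)]
            + [rng.choice(majority) for _ in range(n)])


# Hypothetical toy data: 1 = definition sentence (minority), 0 = other sentence.
labels = [1] * 5 + [0] * 95
sample = balanced_bootstrap(labels, minority_label=1, rng=random.Random(0))

# Each tree sees a 50/50 class mix even though the corpus is 5/95.
print(len(sample), sum(labels[i] for i in sample))

# With recall above precision, F2 is higher than F1.
print(f_beta(0.5, 1.0, beta=1), f_beta(0.5, 1.0, beta=2))
```

Because F2 weights recall more heavily than F1, an F2 score above the F1 score, as in the figures reported above, is consistent with recall exceeding precision. That fits a task whose stated goal is to find all definitions.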
This article is indexed in Wanfang Data (万方数据) and other databases.