Chinese Term Extraction Based on PAT Tree
Authors: ZHANG Feng, FAN Xiao-zhong and XU Yun
Affiliation: School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
Abstract: A new method for automatic Chinese term extraction based on the Patricia (PAT) tree is proposed. Mutual information is calculated by prefix searching in a PAT tree built from the domain corpus, in order to estimate the internal associative strength between the Chinese characters in a string. Compared with methods that work on the domain corpus directly, this considerably speeds up term candidate extraction. Banks of common collocation suffixes and prefixes are constructed, and part-of-speech (POS) composition rules for terms are summarized, to improve the precision of term extraction. Experimental results show an F-measure of 74.97%.
Keywords: term extraction; PAT tree; mutual information; corpus
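
The abstract describes estimating the associative strength between characters from prefix searches in a PAT tree. The following is a minimal sketch of that idea, not the authors' implementation. It assumes a plain (uncompressed) trie over corpus suffixes as a stand-in for a PAT tree, which returns the same prefix counts but uses more memory, and it assumes pointwise mutual information MI(x, y) = log2(P(xy) / (P(x) P(y))) with P(s) estimated as count(s) / corpus length. The names `SuffixTrie`, `max_len`, and `mutual_information` are hypothetical, and the abstract does not state which formula the paper uses.

```python
import math


class Node:
    """Trie node: how many corpus suffixes pass through it, plus its children."""
    __slots__ = ("count", "children")

    def __init__(self):
        self.count = 0
        self.children = {}


class SuffixTrie:
    """Uncompressed trie over all corpus suffixes, truncated to max_len characters.

    Assumption: a simplified stand-in for a PAT tree. A PAT tree would also
    compress single-child paths, but the prefix counts are the same.
    """

    def __init__(self, text, max_len=8):
        self.size = len(text)
        self.root = Node()
        for i in range(len(text)):
            node = self.root
            for ch in text[i:i + max_len]:
                node = node.children.setdefault(ch, Node())
                node.count += 1

    def count(self, s):
        """Occurrences of the non-empty string s (len(s) <= max_len) in the corpus."""
        node = self.root
        for ch in s:
            node = node.children.get(ch)
            if node is None:
                return 0
        return node.count


def mutual_information(trie, left, right):
    """Pointwise mutual information between two adjacent substrings."""
    joint = trie.count(left + right)
    if joint == 0:
        return float("-inf")  # the pair never occurs together
    n = trie.size
    p_joint = joint / n
    p_left = trie.count(left) / n
    p_right = trie.count(right) / n
    return math.log2(p_joint / (p_left * p_right))


if __name__ == "__main__":
    trie = SuffixTrie("信息检索信息抽取信息检索")
    print(trie.count("信息"))                    # frequency via one prefix walk
    print(mutual_information(trie, "信", "息"))  # higher means stronger association
```

Each frequency lookup walks at most `len(s)` nodes instead of rescanning the corpus, which illustrates the kind of speed-up over direct corpus matching that the abstract reports.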
Indexed in: CNKI, VIP, Wanfang Data, and other databases
Journal: 《北京理工大学学报(英文版)》 (Journal of Beijing Institute of Technology, English Edition)