
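Research on Automatic Classification of English Texts Based on Concept Hierarchies
Cite this article: Li Yuhang, Luo Zhensheng, Cheng Musheng. Research on Automatic Classification of English Texts Based on Concept Hierarchies [J]. Computer Engineering and Applications, 2004, 40(11): 75-77.
Authors: Li Yuhang, Luo Zhensheng, Cheng Musheng
Affiliation: Computational Linguistics Laboratory, School of Humanities, Tsinghua University, Beijing 100084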
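Abstract: This paper sets out to design and implement an automatic classification and retrieval system for English texts, with the emphasis on improving the accuracy of the classification method. In an automatic text classification system, text content is generally represented as a vector in an N-dimensional feature space, so the feature extraction method and its accuracy strongly affect the correctness of the classification results. Traditional methods are based on word forms and do not examine word meaning. They ignore the diversity and uncertainty of the word forms that can express the same meaning, as well as the relations between word senses, especially hypernym-hyponym relations. The method proposed here builds on the vector space model (VSM) but uses "concepts" as its basis and also takes the hypernym relations of word senses into account, so that more general information can be distilled from words during training, thereby improving classification precision.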
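The abstract states the idea but gives no weighting formula. Below is a minimal sketch of concept features with hypernym propagation, assuming a tiny hand-built lexicon and hypernym table in place of WordNet. The names WORD_TO_CONCEPT, HYPERNYM and concept_vector, and the decay and max_depth parameters, are hypothetical and not taken from the paper. Synonymous word forms map to a single concept dimension, and each concept passes a decayed share of its weight up to its ancestors.

```python
from collections import Counter

# Hypothetical toy lexicon: word form -> concept ID (stands in for WordNet synsets).
WORD_TO_CONCEPT = {
    "car": "automobile.n",
    "automobile": "automobile.n",
    "truck": "truck.n",
    "lorry": "truck.n",
    "bicycle": "bicycle.n",
}

# Hypothetical toy hypernym links: concept -> its direct hypernym.
HYPERNYM = {
    "automobile.n": "motor_vehicle.n",
    "truck.n": "motor_vehicle.n",
    "motor_vehicle.n": "vehicle.n",
    "bicycle.n": "vehicle.n",
}


def concept_vector(tokens, decay=0.5, max_depth=2):
    """Build a concept-frequency vector with decayed hypernym weights.

    decay and max_depth are assumed parameters, not values from the paper.
    """
    vec = Counter()
    for tok in tokens:
        concept = WORD_TO_CONCEPT.get(tok.lower())
        if concept is None:
            continue  # word forms missing from the toy lexicon are skipped
        weight = 1.0
        vec[concept] += weight
        # Walk up the hierarchy, giving each ancestor a smaller share.
        parent = HYPERNYM.get(concept)
        depth = 0
        while parent is not None and depth < max_depth:
            weight *= decay
            vec[parent] += weight
            parent = HYPERNYM.get(parent)
            depth += 1
    return dict(vec)


print(concept_vector(["car", "lorry"]))
# → {'automobile.n': 1.0, 'motor_vehicle.n': 1.0, 'vehicle.n': 0.5, 'truck.n': 1.0}
```

"car" and "lorry" share no word form, yet their weights overlap on motor_vehicle.n and vehicle.n, which is the kind of more general information the method aims to extract from words.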

Keywords: automatic text classification  concept hierarchy  VSM  WordNet
Article number: 1002-8331-(2004)11-0075-03

Research on Automatic Text Classification Methods Based on Concept Hierarchies
Li Yuhang, Luo Zhensheng, Cheng Musheng. Research on Automatic Text Classification Methods Based on Concept Hierarchies [J]. Computer Engineering and Applications, 2004, 40(11): 75-77.
Authors: Li Yuhang, Luo Zhensheng, Cheng Musheng
Abstract: This paper aims at designing and implementing an automatic classification and retrieval system for English documents, focusing on improving the accuracy of the classification algorithm. The documents in an automatic text classification system are represented by feature vectors, and the overall performance depends on the feature selection algorithm and its accuracy. Conventional word-form based automatic classification systems ignore all semantic information of the words, so the diversity and indeterminacy of word forms harm the result. This paper proposes a new feature extraction algorithm, which is based on the Vector Space Model, uses concepts as features, and gives further consideration to the relations between concepts, especially hypernymy. The algorithm enables the extraction of more abstract concepts from a text, and thus improves the classification result.
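To illustrate why pure word-form features can harm the result, the sketch below compares cosine similarity in the vector space model for two short texts that use different words for related things. The SYNSET and HYPERNYM tables are hypothetical toy data rather than WordNet, and the expansion simply adds each concept's direct hypernym with weight 1; the abstract does not say how the authors weight hypernyms, so this is an assumption.

```python
import math
from collections import Counter

# Hypothetical toy data standing in for WordNet synsets and hypernym links.
SYNSET = {"car": "automobile", "automobile": "automobile",
          "lorry": "truck", "truck": "truck"}
HYPERNYM = {"automobile": "motor_vehicle", "truck": "motor_vehicle"}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def word_form_features(tokens):
    """Conventional VSM: one dimension per distinct word form."""
    return dict(Counter(tokens))


def concept_features(tokens):
    """Concept-based VSM: map words to concepts and add each direct hypernym."""
    vec = Counter()
    for tok in tokens:
        concept = SYNSET.get(tok, tok)  # fall back to the word form itself
        vec[concept] += 1
        if concept in HYPERNYM:
            vec[HYPERNYM[concept]] += 1
    return dict(vec)


doc_a = ["car", "repair"]
doc_b = ["lorry", "fleet"]
print(cosine(word_form_features(doc_a), word_form_features(doc_b)))
print(cosine(concept_features(doc_a), concept_features(doc_b)))
```

The word-form similarity is 0.0 because the texts share no word, while the concept-based similarity is about 0.33, since both texts now share the motor_vehicle dimension.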
Keywords: Automatic text classification  Concept hierarchy  VSM  WordNet
This article is indexed in CNKI, VIP (Weipu), Wanfang Data and other databases.