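Semantic Classification Model of Documents Based on Concept Vector Space
Cite this article: Li Hairong. Semantic Classification Model of Documents Based on Concept Vector Space[J]. Library and Information Service, 2011, 55(24): 106-26.
Author: Li Hairong
Affiliation: Library, China West Normal University
Abstract: To address the problems in traditional automatic document classification methods and in current semantic classification methods, this paper proposes a new semantic classification model of documents based on a concept vector space. Using a character-matching algorithm, the model matches the mutually independent terms in the high-dimensional word vector space of the original documents against the attribute sets that describe ontology concepts, and then maps them to the ontology concepts corresponding to those attribute sets, forming a low-dimensional, semantically rich document concept vector space. The widely used "20Newsgroups" dataset serves as the experimental dataset for validating the model. The experimental results show that, compared with traditional document classification based on the word vector space, the proposed method greatly reduces the dimensionality of the vector space and improves classification performance.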

Keywords: concept vector space; automatic document classification; semantic document classification; model
Received: 2011-07-11
Revised: 2011-09-03

Semantic Classification Model of Documents Based on Concept Vector Space
Li Hairong. Semantic Classification Model of Documents Based on Concept Vector Space[J]. Library and Information Service, 2011, 55(24): 106-26.
Author: Li Hairong
Affiliation: Library, China West Normal University
Abstract: To address the problems in traditional text classification methods and in current semantic classification methods, this paper proposes a new semantic classification model of documents based on a concept vector space. The model uses a character-based matching algorithm to match the words in a document's word vector space against the attribute sets of ontology concepts. Words that occur in an attribute set are replaced with the ontology concept corresponding to that set, which yields a concept vector space with lower dimensionality and richer semantics. The paper uses the "20Newsgroups" dataset as experimental data and carries out a semantic classification experiment on documents. The experimental results show that the proposed method greatly reduces the dimensionality of the vector space and improves text classification performance.
Keywords: concept vector space; automatic classification of documents; semantic classification of documents; model
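
The mapping step described in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the toy ontology, the function names `build_attribute_index` and `to_concept_vector`, the use of case-insensitive exact string matching as the "character-based matching", and the choice to drop words that match no attribute are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical toy ontology (not from the paper): each concept is
# described by a set of attribute terms.
ONTOLOGY = {
    "vehicle": {"car", "engine", "wheel"},
    "computer": {"cpu", "keyboard", "software"},
}


def build_attribute_index(ontology):
    """Invert the ontology: attribute term -> set of concepts it describes."""
    index = {}
    for concept, attributes in ontology.items():
        for attr in attributes:
            index.setdefault(attr.lower(), set()).add(concept)
    return index


def to_concept_vector(tokens, ontology):
    """Map a tokenized document to concept counts.

    Assumption: matching is exact and case-insensitive, and tokens that
    match no attribute are dropped. The vector has one dimension per
    concept, ordered by sorted concept name.
    """
    index = build_attribute_index(ontology)
    counts = Counter()
    for token in tokens:
        for concept in index.get(token.lower(), ()):
            counts[concept] += 1
    return [counts[concept] for concept in sorted(ontology)]


if __name__ == "__main__":
    doc = ["The", "car", "engine", "runs", "software"]
    # Five word dimensions collapse to two concept dimensions
    # (order: computer, vehicle).
    print(to_concept_vector(doc, ONTOLOGY))
```

In this sketch the vector length equals the number of ontology concepts rather than the vocabulary size, which is the dimensionality reduction the abstract refers to. The resulting vectors could then be passed to any standard classifier.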
This article is indexed in CNKI, Wanfang Data, and other databases.