
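Text Categorization Based on Graph Kernel
Cite this article: JIANG Qiang-rong, SONG Lie-jin. Text Categorization Based on Graph Kernel[J]. Computer and Modernization, 2017, 0(11): 13-115.
Authors: JIANG Qiang-rong  SONG Lie-jin
Abstract: In text classification research, the vector space model has a simple representation, but it can express only the term-frequency information of feature words and ignores the structural, semantic, and word-order information between them, so different documents may be represented by the same vector. To address this problem, this paper represents text with a graph-structured model, turning each text into a directed graph (a text graph, for short), which effectively solves the problem of missing structural information. The paper applies graph kernel techniques to text classification and proposes a graph kernel suited to computing the similarity between text graphs, the interval walk kernel (间隔通路核), then uses a support vector machine to classify the texts. Experimental results on a text collection show that the interval walk kernel achieves higher classification accuracy than the vector space model and than other kernel functions, so it is a good algorithm for computing graph-structure similarity and can be widely applied to text classification.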

Keywords: graph structure  vector space model  interval walk kernel  support vector machine  text classification
Received: 2017-11-21

Text Categorization Based on Graph Kernel
JIANG Qiang-rong, SONG Lie-jin. Text Categorization Based on Graph Kernel[J]. Computer and Modernization, 2017, 0(11): 13-115.
Authors:JIANG Qiang-rong  SONG Lie-jin
Abstract: In text classification, the vector space model offers a simple representation, but it captures only the frequency of feature words and ignores the structural information between them as well as semantic and word-order information, so different documents may be represented by the same vector. To address this problem, this paper represents each text as a directed graph (a text graph, for short), which effectively compensates for the missing structural information. Graph kernel techniques are applied to text classification, and a graph kernel suited to computing the similarity between text graphs, the interval walk kernel, is proposed; a support vector machine is then used to classify the texts. Experimental results on a text collection show that the interval walk kernel achieves higher classification accuracy than the vector space model and than other kernel functions, so it is a good algorithm for computing graph-structure similarity and can be widely applied to text classification.
Keywords:graph structure  vector space model  gap walk kernel  support vector machine  text categorization  
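
The abstract's motivating claim, that the vector space model can map different documents to the same vector while a directed text graph keeps them apart, can be illustrated with a small Python sketch. This is not the paper's interval walk kernel, whose definition the abstract does not give; the edge-count kernel below is a deliberately simple stand-in, and the names `text_graph`, `edge_kernel`, `normalized_kernel`, and the `max_gap` parameter are assumptions made for this illustration only.

```python
from collections import Counter


def bag_of_words(text):
    """Vector space model view: term frequencies only, word order discarded."""
    return Counter(text.lower().split())


def text_graph(text, max_gap=1):
    """Directed text graph stored as a Counter of (source, target) edges.

    An edge u -> v is added whenever v follows u within `max_gap` positions.
    `max_gap` is a hypothetical parameter of this sketch, not the paper's.
    """
    words = text.lower().split()
    edges = Counter()
    for i, u in enumerate(words):
        for v in words[i + 1 : i + 1 + max_gap]:
            edges[(u, v)] += 1
    return edges


def edge_kernel(g1, g2):
    """Inner product of edge-count vectors (a stand-in graph kernel)."""
    return sum(count * g2[edge] for edge, count in g1.items())


def normalized_kernel(g1, g2):
    """Cosine-style normalization so that k(g, g) == 1 for non-empty graphs."""
    denom = (edge_kernel(g1, g1) * edge_kernel(g2, g2)) ** 0.5
    return edge_kernel(g1, g2) / denom if denom else 0.0


doc_a = "dog bites man"
doc_b = "man bites dog"

# Same words, same counts: the bag-of-words vectors are identical.
same_bow = bag_of_words(doc_a) == bag_of_words(doc_b)

# Different word order: the directed graphs share no edges.
same_graph = text_graph(doc_a) == text_graph(doc_b)
similarity = normalized_kernel(text_graph(doc_a), text_graph(doc_b))

print(same_bow, same_graph, similarity)
```

In a full pipeline, pairwise kernel values like these would fill a Gram matrix handed to a support vector machine that accepts precomputed kernels (for example scikit-learn's `SVC(kernel="precomputed")`); that step is omitted here to keep the sketch dependency-free.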