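Research on a Document Classification Algorithm Based on Manifold Regularization
Cite this article: XU Hai-rui, ZHANG Wen-sheng, WU Shuang. Research on a Document Classification Algorithm Based on Manifold Regularization [J]. Computer Science (计算机科学), 2012, 39(3): 196-199
Authors: XU Hai-rui (徐海瑞), ZHANG Wen-sheng (张文生), WU Shuang (吴双)
Affiliation: Institute of Automation, Chinese Academy of Sciences, Beijing 100190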
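Abstract: To address high-dimensional document classification, this paper proposes a classification algorithm, MLD-RLSC, based on the manifold regularization framework. The algorithm builds a nearest-neighbor graph over the training samples to estimate the geometric structure of the data space and uses that structure as the manifold regularization term. Combined with multivariate linear regression, this yields a low-dimensional manifold structure for the high-dimensional documents. A k-nearest-neighbor classifier then classifies in the low-dimensional manifold, giving a classifier for multi-class problems. The algorithm makes full use of the class labels of the training samples to help learn and extract effective features. Experiments on the Reuters 21578 dataset show that its classification performance and running speed improve considerably over traditional classifiers.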

Keywords: local discriminant embedding, manifold learning, document classification, k-nearest neighbor, manifold regularization

Document Classification Algorithm Based on Manifold Regularization
XU Hai-rui,ZHANG Wen-sheng,WU Shuang. Document Classification Algorithm Based on Manifold Regularization[J]. Computer Science, 2012, 39(3): 196-199
Authors:XU Hai-rui  ZHANG Wen-sheng  WU Shuang
Affiliation:(Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China)
Abstract:A novel document classification algorithm based on the manifold regularization framework, called MLD-RLSC, is presented for high-dimensional document classification. In MLD-RLSC, a nearest-neighbor graph is constructed and the intrinsic geometrical structure of the sample space is taken as a manifold regularization term, which is then incorporated into the objective function of multivariate linear regression to extract a lower-dimensional space. Classification and prediction in the lower-dimensional feature space are carried out with kNN. To extract effective features for the multi-class problem, MLD-RLSC makes use of all labeled samples. Experimental results on the Reuters 21578 dataset demonstrate that the proposed algorithm achieves higher classification accuracy and faster running speed than traditional classifiers.
Keywords:LDE   Manifold learning   Text categorization   kNN   Manifold regularization
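
The pipeline described in the abstract (nearest-neighbor graph, manifold regularization term, multivariate linear regression, then kNN in the low-dimensional space) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not give the objective function, so the code assumes a standard Laplacian-regularized least squares form, min_W ||XW - Y||^2 + lam_a ||W||^2 + lam_i tr(W^T X^T L X W), with one-hot targets Y and an unweighted symmetric kNN graph. The function names and the parameters `k`, `lam_a`, and `lam_i` are hypothetical.

```python
import numpy as np

def knn_graph_laplacian(X, k=5):
    """Unnormalized Laplacian L = D - A of a symmetric, unweighted kNN graph."""
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T   # squared Euclidean distances
    np.fill_diagonal(d2, np.inf)                     # a point is not its own neighbor
    idx = np.argsort(d2, axis=1)[:, :k]              # k nearest neighbors per row
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), idx.ravel()] = 1.0
    A = np.maximum(A, A.T)                           # symmetrize the adjacency
    return np.diag(A.sum(axis=1)) - A

def fit_projection(X, y, n_classes, k=5, lam_a=1e-2, lam_i=1e-2):
    """Closed-form solution of the assumed graph-regularized regression."""
    Y = np.eye(n_classes)[y]                         # one-hot class targets
    L = knn_graph_laplacian(X, k)
    d = X.shape[1]
    lhs = X.T @ X + lam_a * np.eye(d) + lam_i * X.T @ L @ X
    return np.linalg.solve(lhs, X.T @ Y)             # W has shape (d, n_classes)

def knn_predict(Z_train, y_train, Z_test, n_classes, k=5):
    """Majority vote among the k nearest training points in the projected space."""
    d2 = ((Z_test[:, None, :] - Z_train[None, :, :]) ** 2).sum(axis=2)
    idx = np.argsort(d2, axis=1)[:, :k]
    votes = y_train[idx]                             # shape (n_test, k)
    return np.array([np.bincount(v, minlength=n_classes).argmax() for v in votes])
```

Under these assumptions, documents would be projected with `X @ W` before calling `knn_predict`, so the kNN search runs in a space whose dimension equals the number of classes rather than the vocabulary size.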