
Text Classification Algorithm Based on Feature Aggregation and Maximum Entropy

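Cite this article:	Chen Guang, Liu Zongtian. Text classification algorithm based on feature aggregation and maximum entropy [J]. Computer Applications and Software, 2008, 25(3): 263-264, 277.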

Authors:	Chen Guang, Liu Zongtian

Affiliation:	School of Computer Engineering and Science, Shanghai University, Shanghai 200072, China

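Abstract:	Information on the Internet is vast and disorganized, and extracting the most useful information from it is a central goal of information processing; text classification is a powerful means of organizing and managing such data. Because the maximum entropy model can integrate the various pieces of probabilistic knowledge that are observed, whether related or unrelated, and performs well on many problems, this paper introduces it into Chinese text classification and uses a feature aggregation algorithm to improve the effectiveness of feature selection. Experiments show that, compared with Bayes, KNN, and SVM, three strong-performing algorithms, the maximum entropy based text classification algorithm achieves higher classification accuracy.
Keywords:	text classification; maximum entropy model; feature selection
Received:	2006-11-15
Revised:	2006-11-15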
TEXT CLASSIFICATION BASED ON MAXIMUM ENTROPY AND FEATURE AGGREGATION

Chen Guang, Liu Zongtian. TEXT CLASSIFICATION BASED ON MAXIMUM ENTROPY AND FEATURE AGGREGATION [J]. Computer Applications and Software, 2008, 25(3): 263-264, 277.

Authors:	Chen Guang, Liu Zongtian

Affiliation:	School of Computer Engineering and Science, Shanghai University, Shanghai 200072, China

Abstract:	The Internet has become the main source from which people obtain information, and text classification has become a key technology for organizing and processing document data. The maximum entropy model, a probability estimation technique widely used in natural language tasks, is applied to text classification, and a feature aggregation algorithm is used to select effective features. Experimental results show that, compared with Bayes, KNN, and SVM, the proposed text classification algorithm achieves better performance.

Keywords:	Text classification; Maximum entropy model; Feature selection
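
The abstract names the maximum entropy model but does not give its form. As a rough illustration only, a maximum entropy text classifier is commonly written as p(c|d) proportional to exp(sum of weight(w, c) * count(w, d)), with the weights fitted to maximize the conditional log-likelihood of the training labels. The sketch below follows that common form with plain gradient ascent. It is not the authors' implementation: the function names, the toy English documents, the learning rate, and the epoch count are all assumptions, and the paper's feature aggregation step is not reproduced because the abstract does not describe it.

```python
import math
from collections import Counter


def predict_proba(weights, classes, feats):
    """Return p(c | d) for each class as a softmax over linear feature scores."""
    scores = {c: sum(weights[(w, c)] * n for w, n in feats.items()) for c in classes}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(exps.values())
    return {c: e / z for c, e in exps.items()}


def train_maxent(docs, labels, epochs=200, lr=0.5):
    """Fit weights[(word, label)] by gradient ascent on the conditional log-likelihood."""
    classes = sorted(set(labels))
    weights = Counter()  # unseen (word, label) pairs read as weight 0
    counts = [Counter(d.split()) for d in docs]
    for _ in range(epochs):
        grad = Counter()
        for feats, gold in zip(counts, labels):
            probs = predict_proba(weights, classes, feats)
            for word, n in feats.items():
                grad[(word, gold)] += n  # observed feature count
                for c in classes:
                    grad[(word, c)] -= probs[c] * n  # expected count under the model
        for key, g in grad.items():
            weights[key] += lr * g / len(docs)
    return weights, classes


def classify(weights, classes, text):
    """Pick the class with the highest model probability."""
    probs = predict_proba(weights, classes, Counter(text.split()))
    return max(probs, key=probs.get)


# Hypothetical toy data, whitespace-tokenized; Chinese text would first need word segmentation.
TOY_DOCS = ["stock market rises", "market shares fall", "team wins match", "player scores goal"]
TOY_LABELS = ["finance", "finance", "sports", "sports"]
WEIGHTS, CLASSES = train_maxent(TOY_DOCS, TOY_LABELS)
```

Calling `classify(WEIGHTS, CLASSES, "stock market")` returns the label the fitted model considers most probable. A feature selection or aggregation step, such as the one the paper proposes, would shrink or merge the word features before `train_maxent` is called.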
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.
