A fuzzy method to learn text classifier from labeled and unlabeled examples期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

A fuzzy method to learn text classifier from labeled and unlabeled examples

Authors:	LIU Hong HUANG Shang-teng

Affiliation:	Dept.of Computer Science,Shanghai Jiaotong University,Shanghai 200030,China;Dept.of Computer Science,Shanghai Jiaotong University,Shanghai 200030,China

Abstract:	In text classification, labeling documents is a tedious and costly task, as it would consume a lot of expert time. On the other hand, it usually is easier to obtain a lot of unlabeled documents, with the help of some tools like Digital Library, Crawler Programs, and Searching Engine. To learn text classifier from labeled and unlabeled examples, a novel fuzzy method is proposed. Firstly, a Seeded Fuzzy c-means Clustering algorithm is proposed to learn fuzzy clusters from a set of labeled and unlabeled examples. Secondly, based on the resulting fuzzy clusters, some examples with high confidence are selected to construct training data set. Finally,the constructed training data set is used to train Fuzzy Support Vector Machine, and get text classifier. Empirical results on two benchmark datasets indicate that, by incorporating unlabeled examples into learning process,the method performs significantly better than FSVM trained with a small number of labeled examples only. Also, the method proposed performs at least as well as the related method-EM with Naive Bayes. One advantage of the method proposed is that it does not rely on any parametric assumptions about the data as it is usually the case with generative methods widely used in semi-supervised learning.

Keywords:	text categorization fuzzy clustering
本文献已被 CNKI 维普万方数据等数据库收录！

设为首页 | 免责声明 | 关于勤云 | 加入收藏