
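Semi-supervised clustering algorithm based on seeds set and frequent itemset mining
Cite this article: ZHAO Qian, SHANG Xue-qun, WANG Miao. Semi-supervised clustering algorithm based on seeds set and frequent itemset mining [J]. Computer Engineering and Applications (计算机工程与应用), 2010, 46(8): 123-126.
Authors: ZHAO Qian, SHANG Xue-qun, WANG Miao
Affiliation: School of Computer, Northwestern Polytechnical University, Xi'an 710072, China
Funding: Natural Science Foundation of Shaanxi Province (Grant No. 2007F27); Graduate Innovation Laboratory Project of Northwestern Polytechnical University (No. 07046)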
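Abstract: Semi-supervised clustering improves clustering performance by making effective use of a small amount of supervision information within unsupervised learning. This paper proposes a semi-supervised clustering algorithm based on a seeds set. It applies the Apriori algorithm to mine frequent itemsets from both the initial seeds set and the enlarged seeds set, so that noisy and mislabeled data are cleaned and corrected, which improves the quality of the seeds set and the clustering performance. The algorithm uses the weighted χ² test as the measure for evaluating classification rules in order to predict class labels for unlabeled data. Experimental results show that the proposed seeds-set-based algorithm, which combines frequent itemset mining with the weighted χ² test, not only improves the quality of the seeds set but also raises the accuracy of the predictions and the clustering performance.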
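The abstract names Apriori as the mining step but gives no implementation details. The following is a minimal sketch of frequent itemset mining over a labeled seeds set, under stated assumptions: each seed is encoded as a set of `attribute=value` items plus a `label=...` item, and the toy data, the function name `apriori`, and the threshold `min_support=2` are all hypothetical and not taken from the paper.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support count} for all frequent itemsets.

    min_support is an absolute count (an assumption of this sketch).
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset must itself be frequent.
            if all(frozenset(sub) in current
                   for sub in combinations(union, k - 1)):
                candidates.add(union)
        # Count the support of each surviving candidate.
        counts = {c: sum(1 for t in transactions if c <= t)
                  for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical seeds set; the last seed looks mislabeled relative to the rest.
seeds = [
    {"color=red", "shape=round", "label=A"},
    {"color=red", "shape=round", "label=A"},
    {"color=red", "shape=square", "label=A"},
    {"color=blue", "shape=square", "label=B"},
    {"color=blue", "shape=square", "label=B"},
    {"color=red", "shape=round", "label=B"},
]
frequent = apriori(seeds, min_support=2)
```

In this toy run the pattern `{color=red, shape=round, label=A}` is frequent while `{color=red, label=B}` is not, which is the kind of signal a cleaning step could use to flag the last seed as noisy or mislabeled. How the paper actually turns frequent itemsets into corrections is not stated in the abstract.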

Keywords: semi-supervised clustering; frequent itemset mining; weighted χ² test; seeds set
Received: 2008-9-18
Revised: 2008-12-4

Semi-supervised clustering algorithm based on seeds set and frequent itemset mining
ZHAO Qian, SHANG Xue-qun, WANG Miao. Semi-supervised clustering algorithm based on seeds set and frequent itemset mining [J]. Computer Engineering and Applications, 2010, 46(8): 123-126.
Authors: ZHAO Qian, SHANG Xue-qun, WANG Miao
Affiliation: School of Computer, Northwestern Polytechnical University, Xi'an 710072, China
Abstract: Semi-supervised clustering uses a small amount of supervision information in unsupervised clustering to improve clustering performance. This paper proposes a semi-supervised clustering algorithm based on a seeds set and frequent itemset mining. It mines frequent itemsets in the initial seeds set and in the enlarged seeds set to eliminate noisy data and correct mislabeled data, which improves the quality of the seeds set and the performance of clustering. A weighted χ² measure, used as a classification rule evaluation measure, labels the unlabeled data, which are then added to the initial seeds set to enlarge it. The experimental results show that the proposed approach effectively reduces noisy data, makes the results more accurate, and improves clustering performance.
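The abstract does not define the weighted χ² measure. One known formulation, used by the CMAR associative classifier, scores a group of rules for a class as the sum of (χ²)² / maxχ², where maxχ² is the upper bound of χ² for a rule given its marginal supports. The sketch below follows that formulation; whether the paper uses exactly this form is an assumption, and the function names, rule tuples, and support counts are hypothetical.

```python
def chi_square_with_bound(sup_p, sup_c, sup_pc, n):
    """Return (chi2, max_chi2) for a rule P -> c over n records.

    sup_p:  records matching the rule body P
    sup_c:  records of class c
    sup_pc: records matching both P and c
    Assumes 0 < sup_p < n and 0 < sup_c < n (otherwise a term divides by zero).
    """
    expected = sup_p * sup_c / n
    # Sum of reciprocal expected cell counts of the 2x2 table, divided by n.
    e = (1 / (sup_p * sup_c)
         + 1 / (sup_p * (n - sup_c))
         + 1 / ((n - sup_p) * sup_c)
         + 1 / ((n - sup_p) * (n - sup_c)))
    chi2 = (sup_pc - expected) ** 2 * n * e
    # Upper bound: the observed count cannot exceed min(sup_p, sup_c).
    max_chi2 = (min(sup_p, sup_c) - expected) ** 2 * n * e
    return chi2, max_chi2


def weighted_chi_square(rules, n):
    """Sum chi2 * chi2 / max_chi2 over rules given as (sup_p, sup_c, sup_pc)."""
    total = 0.0
    for sup_p, sup_c, sup_pc in rules:
        chi2, max_chi2 = chi_square_with_bound(sup_p, sup_c, sup_pc, n)
        total += chi2 * chi2 / max_chi2
    return total


def predict_label(rules_by_label, n):
    """Pick the label whose matching rules have the largest weighted chi2.

    Assumes only positively correlated rules are passed in.
    """
    return max(rules_by_label,
               key=lambda label: weighted_chi_square(rules_by_label[label], n))


# Hypothetical rules that match one unlabeled sample, grouped by class.
n_records = 100
rules_by_label = {
    "A": [(40, 50, 30)],
    "B": [(30, 50, 18)],
}
chi2_a, max_chi2_a = chi_square_with_bound(40, 50, 30, n_records)
score_a = weighted_chi_square(rules_by_label["A"], n_records)
predicted = predict_label(rules_by_label, n_records)
```

Dividing by maxχ² normalizes each rule's χ² by how large it could possibly be, so rules with small bodies are not favored simply because their χ² values have less room to grow.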
Keywords: semi-supervised clustering; frequent itemset mining; weighted χ² measure; seeds set
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.