
基于主动数据选取的半监督聚类算法
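Cite this article: 文平, 冷明伟, 陈晓云. 基于主动数据选取的半监督聚类算法[J]. 计算机应用研究, 2012, 29(8): 2841-2844.
Authors (Chinese names): 文平, 冷明伟, 陈晓云
Affiliations (from the Chinese record): 1. School of Information Science and Engineering, Lanzhou University, Lanzhou 730000, China
2. School of Information Science and Engineering, Lanzhou University, Lanzhou 730000, China; School of Mathematics and Computer Science, Shangrao Normal University, Shangrao, Jiangxi 334001, China
Funding: Science and Technology Project of the Jiangxi Provincial Department of Education (GJJ11609)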
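Abstract (Chinese version): Semi-supervised clustering, which uses a small amount of labeled data to obtain high clustering accuracy, has been a research hotspot in data mining and machine learning in recent years. However, existing semi-supervised clustering algorithms achieve relatively low accuracy when only a very small amount of labeled data is available or when the datasets are multi-density and unbalanced. This paper studies the selection of labeled data based on active learning and proposes a new semi-supervised clustering algorithm. The algorithm combines minimum spanning tree clustering with the idea of active learning to select the more informative data points as labeled data, and propagates class labels using a KNN-like approach. Tests on UCI standard datasets and synthetic datasets show that the proposed algorithm yields more accurate and more stable clustering results than other algorithms on multi-density, unbalanced datasets.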

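Keywords (Chinese version): data mining; semi-supervised clustering; active learning; labeled data; data selection; minimum spanning tree; multi-density dataset; unbalanced dataset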

Novel semi-supervised clustering algorithm based on active data selection
WEN Ping,LENG Ming-wei,CHEN Xiao-yun.Novel semi-supervised clustering algorithm based on active data selection[J].Application Research of Computers,2012,29(8):2841-2844.
Authors:WEN Ping  LENG Ming-wei  CHEN Xiao-yun
Affiliation:1. School of Information Science & Engineering, Lanzhou University, Lanzhou 730000, China; 2. School of Mathematics & Computer Science, Shangrao Normal University, Shangrao Jiangxi 334001, China
Abstract:Semi-supervised clustering, which aims to improve clustering results significantly using limited supervision, has been a research focus in data mining and machine learning in recent years. However, the accuracy of existing semi-supervised clustering algorithms is low when very little labeled data is available or when the datasets are multi-density and unbalanced. Based on active learning, this paper studies the selection of labeled data and presents a novel semi-supervised clustering algorithm. The algorithm selects information-rich data points as labeled data by combining the ideas of minimum spanning tree clustering and active learning, and then uses a KNN-like technique to propagate labels. Experiments on several UCI standard datasets and synthetic datasets show that the proposed method achieves higher accuracy and more stable performance than the other methods, even when the datasets are multi-density and unbalanced.
Keywords:data mining  semi-supervised clustering  active learning  labeled data  data selection  minimum spanning tree  multi-density dataset  unbalanced dataset
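
The abstract names three ingredients (minimum spanning tree clustering, active selection of points to label, and KNN-like label propagation) but does not give their details. The following is only a minimal sketch of how such a pipeline could fit together, not the authors' algorithm. Everything specific in it is an assumption made for illustration: the function names, cutting the k-1 longest MST edges to form groups, querying the point nearest each group's centroid, and propagating labels one nearest neighbor at a time.

```python
import math

def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph; returns (dist, i, j) edges."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n      # cheapest known distance from the tree to each point
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges

def mst_components(points, k):
    """Assumed splitting rule: drop the k-1 longest MST edges, return index groups."""
    edges = sorted(mst_edges(points))
    kept = edges[:len(edges) - (k - 1)]
    root = list(range(len(points)))

    def find(x):               # union-find lookup with path halving
        while root[x] != x:
            root[x] = root[root[x]]
            x = root[x]
        return x

    for _, i, j in kept:
        root[find(i)] = find(j)
    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())

def select_queries(points, groups):
    """Assumed selection rule: query the point nearest each group's centroid."""
    dim = len(points[0])
    picks = []
    for g in groups:
        centroid = [sum(points[i][d] for i in g) / len(g) for d in range(dim)]
        picks.append(min(g, key=lambda i: math.dist(points[i], centroid)))
    return picks

def propagate(points, labeled):
    """Assumed KNN-like rule: the unlabeled point closest to any labeled point
    copies that point's label; repeat until every point is labeled."""
    labels = dict(labeled)
    unlabeled = set(range(len(points))) - set(labels)
    while unlabeled:
        _, u, v = min((math.dist(points[u], points[v]), u, v)
                      for u in unlabeled for v in labels)
        labels[u] = labels[v]
        unlabeled.remove(u)
    return [labels[i] for i in range(len(points))]

# Toy data: a tight group of 3 points and a looser group of 4 points.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.3, 5.0), (5.0, 5.3), (5.3, 5.3)]
true_labels = ["a", "a", "a", "b", "b", "b", "b"]   # stands in for the labeling oracle

groups = mst_components(points, k=2)
queries = select_queries(points, groups)             # points whose labels are requested
labeled = {i: true_labels[i] for i in queries}
predicted = propagate(points, labeled)
```

In this toy run only one label per group is requested from the oracle, and the remaining points inherit labels from their nearest already-labeled neighbor. A real implementation would need the selection criterion and propagation rule described in the full paper.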
This article is indexed in CNKI, Wanfang Data, and other databases.