
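A Genetic Algorithm for High-Dimensional Data Clustering (一种高维数据聚类遗传算法)
Cite this article: Sun Haojun, Xiong Langhuan. A Genetic Algorithm for High-Dimensional Data Clustering [J]. Computer Engineering & Science (计算机工程与科学), 2010, 32(8): 94-97.
Authors: Sun Haojun (孙浩军), Xiong Langhuan (熊琅环)
Affiliation: Department of Computer Science, Shantou University, Shantou, Guangdong 515063
Funding: Supported by the Natural Science Foundation of Guangdong Province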
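Abstract: Cluster analysis is an important research topic in data mining. In many practical applications the data to be clustered are very high-dimensional; document data and gene microarray data, for example, can reach a thousand or more dimensions, and data in such high-dimensional spaces are sparsely distributed. As a result, many classical clustering algorithms that work well on low-dimensional data often fail on high-dimensional data. To address this problem, this paper proposes a new high-dimensional data clustering method based on a genetic algorithm. The method uses the global search capability of the genetic algorithm to search the feature space for effective clustering feature subspaces. To examine the role of each feature dimension in subspace clustering, the paper also designs a fitness function based on the contribution rate of each feature dimension to subspace clustering. Experiments on synthetic and real data, together with a comparison against the k-means algorithm, demonstrate the feasibility and effectiveness of the method.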

Keywords: high-dimensional data clustering; genetic algorithm; feature subspace
Received: 2009-03-31
Revised: 2009-10-21

A Genetic Algorithm for High-Dimensional Data Clustering
SUN Hao-jun, XIONG Lang-huan. A Genetic Algorithm for High-Dimensional Data Clustering [J]. Computer Engineering & Science, 2010, 32(8): 94-97.
Authors: SUN Hao-jun, XIONG Lang-huan
Affiliation: Department of Computer Science, Shantou University, Shantou 515063, China
Abstract: Clustering analysis is an important subject in data mining. In many real applications, the data to be clustered are high-dimensional; for example, document data and DNA microarray data generally have several hundred or even a thousand dimensions. In high-dimensional space the data are usually sparsely distributed, which makes most traditional clustering algorithms that work well on low-dimensional data ineffective on high-dimensional data. To solve this problem, a new high-dimensional data clustering approach based on genetic algorithms is proposed in this paper. The search capability of genetic algorithms is exploited to find effective feature subspaces for clustering. To study the role that individual dimensions play in clustering, the fitness function is designed around the degree to which each feature contributes to subspace clustering. Experimental results on an artificial data set and a real-life data set, together with a comparison against the k-means algorithm, indicate the feasibility and effectiveness of the proposed approach.
Keywords: high-dimensional data clustering; genetic algorithm; feature subspace
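
The general idea described in the abstract, a genetic algorithm that searches over feature subspaces and scores each candidate by how well the data cluster in it, can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the abstract does not give the contribution-rate fitness function, so the sketch substitutes a simple stand-in score (one minus the ratio of within-cluster scatter to total scatter in the selected subspace). All function names (`sqdist`, `kmeans`, `fitness`, `evolve`) and parameter values are hypothetical, chosen only for illustration.

```python
import random

def sqdist(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means; returns (labels, centroids)."""
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: sqdist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

def fitness(data, mask, k):
    """Stand-in fitness (an assumption, not the paper's function):
    1 - within-cluster scatter / total scatter in the masked subspace."""
    dims = [j for j, bit in enumerate(mask) if bit]
    if not dims:
        return 0.0  # an empty subspace cannot be clustered
    pts = [[row[j] for j in dims] for row in data]
    # Fixed seed so the same mask always receives the same score.
    labels, cents = kmeans(pts, k, random.Random(0))
    within = sum(sqdist(p, cents[l]) for p, l in zip(pts, labels))
    mean = [sum(col) / len(pts) for col in zip(*pts)]
    total = sum(sqdist(p, mean) for p in pts)
    return 1.0 - within / total if total > 0 else 0.0

def evolve(data, k, pop_size=20, generations=30, p_mut=0.1, seed=0):
    """Evolve binary feature masks; needs at least two features.
    Returns (best_mask, best_fitness)."""
    rng = random.Random(seed)
    d = len(data[0])
    cache = {}

    def score(mask):
        key = tuple(mask)
        if key not in cache:
            cache[key] = fitness(data, mask, k)
        return cache[key]

    # Start from the full feature space plus random subspaces.
    pop = [[1] * d] + [[rng.randint(0, 1) for _ in range(d)]
                       for _ in range(pop_size - 1)]
    for _ in range(generations):
        elite = max(pop, key=score)
        nxt = [elite[:]]  # elitism: the best mask survives unchanged
        while len(nxt) < pop_size:
            # Binary tournament selection for each parent.
            a = max(rng.sample(pop, 2), key=score)
            b = max(rng.sample(pop, 2), key=score)
            cut = rng.randrange(1, d)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation toggles individual features.
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            nxt.append(child)
        pop = nxt
    best = max(pop, key=score)
    return best, score(best)
```

Calling `evolve(data, k=2)` on a list of equal-length numeric rows returns a 0/1 mask over the features together with its score. Because this stand-in score does not penalize very small subspaces, it may favor a single well-separated dimension; a fitness function based on per-dimension contribution, as the abstract describes, would weigh dimensions differently.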
This article is indexed in Wanfang Data (万方数据) and other databases.