Similar Documents
19 similar records found.
1.
Among the many clustering algorithms, spectral clustering, as a representative graph clustering algorithm, has attracted wide attention for its strong adaptability to complex data distributions and good clustering quality. However, its high computational time complexity makes it difficult to apply to large-scale data. To improve the usability of spectral clustering on large-scale data sets, a fast graph clustering algorithm based on key-node selection is proposed. The algorithm consists of three major steps. First, a fast node-importance measure that jointly accounts for cohesion and separation is proposed. Second, key nodes are selected to replace the original data set in constructing a bipartite graph, and approximate eigenvectors of the data are obtained via singular value decomposition. Third, the approximate eigenvectors from multiple runs are ensembled to improve the robustness of the approximate spectral clustering result. The algorithm reduces the time complexity from the O(n³) of spectral clustering to O(t(n + 2n²)), improving its usability on large-scale data sets. Experiments comparing the algorithm with seven representative spectral clustering algorithms on five benchmark data sets show that it identifies complex cluster structures in the data more efficiently than the other algorithms.
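The bipartite-graph step can be sketched as follows. This is a minimal illustration of the general landmark idea, not the paper's exact algorithm: landmarks are chosen at random rather than by the node-importance measure, and the Gaussian bandwidth `sigma` is an assumed parameter.

```python
import numpy as np

def landmark_spectral_embedding(X, m=10, k=2, sigma=1.0, seed=0):
    """Approximate spectral embedding via a data-landmark bipartite graph.

    X: (n, d) data. m landmarks replace the full n x n affinity matrix
    with an n x m cross-affinity Z, whose SVD gives approximate eigenvectors.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    landmarks = X[rng.choice(n, size=m, replace=False)]

    # Gaussian cross-affinity between all points and the landmarks.
    d2 = ((X[:, None, :] - landmarks[None, :, :]) ** 2).sum(-1)
    Z = np.exp(-d2 / (2 * sigma ** 2))

    # Normalize rows so Z behaves like a transition matrix.
    Z = Z / Z.sum(axis=1, keepdims=True)

    # Left singular vectors of Z approximate eigenvectors of the full graph.
    U, s, _ = np.linalg.svd(Z, full_matrices=False)
    return U[:, :k]          # (n, k) approximate spectral embedding

# Two well-separated blobs: their rows of the embedding end up far apart.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.1, (20, 2)), rng.normal(5, 0.1, (20, 2))])
E = landmark_spectral_embedding(X, m=8, k=2)
print(E.shape)  # (40, 2)
```

Because the SVD is taken of an n×m matrix instead of an eigendecomposition of an n×n one, the cost scales with the number of landmarks rather than with n².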

2.
To address the insufficient use of prior knowledge in network fault detection and the requirement of most spectral clustering algorithms that the number of clusters be specified in advance, a new semi-supervised automatic spectral clustering algorithm is proposed that combines pairwise-constraint information propagation with automatic determination of the cluster number. A new similarity measure is learned to satisfy the constraints, and the NJW clustering algorithm is improved to perform automatic spectral clustering on the eigenvectors of the unnormalized Laplacian matrix, thereby improving clustering performance. Experiments on UCI benchmark data sets and real network measurements show that the algorithm achieves higher clustering accuracy than comparable algorithms and meets the practical needs of network fault detection.

3.
To address the slow speed and low accuracy of clustering high-dimensional data in big-data environments, a fast automatic clustering algorithm for big data (FACABD) is proposed. FACABD uses spectral clustering to normalize the large data set and reduce its column dimensionality, proposes a new fast region-evolving particle swarm optimization algorithm (FRE-PSO), and applies it for row dimensionality reduction. On the dimension-reduced data, a fuzzy-membership cardinality is introduced to automatically discover the number of clusters; given that number, FRE-PSO is combined with fuzzy clustering to quickly complete automatic clustering. Experimental results on synthetic data sets and UCI machine learning data sets show that the algorithm clusters quickly and automatically in a data-driven manner, effectively improving both speed and accuracy.

4.
Affinity propagation (AP) clustering is a recently proposed method that has received wide attention. When handling multi-class, large-scale data sets it can reach fairly good results in a short time, giving it a significant advantage over traditional methods. However, it often fails to produce good results on data sets with complex cluster structures. By analyzing the clustering characteristics of the data, a kernel function is designed whose parameters adjust automatically to the data structure, so that the data set becomes linearly separable, or nearly so, in the induced kernel space. Running affinity propagation on the data in that kernel space effectively improves both the accuracy and the speed of AP clustering. An effectiveness analysis and simulation experiments verify that the proposed algorithm outperforms the original AP algorithm on large-scale data sets with complex structure.

5.
Traditional spectral clustering algorithms suffer from low clustering accuracy, heavy storage overhead for the similarity matrix, and the high computational complexity of the eigendecomposition of the Laplacian matrix when handling large-scale data sets. To address these issues, an autoencoder-based spectral clustering algorithm with a landmark representation improved by weighted PageRank is proposed. First, the nodes with the highest weights in the data affinity graph are selected as landmarks, and the similarity matrix is approximated by the similarities between the selected landmarks and the remaining data points, which serves as the input of a stacked autoencoder. Then, a clustering loss is used to jointly update the parameters of the autoencoder and the cluster centers, achieving scalable and accurate clustering. Experiments on several typical data sets show that the proposed algorithm achieves better clustering performance than landmark-based spectral clustering and deep spectral clustering algorithms.

6.
《计算机科学与探索》2016,(11):1614-1622
Density peak clustering is a new density-based clustering algorithm that does not require the number of clusters to be specified in advance and can discover non-spherical clusters. To remedy its drawback of requiring cluster centers to be chosen manually, a density peak clustering algorithm that determines cluster centers automatically is proposed. First, the local density of each data point and its shortest distance to any point of higher density are computed. Second, the cluster centers are determined automatically from the sorted decision graph. Finally, each remaining point is assigned to the cluster of its nearest neighbor of higher density, and noise points are identified by border density, yielding the clustering result. Compared with the original density peak algorithm on synthetic and UCI data sets, the new algorithm not only determines the cluster centers automatically but also achieves higher accuracy.
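The three steps above follow the standard density-peak recipe. A minimal sketch, using a Gaussian density kernel and picking centers as the largest ρ·δ products instead of reading them off a plotted decision graph; the cutoff distance `dc` and the number of centers are assumed inputs, and the border-density noise detection is omitted.

```python
import numpy as np

def density_peak_cluster(X, dc=1.0, n_centers=2):
    """Density peaks: local density rho, distance delta to the nearest
    higher-density point, centers = points with the largest rho * delta."""
    n = X.shape[0]
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))

    # Step 1: local density (Gaussian kernel, self excluded) and delta.
    rho = np.exp(-(D / dc) ** 2).sum(axis=1) - 1.0
    delta = np.empty(n)
    nearest_higher = np.full(n, -1)
    order = np.argsort(-rho)              # indices by decreasing density
    delta[order[0]] = D[order[0]].max()   # densest point: maximal distance
    for pos, i in enumerate(order[1:], start=1):
        higher = order[:pos]              # all points denser than i
        j = higher[np.argmin(D[i, higher])]
        nearest_higher[i] = j
        delta[i] = D[i, j]

    # Step 2: centers = points with the largest rho * delta.
    centers = np.argsort(-(rho * delta))[:n_centers]
    labels = np.full(n, -1)
    labels[centers] = np.arange(n_centers)

    # Step 3: assign each remaining point, in decreasing-density order,
    # to the cluster of its nearest denser neighbor.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_higher[i]]
    return labels

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.2, (15, 2)), rng.normal(4, 0.2, (15, 2))])
labels = density_peak_cluster(X, dc=0.5, n_centers=2)
```

On two well-separated blobs, each blob's densest point gets a large ρ·δ product and becomes a center, and the remaining points cascade onto their blob's center.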

7.
Automatically determining the number of clusters is an open problem in spectral clustering. To address it, an adaptive spectral clustering algorithm based on artificial immunity is proposed. By simulating the clonal selection mechanism of antibodies and the primary and secondary immune responses of the immune system, the algorithm adjusts the number of clusters automatically, removing the need to specify it by hand. Its feasibility, its advantage on non-convex data, and its effectiveness are verified on linear synthetic data, non-convex synthetic data, and UCI data sets, respectively. Experimental results show that the algorithm obtains the correct number of clusters automatically, improves clustering quality, reduces the number of iterations needed to reach the global optimum, and is highly stable.

8.
Two spectral algorithms for solving the text clustering ensemble problem
徐森  卢志茂  顾国昌 《自动化学报》2009,35(7):997-1002
The key problem in clustering ensembles is how to combine different clusterers into a final, better clustering result. This paper introduces spectral clustering to solve the text clustering ensemble problem; however, spectral clustering requires solving a large-scale eigendecomposition to obtain a low-dimensional embedding of the texts for subsequent clustering. We first propose an ensemble algorithm that uses an algebraic transformation to convert the large-scale eigendecomposition into an equivalent singular value decomposition, and further into a smaller eigendecomposition. We then study the properties of spectral clustering further and propose a second ensemble algorithm that obtains the low-dimensional embedding of the texts indirectly, by solving for the low-dimensional embedding of the hyperedges. Experimental results on the TREC and Reuters text data sets show that the two proposed spectral algorithms are more robust than other graph-partitioning-based ensemble algorithms and are effective methods for the text clustering ensemble problem.

9.
Spectral clustering transforms the clustering problem into a graph-partitioning problem and clusters data points by seeking optimal subgraphs. Its key step is constructing a suitable similarity matrix that faithfully describes the intrinsic structure of the data set. Traditional spectral clustering builds the similarity matrix with a Gaussian kernel, which is sensitive to the choice of the scale parameter, and its clustering stage chooses initial centers at random, making performance unstable. To address these problems, this paper proposes a spectral clustering algorithm based on message passing. It adopts a density-adaptive similarity measure that better describes the relations between data points, and uses the message-passing mechanism of affinity propagation (AP) clustering to obtain high-quality cluster centers, improving the performance of spectral clustering. Experiments show that the new algorithm handles multi-scale data sets effectively, is very stable, and outperforms traditional spectral clustering and k-means in clustering quality.
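A density-adaptive similarity in the spirit described above can be sketched by giving each point its own local scale, as in self-tuning spectral clustering; this is an illustrative stand-in, not necessarily the paper's exact measure, and the neighbor index `k` is an assumed parameter.

```python
import numpy as np

def adaptive_similarity(X, k=3):
    """Density-adaptive affinity W_ij = exp(-d_ij^2 / (sigma_i * sigma_j)),
    where sigma_i is the distance from point i to its k-th nearest neighbor.
    Dense regions get small scales and sparse regions large ones, so clusters
    of different densities are each measured on their own local scale."""
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    sigma = np.sort(D, axis=1)[:, k]      # k-th neighbor (index 0 is self)
    W = np.exp(-D ** 2 / np.outer(sigma, sigma))
    np.fill_diagonal(W, 0.0)              # no self-loops
    return W

X = np.array([[0.0, 0.0], [0.2, 0.0], [0.0, 0.2], [5.0, 5.0]])
W = adaptive_similarity(X, k=2)
```

The affinity matrix is symmetric by construction and replaces the single global bandwidth of the standard Gaussian kernel, which is the parameter the abstract identifies as the source of sensitivity.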

10.
The key problem in clustering ensembles is how to combine different ensemble members into a better clustering result. Spectral clustering is introduced to solve this problem, yielding a similarity-matrix-based spectral algorithm (SMSA), but its high computational cost makes it unsuitable for large-scale text collections. The properties of spectral clustering are studied further, and spectral analysis is applied to the hyperedge similarity matrix, yielding a meta-clustering algorithm based on the hyperedge similarity matrix (HSM-MCLA). Experimental results on real text data sets show that SMSA and HSM-MCLA outperform other graph-partitioning-based ensemble algorithms, and that HSM-MCLA achieves results comparable to SMSA at markedly lower computational cost.

11.
A fuzzy spectral clustering algorithm for text clustering
The application of spectral clustering has begun to expand from image segmentation to text mining, with some success. Building on automatic determination of the number of clusters, and combining fuzzy theory with spectral clustering, a fuzzy clustering algorithm for multi-text clustering is proposed. The algorithm describes a fuzzy spectral clustering method in which a single text can belong to several text classes simultaneously. Simulation results show that the algorithm achieves good clustering quality.

12.
While spectral clustering can produce high-quality clusterings on small data sets, its computational cost makes it infeasible for large data sets. Affinity Propagation (AP) has the limitation that it is hard to determine the value of the 'preference' parameter that leads to an optimal clustering solution. These problems limit the scope of application of the two methods. In this paper, we develop a novel fast two-stage spectral clustering framework with local and global consistency. Under this framework, we propose a Fast density-Weighted low-rank Approximation Spectral Clustering (FWASC) algorithm to address the above issues. The proposed algorithm is a high-quality graph-partitioning method that simultaneously considers both the local and global structure information contained in the data sets. Specifically, we first present a new Fast Two-Stage AP (FTSAP) algorithm to coarsen the input sparse graph and produce a small number of final representative exemplars, which is a simple and efficient sampling scheme. We then present a density-weighted low-rank approximation spectral clustering algorithm that operates on those representative exemplars over the global underlying structure of the data manifold. Experimental results show that our algorithm outperforms state-of-the-art spectral clustering and the original AP algorithm in terms of speed, memory usage, and quality.

13.
The task of discovering natural groupings of input patterns, or clustering, is an important aspect of machine learning and pattern analysis. In this paper, we study the widely used spectral clustering algorithm, which clusters data using eigenvectors of a similarity/affinity matrix derived from a data set. In particular, we aim to solve two critical issues in spectral clustering: (1) how to automatically determine the number of clusters, and (2) how to perform effective clustering given noisy and sparse data. An analysis of the characteristics of the eigenspace is carried out which shows that (a) not every eigenvector of a data affinity matrix is informative and relevant for clustering; (b) eigenvector selection is critical because using uninformative/irrelevant eigenvectors could lead to poor clustering results; and (c) the corresponding eigenvalues cannot be used for relevant eigenvector selection given a realistic data set. Motivated by the analysis, a novel spectral clustering algorithm is proposed which differs from previous approaches in that only informative/relevant eigenvectors are employed for determining the number of clusters and performing clustering. The key element of the proposed algorithm is a simple but effective relevance learning method which measures the relevance of an eigenvector according to how well it can separate the data set into different clusters. Our algorithm was evaluated using synthetic data sets as well as real-world data sets generated from two challenging visual learning problems. The results demonstrated that our algorithm is able to estimate the cluster number correctly and reveal the natural grouping of the input data/patterns even given sparse and noisy data.
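The relevance-learning idea, scoring an eigenvector by how well it can separate the data, can be sketched with a simple one-dimensional two-means separation score; this is an illustrative stand-in, not the paper's actual relevance measure.

```python
import numpy as np

def eigenvector_relevance(v, iters=20):
    """Score an eigenvector by how cleanly a two-means split separates its
    entries: between-cluster variance over total variance, in [0, 1).
    A clearly bimodal eigenvector scores near 1; a unimodal noise-like
    eigenvector scores noticeably lower."""
    c = np.array([v.min(), v.max()])          # initial 1-D centers
    for _ in range(iters):
        assign = np.abs(v[:, None] - c[None, :]).argmin(axis=1)
        for j in (0, 1):
            if np.any(assign == j):
                c[j] = v[assign == j].mean()
    p = assign.mean()                          # fraction in cluster 1
    between = p * (1 - p) * (c[0] - c[1]) ** 2
    return between / (v.var() + 1e-12)

# An informative (bimodal) eigenvector vs. an irrelevant noise vector.
rng = np.random.default_rng(0)
informative = np.concatenate([rng.normal(-1, 0.05, 50), rng.normal(1, 0.05, 50)])
irrelevant = rng.normal(0.0, 0.05, 100)
```

Ranking eigenvectors by such a score, rather than by their eigenvalues, mirrors finding (c) above: the eigenvalue magnitude alone does not reveal which eigenvectors carry cluster structure.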

14.
Aggregation pheromone density based data clustering
Ants, bees and other social insects deposit pheromone (a type of chemical) in order to communicate between the members of their community. Pheromone that causes clumping or clustering behavior in a species and brings individuals into closer proximity is called aggregation pheromone. This article presents a new algorithm (called APC) for clustering data sets based on this property of aggregation pheromone found in ants. An ant is placed at the location of each data point, and the ants are allowed to move in the search space to find points with higher pheromone density. The movement of an ant is governed by the amount of pheromone deposited at different points of the search space. The more pheromone deposited, the greater the aggregation of ants. This leads to the formation of homogeneous groups of data. The proposed algorithm is evaluated on a number of well-known benchmark data sets using different cluster validity measures. Results are compared with those obtained using two popular standard clustering techniques, namely average-linkage agglomerative clustering and the k-means algorithm, and with an ant-based method called adaptive time-dependent transporter ants for clustering (ATTA-C). Experimental results demonstrate the potential of the proposed APC algorithm in terms of both solution (clustering) quality and execution time compared to the other algorithms on a large number of data sets.

15.
Iterative refinement clustering algorithms are widely used in data mining, but they are sensitive to initialization. In the past decades, many modified initialization methods have been proposed to reduce this sensitivity. At its core, an iterative refinement clustering algorithm is a local search method. The large number of local minimum points embedded in the search space makes the local search problem hard and sensitive to initialization; the fewer the local minimum points, the more robust a local search algorithm is to initialization. In this paper, we propose a Top-Down Clustering algorithm with Smoothing Search Space (TDCS3) to reduce the influence of initialization. The main steps of TDCS3 are: (1) dynamically reconstruct a series of smoothed search spaces into a hierarchical structure by 'filling' the local minimum points; (2) at the top level of the hierarchy, run an existing iterative refinement clustering algorithm with random initialization to generate a clustering result; (3) from the second level down to the bottom of the hierarchy, run the same clustering algorithm with its initialization derived from the previous level's clustering result. Experimental results on 3 synthetic and 10 real-world data sets show that TDCS3 is effective at finding better, more robust clustering results and at reducing the impact of initialization.

16.
A multi-scale spectral clustering algorithm
A multi-scale spectral clustering algorithm is proposed. Unlike traditional spectral clustering, it clusters the eigenvectors of the unnormalized Laplacian matrix with an improved k-means algorithm. Unlike the traditional k-means algorithm, the improved k-means uses a novel way of assigning data points to cluster centers: the distance between a data point and a cluster center is computed by comparing the center's distance to the origin and introducing a scale parameter. Experiments show that the improved algorithm achieves satisfactory results on synthetic data sets and competitive results on real data sets.

17.
A self-organizing network for hyperellipsoidal clustering (HEC)
We propose a self-organizing network for hyperellipsoidal clustering (HEC). It consists of two layers. The first employs a number of principal component analysis subnetworks to estimate the hyperellipsoidal shapes of currently formed clusters. The second performs competitive learning using the cluster shape information from the first. The network performs partitional clustering using the proposed regularized Mahalanobis distance, which was designed to deal with the problems of estimating the Mahalanobis distance when the number of patterns in a cluster is less than, or not considerably larger than, the dimensionality of the feature space during clustering. This distance also achieves a tradeoff between hyperspherical and hyperellipsoidal cluster shapes so as to prevent the HEC network from producing unusually large or small clusters. The significance level of the Kolmogorov-Smirnov test on the distribution of the Mahalanobis distances of patterns in a cluster to the cluster center, under the Gaussian cluster assumption, is used as a compactness measure. The HEC network has been tested on a number of artificial data sets and real data sets. We also apply the HEC network to texture segmentation problems. Experiments show that the HEC network leads to a significant improvement in the clustering results over the K-means algorithm with Euclidean distance. Our results on real data sets also indicate that hyperellipsoidally shaped clusters are often encountered in practice.
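The regularized Mahalanobis distance described above can be sketched as a shrinkage of the cluster covariance toward the identity; the blend weight `lam` is an assumed regularization form for illustration, not necessarily the HEC network's exact formula.

```python
import numpy as np

def regularized_mahalanobis(x, mu, cov, lam=0.5):
    """Squared distance under a covariance shrunk toward the identity:
        C = (1 - lam) * cov + lam * I
    lam = 0 gives the pure Mahalanobis distance (hyperellipsoidal clusters);
    lam = 1 gives squared Euclidean distance (hyperspherical clusters).
    Shrinkage keeps C invertible when a cluster holds few patterns relative
    to the feature-space dimensionality."""
    d = x - mu
    C = (1 - lam) * cov + lam * np.eye(len(mu))
    return float(d @ np.linalg.solve(C, d))

x, mu = np.array([1.0, 0.0]), np.zeros(2)
cov = np.diag([4.0, 1.0])
d_euclid = regularized_mahalanobis(x, mu, cov, lam=1.0)  # 1.0
d_mahal = regularized_mahalanobis(x, mu, cov, lam=0.25)
```

Interpolating between the two extremes is one way to realize the tradeoff between hyperspherical and hyperellipsoidal cluster shapes mentioned in the abstract.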

18.
In the last few years, hypergraph-based methods have gained considerable attention in the resolution of real-world clustering problems, since such a mode of representation can handle higher-order relationships between elements compared to standard graph theory. The most popular and promising approach to hypergraph clustering arises from concepts in spectral hypergraph theory [53], and clustering is configured as a hypergraph cut problem where an appropriate objective function has to be optimized. The spectral relaxation of this optimization problem allows one to obtain a clustering that is close to the optimum, but this approach generally suffers from its high computational demands, especially in real-world problems where the size of the data involved becomes too large. A natural way to overcome this limitation is to operate a reduction of the hypergraph, so that spectral clustering can be applied over a hypergraph of smaller size. In this paper, we introduce two novel hypergraph reduction algorithms that maintain the hypergraph structure as accurately as possible. These algorithms allowed us to design a new approach to hypergraph clustering, based on the multilevel paradigm, that operates in three steps: (i) hypergraph reduction; (ii) initial spectral clustering of the reduced hypergraph; and (iii) clustering refinement. The accuracy of our hypergraph clustering framework has been demonstrated by extensive experiments with comparison to other hypergraph clustering algorithms, and the framework has been successfully applied to image segmentation, for which an appropriate hypergraph-based model has been designed. The low running times displayed by our algorithm also demonstrate that the latter, unlike the standard spectral clustering approach, can handle data sets of considerable size.

19.
Constructing a visual dictionary is a key step in the BOVW model, and most visual dictionaries are currently built by K-means clustering. However, owing to the limitations of K-means clustering and the complexity and high dimensionality of the sample space, dictionaries built this way often have poor discriminative power. Within the spectral clustering framework, a visual dictionary learning algorithm with stronger discriminative power is proposed. To limit the loss of discriminative power during feature quantization and to sidestep the storage and computation problems inherent to spectral clustering, the algorithm partitions the training data by class label, obtains the centers of each sub-sample set via Nystrom spectral clustering, and forms the final visual dictionary from them. Experimental results on the Scene-15 data set verify the correctness and effectiveness of the algorithm; in particular, the visual dictionaries it generates perform well when training samples are limited.
