Similar literature
1.
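Affinity propagation (AP) clustering is a widely studied clustering method proposed in recent years. On large, multi-class data sets it can produce good results in a relatively short time, which gives it a clear advantage over traditional methods. For data sets with complex cluster structure, however, it often fails to produce good clusterings. By analyzing the clustering characteristics of the data, the authors design a kernel function whose parameters adjust automatically to the data structure, so that the data set is linearly separable, or nearly so, in the kernel space induced by the mapping. Running AP clustering on the data in this kernel space improves both the accuracy and the speed of AP clustering. An analysis of the algorithm's effectiveness and simulation experiments confirm that the proposed algorithm outperforms the original AP algorithm on large data sets with complex structure.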

2.
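Kernel density estimation and its application to the construction of clustering algorithms   Total citations: 10, self-citations: 0, citations by others: 10
The theory of kernel density estimation in classical mathematical statistics is the theoretical basis for constructing clustering algorithms based on the density function of a data set, and fast kernel density estimation by binned approximation likewise provides a basis for constructing efficient clustering algorithms. Through a discussion of kernel density estimation theory and its fast binned kernel approximation, the paper gives a mean squared error bound for the binned approximate density estimate relative to the kernel density estimate and proposes a binned kernel approximation based on the centroids of the data in each grid cell. Without changing the computational complexity, computing the binned approximate density function from grid data centroids effectively reduces the approximation error, an idea that can guide the construction of efficient clustering algorithms for large-scale data. The paper also reveals the relationship between clustering algorithms based on grid approximations of the density function and kernel density estimation theory.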
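
The binned approximation described in this abstract can be illustrated with a small one-dimensional sketch. It is not the paper's algorithm: the Gaussian kernel, the function names, and the sample data are assumptions chosen for illustration. Each grid cell is summarized by a count and one representative point, and the representative is either the cell midpoint or the centroid of the data in the cell.

```python
import math

def gaussian_kernel(u):
    """Standard normal density, used here as the smoothing kernel (an assumption)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kde_exact(x, data, h):
    """Exact kernel density estimate at x with bandwidth h: one kernel term per point."""
    return sum(gaussian_kernel((x - xi) / h) for xi in data) / (len(data) * h)

def bin_data(data, lo, width, n_bins):
    """Count the points in each grid cell and accumulate their sums."""
    counts = [0] * n_bins
    sums = [0.0] * n_bins
    for xi in data:
        k = min(n_bins - 1, max(0, int((xi - lo) / width)))
        counts[k] += 1
        sums[k] += xi
    return counts, sums

def representatives(counts, sums, lo, width, use_centroid):
    """One representative per cell: the data centroid or the cell midpoint."""
    reps = []
    for k, c in enumerate(counts):
        midpoint = lo + (k + 0.5) * width
        reps.append(sums[k] / c if (use_centroid and c) else midpoint)
    return reps

def kde_binned(x, counts, reps, h):
    """Binned estimate: one kernel term per non-empty cell, weighted by its count."""
    n = sum(counts)
    return sum(c * gaussian_kernel((x - r) / h)
               for c, r in zip(counts, reps) if c) / (n * h)

# Hypothetical sample: compare both representatives against the exact estimate.
data = [0.05, 0.10, 0.20, 0.30, 1.10, 1.20, 1.95, 2.60]
lo, width, n_bins, h = 0.0, 1.0, 3, 0.5
counts, sums = bin_data(data, lo, width, n_bins)
for use_centroid in (False, True):
    reps = representatives(counts, sums, lo, width, use_centroid)
    err = max(abs(kde_binned(x, counts, reps, h) - kde_exact(x, data, h))
              for x in (0.0, 1.0, 2.0))
    print("centroid" if use_centroid else "midpoint", "max error:", err)
```

Both variants evaluate one kernel term per non-empty cell, so the cost per query is the same; only the choice of representative differs.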

3.
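Traditional fuzzy kernel clustering tends to misclassify small classes, or let large classes absorb them, when class sizes differ greatly. To address this weakness, a new weighted fuzzy kernel C-means clustering algorithm (Weighted Fuzzy Kernel C-Means) is proposed that assigns a dynamic weight to each class. The algorithm is also introduced into spectral clustering: an improved spectral clustering algorithm is designed that uses image gray-level features as the samples to classify, which resolves the practical difficulty of computing the spectrum of the weight matrix when spectral clustering is applied to image segmentation. Experimental results show that the algorithm segments images well.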

4.
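Among the many clustering algorithms, spectral clustering, a representative graph clustering algorithm, has attracted wide attention because it adapts well to complex data distributions and clusters well. Its high time complexity, however, makes it hard to apply to large-scale data. To make spectral clustering more usable on large data sets, a fast graph clustering algorithm based on key node selection is proposed. The algorithm has three main steps. First, a fast node-importance evaluation method is proposed that fully accounts for cohesion and separation. Second, key nodes are selected in place of the original data set to build a bipartite graph, and approximate eigenvectors of the data are obtained by singular value decomposition. Third, the approximate eigenvectors from multiple runs are combined to make the approximate spectral clustering result more robust. The algorithm reduces the time complexity from the O(n^3) of original spectral clustering to O(t(n+2n^2)), which improves its usability on large data sets. Experiments comparing the algorithm with seven other representative spectral clustering algorithms on five benchmark data sets show that it identifies complex cluster structure in the data more efficiently than the other algorithms.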

5.
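A kernel-based fast possibilistic clustering algorithm   Total citations: 1, self-citations: 1, citations by others: 0
Most traditional fast clustering algorithms are based on fuzzy C-means (FCM). FCM, however, is sensitive to the initial cluster centers and to noisy data and tends to converge to local minima, so its clustering accuracy is limited. Possibilistic C-means clustering largely resolves FCM's sensitivity to noise but tends to produce coincident clusters. Clustering algorithms that combine FCM with possibilistic C-means largely resolve the coincident-cluster problem. To further improve convergence speed and robustness, a kernel-based fast possibilistic clustering algorithm is proposed. The method introduces the idea of kernel clustering and uses the sample variance to optimize the parameter η in the objective function. Experimental results on standard and synthetic data sets show that the algorithm improves clustering accuracy and speeds up convergence.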

6.
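The existing Sync algorithm has high time complexity and is therefore quite limited on large-sample data sets. To address this, a fast synchronization clustering algorithm for large samples (Fast Clustering by Synchronization on Large Sample, FCSLS) is proposed. A sampling method based on kernel density estimation (KDE) first compresses the large-sample data; synchronization clustering is then run on the compressed set, with the Davies-Bouldin index used to find the best number of clusters automatically; finally, the remaining large-scale data are clustered to obtain the final result. In experiments on synthetic data sets and real UCI data sets, FCSLS finds clusters of arbitrary shape, density, and size on large data sets without a preset number of clusters. Compared with the fast compression method based on reduced set density estimation and the center-constrained minimum enclosing ball technique, FCSLS greatly shortens the running time of synchronization clustering without loss of clustering accuracy.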
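
The Davies-Bouldin index mentioned in this abstract can be sketched in a few lines. This is the standard textbook definition (average over clusters of the worst ratio of within-cluster scatter to between-centroid distance), not code from the paper; the function names and the toy partitions are assumptions. Lower values indicate more compact, better separated clusters, so a candidate partition can be chosen by minimizing the index.

```python
import math

def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]

def davies_bouldin(clusters):
    """Davies-Bouldin index of a partition given as a list of point lists.

    Assumes at least two clusters with distinct centroids.
    """
    cents = [centroid(c) for c in clusters]
    # Scatter: mean distance from each member to its own centroid.
    scatter = [sum(math.dist(p, m) for p in c) / len(c)
               for c, m in zip(clusters, cents)]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        total += max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
                     for j in range(k) if j != i)
    return total / k

# Two hypothetical partitions of the same four points.
good = [[(0.0, 0.0), (0.0, 1.0)], [(10.0, 0.0), (10.0, 1.0)]]
bad = [[(0.0, 0.0), (10.0, 0.0)], [(0.0, 1.0), (10.0, 1.0)]]
best = min([good, bad], key=davies_bouldin)
print(best is good)
```

In an FCSLS-style pipeline the candidates would be the clusterings obtained for different parameter settings on the compressed set; here they are hard-coded for brevity.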

7.
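Using kernel methods to reformulate traditional learning algorithms has been a hot topic in machine learning in recent years. This paper proposes a new idea of applying kernel methods to cluster in the original input space and extends it to traditional clustering algorithms, yielding the fuzzy kernel C-means and possibilistic kernel C-means algorithms. In essence, these algorithms adopt a new class of kernel-induced non-Euclidean distance measures in the criterion function; according to Huber's robust statistics, they are inherently robust and suited to clustering incomplete or missing data, noisy data, and outliers. The performance of the algorithms is verified on synthetic and benchmark data sets.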
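
A kernel-induced distance of the kind this abstract refers to can be sketched as follows, assuming a Gaussian kernel K(x, v) = exp(-||x - v||^2 / sigma^2). For that kernel the squared feature-space distance K(x, x) - 2K(x, v) + K(v, v) reduces to 2(1 - K(x, v)), which is bounded by 2, so distant outliers have limited influence. The prototype update shown is the kernel-weighted mean that follows from minimizing such a criterion in the input space; the function names, the kernel choice, and the data are illustrative assumptions, not the paper's code.

```python
import math

def gaussian_kernel(x, v, sigma):
    """K(x, v) = exp(-||x - v||^2 / sigma^2)."""
    sq = sum((a - b) ** 2 for a, b in zip(x, v))
    return math.exp(-sq / sigma ** 2)

def kernel_distance_sq(x, v, sigma):
    """Kernel-induced squared distance 2 * (1 - K(x, v)); never exceeds 2."""
    return 2.0 * (1.0 - gaussian_kernel(x, v, sigma))

def update_prototype(points, v, sigma, weights=None):
    """One prototype update: mean of the points weighted by weight * K(x, v).

    `weights` stands in for the membership terms (u ** m) of a fuzzy or
    possibilistic scheme; uniform weights are used when it is omitted.
    """
    if weights is None:
        weights = [1.0] * len(points)
    coeffs = [w * gaussian_kernel(p, v, sigma) for w, p in zip(weights, points)]
    total = sum(coeffs)
    return [sum(c * p[d] for c, p in zip(coeffs, points)) / total
            for d in range(len(v))]

# Three inliers near 0 and one hypothetical outlier at 50.
points = [[0.0], [0.2], [-0.2], [50.0]]
plain_mean = sum(p[0] for p in points) / len(points)
robust = update_prototype(points, [0.0], sigma=1.0)
print(plain_mean, robust[0])
```

The outlier receives a kernel weight that is effectively zero, so the updated prototype stays with the inliers while the plain mean is pulled toward the outlier.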

8.
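Multi-model LSSVM regression modeling based on kernel fuzzy clustering   Total citations: 6, self-citations: 1, citations by others: 5
李卫, 杨煜普, 王娜. 《控制与决策》 (Control and Decision), 2008, 23(5): 560-562
Single-model regression on large-scale data suffers from poor accuracy and heavy computation. To address this, a multi-model least squares support vector regression modeling method based on kernel fuzzy clustering is proposed. The method first partitions the data set with a fuzzy C-means clustering algorithm based on a conditionally positive definite kernel, then fits a least squares support vector regression estimate for each cluster, and also gives a simple kernel parameter selection method based on the distribution of the data within each cluster. Numerical simulation experiments on nonlinear function estimation show that the proposed method has good accuracy and generalization ability.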

9.
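The back-propagation neural network algorithm is a classic classification algorithm, but it usually takes a long time to train. To address this, a fast back-propagation algorithm based on kernel clustering is proposed. Kernel clustering partitions the original samples into multiple clusters, a center sample is computed for each cluster, and all the cluster-center samples form the new training set for neural network learning. Simulation experiments on UCI standard data sets and a speaker recognition data set show a clear speed advantage over the traditional back-propagation algorithm.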

10.
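The existing synchronization clustering method Sync must treat every component of each sample as a phase oscillator during synchronization, which gives it high time complexity and makes it quite limited when clustering large data sets. To solve this problem, a fast adaptive synchronization clustering method (fast adaptive KDE-based clustering by synchronization, FAKCS) is proposed. FAKCS first compresses the large data set with a fast compression method based on reduced set density estimation and the center-constrained minimum enclosing ball technique. It then uses the Davies-Bouldin index to run synchronization clustering on the compressed set with an adaptively chosen ε parameter, and uses a newly defined order parameter to assess the degree of local synchronization. The relationship between the order parameter and kernel density estimation is also studied, revealing in theory what local synchronization of sample points means in terms of probability density. FAKCS finds clusters of arbitrary shape, number, and density on large data sets without a preset number of clusters. Experiments on image segmentation and large UCI data sets confirm the effectiveness of FAKCS.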

11.
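The choice of kernel function and its parameters determines the performance of kernel methods. Following the idea of semi-supervised learning, this paper constructs an objective function that uses unlabeled data and pairwise constraints to optimize the kernel function so that it fits the data set as closely as possible, thereby improving kernel performance. To validate the method, it is applied to kernel function optimization in kernel principal component analysis (KPCA), and the classification and clustering performance of KPCA-extracted features is evaluated on synthetic data and UCI data sets. The experimental results show that the proposed method improves classification and clustering performance.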

12.
Dimensionality reduction is an important preprocessing procedure in computer vision, pattern recognition, information retrieval, and data mining. In this paper we present a kernel method based on approximately harmonic projection (AHP), a recently proposed linear manifold learning method that performs very well in clustering. The kernel matrix implicitly maps the data into a reproducing kernel Hilbert space (RKHS) and makes the structure of data distributed on a nonlinear manifold more distinct. The method retains and extends the advantages of its linear version and remains sensitive to connected components, which makes it particularly suitable for unsupervised clustering. In addition, the method can cover various classes of nonlinearities by using different kernels. We evaluate the new method on several well-known data sets to demonstrate its effectiveness. The results show that the new algorithm performs well and outperforms other classic algorithms on those data sets.

13.
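Clustering the large volume of short texts generated on the Internet has significant practical value, but short texts have sparse features that are hard to extract, so traditional text clustering algorithms struggle with them. To address this, a short-text clustering algorithm based on weighted kernel nonnegative matrix factorization (WKNMF) is proposed, building on the nonnegative matrix factorization (NMF) model. Through a kernel mapping, the algorithm maps the sparse feature space into a high-dimensional latent space, so that latent semantic features in short texts can be fully exploited for clustering. In addition, the kernel trick simplifies the complex high-dimensional computations, and iterative update rules dynamically adjust the weight vector of the short texts, so that the importance of different short texts to the clustering can be distinguished. Experiments on a real microblog data set show that WKNMF achieves better clustering quality than K-means, latent Dirichlet allocation (LDA), NMF, and self-organizing maps (SOM), with accuracy and normalized mutual information of 66.38% and 66.91%, respectively.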

14.
The leading partitional clustering technique, k-modes, is one of the most computationally efficient clustering methods for categorical data. However, in k-modes-type algorithms, clustering performance depends on the initial cluster centers, and the number of clusters needs to be known or given in advance. This paper proposes a novel initialization method for categorical data that can be applied to k-modes-type algorithms. The proposed method not only obtains good initial cluster centers but also provides a criterion for finding candidates for the number of clusters. The performance and scalability of the proposed method have been studied on real data sets. The experimental results illustrate that the proposed method is effective and, because its time complexity is linear in the number of data points, can be applied to large data sets.
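
For readers unfamiliar with k-modes, the basic iteration can be sketched as below. It shows only the standard assign-and-update loop with the simple matching dissimilarity; the initialization method this abstract proposes is not reproduced, so the initial modes are passed in by the caller. The function names and the toy records are illustrative assumptions.

```python
from collections import Counter

def matching_dissimilarity(a, b):
    """Number of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))

def mode_of(records):
    """Attribute-wise most frequent value of a non-empty list of records."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*records))

def k_modes(records, init_modes, max_iter=20):
    """Alternate between assigning records to the nearest mode and updating modes."""
    modes = list(init_modes)
    labels = []
    for _ in range(max_iter):
        labels = [min(range(len(modes)),
                      key=lambda k: matching_dissimilarity(r, modes[k]))
                  for r in records]
        new_modes = []
        for k in range(len(modes)):
            members = [r for r, lab in zip(records, labels) if lab == k]
            # An empty cluster keeps its previous mode.
            new_modes.append(mode_of(members) if members else modes[k])
        if new_modes == modes:
            break
        modes = new_modes
    return labels, modes

# Hypothetical records with three categorical attributes each.
records = [("a", "x", "p"), ("a", "x", "q"), ("a", "y", "p"),
           ("b", "y", "q"), ("b", "z", "q"), ("b", "y", "r")]
labels, modes = k_modes(records, init_modes=[records[0], records[3]])
print(labels, modes)
```

Because the result depends on `init_modes`, a poor choice can leave the loop in a bad local optimum, which is the sensitivity the abstract sets out to reduce.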

15.
Local density adaptive similarity measurement for spectral clustering   Total citations: 3, self-citations: 0, citations by others: 3
Similarity measurement is crucial to the performance of spectral clustering. The Gaussian kernel function is usually adopted as the similarity measure. However, with a fixed kernel parameter, the similarity between two data points is determined only by their Euclidean distance and does not adapt to their surroundings. In this paper, a local density adaptive similarity measure is proposed, which uses the local density between two data points to scale the Gaussian kernel function. The proposed similarity measure satisfies the clustering assumption and amplifies intra-cluster similarity, thus making the affinity matrix clearly block diagonal. Experimental results on both synthetic and real-world data sets show that the spectral clustering algorithm with the local density adaptive similarity measure outperforms the traditional spectral clustering algorithm, the path-based spectral clustering algorithm, and the self-tuning spectral clustering algorithm.
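
One way to make a Gaussian similarity adapt to local density is sketched below. The abstract does not give the formula, so the specific form is an assumption: the kernel width is scaled by the number of common near neighbors of the two points (other points within a radius `eps` of both), so that pairs in a dense region look more similar than equally distant pairs in a sparse region. All names and parameter values are illustrative.

```python
import math

def common_near_neighbors(points, i, j, eps):
    """Number of other points lying within eps of both point i and point j."""
    return sum(
        1
        for k, p in enumerate(points)
        if k != i and k != j
        and math.dist(p, points[i]) <= eps
        and math.dist(p, points[j]) <= eps
    )

def gaussian_similarity(points, i, j, sigma):
    """Plain Gaussian similarity: depends only on the Euclidean distance."""
    d2 = math.dist(points[i], points[j]) ** 2
    return math.exp(-d2 / (2.0 * sigma ** 2))

def density_adaptive_similarity(points, i, j, sigma, eps):
    """Gaussian similarity whose width grows with the local density (assumed form)."""
    d2 = math.dist(points[i], points[j]) ** 2
    cnn = common_near_neighbors(points, i, j, eps)
    return math.exp(-d2 / (2.0 * sigma ** 2 * (cnn + 1)))

# Points 0 and 1 share two near neighbors; points 4 and 5 are isolated.
points = [(0.0, 0.0), (1.0, 0.0), (0.5, 0.2), (0.5, -0.2), (10.0, 0.0), (11.0, 0.0)]
dense = density_adaptive_similarity(points, 0, 1, sigma=0.5, eps=0.8)
sparse = density_adaptive_similarity(points, 4, 5, sigma=0.5, eps=0.8)
print(dense, sparse)
```

Both pairs are one unit apart, so the plain Gaussian similarity treats them identically; the adaptive version amplifies the similarity of the pair inside the dense region.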

16.
Mixture model based clustering (also simply called model-based clustering hereinafter) consists of fitting a mixture model to data and identifying each cluster with one of its components. This paper tackles the model selection and parameter estimation problems in model-based clustering so as to improve clustering performance on data sets whose true kernel distribution functions are not in the assumed family, as well as on data sets with inherently overlapping clusters. An effective model selection criterion tailored to clustering applications is first proposed. Unlike most criteria, which measure only the goodness-of-fit of the model to the generated data, the proposed one also evaluates whether the candidate model provides a reasonable partition of the observed data, which enforces a model with well-separated components. Accordingly, an improved method for estimating the mixture parameters is derived, which aims to suppress the spurious estimates produced by the standard expectation maximization (EM) algorithm and to enforce well-supported components in the mixture model. Finally, the estimation of mixture parameters and the model selection are integrated in a single algorithm that favors a compact mixture model with components that are both well supported and well separated. Extensive experiments on synthetic and real-world data sets are carried out to show the effectiveness of the proposed approach to mixture model based clustering.

17.
Recently, kernel-based clustering in feature space has been shown to perform better than conventional clustering methods in unsupervised classification. In this paper, a partitioning clustering method in kernel-induced feature space for symbolic interval-valued data is introduced. The distance between an item and its prototype in feature space is expanded using a two-component mixture kernel to handle intervals. Moreover, tools for interpreting partitions and clusters of interval-valued data in feature space are also presented. To show the effectiveness of the proposed method, experiments with real and synthetic interval data sets were performed, and a study comparing the proposed method with different clustering algorithms from the literature is also presented. The clustering quality furnished by the methods is measured by an external cluster validity index (corrected Rand index). These experiments showed the usefulness of the kernel K-means method for interval-valued data and the merit of the partition and cluster interpretation tools.
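
The corrected Rand index used as the external validity measure here can be computed from the contingency table of two labelings. The sketch below follows the usual Hubert-Arabie definition (often called the adjusted Rand index); the function name and the sample labelings are assumptions for illustration.

```python
from collections import Counter
from math import comb

def corrected_rand_index(labels_true, labels_pred):
    """Corrected (adjusted) Rand index between two labelings of the same items."""
    n = len(labels_true)
    if n < 2:
        return 1.0  # Convention for the degenerate case with no pairs.
    # Contingency counts n_ij and their row and column marginals.
    pair_counts = Counter(zip(labels_true, labels_pred))
    row_sums = Counter(labels_true)
    col_sums = Counter(labels_pred)
    index = sum(comb(c, 2) for c in pair_counts.values())
    sum_rows = sum(comb(c, 2) for c in row_sums.values())
    sum_cols = sum(comb(c, 2) for c in col_sums.values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = 0.5 * (sum_rows + sum_cols)
    if max_index == expected:
        return 1.0  # Both labelings are trivial, so they agree by convention.
    return (index - expected) / (max_index - expected)

# The index ignores label names: swapping 0 and 1 is still a perfect match.
same = corrected_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
crossed = corrected_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
print(same, crossed)
```

A value of 1 means the partitions agree exactly, values near 0 mean agreement at chance level, and negative values mean agreement below chance.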

18.
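徐鲲鹏, 陈黎飞, 孙浩军, 王备战. 《软件学报》 (Journal of Software), 2020, 31(11): 3492-3505
Existing subspace clustering methods for categorical data mostly assume that features are mutually independent and ignore linear or nonlinear correlations between attributes. A kernel subspace clustering method for categorical data is proposed. First, kernel functions originally designed for continuous data are introduced to project categorical data into a kernel space, and a feature-weighted similarity measure for categorical data in the kernel space is defined. Second, an objective function for kernel subspace clustering of categorical data is derived from this measure, and an efficient optimization method for it is proposed. Finally, a kernel subspace clustering algorithm for categorical data is defined. The algorithm not only accounts for relationships between attributes in a nonlinear space but also, during clustering, assigns each attribute a feature weight measuring its relevance to each cluster, which performs embedded feature selection for categorical attributes. A cluster validity index is also defined to evaluate the quality of clustering results on categorical data. Experiments on synthetic and real data sets show that, compared with existing subspace clustering algorithms, the kernel subspace clustering algorithm can uncover nonlinear relationships between categorical attributes and effectively improves clustering quality.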

19.
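A fast adaptive similarity-based clustering method based on sparse Parzen window density estimation   Total citations: 1, self-citations: 1, citations by others: 0
The similarity-based clustering method (SCM) has attracted wide attention because it is simple, easy to implement, and robust. However, because of the high time complexity of its embedded similarity clustering algorithm (SCA) and agglomerative hierarchical clustering (Agglomerative hierarchical clu...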

20.
An improved spectral clustering algorithm based on random walk   Total citations: 2, self-citations: 0, citations by others: 2
The construction process for a similarity matrix has an important impact on the performance of spectral clustering algorithms. In this paper, we propose a random walk based approach to process the Gaussian kernel similarity matrix. In this method, the pairwise similarity between two data points depends not only on the two points themselves but also on their neighbors. As a result, the new similarity matrix is closer to the ideal matrix that would give the best clustering result. We give a theoretical analysis of the similarity matrix and apply it to spectral clustering. We also propose a method to handle noisy items, which may degrade clustering performance. Experimental results on real-world data sets show that the proposed spectral clustering algorithm significantly outperforms existing algorithms.
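
The general idea of letting neighbors influence pairwise similarity through a random walk can be sketched as follows. The abstract does not specify the construction, so this is an assumed generic variant rather than the paper's method: the Gaussian similarity matrix is row-normalized into a transition matrix, the walk is run for a few steps, and the result is symmetrized. All function names and parameters are illustrative.

```python
import math

def gaussian_similarity_matrix(points, sigma):
    """Pairwise Gaussian kernel similarities, with ones on the diagonal."""
    return [[math.exp(-math.dist(p, q) ** 2 / (2.0 * sigma ** 2)) for q in points]
            for p in points]

def row_normalize(w):
    """Transition matrix P = D^-1 W of the random walk on the similarity graph."""
    return [[v / sum(row) for v in row] for row in w]

def mat_mul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def random_walk_similarity(w, steps):
    """Symmetrized `steps`-step transition probabilities, used as a new similarity."""
    p = row_normalize(w)
    m = p
    for _ in range(steps - 1):
        m = mat_mul(m, p)
    n = len(m)
    return [[0.5 * (m[i][j] + m[j][i]) for j in range(n)] for i in range(n)]

# Three collinear points: 0 and 2 are far apart but share the neighbor 1.
w = gaussian_similarity_matrix([(0.0,), (1.0,), (2.0,)], sigma=0.5)
one_step = random_walk_similarity(w, steps=1)
two_step = random_walk_similarity(w, steps=2)
print(one_step[0][2], two_step[0][2])
```

After two steps the similarity between the end points rises because the walk can pass through their common neighbor, which is the neighbor-aware effect the abstract describes.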
