首页 | 官方网站   微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 236 毫秒
1.
Clustering Incomplete Data Using Kernel-Based Fuzzy C-means Algorithm   总被引:3,自引:0,他引:3  
  相似文献   

2.
聚类问题的自适应杂交差分演化模拟退火算法   总被引:1,自引:0,他引:1       下载免费PDF全文
针对K-均值聚类算法对初始值敏感和易陷入局部最优的缺点,提出了一个基于自适应杂交差分演化模拟退火的K-均值聚类算法。该算法以差分演化算法为基础,通过模拟退火算法的更新策略来增强全局搜索能力,并运用自适应技术来选择学习策略、确定算法的关键参数。实验结果表明,该算法能较好地克服传统K-均值聚类算法的缺点,具有较好的全局收敛能力,且算法稳定性强、收敛速度快,将新算法与传统的K-均值聚类算法以及最近提出的几个同类聚类算法进行了比较。  相似文献   

3.
针对工业、信息等领域出现的基于较大规模、非平稳变化复杂数据的回归问题,已有算法在计算成本及拟合效果方面无法同时满足要求.因此,文中提出基于多尺度高斯核的分布式正则化回归学习算法.算法中的假设空间为多个具有不同尺度的高斯核生成的再生核Hilbert空间的和空间.考虑到整个数据集划分的不同互斥子集波动程度不同,建立不同组合系数核函数逼近模型.利用最小二乘正则化方法同时独立求解各逼近模型.最后,通过对所得的各个局部估计子加权合成得到整体逼近模型.在2个模拟数据集和4个真实数据集上的实验表明,文中算法既能保证较优的拟合性能,又能降低运行时间.  相似文献   

4.
针对传统K-means算法存在的缺陷,引进人工鱼群算法,提出了一种基于改进鱼群和K-means的混合聚类算法。聚类样本中心点初始化时,人工鱼各维参数随机选择在对应属性两个极值之间,同时为了降低计算复杂度,提高收敛效率,寻找全局最优,首先对随机选取的一小部分人工鱼进行K-means操作,然后对全体人工鱼的追尾算子引入粒子群策略,引导其学习,模拟人工鱼的行为。通过Matlab仿真实现算法,在费雪鸢尾花卉数据集和葡萄酒质量数据集进行了实验,算法的有效性和可行性得到了验证。  相似文献   

5.
The success rates of the expert or intelligent systems depend on the selection of the correct data clusters. The k-means algorithm is a well-known method in solving data clustering problems. It suffers not only from a high dependency on the algorithm's initial solution but also from the used distance function. A number of algorithms have been proposed to address the centroid initialization problem, but the produced solution does not produce optimum clusters. This paper proposes three algorithms (i) the search algorithm C-LCA that is an improved League Championship Algorithm (LCA), (ii) a search clustering using C-LCA (SC-LCA), and (iii) a hybrid-clustering algorithm called the hybrid of k-means and Chaotic League Championship Algorithm (KSC-LCA) and this algorithm has of two computation stages. The C-LCA employs chaotic adaptation for the retreat and approach parameters, rather than constants, which can enhance the search capability. Furthermore, to overcome the limitation of the original k-means algorithm using the Euclidean distance that cannot handle the categorical attribute type properly, we adopt the Gower distance and the mechanism for handling a discrete value requirement of the categorical value attribute. The proposed algorithms can handle not only the pure numeric data but also the mixed-type data and can find the best centroids containing categorical values. Experiments were conducted on 14 datasets from the UCI repository. The SC-LCA and KSC-LCA competed with 16 established algorithms including the k-means, k-means++, global k-means algorithms, four search clustering algorithms and nine hybrids of k-means algorithm with several state-of-the-art evolutionary algorithms. The experimental results show that the SC-LCA produces the cluster with the highest F-Measure on the pure categorical dataset and the KSC-LCA produces the cluster with the highest F-Measure for the pure numeric and mixed-type tested datasets. Out of 14 datasets, there were 13 centroids produced by the SC-LCA that had better F-Measures than that of the k-means algorithm. On the Tic-Tac-Toe dataset containing only categorical attributes, the SC-LCA can achieve an F-Measure of 66.61 that is 21.74 points over that of the k-means algorithm (44.87). The KSC-LCA produced better centroids than k-means algorithm in all 14 datasets; the maximum F-Measure improvement was 11.59 points. However, in terms of the computational time, the SC-LCA and KSC-LCA took more NFEs than the k-means and its variants but the KSC-LCA ranks first and SC-LCA ranks fourth among the hybrid clustering and the search clustering algorithms that we tested. Therefore, the SC-LCA and KSC-LCA are general and effective clustering algorithms that could be used when an expert or intelligent system requires an accurate high-speed cluster selection.  相似文献   

6.
一种大数据环境下的新聚类算法   总被引:2,自引:0,他引:2  
李斌  王劲松  黄玮 《计算机科学》2015,42(12):247-250
提出了一种新的聚类算法NGKCA,该算法克服了经典聚类算法检测率和稳定性的不足,适用于解决大数据环境下的聚类问题。NGKCA聚类算法包括4个阶段:首先利用谱聚类NJW算法对大数据集进行列降维和数据归一化处理,其次引入对初始值不敏感的粒子群算法对数据集进行行降维从而选出临时的聚类中心集,接着通过全局Kmeans算法对最佳聚类中心集进行聚类以获取聚类中心点,最后使用粒子群算法对聚类中心点进行调整进而获取最终的聚类划分。在一些著名的机器学习数据集和国际标准的网络安全数据集KDDCUP99上进行实验,结果表明:提出的算法比谱聚类、Kmeans、粒子群、全局Kmeans等常见算法具有更好的稳定性和更高的检测率,与全局Kmeans算法相比具有更优的时间复杂度。  相似文献   

7.
The text clustering technique is an appropriate method used to partition a huge amount of text documents into groups. The documents size affects the text clustering by decreasing its performance. Subsequently, text documents contain sparse and uninformative features, which reduce the performance of the underlying text clustering algorithm and increase the computational time. Feature selection is a fundamental unsupervised learning technique used to select a new subset of informative text features to improve the performance of the text clustering and reduce the computational time. This paper proposes a hybrid of particle swarm optimization algorithm with genetic operators for the feature selection problem. The k-means clustering is used to evaluate the effectiveness of the obtained features subsets. The experiments were conducted using eight common text datasets with variant characteristics. The results show that the proposed algorithm hybrid algorithm (H-FSPSOTC) improved the performance of the clustering algorithm by generating a new subset of more informative features. The proposed algorithm is compared with the other comparative algorithms published in the literature. Finally, the feature selection technique encourages the clustering algorithm to obtain accurate clusters.  相似文献   

8.
提出一种密度敏感模糊核最大熵聚类算法.该算法首先通过核函数将原始非线性非高斯的数据集转化为核空间数据集,然后利用核函数的相似性抵消不属于该聚类的样本数据在聚类过程中对聚类中心求解的干扰,消除正则化系数对聚类结果的影响,进而抑制传统最大熵聚类算法的趋同性.最后通过引入相对密度项,解决因样本数据在特征空间的分布差异而导致的聚类中心求解偏差问题,从而提高聚类结果的准确性.实验部分,本文讨论了算法参数间的关系以及对聚类结果的影响.通过与传统模糊C均值聚类算法、核模糊C均值聚类算法、最大熵聚类算法、最大熵规范化权重核模糊C均值聚类算法以及其他两种改进最大熵聚类算法的聚类结果进行对比分析,结果表明本文提出的密度敏感模糊核最大熵聚类算法的聚类性能明显优于其他算法.  相似文献   

9.
A novel ant-based clustering algorithm using the kernel method   总被引:1,自引:0,他引:1  
A novel ant-based clustering algorithm integrated with the kernel (ACK) method is proposed. There are two aspects to the integration. First, kernel principal component analysis (KPCA) is applied to modify the random projection of objects when the algorithm is run initially. This projection can create rough clusters and improve the algorithm’s efficiency. Second, ant-based clustering is performed in the feature space rather than in the input space. The distance between the objects in the feature space, which is calculated by the kernel function of the object vectors in the input space, is applied as a similarity measure. The algorithm uses an ant movement model in which each object is viewed as an ant. The ant determines its movement according to the fitness of its local neighbourhood. The proposed algorithm incorporates the merits of kernel-based clustering into ant-based clustering. Comparisons with other classic algorithms using several synthetic and real datasets demonstrate that ACK method exhibits high performance in terms of efficiency and clustering quality.  相似文献   

10.
现有的多视图聚类算法大多假设多视图数据点之间为线性关系,且在学习过程中无法保留原始特征空间的局部性;而在欧氏空间中进行子空间融合又过于单调,无法将学习到的子空间表示对齐。针对以上问题,提出了基于格拉斯曼流形融合子空间的多视图聚类算法。首先,将核技巧和局部流形结构学习结合以得到不同视图的子空间表示;然后,在格拉斯曼流形上融合这些子空间表示以得到一致性亲和矩阵;最后,对一致性亲和矩阵执行谱聚类来得到最终的聚类结果,并利用交替方向乘子法(ADMM)来优化所提模型。与核多视图低秩稀疏子空间聚类(KMLRSSC)算法相比,所提算法的聚类精度在MSRCV1、Prokaryotic、Not-Hill数据集上分别提高了20.83个百分点、9.47个百分点和7.33个百分点。实验结果验证了基于格拉斯曼流形融合子空间的多视图聚类算法的有效性和良好性能。  相似文献   

11.
谱聚类将数据聚类问题转化成图划分问题,通过寻找最优的子图,对数据点进行聚类。谱聚类的关键是构造合适的相似矩阵,将数据集的内在结构真实地描述出来。针对传统的谱聚类算法采用高斯核函数来构造相似矩阵时对尺度参数的选择很敏感,而且在聚类阶段需要随机确定初始的聚类中心,聚类性能也不稳定等问题,本文提出了基于消息传递的谱聚类算法。该算法采用密度自适应的相似性度量方法,可以更好地描述数据点之间的关系,然后利用近邻传播(Affinity propagation,AP)聚类中“消息传递”机制获得高质量的聚类中心,提高了谱聚类算法的性能。实验表明,新算法可以有效地处理多尺度数据集的聚类问题,其聚类性能非常稳定,聚类质量也优于传统的谱聚类算法和k-means算法。  相似文献   

12.
A novel kernel method for clustering   总被引:10,自引:0,他引:10  
Kernel methods are algorithms that, by replacing the inner product with an appropriate positive definite function, implicitly perform a nonlinear mapping of the input data into a high-dimensional feature space. In this paper, we present a kernel method for clustering inspired by the classical k-means algorithm in which each cluster is iteratively refined using a one-class support vector machine. Our method, which can be easily implemented, compares favorably with respect to popular clustering algorithms, like k-means, neural gas, and self-organizing maps, on a synthetic data set and three UCI real data benchmarks (IRIS data, Wisconsin breast cancer database, Spam database).  相似文献   

13.
针对数据竞争聚类算法在处理复杂结构数据集时聚类性能不佳的问题,提出了一种密度敏感的数据竞争聚类算法。首先,在密度敏感距离测度的基础上定义了局部距离,以描述数据分布的局部一致性;其次,在局部距离的基础上计算出数据间的全局距离,用来描述数据分布的全局一致性,挖掘数据的空间分布信息,以弥补欧氏距离描述数据分布全局一致性能力不佳的缺陷;最后,将全局距离用于数据竞争聚类算法中。将新算法与基于欧氏距离的数据竞争聚类算法进行性能比较,在人工数据集和真实数据集上的实验结果表明,该算法克服了数据竞争聚类算法难以处理复杂结构数据的缺点,聚类结果具有更高的准确率。  相似文献   

14.
针对经典k_均值聚类方法只能处理静态数据聚类的问题,本文提出一种能够处理动态数据的改进动态k-均值聚类算法,称为Dynamical K-means算法.该方法在经典k-均值方法的基础上,通过对动态变化的数据集中 新加入样本进行分析和处理,根据聚类目标函数改变的实际情况选择最相似的类别进行局部更新或进行全局经典k_均值聚类,有效检测发生聚类概念漂移和没有发生聚类概念漂移的情况,从而实现了动态数据的在线聚类,避免了经典k_均值方法在动态数据中每次都要对全部数据重新聚类而导致算法速度过慢的问题.标准数据集和人工社会网络数据集上的实验结果表明,与经典k_均值聚类方法相比,本文提出的动态k_均值聚类方法能快速高效地处理动态数据聚类问题,并有效地检测动态数据聚类过程中所产生的概念漂移问题.  相似文献   

15.
目的 针对现有广义均衡模糊C-均值聚类不收敛问题,提出一种改进广义均衡模糊聚类新算法,并将其推广至再生希尔伯特核空间以便提高该类算法的普适性。方法 在现有广义均衡模糊C-均值聚类目标函数的基础上,利用Schweizer T范数极限表达式的性质构造了新的广义均衡模糊C-均值聚类最优化目标函数,然后采用拉格朗日乘子法获取其迭代求解所对应的隶属度和聚类中心表达式,同时对其聚类中心迭代表达式进行修改并得到一类聚类性能显著改善的修正聚类算法;最后利用非线性函数将数据样本映射至高维特征空间获得核空间广义均衡模糊聚类算法。结果 对Iris标准文本数据聚类和灰度图像分割测试表明,提出的改进广义均衡模模糊聚类新算法及其修正算法具有良好的分类性能,核空间广义均衡模糊聚类算法对比现有融入类间距离的改进模糊C-均值聚类(FCS)算法和改进再生核空间的模糊局部C-均值聚类(KFLICM)算法能将图像分割的误分率降低10%30%。结论 本文算法克服了现有广义均衡模糊C-均值聚类算法的缺陷,同时改善了聚类性能,适合复杂数据聚类分析的需要。  相似文献   

16.
针对大规模文本聚类中对聚类算法执行效率的要求,提出了一个内容相关的纵向数据划分策略FTDV,并基于该策略提出了数据划分优化的并行DVP k-means算法,提高了常规并行k-means算法的并行化程度,达到了优化算法执行效率的目的。在实验中,与常规并行k-means算法和基于关键方向分解的PDDP k-means算法进行比较,DVP k-means具有更好的并行性和对数据规模的适应性,且可以生成更高质量的聚簇。  相似文献   

17.
基于云计算平台Hadoop的并行k-means聚类算法设计研究   总被引:2,自引:0,他引:2  
随着数据库技术的发展和Intcrnct的迅速普及,实际应用中需要处理的数据量急剧地增长,致聚类研究面临 许多新的问题和挑战,如海量数据和新的计算环境等。深入研究了基于云计算平台Hadoop的并行k-means聚类算 法,给出了算法设计的方法和策略。在多个不同大小数据集上的实验表明,设计的并行聚类算法具有优良的加速比、 扩展率和数据伸缩率等性能,适合用于海量数据的分析和挖掘。  相似文献   

18.
Recently there has been a steep growth in the development of kernel-based learning algorithms. The intrinsic problem in such algorithms is the selection of the optimal kernel for the learning task of interest. In this paper, we propose an unsupervised approach to learn a linear combination of kernel functions, such that the resulting kernel best serves the objectives of the learning task. This is achieved through measuring the influence of each point on the structure of the dataset. This measure is calculated by constructing a weighted graph on which a random walk is performed. The measure of influence in the feature space is probabilistically related to the input space that yields an optimization problem to be solved. The optimization problem is formulated in two different convex settings, namely linear and semidefinite programming, dependent on the type of kernel combination considered. The contributions of this paper are twofold: first, a novel unsupervised approach to learn the kernel function, and second, a method to infer the local similarity represented by the kernel function by measuring the global influence of each point toward the structure of the dataset. The proposed approach focuses on the kernel selection which is independent of the kernel-based learning algorithm. The empirical evaluation of the proposed approach with various datasets shows the effectiveness of the algorithm in practice.  相似文献   

19.
近邻传播聚类(AP)方法是近年来出现的一种广受关注的聚类方法,在处理多类、大规模数据集时,能够在较短的时间得到较理想的结果,因此与传统方法相比具有很大的优势。但是对于一些聚类结构复杂的数据集,往往不能得到很好的聚类结果。通过分析数据的聚类特性,设计了一种可以根据数据结构自动调整参数的核函数,数据集在其映射得到的核空间中线性可分或几乎线性可分,对该核空间中的数据集进行近邻传播聚类,有效提高了AP聚类的精确度和速度。算法有效性分析以及仿真实验验证了所提算法在处理大规模复杂结构数据集上的性能优于原始AP算法。  相似文献   

20.
Data clustering has been proven to be an effective method for discovering structure in medical datasets. The majority of clustering algorithms produce exclusive clusters meaning that each sample can belong to one cluster only. However, most real-world medical datasets have inherently overlapping information, which could be best explained by overlapping clustering methods that allow one sample belong to more than one cluster. One of the simplest and most efficient overlapping clustering methods is known as overlapping k-means (OKM), which is an extension of the traditional k-means algorithm. Being an extension of the k-means algorithm, the OKM method also suffers from sensitivity to the initial cluster centroids. In this paper, we propose a hybrid method that combines k-harmonic means and overlapping k-means algorithms (KHM-OKM) to overcome this limitation. The main idea behind KHM-OKM method is to use the output of KHM method to initialize the cluster centers of OKM method. We have tested the proposed method using FBCubed metric, which has been shown to be the most effective measure to evaluate overlapping clustering algorithms regarding homogeneity, completeness, rag bag, and cluster size-quantity tradeoff. According to results from ten publicly available medical datasets, the KHM-OKM algorithm outperforms the original OKM algorithm and can be used as an efficient method for clustering medical datasets.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司    京ICP备09084417号-23

京公网安备 11010802026262号