Similar Documents
1.
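High-dimensional data are ubiquitous in the real world, but they often contain a great deal of redundant and noisy information, which prevents many traditional clustering algorithms from performing well on them. In practice, the cluster structure of high-dimensional data is often embedded in a lower-dimensional subspace, so dimensionality reduction has become a key technique for mining that structure. Among the many dimensionality reduction methods, graph-based methods are a research hotspot. However, most graph-based dimensionality reduction algorithms suffer from two problems: (1) they must compute or learn an adjacency graph, which is computationally expensive; (2) the reduction process does not take into account how the reduced data will be used. To address these two problems, a fast unsupervised dimensionality reduction algorithm based on maximum entropy, MEDR, is proposed. MEDR combines linear projection with the maximum entropy clustering model and uses an efficient iterative optimization algorithm to find the latent optimal cluster structure of high-dimensional data embedded in a low-dimensional subspace. MEDR does not require an adjacency graph as input, and its time complexity is linear in the number of samples. Experimental results on real datasets show that, compared with traditional dimensionality reduction methods, MEDR finds a better projection matrix for mapping high-dimensional data into a low-dimensional subspace, so that the projected data are easier to cluster.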
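The core of MEDR, as described, alternates between entropy-regularized (maximum entropy) soft cluster assignments computed in the projected space and an update of the linear projection. The sketch below illustrates that alternation under our own assumptions: memberships take the maximum entropy form u_ik ∝ exp(-||W^T x_i - W^T m_k||^2 / gamma), and W is set to the top eigenvectors of the membership-weighted between-cluster scatter, an LDA-style heuristic chosen for illustration and not necessarily the update derived in the paper. The function names and the PCA / farthest-first initialization are hypothetical. No adjacency graph is built, and one iteration costs O(nkD + kD^2 + D^3), i.e. linear in the number of samples n.

```python
import numpy as np

def entropy_memberships(Y, C, gamma):
    """Maximum entropy (softmax) memberships of projected points Y in centers C."""
    d2 = ((Y[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)  # (n, k) squared distances
    # u_ik proportional to exp(-d2_ik / gamma); shifting by the row minimum avoids underflow
    U = np.exp(-(d2 - d2.min(axis=1, keepdims=True)) / gamma)
    return U / U.sum(axis=1, keepdims=True)

def medr_sketch(X, k, d, gamma=1.0, iters=30):
    """Hypothetical MEDR-like alternation: soft max-entropy clustering + projection update."""
    X = X - X.mean(axis=0)                       # center the data
    # Farthest-first initialization of k means in the original space (assumption)
    idx = [0]
    for _ in range(1, k):
        dist = ((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(dist.argmax()))
    M = X[idx]
    # Initial projection: top-d principal directions (assumption)
    W = np.linalg.svd(X, full_matrices=False)[2][:d].T
    for _ in range(iters):
        U = entropy_memberships(X @ W, M @ W, gamma)
        nk = U.sum(axis=0)                       # soft cluster sizes
        M = (U.T @ X) / nk[:, None]              # weighted means, O(n k D)
        Sb = (M * nk[:, None]).T @ M             # between-cluster scatter (data are centered)
        _, eigvecs = np.linalg.eigh(Sb)          # eigenvalues in ascending order
        W = eigvecs[:, ::-1][:, :d]              # top-d directions, so W^T W = I
    labels = entropy_memberships(X @ W, M @ W, gamma).argmax(axis=1)
    return W, labels
```

Smaller values of `gamma` make the assignments closer to hard K-means assignments; larger values spread membership across clusters.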

2.
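Many practical data mining applications require the number of instances in each class to be relatively balanced. The entropy-weighted K-means (EWKM) subspace clustering algorithm, however, produces unbalanced partitions and therefore poor clustering quality. This paper defines a multi-objective entropy that accounts for both balanced partitioning and feature distribution, uses it to improve the objective function of EWKM, designs a solution procedure based on an iterative method and the alternating direction method of multipliers (ADMM), and proposes the entropy-based balanced subspace K-means algorithm (EBSKM). Clustering experiments on public datasets such as UCI and UCR show that the proposed algorithm outperforms comparable algorithms in both accuracy and balance.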
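For reference, the sketch below shows the standard EWKM building blocks that EBSKM starts from: hard assignment under per-cluster feature weights, and the entropy-regularized weight update w_lj ∝ exp(-D_lj / gamma), where D_lj is the within-cluster dispersion of feature j in cluster l. The `size_balance` helper (normalized entropy of cluster sizes) is a hypothetical illustration of how partition balance can be quantified with entropy; it is not the paper's multi-objective entropy, and the ADMM-based solver is not reproduced.

```python
import numpy as np

def ewkm(X, k, gamma=1.0, iters=20):
    """Entropy-weighted K-means: per-cluster feature weights from an entropy penalty."""
    n, D = X.shape
    Z = X[np.linspace(0, n - 1, k).astype(int)]   # spread-out initial centers (assumption)
    W = np.full((k, D), 1.0 / D)                  # per-cluster feature weights
    for _ in range(iters):
        # Feature-weighted squared distance of every point to every center: (n, k)
        dist = (((X[:, None, :] - Z[None, :, :]) ** 2) * W[None, :, :]).sum(axis=2)
        labels = dist.argmin(axis=1)
        for l in range(k):
            members = X[labels == l]
            if len(members) == 0:
                continue                          # empty cluster keeps its center and weights
            Z[l] = members.mean(axis=0)
            Dl = ((members - Z[l]) ** 2).sum(axis=0)   # per-feature dispersion D_lj
            e = np.exp(-(Dl - Dl.min()) / gamma)       # entropy-regularized weights
            W[l] = e / e.sum()
    return labels, Z, W

def size_balance(labels, k):
    """Normalized entropy of cluster sizes: 1.0 means perfectly balanced."""
    p = np.bincount(labels, minlength=k) / len(labels)
    p = p[p > 0]
    return float(-(p * np.log(p)).sum() / np.log(k))
```

A balance-aware variant would add a term like `size_balance` to the objective so that lopsided partitions are penalized.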

3.
Lü Jia. Journal of Computer Applications (计算机应用), 2009, 29(5): 1380-1384
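To overcome the inability of the K-means algorithm to correctly identify non-convex clusters, a clustering method based on a Delaunay triangulation density measure is proposed. It exploits the proximity and adjacency properties of the Delaunay triangulation graph to capture the intrinsic characteristics of the data and to measure density, and it uses a chaos optimization method to globally optimize the clustering objective function and reach the global minimum. Experimental results show that the clustering algorithm based on the Delaunay triangulation density measure can discover clusters of arbitrary non-convex shape.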
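The chaos optimization step can be illustrated generically: a logistic map z <- 4z(1 - z) produces an ergodic sequence that is mapped onto the search box (the first carrier), and the box is then repeatedly shrunk around the best point found (a finer second carrier). This is a minimal sketch assuming a continuous objective over box bounds, for example cluster centers flattened into one vector; the Delaunay density measure itself is not implemented, and `chaos_minimize` and its parameters are hypothetical.

```python
import numpy as np

def chaos_minimize(f, lower, upper, n_iter=3000, rounds=4, shrink=0.5):
    """Two-carrier chaos optimization driven by the logistic map."""
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    lo, hi = lower.copy(), upper.copy()
    # One chaotic variable per dimension, with distinct non-degenerate seeds
    z = 0.1 + 0.8 * (np.arange(lower.size) + 1) / (lower.size + 1) + 0.0137
    best_x, best_f = None, np.inf
    for _ in range(rounds):
        for _ in range(n_iter):
            z = 4.0 * z * (1.0 - z)               # logistic map, fully chaotic at r = 4
            # Nudge away from values where the map collapses to 0 or its fixed point 0.75
            stuck = (z < 1e-9) | (z > 1 - 1e-9) | (np.abs(z - 0.75) < 1e-12)
            z = np.where(stuck, 0.1234, z)
            x = lo + z * (hi - lo)                # carrier: map chaos onto the current box
            fx = f(x)
            if fx < best_f:
                best_x, best_f = x.copy(), fx
        # Second carrier: shrink the box around the incumbent, clipped to the original bounds
        half = shrink * (hi - lo) / 2
        lo = np.maximum(lower, best_x - half)
        hi = np.minimum(upper, best_x + half)
    return best_x, best_f
```

For clustering, `f` would evaluate the clustering objective for a candidate set of centers encoded in `x`.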

4.
Clustering divides data into meaningful or useful groups (clusters) without any prior knowledge. It is a key technique in data mining and has become an important issue in many fields. This article presents a new clustering algorithm based on a mechanism analysis of the chaotic ant swarm (CAS). It is an optimization methodology for the clustering problem that aims to obtain a globally optimal assignment by minimizing the objective function. The proposed algorithm combines three advantages: it finds the global optimum of the objective function, it is insensitive to clusters of different sizes and densities, and it is suitable for multidimensional data sets. The quality of this approach is evaluated on several well-known benchmark data sets. Compared with the popular k-means algorithm and a PSO-based clustering technique, experimental results show that our algorithm is an effective clustering technique and can handle data sets with complex cluster sizes, densities, and multiple dimensions.

5.
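To address the slow serial solution of large-scale nonlinear optimization problems, a relaxed asynchronous parallel algorithm is proposed for unconstrained optimization. Starting from the serial BFGS algorithm for unconstrained optimization, the method is parallelized in a PC cluster environment. Cholesky decomposition is used to solve the linear systems whose coefficient matrices are symmetric positive definite; an unordered relaxed asynchronous parallel method is used to compute the solution vector and the Wolfe-Powell line-search step size; and the BFGS update formula is evaluated in parallel. Together these form a relaxed asynchronous parallel BFGS algorithm, whose time complexity and speedup are analyzed. Experiments on a PC cluster show that the algorithm speeds up the solution of unconstrained optimization problems with a balanced load and achieves linear speedup.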
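A serial version of the ingredients listed above can be sketched as follows: BFGS with an explicit Hessian approximation B, a Cholesky factorization to solve B p = -g, and a bisection search for a step satisfying the weak Wolfe-Powell conditions. The relaxed asynchronous parallelization on a PC cluster is not shown; the constants c1 = 1e-4 and c2 = 0.9 are common textbook choices, not values taken from the paper, and the function names are hypothetical.

```python
import numpy as np

def wolfe_powell(f, grad, x, p, c1=1e-4, c2=0.9, max_iter=60):
    """Bisection search for a step satisfying the weak Wolfe-Powell conditions."""
    a, lo, hi = 1.0, 0.0, np.inf
    f0, g0p = f(x), grad(x) @ p
    for _ in range(max_iter):
        if f(x + a * p) > f0 + c1 * a * g0p:      # sufficient decrease fails: step too long
            hi = a
        elif grad(x + a * p) @ p < c2 * g0p:      # curvature condition fails: step too short
            lo = a
        else:
            return a
        a = (lo + hi) / 2 if np.isfinite(hi) else 2 * lo
    return a

def bfgs(f, grad, x0, tol=1e-8, max_iter=500):
    """Serial BFGS on the Hessian approximation B, solving B p = -g by Cholesky."""
    x = np.asarray(x0, dtype=float)
    B = np.eye(x.size)                            # symmetric positive definite approximation
    g = grad(x)
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        L = np.linalg.cholesky(B)                 # B = L L^T
        p = -np.linalg.solve(L.T, np.linalg.solve(L, g))
        a = wolfe_powell(f, grad, x, p)
        s = a * p
        x_new = x + s
        g_new = grad(x_new)
        y = g_new - g
        if s @ y > 1e-12:                         # the Wolfe curvature condition keeps this positive
            Bs = B @ s
            B = B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (y @ s)
        x, g = x_new, g_new
    return x
```

Skipping the update when s^T y is not positive keeps B positive definite, so the Cholesky factorization always exists.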

6.
Unsupervised clustering of datasets containing severe outliers is a difficult task. In this work, we propose a cluster-dependent multi-metric clustering approach that is robust to severe outliers. A dataset is modeled as clusters, each contaminated by noise with a cluster-dependent, unknown noise level, which formulates the outliers of that cluster. With this model, a multi-metric Lp-norm transformation is proposed and learnt that maps each cluster to the most Gaussian distribution by minimizing a non-Gaussianity measure. The approach consists of two consecutive phases, multi-metric location estimation (MMLE) and multi-metric iterative chi-square cutoff (ICSC), and algorithms for both are proposed. It is proved that the MMLE algorithm searches for the solution of a multi-objective optimization problem and in fact learns a cluster-dependent multi-metric Lq-norm distance and/or a cluster-dependent multi-kernel defined in data space for each cluster. Experiments on heavy-tailed alpha-stable mixture datasets, on Gaussian mixture datasets with radial and diffuse outliers added respectively, and on the real Wisconsin breast cancer and lung cancer datasets show that the proposed method outperforms many existing robust clustering and outlier detection methods in both clustering and outlier detection performance.
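The ICSC phase can be illustrated for a single cluster with an ordinary Mahalanobis distance: estimate location and scatter from the current inliers, flag points whose squared distance exceeds a chi-square quantile, and repeat until the inlier set stops changing. This sketch assumes 2-D data, so that the chi-square quantile with 2 degrees of freedom has the closed form -2 ln(1 - p); it does not include the paper's multi-metric Lp-norm transformation or MMLE, and the function name is hypothetical.

```python
import numpy as np

def iterative_chi2_cutoff(X, p=0.975, max_iter=50):
    """Flag outliers of one 2-D cluster by an iterated chi-square cutoff
    on squared Mahalanobis distances (2 degrees of freedom only)."""
    cutoff = -2.0 * np.log(1.0 - p)              # chi-square(2) quantile in closed form
    inliers = np.ones(len(X), dtype=bool)
    for _ in range(max_iter):
        mu = X[inliers].mean(axis=0)
        cov = np.cov(X[inliers], rowvar=False)
        diff = X - mu
        d2 = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
        new_inliers = d2 <= cutoff
        if np.array_equal(new_inliers, inliers):
            break                                 # inlier set is stable
        inliers = new_inliers
    return inliers
```

Because the scatter is re-estimated from the trimmed set, a few genuine points near the tail are usually flagged as well.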

7.
Many clustering models define good clusters as extrema of objective functions. Optimization of these models is often done using an alternating optimization (AO) algorithm driven by necessary conditions for local extrema. We abandon the objective function model in favor of a generalized model called alternating cluster estimation (ACE). ACE uses an alternating iteration architecture, but the membership and prototype functions are selected directly by the user. Virtually every clustering model can be realized as an instance of ACE. Out of the large variety of possible non-AO instances, we present two examples: 1) an algorithm with a dynamically changing prototype function that extracts representative data, and 2) a computationally efficient algorithm with hyperconic membership functions that allows easy extraction of membership functions. We illustrate these non-AO instances on three problems: a) simple clustering of plane data, where we show that an unmatched ACE algorithm overcomes some problems of fuzzy c-means (FCM-AO) and possibilistic c-means (PCM-AO); b) functional approximation by clustering on a simple artificial data set; and c) functional approximation on a real-world data set with 12 inputs and 1 output. ACE models work well in all three cases.
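Because ACE fixes only the alternating architecture, it can be written as a loop that takes the membership and prototype functions as arguments. The sketch below pairs a hyperconic (cone-shaped) membership, u = max(0, 1 - d/r), with an FCM-style weighted-mean prototype, giving an "unmatched" instance that is not derived from any objective function. The radius r, the exponent m = 2, and all function names are illustrative assumptions.

```python
import numpy as np

def ace(X, C0, membership, prototype, iters=50, tol=1e-9):
    """Alternating cluster estimation with user-chosen membership and prototype functions."""
    C = np.asarray(C0, dtype=float).copy()
    for _ in range(iters):
        U = membership(X, C)            # (n, c) memberships
        C_new = prototype(X, U)         # (c, dim) prototypes
        if np.abs(C_new - C).max() < tol:
            return U, C_new
        C = C_new
    return U, C

def hyperconic(radius):
    """Hyperconic membership: 1 at the prototype, falling linearly to 0 at `radius`."""
    def membership(X, C):
        d = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2)
        return np.clip(1.0 - d / radius, 0.0, None)
    return membership

def weighted_mean(X, U, m=2.0):
    """FCM-style prototype: membership-weighted mean with exponent m."""
    W = U ** m
    return (W.T @ X) / W.sum(axis=0)[:, None]
```

Swapping in a different `membership` or `prototype` callable yields another ACE instance without changing the loop. Note that every prototype needs at least one point within `radius`, otherwise the weighted mean divides by zero.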

8.
Multiple kernel clustering (MKC), which performs kernel-based data fusion for data clustering, is an emerging topic. It aims to solve clustering problems with multiple cues. Most MKC methods extend existing clustering methods with a multiple kernel learning (MKL) setting. In this paper, we propose a novel MKC method that differs from these popular approaches. Centered kernel alignment, an effective kernel evaluation measure, is employed to unify the two tasks of clustering and MKL into a single optimization framework. To solve the resulting optimization problem, an efficient two-step iterative algorithm is developed. Experiments on several UCI datasets and face image datasets validate the effectiveness and efficiency of our MKC algorithm.
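Centered kernel alignment compares two kernel matrices after centering them with H = I - (1/n) 1 1^T: A(K1, K2) = <H K1 H, H K2 H>_F / (||H K1 H||_F ||H K2 H||_F). A minimal NumPy sketch follows. The `ideal_kernel` helper, which turns a partition into a kernel that a combined kernel could be aligned with, is an illustrative assumption about how clustering and MKL can be coupled, not the paper's exact formulation.

```python
import numpy as np

def centered_kernel_alignment(K1, K2):
    """Centered kernel alignment between two n x n kernel matrices."""
    n = K1.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    A, B = H @ K1 @ H, H @ K2 @ H
    # Frobenius inner product divided by Frobenius norms
    return float(np.sum(A * B) / (np.linalg.norm(A) * np.linalg.norm(B)))

def ideal_kernel(labels):
    """K_ij = 1 if points i and j share a cluster, else 0 (hypothetical helper)."""
    labels = np.asarray(labels)
    return (labels[:, None] == labels[None, :]).astype(float)
```

In a multiple kernel setting one could score a weighted sum of base kernels against `ideal_kernel(labels)` for the current partition and adjust the weights to increase the alignment.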

9.
Most clustering algorithms operate by optimizing (either implicitly or explicitly) a single measure of cluster solution quality. Such methods may perform well on some data sets but lack robustness with respect to variations in cluster shape, proximity, evenness and so forth. In this paper, we propose a multiobjective clustering technique that simultaneously optimizes two objectives, one reflecting the total cluster symmetry and the other reflecting the stability of the obtained partitions over different bootstrap samples of the data set. The proposed algorithm uses a recently developed simulated annealing-based multiobjective optimization technique, named AMOSA, as the underlying optimization strategy. Points are assigned to clusters based on a newly defined point symmetry-based distance rather than the Euclidean distance. Results on several artificial and real-life data sets, in comparison with another multiobjective clustering technique (MOCK), three single-objective genetic algorithm-based automatic clustering techniques (VGAPS, GCUK, and HNGA clustering), and several hybrid methods for determining the appropriate number of clusters, show that the proposed technique is well suited to automatically detecting both the appropriate number of clusters and the appropriate partitioning in data sets with point-symmetric clusters. The performance of AMOSA as the underlying optimization technique in the proposed clustering algorithm is also compared with that of PESA-II, another evolutionary multiobjective optimization technique.
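The paper introduces its own point symmetry-based distance, which is not spelled out in the abstract. For illustration we assume the form common in the related point symmetry clustering literature: d_ps(x, c) = ((d1 + d2) / 2) * d_e(x, c), where x* = 2c - x is the reflection of x about the center c, d1 and d2 are the distances from x* to its two nearest data points, and d_e is the Euclidean distance. The sketch uses a brute-force neighbor search, and the function name is hypothetical.

```python
import numpy as np

def point_symmetry_distance(x, c, data, knear=2):
    """Assumed point symmetry distance of x to center c with respect to `data`."""
    x, c, data = (np.asarray(a, dtype=float) for a in (x, c, data))
    reflected = 2.0 * c - x                      # mirror image of x about c
    d = np.sort(np.linalg.norm(data - reflected, axis=1))[:knear]
    # Average distance of the mirror image to its nearest points, scaled by ||x - c||
    return float(d.mean() * np.linalg.norm(x - c))
```

For four points on the unit circle at 0, 90, 180, and 270 degrees, the point (1, 0) has d_ps = sqrt(2)/2 with respect to the origin, because its mirror image (-1, 0) is itself a data point. Assigning each point to the center with the smallest d_ps gives symmetry-based cluster assignment.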

10.
Fuzzy Clustering Based on Predator-Prey Particle Swarm Optimization   Total citations: 1 (self: 0, others: 1)
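Particle swarm optimization (PSO) clustering has few parameters and converges quickly, but it is prone to local optima. To solve this problem, a predator-prey PSO fuzzy clustering algorithm is proposed, in which the cluster centers are initialized with a density function. Predators chase the center of the prey, which accelerates convergence, while prey flee from predators, which promotes diversity and keeps the swarm out of local optima. Experimental results show that the algorithm avoids local optima, converges quickly, and has strong global search ability, and that its clusterings reflect the structure of real-world data reasonably well.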
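The abstract does not give the density function used to initialize the centers. One common choice is subtractive clustering, sketched below as an assumption: each point's density is P_i = sum_j exp(-4 ||x_i - x_j||^2 / ra^2), the densest point becomes a center, and densities around it are reduced with radius rb = 1.5 ra before the next pick. The predator-prey PSO stage is not shown, and `density_init` and `ra` are hypothetical names.

```python
import numpy as np

def density_init(X, k, ra=1.0):
    """Pick k initial centers with a subtractive-clustering style density function."""
    rb = 1.5 * ra
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)   # pairwise squared distances
    P = np.exp(-4.0 * d2 / ra ** 2).sum(axis=1)                # density of each point
    centers = []
    for _ in range(k):
        i = int(P.argmax())
        centers.append(X[i])
        # Suppress density near the chosen center so the next pick lands elsewhere
        P = P - P[i] * np.exp(-4.0 * d2[i] / rb ** 2)
    return np.array(centers)
```

The pairwise distance matrix makes this O(n^2) in time and memory, which is acceptable for seeding a swarm but not for very large data sets.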

11.
In the mining optimisation literature, most researchers have focused on two open-pit mine optimisation problems, at the strategic and tactical levels, termed the ultimate pit limit (UPIT) and constrained pit limit (CPIT) problems respectively. However, many researchers indicate that the large numbers of variables and constraints in real-world instances (e.g., with 50–1000 thousand blocks) make the CPIT mixed integer programming (MIP) model intractable in practice. Solving large-scale CPIT instances without relying on an exact MIP optimiser or on complicated MIP relaxation/decomposition methods is therefore a considerable challenge. To meet it, two new graph-based algorithms built on network flow graphs and conjunctive graph theory are developed, taking advantage of problem properties. The proposed algorithms are validated on the recent large-scale UPIT and CPIT benchmark datasets of MineLib (2013). Compared with the best known results from MineLib, the proposed algorithms outperform the other CPIT solution approaches in the literature. The proposed graph-based algorithms lead to a more competent mine scheduling optimisation expert system, because a third-party MIP optimiser is no longer indispensable and random neighbourhood search is not necessary.
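UPIT has a classical network flow formulation: the optimal pit is a maximum-weight closure of the block precedence graph, which can be read off a minimum s-t cut. The sketch below builds that network (source to positive-value blocks, negative-value blocks to sink, infinite-capacity precedence arcs) and solves it with Edmonds-Karp for brevity. It is not the paper's algorithm, does not handle CPIT resource constraints, and would be far too slow for MineLib-scale instances, where pseudoflow or push-relabel methods are typically used. The function name is hypothetical.

```python
from collections import deque

def ultimate_pit(values, precedences):
    """Ultimate pit as a maximum-weight closure via a minimum s-t cut.
    values: {block: economic value}; precedences: (b, a) pairs meaning
    that mining b requires mining a."""
    INF = float("inf")
    cap = {}                                     # residual capacities cap[u][v]

    def add_edge(u, v, c):
        cap.setdefault(u, {}).setdefault(v, 0)
        cap.setdefault(v, {}).setdefault(u, 0)
        cap[u][v] += c

    s, t = "_source", "_sink"
    cap.setdefault(s, {})
    cap.setdefault(t, {})
    for b, v in values.items():
        if v > 0:
            add_edge(s, b, v)                    # profit lost if b is left unmined
        elif v < 0:
            add_edge(b, t, -v)                   # cost paid if b is mined
    for b, a in precedences:
        add_edge(b, a, INF)                      # a precedence arc can never be cut
    while True:                                  # Edmonds-Karp: BFS augmenting paths
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            break
        flow, v = INF, t
        while parent[v] is not None:             # bottleneck capacity on the path
            flow = min(flow, cap[parent[v]][v])
            v = parent[v]
        v = t
        while parent[v] is not None:             # push flow, update residual graph
            cap[parent[v]][v] -= flow
            cap[v][parent[v]] += flow
            v = parent[v]
    # Blocks still reachable from the source form the optimal pit
    pit = {b for b in parent if b not in (s, t)}
    return pit, sum(values[b] for b in pit)
```

For example, a block worth 5 that requires two blocks worth -1 each is mined together with them (net 3), while a block worth 2 under a block worth -3 is left in the ground.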

12.
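To keep clustering algorithms from falling into local optima and to strengthen their global search ability, this paper proposes a hybrid K-harmonic means clustering algorithm (G-KHM) that combines the KHM algorithm with an improved gravitational search algorithm (GSA). G-KHM retains the fast convergence of KHM. To counter KHM's tendency to become trapped in local optima, once the data are initialized and the search for cluster centers begins, G-KHM uses a gravitational search algorithm enhanced for object diversity and convergence. This remedies the standard GSA's tendency to lose population diversity while keeping its strong global search ability, enabling the algorithm to converge to the global optimum. Simulation results show that G-KHM effectively avoids local optima, has strong global search ability and stability, and achieves better classification accuracy and running time than the KHM algorithm, K-means, fuzzy C-means, and particle swarm optimization.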
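For reference, the KHM step that G-KHM builds on can be sketched with the usual KHM center update, c_k = sum_i q_ik x_i / sum_i q_ik with q_ik = d_ik^(-p-2) / (sum_l d_il^(-p))^2, together with the KHM performance function sum_i k / sum_l d_il^(-p). The gravitational search component is not shown; the exponent p = 3.5, the small `eps` guarding against zero distances, and the function names are illustrative assumptions.

```python
import numpy as np

def khm(X, C0, p=3.5, iters=100, eps=1e-8):
    """K-harmonic means: every point influences every center through harmonic weights."""
    C = np.asarray(C0, dtype=float).copy()
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2) + eps   # (n, k)
        # Combined membership x weight of the KHM update
        q = d ** (-p - 2) / (d ** (-p)).sum(axis=1, keepdims=True) ** 2
        C = (q.T @ X) / q.sum(axis=0)[:, None]
    return C

def khm_objective(X, C, p=3.5, eps=1e-8):
    """KHM performance function: sum over points of k / sum_l d_il^(-p)."""
    d = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2) + eps
    return float((C.shape[0] / (d ** (-p)).sum(axis=1)).sum())
```

In a hybrid such as G-KHM, a global optimizer would propose candidate center sets and `khm_objective` (or a similar fitness) would rank them, with `khm` refining the best ones.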

13.
Clustering Algorithm Based on Improved Particle Swarm Optimization   Total citations: 3 (self: 0, others: 3)
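The K-means algorithm is a traditional cluster analysis method; because both its idea and its procedure are simple, it has become one of the most commonly used clustering methods. However, its results depend heavily on the choice of initial cluster centers, and for some initial values it may converge to a suboptimal solution. Based on an analysis of K-means and particle swarm optimization (PSO), a clustering algorithm based on an improved PSO is proposed. It combines K-means, which has strong local search ability, with PSO, which has strong global search ability; the hybrid improves the search ability of K-means, speeds up convergence, and effectively prevents premature convergence. Experiments show that the algorithm converges better.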
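A minimal sketch of the PSO/K-means combination described above, assuming a common encoding in which each particle holds all k centers and its fitness is the within-cluster sum of squared errors (SSE). After each PSO update, the global best is refined with one K-means (Lloyd) step and kept only if the SSE does not increase. The inertia and acceleration constants (0.72, 1.49, 1.49) are standard PSO defaults rather than values from the paper, the paper's specific improvements to PSO are not reproduced, and the function names are hypothetical.

```python
import numpy as np

def sse(X, centers):
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return float(d2.min(axis=1).sum())

def kmeans_step(X, centers):
    """One Lloyd iteration; empty clusters keep their previous center."""
    labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
    new = centers.copy()
    for j in range(len(centers)):
        if (labels == j).any():
            new[j] = X[labels == j].mean(axis=0)
    return new

def pso_kmeans(X, k, n_particles=20, iters=50, w=0.72, c1=1.49, c2=1.49, seed=0):
    rng = np.random.default_rng(seed)
    dim = X.shape[1]
    # Each particle is k centers drawn from the data, flattened into one vector
    pos = np.array([X[rng.choice(len(X), k, replace=False)].ravel()
                    for _ in range(n_particles)])
    vel = np.zeros_like(pos)
    fit = np.array([sse(X, q.reshape(k, dim)) for q in pos])
    pbest, pbest_fit = pos.copy(), fit.copy()
    g = int(fit.argmin())
    gbest, gbest_fit = pos[g].copy(), fit[g]
    for _ in range(iters):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = pos + vel
        fit = np.array([sse(X, q.reshape(k, dim)) for q in pos])
        improved = fit < pbest_fit
        pbest[improved], pbest_fit[improved] = pos[improved], fit[improved]
        if pbest_fit.min() < gbest_fit:
            g = int(pbest_fit.argmin())
            gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]
        # Hybrid step: local refinement of the global best by K-means
        refined = kmeans_step(X, gbest.reshape(k, dim))
        refined_fit = sse(X, refined)
        if refined_fit <= gbest_fit:
            gbest, gbest_fit = refined.ravel(), refined_fit
    return gbest.reshape(k, dim), gbest_fit
```

PSO supplies diverse starting configurations, while the Lloyd step provides the fast local convergence that plain PSO lacks.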

14.
Identifying the optimal cluster number and generating reliable clustering results are necessary but challenging tasks in cluster analysis. The effectiveness of clustering analysis relies not only on the assumed cluster number but also on the clustering algorithm employed. This paper proposes a new clustering analysis method that identifies the desired cluster number and, at the same time, produces reliable clustering solutions. It first obtains many clustering results from a specific algorithm, such as fuzzy c-means (FCM), and then integrates these results into a judgement matrix. An iterative graph-partitioning process is then applied to identify the desired cluster number and the final result. The method is robust: its effectiveness is demonstrated on 2D data sets and on multi-dimensional real-world data sets with clusters of different shapes. It is compared with cluster validity analysis and with other methods such as spectral clustering and cluster ensemble methods, and it is also shown to be efficient in mesh segmentation applications. The method is also adaptive, because it works not only with FCM but also with other clustering methods such as k-means.
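The judgement matrix can be illustrated as a co-association matrix, whose entry (i, j) is the fraction of clustering runs in which points i and j fall into the same cluster. As a simplified stand-in for the paper's iterative graph partitioning, the sketch below links pairs whose co-association exceeds a threshold and takes the connected components, which yields both a cluster number and a final labeling. The threshold 0.5 and the function names are assumptions.

```python
import numpy as np

def co_association(partitions):
    """Fraction of partitions in which each pair of points shares a cluster."""
    P = np.asarray(partitions)
    return np.mean([(p[:, None] == p[None, :]) for p in P], axis=0)

def consensus_clusters(partitions, threshold=0.5):
    """Connected components of the graph linking pairs with co-association > threshold."""
    A = co_association(partitions) > threshold
    n = A.shape[0]
    labels = -np.ones(n, dtype=int)
    current = 0
    for start in range(n):
        if labels[start] >= 0:
            continue
        stack = [start]                          # depth-first search over linked pairs
        labels[start] = current
        while stack:
            i = stack.pop()
            for j in np.flatnonzero(A[i]):
                if labels[j] < 0:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return current, labels
```

For example, the three partitions [0,0,0,1,1,2], [0,0,1,1,2,2], and [1,1,1,0,0,0] of six points disagree individually, yet their consensus is two clusters, {0,1,2} and {3,4,5}.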

15.
In this paper, we propose a novel Fast Affinity Propagation clustering approach (FAP). FAP simultaneously considers both the local and the global structure information contained in datasets, and it is a high-quality multilevel graph partitioning method that can perform both vector-based and graph-based clustering. First, a new Fast Sampling algorithm (FS) is proposed to coarsen the input sparse graph and choose a small number of final representative exemplars. Then a density-weighted spectral clustering method is presented to partition those exemplars according to the global underlying structure of the data manifold. Finally, the cluster assignments of all data points are obtained through their corresponding representative exemplars. Experimental results on two synthetic datasets and many real-world datasets show that our algorithm outperforms the state-of-the-art original affinity propagation and spectral clustering algorithms in speed, memory usage, and quality on both vector-based and graph-based clustering.
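FAP builds on affinity propagation (AP). For reference, a compact version of the standard AP responsibility and availability updates is sketched below; the diagonal of the similarity matrix holds the preferences, which control how many exemplars emerge. FAP's fast sampling and density-weighted spectral clustering steps are not shown, and the damping factor and iteration count are illustrative defaults.

```python
import numpy as np

def affinity_propagation(S, damping=0.5, iters=200):
    """Standard affinity propagation on similarity matrix S (preferences on the diagonal).
    Returns exemplar indices and a cluster label per point."""
    n = S.shape[0]
    R = np.zeros((n, n))
    A = np.zeros((n, n))
    idx = np.arange(n)
    for _ in range(iters):
        # Responsibilities: r(i,k) = s(i,k) - max over k' != k of [a(i,k') + s(i,k')]
        AS = A + S
        first = AS.argmax(axis=1)
        max1 = AS[idx, first]
        AS[idx, first] = -np.inf
        max2 = AS.max(axis=1)
        R_new = S - max1[:, None]
        R_new[idx, first] = S[idx, first] - max2
        R = damping * R + (1 - damping) * R_new
        # Availabilities: a(i,k) = min(0, r(k,k) + sum over i' not in {i,k} of max(0, r(i',k)))
        Rp = np.maximum(R, 0)
        Rp[idx, idx] = R[idx, idx]
        A_new = Rp.sum(axis=0)[None, :] - Rp
        dA = A_new[idx, idx].copy()              # a(k,k) is not clipped at zero
        A_new = np.minimum(A_new, 0)
        A_new[idx, idx] = dA
        A = damping * A + (1 - damping) * A_new
    exemplars = np.flatnonzero(np.diag(A + R) > 0)
    if exemplars.size == 0:
        return exemplars, np.full(n, -1)
    labels = S[:, exemplars].argmax(axis=1)
    labels[exemplars] = np.arange(len(exemplars))
    return exemplars, labels
```

Each iteration touches every pair of points, which is the O(n^2) cost that sampling-based variants such as FAP aim to avoid.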

16.
A Quasi-Human Annealing Algorithm for Solving the SAT Problem   Total citations: 18 (self: 3, others: 18)
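Using a simple transformation, this paper converts the satisfiability (SAT) problem into the optimization problem of minimizing a corresponding objective function and proposes a quasi-human strategy for escaping local traps. Combining simulated annealing with this strategy yields a quasi-human annealing algorithm (PA) for efficiently finding approximate solutions to SAT. The method not only retains the global convergence property of simulated annealing but also offers a degree of parallelism and inheritance. Numerical experiments on the randomly generated test instances in this paper show that simulated annealing with the quasi-human strategy outperforms local search, plain simulated annealing, and the currently popular WALKSAT algorithm, indicating that the quasi-human annealing algorithm is feasible and effective.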
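One simple transformation of the kind mentioned above takes the objective to be the number of unsatisfied clauses, so that any satisfying assignment is a global minimum with value 0. The sketch below applies plain simulated annealing with single-variable flips to that objective; clauses use DIMACS-style signed integers (3 means x3, -3 means not x3). The paper's quasi-human escape strategy is not reproduced, and the temperature schedule and function names are arbitrary assumptions.

```python
import math
import random

def unsatisfied(clauses, assign):
    """Objective: number of clauses that contain no true literal."""
    return sum(not any((lit > 0) == assign[abs(lit)] for lit in c) for c in clauses)

def anneal_sat(clauses, n_vars, t0=2.0, cooling=0.995, steps=20000, seed=0):
    rng = random.Random(seed)
    assign = {v: rng.random() < 0.5 for v in range(1, n_vars + 1)}
    cost = unsatisfied(clauses, assign)
    best, best_cost, t = dict(assign), cost, t0
    for _ in range(steps):
        if best_cost == 0:
            break                                  # all clauses satisfied
        v = rng.randint(1, n_vars)                 # propose flipping one variable
        assign[v] = not assign[v]
        new_cost = unsatisfied(clauses, assign)
        delta = new_cost - cost
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            cost = new_cost                        # accept (Metropolis rule)
            if cost < best_cost:
                best, best_cost = dict(assign), cost
        else:
            assign[v] = not assign[v]              # reject: undo the flip
        t = max(t * cooling, 1e-3)
    return best, best_cost
```

Recomputing the full objective after every flip keeps the sketch short; practical SAT solvers update clause counts incrementally instead.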

17.
In this paper, we offer a simple and accurate clustering algorithm derived as a closed-form analytical solution to a cluster fit function minimization problem. As a result, the algorithm finds the global minimum of the fit function and combines exceptional efficiency with optimal clustering results.

18.
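Label propagation is an effective graph-based semi-supervised classification method and is widely used in tasks such as image classification and text classification. In graph-based semi-supervised classification, the constructed graph affects the algorithm's performance to a considerable extent. Although many graph construction methods have been proposed, existing methods separate graph construction from the subsequent learning process and ignore the local structure of the data. To solve these problems, an adaptive graph label propagation method based on locality constraints is proposed. It integrates graph construction and label propagation into a unified framework and considers both the locality and the sparsity of samples during graph construction, so that the optimized graph is sparser and more discriminative and therefore better suited to label propagation. An iterative optimization algorithm is also proposed to solve the objective function, and extensive experiments on four databases demonstrate the effectiveness of the proposed method.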
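As a baseline for the method above, classic label propagation on a fixed graph can be written in closed form as F = (I - alpha S)^(-1) Y, where S = D^(-1/2) W D^(-1/2) is the normalized affinity matrix and Y one-hot encodes the known labels. The sketch below builds W as a symmetrized Gaussian k-nearest-neighbor graph; in the proposed method the graph would instead be learned jointly with the labels under locality and sparsity constraints, which is not shown. The values of alpha, sigma, and k and the function name are illustrative assumptions.

```python
import numpy as np

def label_spreading(X, y, alpha=0.99, sigma=1.0, k=10):
    """Closed-form label propagation F = (I - alpha S)^-1 Y on a fixed kNN graph.
    y holds class indices for labeled points and -1 for unlabeled ones."""
    n = len(X)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # Keep each point's k strongest edges, then symmetrize
    keep = np.zeros(W.shape, dtype=bool)
    nn = np.argsort(-W, axis=1)[:, :k]
    keep[np.arange(n)[:, None], nn] = True
    W = np.where(keep | keep.T, W, 0.0)
    dinv = 1.0 / np.sqrt(W.sum(axis=1))
    S = dinv[:, None] * W * dinv[None, :]        # D^-1/2 W D^-1/2
    classes = np.unique(y[y >= 0])
    Y = (y[:, None] == classes[None, :]).astype(float)
    F = np.linalg.solve(np.eye(n) - alpha * S, Y)
    return classes[F.argmax(axis=1)]
```

Here the graph is fixed before learning starts, which is exactly the separation between graph construction and propagation that the proposed method removes.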

19.
Data Stream Clustering Algorithm Based on Particle Swarm Optimization   Total citations: 1 (self: 0, others: 1)
Xiao Yuquan, Zhou Siqing. Microcomputer Development (微机发展), 2011, (10): 43-46, 50
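To address the loss of original data information in current sliding-window clustering algorithms and to improve clustering quality and accuracy, a new data stream clustering algorithm based on cooperative particle swarm optimization (PSO) is proposed, building on existing sliding-window data stream clustering algorithms. The algorithm uses an improved exponential histogram of temporal cluster features as the synopsis structure of the data stream and applies PSO to iteratively optimize clustering quality locally during clustering. Experimental results show that the method effectively reduces memory overhead and avoids the loss of original data information. Compared with traditional data stream clustering algorithms, the PSO-based data stream clustering algorithm is clearly superior in clustering quality and accuracy.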
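The synopsis structure above is built from temporal cluster features. A minimal sketch, assuming the usual additive cluster feature (count, linear sum, squared sum) extended with the timestamp of the most recent point, is shown below; merging two features yields the feature of the union of their points, which is what lets an exponential histogram combine buckets without keeping raw points. The exponential histogram itself and the PSO refinement are not shown, and the class and method names are hypothetical.

```python
import numpy as np

class TemporalClusterFeature:
    """Additive synopsis of stream points: count, linear sum, squared sum,
    and the timestamp of the most recent point."""

    def __init__(self, dim):
        self.n = 0
        self.ls = np.zeros(dim)
        self.ss = np.zeros(dim)
        self.t = float("-inf")

    def insert(self, x, t):
        x = np.asarray(x, dtype=float)
        self.n += 1
        self.ls += x
        self.ss += x * x
        self.t = max(self.t, t)

    def merge(self, other):
        """Feature of the union of both point sets (additivity)."""
        out = TemporalClusterFeature(len(self.ls))
        out.n = self.n + other.n
        out.ls = self.ls + other.ls
        out.ss = self.ss + other.ss
        out.t = max(self.t, other.t)
        return out

    def centroid(self):
        return self.ls / self.n

    def radius(self):
        # Root mean squared distance of the summarized points to their centroid
        var = self.ss / self.n - self.centroid() ** 2
        return float(np.sqrt(max(var.sum(), 0.0)))

    def expired(self, now, window):
        # Every point is older than the sliding window
        return self.t < now - window
```

For example, merging a feature holding (0, 0) at t = 1 and (2, 0) at t = 2 with one holding (4, 0) at t = 5 gives centroid (2, 0), radius sqrt(8/3), and timestamp 5.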

20.
Although many multi-view clustering approaches have been developed recently, a common shortcoming of most of them is that they either rely on the original feature space or treat the two components of similarity-based clustering (similarity matrix construction and cluster indicator matrix calculation) separately, which may degrade clustering performance. To tackle this shortcoming, we propose a new method termed Multi-view Clustering in Latent Embedding Space (MCLES), which jointly recovers a comprehensive latent embedding space, a robust global similarity matrix, and an accurate cluster indicator matrix in a unified optimization framework, in which the variables reinforce one another to reach the optimal solution. To avoid solving a quadratic programming problem, we further relax the constraint on the global similarity matrix, yielding an improved version termed Relaxed Multi-view Clustering in Latent Embedding Space (R-MCLES). Compared with MCLES, R-MCLES has lower computational complexity and captures more correlations between pairs of data points. Extensive experiments on both image and document datasets demonstrate the superiority of the proposed methods over the state of the art.
