1.
A method of relational fuzzy clustering based on producing feature vectors using FastMap Total citations: 1, self-citations: 0, citations by others: 1
Roelof Kars Brouwer 《Information Sciences》2009,179(20):3561-47
The first stage of organizing objects is to partition them into groups or clusters. Clustering is generally done either on individual object data representing the entities, such as feature vectors, or on object relational data incorporated in a proximity matrix. This paper describes another method for finding a fuzzy membership matrix that provides cluster membership values for all the objects based strictly on the proximity matrix. This is generally referred to as relational data clustering. The fuzzy membership matrix is found by first finding a set of vectors whose inter-vector Euclidean distances approximately match the proximities that are provided. These vectors can be of very low dimension, such as 5 or less. Fuzzy c-means (FCM) is then applied to these vectors to obtain a fuzzy membership matrix. In addition, two-dimensional vectors are created to provide a visual representation of the proximity matrix. This allows comparison of the result of automatic clustering to visual clustering. The proposed method is compared to other relational clustering methods, including NERFCM, Roubens' method and Windham's A-P method. Various clustering quality indices are also calculated for the comparison, using various proximity matrices as input. Simulations show the method to be very effective and no more computationally expensive than other relational data clustering methods. The membership matrices produced by the proposed method are less crisp than those produced by NERFCM and more representative of the proximity matrix that is used as input to the clustering process.
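The FCM step mentioned in this abstract can be illustrated with a minimal sketch of the standard fuzzy c-means update equations. It assumes the low-dimensional vectors have already been produced (the FastMap embedding itself is not shown), and the names `fcm_memberships`, `fcm_centers`, `points`, `centers` and the fuzzifier `m` are illustrative choices, not taken from the paper.

```python
import math


def fcm_memberships(points, centers, m=2.0):
    """Standard FCM membership update: U[i][k] is the degree of point k in cluster i."""
    exponent = 2.0 / (m - 1.0)
    memberships = [[0.0] * len(points) for _ in centers]
    for k, x in enumerate(points):
        dists = [math.dist(x, v) for v in centers]
        zeros = [i for i, d in enumerate(dists) if d == 0.0]
        if zeros:
            # The point coincides with one or more centers: share membership among them.
            for i in zeros:
                memberships[i][k] = 1.0 / len(zeros)
            continue
        for i, d_i in enumerate(dists):
            memberships[i][k] = 1.0 / sum((d_i / d_j) ** exponent for d_j in dists)
    return memberships


def fcm_centers(points, memberships, m=2.0):
    """Standard FCM center update: weighted mean of the points with weights u**m."""
    dim = len(points[0])
    centers = []
    for row in memberships:
        weights = [u ** m for u in row]
        total = sum(weights)
        centers.append(
            tuple(sum(w * x[d] for w, x in zip(weights, points)) / total for d in range(dim))
        )
    return centers


# Hypothetical embedded vectors and initial centers, for illustration only.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
centers = [(0.0, 0.5), (5.0, 5.5)]
for _ in range(10):  # alternate the two updates for a fixed number of rounds
    U = fcm_memberships(points, centers)
    centers = fcm_centers(points, U)
```

Each column of `U` sums to 1, which is the fuzzy partition constraint; a full implementation would normally stop when the memberships change by less than a tolerance rather than after a fixed number of rounds.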
2.
3.
4.
This paper presents the development of soft clustering and learning vector quantization (LVQ) algorithms that rely on multiple weighted norms to measure the distance between the feature vectors and their prototypes. Clustering and LVQ are formulated in this paper as the minimization of a reformulation function that employs distinct weighted norms to measure the distance between each of the prototypes and the feature vectors under a set of equality constraints imposed on the weight matrices. Fuzzy LVQ and clustering algorithms are obtained as special cases of the proposed formulation. The resulting clustering algorithm is evaluated and benchmarked on three data sets that differ in terms of the data structure and the dimensionality of the feature vectors. This experimental evaluation indicates that the proposed multinorm algorithm outperforms algorithms employing the Euclidean norm as well as existing clustering algorithms employing weighted norms.
5.
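This paper proposes an improved possibilistic c-means (PCM) clustering method based on particle swarm optimization. The method first modifies the objective function of the PCM algorithm to compute the membership matrix of the data patterns and the cluster centers, which completes the particle encoding; this reduces the algorithm's sensitivity to the initial centers and improves clustering accuracy. Second, the particle swarm optimization (PSO) algorithm is used to optimize the encoding, which effectively overcomes the tendency of PCM clustering to produce coincident clusters and to become trapped in local optima, and reduces the number of iterations. Experiments on synthetic data sets and UCI data sets show that the algorithm performs well in terms of computational complexity, clustering accuracy and global search capability.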
6.
7.
This study is concerned with clustering carried out in the presence of labeled patterns. An objective of this optimization is to reconcile the structure residing in the data (and primarily discovered by the underlying clustering mechanism) with the labels of the patterns forming that structure. In this sense, one can consider supervised fuzzy clustering to be a framework for preliminary data analysis that provides thorough insight into the structure of the data and supports the ensuing design of detailed classifiers. The proposed method augments the standard fuzzy C-means algorithm by extending the original objective function with a supervision component (labeled patterns). Experimental results illustrate the approach and discuss the use of this type of clustering in vector quantization.
8.
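Particle-swarm-based fuzzy clustering with membership encoding is sensitive to noise and performs poorly on data sets in which the number of samples is smaller than the sample dimensionality. To address these problems, an improved particle-swarm-based fuzzy clustering method is proposed that modifies how the fuzzy clustering constraint is handled. When a sample's memberships across all classes do not sum to 1, the new method starts from the memberships obtained by particle swarm optimization and further redistributes them according to the distances between the sample and each class, so that the memberships satisfy the fuzzy clustering constraint. The new method significantly improves the results of fuzzy clustering with particle swarms under membership encoding, as verified on typical data sets.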
9.
Carl G. Looney 《Pattern recognition》2002,35(11):2413-2423
Major problems exist in both crisp and fuzzy clustering algorithms. Fuzzy c-means-type algorithms use weights determined by a power m of inverse distances that remains fixed over all iterations and over all clusters, even though smaller clusters should have a larger m. Our method uses a different “distance” for each cluster that changes over the early iterations to fit the clusters. Comparisons show improved results. We also address other perplexing problems in clustering: (i) find the optimal number K of clusters; (ii) assess the validity of a given clustering; (iii) prevent the selection of seed vectors as initial prototypes from affecting the clustering; (iv) prevent the order of merging from affecting the clustering; and (v) permit the clusters to form more natural shapes rather than forcing them into normed balls of the distance function. We employ a relatively large number K of uniformly randomly distributed seeds and then thin them to leave fewer uniformly distributed seeds. Next, the main loop iterates by assigning the feature vectors and computing new fuzzy prototypes. Our fuzzy merging then merges any clusters that are too close to each other. We use a modified Xie-Beni validity measure as the goodness-of-clustering measure for multiple values of K in a user-interaction approach where the user selects two parameters (for eliminating clusters and merging clusters after viewing the results thus far). The algorithm is compared with fuzzy c-means on the iris data and on the Wisconsin breast cancer data.
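For orientation, the standard (unmodified) Xie-Beni index can be sketched as below. The abstract does not say how the paper modifies the measure, so this is only the textbook form: total membership-weighted squared distance to the prototypes, divided by the number of points times the smallest squared distance between two prototypes. The function and argument names are illustrative.

```python
import math


def xie_beni(points, centers, memberships, m=2.0):
    """Standard Xie-Beni validity index; lower values indicate a better clustering.

    memberships[i][k] is the degree of point k in cluster i; m = 2 gives the
    original form of the index. Requires at least two distinct centers.
    """
    n = len(points)
    # Compactness: membership-weighted squared distances to the prototypes.
    compactness = sum(
        (memberships[i][k] ** m) * math.dist(x, v) ** 2
        for i, v in enumerate(centers)
        for k, x in enumerate(points)
    )
    # Separation: smallest squared distance between any two prototypes.
    separation = min(
        math.dist(centers[i], centers[j]) ** 2
        for i in range(len(centers))
        for j in range(i + 1, len(centers))
    )
    return compactness / (n * separation)
```

Evaluating the index for several candidate values of K and preferring the smallest value is the usual way such a measure is used to choose the number of clusters.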
10.
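To address the problem that selecting cluster centers in clustering algorithms requires manual intervention, a parameter-free clustering algorithm based on Laplacian centrality and density peaks (ALPC) is proposed. First, Laplacian centrality is used to measure the centrality of each object; then, a probability-statistics method based on the normal distribution is used to determine the cluster-center objects; finally, each object is assigned to the corresponding cluster center according to its distance to each center, which completes the clustering. The proposed algorithm overcomes the drawback of relying on empirically chosen parameters and manual selection of cluster centers. Experimental results on synthetic and real data sets show that, compared with the classic density-based spatial clustering of applications with noise (DBSCAN), the density peaks clustering (DPC) algorithm and the Laplacian centrality peaks clustering (LPC) algorithm, ALPC determines cluster centers automatically, requires no parameters, and achieves relatively high clustering accuracy.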
11.
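A new method for the automatic identification of power transformer faults based on a fuzzy particle swarm algorithm is proposed. First, the content data for five key gases, obtained by dissolved gas analysis of the oil, undergo special preprocessing to yield the six key features needed to identify four types of fault. A new fuzzy clustering objective function is then given; on this basis, the particle swarm algorithm is applied to existing fault samples to obtain the optimal cluster center for each fault class. The distances from each test sample to each cluster center and the corresponding memberships are then computed, and finally the transformer fault type of the sample is identified. Test results show that the method can effectively diagnose high-energy discharge, overheating, low-energy discharge and the normal state, with an accuracy of up to 92%.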
12.
13.
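Incomplete clustering of large-scale short texts Total citations: 1, self-citations: 0, citations by others: 1
Cluster analysis is an important data mining technique; through clustering, people can discover latent hot topics or regularities in information. To date, a large number of clustering algorithms have been studied and proposed. With the growing popularity of the Internet, short-text information such as query logs and Twitter posts plays an increasingly important role in people's lives. Such short texts are enormous in number, typically reaching tens of millions or even hundreds of millions, and existing clustering algorithms often prove inadequate when applied to short-text data at this scale. Through experimental analysis of short texts from real applications, this paper finds that the categories of such data exhibit a "long tail" phenomenon, and on that basis proposes the idea of incomplete clustering, which can effectively improve clustering performance on this kind of short text.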
14.
Attribute weighted Mercer kernel based fuzzy clustering algorithm for general non-spherical datasets Total citations: 4, self-citations: 0, citations by others: 4
Hongbin Shen Jie Yang Shitong Wang Xiaojun Liu 《Soft Computing - A Fusion of Foundations, Methodologies and Applications》2006,10(11):1061-1073
Clustering analysis is an important topic in artificial intelligence, data mining and pattern recognition research. Conventional clustering algorithms, for instance the well-known fuzzy C-means clustering algorithm (FCM), assume that all the attributes are equally relevant to all the clusters. However, in most domains, especially for high-dimensional datasets, some attributes are irrelevant, and some relevant ones are less important than others with respect to a specific class. In this paper, such imbalances between the attributes are considered and a new weighted fuzzy kernel-clustering algorithm (WFKCA) is presented. WFKCA performs clustering in a kernel feature space mapped by Mercer kernels. Compared with the conventional hard kernel-clustering algorithm, WFKCA can yield meaningful prototypes (cluster centers) of the clusters. Numerical convergence properties of WFKCA are also discussed. For in-depth studies, WFKCA is extended to WFKCA2, which has been demonstrated to be a useful tool for clustering incomplete data. Numerical examples demonstrate the effectiveness of the new WFKCA algorithm.
15.
16.
Clustering trajectory data discovers and visualizes available structure in movement patterns of mobile objects and has numerous potential applications in traffic control, urban planning, astronomy, and animal science. In this paper, an automated technique for clustering trajectory data using a Particle Swarm Optimization (PSO) approach is proposed, and the Dynamic Time Warping (DTW) distance, one of the most commonly used distance measures for trajectory data, is considered. The proposed technique is able to find a (near) optimal number of clusters as well as (near) optimal cluster centers during the clustering process. To reduce the dimensionality of the search space and improve the performance of the proposed method (in terms of a certain performance index), a Discrete Cosine Transform (DCT) representation of cluster centers is considered. The proposed method can admit various cluster validity indexes as the objective function for optimization. Experimental results on both synthetic and real-world datasets indicate the superiority of the proposed technique over fuzzy C-means, fuzzy K-medoids, and two evolutionary-based clustering techniques proposed in the literature.
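The DTW distance named in this abstract can be illustrated with the classic dynamic-programming formulation. This is a minimal sketch that assumes trajectories are lists of equal-dimension coordinate tuples and uses the Euclidean distance between points as the local cost; the abstract does not state which local cost or warping constraints the paper uses, and the function name is illustrative.

```python
import math


def dtw_distance(a, b):
    """Dynamic Time Warping distance between two trajectories (lists of points)."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = math.dist(a[i - 1], b[j - 1])
            # Extend the cheapest of: advance in a, advance in b, or advance in both.
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# A trajectory that lingers at one point still matches the original exactly.
track = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
slow_track = [(0.0, 0.0), (1.0, 1.0), (1.0, 1.0), (2.0, 2.0)]
gap = dtw_distance(track, slow_track)
```

Because the alignment may repeat points, trajectories of different lengths or speeds can be compared directly, which is why DTW suits movement data better than a point-by-point Euclidean distance.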
17.
Yamina Mohamed Ben Ali 《Neural Processing Letters》2016,44(1):221-244
Clustering analysis is a major application area of data mining in which particle swarm optimization (PSO) is widely used because of its simplicity and efficiency. In this paper, we present a new variant of the PSO algorithm tailored to clustering analysis. The proposed algorithm encodes each particle as a bi-dimensional vector: in the first dimension we look for the optimal number of clusters, and in the second dimension we look for the best centroid of each cluster. In this PSO clustering algorithm, a new position-update rule is proposed to deal with our clustering objective. The performance of the proposed algorithm is tested on artificial and real datasets. The results show good performance and are promising for future work.
18.
Mohammad Taherdangkoo Mohammad Hadi Bagheri 《Engineering Applications of Artificial Intelligence》2013,26(5-6):1493-1502
One of the simple techniques for data clustering is based on Fuzzy C-means (FCM) clustering, which describes the degree to which each data point belongs to a cluster by a fuzzy membership function instead of a crisp value. However, the results of fuzzy clustering depend highly on the initial state selection, and there is also a high risk of not obtaining the best results when the datasets are large. In this paper, we present a hybrid algorithm based on FCM and a modified stem cells algorithm, which we call the SC-FCM algorithm, for optimum clustering of a dataset into K clusters. The experimental results obtained with the new algorithm on different well-known datasets, compared with those obtained by the K-means algorithm, FCM, Genetic Algorithm (GA), Particle Swarm Optimization (PSO), Ant Colony Optimization (ACO) and the Artificial Bee Colony (ABC) algorithm, demonstrate the better performance of the new algorithm.
19.
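Clustering is the process of partitioning a set of physical or abstract objects into multiple classes (clusters) composed of similar objects. Objects within the same cluster are similar to one another, whereas objects in different clusters differ considerably. Building on the gene expression programming algorithm, and combining newly designed generalized clustering algebra operators with an objective optimization function, a multi-objective automatic clustering algorithm based on gene expression programming (MAGEP-Cluster) is proposed. The algorithm can not only determine the optimal number of clusters automatically, but can also simultaneously...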
20.
An efficient hybrid algorithm based on modified imperialist competitive algorithm and K-means for data clustering Total citations: 2, self-citations: 0, citations by others: 2
Taher Niknam Elahe Taherian Fard Narges Pourjafarian Alireza Rousta 《Engineering Applications of Artificial Intelligence》2011,24(2):306-317
Clustering techniques have received attention in many fields of study such as engineering, medicine, biology and data mining. The aim of clustering is to group data points into clusters. The K-means algorithm is one of the most common techniques used for clustering. However, the results of K-means depend on the initial state and converge to local optima. Many studies on clustering have sought to overcome the problem of local optima. This paper presents an efficient hybrid evolutionary optimization algorithm that combines the Modified Imperialist Competitive Algorithm (MICA) and K-means (K), called K-MICA, for optimum clustering of N objects into K clusters. The new hybrid K-MICA algorithm is tested on several data sets and its performance is compared with those of MICA, ACO, PSO, Simulated Annealing (SA), Genetic Algorithm (GA), Tabu Search (TS), Honey Bee Mating Optimization (HBMO) and K-means. The simulation results show that the proposed evolutionary optimization algorithm is robust and suitable for handling data clustering.