Similar literature
20 similar documents found (search time: 46 ms)
1.
Clustering Incomplete Data Using Kernel-Based Fuzzy C-means Algorithm   (Total citations: 3; self-citations: 0; citations by others: 3)

2.
In the fuzzy c-means (FCM) clustering algorithm, almost no data point has a membership value of 1. Moreover, noise and outliers can make it difficult to obtain appropriate clustering results from FCM. The embedding of FCM into switching regressions, called fuzzy c-regressions (FCR), has the same drawbacks. In this paper, we propose alpha-cut implemented fuzzy clustering algorithms, referred to as FCMalpha, which allow data points to belong completely to one cluster. The proposed FCMalpha algorithms form a cluster core for each cluster; data points inside a cluster core have a membership value of 1, which resolves these drawbacks of FCM. The fuzziness index m plays different roles in FCM and FCMalpha. We find that the clustering results obtained by FCMalpha are more robust to noise and outliers than those of FCM when a larger m is used. Moreover, the cluster cores generated by FCMalpha work for clusters of various shapes, so FCMalpha is well suited to embedding into switching regressions. The embedding of FCMalpha into switching regressions is called FCRalpha. The proposed FCRalpha provides better results than FCR in environments with noise or outliers. Numerical examples show the robustness and superiority of the proposed methods.
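The abstract above does not spell out the FCMalpha update, so the following is only a minimal sketch under an assumed rule: compute the standard FCM memberships of one point, then snap them to a crisp assignment when the largest membership reaches a threshold `alpha`. The function names and the thresholding rule are illustrative assumptions, not the authors' algorithm.

```python
def fcm_memberships(sq_dists, m=2.0):
    """Standard FCM memberships of one point, given its squared
    distances to every cluster centre (fuzzifier m > 1)."""
    zeros = [i for i, d in enumerate(sq_dists) if d == 0.0]
    if zeros:
        # The point sits exactly on one or more centres: share membership among them.
        return [1.0 / len(zeros) if d == 0.0 else 0.0 for d in sq_dists]
    inv = [d ** (-1.0 / (m - 1.0)) for d in sq_dists]
    total = sum(inv)
    return [v / total for v in inv]


def alpha_cut(u, alpha):
    """Assumed alpha-cut rule (hypothetical): if the largest membership
    reaches alpha, the point joins that cluster's core with membership 1."""
    k = max(range(len(u)), key=u.__getitem__)
    if u[k] >= alpha:
        return [1.0 if i == k else 0.0 for i in range(len(u))]
    return list(u)


u = fcm_memberships([1.0, 9.0])   # point much closer to the first centre
core = alpha_cut(u, alpha=0.8)    # membership snaps to the first cluster
```

With plain FCM the point keeps a small nonzero membership in the far cluster; the assumed cut removes it, which is the sense in which a point inside a cluster core "completely belongs" to one cluster.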

3.
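The fuzzy C-means (FCM) algorithm is a principal algorithm for data cluster analysis. In noisy environments, however, its accuracy falls as the number of clusters of unequal sample size grows; this drawback can be addressed by the Gaussian kernel mapping of alternative FCM (AFCM). In view of the shortcomings of AFCM, a generalized Lorentzian kernel function for fuzzy C-means clustering is proposed. The algorithm is used to cluster the Iris database into three classes: Iris setosa, Iris versicolor, and Iris virginica. Experimental results show that generalized Lorentzian fuzzy C-means (GLFCM) can classify data containing outlier clusters and clusters of unequal size. Its results are better than those of K-means, FCM, alternative C-means (AFCM), Gustafson-Kessel (GK), and Gath-Geva (GG); it converges in fewer iterations than AFCM, and its partition index (SC) is also better than that of the other methods.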

4.
Kernel approaches can improve the performance of conventional clustering or classification algorithms for complex distributed data. This is achieved by using a kernel function, which is defined as the inner product of two values obtained by a transformation function. This allows algorithms to operate in a higher dimensional space (i.e., with more degrees of freedom for data to be meaningfully partitioned) without having to compute the transformation. As a result, the fuzzy kernel C-means (FKCM) algorithm, which uses a distance measure between patterns and cluster prototypes based on a kernel function, can obtain more desirable clustering results than fuzzy C-means (FCM) for both spherical and nonspherical data. However, like FCM, it can still be sensitive to noise. In this paper, to overcome this drawback of FKCM, we propose a kernel possibilistic C-means (KPCM) algorithm that applies the kernel approach to the possibilistic C-means (PCM) algorithm. The method includes a variance updating method for Gaussian kernels at each clustering iteration. Several experimental results show that the proposed algorithm can outperform other algorithms for general data with additive noise. © 2009 Wiley Periodicals, Inc.
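The kernel trick described above can be made concrete with a short sketch. It assumes a Gaussian kernel with a hypothetical width parameter `sigma` and a prototype `v` that lives in the input space; the squared feature-space distance then follows from the identity ||φ(x) − φ(v)||² = K(x, x) − 2K(x, v) + K(v, v), so the transformation φ is never computed.

```python
import math


def gaussian_kernel(x, v, sigma=1.0):
    """Gaussian (RBF) kernel between two vectors; sigma is an assumed width."""
    sq = sum((a - b) ** 2 for a, b in zip(x, v))
    return math.exp(-sq / (2.0 * sigma ** 2))


def kernel_sq_distance(x, v, kernel=gaussian_kernel):
    """Squared distance between phi(x) and phi(v) in feature space,
    computed from kernel evaluations only."""
    return kernel(x, x) - 2.0 * kernel(x, v) + kernel(v, v)


d_near = kernel_sq_distance([0.0, 0.0], [0.1, 0.0])
d_far = kernel_sq_distance([0.0, 0.0], [50.0, 0.0])
```

For the Gaussian kernel K(x, x) = 1, so the distance reduces to 2(1 − K(x, v)) and is bounded above by 2. A distant point therefore cannot dominate the objective the way it does under the squared Euclidean distance, although, as the abstract notes, this alone does not remove the sensitivity to noise.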

5.
Fuzzy c-means (FCM) clustering with spatial constraints is considered a suitable algorithm for data clustering and data analysis. However, FCM still lacks robustness to noisy data because its objective function measures the relationship between objects with the Euclidean distance. It is effective only for "spherical" clusters and may not give reasonable results for "non-compactly filled" spherical data such as "annular-shaped" data. This paper addresses these drawbacks of the general fuzzy c-means algorithm by introducing an extended Gaussian version of fuzzy c-means that replaces the Euclidean distance in the original FCM objective function. The paper first proposes an initial kernel version of fuzzy c-means that aims to simplify the computation, and then extends it to an extended Gaussian kernel version. It derives an effective method for constructing the membership matrix of the objects and a robust method for updating the centers. Furthermore, the paper proposes a new prototype learning method that obtains initial cluster centers through a new mathematical initialization for the new objective function, in order to reduce the number of iterations and obtain more accurate results. An initial experiment on artificially generated data shows how effectively the proposed Gaussian version of fuzzy c-means finds clusters; the proposed methods are then used to cluster the Wisconsin breast cancer database into two clusters for the benign and malignant classes. To show the performance of the proposed fuzzy c-means with the new initialization of cluster centers, the results are compared with those of a recent fuzzy c-means algorithm, and the silhouette method is used to validate the clusters obtained from the breast cancer data set.

6.
The weighting exponent m is called the fuzzifier that can influence the performance of fuzzy c-means (FCM). It is generally suggested that m∈[1.5,2.5]. On the basis of a robust analysis of FCM, a new guideline for selecting the parameter m is proposed. We will show that a large m value will make FCM more robust to noise and outliers. However, considerably large m values that are greater than the theoretical upper bound will make the sample mean a unique optimizer. A simple and efficient method to avoid this unexpected case in fuzzy clustering is to assign a cluster core to each cluster. We will also discuss some clustering algorithms that extend FCM to contain the cluster cores in fuzzy clusters. For a large theoretical upper bound case, we suggest the implementation of the FCM with a suitable large m value. Otherwise, we suggest implementing the clustering methods with cluster cores. When the data set contains noise and outliers, the fuzzifier m=4 is recommended for both FCM and cluster-core-based methods in a large theoretical upper bound case.
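The role of the fuzzifier can be read off the standard FCM update equations. The block below is a reminder of those equations and their two limits; the symbols (x_k for data points, v_i for centers, d_ik for the distance between them, c for the number of clusters, n for the number of points) are the usual textbook notation and are assumed here rather than taken from the abstract. It does not reproduce the paper's theoretical upper bound on m.

```latex
u_{ik} = \left[ \sum_{j=1}^{c} \left( \frac{d_{ik}}{d_{jk}} \right)^{\frac{2}{m-1}} \right]^{-1},
\qquad
v_i = \frac{\sum_{k=1}^{n} u_{ik}^{m}\, x_k}{\sum_{k=1}^{n} u_{ik}^{m}}

% Crisp limit: each point belongs fully to its nearest center
\lim_{m \to 1^{+}} u_{ik} =
\begin{cases}
1, & d_{ik} < d_{jk} \ \text{for all } j \neq i \\
0, & d_{ik} > d_{jk} \ \text{for some } j
\end{cases}

% Flat limit: all memberships equal, so every center collapses to the sample mean
\lim_{m \to \infty} u_{ik} = \frac{1}{c}
\quad \Longrightarrow \quad
v_i \to \bar{x} = \frac{1}{n} \sum_{k=1}^{n} x_k
```

The flat limit shows why a very large m is risky: once the memberships are nearly uniform, every center is pulled to the sample mean, which is the degenerate solution the abstract warns about.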

7.
Effective fuzzy c-means clustering algorithms for data clustering problems   (Total citations: 3; self-citations: 0; citations by others: 3)
Clustering is a well-known technique for identifying intrinsic structure in, and extracting useful information from, large amounts of data. One of the most extensively used clustering techniques is the fuzzy c-means algorithm. However, the standard fuzzy c-means objective function becomes computationally demanding for large amounts of data and under measurement uncertainty in the data objects. Furthermore, it is difficult to set optimal parameters for fuzzy c-means. The goal of this paper is therefore to produce an alternative generalization of FCM clustering techniques, called quadratic entropy based fuzzy c-means, in order to deal with more complicated data. The paper develops effective quadratic entropy fuzzy c-means using a combination of a regularization function, quadratic terms, mean distance functions, and kernel distance functions. It gives a complete framework of the quadratic entropy approach for constructing effective quadratic entropy based fuzzy clustering algorithms, and it establishes an effective way of estimating memberships and updating centers by minimizing the proposed objective functions. To reduce the number of iterations of the proposed techniques, the article proposes a new algorithm for initializing the cluster centers. The silhouette method is used to assess cluster validity and to choose the number of clusters. For the first time, the paper segments the synthetic control chart time series directly with the proposed methods to examine their performance, and shows that the proposed clustering techniques have advantages over the standard FCM and the very recent ClusterM-k-NN in segmenting synthetic control chart time series.

8.
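Shi Wenfeng (石文峰), Shang Lin (商琳). 《计算机科学》 (Computer Science), 2017, 44(9): 45-48, 66
Fuzzy C-Means (FCM) is a fuzzy clustering algorithm with good clustering performance and wide application, but its sensitivity to the initial number of clusters makes choosing a good value of C very important. Determining the number of clusters is therefore a crucial step in cluster analysis with FCM. By extending the decision-theoretic rough set model, cluster validity is analyzed and the number of FCM clusters is then determined, which avoids the effects of poor initialization when using FCM. The paper proposes a method for determining the number of clusters for fuzzy C-means based on an extended rough set model and verifies the clustering quality through image segmentation experiments. The experiments obtain a good segmentation by comparing the cost of the classification results under different numbers of clusters, and the results are compared with the ant colony fuzzy C-means hybrid algorithm (AFHA) proposed by Z. Yu et al. in 2015 and with the improved AFHA algorithm (IAFHA). The results show that the proposed method gives better clustering and clearer image segmentation, a higher Bezdek partition coefficient than AFHA and IAFHA, and a considerable advantage on the Xie-Beni index.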

9.
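Compared with the k-means algorithm, fuzzy C-means (FCM) introduces fuzzy memberships to account for the interaction between different clusters and thereby avoids the problem of coincident cluster centers. However, the fuzzy membership function has trailing and upturned tails, which makes FCM sensitive to noise points and isolated points. Moreover, because FCM tends to split the data into clusters of equal size, it is also sensitive to cluster size and performs poorly on imbalanced clusters. To address these problems, this paper proposes a reliability-based robust fuzzy clustering algorithm (RRFCM). Based on the current clustering result, the algorithm analyzes the reliability of each sample point and uses that reliability together with local neighborhood information to emphasize the separability of different clusters. This improves robustness to noise, reduces sensitivity to imbalanced cluster sizes, and yields clustering results that generalize better. Compared with related algorithms, RRFCM achieves the best results on synthetic data sets, real UCI data sets, and image segmentation experiments.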

10.
An axiomatic approach to soft learning vector quantization and clustering   (Total citations: 11; self-citations: 0; citations by others: 11)
This paper presents an axiomatic approach to soft learning vector quantization (LVQ) and clustering based on reformulation. The reformulation of the fuzzy c-means (FCM) algorithm provides the basis for reformulating entropy-constrained fuzzy clustering (ECFC) algorithms. According to the proposed approach, the development of specific algorithms reduces to the selection of a generator function. Linear generator functions lead to the FCM and fuzzy learning vector quantization algorithms while exponential generator functions lead to ECFC and entropy-constrained learning vector quantization algorithms. The reformulation of LVQ and clustering algorithms also provides the basis for developing uncertainty measures that can identify feature vectors equidistant from all prototypes. These measures are employed by a procedure developed to make soft LVQ and clustering algorithms capable of identifying outliers in the data set. This procedure is evaluated by testing the algorithms generated by linear and exponential generator functions on speech data.

11.
This paper focuses on the development of an effective cluster validity measure with outlier detection and cluster merging algorithms for support vector clustering (SVC). Since SVC is a kernel-based clustering approach, the parameter of kernel functions and the soft-margin constants in Lagrangian functions play a crucial role in the clustering results. The major contribution of this paper is that our proposed validity measure and algorithms are capable of identifying ideal parameters for SVC to reveal a suitable cluster configuration for a given data set. A validity measure, which is based on a ratio of cluster compactness to separation with outlier detection and a cluster-merging mechanism, has been developed to automatically determine ideal parameters for the kernel functions and soft-margin constants as well. With these parameters, the SVC algorithm is capable of identifying the optimal number of clusters with compact and smooth arbitrary-shaped cluster contours for the given data set and increasing robustness to outliers and noise. Several simulations, including artificial and benchmark data sets, have been conducted to demonstrate the effectiveness of the proposed cluster validity measure for the SVC algorithm.

12.
In this paper, we propose a context-sensitive technique for unsupervised change detection in multitemporal remote sensing images. The technique is based on a fuzzy clustering approach and takes into account the spatial correlation between neighboring pixels of the difference image produced by comparing two images acquired over the same geographical area at different times. Since the ranges of pixel values of the difference image belonging to the two clusters (changed and unchanged) generally overlap, fuzzy clustering techniques seem to be an appropriate and realistic choice to identify them (the pattern recognition literature shows that fuzzy sets handle this type of situation very well). Two fuzzy clustering algorithms, namely fuzzy c-means (FCM) and Gustafson-Kessel clustering (GKC), have been used for this task in the proposed work. For clustering, various image features are extracted using the neighborhood information of pixels. FCM and GKC are hybridized with two other optimization techniques, genetic algorithm (GA) and simulated annealing (SA), to further enhance the performance. To show the effectiveness of the proposed technique, experiments are conducted on two multispectral and multitemporal remote sensing images. A fuzzy cluster validity index (Xie-Beni) is used to quantitatively evaluate the performance. Results are compared with those of existing Markov random field (MRF) and neural network based algorithms and found to be superior. The proposed technique is less time consuming and, unlike MRF, does not require any a priori knowledge of the distributions of changed and unchanged pixels.

13.
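A fast fuzzy C-means clustering method for color image segmentation   (Total citations: 33; self-citations: 3; citations by others: 33)
Fuzzy C-means (FCM) clustering is simple, intuitive, and easy to implement for color image segmentation, but its clustering performance depends on the initialization of the centers and it is computationally expensive. A fast fuzzy clustering method (FFCM) is therefore proposed. The method uses hierarchical subtractive clustering to divide the image data into a number of subsets of similar color. The subset centers are used to initialize the cluster centers, and fuzzy clustering is then performed on the subset centers and their distribution densities. Because the number of samples to cluster is greatly reduced and hierarchical subtractive clustering is cheap, the fuzzy C-means computation is sped up substantially, and cluster validity indices can then be used to determine the number of clusters quickly. Experiments show that the method does not require the number of clusters to be fixed in advance and, without degrading the optimized clustering performance, markedly speeds up fuzzy clustering and achieves fast color image segmentation.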
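One plausible way to cluster subset centers together with their densities is to weight each subset center by the number of pixels it represents in the FCM center update. The sketch below assumes exactly that; the weighting scheme and the name `weighted_centre` are illustrative guesses, since the abstract does not give the update formula.

```python
def weighted_centre(points, weights, u_row, m=2.0):
    """Assumed weighted FCM centre update for one cluster.

    points  : list of subset centres (each a list of floats)
    weights : number of pixels represented by each subset centre
    u_row   : membership of each subset centre in this cluster
    """
    dim = len(points[0])
    coeffs = [w * (u ** m) for w, u in zip(weights, u_row)]
    total = sum(coeffs)
    return [
        sum(c * p[d] for c, p in zip(coeffs, points)) / total
        for d in range(dim)
    ]


# Two subset centres standing in for 3 pixels and 1 pixel respectively.
centre = weighted_centre([[0.0], [2.0]], weights=[3, 1], u_row=[1.0, 1.0])
```

Under this assumption the update costs time proportional to the number of subsets rather than the number of pixels, which is where the claimed speed-up would come from.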

14.
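An algorithm for automatically determining the number of clusters in network intrusion detection   (Total citations: 13; self-citations: 0; citations by others: 13)
To address the problem that the fuzzy C-means algorithm (FCM) requires the number of clusters to be specified in advance for intrusion detection, an algorithm that determines the number of clusters automatically (fuzzy C-means and support vector machine algorithm, F-CMSVM) is proposed. It first uses fuzzy C-means to split the target data set into two classes, then evaluates the result with a support vector machine (SVM) algorithm equipped with a fuzzy membership function to decide whether the target data set is separable, and iterates until the final clustering is obtained. The SVM uses the membership matrix produced by fuzzy C-means as its fuzzy membership function, so that different input samples receive different penalty values and the optimal separating hyperplane is obtained. The algorithm requires neither labeled training data nor a specified number of clusters, so it is a truly unsupervised algorithm. Simulation experiments on the KDD CUP 1999 data set show that the algorithm not only finds the best number of clusters but also detects intrusions well.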

15.
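A new noise-free fuzzy clustering method, RTCM, based on probabilistic typicality and cluster repulsion is proposed; its iterative algorithm is given and its convergence verified. General clustering methods are first reviewed and divided into two main kinds: noisy clustering, such as fuzzy c-means (FCM) and fuzzy possibilistic c-means (FPCM), and noise-free clustering, such as NC and PCM. The RTCM model and procedure are then given and its local convergence is verified. The algorithm handles data clustering in noisy environments and avoids overlapping clusters. Comparative experiments show that it improves the quality of the cluster centers obtained by FCM, NC, PCM, and FPCM in noisy environments and effectively resolves the cluster overlap problem of PCM on closely neighboring clusters.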

16.
In this paper, an approach for automatically clustering a data set into a number of fuzzy partitions with a simulated annealing using a reversible jump Markov chain Monte Carlo algorithm is proposed. This is in contrast to the widely used fuzzy clustering scheme, the fuzzy c-means (FCM) algorithm, which requires the a priori knowledge of the number of clusters. The said approach performs the clustering by optimizing a cluster validity index, the Xie-Beni index. It makes use of the homogeneous reversible jump Markov chain Monte Carlo (RJMCMC) kernel as the proposal so that the algorithm is able to jump between different dimensions, i.e., number of clusters, until the correct value is obtained. Different moves, like birth, death, split, merge, and update, are used for sampling a candidate state given the current state. The effectiveness of the proposed technique in optimizing the Xie-Beni index and thereby determining the appropriate clustering is demonstrated for both artificial and real-life data sets. In a part of the investigation, the utility of the fuzzy clustering scheme for classifying pixels in an IRS satellite image of Kolkata is studied. A technique for reducing the computation efforts in the case of satellite image data is incorporated.
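The Xie-Beni index that this approach optimizes is the ratio of fuzzy within-cluster compactness to the smallest squared distance between two centers, so smaller values indicate a better partition. A minimal sketch follows, assuming the common generalized form with exponent m (the original index uses m = 2) and at least two clusters; the data layout `u[i][k]` is an assumption of this example.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def xie_beni(points, centres, u, m=2.0):
    """Xie-Beni validity index; u[i][k] is the membership of point k in cluster i."""
    compactness = sum(
        (u[i][k] ** m) * sq_dist(points[k], centres[i])
        for i in range(len(centres))
        for k in range(len(points))
    )
    separation = min(
        sq_dist(centres[i], centres[j])
        for i in range(len(centres))
        for j in range(i + 1, len(centres))
    )
    return compactness / (len(points) * separation)


points = [[0.0], [1.0], [10.0], [11.0]]
u = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
good = xie_beni(points, [[0.5], [10.5]], u)   # tight, well-separated clusters
bad = xie_beni(points, [[5.0], [6.0]], u)     # centres close together, far from the data
```

Because the index can be evaluated for any candidate number of clusters, a search procedure such as the one in the abstract can compare partitions of different dimensions on a common scale.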

17.
Generally, abnormal points (noise and outliers) cause cluster analysis to produce low accuracy, especially in fuzzy clustering. These data not only stay in clusters but also pull the centroids away from their true positions. Traditional fuzzy clustering such as fuzzy c-means (FCM) always assigns data to all clusters, which is not reasonable in some circumstances. By reformulating the objective function with an exponential equation, the algorithm selects data into the clusters aggressively. However, noisy data and outliers cannot be handled properly by the clustering process: they are forced into a cluster by the general probabilistic constraint that the membership degrees across all clusters sum to one. To remedy this weakness, the possibilistic approach relaxes this condition to improve membership assignment. Nevertheless, possibilistic clustering algorithms generally suffer from coincident clusters because their membership equations ignore the distance to other clusters. Although some possibilistic clustering approaches do not generate coincident clusters, most of them require the right combination of multiple parameters to work. In this paper, we theoretically study Possibilistic Exponential Fuzzy Clustering (PXFCM), which integrates the possibilistic approach with exponential fuzzy clustering. PXFCM has only one parameter and not only partitions the data but also filters noisy data or detects them as outliers. Comprehensive experiments show that PXFCM achieves high accuracy in both clustering and outlier detection without generating coincident clusters.
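The contrast between the probabilistic constraint and the possibilistic relaxation can be illustrated with the classical PCM typicality formula rather than PXFCM itself, whose equations the abstract does not give. In the sketch below `eta` is the usual per-cluster scale parameter, treated here as a given constant, and all distances are assumed to be nonzero.

```python
def fcm_memberships(sq_dists, m=2.0):
    """FCM memberships of one point (squared distances assumed nonzero)."""
    inv = [d ** (-1.0 / (m - 1.0)) for d in sq_dists]
    total = sum(inv)
    return [v / total for v in inv]


def pcm_typicality(sq_dist, eta, m=2.0):
    """Classical PCM typicality: depends only on the distance to one cluster."""
    return 1.0 / (1.0 + (sq_dist / eta) ** (1.0 / (m - 1.0)))


# An outlier equally far (squared distance 100) from two clusters.
outlier_sq = [100.0, 100.0]
u = fcm_memberships(outlier_sq)                          # forced to sum to one
t = [pcm_typicality(d, eta=1.0) for d in outlier_sq]     # both close to zero
```

FCM gives the outlier a membership of 0.5 in each cluster, so it still pulls both centroids, whereas the typicalities are about 0.01 and its influence nearly vanishes. The same formula also shows the weakness named in the abstract: each typicality ignores the other clusters, so nothing prevents two centers from settling on the same dense region.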

18.
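A fast fuzzy C-means clustering method for color image segmentation   (Total citations: 4; self-citations: 0; citations by others: 4)
FCM applied to color image segmentation requires the number of clusters to be fixed in advance and is slow. A fast fuzzy C-means clustering method (FFCM) is therefore proposed. First, a watershed transform based on the gradient map is applied to the original color image, dividing the image data into subsets with consistent color; fuzzy clustering is then performed using the sizes and centers of these subsets. Because FFCM clusters far fewer samples, the computation speed of fuzzy C-means is greatly increased, and a cluster validity index can then be used to determine the number of clusters. Experiments show that the method does not require the number of clusters to be fixed in advance and, without degrading cluster validity, markedly speeds up fuzzy clustering and achieves fast color image segmentation.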

19.
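A kernel-based fast possibilistic clustering algorithm   (Total citations: 1; self-citations: 1; citations by others: 0)
Most traditional fast clustering algorithms are based on the fuzzy C-means algorithm (FCM), but FCM is sensitive to the initial cluster centers and to noisy data and tends to converge to local minima, so its clustering accuracy is low. Possibilistic C-means clustering largely resolves FCM's sensitivity to noise but tends to produce coincident clusters. Algorithms that combine FCM with possibilistic C-means clustering largely resolve the coincident-cluster problem. To further improve convergence speed and robustness, a kernel-based fast possibilistic clustering algorithm is proposed. The method introduces the idea of kernel clustering and uses the sample variance to optimize the parameter η in the objective function. Experiments on standard and synthetic data sets show that the algorithm improves clustering accuracy and speeds up convergence.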

20.
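A collaborative possibilistic fuzzy clustering algorithm   (Total citations: 1; self-citations: 0; citations by others: 1)
Fuzzy C-means clustering (FCM) is sensitive to noisy data, and possibilistic C-means clustering (PCM) is very sensitive to the initial centers, which easily leads to coincident clusters. Collaborative clustering exploits the collaborative relationships between different feature subsets and, combined with other algorithms, can improve their clustering performance. Accordingly, PCM is combined with collaborative clustering to give a collaborative possibilistic C-means fuzzy clustering algorithm (C-FCM). Building on an improved PCM, the algorithm improves clustering results on data sets. Tests on the Wine and Iris data sets show that the method outperforms PCM, demonstrating its effectiveness.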
