Similar Documents
1.
Solving many practical problems requires not only the class labels produced by a clustering algorithm, but also the ability to tell how near or far the clusters are from one another. In the difficult case of many clusters and high-dimensional data, visualization methods based on dimensionality reduction often show clusters as overlapping, interwoven, or artificially pushed apart, so that some inter-cluster proximity relations cannot be distinguished or are displayed incorrectly; existing inter-cluster distance measures, in turn, cannot reveal whether two clusters are far apart or close together. This paper proposes a dual geometric body model to describe the relationship between two clusters, and designs measures of inter-cluster proximity such as the relative boundary distance, the absolute boundary distance, and the density of the boundary region. The method considers both the absolute distance between the nearest sample sets of the two clusters and the density of the cluster boundary regions, so it can accurately reveal inter-cluster relationships in high-dimensional space even under the difficult conditions described above. Experiments on real data sets show that the dual geometric body model can effectively identify inter-cluster proximity relations that existing cluster visualization methods fail to distinguish.

2.
Assessing the stability of a clustering method involves the measurement of the extent to which the generated clusters are affected by perturbations in the input data. A measure which specifies the disturbance in a set of clusters as the minimum number of operations required to restore the set of modified clusters to the original ones is adopted. A number of well-known graph theoretic clustering methods are compared in terms of their stability as determined by this measure. Specifically, it is shown that among the clustering methods in any of several families of graph theoretic methods, clusters defined as the connected components are the most stable and the clusters specified as the maximal complete subgraphs are the least stable. Furthermore, as one proceeds from the method producing the most narrow clusters (maximal complete subgraphs) to those producing relatively broader clusters, the clustering process is shown to remain at least as stable as any method in the previous stages. Finally, the lower and the upper bounds for the measure of stability, when clusters are defined as the connected components, are derived.
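Purely as an illustration of the two cluster definitions compared in this abstract (not the paper's own stability measure), the sketch below builds a small similarity graph and extracts clusters first as connected components and then as maximal complete subgraphs; the edge list and the single-edge perturbation are made up for the example.

```python
import networkx as nx

# Toy similarity graph: an edge means two items are "similar enough".
edges = [(0, 1), (1, 2), (2, 0), (2, 3), (4, 5)]
G = nx.Graph(edges)

# Broadest (most stable) clusters: connected components.
components = [sorted(c) for c in nx.connected_components(G)]

# Narrowest (least stable) clusters: maximal complete subgraphs (cliques).
cliques = [sorted(c) for c in nx.find_cliques(G)]

print("connected components:", components)   # e.g. [[0, 1, 2, 3], [4, 5]]
print("maximal cliques:     ", cliques)      # e.g. [[0, 1, 2], [2, 3], [4, 5]]

# A crude perturbation check in the spirit of the paper: remove one edge
# and see how much each kind of cluster set changes.
G2 = G.copy()
G2.remove_edge(2, 3)
print("components after perturbation:",
      [sorted(c) for c in nx.connected_components(G2)])
```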

3.
Stability in cluster analysis is strongly dependent on the data set, especially on how well separated and how homogeneous the clusters are. In the same clustering, some clusters may be very stable and others may be extremely unstable. The Jaccard coefficient, a similarity measure between sets, is used as a cluster-wise measure of cluster stability, which is assessed by the bootstrap distribution of the Jaccard coefficient for every single cluster of a clustering compared to the most similar cluster in the bootstrapped data sets. This can be applied to very general cluster analysis methods. Some alternative resampling methods are investigated as well, namely subsetting, jittering the data points and replacing some data points by artificial noise points. The different methods are compared by means of a simulation study. A data example illustrates the use of the cluster-wise stability assessment to distinguish between meaningful stable and spurious clusters, but it is also shown that clusters are sometimes only stable because of the inflexibility of certain clustering methods.
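The cluster-wise bootstrap assessment described here can be sketched briefly. The fragment below is only an approximation of the idea, with k-means as an assumed base method, an arbitrary number of bootstrap replications, and synthetic data; it is not the paper's own implementation.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

def jaccard(a, b):
    """Jaccard similarity between two index sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
base_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
base_clusters = [np.where(base_labels == k)[0] for k in range(3)]

rng = np.random.default_rng(0)
n_boot = 50
stability = np.zeros(len(base_clusters))

for _ in range(n_boot):
    idx = rng.choice(len(X), size=len(X), replace=True)        # bootstrap sample
    boot_labels = KMeans(n_clusters=3, n_init=10).fit_predict(X[idx])
    boot_clusters = [idx[boot_labels == k] for k in range(3)]
    for i, c in enumerate(base_clusters):
        # compare each original cluster to its most similar bootstrapped cluster,
        # restricted to the points actually present in the resample
        c_in_boot = np.intersect1d(c, idx)
        stability[i] += max(jaccard(c_in_boot, b) for b in boot_clusters) / n_boot

print("cluster-wise bootstrap Jaccard stability:", stability.round(2))
```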

4.
A major challenge in subspace clustering is that subspace clustering may generate an explosive number of clusters with high computational complexity, which severely restricts the usage of subspace clustering. The problem gets even worse with the increase of the data's dimensionality. In this paper, we propose to summarize the set of subspace clusters into k representative clusters to alleviate the problem. Typically, subspace clusters can be clustered further into k groups, and the set of representative clusters can be selected from each group. In such a way, only the most representative subspace clusters will be returned to the user. Unfortunately, when the size of the set of representative clusters is specified, the problem of finding the optimal set is NP-hard. To solve this problem efficiently, we present two approximate methods: PCoC and HCoC. The greatest advantage of our methods is that we only need a subset of subspace clusters as the input instead of the complete set of subspace clusters. Precisely, only the clusters in low-dimensional subspaces are computed and assembled into representative clusters in high-dimensional subspaces. The approximate results can be found in polynomial time. Our performance study shows both the effectiveness and efficiency of these methods.

5.
Intelligent Data Analysis, 1998, 2(1-4): 229-244
Automatic clustering methods are part of data mining methods. They aim at building clusters of items so that similar items fall into the same cluster while dissimilar items fall into separate clusters. A particular class of clustering methods is the hierarchical one, in which clusters are formed recursively to grow a binary tree representing an approximation of the similarities between items. We propose a new interactive interface to help the user interpret the result of such a clustering process, according to the item characteristics. The prototype has been applied successfully to a special case of items providing nice graphical representations (electric load curves) but can also be used with other types of curves or with more standard items.

6.
A mechanism for detecting and determining network-wide anomalous traffic clusters
In network security management, automatically determining anomalous traffic clusters gives ISPs an effective means of analyzing and locating network-wide traffic anomalies. This paper proposes a mechanism for detecting and determining network-wide anomalous traffic clusters based on filtered network flow data. It gives a formal description and definition of the problem; extends and improves the multidimensional-tree-based method for detecting heavy traffic clusters, proposing flexible ways of computing the "detection threshold" and the "split value" so as to improve detection accuracy; reduces the size of the tree with a pruning algorithm, which speeds up the search for heavy traffic clusters; and presents a method for determining anomalous traffic clusters from the heavy traffic clusters. Experiments show that the method is feasible and can be applied to network-wide anomaly diagnosis.

7.
A new constructive approach to solving geometric constraint systems is presented. The solver is based on cluster rewriting; the new approach extends the class of problems that can be solved while retaining the advantages of the cluster rewriting method. Whereas previous cluster rewriting solvers determined only rigid clusters, this solver also determines two kinds of non-rigid clusters, that is, clusters with particular degrees of freedom. As a result, many problems that cannot be decomposed into rigid clusters can be solved without resorting to complex algebraic solving methods.

8.
K-means clustering can be highly accurate when the number of clusters and the initial cluster centres are appropriate. An inappropriate determination of the number of clusters or of the initial cluster centres decreases the accuracy of K-means clustering. However, determining these values is problematic. To solve these problems, we used density-based spatial clustering of applications with noise (DBSCAN) because it does not require a predetermined number of clusters; however, it has some significant drawbacks. Using DBSCAN with high-dimensional data and data with potentially different densities decreases the accuracy to some degree. Therefore, the objective of this research is to improve the efficiency of DBSCAN through a selection of region clusters based on density DBSCAN to automatically find the appropriate number of clusters and initial cluster centres for K-means clustering. In the proposed method, DBSCAN is used to perform clustering and to select the appropriate clusters by considering the density of each cluster. Subsequently, the appropriate region data are chosen from the selected clusters. The experimental results yield the appropriate number of clusters and the appropriate initial cluster centres for K-means clustering. In addition, the results of the selection of region clusters based on density DBSCAN are more accurate than those obtained by traditional methods, including DBSCAN and K-means, and by related methods such as Partitioning-based DBSCAN (PDBSCAN) and PDBSCAN by applying the Ant Clustering Algorithm DBSCAN (PACA-DBSCAN).
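A minimal sketch of the general pipeline, assuming scikit-learn's DBSCAN and KMeans: run DBSCAN, keep the clusters that look dense enough (here crudely approximated by a minimum cluster size), and hand their count and centroids to K-means as the number of clusters and initial centres. Every parameter value below is a stand-in, not the paper's setting.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=500, centers=4, cluster_std=0.7, random_state=1)

# Step 1: density-based clustering; eps and min_samples are illustrative values.
labels = DBSCAN(eps=0.6, min_samples=10).fit_predict(X)        # -1 marks noise

# Step 2: keep only clusters that are large/dense enough (crude density proxy).
ids, sizes = np.unique(labels[labels >= 0], return_counts=True)
selected = ids[sizes >= 20]                                    # arbitrary cutoff

# Step 3: their count and centroids become k and the initial centres for K-means.
centres = np.array([X[labels == c].mean(axis=0) for c in selected])
km = KMeans(n_clusters=len(centres), init=centres, n_init=1).fit(X)

print("k suggested by DBSCAN:", len(centres))
print("K-means inertia with these centres:", round(km.inertia_, 2))
```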

9.
Dynamic cluster formation using level set methods
Density-based clustering has the advantages of: 1) allowing arbitrary shapes of clusters and 2) not requiring the number of clusters as input. However, when clusters touch each other, both the cluster centers and the cluster boundaries (the peaks and valleys of the density distribution) become fuzzy and difficult to determine. We introduce the notion of a cluster intensity function (CIF), which captures the important characteristics of clusters. When clusters are well separated, CIFs are similar to density functions. But when clusters become close to each other, CIFs still clearly reveal the cluster centers, the cluster boundaries, and the degree of membership of each data point to the cluster it belongs to. Clustering through bump hunting and valley seeking based on these functions is more robust than clustering based on density functions obtained by kernel density estimation, which are often oscillatory or oversmoothed. These problems of kernel density estimation are resolved using level set methods and related techniques. Comparisons with two existing density-based methods, valley seeking and DBSCAN, are presented and illustrate the advantages of our approach.

10.
The large potential energy barriers separating local minima on the potential energy surface of cluster systems pose serious problems for optimization and simulation methods. This article discusses algorithms for dealing with these problems. Lennard-Jones clusters are used to illustrate the important issues. In addition, the complexities in going from one-component to binary Lennard-Jones clusters are explored.
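As a concrete illustration of the energy landscape under discussion, the sketch below evaluates the Lennard-Jones potential of a small cluster and runs SciPy's basin-hopping, one standard way of hopping between the local minima mentioned in the abstract. The cluster size, starting box, and optimizer settings are arbitrary choices for the example, not the article's algorithms.

```python
import numpy as np
from scipy.optimize import basinhopping
from scipy.spatial.distance import pdist

def lj_energy(flat_coords):
    """Total Lennard-Jones energy of a cluster, in reduced units."""
    r = pdist(flat_coords.reshape(-1, 3))          # all pairwise distances
    return np.sum(4.0 * (r ** -12 - r ** - 6))

rng = np.random.default_rng(0)
n_atoms = 7                                        # LJ7: global minimum is about -16.505
x0 = rng.uniform(-1.5, 1.5, size=3 * n_atoms)      # random starting configuration

# Basin hopping repeatedly perturbs the coordinates and re-minimizes locally,
# which is one standard way of crossing the barriers between local minima.
result = basinhopping(lj_energy, x0, niter=200,
                      minimizer_kwargs={"method": "L-BFGS-B"})
print("lowest energy found:", round(float(result.fun), 3))
```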

11.
Based on the molecular kinetic theory, a molecular-dynamics-like data clustering approach is proposed in this paper. Clusters are extracted after the data points fuse in the iteration space under a dynamical mechanism similar to the way molecules interact through molecular forces. The approach aims to find possible natural clusters without pre-specifying the number of clusters. Compared with three other clustering methods (trimmed k-means, the JP algorithm, and another method based on a gravitational model), this approach found better clusters in the experiments.
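A rough sketch of the fusing idea, not the paper's algorithm: points drift toward their near neighbours as if attracted by short-range forces, and clusters are read off once groups of points have collapsed together. The force law, step size, and merge radius below are invented for illustration; the result is closer to a blurring mean-shift than to a true molecular-dynamics model.

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=200, centers=3, cluster_std=0.6, random_state=2)
pts = X.copy()

# Let every point drift toward nearby points (a crude short-range attraction).
for _ in range(30):
    d = cdist(pts, pts)
    w = np.exp(-(d ** 2) / 0.5)                    # attraction weights
    np.fill_diagonal(w, 0.0)
    targets = w @ pts / w.sum(axis=1, keepdims=True)
    pts += 0.3 * (targets - pts)                   # small step toward local mass

# After fusing, points that ended up close together form one cluster,
# so the number of clusters falls out of the dynamics rather than being fixed.
labels = -np.ones(len(pts), dtype=int)
cur = 0
for i in range(len(pts)):
    if labels[i] == -1:
        member = np.linalg.norm(pts - pts[i], axis=1) < 0.2
        labels[(labels == -1) & member] = cur
        cur += 1
print("number of clusters found:", labels.max() + 1)
```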

12.
Cluster analysis is a useful method which reveals underlying structures and relations of items after grouping them into clusters. In the case of temporal data, clusters are defined over time intervals where they usually exhibit structural changes. Conventional cluster analysis does not provide sufficient methods to analyze these structural changes, which are, however, crucial in the interpretation and evaluation of temporal clusters. In this paper, we present two novel and interactive visualization techniques that enable users to explore and interpret the structural changes of temporal clusters. We introduce the temporal cluster view, which visualizes the structural quality of a number of temporal clusters, and temporal signatures, which represent the structure of clusters over time. We discuss how these views are utilized to understand the temporal evolution of clusters. We evaluate the proposed techniques in the cluster analysis of mixed lipid bilayers.

13.
Finding clusters in large datasets is a difficult task. Almost all computationally feasible methods are related to k-means and need a clear partition structure of the data, while most such datasets contain masking outliers and other deviations from the usual models of partitioning cluster analysis. It is possible to look for clusters informally using graphic tools like the grand tour, but the meaning and the validity of such patterns are unclear. In this paper, a three-step approach is suggested: In the first step, data visualization methods like the grand tour are used to find cluster candidate subsets of the data. In the second step, reproducible clusters are generated from them by means of fixed point clustering, a method to find a single cluster at a time based on the Mahalanobis distance. In the third step, the validity of the clusters is assessed by the use of classification plots. The approach is applied to an astronomical dataset of spectra from the Hamburg/ESO survey.
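The second step, fixed point clustering, grows one cluster at a time: points within a Mahalanobis-distance cutoff of the current subset's mean and covariance are included, the estimates are recomputed, and the loop stops when membership no longer changes. The sketch below is a simplified illustration of that loop only; the chi-square cutoff, the seed subset, and the omission of the method's outlier handling are all assumptions made for the example.

```python
import numpy as np
from scipy.stats import chi2
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=400, centers=3, cluster_std=0.8, random_state=3)

def fixed_point_cluster(X, start_idx, quantile=0.99, max_iter=100):
    """Grow one cluster from a starting subset until membership stops changing."""
    members = np.zeros(len(X), dtype=bool)
    members[start_idx] = True
    cutoff = chi2.ppf(quantile, df=X.shape[1])
    for _ in range(max_iter):
        mean = X[members].mean(axis=0)
        cov = np.cov(X[members], rowvar=False)
        inv = np.linalg.pinv(cov)
        diff = X - mean
        maha = np.einsum("ij,jk,ik->i", diff, inv, diff)   # squared Mahalanobis
        new_members = maha <= cutoff
        if np.array_equal(new_members, members):           # fixed point reached
            break
        members = new_members
    return members

# Seed: the 30 points nearest to an arbitrarily chosen data point.
start = np.argsort(np.linalg.norm(X - X[0], axis=1))[:30]
cluster = fixed_point_cluster(X, start)
print("size of the cluster found:", int(cluster.sum()))
```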

14.
Knowledge, 2006, 19(1): 77-83
Ensemble methods that train multiple learners and then combine their predictions have been shown to be very effective in supervised learning. This paper explores ensemble methods for unsupervised learning. Here, an ensemble comprises multiple clusterers, each of which is trained by the k-means algorithm with different initial points. The clusters discovered by different clusterers are aligned, i.e. similar clusters are assigned the same label, by counting their overlapping data items. Then, four methods are developed to combine the aligned clusterers. Experiments show that clustering performance can be significantly improved by ensemble methods, and that utilizing mutual information to select a subset of clusterers for weighted voting is a good choice. Since the proposed methods work by analyzing the clustering results instead of the internal mechanisms of the component clusterers, they are applicable to diverse kinds of clustering algorithms.
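A minimal sketch of the ensemble pipeline, assuming k-means as the base clusterer: several runs with different initial points, alignment of each run's labels to a reference run through the overlap matrix (a Hungarian assignment is used here for that step), and a plain majority vote. The mutual-information weighting mentioned in the abstract is left out, and all settings are illustrative.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=4)
k, n_runs = 3, 10

# Component clusterers: k-means runs with different random initial points.
runs = [KMeans(n_clusters=k, init="random", n_init=1, random_state=r).fit_predict(X)
        for r in range(n_runs)]

def align(labels, reference, k):
    """Relabel `labels` so that heavily overlapping clusters share the same id."""
    overlap = np.zeros((k, k), dtype=int)
    for i in range(k):
        for j in range(k):
            overlap[i, j] = np.sum((labels == i) & (reference == j))
    rows, cols = linear_sum_assignment(-overlap)        # maximize total overlap
    mapping = dict(zip(rows, cols))
    return np.array([mapping[lab] for lab in labels])

aligned = np.stack([align(r, runs[0], k) for r in runs])      # (n_runs, n_samples)
consensus = np.array([np.bincount(aligned[:, i], minlength=k).argmax()
                      for i in range(aligned.shape[1])])      # majority vote
print("consensus cluster sizes:", np.bincount(consensus))
```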

15.
In a bivariate data set it is easy to represent clusters, e.g. by manually circling them or separating them by lines. But many data sets have more than two variables, or they come in the form of inter-object dissimilarities. There exist methods to partition such a data set into clusters, but the resulting partition is not visual by itself. In this paper we construct a new graphical display called CLUSPLOT, in which the objects are represented as points in a bivariate plot and the clusters as ellipses of various sizes and shapes. The algorithm is implemented as an S-PLUS function. Several options are available, e.g. labelling of objects and clusters, drawing lines connecting clusters, and the use of color. We illustrate this new tool with several examples.
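The original CLUSPLOT is an S-PLUS function (a descendant, clusplot, lives in R's cluster package). A rough Python analogue, purely for illustration: project the data to two dimensions and draw one covariance ellipse per cluster. The data set, the clustering method, and the two-standard-deviation ellipse size are arbitrary choices, not the paper's algorithm.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Ellipse
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
X2 = PCA(n_components=2).fit_transform(X)          # bivariate representation

fig, ax = plt.subplots()
ax.scatter(X2[:, 0], X2[:, 1], c=labels, s=15)
for k in np.unique(labels):
    pts = X2[labels == k]
    mean = pts.mean(axis=0)
    cov = np.cov(pts, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)               # ellipse axes from covariance
    angle = np.degrees(np.arctan2(vecs[1, 1], vecs[0, 1]))
    width, height = 4 * np.sqrt(vals[1]), 4 * np.sqrt(vals[0])   # ~2 std devs
    ax.add_patch(Ellipse(mean, width, height, angle=angle,
                         fill=False, linewidth=1.5))
plt.show()
```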

16.
Categorical data clustering is a difficult and challenging task due to a special characteristic of categorical attributes: they have no natural order. This study therefore proposes a two-stage method named the partition-and-merge based fuzzy genetic clustering algorithm (PM-FGCA) for categorical data. The proposed PM-FGCA uses a fuzzy genetic clustering algorithm to partition the dataset into a maximum number of clusters in the first stage. The merge stage then selects two clusters among those generated in the first stage, based on their inter-cluster distances, and merges them into one cluster. This procedure is repeated until the number of clusters equals the predetermined number of clusters. Thereafter, some particular instances in each cluster are considered for re-assignment to other clusters based on the intra-cluster distances. The proposed PM-FGCA is implemented on ten categorical datasets from the UCI machine learning repository. In order to evaluate the clustering performance, the proposed PM-FGCA is compared with existing methods such as the k-modes algorithm, the fuzzy k-modes algorithm, the genetic fuzzy k-modes algorithm, and a non-dominated sorting genetic algorithm using fuzzy membership chromosomes. The Adjusted Rand Index (ARI), Normalized Mutual Information (NMI), and the Davies–Bouldin (DB) index are selected as three clustering validation indices, covering both external indices (ARI and NMI) and an internal index (DB). The experimental results show that the proposed PM-FGCA outperforms the benchmark methods in terms of the tested indices.
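The three validation indices used in the comparison are all available in scikit-learn, so scoring any clustering result against them takes only a few lines. The sketch below uses a synthetic labelled data set and k-means rather than the UCI categorical data and PM-FGCA from the paper.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (adjusted_rand_score,
                             normalized_mutual_info_score,
                             davies_bouldin_score)

X, y_true = make_blobs(n_samples=300, centers=4, random_state=5)
y_pred = KMeans(n_clusters=4, n_init=10, random_state=5).fit_predict(X)

# External indices compare against the known labels; higher is better.
print("ARI:", round(adjusted_rand_score(y_true, y_pred), 3))
print("NMI:", round(normalized_mutual_info_score(y_true, y_pred), 3))
# The internal index uses only the data and the partition; lower is better.
print("DB :", round(davies_bouldin_score(X, y_pred), 3))
```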

17.
This paper presents the colored farthest-neighbor graph (CFNG), a new method for finding clusters of similar objects. The method is useful because it works both for objects with coordinates and for objects without coordinates. The only requirement is that the distance between any two objects be computable. In other words, the objects must belong to a metric space. The CFNG uses graph coloring to improve on an existing technique by Rovetta and Masulli. Just as with their technique, it uses recursive partitioning to build a hierarchy of clusters. In recursive partitioning, clusters are sometimes split prematurely, and one of the contributions of this paper is a way to reduce the occurrence of such premature splits, which also arise when other partition methods are used to find clusters.

18.
This paper introduces dynamic clustering methods for partitioning symbolic interval data. These methods furnish a partition and a prototype for each cluster by optimizing an adequacy criterion that measures the fit between the clusters and their representatives. To compare symbolic interval data, these methods use single adaptive (city-block and Hausdorff) distances that change at each iteration but are the same for all clusters. Moreover, various tools for the partition and cluster interpretation of symbolic interval data furnished by these algorithms are also presented. Experiments with real and synthetic symbolic interval data sets demonstrate the usefulness of these adaptive clustering methods and the merit of the partition and cluster interpretation tools.
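For symbolic interval data, each variable of an object is an interval [a, b] rather than a single value, and a common Hausdorff-type distance between two intervals is max(|a1 - a2|, |b1 - b2|), summed over the variables; the adaptive variant in the paper additionally reweights each variable at every iteration. The sketch below computes only the plain, non-adaptive distance on invented intervals.

```python
import numpy as np

def hausdorff_interval_distance(u, v):
    """Sum over variables of max(|lower diff|, |upper diff|).

    u, v: arrays of shape (n_variables, 2) holding [lower, upper] per variable.
    """
    lower_diff = np.abs(u[:, 0] - v[:, 0])
    upper_diff = np.abs(u[:, 1] - v[:, 1])
    return float(np.sum(np.maximum(lower_diff, upper_diff)))

# Two objects described by two interval-valued variables each (values invented).
x = np.array([[1.0, 3.0], [10.0, 12.0]])
y = np.array([[2.0, 5.0], [11.0, 11.5]])
print(hausdorff_interval_distance(x, y))   # max(1, 2) + max(1, 0.5) = 3.0
```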

19.
Linkage methods are mostly used in hierarchical clustering. In this paper, we integrate the Ordered Weighted Averaging (OWA) operator with hierarchical clustering in order to compute distances between clusters. When the OWA operator is used to compute the distance between clusters, it acts as a generalization of the single-linkage, complete-linkage, and average-linkage methods. To illustrate the proposed method, we consider a phylogenetic tree constructed by hierarchical clustering of protein sequences. To illustrate the efficiency of the method, we use a 2D data set. We obtain graphs demonstrating the relationships between the clusters, and we calculate the root-mean-square standard deviation (RMSSTD) and R-squared (RS) validity indices, which are frequently used to evaluate the results of hierarchical clustering algorithms.
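The generalization is easy to see in code: collect the pairwise distances between two clusters, sort them, and take an ordered weighted average. All weight on the smallest distance gives single linkage, all weight on the largest gives complete linkage, and equal weights give average linkage; intermediate weight vectors interpolate between these. The sketch below shows only this inter-cluster distance computation, on made-up points.

```python
import numpy as np
from scipy.spatial.distance import cdist

def owa_linkage(cluster_a, cluster_b, weights):
    """OWA-aggregated distance between two clusters.

    weights must be non-negative and sum to 1; they are applied to the
    pairwise distances sorted in descending order (the usual OWA convention).
    """
    d = np.sort(cdist(cluster_a, cluster_b).ravel())[::-1]
    return float(np.dot(weights, d))

A = np.array([[0.0, 0.0], [1.0, 0.0]])
B = np.array([[4.0, 0.0], [5.0, 0.0]])
m = A.shape[0] * B.shape[0]                 # number of pairwise distances

w_single   = np.eye(m)[m - 1]               # all weight on smallest -> single linkage
w_complete = np.eye(m)[0]                    # all weight on largest  -> complete linkage
w_average  = np.full(m, 1.0 / m)             # equal weights          -> average linkage

print(owa_linkage(A, B, w_single))           # 3.0
print(owa_linkage(A, B, w_complete))         # 5.0
print(owa_linkage(A, B, w_average))          # 4.0
```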

20.
We present a new constructive solving approach for systems of 3D geometric constraints. The solver is based on the cluster rewriting approach, which can efficiently solve large systems of constraints on points and incrementally handle changes to a system, but can so far solve only a limited class of problems. The new solving approach extends the class of problems that can be solved, while retaining the advantages of the cluster rewriting approach. Whereas previous cluster rewriting solvers only determined rigid clusters, we also determine two types of non-rigid clusters, i.e. clusters with particular degrees of freedom. This allows us to solve many additional problems that cannot be decomposed into rigid clusters, without resorting to expensive algebraic solving methods. In addition to the basic ideas of the approach, an incremental solving algorithm, two methods for solution selection, and a method for mapping constraints on 3D primitives to constraints on points are presented.
