Similar literature
1.
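Grid-based density peaks clustering retains the ability of density peaks clustering to identify clusters of arbitrary shape while reducing the overall computation by gridding the data set, and it has attracted considerable attention. For large-scale data, the key to overcoming the "grid disaster" is to further distinguish dense grids from sparse ones and to reduce the number of non-empty grid representative points that take part in the computation. Observing that the probability density distribution with grid density as its variable resembles a Zipf distribution, this paper proposes a grid density peaks clustering algorithm based on the Zipf distribution. First, the densities of all non-empty grids are computed and mapped to a Zipf distribution, from which dense center grids and sparse edge grids are selected. Then density peaks clustering is applied only to the dense center grids, which determines potential cluster centers adaptively while reducing the number of Euclidean distance computations and lowering the algorithm's complexity. Finally, the sparse edge grids are processed to further refine cluster boundaries and improve clustering accuracy. Experiments on synthetic and UCI data sets show that the proposed algorithm has clear advantages on large-scale data with overlapping clusters, lowering time complexity while maintaining clustering accuracy.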
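The first step described in this abstract (computing the density of non-empty grid cells and separating dense cells from sparse ones) can be illustrated with a minimal sketch. The abstract does not state how the Zipf distribution is used to choose the split, so the fixed `min_density` threshold below is an assumption made purely for illustration, as are the function names and the sample points.

```python
from collections import Counter

def grid_densities(points, cell_size):
    """Count points per non-empty grid cell; a cell is keyed by its integer coordinates."""
    return Counter(tuple(int(c // cell_size) for c in p) for p in points)

def split_dense_sparse(densities, min_density):
    """Rank non-empty cells by density (descending) and split them at min_density.

    Assumption: a fixed threshold stands in for the Zipf-based screening,
    whose exact rule is not given in the abstract.
    """
    ranked = sorted(densities.items(), key=lambda kv: (-kv[1], kv[0]))
    dense = [cell for cell, d in ranked if d >= min_density]
    sparse = [cell for cell, d in ranked if d < min_density]
    return dense, sparse

# Hypothetical 2-D sample: two crowded cells and one isolated point.
points = [(0.1, 0.2), (0.3, 0.4), (0.2, 0.9), (2.5, 2.5),
          (5.1, 5.2), (5.3, 5.9), (5.8, 5.5)]
densities = grid_densities(points, cell_size=1.0)
dense, sparse = split_dense_sparse(densities, min_density=2)
```

Only the cells in `dense` would then be passed to density peaks clustering, which is where the reduction in pairwise distance computations comes from.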

2.
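To address the problems of the DPC (Clustering by fast search and find of Density Peaks) algorithm, namely its dependence on the cutoff distance, its high computational complexity, and the need to select cluster centers manually, a clustering algorithm with automatic cluster center identification based on residuals and density grids is proposed. Data objects are mapped onto a grid and the grid objects are used as the clustering objects, with grid objects that carry no information removed. The density and distance values of the grid objects are computed in a specific way; residual analysis then identifies the grid objects that contain cluster centers; and grid edge points and noise points are handled using the distance to non-edge points and a self-adjusting threshold. Simulation experiments show that the proposed algorithm achieves higher clustering accuracy and lower time complexity than several other clustering algorithms.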

3.
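The quantum-behaved QPSO algorithm, proposed on the basis of the PSO algorithm, is applied to gene expression data sets. The QPSO gene clustering algorithm assigns N genes to K user-specified clusters so as to minimize the TWCV (Total Within-Cluster Variation) function. Drawing on the strengths of K-means, the particle swarm is re-initialized with the K-means clustering result, and the KQPSO and KPSO algorithms are proposed by combining this with the QPSO and PSO clustering algorithms. A comparison of the results of five clustering algorithms (K-means, PSO, QPSO, KPSO, and KQPSO) on four experimental data sets shows that QPSO performs well in gene expression data analysis.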
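The TWCV objective named in this abstract is not defined there. A minimal sketch, assuming the common definition (the sum of squared Euclidean distances from each point to the centroid of its assigned cluster), is given below; the function name `twcv` and the sample data are illustrative only.

```python
def twcv(points, labels, k):
    """Total within-cluster variation: squared distance of each point to its cluster centroid.

    Assumption: this is the usual definition of TWCV; the abstract itself gives no formula.
    """
    dims = len(points[0])
    sums = [[0.0] * dims for _ in range(k)]
    counts = [0] * k
    for p, c in zip(points, labels):
        counts[c] += 1
        for d in range(dims):
            sums[c][d] += p[d]
    # Centroid of each non-empty cluster; empty clusters contribute nothing.
    centroids = [[s / counts[c] for s in sums[c]] if counts[c] else None
                 for c in range(k)]
    return sum((p[d] - centroids[c][d]) ** 2
               for p, c in zip(points, labels)
               for d in range(dims))

# Hypothetical data: two well-separated groups of two points each.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 4.0)]
good = twcv(points, [0, 0, 1, 1], k=2)   # groups match the geometry
bad = twcv(points, [0, 1, 0, 1], k=2)    # groups cut across the geometry
```

A swarm-based optimizer such as PSO or QPSO would treat a candidate assignment (or set of centroids) as a particle and use a function like this as the fitness to minimize.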

4.
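An algorithm for enhancing cluster edge accuracy based on grid density direction   Total citations: 1, self-citations: 0, citations by others: 1
Existing grid-based clustering algorithms achieve high efficiency at the cost of clustering quality, especially when clusters lie close to one another, where inaccurate clustering at cluster edges is particularly pronounced. To solve this problem, a clustering preprocessing method based on grid density direction is proposed. The idea comes from Newton's law of universal gravitation: the smaller the distance between objects and the larger their mass, the greater the attraction. The density inside a cluster is greater than at its edge, so the attraction there is greater. Hence, if the density around a grid cell increases in opposite directions at the same time, that is, the cell is being squeezed, the cell needs to be subdivided further to determine whether it is an edge cell of a cluster and to determine accurately the squeeze direction of the objects in the edge cell. Experiments show that the algorithm effectively improves the accuracy of cluster edges and has a high cluster recognition rate, which makes it well suited as a preprocessing step for clustering.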

5.
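The GDF-CUStreams algorithm is proposed to remedy the shortcomings of the EMicro algorithm. It stores the distribution characteristics of the data in grid feature vectors, clusters uncertain data streams by updating the grid feature vectors and merging grids into clusters, and handles newly arriving data points by incremental clustering. Grid density and the distance between grid centroids are used to decide whether a grid is sporadic; grid attraction is used to optimize cluster boundaries; and sporadic grids are detected and deleted, which makes cluster edges smoother and improves clustering accuracy. Both grid density and grid centroids are updated incrementally. Experimental results show that GDF-CUStreams is more efficient than EMicro and produces good results.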

6.
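Existing clustering algorithms ignore the influence of the surrounding space when computing grid density, which leads to unsmooth cluster boundaries. To address this, a data stream clustering algorithm based on extended grids and density is proposed. By determining the grid extension region dynamically, the range of the grid density computation is reasonably extended from the grid itself to the neighboring grid space, and the cohesion degree introduced in the algorithm measures the influence of data points in the surrounding space on the grid density. To describe the contour of cluster edges more precisely, a boundary point distance threshold function separates cluster boundary points from noise. An improved grid merging method is also given, which simplifies the condition for merging grid clusters according to inter-cluster connectivity and effectively reduces execution time. Experimental results show that the algorithm achieves high clustering quality and efficiency.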

7.
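The concept of a grid density influence factor is proposed. It accounts for the combined influence of neighboring grids through weighting and represents the relative density of the current grid well. It is then used to identify high-density grid cells in clusters of different densities, and expansion proceeds from the high-density cells until a cluster skeleton is formed. Boundary points in edge grids are identified and extracted, which improves the accuracy of grid clustering. Experiments verify that the new algorithm can cluster clusters of different sizes and shapes, can handle data sets composed of classes with multiple densities, and can capture cluster boundary points, giving good clustering results.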

8.
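Mi Yuan, Yang Yan, Li Tianrui. Computer Science (《计算机科学》), 2011, 38(12): 178-181
An improved algorithm, NDD-Stream, based on the D-Stream algorithm is proposed to remedy defects in density-grid-based data stream clustering algorithms. The algorithm determines the density threshold of grid cells dynamically from statistics on the density of grid cells and the number of clusters, and partitions grid cells on cluster boundaries non-uniformly to improve clustering accuracy at the boundaries. Experimental results on synthetic and real data sets show that the algorithm achieves good clustering quality on data stream objects.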

9.
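Grid clustering algorithms   Total citations: 3, self-citations: 0, citations by others: 3
Cluster analysis is widely used and is a very important method in data mining. Cluster analysis algorithms fall into many categories, each playing a different role in different fields. With the aim of studying grid clustering algorithms, this paper introduces the requirements for cluster analysis algorithms and the common clustering algorithms, then focuses on grid-based clustering algorithms, comparing and analyzing traditional and improved ones. Each grid clustering algorithm presented has its own strengths and weaknesses. Studying these algorithms supports deeper research on grid clustering and helps in combining it with practical problems to design better algorithms.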

10.
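The CFSFDP algorithm (clustering by fast search and find of density peaks) requires cluster centers to be selected manually from the decision graph, and its final split into cluster core and cluster halo assigns some edge regions of a cluster to the halo, so the partition is not entirely reasonable. To address these problems, a clustering algorithm with automatic cluster center selection and an optimized core/halo split is proposed. Borrowing the idea of anomaly detection, it finds the outliers among the cluster center weights and takes them as the centers of the clusters; it also introduces an intra-cluster local density to split cluster core and halo more reasonably. Experimental comparison shows that the proposed algorithm is more automated than CFSFDP and produces more accurate clustering results.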
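The idea of choosing centers automatically as outliers of a center weight can be sketched as follows. Several details are assumptions, since the abstract does not give them: the weight is taken to be the product of local density and the distance to the nearest higher-density point (the two quantities plotted in the density peaks decision graph), and "outlier" is taken to mean a value more than `n_sigma` standard deviations above the mean. The function names and sample points are hypothetical.

```python
import math
from statistics import mean, pstdev

def density_peaks(points, d_c):
    """Return (rho, delta): cutoff-kernel local density and distance to the nearest denser point."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c) for i in range(n)]
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        # Points with no denser point take their largest distance, by convention.
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta

def pick_centers(rho, delta, n_sigma=1.5):
    """Flag points whose weight rho * delta is an outlier (assumed rule: mean + n_sigma * std)."""
    gamma = [r * d for r, d in zip(rho, delta)]
    threshold = mean(gamma) + n_sigma * pstdev(gamma)
    return [i for i, g in enumerate(gamma) if g > threshold]

# Hypothetical data: two cross-shaped groups, each with one central point.
points = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1),
          (10, 10), (11, 10), (9, 10), (10, 11), (10, 9)]
rho, delta = density_peaks(points, d_c=1.2)
centers = pick_centers(rho, delta)
```

With this sample the two central points are the only ones whose weight stands out, so no manual inspection of a decision graph is needed; on real data the outlier rule and `d_c` would both need tuning.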

11.
The self-organizing map (SOM) is a powerful method for visualization, cluster extraction, and data mining. It has been used successfully for data of high dimensionality and complexity where traditional methods may often be insufficient. In order to analyze data structure and capture cluster boundaries from the SOM, one common approach is to represent the SOM's knowledge by visualization methods. Different aspects of the information learned by the SOM are presented by existing methods, but data topology, which is present in the SOM's knowledge, is greatly underutilized. We show in this paper that data topology can be integrated into the visualization of the SOM and thereby provide a more elaborate view of the cluster structure than existing schemes. We achieve this by introducing a weighted Delaunay triangulation (a connectivity matrix) and draping it over the SOM. This new visualization, CONNvis, also shows both forward and backward topology violations along with the severity of forward ones, which indicate the quality of the SOM learning and the data complexity. CONNvis greatly assists in detailed identification of cluster boundaries. We demonstrate the capabilities on synthetic data sets and on a real 8D remote sensing spectral image.

12.
Multi-cluster environments are composed of multiple clusters of computers that act collaboratively, thus allowing computational problems to be treated that require more resources than those available in a single cluster. However, the complexity of the scheduling process is greatly increased by the heterogeneity of resources and by the co-allocation process, which distributes the tasks of parallel jobs across cluster boundaries.

13.
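Cluster boundary detection technique for categorical data   Total citations: 1, self-citations: 0, citations by others: 1
Qiu Baozhi, Wang Bo. Journal of Computer Applications (《计算机应用》), 2012, 32(6): 1654-1656
As data sets with categorical attributes are used more and more widely, the need to obtain cluster boundaries in such data sets is becoming more pressing. To obtain cluster boundaries, a boundary detection algorithm for data with categorical attributes, CBORDER, is proposed on the basis of definitions of the boundary degree and cluster boundary for categorical data. The algorithm first partitions the classes using randomly assigned initial cluster centers and the boundary degree, obtaining evidence that records boundary points; it then applies the idea of evidence accumulation, running this process multiple times to obtain the cluster boundaries. Experimental results show that CBORDER can effectively detect cluster boundaries in high-dimensional data sets with categorical attributes.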

14.
In the above paper by Zuberek, conflict-free Petri nets with deterministic timing are used for modeling cluster tools. Performance analysis of the models is based on P-invariants. This comment tries to clarify some statements in the paper and points out a more efficient performance-analysis approach based on linear programming, which is of polynomial time complexity.

15.
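Most common trajectory clustering methods are based on algorithms such as OPTICS, DBSCAN, and K-means, but their time complexity rises sharply as the number of trajectories grows. To address this problem, a trajectory clustering algorithm based on density cores is proposed. By introducing the concept of a density core, a trajectory density function is designed to obtain the dense core trajectories of clusters, while attributes of taxi passenger trajectories such as direction and speed are used to extract trajectory feature points and reduce the volume of trajectory data. On this basis, the degree of match between trajectories is judged by the similarity distance between the dense core trajectories of a cluster and the trajectories being clustered; similar trajectories are then aggregated and the clustering results are stored in cluster nodes. Experimental results show that, compared with the TRACLUS and OPTICS clustering algorithms, the algorithm obtains more accurate clustering with better time efficiency.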

16.
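Zhao Xiaoqiang, Cui Yanpeng, Guo Zheng, Liu Min, Li Xiong, Wen Qin. Journal of Software (《软件学报》), 2022, 33(2): 622-640
As one of the key technologies of wireless sensor networks (WSNs), clustering routing protocols have gradually become a research focus among WSN routing protocols because of advantages such as strong scalability and low energy consumption. Selecting cluster heads optimally is the key to improving the performance of clustering routing protocols. By revealing the mapping between the number of cluster heads and network energy consumption in different scenarios, and with energy minimization as the objective, the paper constructs the cluster head ...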

17.
Information and Computation, 2007, 205(8): 1274-1293
Though complexity theory already extensively studies path-cardinality-based restrictions on the power of nondeterminism, this paper is motivated by a more recent goal: to gain insight into how much of a restriction it is of nondeterminism to limit machines to have just one contiguous (with respect to some simple order) interval of accepting paths. In particular, we study the robustness (the invariance under definition changes) of the cluster class CL#P. This class contains each #P function that is computed by a balanced Turing machine whose accepting paths always form a cluster with respect to some length-respecting total order with efficient adjacency checks. The definition of CL#P is heavily influenced by the defining paper's focus on (global) orders. In contrast, we define a cluster class, CLU#P, to capture what seems to us a more natural model of cluster computing. We prove that the naturalness is costless: CL#P = CLU#P. Then we exploit the more natural, flexible features of CLU#P to prove new robustness results for CL#P and to expand what is known about the closure properties of CL#P. The complexity of recognizing edges (of an ordered collection of computation paths or of a cluster of accepting computation paths) is central to this study. Most particularly, our proofs exploit the power of unique discovery of edges: the ability of nondeterministic functions to, in certain settings, discover on exactly one (in some cases, on at most one) computation path a critical piece of information regarding edges of orderings or clusters.

18.
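To address alarm flooding and the complexity of fault handling in networks, a new alarm correlation clustering algorithm combining cellular learning automata (CLA) and the ID3 decision tree is proposed. The CLA algorithm uses learning automata to cluster alarm signals; if any subgroup or interleaving appears within a cluster, the ID3 decision tree learning algorithm is trained on that cluster by splitting the data samples to optimize the decision boundary, which greatly reduces the number of clustered alarms and locates the root-cause alarms. Simulations show that the algorithm can analyze large numbers of alarm signals effectively and identify root-cause alarms fairly accurately.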

19.
A Novel Density-Based Clustering Framework by Using Level Set Method   Total citations: 1, self-citations: 0, citations by others: 1
In this paper, a new density-based clustering framework is proposed by adopting the assumption that the cluster centers in data space can be regarded as target objects in image space. First, the level set evolution is adopted to find an approximation of cluster centers by using a new initial boundary formation scheme. Accordingly, three types of initial boundaries are defined so that each of them can evolve to approach the cluster centers in different ways. To avoid the long iteration time of level set evolution in data space, an efficient termination criterion is presented to stop the evolution process when no more cluster centers can be found. Then, a new effective density representation called level set density (LSD) is constructed from the evolution results. Finally, valley seeking clustering is used to group data points into corresponding clusters based on the LSD. Experiments on some synthetic and real data sets have demonstrated the efficiency and effectiveness of the proposed clustering framework. Comparisons with the DBSCAN, OPTICS, and valley seeking clustering methods further show that the proposed framework can successfully avoid the overfitting phenomenon and resolve the confusion between cluster boundary points and outliers.

20.
Support vector clustering (SVC) is an important boundary-based clustering algorithm in multiple applications because of its capability of handling arbitrary cluster shapes. However, SVC's popularity is limited by its high time complexity and poor labeling performance. To overcome these problems, we present a novel, efficient, and robust convex decomposition based cluster labeling (CDCL) method based on the topological properties of the dataset. CDCL decomposes each implicit cluster into convex hulls, each of which comprises a subset of support vectors (SVs). Using a robust algorithm applied to the nearest neighboring convex hulls, the adjacency matrix of the convex hulls is built for finding the connected components, and each remaining data point is assigned the label of its nearest convex hull. The validity of the approach is guaranteed by geometric proofs. Time complexity analysis and comparative experiments suggest that CDCL significantly improves both efficiency and clustering quality.
