Similar literature
 20 similar documents found (search time: 203 ms)
1.
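Adaptive affinity propagation clustering   Total citations: 42, self-citations: 4, citations by others: 42
Wang Kaijun (王开军), Zhang Junying (张军英), Li Dan (李丹), Zhang Xinna (张新娜), Guo Tao (郭涛). Acta Automatica Sinica (《自动化学报》), 2007, 33(12): 1242-1246
Affinity propagation clustering, which is well suited to data with a large number of clusters, has two unresolved problems: it is hard to determine which value of the preference parameter yields the optimal clustering result, and once oscillations occur the algorithm cannot eliminate them automatically and converge. To solve both problems, an adaptive affinity propagation clustering method is proposed. Its techniques include adaptively scanning the preference space to search the space of cluster numbers for the optimal clustering result, adaptively adjusting the damping factor to eliminate oscillations, and an adaptive escape technique for cases where adjusting the damping factor fails. Compared with the original algorithm, adaptive affinity propagation performs better: it eliminates oscillations automatically and finds the optimal clustering result. Experiments on simulated and real datasets show that the method is very effective, with clustering quality better than or no worse than that of the original algorithm.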
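
The two parameters this abstract discusses, the preference and the damping factor, are easiest to see in the standard affinity propagation message-passing loop. The following is a minimal sketch of that standard loop only, not of the adaptive method; the function name `affinity_propagation`, the toy data, and the chosen preference of -50 are assumptions made for illustration.

```python
def affinity_propagation(S, preference, damping=0.5, iters=200):
    """Standard affinity propagation on a similarity matrix S (list of lists).

    preference: value placed on the diagonal of S; larger values give more clusters.
    damping:    weight of the old message in each update; higher values damp
                oscillations at the cost of slower convergence.
    Returns, for every point i, the index of the exemplar it selects.
    Requires at least two points.
    """
    n = len(S)
    # Work on a copy of S with the preference on the diagonal.
    S = [[preference if i == k else S[i][k] for k in range(n)] for i in range(n)]
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iters):
        # r(i,k) <- s(i,k) - max_{k' != k} [a(i,k') + s(i,k')], damped
        for i in range(n):
            v = [A[i][k] + S[i][k] for k in range(n)]
            for k in range(n):
                best_other = max(v[j] for j in range(n) if j != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - best_other)
        # a(i,k) <- min(0, r(k,k) + sum of positive r(i',k), i' not in {i,k}), damped
        # a(k,k) <- sum of positive r(i',k), i' != k
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]
            for i in range(n):
                new = total if i == k else min(0.0, R[k][k] + total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    return [max(range(n), key=lambda k: A[i][k] + R[i][k]) for i in range(n)]


# Toy data (hypothetical): two well-separated groups on a line.
points = [0.0, 1.0, 2.5, 20.0, 21.0, 23.0]
S = [[-(a - b) ** 2 for b in points] for a in points]
labels = affinity_propagation(S, preference=-50.0, damping=0.7)
print(labels)
```

With this toy data each point selects an exemplar from its own group. Raising `preference` toward zero tends to produce more exemplars, and lowering `damping` makes oscillations more likely, which is the behaviour the adaptive method sets out to control.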

2.
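A fast algorithm for affinity propagation clustering under a given number of clusters   Total citations: 1, self-citations: 0, citations by others: 1
Affinity propagation clustering, proposed in the journal Science, is inefficient when it must produce a clustering with a specified number of clusters. To address this, a fast algorithm based on a multi-grid strategy is proposed. The algorithm uses multi-grid search to reduce the number of calls to affinity propagation and tightens the upper bound of the preference parameter to narrow the search range. The new method substantially speeds up affinity propagation clustering under a given number of clusters. Experiments show that it is very effective, reducing running time by 22%-90% compared with existing methods.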

3.
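Affinity propagation clustering, proposed in the journal Science, is inefficient when it must produce a clustering with a specified number of clusters. To address this, a fast algorithm based on a multi-grid strategy is proposed. The algorithm uses multi-grid search to reduce the number of calls to affinity propagation and tightens the upper bound of the preference parameter to narrow the search range. The new method substantially speeds up affinity propagation clustering under a given number of clusters. Experiments show that it is very effective, reducing running time by 22%-90% compared with existing methods.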

4.
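Affinity propagation is a fast and effective clustering method, but the instability of its results degrades clustering performance. To address this, a nearest-neighbor-based affinity propagation algorithm (AP-NN) is proposed: affinity propagation generates the initial clusters, representative clusters are selected from them, and the samples of the non-representative clusters are then assigned by nearest-neighbor clustering. Experiments on time series datasets show that AP-NN produces good clustering results and is suitable for cluster analysis.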

5.
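The affinity propagation (AP) algorithm lacks an index for identifying the optimal clustering result and does not converge well enough. To address these problems, an improved AP algorithm based on histograms of oriented gradients (HOG) is proposed. HOG feature vectors are first extracted from the images; a contraction factor is then introduced to accelerate the convergence of affinity propagation; finally, a validity index is embedded in the iteration to supervise and guide the algorithm toward the best clustering quality. Experiments on face images show that the HOG-based improved AP algorithm yields a number of clusters closer to the correct one, raises the FM value, and lowers the error rate.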

6.
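Handwritten character recognition involves many similar characters and large, irregular writing deformations of the same character. To address this, an improved affinity propagation clustering algorithm is added to the recognition process. The algorithm builds on the original affinity propagation (AP) clustering algorithm and combines it with the Silhouette cluster validity function: during the AP iterations the preference parameter is changed adaptively to adjust the number of clusters, and the quality of each clustering is used to select the optimal result. Experiments on handwritten Chinese character recognition show that adding the original AP algorithm raises the overall recognition rate by 1.52% over the traditional recognition process, and adding the improved AP algorithm raises it by a further 1.28% over the original AP algorithm. The results confirm the effectiveness of adding a clustering algorithm to handwritten character recognition, and the improved AP algorithm shows some gains over the original in both convergence and clustering quality.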
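
The Silhouette index named in this abstract has a standard definition that can be computed directly, which is what makes it usable for comparing clusterings obtained at different preference values. The sketch below implements only that standard definition; the function name `silhouette_score`, the singleton convention, and the toy data are assumptions for illustration and are not taken from the paper.

```python
def silhouette_score(points, labels, dist):
    """Mean silhouette coefficient over all points.

    For point i: a = mean distance to the other members of its own cluster,
    b = smallest mean distance to the members of any other cluster,
    s(i) = (b - a) / max(a, b). Points in singleton clusters score 0.
    """
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")
    total = 0.0
    for i, lab_i in enumerate(labels):
        own = clusters[lab_i]
        if len(own) == 1:
            continue  # singleton cluster: s(i) = 0 by convention
        a = sum(dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for lab, members in clusters.items()
            if lab != lab_i
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


# Toy data (hypothetical): the same four points under two candidate labelings.
data = [0.0, 1.0, 10.0, 11.0]
abs_diff = lambda x, y: abs(x - y)
good = silhouette_score(data, [0, 0, 1, 1], abs_diff)
bad = silhouette_score(data, [0, 1, 0, 1], abs_diff)
print(good > bad)  # True
```

A selection loop of the kind the abstract describes would run the clustering at several preference values and keep the labeling with the highest score.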

7.
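Zhao Jian (赵健), Tang Jie (唐洁), Xie Yu (谢瑜). Application Research of Computers (《计算机应用研究》), 2012, 29(10): 3980-3982
In recent years, partition-based clustering algorithms have been widely used for data and image clustering. The most widely used of them, k-means, is slow and gives poor results in image clustering, so an affinity propagation algorithm is applied to image clustering instead. Color, shape, and texture feature vectors are extracted from the images, affinity propagation clusters the combined feature vector model, and affinity propagation and k-means are then compared on clustering MIT images. Simulation experiments show that affinity propagation outperforms k-means in both speed and clustering quality and achieves good accuracy and real-time performance.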

8.
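Automatic, accurate image segmentation is the key to automatic analysis of melanocytic tumor images. For dermoscopic images of melanocytic tumors, an adaptive clustering segmentation algorithm is proposed that combines an improved genetic algorithm with a self-generating neural network (SGNN). A genetic algorithm first selects an optimal set of seed samples as the initial neural trees; SGNN then trains on the remaining samples to obtain a self-generating neural forest; finally, each tree in the forest represents one class, which completes the adaptive clustering segmentation of the image. The algorithm removes SGNN's sensitivity to the order of the training samples and determines the number of classes adaptively, with no manual intervention in the clustering process. In addition, the initial population size of the genetic algorithm is set according to the size of the solution space, and the population size, crossover rate, mutation rate, and other genetic control parameters are adjusted dynamically during evolution according to changes in the individuals, which effectively improves running speed. Experiments show that the algorithm is stable and that its clustering results meet the diagnostic requirements of human visual judgment.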

9.
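Traditional partitional clustering algorithms require the number of clusters to be specified, and their results depend heavily on the initial conditions. To overcome this, a hybrid dynamic clustering algorithm based on PSO and K-means, DKPSO, is proposed; it determines the best number of clusters automatically at run time. The algorithm initially partitions the data into a relatively large number of clusters to reduce the influence of the initial conditions, then uses a discrete PSO algorithm to optimize the number of clusters and K-means to further refine the cluster centers represented by each particle. To speed up convergence, the algorithm is improved so that each particle's inertia weight is adjusted nonlinearly and adaptively with the iteration count. Finally, experiments verify the effectiveness of the algorithm, and the results are reported.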

10.
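Automatically determining the number of clusters is a pressing problem in spectral clustering. To obtain the number of clusters, an adaptive spectral clustering algorithm based on artificial immunity is proposed. By simulating the clonal selection mechanism of antibodies and the primary and secondary immune responses of the immune system, the algorithm adjusts the number of clusters automatically, removing the need to enter it manually. Linear simulated data, non-convex simulated data, and UCI datasets are used to verify, respectively, the feasibility of the algorithm, its advantage on non-convex datasets, and its effectiveness. Experiments show that the algorithm obtains the correct number of clusters automatically, improves clustering quality, reduces the number of iterations needed to reach the global optimum, and is highly stable.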

11.
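He Hongzhou (何红洲), Zhou Mingtian (周明天). Computer Engineering (《计算机工程》), 2013, (12): 181-185, 190
Existing affinity propagation clustering algorithms do not reflect well the intrinsic cluster structure of complex protein sequences. A protein classification method based on a Huffman criterion is therefore proposed. After a generalized substitution-based matching similarity is computed, the existing adaptive affinity propagation algorithm clusters the protein sequences. Huffman coding is applied with a constraint on the average code length, so that the clustering result reflects the cluster structure of protein sequence families. Experiments on 6 datasets from a protein homology clustering database and a protein structure classification database show that, compared with adAP, spectral clustering, SMS, and TribeMCL, the method obtains a number of clusters closer to the number of families in each dataset and a more compact cluster structure, and its average F-measure is higher by 19.67%, 8.7%, 9.5%, and 43.51%, respectively.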

12.
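Speaker clustering is an important step in speaker diarization. However, conventional hierarchical clustering that uses the Bayesian information criterion (BIC) as the distance measure lets clustering errors propagate upward. This paper proposes a stage-wise algorithmic enhancement mechanism. When the minimum BIC distance between segments exceeds a set threshold, or when the number of clusters reaches a certain level, the current clustering result is taken as the initial cluster centers and the segments in each cluster are re-tuned by variational Bayes iterations; the number of speakers is then determined by a threshold on probabilistic linear discriminant analysis scores. Experiments on the NIST 08 summed test set show that the method improves both "cluster purity" and "speaker purity" over the conventional algorithm and improves overall diarization performance by a relative 27.6%.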

13.
Spectral clustering is one of the most popular and important clustering methods in pattern recognition, machine learning, and data mining. However, its high computational complexity limits its use in applications involving truly large-scale datasets. For a clustering problem with n samples, it needs to compute the eigenvectors of the graph Laplacian, which takes O(n^3) time. To address this problem, we propose a novel method called anchor-based spectral clustering (ASC) that employs anchor points of the data. Specifically, m (m ≪ n) anchor points are selected from the dataset so that they essentially maintain the intrinsic (manifold) structure of the original data. A mapping matrix between the original data and the anchors is then constructed. More importantly, it is proved that this data-anchor mapping matrix essentially preserves the clustering structure of the data. Based on this mapping matrix, it is easy to approximate the spectral embedding of the original data. The proposed method scales linearly with the size of the data, with low degradation of clustering performance. ASC is compared with classical spectral clustering and two state-of-the-art accelerated methods, power iteration clustering and landmark-based spectral clustering, on 10 real-world applications under three evaluation metrics. Experimental results show that ASC is consistently faster than classical spectral clustering with comparable clustering performance, and is at least comparable with or better than the state-of-the-art methods in both effectiveness and efficiency.

14.
In this work, a unified treatment of solid and fluid vibration problems is developed by means of the Finite-Difference Time-Domain (FDTD) method. The proposed scheme takes advantage of a scaling factor in the velocity fields that improves the performance of the method and the vibration analysis in heterogeneous media. Moreover, the scheme has been extended to simulate both propagation in porous media and lossy solid materials. To accurately reproduce the interaction of fluids and solids in FDTD, both the time and the spatial resolution must be finer than in the setup used for acoustic FDTD problems. This implies the use of bigger grids and hence more time and memory resources. To reduce simulation time, the FDTD code has been adapted to exploit the resources available in modern parallel architectures. For CPUs, the implicit usage of the Advanced Vector Extensions (AVX) in multi-core CPUs has been considered. In addition, the computation has been distributed across the available cores by means of OpenMP directives. Graphics Processing Units have also been considered, and the degree of improvement achieved with this parallel architecture has been compared with the highly tuned CPU scheme in terms of relative speedup. The parallel versions implemented were up to 3 (AVX and OpenMP) and 40 (CUDA) times faster than the best sequential CPU version, which also uses OpenMP with auto-vectorization techniques but does not implicitly include vector instructions. Results obtained with both parallel approaches demonstrate that massively parallel programming techniques are mandatory in solid-vibration problems with FDTD.

15.
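To solve the "chained data" problem in data stream clustering and the high dimensionality, sparsity, and multiple topics of text data streams, the centroid, radius, and discriminant distance of a cluster are redefined on the basis of the Squeezer clustering algorithm. An improved algorithm is proposed that adds a data preprocessing step to raise clustering accuracy and uses projected clustering to raise clustering efficiency and to give clusters semantics. Clustering experiments on an Internet news corpus show that the proposed algorithm trades a small loss in speed for a large gain in clustering quality and significantly outperforms the Squeezer algorithm.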

16.
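Objective: As urban traffic congestion worsens, an effective road congestion visualization system plays an important role in building smart cities. Current approaches based on vehicle density analysis, speed thresholds, or travel-time thresholds each rely on a single mode and have low credibility; to address this, a road congestion identification and visualization method based on DBSCAN+ (density-based spatial clustering of applications with noise plus) is proposed. Method: Block-wise parallel computation is introduced; compared with traditional density algorithms, the method scales to large trajectory datasets and performs parallel dimension-reduced clustering quickly. For the slow-traffic clusters in the result, the start and end points of the road segment are identified; through curve fitting and a topological-network correction algorithm, the road segments represented by the trajectory sample points in each cluster are matched onto an electronic map by a map-matching algorithm; the degree of congestion is judged from the average speed of the floating cars in each cluster and visualized by color depth. Results: Experiments show that DBSCAN+ has a time-complexity advantage over existing improved DBSCAN algorithms, dropping from exponential to linear, and can handle massive numbers of trajectory points. Compared with mainstream map products, when on-board diagnostics (OBD) data from city taxis are used to identify urban road congestion, the total detected length of non-free-flowing segments is 28.9% higher than that of the best product, and the congestion identification hit rate reaches 91%, which is 15% higher than the average urban hit rate of mainstream products. Conclusion: In urban road networks, the multi-representation congestion identification algorithm based on DBSCAN+ density clustering and the average moving speed in slow-traffic zones is more representative than mainstream map products in congestion identification rate and in grading traffic flow, is more credible, and can support real-time road congestion identification.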
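
For readers unfamiliar with the baseline that DBSCAN+ extends, the following is a minimal sketch of classic DBSCAN with a naive quadratic neighbor search. It does not include the block-wise parallel computation, map matching, or speed-based grading described in the abstract; the function name `dbscan` and the sample coordinates are assumptions for illustration.

```python
import math


def dbscan(points, eps, min_pts):
    """Classic DBSCAN. Returns one label per point; -1 marks noise.

    A point is a core point if at least min_pts points (itself included)
    lie within distance eps of it. Clusters grow outward from core points.
    """
    NOISE, UNSEEN = -1, None

    def neighbors(i):
        # Naive O(n) scan; real trajectory data would need a spatial index.
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= eps]

    labels = [UNSEEN] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNSEEN:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNSEEN:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is a core point: keep expanding
    return labels


# Toy data (hypothetical): two dense groups of positions and one outlier.
samples = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
result = dbscan(samples, eps=1.5, min_pts=3)
print(result)  # [0, 0, 0, 1, 1, 1, -1]
```

In a congestion setting of the kind the abstract describes, dense clusters of slow-moving trajectory points would be the candidates for slow-traffic zones, and isolated points would fall out as noise.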

17.
In cluster analysis, determining the number of clusters is an important issue because information about the most appropriate number of clusters does not exist in real-world problems. Automatic clustering is a clustering approach that can automatically find the most suitable number of clusters and divide the instances into the corresponding clusters. This study proposes a novel automatic clustering algorithm that uses a hybrid of an improved artificial bee colony optimization algorithm and the K-means algorithm (iABC). The proposed iABC algorithm improves the onlooker bee exploration scheme by directing the bees' movements to a better location. Instead of using a random neighborhood location, the improved onlooker bee considers the data centroid to find a better initial centroid for the K-means algorithm. To increase the efficiency of the improvement, the updating process is applied only to the worst cluster centroid. The proposed iABC algorithm is verified on several benchmark datasets. The computational results indicate that it outperforms the original ABC algorithm on the automatic clustering problem. Furthermore, the iABC algorithm is used to solve a customer segmentation problem. The result shows that iABC gives better and more stable results than the original ABC algorithm.

18.
Clustering a large volume of data in a distributed environment is a challenging issue. Data stored across multiple machines are huge in size, and the solution space is large. A genetic algorithm deals effectively with a larger solution space and provides a better solution. In this paper, we propose a novel clustering algorithm for distributed datasets that combines a genetic algorithm (GA) using Mahalanobis distance with the k-means clustering algorithm. The proposed algorithm has two phases. In phase 1, GA is applied in parallel to data chunks located on different machines. Mahalanobis distance is used as the fitness value in GA; it takes the covariance between data points into account and thus provides a better representation of the initial data. In phase 2, K-means with K-means++ initialization is applied to the intermediate output to obtain the final result. The proposed algorithm is implemented on the Hadoop framework, which is inherently designed to deal with distributed datasets in a fault-tolerant manner. Extensive experiments were conducted on multiple real-life and synthetic datasets to measure the performance of the proposed algorithm. Results were compared with the MapReduce-based algorithms mrk-means, parallel k-means, and scaling GA.
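
The K-means++ initialization used in phase 2 can be illustrated with a short sketch of the standard seeding rule: the first center is drawn uniformly, and each later center is drawn with probability proportional to its squared distance from the nearest center chosen so far. The function name `kmeans_pp_init`, the fixed seed, and the toy data are assumptions for illustration; nothing here reproduces the paper's Hadoop implementation.

```python
import random


def kmeans_pp_init(points, k, rng=None):
    """Standard k-means++ seeding; returns k initial centers taken from points."""
    rng = rng or random.Random(0)  # fixed seed only to make the sketch repeatable

    def sq_dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    centers = [rng.choice(points)]
    while len(centers) < k:
        # Squared distance from each point to its nearest chosen center.
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        if sum(d2) == 0:
            # Every point coincides with a center; fall back to a uniform draw.
            centers.append(rng.choice(points))
            continue
        # Sample the next center with probability proportional to d2.
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers


# Toy data (hypothetical): two groups of 2-D points.
pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
seeds = kmeans_pp_init(pts, k=2)
print(seeds)
```

Because a point that is already a center has weight zero, it cannot be drawn again while other candidates remain, so the seeds tend to spread across the data before Lloyd's iterations start.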

19.
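Spectral clustering is a clustering method based on spectral graph partitioning theory and works well on high-dimensional and non-convex data distributions. For large-scale data, however, it faces bottlenecks in computing time and storage space. This paper presents an adaptive parallel spectral clustering algorithm. Local computation combined with asynchronous cyclic communication minimizes the number of data exchanges in parallel spectral clustering, and a strategy of overlapping computation with communication further lowers the communication overhead of the parallel algorithm. In the implementation, PLOBPCG, an in-house parallel solver for the optimal preconditioned conjugate gradient method, is used for the eigen-based dimensionality reduction in spectral clustering. Tests on two classes of large-scale data on the "Yuan" (元) supercomputer of the Chinese Academy of Sciences show near-linear speedup on 2048 cores, with parallel efficiency above 96%.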

20.
An algorithm for optimizing data clustering in feature space is studied in this work. Using the graph Laplacian and the extreme learning machine (ELM) mapping technique, we develop an optimal weight matrix W for feature mapping. This work explicitly maps the original data for clustering into an optimal feature space, which further increases the separability of the original data in the feature space while keeping the pattern points of the same cluster closely grouped. Our method, which is easy to implement, obtains better clustering results than several popular clustering algorithms, such as k-means on the original data, the kernel clustering method, the spectral clustering method, and ELM k-means, on datasets that include three UCI real-data benchmarks (the IRIS data, the Wisconsin breast cancer database, and the Wine database).
