20 similar documents found (search time: 19 ms)
1.
Machine Learning - Anomaly detection is a hard data analysis process that requires constant creation and improvement of data analysis algorithms. Using traditional clustering algorithms to analyse...
2.
We propose a new algorithm to cluster multiple and parallel data streams using spectral component similarity analysis, a new similarity metric. This new algorithm can effectively cluster data streams that show similar behaviour to each other but with unknown time delays. The algorithm performs auto-regressive modelling to measure the lag correlation between the data streams and uses it as the distance metric for clustering. The algorithm uses a sliding window model to continuously report the most recent clustering results and to dynamically adjust the number of clusters. Our experimental results on real and synthetic datasets show that our algorithm has better clustering quality, efficiency, and stability than other existing methods.
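To make the idea of a lag-tolerant distance concrete, here is a minimal Python sketch. It is not the authors' method: the abstract describes auto-regressive modelling and spectral component similarity, whereas this sketch swaps in a plain lagged Pearson correlation. The names `pearson`, `lag_distance`, and `max_lag` are hypothetical, and both streams are assumed to have equal length.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def lag_distance(x, y, max_lag):
    """Return (distance, best_lag) where distance = 1 - max correlation over lags.

    A positive lag means y trails x by that many steps (x[t] is paired with y[t + lag]).
    Assumption: plain lagged correlation stands in for the paper's AR-based measure.
    """
    best_corr, best_lag = -1.0, 0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = x[:len(x) - lag], y[lag:]
        else:
            a, b = x[-lag:], y[:len(y) + lag]
        if len(a) < 2:
            continue  # too little overlap to correlate
        c = pearson(a, b)
        if c > best_corr:
            best_corr, best_lag = c, lag
    return 1.0 - best_corr, best_lag
```

Two streams that differ only by a delay within `max_lag` get a distance near 0, so a standard clustering routine that accepts a precomputed distance matrix could group them together.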
3.
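Research on a data stream clustering algorithm based on a decaying sliding window  Total citations: 2, self-citations: 0, citations by others: 2
Data streams are characterized by large volume, continuous and rapid flow, and difficulty of storage and recovery, so mining quality and efficiency are the key criteria for evaluating mining algorithms. Traditional data stream clustering algorithms are based on the landmark window, sliding window, and decaying window models, and suffer from poor clustering quality and high time complexity. To address these problems, this paper studies a data stream clustering algorithm based on a sliding decay window, and designs and implements it, effectively improving on the clustering quality and time efficiency of traditional data stream algorithms. Simulation results demonstrate the algorithm's effectiveness, with satisfactory results.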
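One way to picture a window that both slides and decays is sketched below in Python. The abstract gives no formulas, so everything here is an assumption for illustration: the class name `DecayedSlidingWindow`, the parameters `size` and `decay`, and the exponential fading weight 2^(-decay * age), which is borrowed from common damped-window models rather than taken from this paper.

```python
from collections import deque

class DecayedSlidingWindow:
    """Keep only the newest `size` points, and weight each by its age."""

    def __init__(self, size, decay):
        # deque(maxlen=...) drops the oldest point automatically (the sliding part)
        self.points = deque(maxlen=size)
        self.decay = decay  # hypothetical decay rate; larger means faster forgetting

    def add(self, timestamp, value):
        self.points.append((timestamp, value))

    def weighted_mean(self, now):
        """Mean of the window with weights 2 ** (-decay * age) (the decaying part)."""
        if not self.points:
            raise ValueError("window is empty")
        weights = [2.0 ** (-self.decay * (now - t)) for t, _ in self.points]
        total = sum(weights)
        return sum(w * v for w, (_, v) in zip(weights, self.points)) / total
```

A clustering step could use such weighted statistics as cluster summaries, so that old points are first down-weighted and then discarded entirely once they leave the window.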
4.
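An optimized density-based data stream clustering algorithm  Total citations: 1, self-citations: 1, citations by others: 1
To address the key problem of handling outliers effectively in data stream clustering, the density-based data stream clustering algorithm is improved: building on the DenStream algorithm, a density-based data stream clustering algorithm with a double detection time strategy (DDTS) is proposed. When the stream rate fluctuates, the strategy tests micro-clusters by combining two factors, elapsed time and the number of stream data points. Clustering is improved by maintaining and pruning micro-clusters online and dynamically, and by retaining outliers that may later be upgraded. Experimental results show that the improved algorithm has good applicability and effectiveness and achieves high clustering quality.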
5.
6.
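To remedy the shortcomings of the EMicro algorithm, the GDF-CUStreams algorithm is proposed. It stores the distribution characteristics of the data in grid feature vectors, clusters uncertain data streams by updating the grid feature vectors and merging grids into clusters, and handles newly arriving data points by incremental clustering. Whether a grid is a sporadic grid is decided from the grid density and the distance between grid centroids. Grid attraction is used to optimize cluster boundaries, and sporadic grids are detected and deleted so that cluster edges become smoother and clustering accuracy improves. Both grid density and grid centroid are updated incrementally. Experimental results show that, compared with EMicro, GDF-CUStreams is more efficient and produces good results.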
7.
Semi-supervised learning, which uses a large amount of unlabeled data to improve the performance of a classifier when only a limited amount of labeled data is available, has recently become a hot topic in machine learning research. In this paper, we propose a semi-supervised ensemble of classifiers for learning in time-varying data streams. The algorithm maintains all the desirable properties of the semi-supervised Co-trained random FOREST algorithm (Co-Forest) and extends it to evolving data streams. It assigns a weight to each example according to Poisson(1) to simulate the bootstrap sampling method in data streams, which keeps the diversity of the Random Forest. By using incremental learning, it avoids unnecessary repeated training and improves the accuracy of the base models. In addition, ADaptive WINdowing (ADWIN2) is introduced to deal with concept drift, which lets the method adapt to a varying environment. Empirical evaluation on both synthetic data and UCI data shows that the proposed method outperforms state-of-the-art semi-supervised and supervised methods in time-varying data streams, and also achieves relatively high performance in stationary streams.
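The Poisson(1) weighting mentioned in this abstract can be sketched in a few lines of Python. This is only an illustration of the sampling step, not the paper's full algorithm: the function names `poisson` and `online_bagging_weights` are hypothetical, and the Poisson draw uses Knuth's multiplication method so that the sketch needs nothing beyond the standard library.

```python
import math
import random

def poisson(lam, rng):
    """Draw one Poisson(lam) sample using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def online_bagging_weights(n_models, rng):
    """One weight per ensemble member for a single incoming example.

    A weight of k means that member trains on the example k times
    (0 means it skips the example), which mimics bootstrap resampling
    without storing the stream.
    """
    return [poisson(1.0, rng) for _ in range(n_models)]
```

Because each base model sees a differently weighted version of the stream, the ensemble members stay diverse even though every example is processed only once.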
8.
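To address the sensitivity of K-means to the initial cluster centers and its tendency to become trapped in local optima, an improved particle-swarm-based clustering algorithm is proposed. The algorithm combines a density-based approach with the max-min distance method to determine the initial cluster centers, which resolves the sensitivity of K-means to initial values, and it uses the strong global search capability of particle swarm optimization to keep K-means from falling into local optima. The hybrid algorithm is further optimized by normalizing each attribute dimension of the sample set, decreasing the inertia weight along a concave function, computing the dissimilarity matrix, and introducing the variance of the swarm's fitness. Experimental results show that the algorithm achieves higher accuracy and stronger convergence.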
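The max-min distance method for choosing initial centers can be sketched as follows in Python. This is a simplified illustration under stated assumptions: the paper combines it with a density criterion, which is not reproduced here, so the first center is simply taken to be the first point. The function name `maxmin_centers` is hypothetical.

```python
import math

def maxmin_centers(points, k):
    """Pick k initial centers by the max-min distance rule.

    Start from points[0] (an assumption; the paper uses density to choose),
    then repeatedly add the point whose distance to its nearest
    already-chosen center is largest.
    """
    centers = [points[0]]
    while len(centers) < k:
        farthest = max(
            points,
            key=lambda p: min(math.dist(p, c) for c in centers),
        )
        centers.append(farthest)
    return centers
```

The resulting centers are spread far apart, which gives K-means a starting point that is less likely to place two centers inside the same natural cluster.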
9.
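Based on an analysis of ant colony optimization (ACO) and particle swarm optimization (PSO), a hybrid swarm intelligence strategy for continuous function optimization, CA-PSO, is proposed. The solution space is first partitioned into regions, and ACO's fast convergence in the early stage of optimization is used to search the whole solution space for regions sensitive to the optimum. The ant colony's search results are then used to initialize the PSO particles, and PSO's fast, global convergence is used to search within each small region. During population updates, following the topology of the ant colony and the step rules between small regions, the ants keep gathering toward the regions sensitive to the optimum, so the number of particles in those regions grows and the local PSO search can look for the optimum at a finer granularity. Results on examples show that CA-PSO preserves the distribution and diversity of solutions, avoids stalling in local optima when optimizing multimodal functions, and eventually converges to the global optimum.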
10.
11.
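Particle-swarm-based fuzzy clustering with membership-degree encoding is sensitive to noise and performs poorly on data sets where the number of samples is smaller than the sample dimension. To address these problems, an improved particle-swarm-based fuzzy clustering method is proposed that modifies how the fuzzy clustering constraint is enforced. When a sample's memberships across the classes do not sum to 1, the new method starts from the memberships produced by particle swarm optimization and further reallocates them according to the distances between the sample and each class, so that the memberships satisfy the fuzzy clustering constraint. The new method significantly improves particle-swarm fuzzy clustering under membership encoding, as verified on typical data sets.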
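The abstract does not state the exact reallocation rule, so the Python sketch below shows only one plausible way to repair memberships with distance information. It is an assumption for illustration, not the authors' formula: each membership is scaled by the inverse of the sample's distance to that class and the result is renormalized so the values sum to 1. The function name `repair_memberships` and the parameter `eps` are hypothetical.

```python
def repair_memberships(memberships, distances, eps=1e-12):
    """Return memberships that sum to 1, favouring nearer classes.

    memberships: raw values for one sample (e.g. from a PSO particle)
    distances:   distance from the sample to each class center
    eps:         small constant that avoids division by zero
    """
    weights = [u / (d + eps) for u, d in zip(memberships, distances)]
    total = sum(weights)
    if total == 0:
        # All raw memberships are zero: fall back to pure inverse distance.
        weights = [1.0 / (d + eps) for d in distances]
        total = sum(weights)
    return [w / total for w in weights]
```

With equal distances the rule reduces to ordinary normalization, while unequal distances shift membership toward the closer class.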
12.
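Traditional fuzzy C-means (FCM) clustering is sensitive to the initial cluster centers and easily trapped in local optima. To address this, particle swarm optimization is combined with FCM in an improved fuzzy clustering algorithm. The algorithm uses the global search capability of particle swarm optimization, in place of FCM, to find the initial cluster centers, so that it escapes local optima and then performs fuzzy clustering. The fitness function is redesigned mainly to reflect the within-class compactness and between-class separation of the data set's partition. Experimental results show that the proposed algorithm performs better in clustering accuracy and validity indices.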
13.
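Incremental mining methods offer many benefits, such as adapting to large-scale dynamic data, reducing memory requirements, and enabling parallel processing, but current incremental clustering methods have many parameter restrictions and produce insufficiently accurate results. Under a data mining architecture with changing information sources, a group of special intelligent agents incrementally modifies the knowledge model, and a construction method for a swarm intelligence clustering model and an incremental model maintenance algorithm are proposed. The method uses information entropy to speed up clustering and adjusts the existing clusters according to pheromones and incremental database insertions and deletions. It requires few parameters, and experiments show that the clustering results are accurate.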
14.
《Expert systems with applications》2014,41(13):6009-6016
Clustering is an important and popular technique in data mining. It partitions a set of objects so that objects in the same cluster are more similar to one another than to objects in different clusters, according to certain predefined criteria. K-means is a simple yet efficient method for data clustering. However, K-means tends to converge to local optima and depends on the initial values of the cluster centers. Many heuristic algorithms have been introduced to overcome this local optima problem, but these algorithms also suffer from several shortcomings. In this paper, we present an efficient hybrid evolutionary data clustering algorithm, referred to as K-MCI, which combines K-means with modified cohort intelligence. The proposed algorithm is tested on several standard data sets from the UCI Machine Learning Repository, and its performance is compared with other well-known algorithms such as K-means, K-means++, cohort intelligence (CI), modified cohort intelligence (MCI), genetic algorithm (GA), simulated annealing (SA), tabu search (TS), ant colony optimization (ACO), honey bee mating optimization (HBMO) and particle swarm optimization (PSO). The simulation results are very promising in terms of solution quality and convergence speed.
15.
Mohamed-Rafik Bouguelia Slawomir Nowaczyk Amir H. Payberah 《Data mining and knowledge discovery》2018,32(6):1597-1633
In the era of big data, considerable research focus is being put on designing efficient algorithms capable of learning and extracting high-level knowledge from ubiquitous data streams in an online fashion. While most existing algorithms assume that data samples are drawn from a stationary distribution, several complex environments deal with data streams that are subject to change over time. Taking this aspect into consideration is an important step towards building truly aware and intelligent systems. In this paper, we propose GNG-A, an adaptive method for incremental unsupervised learning from evolving data streams experiencing various types of change. The proposed method maintains a continuously updated network (graph) of neurons by extending the Growing Neural Gas algorithm with three complementary mechanisms, allowing it to closely track both gradual and sudden changes in the data distribution. First, an adaptation mechanism handles local changes where the distribution is only non-stationary in some regions of the feature space. Second, an adaptive forgetting mechanism identifies and removes neurons that become irrelevant due to the evolving nature of the stream. Finally, a probabilistic evolution mechanism creates new neurons when there is a need to represent data in new regions of the feature space. The proposed method is demonstrated for anomaly and novelty detection in non-stationary environments. Results show that the method handles different data distributions and efficiently reacts to various types of change.
16.
Aggarwal C.C. Han J. Jianyong Wang Yu P.S. 《Knowledge and Data Engineering, IEEE Transactions on》2006,18(5):577-589
Current models of the classification problem do not effectively handle bursts of particular classes coming in at different times. In fact, the current model of the classification problem simply concentrates on methods for one-pass classification modeling of very large data sets. Our model for data stream classification views the data stream classification problem from the point of view of a dynamic approach in which simultaneous training and test streams are used for dynamic classification of data sets. This model reflects real-life situations effectively, since it is desirable to classify test streams in real time over an evolving training and test stream. The aim here is to create a classification system in which the training model can adapt quickly to the changes of the underlying data stream. In order to achieve this goal, we propose an on-demand classification process which can dynamically select the appropriate window of past training data to build the classifier. The empirical results indicate that the system maintains a high classification accuracy in an evolving data stream, while providing an efficient solution to the classification task.
17.
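Research on ant colony optimization algorithms based on swarm intelligence  Total citations: 7, self-citations: 0, citations by others: 7
Li Zhiwei (李志伟), 《计算机工程与设计》 (Computer Engineering and Design), 2003, 24(8): 27-29
Swarm intelligence has become a new hot topic in artificial intelligence research in recent years. This paper introduces the ideas, methods, and mathematical models of this research, takes the intelligent behavior of ant colonies as its object of study, describes the ant colony optimization algorithm based on swarm intelligence, and presents the algorithm's engineering applications.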
18.
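Traditional hybrid heuristic algorithms based on particle swarm optimization, and simulated annealing algorithms, often achieve effective scheduling only by sacrificing solution quality or solving speed. To address this, a hybrid swarm intelligence algorithm based on the High-Speed Downlink Packet Access (HSDPA) standard is proposed. It first assumes that the HSDPA standard specifies realistic, imperfect channel state information (CSI) feedback, which exists as a finite set in the channel quality indicator (CQI). Next, in the optimization process, the hybrid swarm intelligence algorithm is designed by exploiting the respective strengths of simulated annealing and particle swarm optimization. Finally, the hybrid algorithm is used to process the data, obtaining the optimal solution while reducing complexity, thereby increasing system throughput and achieving optimal scheduling. Experimental results show that the proposed hybrid algorithm schedules better than traditional particle-swarm-based algorithms.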
19.
Clustering provides a knowledge acquisition method for intelligent systems. This paper proposes a novel data-clustering algorithm that combines a new initialization technique, the K-means algorithm and a new gradual data transformation approach to provide more accurate clustering results than the K-means algorithm and its variants by increasing the clusters’ coherence. The proposed data transformation approach solves the problem of generating empty clusters, which frequently occurs for other clustering algorithms. An efficient method based on the principal component transformation and a modified silhouette algorithm is also proposed in this paper to determine the number of clusters. Several different data sets are used to evaluate the efficacy of the proposed method in dealing with the empty cluster generation problem, and its accuracy and computational performance in comparison with other K-means based initialization techniques and clustering methods. The developed estimation method for determining the number of clusters is also evaluated and compared with other estimation algorithms. The significance of the proposed method includes addressing the limitations of K-means based clustering and improving the accuracy of clustering as an important method in the field of data mining and expert systems. Applying the proposed method to knowledge acquisition in time series data such as wind, solar, electric load and stock market data provides a pre-processing tool to select the most appropriate data to feed into neural networks or other estimators used for forecasting such time series. In addition, using the knowledge discovered by the proposed K-means clustering to develop rule-based expert systems is one of the main impacts of the proposed method.
20.
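Huang Liming (黄力明), 《计算机工程与设计》 (Computer Engineering and Design), 2008, 29(9): 2300-2303
Fuzzy C-means clustering is widely used for image segmentation, but its clustering performance depends on the initialization of the class centers and it is computationally expensive. To address this, a fuzzy C-means image segmentation algorithm based on particle swarm optimization is proposed. The method uses the strong search capability of the particle swarm to search for cluster centers. Because the search for cluster centers proceeds by density, the computational cost is small, so the speed of the fuzzy C-means algorithm can be increased substantially. Experimental results show that the method markedly speeds up fuzzy clustering and achieves fast image segmentation.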