Similar literature
Found 20 similar documents; search took 18 ms
1.
2.
Characteristic-Based Clustering for Time Series Data   Total citations: 1, self-citations: 0, citations by others: 1
With the growing importance of time series clustering research, particularly for similarity searches amongst long time series such as those arising in medicine or finance, it is critical for us to find a way to resolve the outstanding problems that make most clustering methods impractical under certain circumstances. When the time series is very long, some clustering algorithms may fail because the very notion of similarity is dubious in high-dimensional space; many methods cannot handle missing data when the clustering is based on a distance metric. This paper proposes a method for clustering time series based on their structural characteristics. Unlike other alternatives, this method does not cluster point values using a distance metric; rather, it clusters based on global features extracted from the time series. The feature measures are obtained from each individual series and can be fed into arbitrary clustering algorithms, including an unsupervised neural network algorithm, a self-organizing map, or a hierarchical clustering algorithm. Global measures describing the time series are obtained by applying statistical operations that best capture the underlying characteristics: trend, seasonality, periodicity, serial correlation, skewness, kurtosis, chaos, nonlinearity, and self-similarity. Since the method clusters using extracted global measures, it reduces the dimensionality of the time series and is much less sensitive to missing or noisy data. We further provide a search mechanism to find the best selection from the feature set that should be used as the clustering inputs. The proposed technique has been tested using benchmark time series datasets previously reported for time series clustering and a set of time series datasets with known characteristics. The empirical results show that our approach is able to yield meaningful clusters. The resulting clusters are similar to those produced by other methods, but with some promising and interesting variations that can be intuitively explained with knowledge of the global characteristics of the time series.
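The idea of replacing each series by a short vector of global measures can be sketched as follows. This is a minimal illustration, not the paper's feature set or estimators: the function name `global_features` and the four measures chosen here (least-squares trend slope, lag-1 autocorrelation, skewness, excess kurtosis) are assumptions made for the example.

```python
import math

def global_features(series):
    """Map a series of any length (>= 2, not constant) to a fixed-length vector:
    (trend slope, lag-1 autocorrelation, skewness, excess kurtosis)."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    var = sum(d * d for d in dev) / n          # population variance
    std = math.sqrt(var)
    # Trend: least-squares slope of the values against the time index 0..n-1.
    t_mean = (n - 1) / 2
    slope = (sum((t - t_mean) * d for t, d in enumerate(dev))
             / sum((t - t_mean) ** 2 for t in range(n)))
    # Serial correlation: autocorrelation at lag 1.
    acf1 = sum(dev[i] * dev[i + 1] for i in range(n - 1)) / (n * var)
    # Shape of the value distribution.
    skew = sum(d ** 3 for d in dev) / (n * std ** 3)
    kurt = sum(d ** 4 for d in dev) / (n * var ** 2) - 3.0
    return (slope, acf1, skew, kurt)
```

Because every series is reduced to a vector of the same length regardless of how many points it has, the vectors can be passed to any standard clustering routine.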

3.
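Fuzzy Clustering Methods in the Data Mining Process   Total citations: 6, self-citations: 0, citations by others: 6
Building on a study of the data clustering methods commonly used in data mining, this paper introduces fuzzy clustering analysis into data mining, analyzes the characteristics of the method in the data mining process, and discusses how to apply it to large databases.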

4.
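An Outlier Data Mining Algorithm Based on Clustering and Fast Computation   Total citations: 1, self-citations: 0, citations by others: 1
In a dynamic incremental database environment, the traditional local outlier factor (LOF) algorithm must recompute the local outlier factor of every data object whenever outlier mining is run a second time, which is inefficient. To address this, an outlier mining algorithm based on clustering and fast computation is proposed. The traditional DBSCAN algorithm is improved and, on the basis of the clustering produced by the improved algorithm, local outlier factors are computed for only a subset of the data objects. Experimental results show that, in a dynamic incremental database environment, the algorithm is not only more time-efficient than the LOF and lncLOF algorithms but also improves the precision of outlier mining.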

5.
To deal with data patterns with linguistic ambiguity and with probabilistic uncertainty in a single framework, we construct an interpretable probabilistic fuzzy rule-based system that requires less human intervention and less prior knowledge than other state-of-the-art methods. Specifically, we present a new iterative fuzzy clustering algorithm that incorporates a supervisory scheme into an unsupervised fuzzy clustering process. The learning process starts in a fully unsupervised manner using the fuzzy c-means (FCM) clustering algorithm and a cluster validity criterion, and then gradually constructs meaningful fuzzy partitions over the input space. The corresponding fuzzy rules with probabilities are obtained through an iterative learning process of selecting clusters with supervisory guidance based on the notions of cluster-pureness and class-separability. The proposed algorithm is tested first with synthetic data sets and benchmark data sets from the UCI Repository of Machine Learning Databases, and then with real facial expression data and TV viewing data.
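For orientation, the standard fuzzy c-means iteration that the abstract uses as its unsupervised starting point can be sketched as below. This is plain FCM only, under assumed names (`sq_dist`, `memberships`, `update_centers`, `fcm`) and an assumed fixed iteration count; it does not include the supervisory scheme, the validity criterion, or the rule extraction described in the abstract.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def memberships(p, centers, m):
    """FCM membership of point p in each cluster (values sum to 1)."""
    d = [sq_dist(p, c) for c in centers]
    if 0.0 in d:
        # A point lying exactly on one or more centers is shared among them.
        hits = [1.0 if di == 0.0 else 0.0 for di in d]
        return [h / sum(hits) for h in hits]
    inv = [di ** (-1.0 / (m - 1.0)) for di in d]   # d is already squared
    total = sum(inv)
    return [v / total for v in inv]

def update_centers(points, u, c, m):
    """Each center is the mean of the points weighted by membership ** m."""
    dim = len(points[0])
    centers = []
    for j in range(c):
        w = [row[j] ** m for row in u]
        total = sum(w)
        centers.append(tuple(
            sum(wi * p[k] for wi, p in zip(w, points)) / total
            for k in range(dim)))
    return centers

def fcm(points, centers, m=2.0, iters=50):
    """Alternate membership and center updates; return (centers, memberships)."""
    for _ in range(iters):
        u = [memberships(p, centers, m) for p in points]
        centers = update_centers(points, u, len(centers), m)
    return centers, [memberships(p, centers, m) for p in points]
```

The fuzzifier `m` controls how soft the partition is; `m = 2.0` is a common default, chosen here only for illustration.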

6.
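Starting from the concept of minimizing the sum of squared weighted generalized Euclidean distances, and building on the iterative fuzzy clustering algorithm, this paper proposes a fuzzy clustering model for data sets with missing indicator values. Examples analyze the clustering results for different data sets at different degrees of incompleteness, extending the range of applications of fuzzy clustering algorithms.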

7.
Enhanced Fuzzy System Models With Improved Fuzzy Clustering Algorithm   Total citations: 2, self-citations: 0, citations by others: 2
Although traditional fuzzy models have proven to have a high capacity for approximating real-world systems, they face some challenges, such as computational complexity, optimization problems, and subjectivity. To solve some of these problems, this paper proposes a new fuzzy system modeling approach based on improved fuzzy functions to model systems with a continuous output variable. The new modeling approach introduces three features: i) an improved fuzzy clustering (IFC) algorithm, ii) a new structure identification algorithm, and iii) a nonparametric inference engine. The IFC algorithm yields simultaneous estimates of the parameters of c-regression models, together with a fuzzy c-partitioning of the data, to calculate improved membership values with a new membership function. The structure identification of the new approach uses IFC, instead of the standard fuzzy c-means clustering algorithm, to fuzzy-partition the data, and it uses the improved membership values as additional input variables along with the original scalar input variables for two different choices of regression method, least squares estimation or support vector regression, to determine "fuzzy functions" for each cluster. With the novel IFC, one could learn the system behavior more accurately than with other FSM models. The nonparametric inference engine is a new approach, which uses the alike k-nearest neighbor method for reasoning. Empirical comparisons indicate that the proposed approach yields comparable or better accuracy than fuzzy or neuro-fuzzy models based on fuzzy rule bases, as well as other soft computing methods.

8.
Clustering time series is a problem that has applications in a wide variety of fields, and has recently attracted a large amount of research. Time series data are often large and may contain outliers. We show that the simple procedure of clipping the time series (discretising to above or below the median) reduces memory requirements and significantly speeds up clustering without decreasing clustering accuracy. We also demonstrate that clipping increases clustering accuracy when there are outliers in the data, thus serving as a means of outlier detection and a method of identifying model misspecification. We consider simulated data from polynomial, autoregressive moving average and hidden Markov models and show that the estimated parameters of the clipped data used in clustering tend, asymptotically, to those of the unclipped data. We also demonstrate experimentally that, if the series are long enough, the accuracy on clipped data is not significantly less than the accuracy on unclipped data, and if the series contain outliers then clipping results in significantly better clusterings. We then illustrate how using clipped series can be of practical benefit in detecting model misspecification and outliers on two real-world data sets: an electricity generation bid data set and an ECG data set.
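Clipping as described above can be sketched in a few lines. The function names `clip` and `hamming` are illustrative, and mapping values equal to the median to 0 is an assumption of this sketch; the abstract only says the series is discretised to above or below the median.

```python
from statistics import median

def clip(series):
    """Discretise a series to bits: 1 if the value is above the median, else 0."""
    med = median(series)
    return [1 if x > med else 0 for x in series]

def hamming(a, b):
    """Number of positions at which two equal-length clipped series differ."""
    return sum(x != y for x, y in zip(a, b))

clean = clip([1, 2, 3, 5, 4])
spiky = clip([1, 2, 3, 100, 4])   # same series with one large outlier
print(clean, spiky, hamming(clean, spiky))
```

Each value is stored as a single bit, and because the median barely moves when one value is replaced by an extreme one, the outlier leaves the clipped series unchanged in this example.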

9.
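To address cluster validity for intuitionistic fuzzy set data, a cluster validity method for finding the optimal fuzzy partition is presented. The method evaluates the validity of an intuitionistic fuzzy clustering using two key factors: the intuitionistic fuzzy correlation degree and the intuitionistic fuzzy entropy. The intuitionistic fuzzy correlation degree extends the fuzzy correlation degree to the intuitionistic setting by adding a non-membership parameter, and is used to measure the correlation between clusters; a weight parameter is also added to handle the uneven distribution of the sample data's features across dimensions. The intuitionistic fuzzy entropy is used to check the reliability of the classification result. Finally, examples verify that the method performs well on compact, well-separated data sets, and that it has good application prospects in information fusion areas such as target grouping and target recognition.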

10.
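A clustering technique based on fuzzy set theory is proposed. The method and procedure for implementing clustering with this technique in a relational database are described in detail, and the program flow and implementation are given. The clustered data objects can be used both to obtain classification knowledge and information and to provide a low-noise data source for subsequent association rule mining.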

11.
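High-Dimensional Data Clustering Based on Fuzzy Nearest Neighbors   Total citations: 3, self-citations: 0, citations by others: 3
A clustering algorithm based on fuzzy nearest neighbors (the FNNC algorithm) is proposed. FNNC forms clusters through a weighted shared-nearest-neighbor graph and uses only some of the useful links in the object graph. Experiments verify the effectiveness of FNNC for clustering high-dimensional data.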

12.
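The fuzzy C-means (FCM) clustering algorithm is sensitive to the initial centers, ignores the mutual influence between the centers of different classes, and can handle only low-dimensional data. To address this, an improved method for selecting the initial centers is designed and, based on the idea of conditional fuzzy clustering, the wHFCLM algorithm is proposed by replacing the Euclidean distance in traditional FCM with the cosine distance. Combining this algorithm with the extended incremental clustering algorithms spFCM, oFCM, and rseFCM yields the corresponding extended incremental fuzzy clustering algorithms spHF(c+l)M, oHF(c+l)M, and rseHF(c+l)M. Experimental results show that, compared with spFCM, oFCM, and rseFCM, the extended incremental fuzzy clustering algorithms are less sensitive to the choice of initial centers, handle large-scale sparse high-dimensional data sets well, and achieve better clustering performance at a suitable chunk size.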

13.
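In the fuzzy c-means clustering algorithm for interval numbers, the fuzziness index m cannot accurately describe how the data are partitioned into clusters. To address this, the interval type-2 fuzzy c-means clustering algorithm for point data sets is extended to the clustering of interval-valued uncertain data. The convergence of the interval type-2 fuzzy c-means algorithm for interval numbers is also analyzed in order to determine the principles for choosing the fuzziness indices m1 and m2. Simulation experiments on synthetic and measured data show that the interval type-2 fuzzy c-means algorithm for interval numbers clusters better than the fuzzy c-means algorithm for interval numbers.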

14.
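Zhu Qiang. 《现代计算机》 (Modern Computer), 2007, (4): 87-88, 94
Common data mining methods are analyzed, fuzzy clustering analysis is introduced into data mining, the advantages of the method in data mining are analyzed, and an example illustrates its practical application.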

15.
Economic dispatch is a highly constrained optimization problem encompassing interaction among decision variables. Environmental concerns that arise from the operation of fossil-fuel-fired electric generators transform the classical problem into multiobjective environmental/economic dispatch (EED). In this paper, a fuzzy clustering-based particle swarm (FCPSO) algorithm is proposed to solve the highly constrained EED problem involving conflicting objectives. FCPSO uses an external repository to preserve nondominated particles found along the search process. The proposed fuzzy clustering technique keeps the size of the repository within limits without destroying the characteristics of the Pareto front. A niching mechanism has been incorporated to direct the particles towards less explored regions of the Pareto front. To avoid entrapment in local optima and enhance the exploratory capability of the particles, a self-adaptive mutation operator has been proposed. In addition, the algorithm incorporates a fuzzy-based feedback mechanism and iteratively uses the information to determine the compromise solution. The algorithm's performance has been examined on the standard IEEE 30-bus six-generator test system, where it generated a uniformly distributed Pareto front whose optimality has been authenticated by benchmarking against the ε-constraint method. Results also revealed that the proposed approach obtained high-quality solutions and was able to provide a satisfactory compromise solution in almost all the trials, thereby validating the efficacy and applicability of the proposed approach on real-world multiobjective optimization problems.

16.
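To improve software quality and to control and improve the software development process, the development process must be measured effectively and the metric data collected at each of its stages must be analyzed. This paper applies a fuzzy clustering algorithm to the analysis of software metric data. It first presents the relevant data mining background and theory, then describes an experimental study of the algorithm applied to software metric data analysis. Because modules with serious defects are found more quickly, software testing efficiency improves.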

17.
Clustering Incomplete Data Using Kernel-Based Fuzzy C-means Algorithm   Total citations: 3, self-citations: 0, citations by others: 3

18.
Noise clustering, as a robust clustering method, partitions data sets while reducing errors caused by outliers. Noise clustering defines outliers in terms of a certain distance, which is called the noise distance. The probability or membership degree of data points belonging to the noise cluster increases with their distance to regular clusters. The main purpose of noise clustering is to reduce the influence of outliers on the regular clusters. The emphasis is not put on exactly identifying outliers. However, in many applications outliers contain important information and their correct identification is crucial. In this paper, we present a method to estimate the noise distance in noise clustering based on the preservation of the hypervolume of the feature space. Our examples will demonstrate the efficiency of this approach.
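How the noise distance shapes memberships can be sketched with the commonly used fuzzy noise clustering update, in which the noise cluster is treated as an extra cluster at a constant distance from every point. The function name `noise_memberships` and the parameter values are assumptions for illustration, and `delta` is simply supplied by hand here; the hypervolume-based estimate of the noise distance proposed in the abstract is not reproduced.

```python
def noise_memberships(dists, delta, m=2.0):
    """Memberships of one point, given its (positive) distances to the regular
    cluster centers and the noise distance delta.

    Returns (regular_memberships, noise_membership); together they sum to 1.
    """
    e = -2.0 / (m - 1.0)
    inv = [d ** e for d in dists]   # regular clusters
    noise = delta ** e              # noise cluster: same distance for every point
    total = sum(inv) + noise
    return [v / total for v in inv], noise / total

near, near_noise = noise_memberships([1.0, 2.0], delta=2.0)
far, far_noise = noise_memberships([10.0, 12.0], delta=2.0)
print(near_noise, far_noise)   # the far point is assigned mostly to noise
```

A smaller `delta` sends more points to the noise cluster and a larger one sends fewer, which is why the choice of noise distance matters when outliers are to be identified and not merely down-weighted.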

19.
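Research on the Application of Fuzzy Clustering Mining Methods in E-Commerce   Total citations: 1, self-citations: 0, citations by others: 1
Commonly used hierarchical clustering methods include the minimum distance, maximum distance, centroid distance, and average-linkage distance methods. All of them cluster by defining a distance between classes, but in some cases their clustering results are not unique. Using a fuzzy relation matrix, this paper presents a new method based on fuzzy clustering and applies these techniques to data mining on a concrete e-commerce platform, where their feasibility was verified, providing a useful demonstration for the wider application of this technique in e-commerce.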

20.
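Du Xinghai, Hou Hong. 《微机发展》 (Microcomputer Development), 2005, 15(12): 132-134
To improve software quality and to control and improve the software development process, the development process must be measured effectively and the metric data collected at each of its stages must be analyzed. This paper applies a fuzzy clustering algorithm to the analysis of software metric data. It first presents the relevant data mining background and theory, then describes an experimental study of the algorithm applied to software metric data analysis. Because modules with serious defects are found more quickly, software testing efficiency improves.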
