Similar Literature
1.
A new feature selection method for high-dimensional data clustering is proposed on the basis of the data field. Using potential entropy to evaluate the importance of feature subsets, the method filters features by removing unimportant features and noise from the original dataset. Experiments show that the proposed method sharply reduces the number of dimensions and effectively improves clustering performance on the WDBC dataset.
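
The abstract does not give its formulas. A minimal sketch, assuming the Gaussian-style data-field potential common in data-field work, phi(x_i) = sum_j exp(-(||x_i - x_j|| / sigma)^2), computed only over a chosen feature subset, and the potential entropy H = -sum_i (phi_i / Z) log(phi_i / Z) with Z = sum_i phi_i. The function name `potential_entropy` and the impact factor `sigma` are illustrative. A less uniform potential (lower H, at most log n) is usually read as more pronounced structure; the exact rule the paper uses to discard features is not stated here.

```python
import math


def potential_entropy(points, features, sigma=1.0):
    """Potential entropy of `points` restricted to the feature indices in `features`.

    Assumes the Gaussian-style data-field potential
    phi(x_i) = sum_j exp(-(||x_i - x_j|| / sigma) ** 2); the self term j == i is included.
    """
    potentials = []
    for xi in points:
        phi = 0.0
        for xj in points:
            d2 = sum((xi[f] - xj[f]) ** 2 for f in features)  # squared distance on the subset
            phi += math.exp(-d2 / sigma ** 2)
        potentials.append(phi)
    z = sum(potentials)  # normalizing constant
    return -sum((p / z) * math.log(p / z) for p in potentials)


# Score a few candidate feature subsets of a toy dataset.
data = [[0.0, 0.0], [0.1, 3.0], [0.2, 6.0], [9.0, 9.0]]
for subset in ([0], [1], [0, 1]):
    print(subset, round(potential_entropy(data, subset), 4))
```

When all potentials are equal (for example, identical points) the entropy reaches its maximum, log n.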

2.
Characterized by unsupervised learning, self-organization, memory, and noise resistance, the artificial immune system is a research focus in intelligent information processing. Based on the basic principles of biological immunity and clonal selection, this article presents a self-adaptive polyclonal clustering algorithm. Following the core idea of the algorithm, various immune operators from the artificial immune system are employed in the clustering process, and the number of clusters is adjusted according to the affinity function. Introducing a recombination operator enhances the diversity of individual antibodies within a generation, which enlarges the search space for solutions and prevents premature convergence. In addition, a non-uniform mutation operator improves adaptability and local search performance while accelerating convergence. The article also proves the convergence of the algorithm using a Markov chain. Simulation experiments show that the algorithm obtains reasonable and effective clusters.

3.
The clustering of trajectories over huge volumes of streaming data has been recognized as critical for many modern applications. In this work, we propose continuous clustering of moving-object trajectories over high-speed data streams, which updates trajectory clusters online on the basis of incremental line-segment clustering. The proposed clustering algorithm obtains trajectory clusters efficiently and stores all closed trajectory clusters in a bi-tree index with efficient search capability. Next, we present two query processing methods that use three proposed pruning strategies to handle two continuous spatio-temporal queries quickly: threshold-based trajectory clustering queries and threshold-based trajectory outlier detection. Finally, comprehensive experimental studies demonstrate that our algorithm achieves excellent effectiveness and high efficiency for continuous clustering on both synthetic and real streaming data, and that the proposed query processing methods take on average 90% less time than the naive query methods.

4.
Traditional fuzzy clustering algorithms based on objective functions cannot determine the optimal number of clusters, are sensitive to the initial cluster centers, and easily become trapped in local optima. This paper proposes a Fuzzy Similarity-Based Clustering (FSBC) algorithm consisting of three phases: first, the objective function is modified by integrating the Fuzzy C-Means (FCM) and Possibilistic C-Means (PCM) methods; second, a density function derived from the data is used for similarity-based clustering to generate initial prototypes automatically, without requiring the user to specify them; finally, the iteration process is optimized by Particle Swarm Optimization (PSO) to obtain suitable adjustment parameters, which avoids the local-minimum problems of traditional methods. Experimental results on synthetic data and standard UCI datasets show that the proposed algorithm has stronger search capability, lower computational complexity, and higher clustering precision.
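
FSBC builds on FCM, whose standard alternating updates are u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1)) for memberships and v_j = sum_i u_ij^m x_i / sum_i u_ij^m for centers. A minimal sketch of one plain FCM iteration under those standard formulas; the PCM term, the density-based initialization, and the PSO tuning described above are not reproduced, and `fcm_step` is an illustrative name.

```python
import math


def fcm_step(points, centers, m=2.0, eps=1e-12):
    """One Fuzzy C-means iteration: update memberships, then cluster centers.

    points, centers: lists of equal-length coordinate lists; m > 1 is the fuzzifier.
    """
    power = 2.0 / (m - 1.0)
    memberships = []
    for x in points:
        # Clamp distances so a point sitting exactly on a center does not divide by zero.
        d = [max(math.dist(x, c), eps) for c in centers]
        memberships.append(
            [1.0 / sum((d[j] / d[k]) ** power for k in range(len(centers)))
             for j in range(len(centers))]
        )
    dim = len(points[0])
    new_centers = []
    for j in range(len(centers)):
        weights = [u[j] ** m for u in memberships]  # u_ij^m
        total = sum(weights)
        new_centers.append(
            [sum(w * x[t] for w, x in zip(weights, points)) / total for t in range(dim)]
        )
    return memberships, new_centers


pts = [[0.0], [0.2], [10.0], [10.2]]
centers = [[0.0], [10.0]]
for _ in range(5):
    u, centers = fcm_step(pts, centers)
print(centers)  # both centers settle close to the group means 0.1 and 10.1
```

Each membership row sums to 1 by construction, which is the constraint that PCM relaxes.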

5.
In this letter, a real-time C-V (Characteristic-Vector) clustering algorithm is put forward to handle large volumes of user behavior data collected dynamically from websites. The algorithm uses the concept of a C-V to denote characteristics, and it adopts binary [0,1] input and a self-defined vigilance parameter to design the clustering architecture. The Vector Degree of Matching (VDM) plays a key role in the algorithm, determining the magnitude of typical characteristics. Stability analysis confirms that the classifications have a reliable hierarchical structure as the vigilance parameter varies from 0.1 to 0.99. This non-linear relation between the vigilance parameter and the upper limit on the number of classes helps mine representative classes of web users according to the actual web resources, so that the management system can map them to the web resource space and implement intelligent configuration effectively and rapidly.
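
The binary [0,1] input and vigilance parameter described above resemble Adaptive Resonance Theory (ART1). A minimal ART1-style sketch, assuming non-zero binary input vectors and fast learning. The overlap ratio |x AND w| / |x| used in the vigilance test stands in for the paper's Vector Degree of Matching, whose exact definition is not given, and `art1_cluster` is a hypothetical name.

```python
def art1_cluster(inputs, vigilance, beta=1.0):
    """ART1-style clustering of non-zero binary vectors with fast learning.

    Returns (labels, prototypes). A higher vigilance demands a closer match,
    so it tends to create more, finer-grained categories.
    """
    prototypes = []  # one binary tuple per category
    labels = []
    for x in inputs:
        norm_x = sum(x)

        def overlap(j):
            # |x AND w_j|: number of 1-bits shared with prototype j
            return sum(a & b for a, b in zip(x, prototypes[j]))

        # Try categories in order of the ART1 choice function |x AND w| / (beta + |w|).
        order = sorted(range(len(prototypes)),
                       key=lambda j: overlap(j) / (beta + sum(prototypes[j])),
                       reverse=True)
        chosen = None
        for j in order:
            if overlap(j) / norm_x >= vigilance:  # vigilance (match) test
                prototypes[j] = tuple(a & b for a, b in zip(x, prototypes[j]))  # fast learning
                chosen = j
                break
        if chosen is None:  # no category resonates: create a new one
            prototypes.append(tuple(x))
            chosen = len(prototypes) - 1
        labels.append(chosen)
    return labels, prototypes


data = [(1, 1, 0, 0), (1, 1, 1, 0), (0, 0, 1, 1), (0, 0, 0, 1)]
print(art1_cluster(data, 0.5)[0])  # [0, 0, 1, 1]
print(art1_cluster(data, 0.9)[0])  # [0, 1, 2, 2]
```

Raising the vigilance from 0.5 to 0.9 splits the same inputs into more categories, which is the kind of vigilance-dependent hierarchy the abstract describes.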

6.
In this paper, the idea of interest coverage is used to form clusters in sensor networks: when the distance between the data trends gathered by neighboring sensors stays small over some period, those sensors can be clustered, and a single sensor can stand in for the cluster to form a virtual sensor network topology. Specifically, the Jensen-Shannon Divergence (JSD) is used to measure the distance between the distributions that represent the sensors' data trends. Based on JSD, a hierarchical clustering algorithm is then provided to form the virtual sensor network topology. Simulations show that the proposed approach achieves more than 50% energy savings compared with Statistical Aggregation Methods (SAM), which transmit a sensor's data only when the difference in the data exceeds a certain threshold.
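
A minimal sketch of the two ingredients, assuming each sensor's data trend is summarized as a discrete distribution over the same bins. JSD(P||Q) = 1/2 KL(P||M) + 1/2 KL(Q||M) with M = (P + Q)/2 is the standard definition; the average-linkage merge rule and the `threshold` parameter are assumptions, since the paper's linkage criterion is not given here. Choosing the representative sensor for each cluster is omitted.

```python
import math


def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) in nats; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jsd(p, q):
    """Jensen-Shannon divergence: symmetric, finite, and bounded by ln 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def cluster_sensors(dists, threshold):
    """Agglomerative clustering of sensors by average-linkage JSD.

    dists[i] is sensor i's data-trend distribution; merging stops once the
    closest pair of clusters is farther apart than `threshold`.
    """
    clusters = [[i] for i in range(len(dists))]

    def linkage(a, b):
        return sum(jsd(dists[i], dists[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        d, a, b = min(
            (linkage(clusters[a], clusters[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        if d > threshold:
            break
        clusters[a] += clusters[b]  # merge b into a (a < b, so deleting b keeps a's index)
        del clusters[b]
    return clusters


sensors = [[0.5, 0.5, 0.0], [0.5, 0.5, 0.0], [0.0, 0.1, 0.9], [0.0, 0.2, 0.8]]
print(cluster_sensors(sensors, threshold=0.05))  # [[0, 1], [2, 3]]
```

Because JSD stays finite even when one distribution has zero mass where the other does not, it is safer than raw KL divergence for comparing sparse sensor histograms.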

7.
Clustering is one of the most widely used data mining techniques for creating homogeneous clusters. K-means is a popular clustering algorithm that, despite its inherent simplicity, has some major problems. One way to resolve these problems and improve k-means is to use evolutionary algorithms for clustering. In this study, the Imperialist Competitive Algorithm (ICA) is improved and then used in the clustering process. Clustering the Iris, Wine, and CMC datasets with the improved ICA and comparing the results with those of the original ICA, GA, and PSO algorithms demonstrates the improvement in the Imperialist Competitive Algorithm.
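
Evolutionary clustering methods such as ICA, GA, and PSO typically encode a candidate solution as k centroids and minimize a clustering cost. A minimal sketch, assuming the common sum-of-distances cost and the standard ICA assimilation move, in which a colony moves toward its imperialist by a uniform random fraction in [0, beta] of the gap (beta > 1, often 2). The random deviation angle of the original ICA is omitted, the paper's own modifications are not described in the abstract, and the function names are illustrative.

```python
import math
import random


def clustering_cost(points, centroids):
    """Sum of Euclidean distances from each point to its nearest centroid (lower is better)."""
    return sum(min(math.dist(p, c) for c in centroids) for p in points)


def assimilate(colony, imperialist, beta=2.0, rng=random):
    """ICA assimilation: move a colony toward its imperialist.

    Candidates are flat lists of centroid coordinates; the colony moves along the
    line toward the imperialist by a uniform random fraction in [0, beta] of the gap.
    """
    step = rng.uniform(0.0, beta)
    return [c + step * (i - c) for c, i in zip(colony, imperialist)]


pts = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
print(clustering_cost(pts, [[0.0, 0.5], [5.0, 5.0]]))  # 0.5 + 0.5 + 0.0 = 1.0
print(assimilate([0.0, 0.0], [1.0, 2.0], rng=random.Random(0)))
```

With beta > 1 a colony can overshoot its imperialist, which lets the search explore points on both sides of the current best solution.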

8.
In this paper, a clustering algorithm for wireless multimedia sensor networks is proposed based on the high correlation among overlapping fields of view. First, node correlations are obtained by calculating the area of overlapping fields of view (FoVs) with a grid method. The algorithm then uses these node correlations to partition the network region into areas of highly correlated multimedia sensor nodes. Meanwhile, to minimize the energy consumed in transmitting images, a cluster-head election strategy is proposed based on a cost estimate that combines signal strength, residual energy, and node correlation. Simulation results show that the proposed algorithm balances energy consumption and effectively extends network lifetime.
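
A minimal sketch of the grid method, assuming each camera's FoV is a 2-D circular sector given as (x, y, radius, heading, half_angle) with angles in radians, and taking the correlation of node A with node B as the fraction of A's FoV area that B also covers. The cell size, the sector model, and this correlation definition are assumptions, not the paper's exact formulas.

```python
import math


def in_fov(px, py, fov):
    """True if point (px, py) lies in the sector fov = (x, y, radius, heading, half_angle)."""
    x, y, radius, heading, half_angle = fov
    dx, dy = px - x, py - y
    dist = math.hypot(dx, dy)
    if dist > radius:
        return False
    if dist == 0.0:
        return True
    # Wrap the bearing difference into [-pi, pi) before comparing with the half angle.
    diff = (math.atan2(dy, dx) - heading + math.pi) % (2 * math.pi) - math.pi
    return abs(diff) <= half_angle


def overlap_ratio(fov_a, fov_b, cell=0.05):
    """Grid estimate of the fraction of fov_a's area that fov_b also covers."""
    xa, ya, ra = fov_a[0], fov_a[1], fov_a[2]
    n = math.ceil(2 * ra / cell)  # grid over fov_a's bounding square
    in_a = in_both = 0
    for i in range(n):
        for j in range(n):
            px = xa - ra + (i + 0.5) * cell  # sample at cell centers
            py = ya - ra + (j + 0.5) * cell
            if in_fov(px, py, fov_a):
                in_a += 1
                in_both += in_fov(px, py, fov_b)
    return in_both / in_a if in_a else 0.0


full_disk = (0.0, 0.0, 1.0, 0.0, math.pi)        # hypothetical omnidirectional view
half_disk = (0.0, 0.0, 1.0, 0.0, math.pi / 2)    # same camera position, 180-degree view
print(round(overlap_ratio(full_disk, half_disk), 3))  # roughly 0.5
```

A smaller `cell` gives a finer area estimate at quadratic cost in the number of grid points; the overlap area itself is approximately `in_both * cell ** 2`.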

9.
Replica management is currently a major topic in the data grid community. This paper first proposes a replication strategy model for federated data grid systems in which the sub-data grids differ in size. We then investigate the optimal way to replicate data in a federated data grid with non-uniform membership, that is, how many replicas of each data item a storage-constrained system should create to minimize average access latency. Furthermore, we consider the impact of non-uniform membership on system performance and build a mechanism for comparing the performance of a replication strategy across two data grids. Simulation results show that the proposed strategy outperforms the LRU, uniform, proportional, and square-root replication strategies in both wide-area network bandwidth requirements and average data access latency.
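
The uniform, proportional, and square-root baselines named above differ only in how a fixed replica budget is split across data items. A minimal sketch, assuming real-valued replica counts and known access rates, that compares the three rules under the classic random-search cost sum_i q_i / r_i (q_i normalized to probabilities), the objective for which square-root allocation is known to be optimal in unstructured replication. This cost model is an assumption and not the paper's latency model; `allocate` and `expected_cost` are illustrative names.

```python
import math


def allocate(rates, budget, rule):
    """Split `budget` replicas across items with access rates `rates` (real-valued counts)."""
    weights = {
        "uniform": [1.0] * len(rates),
        "proportional": [float(q) for q in rates],
        "square_root": [math.sqrt(q) for q in rates],
    }[rule]
    total = sum(weights)
    return [budget * w / total for w in weights]


def expected_cost(rates, replicas):
    """Random-search cost sum_i q_i / r_i with access rates normalized to probabilities."""
    total = sum(rates)
    return sum((q / total) / r for q, r in zip(rates, replicas))


rates = [9, 1]  # hypothetical access rates for two data items
for rule in ("uniform", "proportional", "square_root"):
    replicas = allocate(rates, 4, rule)
    print(rule, replicas, round(expected_cost(rates, replicas), 3))
```

With these numbers the square-root split ([3, 1] replicas) has a cost of 0.4, while the uniform and proportional splits both cost 0.5.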

10.
Chinese Journal of Electronics, 2017, (6): 1221-1226
Category-based statistical language models are an important way to address data sparseness in statistical language models. However, this approach has two bottlenecks: 1) word clustering: it is hard to find a clustering method that performs well without a large amount of computation; 2) class-based methods always lose some predictive ability when adapting to text from different domains. To solve these problems, a novel definition of word similarity based on mutual information is presented. Building on word similarity, a definition of word-set similarity is given and a bottom-up hierarchical clustering algorithm is proposed. Experimental results show that the word clustering algorithm based on word similarity outperforms the conventional greedy clustering method in both speed and performance, reducing perplexity from 283 to 207.8.

11.
Exemplar-based clustering algorithms handle large-scale, high-dimensional data efficiently and do not require the user to specify many parameters. However, current algorithms cannot identify the optimal results or determine the number of clusters automatically. To remedy this, we propose and explore exemplar-based clustering analysis optimized by genetic algorithms, abbreviated as the ECGA framework, which uses genetic algorithms to optimize and combine the results. First, an exemplar-based clustering framework based on the canonical genetic algorithm is introduced. The framework is then optimized with three new genetic operators: (1) a geometry operator, which constrains the topological distribution of exemplars based on pairwise distances; (2) an EM operator, which applies the EM (Expectation-Maximization) algorithm to generate children from the previous population; and (3) a vertex substitution operator, which is initialized by the genetic algorithm and selects exemplars using the variable neighborhood search metaheuristic. Theoretical analysis shows that ECGA has a better chance of finding the optimal clustering results. Experimental results on several synthetic and real datasets show that ECGA provides comparable or better results at the cost of slightly longer CPU time.

12.
Relational database management systems are usually deployed on single-node machines and impose strict limitations on data structure. As a result, they do not handle big data well, and NoSQL has been proposed as a solution. To make data querying more efficient, NoSQL databases use indexes and in-memory caching. In this paper, we propose a hierarchical indexing mechanism and a prototype distributed data storage system, called HMIBase, which builds hierarchical indexes on non-primary keys in tables to make data querying more efficient. HMIBase uses HBase as the underlying data store and creates an in-memory cache for more efficient data transmission. HMIBase supports coprocessors to process update requests. It also provides a client with query and update APIs, and a server that handles RPCs from the client and completes jobs. To improve the cache hit ratio, we propose a cache replacement strategy for HMIBase called the Hot Score algorithm. Experimental results show that the Hot Score algorithm outperforms other cache replacement strategies.

13.
Among the clustering algorithms available in data mining, CLOPE has attracted much attention for its high speed and good performance. However, the choice of certain parameters in CLOPE directly affects the validity of the clustering results, which remains an open issue. To address this, this paper proposes a fuzzy CLOPE algorithm and presents a method for choosing the optimal parameter by defining a modified partition fuzzy degree as a clustering validity function. Experimental results on a real dataset illustrate the effectiveness of the proposed fuzzy CLOPE algorithm and of the optimal parameter choice method based on the modified partition fuzzy degree.
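
For reference, the original (crisp) CLOPE algorithm scores a clustering with the profit Profit_r(C) = sum_i [S(C_i) / W(C_i)^r * |C_i|] / sum_i |C_i|, where S is the total number of item occurrences in a cluster, W the number of distinct items, and r the repulsion parameter whose choice the abstract refers to. A minimal sketch of this profit and of CLOPE's first, greedy assignment pass, assuming non-empty transactions given as sets; the paper's fuzzy variant and its validity function are not reproduced.

```python
def profit(clusters, r):
    """CLOPE profit of a clustering; each cluster is a list of item sets."""
    num = den = 0.0
    for c in clusters:
        if not c:
            continue
        s = sum(len(t) for t in c)   # S(C): total item occurrences
        w = len(set().union(*c))     # W(C): number of distinct items
        num += s / w ** r * len(c)
        den += len(c)
    return num / den


def delta_add(stats, t, r):
    """Change in the profit numerator when transaction t joins a cluster with stats (S, N, items)."""
    s, n, items = stats
    old = s * n / len(items) ** r
    return (s + len(t)) * (n + 1) / len(items | t) ** r - old


def clope_phase1(transactions, r):
    """Greedy first pass: put each transaction where it increases profit most (or in a new cluster)."""
    stats = []   # (S, N, distinct items) per cluster
    labels = []
    for t in transactions:
        t = set(t)
        gains = [delta_add(st, t, r) for st in stats]
        new_gain = len(t) / len(t) ** r   # gain of opening a new cluster
        if gains and max(gains) >= new_gain:
            k = gains.index(max(gains))
        else:
            stats.append((0, 0, set()))
            k = len(stats) - 1
        s, n, items = stats[k]
        stats[k] = (s + len(t), n + 1, items | t)
        labels.append(k)
    return labels


T = [{"a", "b"}, {"a", "b", "c"}, {"x", "y"}, {"x", "y", "z"}]
print(clope_phase1(T, 2.0))  # [0, 0, 1, 1]
print(clope_phase1(T, 0.5))  # [0, 0, 0, 0]
```

The two calls show why r matters: a larger repulsion penalizes clusters with many distinct items, so the same data splits into two clusters at r = 2.0 but collapses into one at r = 0.5.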

14.
Clustering is one of the most widely used techniques for exploratory data analysis. Spectral clustering, a popular modern clustering algorithm, has been shown to detect clusters more effectively than many traditional algorithms, with applications ranging from computer vision and information retrieval to social science and biology. As database sizes soar, the computational time and memory use of clustering algorithms grow accordingly. In this paper, we propose a parallel spectral clustering implementation based on MapReduce. Both computation and data storage are distributed, which solves the scalability problems of most existing algorithms. We empirically analyze the proposed implementation on benchmark networks and on a real social network dataset of about two million vertices and two billion edges crawled from Sina Weibo. The results show that the proposed implementation scales well, speeds up clustering without sacrificing quality, and processes massive datasets efficiently on clusters of commodity machines.
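
The abstract does not describe how the work is split into MapReduce jobs. A minimal in-process sketch, assuming the graph is stored as weighted edges (u, v, w), of one map/shuffle/reduce round that computes the degree vector, followed by a second map that produces the normalized affinity entries w_uv / sqrt(d_u d_v) of D^(-1/2) W D^(-1/2) used by normalized spectral clustering. The eigenvector step is omitted, and all function names are illustrative; a real deployment would run the map and reduce functions on a framework such as Hadoop.

```python
import math
from collections import defaultdict


def map_degree(edge):
    """Map: an undirected weighted edge (u, v, w) contributes w to both endpoints' degrees."""
    u, v, w = edge
    yield u, w
    yield v, w


def shuffle(pairs):
    """Group intermediate (key, value) pairs by key, as the MapReduce framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_degree(key, values):
    """Reduce: the degree d_i is the sum of the incident edge weights."""
    return key, sum(values)


def normalize_edge(edge, degree):
    """Second map: entry of D^(-1/2) W D^(-1/2) for one edge."""
    u, v, w = edge
    return u, v, w / math.sqrt(degree[u] * degree[v])


edges = [(0, 1, 1.0), (1, 2, 2.0)]
groups = shuffle(pair for e in edges for pair in map_degree(e))
degree = dict(reduce_degree(k, vs) for k, vs in groups.items())
print(degree)  # {0: 1.0, 1: 3.0, 2: 2.0}
print([normalize_edge(e, degree) for e in edges])
```

Each map call sees only one edge and each reduce call only one vertex's values, which is what allows both steps to be spread across machines.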

15.
As society increasingly emphasizes the need for clean and renewable energy systems, the electric power industry is undergoing profound changes to transform a passive, hierarchical grid into an active, open-access smart grid. Enabled by advances in sensing, communication, and actuation, future smart grids offer much broader opportunities for cross-fertilization between the traditional power engineering community and the communication community. This special issue pres-

16.
Part-of-speech (POS) tagging is a basic task in natural language processing. This paper builds a POS tagger based on an improved Hidden Markov Model (HMM) by employing word clustering and a syntactic parsing model. First, to overcome the defects of the classical HMM, a new statistical model, the Markov family model (MFM), is introduced. Second, to address data sparseness, we propose a bottom-up hierarchical word clustering algorithm. We then combine syntactic parsing with POS tagging. POS tagging experiments show that the improved model outperforms HMMs under the same test conditions, raising precision from 94.642% to 97.235%.
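
The abstract does not detail the MFM, but its HMM baseline decodes tags with the Viterbi algorithm. A minimal sketch of first-order HMM Viterbi decoding in log space, assuming strictly positive start and transition probabilities and a small hypothetical floor `unk` for unseen word emissions; the toy tag set and probabilities below are invented for illustration.

```python
import math


def viterbi(words, tags, start_p, trans_p, emit_p, unk=1e-6):
    """Most likely tag sequence under a first-order HMM, computed in log space.

    start_p[t] and trans_p[prev][t] must be > 0; emit_p[t] maps words to probabilities,
    and unseen words fall back to the hypothetical floor `unk`.
    """
    def emit(t, w):
        return math.log(emit_p[t].get(w, unk))

    best = [{t: math.log(start_p[t]) + emit(t, words[0]) for t in tags}]
    back = [{}]
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            # Best predecessor for tag t at position i.
            prev = max(tags, key=lambda p: best[i - 1][p] + math.log(trans_p[p][t]))
            best[i][t] = best[i - 1][prev] + math.log(trans_p[prev][t]) + emit(t, words[i])
            back[i][t] = prev
    # Trace back from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


tags = ["N", "V"]
start = {"N": 0.7, "V": 0.3}
trans = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.8, "V": 0.2}}
emit = {"N": {"dogs": 0.5, "bark": 0.1}, "V": {"dogs": 0.05, "bark": 0.6}}
print(viterbi(["dogs", "bark"], tags, start, trans, emit))  # ['N', 'V']
```

Working with log-probabilities avoids numeric underflow on long sentences; the unknown-word floor is where data sparseness bites, which is the problem the paper's word clustering targets.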

17.
A critical problem in semi-supervised kernel clustering is the selection of an optimal kernel parameter, since the parameter value has a significant impact on clustering performance. In this paper, we construct a semi-supervised kernel fuzzy c-means clustering algorithm that uses pairwise constraints to obtain an optimal kernel parameter. Combined with a kernel parameter initialization that uses the given constraints directly, a new optimization process is derived to estimate the optimal kernel parameter automatically. Experimental results show that, by making effective use of pairwise constraints, the proposed approach estimates the kernel parameter well in semi-supervised kernel fuzzy c-means clustering.
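
Kernel fuzzy c-means measures distances in feature space; for a Gaussian kernel K(x, y) = exp(-||x - y||^2 / (2 sigma^2)) the squared distance is ||phi(x) - phi(y)||^2 = 2(1 - K(x, y)). A very small sigma pushes every distance toward 2 and a very large one pushes every distance toward 0, so both destroy the cluster structure. A minimal sketch of that distance plus a hypothetical constraint-based sigma selector that prefers cannot-link pairs to be far and must-link pairs to be near; this heuristic is an illustration and not the paper's optimization procedure.

```python
import math


def gaussian_kernel(x, y, sigma):
    return math.exp(-math.dist(x, y) ** 2 / (2 * sigma ** 2))


def kernel_distance2(x, y, sigma):
    """Squared feature-space distance ||phi(x) - phi(y)||^2 = 2 * (1 - K(x, y)) for a Gaussian kernel."""
    return 2.0 * (1.0 - gaussian_kernel(x, y, sigma))


def pick_sigma(points, must_link, cannot_link, candidates):
    """Hypothetical heuristic: choose the sigma that best separates cannot-link pairs
    from must-link pairs in kernel space (not the paper's optimization procedure)."""
    def mean_dist(pairs, sigma):
        return sum(kernel_distance2(points[i], points[j], sigma) for i, j in pairs) / len(pairs)

    return max(candidates,
               key=lambda s: mean_dist(cannot_link, s) - mean_dist(must_link, s))


pts = [[0.0], [0.1], [1.0]]
print(pick_sigma(pts, must_link=[(0, 1)], cannot_link=[(0, 2)], candidates=[0.01, 0.5, 100.0]))
```

In this toy case the extreme candidates make both constraint types look alike, so the intermediate value 0.5 is selected.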

18.
A multi-parameter signal sorting algorithm for interleaved radar pulses in dense emitter environments is presented. The algorithm has two parts: pulse classification and pulse repetition interval (PRI) analysis. First, we propose dynamic distance clustering (DDC) for classification, which uses the multi-dimensional features of radar pulses for reliable classification. A method for estimating the similarity threshold in DDC is derived, which contributes to the efficiency of the algorithm. However, DDC is computationally expensive when there are many pulses, so an improved DDC (IDDC) algorithm is proposed to sort radar signals in real time. Finally, PRI analysis is applied to complete the sorting process. Simulation experiments and hardware implementations show that both algorithms are effective.
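
The abstract does not detail its PRI analysis step. A minimal sketch of the basic idea behind difference-histogram PRI methods (such as CDIF and SDIF), assuming noise-free times of arrival (TOAs): differences between pulse TOAs pile up at the true PRIs, even when pulse trains are interleaved. The bin width, `max_interval`, and the toy pulse trains are illustrative; practical methods add detection thresholds and subharmonic checks, and the DDC/IDDC classification step is not shown.

```python
from collections import Counter


def pri_candidates(toas, max_interval, bin_width=1.0, top=2):
    """Return the `top` most populated bins of pairwise TOA differences up to max_interval."""
    toas = sorted(toas)
    hist = Counter()
    for i, t0 in enumerate(toas):
        for t1 in toas[i + 1:]:
            d = t1 - t0
            if d > max_interval:
                break  # TOAs are sorted, so later differences are larger still
            hist[round(d / bin_width) * bin_width] += 1
    return [interval for interval, _ in hist.most_common(top)]


emitter_a = list(range(0, 100, 10))  # hypothetical emitter with PRI 10
emitter_b = list(range(3, 88, 14))   # hypothetical emitter with PRI 14
print(pri_candidates(emitter_a + emitter_b, max_interval=15))  # [10.0, 14.0]
```

Cross-emitter differences (here 1, 3, 5, 7, ...) scatter across many bins, while same-emitter differences concentrate in one bin per PRI, which is why the two true PRIs stand out.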

19.
20.
Erasure codes are widely used as the redundancy scheme in distributed storage systems. When a storage node fails, the repair process often requires transferring a large amount of data. Regenerating codes and hierarchical codes are two classes of codes proposed to reduce repair bandwidth. Regenerating codes reduce the amount of data transferred by each helper node, while hierarchical codes reduce the number of nodes participating in the repair process. In this paper, we propose a "sub-code nesting framework" that combines the two. The resulting regenerating hierarchical code has a repair degree as low as that of hierarchical codes and a lower repair cost than hierarchical codes. Our code achieves exact regeneration of the failed node and has the additional property of low update complexity.
