首页 | 官方网站   微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 171 毫秒
1.
Mining spatial association rules in image databases   总被引:2,自引:0,他引:2  
In this paper, we propose a novel spatial mining algorithm, called 9DLT-Miner, to mine the spatial association rules from an image database, where every image is represented by the 9DLT representation. The proposed method consists of two phases. First, we find all frequent patterns of length one. Next, we use frequent k-patterns (k ? 1) to generate all candidate (k + 1)-patterns. For each candidate pattern generated, we scan the database to count the pattern’s support and check if it is frequent. The steps in the second phase are repeated until no more frequent patterns can be found. Since our proposed algorithm prunes most of impossible candidates, it is more efficient than the Apriori algorithm. The experiment results show that 9DLT-Miner runs 2-5 times faster than the Apriori algorithm.  相似文献   

2.
Image database design based on 9D-SPA representation for spatial relations   总被引:2,自引:0,他引:2  
Spatial relationships between objects are important features for designing a content-based image retrieval system. We propose a new scheme, called 9D-SPA representation, for encoding the spatial relations in an image. With this representation, important functions of intelligent image database systems such as visualization, browsing, spatial reasoning, iconic indexing, and similarity retrieval can be easily achieved. The capability of discriminating images based on 9D-SPA representation is much more powerful than any spatial representation method based on minimum bounding rectangles or centroids of objects. The similarity measures using 9D-SPA representation provide a wide range of fuzzy matching capability in similarity retrieval to meet different user's requirements. Experimental results showed that our system is very effective in terms of recall and precision. In addition, the 9D-SPA representation can be incorporated into a two-level index structure to help reduce the search space of each query processing. The experimental results also demonstrated that, on average, only 0.1254 percent /spl sim/ 1.6829 percent of symbolic pictures (depending on various degrees of similarity) were accessed per query in an image database containing 50,000 symbolic pictures.  相似文献   

3.
Frequent pattern mining discovers sets of items that frequently appear together in a transactional database; these can serve valuable economic and research purposes. However, if the database contains sensitive data (e.g., user behavior records, electronic health records), directly releasing the discovered frequent patterns with support counts will carry significant risk to the privacy of individuals. In this paper, we study the problem of how to accurately find the top-k frequent patterns with noisy support counts on transactional databases while satisfying differential privacy. We propose an algorithm, called differentially private frequent pattern (DFP-Growth), that integrates a Laplace mechanism and an exponential mechanism to avoid privacy leakage. We theoretically prove that the proposed method is (λ, δ)-useful and differentially private. To boost the accuracy of the returned noisy support counts, we take consistency constraints into account to conduct constrained inference in the post-processing step. Extensive experiments, using several real datasets, confirm that our algorithm generates highly accurate noisy support counts and top-k frequent patterns.  相似文献   

4.
Okbin Lee 《Information Sciences》2006,176(15):2148-2160
In order to maintain load balancing in a distributed network, each node should obtain workload information from all the nodes in the network. To accomplish this, this processing requires O(v2) communication complexity, where v is the number of nodes. First, we present a new synchronous dynamic distributed load balancing algorithm on a (vk + 1, 1)-configured network applying a symmetric balanced incomplete block design, where v = k2 + k + 1. Our algorithm designs a special adjacency matrix and then transforms it to (vk + 1, 1)-configured network for an efficient communication. It requires only communication complexity and each node receives workload information from all the nodes without redundancy since each link has the same amount of traffic for transferring workload information. Later, this algorithm is revised for distributed networks and is analyzed in terms of efficiency of load balancing.  相似文献   

5.
A recent approach to improve the performance of XML query evaluation is to cache the query results of frequent query patterns. Unfortunately, discovering these frequent query patterns is an expensive operation. In this paper, we develop a two-pass mining algorithm 2PXMiner that guarantees the discovery of frequent query patterns by scanning the database at most twice. By exploiting a transaction summary data structure, and an enumeration tree, we are able to determine the upper bounds of the frequencies of the candidate patterns, and to quickly prune away the infrequent patterns. We also design an index to trace the repeating candidate subtrees generated by sibling repetition, thus avoiding redundant computations. Experiments results indicate that 2PXMiner is both efficient and scalable.  相似文献   

6.
Mining top-rank-k frequent patterns is a popular data mining task, which consists of discovering the patterns in a transaction database that belong to the k first ranks in terms of support. Although, several algorithms have been proposed for this task, it remains computationally expensive. To address this issue, this paper proposes a novel algorithm named BTK. It relies on a novel tree structure named TB-tree to store crucial information about frequent patterns. Moreover, BTK employs a new B-list structure to store information about patterns, and relies on subsume indexes to reduce the search space and speed up the discovery of top-rank-k frequent patterns. BTK also uses an early pruning strategy and an effective threshold raising mechanism. Additionally, BTK introduces two efficient procedures for respectively generating subsume indexes and intersecting B-lists. Extensive experiments were conducted on several datasets to evaluate the efficiency of the proposed algorithm. Results show that BTK is highly efficient and competitive.  相似文献   

7.
As a generalization of the precise and pessimistic diagnosis strategies of system-level diagnosis of multicomputers, the t/k diagnosis strategy can significantly improve the self-diagnosing capability of a system at the expense of no more than k fault-free processors (nodes) being mistakenly diagnosed as faulty. In the case k ? 2, to our knowledge, there is no known t/k diagnosis algorithm for general diagnosable system or for any specific system. Hypercube is a popular topology for interconnecting processors of multicomputers. It is known that an n-dimensional cube is (4n − 9)/3-diagnosable. This paper addresses the (4n − 9)/3 diagnosis of n-dimensional cube. By exploring the relationship between a largest connected component of the 0-test subgraph of a faulty hypercube and the distribution of the faulty nodes over the network, the fault diagnosis of an n-dimensional cube can be reduced to those of two constituent (n − 1)-dimensional cubes. On this basis, a diagnosis algorithm is presented. Given that there are no more than 4n − 9 faulty nodes, this algorithm can isolate all faulty nodes to within a set in which at most three nodes are fault-free. The proposed algorithm can operate in O(N log2 N) time, where N = 2n is the total number of nodes of the hypercube. The work of this paper provides insight into developing efficient t/k diagnosis algorithms for larger k value and for other types of interconnection networks.  相似文献   

8.
In this paper, we propose two parallel algorithms for mining maximal frequent itemsets from databases. A frequent itemset is maximal if none of its supersets is frequent. One parallel algorithm is named distributed max-miner (DMM), and it requires very low communication and synchronization overhead in distributed computing systems. DMM has the local mining phase and the global mining phase. During the local mining phase, each node mines the local database to discover the local maximal frequent itemsets, then they form a set of maximal candidate itemsets for the top-down search in the subsequent global mining phase. A new prefix tree data structure is developed to facilitate the storage and counting of the global candidate itemsets of different sizes. This global mining phase using the prefix tree can work with any local mining algorithm. Another parallel algorithm, named parallel max-miner (PMM), is a parallel version of the sequential max-miner algorithm (Proc of ACM SIGMOD Int Conf on Management of Data, 1998, pp 85–93). Most of existing mining algorithms discover the frequent k-itemsets on the kth pass over the databases, and then generate the candidate (k + 1)-itemsets for the next pass. Compared to those level-wise algorithms, PMM looks ahead at each pass and prunes more candidate itemsets by checking the frequencies of their supersets. Both DMM and PMM were implemented on a cluster of workstations, and their performance was evaluated for various cases. They demonstrate very good performance and scalability even when there are large maximal frequent itemsets (i.e., long patterns) in databases.
Congnan LuoEmail:
  相似文献   

9.
An order-clique-based approach for mining maximal co-locations   总被引:2,自引:0,他引:2  
Most algorithms for mining spatial co-locations adopt an Apriori-like approach to generate size-k prevalence co-locations after size-(k − 1) prevalence co-locations. However, generating and storing the co-locations and table instances is costly. A novel order-clique-based approach for mining maximal co-locations is proposed in this paper. The efficiency of the approach is achieved by two techniques: (1) the spatial neighbor relationships and the size-2 prevalence co-locations are compressed into extended prefix-tree structures, which allows the order-clique-based approach to mine candidate maximal co-locations and co-location instances; and (2) the co-location instances do not need to be stored after computing some characteristics of the corresponding co-location, which significantly reduces the execution time and space required for mining maximal co-locations. The performance study shows that the new method is efficient for mining both long and short co-location patterns, and is faster than some other methods (in particular the join-based method and the join-less method).  相似文献   

10.
We define an interconnection network AQn,k which we call the augmented k-ary n-cube by extending a k-ary n-cube in a manner analogous to the existing extension of an n-dimensional hypercube to an n-dimensional augmented cube. We prove that the augmented k-ary n-cube AQn,k has a number of attractive properties (in the context of parallel computing). For example, we show that the augmented k-ary n-cube AQn,k: is a Cayley graph, and so is vertex-symmetric, but not edge-symmetric unless n = 2; has connectivity 4n − 2 and wide-diameter at most max{(n − 1)k − (n − 2), k + 7}; has diameter , when n = 2; and has diameter at most , for n ? 3 and k even, and at most , for n ? 3 and k odd.  相似文献   

11.
In this paper, a new algorithm is developed to reduce the computational complexity of Ward’s method. The proposed approach uses a dynamic k-nearest-neighbor list to avoid the determination of a cluster’s nearest neighbor at some steps of the cluster merge. Double linked algorithm (DLA) can significantly reduce the computing time of the fast pairwise nearest neighbor (FPNN) algorithm by obtaining an approximate solution of hierarchical agglomerative clustering. In this paper, we propose a method to resolve the problem of a non-optimal solution for DLA while keeping the corresponding advantage of low computational complexity. The computational complexity of the proposed method DKNNA + FS (dynamic k-nearest-neighbor algorithm with a fast search) in terms of the number of distance calculations is O(N2), where N is the number of data points. Compared to FPNN with a fast search (FPNN + FS), the proposed method using the same fast search algorithm (DKNNA + FS) can reduce the computing time by a factor of 1.90-2.18 for the data set from a real image. In comparison with FPNN + FS, DKNNA + FS can reduce the computing time by a factor of 1.92-2.02 using the data set generated from three images. Compared to DLA with a fast search (DLA + FS), DKNNA + FS can decrease the average mean square error by 1.26% for the same data set.  相似文献   

12.
In this paper, we propose a modified version of the k-nearest neighbor (kNN) algorithm. We first introduce a new affinity function for distance measure between a test point and a training point which is an approach based on local learning. A new similarity function using this affinity function is proposed next for the classification of the test patterns. The widely used convention of k, i.e., k = [√N] is employed, where N is the number of data used for training purpose. The proposed modified kNN algorithm is applied on fifteen numerical datasets from the UCI machine learning data repository. Both 5-fold and 10-fold cross-validations are used. The average classification accuracy, obtained from our method is found to exceed some well-known clustering algorithms.  相似文献   

13.
Little work has been reported in the literature to support k-nearest neighbor (k-NN) searches/queries in hybrid data spaces (HDS). An HDS is composed of a combination of continuous and non-ordered discrete dimensions. This combination presents new challenges in data organization and search ordering. In this paper, we present an algorithm for k-NN searches using a multidimensional index structure in hybrid data spaces. We examine the concept of search stages and use the properties of an HDS to derive a new search heuristic that greatly reduces the number of disk accesses in the initial stage of searching. Further, we present a performance model for our algorithm that estimates the cost of performing such searches. Our experimental results demonstrate the effectiveness of our algorithm and the accuracy of our performance estimation model.  相似文献   

14.
Sequential pattern mining has been studied extensively in the data mining community. Most previous studies require the specification of a min_support threshold for mining a complete set of sequential patterns satisfying the threshold. However, in practice, it is difficult for users to provide an appropriate min_support threshold. To overcome this difficulty, we propose an alternative mining task: mining top-k frequent closed sequential patterns of length no less than min_, where k is the desired number of closed sequential patterns to be mined and min_ is the minimal length of each pattern. We mine the set of closed patterns because it is a compact representation of the complete set of frequent patterns. An efficient algorithm, called TSP, is developed for mining such patterns without min_support. Starting at (absolute) min_support=1, the algorithm makes use of the length constraint and the properties of top-k closed sequential patterns to perform dynamic support raising and projected database pruning. Our extensive performance study shows that TSP has high performance. In most cases, it outperforms the efficient closed sequential pattern-mining algorithm, CloSpan, even when the latter is running with the best tuned min_support threshold. Thus, we conclude that, for sequential pattern mining, mining top-k frequent closed sequential patterns without min_support is more preferable than the traditional min_support-based mining.  相似文献   

15.
16.
The torus is a popular interconnection topology and several commercial multicomputers use a torus as the basis of their communication network. Moreover, there are many parallel algorithms with torus-structured and mesh-structured task graphs have been developed. If one network can embed a mesh or torus network, the algorithms with mesh-structured or torus-structured can also be used in this network. Thus, the problem of embedding meshes or tori into networks is meaningful for parallel computing. In this paper, we prove that for n ? 6 and 1 ? m ? ⌈n/2⌉ − 1, a family of 2m disjoint k-dimensional tori of size 2s1×2s2×?×2sk each can be embedded in an n-dimensional crossed cube with unit dilation, where each si ? 2, , and max1?i?k{si} ? 3 if n is odd and ; otherwise, max1?i?k{si} ? n − 2m − 1. A new concept, cycle skeleton, is proposed to construct a dynamic programming algorithm for embedding a desired torus into the crossed cube. Furthermore, the time complexity of the algorithm is linear with respect to the size of desired torus. As a consequence, a family of disjoint tori can be simulated on the same crossed cube efficiently and in parallel.  相似文献   

17.
In the present scenario of global economy and World Wide Web, large sets of evolving and distributed data can be handled efficiently by incremental data mining. Frequent patterns are very important in knowledge discovery and data mining process, such as mining of association rules, correlations. FP-tree is a very versatile data structure used for mining of frequent patterns in knowledge discovery and data mining process. FP-tree is a compact representation of transaction database that contains frequency information of all relevant frequent patterns (FP) of the database. All of the existing incremental frequent pattern mining algorithms, such as AFPIM, CATS, CanTree, CP-tree, and SPO-tree, perform incremental mining by processing one transaction of the incremental part of database at a time and updating it to the FP-tree of initial (original) database. Here, in this paper, we propose a novel method that takes advantage of FP-tree representation of incremental transaction database for incremental mining. We propose a batch incremental processing algorithm BIT_FPGrowth that restructures and merges two small consecutive duration FP-trees to obtain a FP-tree of the FP-Growth algorithm. Our BIT_FPGrowth uses FP-tree as preprocessed data repository to get transactions (i.e., item-sets), unlike other sequential incremental algorithms that read transactions from database. BIT_FPGrowth algorithm takes less time for constructing FP-tree. Our experimental results show that, as the size of the database increases, increase in runtime of BIT_FPGrowth is much less and is least of all the other algorithms.  相似文献   

18.
Mining top?k frequent patterns without minimum support threshold   总被引:1,自引:1,他引:0  
Finding frequent patterns play an important role in mining association rules, sequences, episodes, Web log mining and many other interesting relationships among data. Frequent pattern mining methods often produce a huge number of frequent itemsets that is not feasible for effective usage. The number of highly correlated patterns is usually very small and may even be one. Most of the existing frequent pattern mining techniques often require the setting of many input parameters and may involve multiple passes over the database. Minimum support is the widely used parameter in frequent pattern mining to discover statistically significant patterns. Specifying appropriate minimum support is a challenging task for a data analyst as the choice of minimum support value is somewhat arbitrary. Generally, it is required to repeatedly execute an algorithm, heuristically tuning the value of minimum support over a wide range, until the desired result is obtained, certainly, a very time-consuming process. Setting up an inappropriate minimum support may also cause an algorithm to fail in finding the true patterns. We present a novel method to efficiently retrieve top few maximal frequent patterns in order of significance without use of the minimum support parameter. Instead, we are only required to specify a more human understandable parameter, namely the desired number itemsets k. Our technique requires only a single pass over the database and generation of length two itemsets. The association ratio graph is proposed as a compact structure containing concise information, which is created in time quadratic to the size of the database. Algorithms are described for using this graph structure to discover top-most and top-k maximal frequent itemsets without minimum support threshold. To effectively achieve this, the method employs construction of an all path source-to-destination tree to discover all maximal cycles in the graph. The results can be ranked in decreasing order of significance. Results are presented demonstrating the performance advantages to be gained from the use of this approach.  相似文献   

19.
A reverse k-nearest neighbor (RkNN) query retrieves the data points which regard the query point as one of their respective k nearest neighbors. A bi-chromatic reverse k-nearest neighbor (BRkNN) query is a variant of the RkNN query, considering two types of data. Given two types of data G and C, a BRkNN query regarding a data point q in G retrieves the data points from C that regard q as one of their respective k-nearest neighbors among the data points in G. Many existing approaches answer either the RkNN query or the BRkNN query. Different from these approaches, in this paper, we make the first attempt to propose a top-n query based on the concept of BRkNN queries, which ranks the data points in G and retrieves the top-n points according to the cardinalities of the corresponding BRkNN answer sets. For efficiently answering this top-n query, we construct the Voronoi Diagram of G to index the data points in G and C. From the information associated with the Voronoi Diagram of G, the upper bound of the cardinality of the BRkNN answer sets for each data point in G can be quickly computed. Moreover, based on an existing approach to answering the RkNN query and the characteristics of the Voronoi Diagram of G, we propose a method to find the candidate region regarding a BRkNN query, which tightens the corresponding search space. Finally, based on the triangle inequality, we propose an efficient refinement algorithm for finding the exact BRkNN answers from the candidate regions. To evaluate our approach on answering the top-n query, it is compared with an approach which applies a state-of-the-art algorithm for answering the BRkNN query to each data point in G. The experiment results reveal that our approach has a much better performance.  相似文献   

20.
We investigate the periodic nature of the positive solutions of the fuzzy difference equation , where k, m are positive integers, A0, A1, are positive fuzzy numbers and the initial values xi, i = −d, −d + 1, … , −1, d = max{km}, are positive fuzzy numbers. In addition, we give conditions so that the solutions of this equation are unbounded.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司    京ICP备09084417号-23

京公网安备 11010802026262号