Similar documents
 Found 20 similar documents (search time: 15 ms)
1.
Mining Fuzzy Multiple-Level Association Rules from Quantitative Data   Cited by: 2 (self-citations: 0, citations by others: 2)
Machine-learning and data-mining techniques have been developed to turn data into useful task-oriented knowledge. Most algorithms for mining association rules identify relationships among transactions using binary values and find rules at a single-concept level. Transactions with quantitative values and items with hierarchical relationships are, however, commonly seen in real-world applications. This paper proposes a fuzzy multiple-level mining algorithm for extracting knowledge implicit in transactions stored as quantitative values. The proposed algorithm adopts a top-down progressively deepening approach to finding large itemsets. It integrates fuzzy-set concepts, data-mining technologies and multiple-level taxonomy to find fuzzy association rules from transaction data sets. Each item uses only the linguistic term with the maximum cardinality in later mining processes, thus making the number of fuzzy regions to be processed the same as the number of original items. The algorithm therefore focuses on the most important linguistic terms for reduced time complexity.
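
The step of keeping only the linguistic term with the maximum cardinality for each item can be sketched roughly as follows. This is an illustrative sketch, not the paper's implementation: the triangular membership functions in `TERMS`, the function names, and the assumption that quantities lie in the range 0 to 11 are all hypothetical, and the multiple-level taxonomy is left out.

```python
from collections import defaultdict

# Hypothetical triangular membership functions (left foot, peak, right foot).
# Quantities are assumed to lie in [0, 11]; none of these values come from the paper.
TERMS = {"Low": (0, 0, 6), "Middle": (1, 6, 11), "High": (6, 11, 11)}


def triangular(x, a, b, c):
    """Membership degree of x in a triangle with feet a, c and peak b."""
    if x == b:
        return 1.0
    if x <= a or x >= c:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def max_cardinality_terms(transactions):
    """Return {item: (term, scalar cardinality)} for the best term of each item."""
    counts = defaultdict(float)
    for transaction in transactions:
        for item, quantity in transaction.items():
            for term, (a, b, c) in TERMS.items():
                # Scalar cardinality: sum of membership degrees over all transactions.
                counts[(item, term)] += triangular(quantity, a, b, c)
    best = {}
    for (item, term), count in counts.items():
        if item not in best or count > best[item][1]:
            best[item] = (term, count)
    return best


transactions = [{"milk": 3, "bread": 8}, {"milk": 2, "bread": 11}]
best = max_cardinality_terms(transactions)
```

Because each item contributes exactly one fuzzy region afterwards, later itemset generation works on as many regions as there are original items.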

2.
Data mining is the process of extracting desirable knowledge or interesting patterns from existing databases for specific purposes. Most conventional data-mining algorithms identify the relationships among transactions using binary values; however, transactions with quantitative values are commonly seen in real-world applications. This paper thus proposes a new data-mining algorithm for extracting interesting knowledge from transactions stored as quantitative values. The proposed algorithm integrates fuzzy set concepts and the apriori mining algorithm to find interesting fuzzy association rules in given transaction data sets. Experiments with student grades at I-Shou University were also made to verify the performance of the proposed algorithm.

3.
Data mining is the process of extracting desirable knowledge or interesting patterns from existing databases for specific purposes. In real-world applications, transactions may contain quantitative values and each item may have a lifespan from a temporal database. In this paper, we thus propose a data mining algorithm for deriving fuzzy temporal association rules. It first transforms each quantitative value into a fuzzy set using the given membership functions. Meanwhile, item lifespans are collected and recorded in a temporal information table through a transformation process. The algorithm then calculates the scalar cardinality of each linguistic term of each item. A mining process based on fuzzy counts and item lifespans is then performed to find fuzzy temporal association rules. Experiments are finally performed on two simulation datasets and the foodmart dataset to show the effectiveness and the efficiency of the proposed approach.

4.
Data mining is most commonly used in attempts to induce association rules from databases which can help decision-makers easily analyze the data and make good decisions regarding the domains concerned. Different studies have proposed methods for mining association rules from databases with crisp values. However, the data in many real-world applications have a certain degree of imprecision. In this paper we address this problem, and propose a new data-mining algorithm for extracting interesting knowledge from databases with imprecise data. The proposed algorithm integrates imprecise data concepts and the fuzzy apriori mining algorithm to find interesting fuzzy association rules in given databases. Experiments for diagnosing dyslexia in early childhood were made to verify the performance of the proposed algorithm.

5.
6.
Data mining is most commonly used in attempts to induce association rules from transaction data. Transactions in real-world applications, however, usually consist of quantitative values. This paper thus proposes a fuzzy data-mining algorithm for extracting both association rules and membership functions from quantitative transactions. We present a GA-based framework for finding membership functions suitable for mining problems and then use the final best set of membership functions to mine fuzzy association rules. The fitness of each chromosome is evaluated by the number of large 1-itemsets generated from part of the previously proposed fuzzy mining algorithm and by the suitability of the membership functions. Experimental results also show the effectiveness of the framework.

7.
Mining useful information and helpful knowledge from large databases has evolved into an important research area in recent years. Among the classes of knowledge derived, finding sequential patterns in temporal transaction databases is very important since it can help model customer behavior. In the past, researchers usually assumed databases were static to simplify data-mining problems. In real-world applications, new transactions may be added into databases frequently. Designing an efficient and effective mining algorithm that can maintain sequential patterns as a database grows is thus important. In this paper, we propose a novel incremental mining algorithm for maintaining sequential patterns based on the concept of pre-large sequences to reduce the need for rescanning original databases. Pre-large sequences are defined by a lower support threshold and an upper support threshold that act as gaps to avoid the movements of sequences directly from large to small and vice versa. The proposed algorithm does not require rescanning original databases until the accumulated number of newly added customer sequences exceeds a safety bound, which depends on database size. Thus, as databases grow larger, the number of new transactions allowed before database rescanning is required also grows. The proposed approach thus becomes increasingly efficient as databases grow.
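
The two-threshold idea behind pre-large sequences can be illustrated with a small sketch. It only shows how a lower and an upper support threshold split patterns into three groups; the function name, the data layout, and the choice of inclusive comparisons are assumptions, and the safety bound itself is not reproduced because the abstract does not state its formula.

```python
def classify_by_support(support_counts, num_customers, lower, upper):
    """Split patterns into large, pre-large, and small sets by support ratio.

    support_counts maps a pattern (a tuple of items) to the number of
    customer sequences that contain it. lower and upper are the two
    support thresholds, with 0 <= lower <= upper <= 1.
    """
    large, pre_large, small = set(), set(), set()
    for pattern, count in support_counts.items():
        ratio = count / num_customers
        if ratio >= upper:
            large.add(pattern)
        elif ratio >= lower:
            # The buffer zone: not yet large, but too frequent to discard.
            pre_large.add(pattern)
        else:
            small.add(pattern)
    return large, pre_large, small


counts = {("A",): 8, ("A", "B"): 5, ("C",): 2}
large, pre_large, small = classify_by_support(counts, 10, lower=0.4, upper=0.6)
```

Keeping the pre-large set around is what lets a small pattern become large only by first passing through the buffer zone, so a modest batch of new sequences cannot invalidate the stored results.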

8.
Genetic-Fuzzy Data Mining With Divide-and-Conquer Strategy   Cited by: 1 (self-citations: 0, citations by others: 1)
Data mining is most commonly used in attempts to induce association rules from transaction data. Most previous studies focused on binary-valued transaction data. Transaction data in real-world applications, however, usually consist of quantitative values. This paper, thus, proposes a fuzzy data-mining algorithm for extracting both association rules and membership functions from quantitative transactions. A genetic algorithm (GA)-based framework for finding membership functions suitable for mining problems is proposed. The fitness of each set of membership functions is evaluated by the fuzzy supports of the linguistic terms in the large 1-itemsets and by the suitability of the derived membership functions. Evaluation by the fuzzy supports of large 1-itemsets is much faster than evaluation that considers all itemsets or interesting association rules. It also allows the derivation of the membership functions to be divided and conquered item by item. The proposed GA framework, thus, maintains multiple populations, each for one item's membership functions. The final best sets of membership functions in all the populations are then gathered together to be used for mining fuzzy association rules. Experiments are conducted to analyze different fitness functions and different settings of supports and confidences. Experiments are also conducted to compare the proposed algorithm, the one with uniform fuzzy partition, and the existing one without divide-and-conquer, with results validating the performance of the proposed algorithm.

9.
Time series analysis has always been an important and interesting research field because time series appear frequently in different applications. In the past, many approaches based on regression, neural networks and other mathematical models were proposed to analyze time series. In this paper, we attempt to use data mining techniques to analyze time series. Many previous studies on data mining have focused on handling binary-valued data. Time series data, however, are usually quantitative values. We thus extend our previous fuzzy mining approach for handling time-series data to find linguistic association rules. The proposed approach first uses a sliding window to generate continuous subsequences from a given time series and then analyzes the fuzzy itemsets from these subsequences. Appropriate post-processing is then performed to remove redundant patterns. Experiments are also made to show the performance of the proposed mining algorithm. Since the final results are represented by linguistic rules, they will be friendlier to humans than quantitative representations.
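
The sliding-window step described above can be sketched in a few lines. This is a minimal illustration under the assumption that the window advances one position at a time; the function name and the step size are not taken from the paper.

```python
def sliding_windows(series, window):
    """Return every contiguous subsequence of length `window`, in order."""
    if window <= 0:
        raise ValueError("window must be positive")
    # A series shorter than the window yields no subsequences.
    return [tuple(series[i:i + window]) for i in range(len(series) - window + 1)]


subsequences = sliding_windows([1, 2, 3, 4], 3)
```

Each subsequence can then be treated like one transaction whose positions play the role of items, which is what makes an association-rule style of mining applicable to a single long series.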

10.
Mining linguistic browsing patterns in the world wide web   Cited by: 2 (self-citations: 0, citations by others: 2)
World-wide-web applications have grown very rapidly and have made a significant impact on computer systems. Among them, web browsing for useful information may be the most common. Because it is so heavily used, efficient and effective web retrieval has become a very important research topic in this field. Data mining is the process of extracting desirable knowledge or interesting patterns from existing databases for a certain purpose. In this paper, we use data mining techniques to discover relevant browsing behavior from log data in web servers, which helps in making rules for retrieval of web pages. The browsing time of a customer on each web page is used to analyze the retrieval behavior. Since the data collected are numeric, fuzzy concepts are used to process them and to form linguistic terms. A sophisticated web-mining algorithm is thus proposed to find relevant browsing behavior from the linguistic data. Each page uses only the linguistic term with the maximum cardinality in later mining processes, thus making the number of fuzzy regions to be processed the same as the number of pages. Computational time can thus be greatly reduced. The mined patterns exhibit the browsing behavior and can be used to provide appropriate suggestions to web-server managers.

11.
A genetic-fuzzy mining approach for items with multiple minimum supports   Cited by: 2 (self-citations: 2, citations by others: 0)
Data mining is the process of extracting desirable knowledge or interesting patterns from existing databases for specific purposes. Mining association rules from transaction data is most commonly seen among the mining techniques. Most of the previous mining approaches set a single minimum support threshold for all the items and identify the relationships among transactions using binary values. In the past, we proposed a genetic-fuzzy data-mining algorithm for extracting both association rules and membership functions from quantitative transactions under a single minimum support. In real applications, different items may have different criteria to judge their importance. In this paper, we thus propose an algorithm which combines clustering, fuzzy and genetic concepts for extracting reasonable multiple minimum support values, membership functions and fuzzy association rules from quantitative transactions. It first uses the k-means clustering approach to gather similar items into groups. All items in the same cluster are considered to have similar characteristics and are assigned similar values for initializing a better population. Each chromosome is then evaluated by the criteria of requirement satisfaction and suitability of membership functions to estimate its fitness value. Experimental results also show the effectiveness and the efficiency of the proposed approach.

12.
The goal of data mining is to find interesting and meaningful patterns in large databases. In some real applications, many data are quantitative and linguistic. Fuzzy data mining was thus proposed to discover fuzzy knowledge from this kind of data. In the past, two mining algorithms based on ant colony systems were proposed to find suitable membership functions for fuzzy association rules. They transformed the problem into a multi-stage graph, with each route representing a possible set of membership functions, and then used the ant colony system to solve it. They, however, searched for solutions in a discrete solution space in which the end points of membership functions could be adjusted only in a discrete way. The paper, thus, extends the original approaches to a continuous search space, and a fuzzy mining algorithm based on the continuous ant approach is proposed. The end points of the membership functions may be moved in the continuous real-number space. The encoding representation and the operators are also designed to suit the continuous space, such that the actual global optimal solution is contained in the search space. Besides, the proposed approach does not have fixed edges and nodes in the search process. It can dynamically produce search edges according to the distribution functions of pheromones in the solution space. Thus, it can get a better nearly global optimal solution than the previous two ant-based fuzzy mining approaches. The experimental results show the good performance of the proposed approach as well.

13.
The rough-set theory proposed by Pawlak has been widely used in dealing with data classification problems. The original rough-set model is, however, quite sensitive to noisy data. Ziarko thus proposed the variable precision rough-set model to deal with noisy data and uncertain information. This model allowed for some degree of uncertainty and misclassification in the mining process. Conventionally, the mining algorithms based on the rough-set theory identify the relationships among data using crisp attribute values; however, data with quantitative values are commonly seen in real-world applications. This paper thus deals with the problem of producing a set of fuzzy certain and fuzzy possible rules from quantitative data with a predefined tolerance degree of uncertainty and misclassification. A new method, which combines the variable precision rough-set model and the fuzzy set theory, is thus proposed to solve this problem. It first transforms each quantitative value into a fuzzy set of linguistic terms using membership functions and then calculates the fuzzy β-lower and the fuzzy β-upper approximations. The certain and possible rules are then generated based on these fuzzy approximations. These rules can then be used to classify unknown objects. The paper thus extends the existing rough-set mining approaches to process quantitative data with tolerance of noise and uncertainty.

14.
Frequent sequential pattern mining has become one of the most important tasks in data mining. It has many applications, such as sequential analysis, classification, and prediction. How to generate candidates and how to control the combinatorically explosive number of intermediate subsequences are the most difficult problems. Intelligent systems such as recommender systems, expert systems, and business intelligence systems use only a few patterns, namely those that satisfy a number of defined conditions. Challenges include the mining of top-k patterns, top-rank-k patterns, closed patterns, and maximal patterns. In many cases, end users need to find itemsets that occur with a sequential pattern. Therefore, this paper proposes approaches for mining top-k co-occurrence items usually found with a sequential pattern. The Naive Approach Mining (NAM) algorithm discovers top-k co-occurrence items by directly scanning the sequence database to determine the frequency of items. The Vertical Approach Mining (VAM) algorithm is based on vertical database scanning. The Vertical with Index Approach Mining (VIAM) algorithm is based on a vertical database with index scanning. VAM and VIAM use pruning strategies to reduce the search space, thus improving performance. VAM and VIAM are especially effective in mining the co-occurrence items of a long input pattern. The three algorithms were evaluated using real-world databases. The experimental results show that these algorithms perform well, especially VAM and VIAM.
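
A direct-scan approach in the spirit of NAM might look like the sketch below. It is not the paper's algorithm: the definition of a co-occurrence item used here (any item outside the pattern that appears in a sequence containing the pattern, counted once per sequence), the tie-breaking by item name, and all function names are assumptions made for illustration.

```python
from collections import Counter


def contains(sequence, pattern):
    """True if the pattern's itemsets occur in order within the sequence."""
    position = 0
    for itemset in sequence:
        # Greedy matching: take the earliest itemset that covers the next pattern element.
        if position < len(pattern) and set(pattern[position]) <= set(itemset):
            position += 1
    return position == len(pattern)


def top_k_cooccurrence(database, pattern, k):
    """Scan every sequence and rank the items that accompany the pattern."""
    pattern_items = {item for itemset in pattern for item in itemset}
    counts = Counter()
    for sequence in database:
        if contains(sequence, pattern):
            items = {item for itemset in sequence for item in itemset}
            counts.update(items - pattern_items)
    # Sort by descending count, then by item name so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return ranked[:k]


database = [
    [["a"], ["b", "c"], ["d"]],
    [["a", "e"], ["b"], ["c"]],
    [["b"], ["a"]],
    [["a"], ["b"], ["c"], ["e"]],
]
result = top_k_cooccurrence(database, [["a"], ["b"]], 2)
```

Every query rescans the whole database, which is the cost that vertical formats and pruning are meant to avoid.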

15.
Modification of records in databases is common in real-world applications. Developing an efficient and effective mining algorithm to maintain discovered information as the records in a database are updated is thus quite important in the field of data mining. Although association rules for modification of records can be maintained by using deletion and insertion procedures, this requires twice the computation time needed for a single procedure. In this paper, we present a new modification algorithm to resolve this issue. The concept of pre-large itemsets is used to reduce the need for rescanning original databases and to save maintenance costs. The proposed algorithm does not require rescanning of original databases until a specified number of records have been modified. If the database is large, then the number of modified records allowed will also be large. This characteristic is especially useful for real-world applications.

16.
In real-world applications, transactions usually consist of quantitative values. Many fuzzy data mining approaches have thus been proposed for finding fuzzy association rules with a predefined minimum support from the given quantitative transactions. However, the common problems of those approaches are that an appropriate minimum support is hard to set, and the derived rules usually expose common-sense knowledge which may not be interesting from a business point of view. In this paper, an algorithm for mining fuzzy coherent rules is proposed for overcoming those problems with the properties of propositional logic. It first transforms quantitative transactions into fuzzy sets. Then, the generated fuzzy sets are collected to generate candidate fuzzy coherent rules. Finally, contingency tables are calculated and used to check whether each candidate satisfies the four criteria. If it does, it is a fuzzy coherent rule. Experiments on the foodmart dataset are also made to show the effectiveness of the proposed algorithm.

17.
In the past, many algorithms were proposed to adopt fuzzy-set theory for discovering fuzzy association rules from quantitative databases. The fuzzy frequent pattern (FFP)-tree and the compressed fuzzy frequent pattern (CFFP)-tree algorithms were respectively proposed to mine the incomplete fuzzy frequent itemsets from the tree-based structures. The multiple fuzzy frequent pattern (MFFP)-tree algorithm was later proposed to keep more linguistic terms for mining fuzzy frequent itemsets. Since the MFFP-tree algorithm inherits the property of the FFP-tree algorithm, numerous tree nodes are required to build the MFFP-tree structure for mining the desired multiple fuzzy frequent itemsets. In this paper, the compressed multiple fuzzy frequent pattern (CMFFP)-tree algorithm is designed to keep not only the linguistic term with the maximum membership value but also the other frequent linguistic terms for mining the complete fuzzy frequent itemsets. In the designed CMFFP-tree algorithm, the multiple frequent linguistic terms are sorted in descending order of their occurrence frequencies to build the CMFFP-tree structure. The construction process is the same as that of the CFFP-tree algorithm, except that more information is kept for the later mining process to discover the complete fuzzy frequent itemsets. Each node in the CMFFP-tree uses an additional array to keep the membership values of its prefix path, obtained by the intersection operation. A CMFFP-mine algorithm is also designed to efficiently mine the multiple fuzzy frequent itemsets from the developed CMFFP-tree structure. Experiments are then conducted to show the performance of the proposed CMFFP-tree algorithm in terms of execution time and the number of tree nodes, compared to those of the MFFP-tree and CFFP-tree algorithms.

18.
Weighted sequential pattern mining has recently been discussed in the field of data mining. Unlike traditional sequential pattern mining, this kind of mining considers the different significances of items in real applications, such as cost or profit. Most of the related studies adopt the maximum weighted upper-bound model to find weighted sequential patterns, but they generate a large number of unpromising candidate subsequences. In this study, we thus propose an efficient approach for finding weighted sequential patterns from sequence databases. In particular, the proposed approach uses a tightening strategy to obtain more accurate weighted upper-bounds for subsequences in mining. The experimental evaluation shows that the proposed approach performs well in terms of pruning effectiveness and execution efficiency.

19.
An ACS-based framework for fuzzy data mining   Cited by: 1 (self-citations: 0, citations by others: 1)
Data mining is often used to find interesting and meaningful patterns in huge databases. It may generate different kinds of knowledge such as classification rules, clusters, and association rules, among others. Much research has been done on data mining, and most of it has focused on mining from binary-valued data. Fuzzy data mining was thus proposed to discover fuzzy knowledge from linguistic or quantitative data. Recently, ant colony systems (ACS) have been successfully applied to optimization problems. However, little work has been done on applying ACS to fuzzy data mining. This thesis thus attempts to propose an ACS-based framework for fuzzy data mining. In the framework, the membership functions are first encoded into binary bits and then fed into the ACS to search for the optimal set of membership functions. The problem is then transformed into a multi-stage graph, with each route representing a possible set of membership functions. When the termination condition is reached, the best membership function set (with the highest fitness value) can then be used to mine fuzzy association rules from a database. Finally, experiments are made to compare the proposed framework with other approaches and show its performance.

20.
Cluster-Based Evaluation in Fuzzy-Genetic Data Mining   Cited by: 2 (self-citations: 0, citations by others: 2)
Data mining is commonly used in attempts to induce association rules from transaction data. Most previous studies focused on binary-valued transaction data. Transactions in real-world applications, however, usually consist of quantitative values. In the past, we proposed a fuzzy-genetic data-mining algorithm for extracting both association rules and membership functions from quantitative transactions. It used a combination of large 1-itemsets and membership-function suitability to evaluate the fitness values of chromosomes. The calculation for large 1-itemsets could take a lot of time, especially when the database to be scanned could not be fed entirely into main memory. In this paper, an enhanced approach, called the cluster-based fuzzy-genetic mining algorithm, is thus proposed to speed up the evaluation process and keep nearly the same quality of solutions as the previous one. It divides the chromosomes in a population into clusters by the k-means clustering approach and evaluates each individual according to both its cluster's information and its own. Experimental results also show the effectiveness and efficiency of the proposed approach.
