Similar literature
 20 similar documents found (search time: 0 ms)
1.
In data mining applications, it is important to develop evaluation methods for selecting high-quality, profitable rules. This paper utilizes a non-parametric approach, Data Envelopment Analysis (DEA), to estimate and rank the efficiency of association rules with multiple criteria. The interestingness of association rules is conventionally measured based on support and confidence. For specific applications, domain knowledge can additionally be encoded as measures for evaluating the discovered rules. For example, in market basket analysis, the product value and cross-selling profit associated with an association rule can serve as essential measures of rule interestingness. In this paper, these domain measures are also included in the rule ranking procedure for selecting valuable rules for implementation. A market basket analysis example illustrates the DEA-based methodology for measuring the efficiency of association rules with multiple criteria.
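The two conventional measures named in this abstract, support and confidence, can be sketched as follows. This is a minimal illustration only: the function names and the toy baskets are assumptions made for the example and are not taken from the paper.

```python
from typing import FrozenSet, Iterable, List


def support(transactions: List[FrozenSet[str]], itemset: Iterable[str]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(
    transactions: List[FrozenSet[str]],
    antecedent: Iterable[str],
    consequent: Iterable[str],
) -> float:
    """support(antecedent and consequent) / support(antecedent)."""
    both = support(transactions, set(antecedent) | set(consequent))
    ante = support(transactions, antecedent)
    return both / ante if ante > 0 else 0.0


# Hypothetical market baskets, invented for this illustration.
baskets = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]

print(support(baskets, {"bread", "milk"}))       # 2 of 4 baskets
print(confidence(baskets, {"bread"}, {"milk"}))  # 2 of the 3 bread baskets
```

Domain measures such as product value or cross-selling profit would enter a ranking as additional per-rule criteria alongside these two.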

2.
Elicitation of classification rules by fuzzy data mining   Total citations: 1, self-citations: 0, citations by others: 1
Data mining techniques can be used to find potentially useful patterns in data and to ease the knowledge acquisition bottleneck in building prototype rule-based systems. Based on the partition methods presented in the simple-fuzzy-partition-based method (SFPBM) proposed by Hu et al. (Comput. Ind. Eng. 43(4) (2002) 735), the aim of this paper is to propose a new fuzzy data mining technique consisting of two phases to find fuzzy if–then rules for classification problems: one to find frequent fuzzy grids by using a pre-specified simple fuzzy partition method to divide each quantitative attribute, and the other to generate fuzzy classification rules from frequent fuzzy grids. To improve the classification performance of the proposed method, we incorporate the adaptive rules proposed by Nozaki et al. (IEEE Trans. Fuzzy Syst. 4(3) (1996) 238) into our method to adjust the confidence of each classification rule. Regarding classification generalization ability, the simulation results on the iris data demonstrate that the proposed method may effectively derive fuzzy classification rules from training samples.

3.
4.
This paper introduces a new mathematical method for improving the discrimination power of data envelopment analysis and for completely ranking the efficient decision-making units (DMUs). A fuzzy concept is utilised. For this purpose, all DMUs are first evaluated with the CCR model. Thereafter, the resulting weights for each output are considered as fuzzy sets and are then converted to fuzzy numbers. The introduced model is a multi-objective linear model, the endpoints of which are the highest and lowest of the weighted values. An added advantage of the model is its ability to handle the infeasibility situation sometimes faced by previously introduced models.
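For orientation, the CCR model named in this abstract is commonly written in its input-oriented multiplier form as below. The notation is the standard textbook one and is an assumption here, since the abstract defines no symbols: $x_{ij}$ and $y_{rj}$ are the $i$-th input and $r$-th output of DMU $j$, $v_i$ and $u_r$ are their weights, and $o$ indexes the DMU under evaluation.

```latex
\begin{aligned}
\max_{u,\,v}\quad & \theta_o = \sum_{r=1}^{s} u_r\, y_{ro} \\
\text{s.t.}\quad  & \sum_{i=1}^{m} v_i\, x_{io} = 1, \\
                  & \sum_{r=1}^{s} u_r\, y_{rj} - \sum_{i=1}^{m} v_i\, x_{ij} \le 0, \qquad j = 1, \dots, n, \\
                  & u_r \ge 0,\quad v_i \ge 0 .
\end{aligned}
```

Several DMUs can attain $\theta_o = 1$ at once, which is the limited discrimination power that the ranking method in this abstract targets.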

5.
Business rules are an effective way to control data quality. Business experts can enter the rules directly into appropriate software without error-prone communication with programmers. However, not all business situations and possible data quality problems can be considered in advance. In situations where business rules have not yet been defined, patterns of data handling may arise in practice. We apply data mining to accounting transactions in order to discover such patterns. The discovered patterns are represented in the form of association rules. Deviations from discovered patterns can then be marked as potential data quality violations that need to be examined by humans. Data quality breaches can be expensive, but manual examination of many transactions is also expensive. Therefore, the goal is to find a balance between marking too many and too few transactions as potentially erroneous. We apply appropriate procedures to evaluate the classification accuracy of the developed association rules and support the decision on the number of deviations to be manually examined based on economic principles.

6.
This paper deals with the problem of finding the optimum site for a railway station for the city of Mashhad, northeast Iran, using the methods of analytical hierarchy process (AHP) and data envelopment analysis (DEA). The paper identifies a four-level hierarchy model for the railway station site-selection problem. The model uses four main criteria: (1) rail-related, (2) passenger services, (3) architecture and urbanism, and (4) economics. In addition, there are 26 subcriteria as well as five (potential) candidates or alternatives. Comparison matrices are used to obtain the local weights and priorities of the railway-station candidates. A DEA model is proposed to determine the optimum site for a railway station. It is shown that the local priorities (or weights) obtained from the AHP can be defined as the multiple outputs of a DEA model for finding the best site for a railway station.

7.
Association rule mining is widely studied in the data mining community. In this paper, we analyze the support–confidence framework for measuring association rules and find that it tends to mine many redundant or unrelated rules besides the interesting ones. To improve the criterion, we propose a new measure, match, as a substitute for confidence. We analyze the properties of the proposed measure in detail. Experimental results show that the rules generated by the improved method reveal a high correlation between the antecedent and the consequent when compared with those produced by the support–confidence framework. Furthermore, the improved method decreases the generation of redundant rules.

8.
Data mining can dig out valuable information from databases to assist a business in approaching knowledge discovery and improving business intelligence. Databases store large amounts of structured data. The amount of data increases due to advanced database technology and the extensive use of information systems. Despite the price drop of storage devices, it is still important to develop efficient techniques for database compression. This paper develops a database compression method that eliminates redundant data, which often exist in transaction databases. The proposed approach uses a data mining structure to extract association rules from a database. Redundant data are then replaced by means of compression rules. A heuristic method is designed to resolve conflicts among the compression rules. To demonstrate its efficiency and effectiveness, the proposed approach is compared with two other database compression methods. Chin-Feng Lee is an associate professor with the Department of Information Management at Chaoyang University of Technology, Taiwan, R.O.C. She received her M.S. and Ph.D. degrees in 1994 and 1998, respectively, from the Department of Computer Science and Information Engineering at National Chung Cheng University. Her current research interests include database design, image processing and data mining techniques. S. Wesley Changchien is a professor with the Institute of Electronic Commerce at National Chung-Hsing University, Taiwan, R.O.C. He received a BS degree in Mechanical Engineering (1989) and completed his MS (1993) and Ph.D. (1996) degrees in Industrial Engineering at State University of New York at Buffalo, USA. His current research interests include electronic commerce, internet/database marketing, knowledge management, data mining, and decision support systems. Jau-Ji Shen received his Ph.D. degree in Information Engineering and Computer Science from National Taiwan University at Taipei, Taiwan in 1988. From 1988 to 1994, he was the leader of the software group in Institute of Aeronautic, Chung-Sung Institute of Science and Technology. He is currently an associate professor in the information management department at National Chung Hsing University, Taichung. His research areas have focused on digital multimedia, database and information security. His current research areas focus on data engineering, database techniques and information security. Wei-Tse Wang received the B.A. (2001) and M.B.A (2003) degrees in Information Management at Chaoyang University of Technology, Taiwan, R.O.C. His research interests include data mining, XML, and database compression.

9.
A mixed integer linear model for selecting the best decision making unit (DMU) in data envelopment analysis (DEA) has recently been proposed by Foroughi [Foroughi, A. A. (2011a). A new mixed integer linear model for selecting the best decision making units in data envelopment analysis. Computers and Industrial Engineering, 60(4), 550–554], which involves many unnecessary constraints and requires specifying an assurance region (AR) for input weights and output weights, respectively. Its selection of the best DMU is easily affected by outliers and may sometimes be incorrect. To avoid these drawbacks, this paper proposes three alternative mixed integer linear programming (MILP) models for identifying the most efficient DMU under different returns to scale, which contain only essential constraints and decision variables and are much simpler and more succinct than Foroughi’s. The proposed alternative MILP models can make full use of input and output information without the need to specify any assurance regions for input and output weights to avoid zero weights, can make correct selections without being affected by outliers, and are of significant importance to decision makers whose concern is not DMU ranking but the correct selection of the most efficient DMU. The potential applications of the proposed alternative MILP models and their effectiveness are illustrated with four numerical examples.

10.
In this paper, a new method for aggregating the opinions of experts in a preferential voting system is proposed. The method, which uses a fuzzy concept in handling crisp data, is computationally efficient and is able to completely rank the alternatives. In this method, the numbers of votes that each alternative receives for a given rank position are first grouped together to form fuzzy numbers. The nearest point to a fuzzy number concept is then used to introduce an artificial ideal alternative. Data envelopment analysis is next used to find the efficiency scores of the alternatives in a pair-wise comparison with the artificial ideal alternative. Alternatives are ranked based on these efficiency scores. If the alternatives are not completely ranked, a weight restriction method, also based on a fuzzy concept, is applied to the undiscriminated alternatives until they are completely ranked. Two examples are given to illustrate the method.

11.
Online mining of fuzzy multidimensional weighted association rules   Total citations: 1, self-citations: 1, citations by others: 0
This paper addresses the integration of fuzziness with On-Line Analytical Processing (OLAP) based association rules mining. It contributes to the ongoing research on multidimensional online association rules mining by proposing a general architecture that utilizes a fuzzy data cube for knowledge discovery. A data cube is mainly constructed to provide users with the flexibility to view data from different perspectives, as some dimensions of the cube contain multiple levels of abstraction. The first step of the process described in this paper involves introducing a fuzzy data cube as a remedy to the problem of handling quantitative values of dimensional attributes in a cube. This facilitates the online mining of fuzzy association rules at different levels within the constructed fuzzy data cube. Then, we investigate combining the concepts of weight and multiple levels to mine fuzzy weighted multi-cross-level association rules from the constructed fuzzy data cube. For this purpose, three different methods are introduced for single-dimension, multidimensional and hybrid (integrating the other two methods) fuzzy weighted association rules mining. Each of the three methods utilizes a fuzzy data cube constructed to suit the particular method. To the best of our knowledge, this is the first effort in this direction. We compared the proposed approach to an existing approach that does not utilize fuzziness. Experimental results obtained for each of the three methods on a synthetic dataset and on the adult data of the United States census in year 2000 demonstrate the effectiveness and applicability of the proposed fuzzy OLAP based mining approach. OLAP is one of the most popular tools for on-line, fast and effective multidimensional data analysis. In the OLAP framework, data is mainly stored in data hypercubes (simply called cubes).

12.
Association rules have been widely used in many application areas to extract new and useful information from raw data, expressed in a way that is comprehensible to decision makers. However, raw data may not always be available; it can be distributed across multiple datasets, and the resulting number of association rules to be inspected is therefore overwhelming. In the light of these observations, we propose meta-association rules, a new framework for mining association rules over previously discovered rules in multiple databases. Meta-association rules are a new tool that conveys new information from the patterns extracted from multiple datasets and gives a “summarized” representation of the most frequent patterns. We propose and compare two different algorithms, based respectively on crisp rules and fuzzy rules, concluding that fuzzy meta-association rules are suitable for incorporating into the meta-mining procedure the quality assessment provided by the rules in the first step of the process, although the fuzzy approach consumes more time than the crisp one. In addition, fuzzy meta-rules give a more manageable set of rules for subsequent analysis, and they allow the use of fuzzy items to express additional knowledge about the original databases. The proposed framework is illustrated with real-life data about crime incidents in the city of Chicago. Issues such as the differences from traditional approaches are discussed using synthetic data.

13.
Frequent pattern mining is an essential theme in data mining. Existing algorithms usually use a bottom-up search strategy. However, for very high dimensional data, this strategy cannot fully utilize the minimum support constraint to prune the rowset search space. In this paper, we propose a new method called top-down mining together with a novel row enumeration tree to make full use of the pruning power of the minimum support constraint. Furthermore, to efficiently check if a rowset is closed, we develop a method called the trace-based method. Based on these methods, an algorithm called TD-Close is designed for mining a complete set of frequent closed patterns. To enhance its performance further, we improve it by using new pruning strategies and new data structures that lead to a new algorithm TTD-Close. Our performance study shows that the top-down strategy is effective in cutting down search space and saving memory space, while the trace-based method facilitates the closeness-checking. As a result, the algorithm TTD-Close outperforms the bottom-up search algorithms such as Carpenter and FPclose in most cases. It also runs faster than TD-Close.

14.
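An artificial immune algorithm based on immune principles is proposed for mining fuzzy association rules. The algorithm performs its optimization by drawing on the clonal selection principle of the biological immune system. Working directly from the given data, it uses an optimization mechanism to determine automatically the fuzzy sets corresponding to each attribute, so that the number of derived fuzzy association rules satisfying the conditions is maximized. The algorithm is compared with related algorithms on real data sets, and the experimental results demonstrate the effectiveness of the proposed algorithm.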

15.
The utility of an itemset is considered to be the value of that itemset, and utility mining aims at identifying the itemsets with high utilities. The temporal high utility itemsets are the itemsets whose support is larger than a pre-specified threshold in the current time window of the data stream. Discovery of temporal high utility itemsets is an important process for mining interesting patterns like association rules from data streams. In this paper, we propose a novel method, namely THUI (Temporal High Utility Itemsets)-Mine, for mining temporal high utility itemsets from data streams efficiently and effectively. To the best of our knowledge, this is the first work on mining temporal high utility itemsets from data streams. The novel contribution of THUI-Mine is that it can effectively identify the temporal high utility itemsets by generating fewer candidate itemsets, so that the execution time for mining all high utility itemsets in data streams can be reduced substantially. In this way, the process of discovering all temporal high utility itemsets under all time windows of data streams can be achieved effectively with less memory space and execution time. This meets the critical requirements on time and space efficiency for mining data streams. Through experimental evaluation, THUI-Mine is shown to significantly outperform other existing methods such as the Two-Phase algorithm under various experimental conditions.
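As a rough illustration of what "utility of an itemset" can mean, the sketch below uses one common definition from the utility-mining literature: purchased quantity times unit profit, summed over the transactions that contain the whole itemset. This definition, the function name and the toy data are assumptions for the example; the abstract does not spell them out, and the sketch is not the THUI-Mine algorithm.

```python
from typing import Dict, Iterable, List


def itemset_utility(
    transactions: List[Dict[str, int]],
    unit_profit: Dict[str, float],
    itemset: Iterable[str],
) -> float:
    """Sum quantity * unit profit over transactions containing the whole itemset."""
    items = set(itemset)
    total = 0.0
    for purchase in transactions:
        # `purchase` maps item -> purchased quantity; check that all items occur.
        if items.issubset(purchase):
            total += sum(purchase[i] * unit_profit[i] for i in items)
    return total


# Hypothetical data, invented for this illustration.
profit = {"a": 2.0, "b": 5.0, "c": 1.0}
window = [
    {"a": 1, "b": 2},
    {"a": 3, "c": 4},
    {"a": 2, "b": 1, "c": 1},
]

print(itemset_utility(window, profit, {"a", "b"}))
```

In a streaming setting, `window` would hold only the transactions of the current time window and would be updated as the window slides.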

16.
In the domain of association rules mining (ARM), discovering rules for numerical attributes is still a challenging issue. Most of the popular approaches for numerical ARM require a priori data discretization to handle the numerical attributes. Moreover, in the process of discovering relations among data, often more than one objective (quality measure) is required, and in most cases, such objectives include conflicting measures. In such a situation, it is recommended to obtain the optimal trade-off between objectives. This paper deals with the numerical ARM problem from a multi-objective perspective by proposing a multi-objective particle swarm optimization algorithm (i.e., MOPAR) for numerical ARM that discovers numerical association rules (ARs) in a single step. To identify more efficient ARs, several objectives are defined in the proposed multi-objective optimization approach, including confidence, comprehensibility, and interestingness. Finally, the best ARs are extracted using Pareto optimality. To deal with numerical attributes, we use rough values containing lower and upper bounds to represent the intervals of attributes. In the experimental section of the paper, we analyze the effect of the operators used in this study, compare our method to the most popular evolutionary-based proposals for ARM and present an analysis of the mined ARs. The results show that MOPAR extracts reliable (with confidence values close to 95%), comprehensible, and interesting numerical ARs when attaining the optimal trade-off between confidence, comprehensibility and interestingness.

17.
This paper proposes a data envelopment analysis (DEA) approach to the measurement and benchmarking of service quality. Treating the measurement of the overall service quality of multiple units with SERVPERF as a multiple-criteria decision-making (MCDM) problem, the proposed approach utilizes DEA, in particular the pure output model without inputs. The five dimensions of SERVPERF are considered as outputs of the DEA model. A case study of auto repair services is provided for the purpose of illustration. The current practice of benchmarking service quality with SERVQUAL/SERVPERF is limited in that there is little guidance on whom to benchmark against and to what degree service quality should be improved. This study contributes to the field of service quality benchmarking by overcoming the above limitations, taking advantage of DEA’s capability to handle MCDM problems and provide benchmarking guidelines.

18.
One of the primary issues in data envelopment analysis (DEA) models is the reduction of weight flexibility. Several studies determine common weights in DEA, but none of them considers uncertainty in the data. This paper introduces a robust optimization approach to find common weights in DEA with uncertain data. The uncertainty is considered in both inputs and outputs, and a suitable robust counterpart of the DEA model is developed. The proposed robust DEA model is solved and the ideal solution is found for each decision making unit (DMU). Then, the common weights are found for all DMUs by utilizing the goal programming technique. To illustrate the performance of the proposed model, a numerical example is solved. The proposed model is also implemented using actual data from provincial gas companies in Iran.

19.
Data mining techniques, which extract patterns from large databases, are processes that focus on the automatic exploration and analysis of large quantities of raw data in order to discover meaningful patterns and rules. In applying these methods, most managers engaged in the business encounter a multitude of rules resulting from the data mining technique. Given the multi-faceted characteristics of such rules, the rules are in general characterized by multiple conflicting criteria that are directly related to business values, such as expected monetary value or incremental monetary value.

In the paper, we present a method for rule prioritization that takes into account business values comprising objective metrics or managers’ subjective judgments. The proposed methodology is an attempt to create synergy with decision analysis techniques for solving problems in the domain of data mining. We believe that this approach would be particularly useful for business managers who are suffering from rule quality or quantity problems, conflicts between extracted rules, and difficulties in building a consensus when several managers are involved in rule selection.


20.
Association rule mining is one of the data mining techniques used to discover information that represents associations among data. Some data in a database appear infrequently but are highly associated with specific data. This paper proposes a technique for significant rare data that introduces a second support in discovering the association rules of such data. We show that the proposed approach provides better performance than standard association rule techniques.
