首页 | 官方网站   微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 468 毫秒
1.
随着基于机器学习的文本自动分类方法成为主流分类技术,基于机器学习的文本分类方法往往忽视了对规则分类方法的有效运用。该文将基于规则的分类思想和基于机器学习的分类方法有机地结合起来,把规则判别看作一个分量分类器,提出了一种辅以规则补充的双层文本分类模型和一种优化的分类规则学习算法。根据该方法设计并实现了一个基于规则和N-Gram统计分类相结合的双层分类器,进行了双层分类模型与单独的N-Gram分类模型的实验,结果表明辅以规则补充的双层分类器具有更好的分类性能。  相似文献   

2.
One of the known classification approaches in data mining is rule induction (RI). RI algorithms such as PRISM usually produce If-Then classifiers, which have a comparable predictive performance to other traditional classification approaches such as decision trees and associative classification. Hence, these classifiers are favourable for carrying out decisions by users and therefore they can be utilised as decision making tools. Nevertheless, RI methods, including PRISM and its successors, suffer from a number of drawbacks primarily the large number of rules derived. This can be a burden especially when the input data is largely dimensional. Therefore, pruning unnecessary rules becomes essential for the success of this type of classifiers. This article proposes a new RI algorithm that reduces the search space for candidate rules by early pruning any irrelevant items during the process of building the classifier. Whenever a rule is generated, our algorithm updates the candidate items frequency to reflect the discarded data examples associated with the rules derived. This makes items frequency dynamic rather static and ensures that irrelevant rules are deleted in preliminary stages when they don't hold enough data representation. The major benefit will be a concise set of decision making rules that are easy to understand and controlled by the decision maker. The proposed algorithm has been implemented in WEKA (Waikato Environment for Knowledge Analysis) environment and hence it can now be utilised by different types of users such as managers, researchers, students and others. Experimental results using real data from the security domain as well as sixteen classification datasets from University of California Irvine (UCI) repository reveal that the proposed algorithm is competitive in regards to classification accuracy when compared to known RI algorithms. Moreover, the classifiers produced by our algorithm are smaller in size which increase their possible use in practical applications.  相似文献   

3.
Building a high accuracy classifier for classification is a problem in real applications. One high accuracy classifier used for this purpose is based on association rules. In the past, some researches showed that classification based on association rules (or class-association rules – CARs) has higher accuracy than that of other rule-based methods such as ILA and C4.5. However, mining CARs consumes more time because it mines a complete rule set. Therefore, improving the execution time for mining CARs is one of the main problems with this method that needs to be solved. In this paper, we propose a new method for mining class-association rule. Firstly, we design a tree structure for the storage frequent itemsets of datasets. Some theorems for pruning nodes and computing information in the tree are developed after that, and then, based on the theorems, we propose an efficient algorithm for mining CARs. Experimental results show that our approach is more efficient than those used previously.  相似文献   

4.
Having received considerable interest in recent years, associative classification has focused on developing a class classifier, with lesser attention paid to the probability classifier used in direct marketing. While contributing to this integrated framework, this work attempts to increase the prediction accuracy of associative classification on class imbalance by adapting the scoring based on associations (SBA) algorithm. The SBA algorithm is modified by coupling it with the pruning strategy of association rules in the probabilistic classification based on associations (PCBA) algorithm, which is adjusted from the CBA for use in the structure of the probability classifier. PCBA is adjusted from CBA by increasing the confidence through under-sampling, setting different minimum supports (minsups) and minimum confidences (minconfs) for rules of different classes based on each distribution, and removing the pruning rules of the lowest error rate. Experimental results based on benchmark datasets and real-life application datasets indicate that the proposed method performs better than C5.0 and the original SBA do, and the number of rules required for scoring is significantly reduced.  相似文献   

5.
Classification plays an important role in decision support systems. A lot of methods for mining classification rules have been developed in recent years, such as C4.5 and ILA. These methods are, however, based on heuristics and greedy approaches to generate rule sets that are either too general or too overfitting for a given dataset. They thus often yield high error ratios. Recently, a new method for classification from data mining, called the Classification Based on Associations (CBA), has been proposed for mining class-association rules (CARs). This method has more advantages than the heuristic and greedy methods in that the former could easily remove noise, and the accuracy is thus higher. It can additionally generate a rule set that is more complete than C4.5 and ILA. One of the weaknesses of mining CARs is that it consumes more time than C4.5 and ILA because it has to check its generated rule with the set of the other rules. We thus propose an efficient pruning approach to build a classifier quickly. Firstly, we design a lattice structure and propose an algorithm for fast mining CARs using this lattice. Secondly, we develop some theorems and propose an algorithm for pruning redundant rules quickly based on these theorems. Experimental results also show that the proposed approach is more efficient than those used previously.  相似文献   

6.
传统关联规则挖掘在面临分类决策问题时,易出现非频繁规则遗漏、预测精度不高的问题。为得到正确合理且更为完整的规则,提出了一种改进方法 DT-AR(decision tree-association rule algorithm),利用决策树剪枝策略对关联规则集进行补充。该方法利用FP-Growth(frequent pattern growth)算法得到关联规则集,利用C4.5算法构建后剪枝决策树并提取分类规则,在进行置信度迭代筛选后与关联规则集取并集修正,利用置信度作为权重系数采取投票法进行分类。实验结果表明,与传统关联规则挖掘和决策树剪枝方法相比,该方法得到的规则在数据集分类结果上更准确。  相似文献   

7.
A hybrid coevolutionary algorithm for designing fuzzy classifiers   总被引:1,自引:0,他引:1  
Rule learning is one of the most common tasks in knowledge discovery. In this paper, we investigate the induction of fuzzy classification rules for data mining purposes, and propose a hybrid genetic algorithm for learning approximate fuzzy rules. A novel niching method is employed to promote coevolution within the population, which enables the algorithm to discover multiple rules by means of a coevolutionary scheme in a single run. In order to improve the quality of the learned rules, a local search method was devised to perform fine-tuning on the offspring generated by genetic operators in each generation. After the GA terminates, a fuzzy classifier is built by extracting a rule set from the final population. The proposed algorithm was tested on datasets from the UCI repository, and the experimental results verify its validity in learning rule sets and comparative advantage over conventional methods.  相似文献   

8.
针对现有关联分类算法资源消耗大、规则剪枝难、分类模型复杂的缺陷,提出了一种基于分类修剪的关联分类算法改进方案ACCP.根据分类属性值的不同对分类规则前项进行分块挖掘,并对频繁项集挖掘过程和规则修剪进行了改进,有效提高了分类准确率和算法运行效率.实验结果表明,此算法改进方案相比传统CBA算法和C4.5决策树算法有着更高的分类准确率,取得了较好的应用效果.  相似文献   

9.
决策树C4.5算法在数据挖掘中的分析及其应用   总被引:5,自引:0,他引:5  
决策树是归纳学习和数据挖掘的重要方法,通常用来形成分类器和预测模型。分类器是数据挖掘的一种基本方法。本文对分类器的基本概念、C4.5算法、决策树的构建和剪枝进行了介绍,然后将C4.5算法应用于篮球比赛的技术统计分析中,通过对这些数据分析从而得到一些较实用的预测胜负规则。  相似文献   

10.
研究分析了现有关联规则分类算法,总结了一般关联规则分类存在的不足,提出了一个基于关联规则挖掘技术构造分类器的新方法。该方法解决了传统算法产生规则太多,分类模型难以理解的问题。  相似文献   

11.
基于支持度和置信度模型的关联规则剪枝算法会挖掘出很多无趣规则。针对该问题,提出一种正相关性指导下的关联规则剪枝算法。利用全置信度和提升度构造一个正相关性评价函数,以此对频繁项集进行剪枝。实验结果表明,该算法能减少无趣关联规则数量,提升挖掘结果质量,缩短挖掘时间。  相似文献   

12.
基于类频繁模式树的关联分类   总被引:1,自引:0,他引:1  
提出一种新的基于类频繁模式树的关联分类算法CFPC(Class FP-tree based Classifier).该方法基于FP-tree实现,无需生成庞大的候选项目集;依据记录的分类属性进行指导性划分,并使用类支持度进行记录项的分类剪枝,生成类模式树,避免了小数据类别集上的强关联模式遗漏;挖掘出的规则形成分类器,用于类标号未知的记录的区分.试验结果表明CFPC的正确性和有效性.  相似文献   

13.
罗军  况夯 《计算机应用》2008,28(9):2386-2388
提出一种新颖的基于Boosting模糊分类的文本分类方法。首先采用潜在语义索引(LSI)对文本特征进行选择;然后提出Boosting算法集成模糊分类器学习,在每轮迭代训练过程中,算法通过调整训练样本的分布,利用遗传算法产生分类规则。减少分类规则能够正确分类样本的权值,使得新产生的分类规则重点考虑难于分类的样本。实验结果表明,该文本分类算法具有良好分类的性能。  相似文献   

14.
Modifying the database originally designed for maritime transportation and harbourside sailing safety management in order to train the data classifier to detect smuggling and illegal immigrant vessels may lead to a lack of class-related attributes, and the accuracy rate of classification may be low. However, some valuable artificial classification rules already exist prior to the building of a data classifier, and it is worth attempting to incorporate them into the database and thus build a more accurate data classifier. With the limiting condition that no extra reference point of the terrain can be added into the system, this research selects the artificial smuggling detection rule “vessels sail fast towards shore” deduced by Coast Guard’s officers, sets the radar station as the reference point of the shore coordinates, and defines a new attribute, the absolute difference between the anti-course and target azimuth, to demonstrate the “vessels sail approximately towards radar station or sea” concept. Matching the original “speed” attribute, the newly-modified database includes the intrinsic meaning of “vessels sail fast towards radar station or sea”. The results show that the accuracy rate of the two data classifiers, the back-propagation neural network (BPN) and support vector machine (SVM), improved by 22% and 14%, respectively. It is significant that incorporation of the expertise of human experts by new attribute construction contributes to the effective learning of the data classifier.  相似文献   

15.
基于规则置信度调整的关联文本分类   总被引:1,自引:0,他引:1  
基于关联规则的文本分类方法ARC-BC是目前已知的分类效果最好的关联规则分类算法.本文提出了利用ARC-BC分类器的封闭测试的结果对分类器进行调整规则置信度的算法RCA(Rules Confidence Adjustment),参与正确分类行为次数多于参与错误分类行为次数(即"威信"较高)的规则应该拥有更高的置信度,反之,就赋予更低的置信度.实验结果表明,经过RCA算法调整的关联文本分类器的分类效果得到显著提高.  相似文献   

16.
基于属性相关性的决策树规则生成算法   总被引:5,自引:0,他引:5  
范洁  常晓航  杨岳湘 《计算机仿真》2006,23(12):90-92,103
决策树方法因结构简单、便于理解和具有较高的分类精度而在数据挖掘中被广泛采用,其规则生成算法实现对决策树规则的提取和化简。属性相关性分析的基本思想是计算某种度量,用于量化属性与给定概念的相关性。提出了一种基于属性相关性的c4.5决策树规则生成算法c—c4.5rules,可替代c4.5原有的规则生成算法。c—c4.5rules在对规则进行化简时充分考虑了属性之间的关联性,实验表明该算法在保持原有分类精度的前提下,能有效提高规则生成时的计算速度和效率。  相似文献   

17.
18.
Building a highly-compact and accurate associative classifier   总被引:1,自引:1,他引:0  
Associative classification has aroused significant research attention in recent years due to its advantage in rule forms with satisfactory accuracy. However, the rules in associative classifiers derived from typical association rule mining (e.g., Apriori-type) may easily become too many to be understood and even be sometimes redundant or conflicting. To deal with these issues of concern, a recently proposed approach (i.e., GARC) appears to be superior to other existing approaches (e.g., C4.5-type, NN, SVM, CBA) in two respects: one is its classification accuracy that is equally satisfactory; the other is the compactness that the generated classifier is constituted with much fewer rules. Along with this line of methodological thinking, this paper presents a novel GARC-type approach, namely GEAR, to build an associative classifier with three distinctive and desirable features. First, the rules in the GEAR classifier are more intuitively appealing; second, the GEAR classification accuracy is improved or at least as good as others; and third, the GEAR classifier is significantly more compact in size. In doing so, a number of notions including rule redundancy and compact set are provided, together with related properties that could be incorporated into the rule mining process as algorithmic pruning strategies. The experimental results with benchmarking datasets also reveal that GEAR outperforms GARC and other approaches in an effective manner.  相似文献   

19.
基于排序的关联分类算法   总被引:1,自引:0,他引:1  
提出了一种基于排序的关联分类算法.利用基于规则的分类方法中择优方法偏爱高精度规则的思想和考虑尽可能多的规则,改进了CBA(Classification Based on Associations)只根据少数几条覆盖训练集的规则构造分类器的片面性.首先采用关联规则挖掘算法产生后件为类标号的关联规则,然后根据长度、置信度、支持度和提升度等对规则进行排序,并在排序时删除对分类结果没有影响的规则.排序后的规则加上一个默认分类便构成最终的分类器.选用20个UCI公共数据集的实验结果表明,提出的算法比CBA具有更高的平均分类精度.  相似文献   

20.
一个最优分类关联规则算法   总被引:1,自引:0,他引:1  
分类和关联规则发现是数据挖掘中的两个重要领域。使用关联规则算法挖掘分类规则被叫做分类关联规则算法,是一个有较好前景的方法。本文提出了一个最优分类关联规则算法——OCARA。该算法使用最优关联规则挖掘算法挖掘分类规则,并对最优规则集排序,从而获得一个分类精度较高的分类器。将OCARA与传统分类算法C4.5和一般分类关联规则算法CBA、RMR在8个UCI数据集上进行实验比较,结果显示OCARA具有更好的性能,证明OCARA是一个有效的分类关联规则挖掘算法。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司    京ICP备09084417号-23

京公网安备 11010802026262号