Similar Documents
1.
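A Soft-Set Inclusion Degree Approach to Association Rule Mining   Total citations: 2; self-citations: 0; citations by others: 2
GENG Shengling, LI Yongming, LIU Zhen. Acta Electronica Sinica, 2013, 41(4): 804-809
Building on a detailed study of soft-set data analysis, this paper introduces inclusion degree into association rule mining on soft-set data. Inclusion degree theory is used to describe the quantitative relationships between attribute sets. The paper defines the inclusion degree between attribute sets on a soft set, together with association rules and maximal association rules, and discusses how inclusion degree relates to confidence. On this basis, it gives a method that uses inclusion degree to mine soft association rules meeting given support and confidence thresholds from transactional soft-set data, along with an algorithm for extracting maximal soft association rules. Theoretical proofs and case analysis show that the mining method is effective, and experiments compare the performance of the algorithms.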
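The link between inclusion degree and confidence can be sketched under one assumption: that the inclusion degree takes the common cardinality-based form D(B/A) = |X_A ∩ X_B| / |X_A|, where X_A is the set of transactions containing every parameter in A. The paper's exact definition may differ. In this illustrative sketch, a transactional soft set is modelled as a dict from each parameter (item) to the set of transaction IDs containing it. The helper names are hypothetical.

```python
from typing import Dict, Iterable, Iterator, Set, Tuple

# Transactional soft set: parameter (item) -> IDs of the transactions containing it.
SoftSet = Dict[str, Set[int]]


def extent(soft_set: SoftSet, params: Iterable[str], universe: Set[int]) -> Set[int]:
    """Transactions that contain every parameter in `params`."""
    result = set(universe)
    for p in params:
        result &= soft_set[p]
    return result


def support(soft_set: SoftSet, params: Iterable[str], universe: Set[int]) -> float:
    return len(extent(soft_set, params, universe)) / len(universe)


def inclusion_degree(soft_set: SoftSet, a: Iterable[str], b: Iterable[str],
                     universe: Set[int]) -> float:
    """Assumed cardinality-based D(B/A) = |X_A & X_B| / |X_A|, i.e. confidence of A => B."""
    xa = extent(soft_set, a, universe)
    if not xa:
        return 0.0
    return len(xa & extent(soft_set, b, universe)) / len(xa)


def soft_rules(soft_set: SoftSet, universe: Set[int], min_sup: float,
               min_conf: float) -> Iterator[Tuple[str, str, float, float]]:
    """Single-item rules a => b meeting both thresholds (a deliberately small search space)."""
    items = sorted(soft_set)
    for a in items:
        for b in items:
            if a == b:
                continue
            sup = support(soft_set, [a, b], universe)
            conf = inclusion_degree(soft_set, [a], [b], universe)
            if sup >= min_sup and conf >= min_conf:
                yield a, b, sup, conf
```

For example, with `{"bread": {1, 2, 3, 4}, "milk": {1, 2, 4}, "beer": {3, 5}}` over transactions 1 to 5, `bread => milk` has support 0.6 and inclusion degree 0.75.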

2.
Traditional attribute reduction and value reduction methods leave rules that refer to infrequent instances. A rough set (RS) based k-exception approach (RSKEA) to rule reduction is presented. Its main idea is a two-phase RS-based rule reduction. In the first phase, an ordinary decision table is obtained through the general RS knowledge reduction method. A k-exception candidate set is then nominated from the decision table. Next, RS rule reduction is applied to the reformed source data set, from which all instances in the k-exception set have been removed. We apply the approach to an automobile database. Results show that it reduces the number and complexity of rules with an adjustable conflict rate, which contributes to approximate rule reduction.
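The k-exception step can be sketched roughly as follows. The assumption here is that a "k-exception" is a decision-table pattern (condition values plus decision) supported by at most k instances, and that those instances are removed before the second round of reduction. The abstract does not give the exact nomination criterion. The `decision` column name and both helpers are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, str]


def _pattern(row: Row, condition_attrs: Sequence[str]) -> Tuple[str, ...]:
    return tuple(row[a] for a in condition_attrs) + (row["decision"],)


def k_exception_filter(table: List[Row], condition_attrs: Sequence[str],
                       k: int) -> Tuple[List[Row], List[Row]]:
    """Split rows into (kept, exceptions): exceptions are rows whose full pattern
    occurs at most k times (an assumed nomination rule)."""
    counts = Counter(_pattern(r, condition_attrs) for r in table)
    kept = [r for r in table if counts[_pattern(r, condition_attrs)] > k]
    exceptions = [r for r in table if counts[_pattern(r, condition_attrs)] <= k]
    return kept, exceptions


def conflict_rate(table: List[Row], condition_attrs: Sequence[str]) -> float:
    """Fraction of rows whose condition values map to more than one decision."""
    decisions = defaultdict(set)
    for r in table:
        decisions[tuple(r[a] for a in condition_attrs)].add(r["decision"])
    conflicting = sum(1 for r in table
                      if len(decisions[tuple(r[a] for a in condition_attrs)]) > 1)
    return conflicting / len(table) if table else 0.0
```

Raising k removes more rare patterns, which trades lost coverage for a lower conflict rate. That is one way to read the "adjustable conflict rate" in the abstract.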

3.
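Offline Handwritten Digit Recognition Based on Principal Curves   Total citations: 7; self-citations: 1; citations by others: 6
MIAO Duoqian, ZHANG Hongyun, LI Daoguo, WANG Zhen. Acta Electronica Sinica, 2005, 33(9): 1639-1643
This paper proposes an offline handwritten digit recognition method based on principal curves. The method incorporates principal curves and a knowledge reduction algorithm into the recognition model. A principal curve is a nonlinear generalization of principal component analysis. It is a smooth curve that passes through the "middle" of a data distribution and satisfies "self-consistency," so it reflects the structural characteristics of the distribution well. Knowledge reduction in rough set theory is an effective tool for obtaining decision (classification) rules from a decision table. Here, principal curves are used to extract features from the training data, and a decision table is generated from the principal-curve features. A knowledge reduction algorithm previously proposed by the authors then processes the decision table and obtains classification rules automatically. The method is consistent with how humans recognize digits and overcomes the shortcomings of recognition based on statistical features. Experimental results show that it effectively improves the recognition rate for handwritten digits, offering a new approach to offline handwritten digit recognition.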

4.
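Association rules have become a very important research topic in data mining and are used to discover interesting relationships hidden in large data sets. Apriori, often cited as the first association rule mining algorithm, pioneered support-based pruning, which systematically controls the exponential growth of candidate itemsets. However, Apriori still scans the database repeatedly and generates a large number of candidate itemsets. To address this, the paper represents each transaction and each candidate itemset as an integer or an integer array. This compression allows massive data to be loaded into memory in a single pass and reduces disk I/O. Support is computed with bitwise operations and Hamming distance calculations, and several further optimizations are applied.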
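The integer encoding can be illustrated with a minimal level-wise Apriori. In this sketch, each transaction becomes a Python `int` whose bit i marks item i. Python ints are arbitrary-precision, so no explicit integer array is needed. A candidate is contained in a transaction exactly when `(mask & candidate) == candidate`, which is equivalent to comparing popcounts (the paper's Hamming-distance formulation). The function names are illustrative, and the paper's further optimizations are not reproduced.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List


def encode(transactions: Iterable[Iterable[str]], items: List[str]) -> List[int]:
    """Encode each transaction as an int whose bit i is set iff items[i] occurs in it."""
    index = {item: i for i, item in enumerate(items)}
    masks = []
    for t in transactions:
        mask = 0
        for item in t:
            mask |= 1 << index[item]
        masks.append(mask)
    return masks


def support_count(masks: List[int], candidate: int) -> int:
    """Count transactions containing every bit of `candidate` (one AND per transaction)."""
    return sum(1 for m in masks if (m & candidate) == candidate)


def bits(mask: int, n: int) -> List[int]:
    return [i for i in range(n) if (mask >> i) & 1]


def frequent_itemsets(transactions, min_count: int) -> Dict[FrozenSet[str], int]:
    """Level-wise Apriori over bitmask-encoded transactions; returns {itemset: count}."""
    items = sorted({item for t in transactions for item in t})
    n = len(items)
    masks = encode(transactions, items)
    level, k, result = [1 << i for i in range(n)], 1, {}
    while level:
        frequent = {}
        for c in level:
            count = support_count(masks, c)
            if count >= min_count:
                frequent[c] = count
        for c, count in frequent.items():
            result[frozenset(items[i] for i in bits(c, n))] = count
        # Join: OR two frequent k-itemsets; keep unions with exactly k + 1 items.
        joined = {a | b for a, b in combinations(frequent, 2) if bin(a | b).count("1") == k + 1}
        # Prune: every k-subset of a candidate must itself be frequent.
        level = [c for c in joined if all((c & ~(1 << i)) in frequent for i in bits(c, n))]
        k += 1
    return result
```

The data is still scanned once per level, but each scan is a tight loop of integer ANDs over in-memory masks rather than set lookups over raw transactions.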

5.
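Ensemble Learning Based on Randomized Attribute Selection and Neighborhood Covering Reduction   Total citations: 2; self-citations: 0; citations by others: 2
Improving the accuracy and reliability of classification models is the central goal of classification modeling. Current rule learning methods have poor stability and low accuracy when applied to classification. To address this, this paper uses randomized neighborhood attribute reduction to search for a group of attribute subsets with high classification accuracy. It then applies neighborhood covering reduction on each attribute subset to learn classification rules, yielding multiple rule sets. Finally, the class of an object is obtained by fusing the classification results of the different rule sets through simple voting. Experiments show that the ensemble method based on randomized neighborhood reduction performs better than, or comparably to, other related classifiers, and is more robust under noise.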
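The ensemble structure (random attribute subsets, one base learner per subset, simple majority voting) can be sketched as below. Two substitutions are made and should be read as assumptions. Attribute subsets are drawn uniformly at random instead of by randomized neighborhood reduction. A 1-nearest-neighbour classifier stands in for the neighborhood-covering rule learner. Only the voting scheme mirrors the abstract directly.

```python
import random
from collections import Counter
from typing import List, Sequence


def nn_predict(train: List[Sequence[float]], labels: List[str],
               x: Sequence[float], attrs: Sequence[int]) -> str:
    """Stand-in base learner: 1-NN using only the attribute indices in `attrs`."""
    def dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((a[i] - b[i]) ** 2 for i in attrs)
    best = min(range(len(train)), key=lambda j: dist(train[j], x))
    return labels[best]


def random_subspace_vote(train, labels, x, n_members: int = 7,
                         subset_size: int = 2, seed: int = 0) -> str:
    """Each member sees a random attribute subset; the class with most votes wins."""
    rng = random.Random(seed)
    n_attrs = len(train[0])
    votes = Counter()
    for _ in range(n_members):
        attrs = rng.sample(range(n_attrs), subset_size)
        votes[nn_predict(train, labels, x, attrs)] += 1
    return votes.most_common(1)[0][0]
```

Because each member ignores some attributes, noise confined to one attribute corrupts only the members that happen to use it. That is the intuition behind the robustness result reported above.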

6.
Comprehensibility is very important when machine learning techniques are used in computer-aided medical diagnosis. Because an artificial neural network ensemble is composed of multiple artificial neural networks, it is less comprehensible than a single artificial neural network. This paper proposes C4.5 Rule-PANE, which combines an artificial neural network ensemble with rule induction by treating the former as a preprocessing step for the latter. First, an artificial neural network ensemble is trained. Then, a new training data set is generated by feeding the feature vectors of the original training instances to the trained ensemble and replacing their expected class labels with the class labels output by the ensemble. Additional training data may also be appended by randomly generating feature vectors and pairing them with the corresponding class labels output by the ensemble. Finally, a specific rule induction approach, C4.5 Rule, is used to learn rules from the new training data set. Case studies on diabetes, hepatitis, and breast cancer show that C4.5 Rule-PANE generates rules with strong generalization ability, which comes from the neural network ensemble, and strong comprehensibility, which comes from rule induction.
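The data-generation step of C4.5 Rule-PANE can be sketched as follows. The ensemble is modelled as a list of callables combined by plurality vote. Random extra vectors are assumed to be drawn uniformly within per-feature bounds; the abstract does not specify the sampling distribution. C4.5 Rule itself is not implemented. The returned pairs would simply be handed to whatever rule learner is used.

```python
import random
from collections import Counter
from typing import Callable, List, Optional, Sequence, Tuple

Classifier = Callable[[Sequence[float]], str]


def ensemble_predict(members: List[Classifier], x: Sequence[float]) -> str:
    """Plurality vote of the trained ensemble members."""
    return Counter(m(x) for m in members).most_common(1)[0][0]


def relabel_training_set(members: List[Classifier], features: List[Sequence[float]],
                         n_extra: int = 0,
                         bounds: Optional[List[Tuple[float, float]]] = None,
                         seed: int = 0) -> List[Tuple[List[float], str]]:
    """Original feature vectors relabelled by the ensemble (original labels discarded),
    plus `n_extra` random vectors drawn uniformly within `bounds` (an assumption)."""
    new_data = [(list(x), ensemble_predict(members, x)) for x in features]
    if n_extra:
        rng = random.Random(seed)
        for _ in range(n_extra):
            x = [rng.uniform(lo, hi) for lo, hi in bounds]
            new_data.append((x, ensemble_predict(members, x)))
    return new_data
```

The rule learner never sees the original labels. It learns to imitate the ensemble, which is why the resulting rules inherit the ensemble's generalization behaviour.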

7.
Bootstrap and aggregating VQ classifier for speaker recognition   Total citations: 1; self-citations: 0; citations by others: 1
A bootstrap and aggregating (bagging) vector quantisation (VQ) classifier is proposed for speaker recognition. The method obtains multiple training data sets by resampling the original training data set and then integrates the corresponding classifiers into a single classifier. Experiments on a closed-set, text-independent speaker identification system are carried out using the TIMIT database. The proposed bagging VQ classifier performs considerably better than the conventional VQ classifier.
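A rough sketch of bagged VQ speaker identification follows. In each bag, every speaker's training vectors are bootstrap-resampled and a small k-means codebook is trained on the sample. An utterance is then assigned by majority vote over the bags, with each bag choosing its minimum-average-distortion speaker. Several details are assumptions not taken from the abstract: codebook size, the plain Lloyd k-means, and majority voting as the aggregation rule. Real systems would use cepstral features rather than the toy 2-D vectors in the test.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence

Vec = Sequence[float]


def sqdist(a: Vec, b: Vec) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors: List[Vec], k: int, rng: random.Random, iters: int = 10) -> List[List[float]]:
    """Plain k-means (Lloyd's algorithm) returning k codewords."""
    centroids = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iters):
        clusters: List[List[Vec]] = [[] for _ in range(k)]
        for v in vectors:
            j = min(range(k), key=lambda c: sqdist(v, centroids[c]))
            clusters[j].append(v)
        for j, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def distortion(codebook: List[List[float]], utterance: List[Vec]) -> float:
    """Average quantisation error of an utterance's frames against a codebook."""
    return sum(min(sqdist(f, c) for c in codebook) for f in utterance) / len(utterance)


def train_bagged_vq(data: Dict[str, List[Vec]], k: int = 2, n_bags: int = 5, seed: int = 0):
    """data: speaker -> feature vectors. Returns one {speaker: codebook} dict per bag."""
    rng = random.Random(seed)
    bags = []
    for _ in range(n_bags):
        books = {}
        for spk, vecs in data.items():
            sample = [rng.choice(vecs) for _ in vecs]  # bootstrap resample, same size
            books[spk] = kmeans(sample, k, rng)
        bags.append(books)
    return bags


def identify(bags, utterance: List[Vec]) -> str:
    """Each bagged VQ classifier votes for its minimum-distortion speaker."""
    votes = Counter(min(books, key=lambda s: distortion(books[s], utterance)) for books in bags)
    return votes.most_common(1)[0][0]
```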

8.
In this paper we propose a 'bank of classifiers' approach to image region labelling. We evaluate dynamic classifier selection and classifier combination approaches against a baseline that uses the single best classifier chosen on a validation set. Image segmentation, feature extraction, and classification are treated as three separate steps of analysis. Each classifier is trained on a different texture feature representation of the training images. The paper proposes a new knowledge-based predictive approach that selectively triggers classifiers. It does so by estimating the Mahalanobis distance between a test sample's feature values and the corresponding probability distribution function of the training data. Judged by classification rates and confusion matrices, this approach outperforms both probability-based classifier combination (in which all classifiers are triggered and their decisions are fused with combination rules) and the single-classifier baseline. The experiments are performed on a natural scene analysis application.
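The selective-triggering idea can be sketched as follows. The sketch computes, for each feature representation, the Mahalanobis distance of the test sample to that representation's training distribution, and runs only the classifier whose training data the sample is closest to. To stay dependency-free, it assumes a diagonal covariance matrix (per-dimension variances). That is a simplification of the full Mahalanobis distance, and the paper's exact selection criterion may differ.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Model = Tuple[List[float], List[float], Callable[[Sequence[float]], str]]


def fit_diag_gaussian(samples: List[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Per-dimension mean and (floored) variance of one representation's training data."""
    n = len(samples)
    columns = list(zip(*samples))
    means = [sum(col) / n for col in columns]
    variances = [max(sum((v - m) ** 2 for v in col) / n, 1e-12)
                 for col, m in zip(columns, means)]
    return means, variances


def mahalanobis_diag(x: Sequence[float], means: List[float], variances: List[float]) -> float:
    """Mahalanobis distance under a diagonal-covariance assumption."""
    return math.sqrt(sum((xi - m) ** 2 / v for xi, m, v in zip(x, means, variances)))


def select_and_classify(models: Dict[str, Model],
                        features: Dict[str, Sequence[float]]) -> Tuple[str, str]:
    """models: representation -> (means, variances, classifier);
    features: representation -> the test sample in that representation.
    Only the classifier with the smallest distance is triggered."""
    best = min(models, key=lambda name: mahalanobis_diag(features[name], *models[name][:2]))
    return best, models[best][2](features[best])
```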

9.
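A Learning Method for Increasing the Diversity of Neural Network Ensembles   Total citations: 7; self-citations: 1; citations by others: 6
LI Kai, HUANG Houkuan. Acta Electronica Sinica, 2005, 33(8): 1387-1390
Ensemble learning has become one of the main research directions in machine learning, as it can significantly improve the generalization performance of classifiers. This paper analyzes the Bagging and AdaBoost ensemble methods and points out their shortcomings. It then proposes DBNNE, a new neural-network-based classifier ensemble method that increases ensemble diversity by generating diverse data. In addition, each time a classifier is generated, a test step is applied to ensure the accuracy of the ensemble. Finally, experiments on ten standard data sets show that DBNNE outperforms Bagging and AdaBoost on small data sets and performs no worse than either method on larger data sets.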

10.
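Data-driven machine learning studies how to find regularities in observed data and use them to predict future data. This paper proposes a new classification method, the covering algorithm. It first uses a covering-rule algorithm to find support points (representative points) of the training sample set. At decision time, only the distances between the sample to be classified and these support points are computed and compared, and the sample is assigned the class of the nearest support point. Because the support points are only a fraction of all training samples, the method needs less computation and storage than the nearest-neighbor method. Moreover, since the covering algorithm mainly computes distances between samples, it does not require choosing a kernel function as SVM does, which makes it better suited to automatic classification of large data volumes. Experiments on spectral data from two classes, normal galaxies and stars, show that the covering algorithm is robust and achieves high classification accuracy.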
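A minimal covering-style classifier can be sketched as below. Each still-uncovered training sample becomes a representative point. Its radius is assumed to extend up to (but not including) the nearest sample of a different class, and every same-class sample inside that radius is marked as covered. Classification then uses only the representatives, as the abstract describes. The radius rule and greedy order are assumptions; the paper's covering rule may differ.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def build_cover(samples: List[Point], labels: List[str]) -> List[Tuple[Point, str]]:
    """Greedy covering: each uncovered sample becomes a centre whose radius reaches just
    short of the nearest other-class sample; same-class samples inside it are covered."""
    covered = [False] * len(samples)
    centres = []
    for i, (x, y) in enumerate(zip(samples, labels)):
        if covered[i]:
            continue
        radius = min((math.dist(x, s) for s, l in zip(samples, labels) if l != y),
                     default=math.inf)
        for j, (s, l) in enumerate(zip(samples, labels)):
            if l == y and math.dist(x, s) < radius:
                covered[j] = True
        centres.append((x, y))
    return centres


def classify(centres: List[Tuple[Point, str]], x: Point) -> str:
    """Assign the class of the nearest representative point."""
    return min(centres, key=lambda c: math.dist(c[0], x))[1]
```

On well-separated classes the cover collapses to a handful of centres, so prediction cost scales with the number of representatives rather than the full training set.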

11.
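With the rapid development of information and database technologies, extracting useful information and knowledge from large amounts of data has become increasingly important. Fuzzy association rule mining is an effective method for discovering association rules over quantitative attributes. This paper proposes a matrix-based fuzzy association rule mining algorithm and applies it to the correlation analysis of network security events. On the DARPA benchmark data set, the algorithm produced the expected number of association rules and successfully confirmed several attack scenarios, giving good experimental results.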
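The matrix view of fuzzy support can be sketched as follows. Rows are records, columns are fuzzy items (an attribute paired with a linguistic term), and each cell holds a membership degree in [0, 1]. The sketch assumes the common choices of the minimum t-norm for itemset membership and the mean over records for fuzzy support; the paper's exact operators are not stated in the abstract. A triangular membership function is included as an assumed way to fill the matrix.

```python
from typing import List, Sequence


def triangular(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership function peaking at b over the interval (a, c)."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


def fuzzy_support(matrix: List[Sequence[float]], cols: Sequence[int]) -> float:
    """Mean over records of the minimum membership among the itemset's columns."""
    return sum(min(row[c] for c in cols) for row in matrix) / len(matrix)


def fuzzy_confidence(matrix: List[Sequence[float]], antecedent: Sequence[int],
                     consequent: Sequence[int]) -> float:
    """Fuzzy confidence of antecedent => consequent."""
    sup_a = fuzzy_support(matrix, antecedent)
    if not sup_a:
        return 0.0
    return fuzzy_support(matrix, list(antecedent) + list(consequent)) / sup_a
```

Working column-wise on a precomputed membership matrix means each candidate itemset costs one pass over the rows, with no need to re-fuzzify the raw data.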

12.
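A Question Classification Method Based on Incremental Rule Learning with Rough Sets   Total citations: 2; self-citations: 0; citations by others: 2
This paper proposes a question classification method based on automatic incremental rule learning with rough sets. Features are extracted in depth from questions, the training corpus is built in decision-table form, and classification rules are acquired automatically by machine learning. Compared with other methods, this approach has two advantages. First, the classification rules are generated automatically, and rough set reduction yields an optimized minimal rule set. Second, it introduces incremental learning into question classification for the first time. This not only improves classification accuracy but also avoids tedious retraining, which greatly speeds up learning and improves the scalability and adaptability of classification. Comparative experiments show that the method achieves high classification accuracy and adapts well. It also performed well in the TREC 2005 Q/A evaluation.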

13.
Association rule mining (ARM) is an active data mining research area. However, most ARM algorithms are designed for a centralized environment. In contrast to previous ARM algorithms, we have developed a distributed algorithm, called optimized distributed association mining (ODAM), for geographically distributed data sets. ODAM generates support counts of candidate itemsets faster than other distributed ARM (DARM) algorithms and reduces the size of average transactions, data sets, and message exchanges.
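The basic division of labour in distributed ARM can be illustrated with a sketch. Each site trims items already known to be globally infrequent from its transactions, which shrinks average transaction size. It then counts its k-itemsets locally and sends only the counts, which a coordinator sums. This is a generic sketch of distributed support counting, not the ODAM algorithm itself. The function names are hypothetical, and the message-reduction techniques of the paper are not modelled.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple


def local_counts(transactions: Iterable[Set[str]], k: int,
                 frequent_items: Set[str]) -> Counter:
    """Count k-itemsets at one site after trimming globally infrequent items."""
    counts = Counter()
    for t in transactions:
        trimmed = sorted(i for i in t if i in frequent_items)
        counts.update(combinations(trimmed, k))
    return counts


def global_frequent(sites: List[List[Set[str]]], k: int, frequent_items: Set[str],
                    min_count: int) -> Dict[Tuple[str, ...], int]:
    """Coordinator: sum the per-site counts (only counts are exchanged) and filter."""
    total = Counter()
    for site in sites:
        total.update(local_counts(site, k, frequent_items))
    return {itemset: c for itemset, c in total.items() if c >= min_count}
```

A typical run calls `global_frequent` with `k=1` over all items, then passes the frequent singletons as `frequent_items` for `k=2`, and so on.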

15.
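Precipitation particle classification may suffer from too few samples and samples of low quality. To address this, this paper proposes a precipitation particle classification method for dual-polarization weather radar based on incremental Bayesian learning. The method first processes the labeled training data set and builds a naive Bayes classifier from the conditional probability tables between the attribute nodes and the class node. It then uses this classifier to classify unlabeled data, checks the class confidence values, and appends the qualifying data to the training set. Finally, the naive Bayes classifier is updated to complete incremental learning, yielding an incremental Bayes classifier for precipitation particle classification. The incremental Bayes classifier not only adds valid data samples but also keeps the classifier up to date, which improves its generalization and adaptability, and the classification accuracy also improves to some extent.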
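The incremental scheme can be sketched with a categorical naive Bayes classifier kept as raw counts, so accepting a new sample is just a count update. Unlabeled samples are classified, and only those whose top posterior reaches a confidence threshold are appended. Several details are assumptions not stated in the abstract: discretized radar variables (continuous polarimetric fields would first need binning), Laplace smoothing, and the threshold value. The class and value names in the test are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


class CountingNB:
    """Categorical naive Bayes stored as counts, so samples can be added incrementally."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha
        self.class_counts = Counter()
        self.feat_counts = defaultdict(Counter)  # (class, feature index) -> value counts
        self.values = defaultdict(set)           # feature index -> observed values

    def add(self, x: Sequence[Hashable], y: Hashable) -> None:
        self.class_counts[y] += 1
        for i, v in enumerate(x):
            self.feat_counts[(y, i)][v] += 1
            self.values[i].add(v)

    def posterior(self, x: Sequence[Hashable]) -> Dict[Hashable, float]:
        n = sum(self.class_counts.values())
        logp = {}
        for y, cy in self.class_counts.items():
            lp = math.log(cy / n)
            for i, v in enumerate(x):
                k = len(self.values[i]) or 1
                lp += math.log((self.feat_counts[(y, i)][v] + self.alpha) / (cy + self.alpha * k))
            logp[y] = lp
        top = max(logp.values())  # subtract the max before exponentiating for stability
        z = sum(math.exp(v - top) for v in logp.values())
        return {y: math.exp(v - top) / z for y, v in logp.items()}


def incremental_fit(labelled: List[Tuple[Sequence[Hashable], Hashable]],
                    unlabelled: List[Sequence[Hashable]],
                    threshold: float = 0.9) -> Tuple[CountingNB, int]:
    nb = CountingNB()
    for x, y in labelled:
        nb.add(x, y)
    added = 0
    for x in unlabelled:
        y, p = max(nb.posterior(x).items(), key=lambda kv: kv[1])
        if p >= threshold:  # only confident predictions enlarge the training set
            nb.add(x, y)
            added += 1
    return nb, added
```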

16.
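An Automatic Recognition Method for Chinese Organization Names Based on SVM/RS   Total citations: 4; self-citations: 0; citations by others: 4
This paper proposes a method for recognizing Chinese organization-name phrases that combines support vector machines (SVM) with rough sets (RS). The method represents phrase formation rules through basic semantic collocation relations between words. It then uses rough set attribute reduction to automatically learn a non-redundant set of organization-name formation rules. During recognition, word strings matching these rules are first extracted as candidate organization names. An SVM classifier then uses the semantic features of each candidate and its context words to decide whether it is a genuine organization name. In an open test on a 16.17-million-character People's Daily corpus, the method reaches an F-measure of 82.06%.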

17.
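Association rule mining is an important research direction in data mining. This paper surveys association rule mining and genetic algorithms and explains the significance of association rule mining. It then proposes an association rule extraction algorithm based on an improved genetic algorithm, and discusses and analyzes the encoding method, the construction of the fitness function, and the design of the mutation, selection, and crossover operators. Finally, the algorithm is applied to a concrete example. Experiments show that the algorithm is effective.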

18.
Data mining is an information extraction process that aims to discover valuable knowledge in databases. Existing genetic algorithms (GAs) designed for rule induction evaluate the rules as a whole via a fitness function. Major drawbacks of GAs for rule induction include computational inefficiency, limited accuracy, and limited rule expressiveness. In this paper, we propose a constraint-based genetic algorithm (CBGA) approach to reveal more accurate and significant classification rules. This approach allows constraints to be specified as relationships among attributes, according to predefined requirements, user preferences, or partial knowledge, in the form of a constraint network. Constraint-based reasoning is employed to produce valid chromosomes, using constraint propagation to ensure that the genes comply with the predefined constraint network. The proposed approach is compared with a regular GA and C4.5 on two UCI repository data sets, and CBGA achieves better classification accuracy.

19.
Nonparametric kernel classification rules derived from incomplete (missing) data are studied. Several techniques for handling missing observations in the training set are considered. In particular, the straightforward approach of designing a classifier only from the available data (deleting missing values) is examined. The class of imputation techniques is also considered; in that case, one estimates the missing values and then derives classification rules from the completed training set. Consistency and the rate of convergence of the proposed classification rules are established. Results of simulation studies are presented.
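The two strategies in the abstract, deletion and imputation, can be contrasted in a short sketch. Both feed a kernel classification rule that assigns the class with the largest summed kernel weight. Three choices here are assumptions: a Gaussian kernel, a single bandwidth `h`, and column-mean imputation, which is only one member of the imputation class the paper considers. Missing values are represented as `None`.

```python
import math
from collections import defaultdict
from typing import List, Optional, Sequence, Tuple

Sample = Tuple[Sequence[Optional[float]], str]


def gaussian_kernel(u: float) -> float:
    return math.exp(-0.5 * u * u)


def kernel_classify(train: List[Sample], x: Sequence[float], h: float = 1.0) -> str:
    """Kernel rule: pick the class whose training points carry the most kernel weight."""
    scores = defaultdict(float)
    for xi, yi in train:
        scores[yi] += gaussian_kernel(math.dist(xi, x) / h)
    return max(scores, key=scores.get)


def drop_missing(train: List[Sample]) -> List[Sample]:
    """Deletion: keep only training vectors with no missing coordinate."""
    return [(xi, yi) for xi, yi in train if None not in xi]


def mean_impute(train: List[Sample]) -> List[Sample]:
    """Imputation: replace each missing coordinate by its column's observed mean."""
    dims = len(train[0][0])
    means = []
    for j in range(dims):
        observed = [xi[j] for xi, _ in train if xi[j] is not None]
        means.append(sum(observed) / len(observed))
    return [(tuple(means[j] if v is None else v for j, v in enumerate(xi)), yi)
            for xi, yi in train]
```

Deletion discards information whenever a vector is incomplete. Imputation keeps every sample but can bias the rule if the missing values are not well estimated by the column mean.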

20.
Association rules represent a promising technique to improve heart disease prediction. Unfortunately, when association rules are applied to a medical data set, they produce an extremely large number of rules. Most of these rules are medically irrelevant, and the time required to find them can be impractical. A more important issue is that association rules are generally mined on the entire data set without validation on an independent sample. To address these limitations, we introduce an algorithm that uses search constraints to reduce the number of rules, searches for association rules on a training set, and finally validates them on an independent test set. The medical significance of the discovered rules is evaluated with support, confidence, and lift. Association rules are applied to a real data set containing medical records of patients with heart disease. In medical terms, the association rules relate heart perfusion measurements and risk factors to the degree of disease in four specific arteries. Search constraints and test set validation significantly reduce the number of association rules and produce a set of rules with high predictive accuracy. We present important rules with high confidence, high lift, or both, that remain valid on the test set over several runs. These rules represent valuable medical knowledge.
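The train/test validation step can be sketched with the standard definitions. support(A⇒B) is the fraction of records containing A ∪ B. confidence is support(A ∪ B) / support(A). lift is confidence / support(B). A candidate rule is kept only if it meets every threshold on both the training set and the independent test set. The threshold values and item names used in the test are illustrative assumptions, and the paper's search constraints are not modelled here.

```python
from typing import FrozenSet, List, Set, Tuple

Rule = Tuple[FrozenSet[str], FrozenSet[str]]


def metrics(transactions: List[Set[str]], antecedent: FrozenSet[str],
            consequent: FrozenSet[str]) -> Tuple[float, float, float]:
    """Return (support, confidence, lift) of antecedent => consequent."""
    n = len(transactions)
    a = sum(1 for t in transactions if antecedent <= t)
    c = sum(1 for t in transactions if consequent <= t)
    ac = sum(1 for t in transactions if (antecedent | consequent) <= t)
    support = ac / n
    confidence = ac / a if a else 0.0
    lift = confidence / (c / n) if c else 0.0
    return support, confidence, lift


def validate(rules: List[Rule], train: List[Set[str]], test: List[Set[str]],
             min_sup: float, min_conf: float, min_lift: float) -> List[Rule]:
    """Keep rules that satisfy all thresholds on the training set AND on the test set."""
    kept = []
    for ante, cons in rules:
        if all(s >= min_sup and cf >= min_conf and l >= min_lift
               for s, cf, l in (metrics(train, ante, cons), metrics(test, ante, cons))):
            kept.append((ante, cons))
    return kept
```

A rule that looks strong on the training data but drops below the confidence threshold on the test data is discarded. This filters out rules that only fit the particular sample they were mined from.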
