Similar literature
1.
Support vector machines (SVMs) are state-of-the-art tools for classification. However, their limited explanation capability is their main weakness, which is why SVMs are typically regarded as incomprehensible black-box models. In the present study, a rule extraction algorithm is proposed that extracts comprehensible rules from SVMs and enhances their explanation capability. The proposed algorithm uses the support vectors of a trained SVM model and combines them with genetic algorithms to construct rule sets. The proposed method can not only generate rule sets from SVMs based on mixed discrete and continuous variables but can also select important variables in the rule set simultaneously. Accuracy, sensitivity, specificity, and fidelity are used to compare the performance of the proposed method with direct learner algorithms and several techniques for rule extraction from SVMs. The results indicate that the proposed method performs at least as well as the most successful direct rule learners. Finally, an actual case of pressure ulcers was studied, and the results indicated the practicality of the proposed method in real applications.

2.
In this paper, we propose a novel algorithm for rule extraction from support vector machines (SVMs), termed SQRex-SVM. The proposed method extracts rules directly from the support vectors (SVs) of a trained SVM using a modified sequential covering algorithm. Rules are generated based on an ordered search of the most discriminative features, as measured by interclass separation. Rule performance is then evaluated using measured rates of true and false positives and the area under the receiver operating characteristic (ROC) curve (AUC). Results are presented on a number of commonly used data sets, showing that the rules produced by SQRex-SVM exhibit both improved generalization performance and smaller, more comprehensible rule sets compared with other SVM rule extraction techniques and with direct rule learning techniques.
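The sequential covering idea mentioned in this abstract can be illustrated with a minimal sketch. This is not the SQRex-SVM procedure itself: the single-threshold rule form, the class-mean separation score, and all function names below are assumptions made for illustration, and the input points stand in for the support vectors of a trained SVM.

```python
from statistics import mean

def separation(points, labels, j):
    # Assumed separation score: absolute difference of the class means of feature j.
    pos = [p[j] for p, y in zip(points, labels) if y == 1]
    neg = [p[j] for p, y in zip(points, labels) if y == 0]
    return abs(mean(pos) - mean(neg))

def learn_one_rule(points, labels):
    # Choose the most separated feature and threshold it midway between class means.
    j = max(range(len(points[0])), key=lambda k: separation(points, labels, k))
    pos_mean = mean([p[j] for p, y in zip(points, labels) if y == 1])
    neg_mean = mean([p[j] for p, y in zip(points, labels) if y == 0])
    op = ">" if pos_mean > neg_mean else "<"
    return (j, op, (pos_mean + neg_mean) / 2)

def covers(rule, point):
    j, op, threshold = rule
    return point[j] > threshold if op == ">" else point[j] < threshold

def sequential_covering(points, labels, max_rules=5):
    # Learn one rule, remove the positives it covers, and repeat on the rest.
    points, labels, rules = list(points), list(labels), []
    while len(rules) < max_rules and 1 in labels and 0 in labels:
        rule = learn_one_rule(points, labels)
        covered = {i for i, (p, y) in enumerate(zip(points, labels))
                   if y == 1 and covers(rule, p)}
        if not covered:
            break
        rules.append(rule)
        points = [p for i, p in enumerate(points) if i not in covered]
        labels = [y for i, y in enumerate(labels) if i not in covered]
    return rules

# Hypothetical toy data: feature 0 separates the classes, feature 1 does not.
points = [(1.0, 5.0), (1.2, 4.0), (3.0, 5.1), (3.3, 4.2)]
labels = [0, 0, 1, 1]
rules = sequential_covering(points, labels)
```

Each rule is a `(feature index, operator, threshold)` triple; a fuller version would grow conjunctions of conditions and score rules by true and false positive rates, as the abstract describes.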

5.
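Rule extraction from decision information systems is one of the research topics of data mining, and concept lattice theory and granular computing theory are the main mathematical tools in this area. By exploring the relationship between these two theories, this paper uses equivalence relations to define the minimal optimistic concept lattice and its structure; minimal optimistic concepts differ from traditional classical concepts but still have a lattice structure. On this basis, a rule extraction algorithm for decision information systems is proposed. The algorithm introduces the idea of granularity: it computes the minimal optimistic concepts at each granular level and extracts decision rules according to the inclusion relationship between the extents of the minimal optimistic concepts and the equivalence classes of the decision attribute. A termination condition is set to speed up convergence, with the aim of knowledge reduction for decision information systems. The definition of a minimal optimistic concept is broader than that of a classical concept, and its generation process is simpler. Finally, theoretical proofs, worked examples, and comparative numerical experiments verify the correctness and superiority of the method.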

6.
Choosing appropriate classification algorithms for a given data set is important and useful in practice but is also challenging. In this paper, a method of recommending classification algorithms is proposed. First, the feature vectors of data sets are extracted using a novel method, and the performance of classification algorithms on the data sets is evaluated. Then the feature vector of a new data set is extracted, and its k nearest data sets are identified. Afterwards, the classification algorithms of the nearest data sets are recommended for the new data set. The proposed data set feature extraction method uses structural and statistical information to characterize data sets, which is quite different from existing methods. To evaluate the performance of the proposed classification algorithm recommendation method and the data set feature extraction method, extensive experiments with 17 different types of classification algorithms, three different types of data set characterization methods, and all possible numbers of nearest data sets are conducted on 84 publicly available UCI data sets. The results indicate that the proposed method is effective and can be used in practice.
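The recommendation step described in this abstract (find the k nearest data sets, then recommend their algorithms) can be sketched as follows. The feature vectors, the Euclidean distance, the vote-based ranking, and the names `recommend` and `known` are assumptions for illustration; the paper's own data set characterization is not reproduced here.

```python
import math

def recommend(new_features, known, k=2):
    # known: list of (feature_vector, best_algorithm_name) pairs for evaluated data sets.
    ranked = sorted(known, key=lambda item: math.dist(new_features, item[0]))
    votes = {}
    for _, algorithm in ranked[:k]:
        votes[algorithm] = votes.get(algorithm, 0) + 1
    # Most frequently winning algorithms first; ties broken alphabetically.
    return sorted(votes, key=lambda name: (-votes[name], name))

# Hypothetical meta-data: two-dimensional data set features and the best algorithm on each.
known = [
    ((0.1, 0.2), "svm"),
    ((0.2, 0.1), "svm"),
    ((0.9, 0.8), "tree"),
    ((0.8, 0.9), "knn"),
]
suggested = recommend((0.15, 0.15), known, k=2)
```

Here the two nearest data sets both favor `"svm"`, so that is the only algorithm suggested for the new data set.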

7.
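Zhan Chao, Hu Jianghong. Microcomputer Development (《微机发展》), 2006, 16(3): 107-109
This paper introduces a method for classifying functional genes using gene expression data produced by gene chip experiments; the method is based on support vector machine (SVM) theory. The paper describes the radial basis function SVM, which performs better in gene classification than other SVMs. The theoretical foundation of SVMs is statistical learning theory. SVMs are structurally simple, perform well, and generalize strongly; they show many advantages in gene expression classification and have become an active research direction.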

8.
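Yang Zemin. Computer Science (《计算机科学》), 2013, 40(3): 259-262
To address the incomplete mining of frequent-itemset information in association rule mining algorithms and the influence of temporal periodicity on the mining of frequent items in transaction sets, a weighted association rule mining algorithm based on time-series and interestingness constraints is proposed. The algorithm first uses a time-series sliding function to estimate occurrence probabilities and assign weights for the time-ordered transaction set, and simplifies the transaction set according to an interestingness constraint function and a pruning theorem. It then extracts weighted frequent transaction sets according to the support and the support expectation (寿支持期望), and finally derives weighted association rules according to the confidence. Experimental results show that the algorithm can quickly and effectively mine association rules that match the user's interestingness.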

9.
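When extracting association rules that meet specific user requirements, existing constrained association rule mining algorithms generate many redundant candidates and repeated computations. A constrained association rule mining algorithm based on attribute-bit reuse is therefore proposed; it is suited to mining association rules of any length that meet specific user requirements. The algorithm converts each transaction into an integer through a weighted combination of attribute bits, builds candidate intervals using the attribute-bit reuse technique, and uses the bidirectional change of the interval endpoint values to build an index of candidate frequent items, while computing support counts with Boolean operations. Experiments show that it is faster than existing algorithms, and that applying it to the analysis of customer association information in a customer relationship management system can effectively improve system efficiency.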
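Two of the ingredients named in this abstract, encoding a transaction as an integer by weighting attribute bits and counting support with Boolean operations, can be shown in a minimal sketch. The candidate-interval construction and the index are not covered, and the helper names `encode` and `support_count` are assumptions for illustration.

```python
def encode(transaction, items):
    # Bit i (weight 2**i) is set when items[i] occurs in the transaction.
    code = 0
    for i, item in enumerate(items):
        if item in transaction:
            code |= 1 << i
    return code

def support_count(candidate_code, encoded_transactions):
    # A transaction supports the candidate when it contains every candidate bit.
    return sum(1 for t in encoded_transactions if (t & candidate_code) == candidate_code)

# Hypothetical item universe and transactions.
items = ["a", "b", "c"]
transactions = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b"}]
encoded = [encode(t, items) for t in transactions]
count_ab = support_count(encode({"a", "b"}, items), encoded)
```

With this encoding, a subset test becomes a single bitwise AND and comparison per transaction.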

10.
Sparse kernel SVMs via cutting-plane training   Total citations: 1, self-citations: 0, citations by others: 1
We explore an algorithm for training SVMs with kernels that can represent the learned rule using arbitrary basis vectors, not just the support vectors (SVs) from the training set. This results in two benefits. First, the added flexibility makes it possible to find sparser solutions of good quality, substantially speeding up prediction. Second, the improved sparsity can also make training of kernel SVMs more efficient, especially for high-dimensional and sparse data (e.g. text classification). This has the potential to make training of kernel SVMs tractable for large training sets, where conventional methods scale quadratically due to the linear growth of the number of SVs. In addition to a theoretical analysis of the algorithm, we also present an empirical evaluation.

11.
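Traditional support vector machines usually focus on samples at the margins of the data distribution, and the support vectors usually arise among these marginal samples. This paper proposes a new support vector algorithm whose support vectors are generated from the global data distribution; on most data sets its sparsity is far better than that of the classical support vector machine algorithm. On multi-class problems, the time complexity of the algorithm is only equivalent to that of the original SVM algorithm on a binary problem, which resolves the problem of a huge number of variables or too many binary sub-classifiers when designing multi-class algorithms.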

12.
Artificial neural networks (ANNs) are among the most widely used techniques in classification data mining. Although ANNs can achieve very high classification accuracies, their explanation capability is very limited. Therefore, one of the main challenges in using ANNs in data mining applications is to extract explicit knowledge from them. Based on this motivation, a novel approach is proposed in this paper for generating classification rules from feed-forward ANNs. Although there are several approaches in the literature for classification rule extraction from ANNs, the present approach is fundamentally different from them. In previous studies, ANN training and rule extraction are generally performed independently in a sequential (hierarchical) manner. In the present study, however, the training and rule extraction phases are integrated within a multiple-objective evaluation framework for generating accurate classification rules directly. The proposed approach uses a differential evolution algorithm for training and a touring ant colony optimization algorithm for rule extraction. The proposed algorithm is named DIFACONN-miner. An experimental study on benchmark data sets and comparisons with other classical and state-of-the-art rule extraction algorithms have shown that the proposed approach has considerable potential to discover more accurate and concise classification rules.

13.
This paper proposes a new method for fuzzy rule extraction from trained support vector machines (SVMs) for multi-class problems, named FREx_SVM. SVMs have been used in a variety of applications. However, they are considered “black box models,” where no interpretation of the input–output mapping is provided. Some methods to reduce this limitation have already been proposed, but they are restricted to binary classification problems and to the extraction of symbolic rules with intervals or functions in their antecedents. In order to improve the interpretability of the generated rules, this paper presents a new model for extracting fuzzy rules from a trained SVM. The proposed model is suited for classification in multi-class problems and includes a wrapper feature selection algorithm. It is evaluated on four benchmark databases, and the results demonstrate its capacity to generate a reduced set of interpretable fuzzy rules that explains both the classification database and the influence of each input variable on the determination of the final class.

14.
In this paper, a novel clustering method in the kernel space is proposed. It integrates several existing algorithms into an iterative clustering scheme that can handle clusters with arbitrary shapes. In the proposed approach, a reasonable initial core for each cluster is estimated. This allows us to adopt a cluster growing technique, and the growing cores offer partial hints on the cluster association. Consequently, methods used for classification, such as support vector machines (SVMs), can be useful in our approach. To obtain initial clusters effectively, the incomplete Cholesky decomposition is adopted so that fuzzy c‐means (FCM) can be used to partition the data in a kernel defined‐like space. Then a one‐class and a multiclass soft-margin SVM are adopted to detect the data within the main distributions (the cores) of the clusters and to repartition the data into new clusters iteratively. The structure of the data set is explored by pruning the data in the low‐density regions of the clusters. Then data are gradually added back to the main distributions to assure exact cluster boundaries. Unlike the ordinary SVM algorithm, whose performance relies heavily on the kernel parameters given by the user, in our approach the parameters are estimated from the data set. The experimental evaluations on two synthetic data sets and four University of California Irvine real data benchmarks indicate that the proposed algorithms outperform several popular clustering algorithms, such as FCM, support vector clustering (SVC), hierarchical clustering (HC), self‐organizing maps (SOM), and non‐Euclidean norm fuzzy c‐means (NEFCM). © 2009 Wiley Periodicals, Inc.

15.
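Jiang Hua, Jiang Richen, Wang Xin, Wang Huijiao. Computer Simulation (《计算机仿真》), 2020, 37(3): 254-258, 420
When a traditional support vector machine (SVM) performs binary classification on imbalanced data, the decision boundary tends to shift. At present, the imbalanced-data problem is mainly addressed from two directions: the data set and the algorithm. A data-set-based method is proposed that uses the ADASYN and SMOTE algorithms jointly to generate minority-class samples. The method uses the K-nearest-neighbor algorithm to count minority-class and majority-class samples, categorizes the minority-class samples accordingly, and then applies ADASYN and SMOTE respectively to synthesize minority-class samples. Finally, the algorithm is validated experimentally. ROC curves are used to compare it with synthesizing minority-class samples by SMOTE or ADASYN alone, and the presented algorithm attains the highest AUC value, which indicates that it can improve the effectiveness of imbalanced-data classification.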
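The SMOTE half of this scheme rests on a simple interpolation step, sketched below under stated assumptions: a synthetic sample is placed at a random position on the segment between a minority sample and one of its k nearest minority neighbors. The ADASYN weighting and the paper's rule for splitting samples between the two algorithms are not shown, and the function name `smote` and its parameters are illustrative.

```python
import math
import random

def smote(minority, n_new, k=2, seed=0):
    # Requires at least two minority samples; each sample is a tuple of floats.
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest minority neighbors of x, excluding x itself.
        neighbors = sorted((p for p in minority if p is not x),
                           key=lambda p: math.dist(x, p))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, neighbor)))
    return synthetic

# Hypothetical minority class: the three corners of a right triangle.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
new_samples = smote(minority, n_new=5)
```

Because every synthetic point lies between two real minority samples, the new samples stay inside the region already occupied by the minority class.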

16.
As a broad subfield of artificial intelligence, machine learning is concerned with the development of algorithms and techniques that allow computers to learn. Methods such as fuzzy logic, neural networks, support vector machines, decision trees, and Bayesian learning have been applied to learn meaningful rules; however, a drawback of these methods is that they often get trapped in a local optimum. In contrast with these machine learning methods, a genetic algorithm (GA) is expected to obtain better results owing to its natural evolution and global search. GAs have given rise to two new fields of research where global optimization is of crucial importance: genetic-based machine learning (GBML) and genetic programming (GP). This article adopts the GBML technique to provide a three-phase knowledge extraction methodology, which performs continuous and instant learning while integrating multiple rule sets into a centralized knowledge base. Moreover, the proposed system and GP are both applied in theoretical and empirical experiments. Results for both approaches are presented and compared. This paper makes two important contributions: (1) it uses three criteria (accuracy, coverage, and fitness) in the knowledge extraction process, which is very effective in selecting an optimal set of rules from a large population; (2) the experiments show that the rule sets derived by the proposed approach are more accurate than those of GP.

17.
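Liu Jun, Li Wei, Chen Shuyu, Xu Guangxia. Journal of Software (《软件学报》), 2022, 33(12): 4574-4589
A feature extraction algorithm based on principal component analysis with anisotropic Gaussian kernel penalization is proposed. The algorithm differs from traditional kernel principal component analysis (KPCA). In nonlinear dimensionality reduction, traditional KPCA ignores the nondimensionalization of the raw data. Moreover, traditional kernel functions are controlled by a single kernel-width parameter shared across all dimensions, which cannot accurately reflect the importance of the different features in each dimension and leads to low accuracy in dimensionality reduction. To address these problems, first, an averaging algorithm is proposed for the nondimensionalization of the raw data, which noticeably increases the total variance contribution rate of the raw data. Second, an anisotropic Gaussian kernel function is introduced; it has a separate kernel-width parameter for each dimension, and each parameter can accurately reflect the importance of the data feature in its dimension. Third, a feature-penalty objective function for KPCA is built on the anisotropic Gaussian kernel, so that the raw data can be represented with fewer features and the importance of the information in each principal component is reflected. Finally, to find the best features, a gradient descent algorithm is introduced to update the kernel widths in the feature-penalty objective function and to control the iteration of the feature extraction algorithm. To verify the effectiveness of the proposed algorithm, the algorithms are compared on public UCI data sets and on the KDDCUP99 data set. Experimental results show that the proposed algorithm improves accuracy by an average of 4.49% over the traditional principal component analysis algorithm on nine public UCI data sets, and by 8% on the KDDCUP99 data set.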
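The difference between a shared kernel width and per-dimension kernel widths can be made concrete with a small sketch. The parameterization below, k(x, y) = exp(-Σ_d (x_d - y_d)² / (2 σ_d²)), is one common form of an anisotropic Gaussian kernel and is an assumption here; the paper may scale the widths differently, and the penalty objective and gradient descent updates are not shown.

```python
import math

def anisotropic_gaussian_kernel(x, y, widths):
    # One width per dimension: a large width shrinks that dimension's influence.
    squared = sum(((a - b) / s) ** 2 for a, b, s in zip(x, y, widths))
    return math.exp(-squared / 2.0)

x, y = (0.0, 0.0), (1.0, 0.0)
shared = anisotropic_gaussian_kernel(x, y, widths=(1.0, 1.0))      # isotropic special case
downweighted = anisotropic_gaussian_kernel(x, y, widths=(10.0, 1.0))
```

With equal widths the function reduces to the ordinary Gaussian kernel; widening the first dimension makes the two points look almost identical, which is how a per-dimension width can encode feature importance.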

18.
The discrete wavelet transform (DWT) provides a multiresolution decomposition of hyperspectral data. The wavelet features of each level are downsampled from the band features. Fine-scale and large-scale information in hyperspectral signals can thus be separated, which might provide specific discriminant capability compared with using band features alone. This article proposes using a combination of band and wavelet features (BWFs) in the stacked support vector machine (SSVM), where each feature set is solved independently by level-0 support vector machines (SVMs), and level-1 SVMs are used to correct the errors of the level-0 SVMs and obtain the final classification result. The effectiveness of the proposed method was examined using two benchmark hyperspectral data sets collected over forest and urban areas, respectively. For both data sets, the proposed method significantly outperformed SVMs using band features, wavelet energy features (WEFs), concatenated wavelet features (WFs), or BWFs, as well as the SSVM using only WFs.
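A single decomposition level of a DWT can be sketched with the Haar wavelet, chosen here only because it is the simplest case; the abstract does not state which wavelet the authors use. The function name `haar_dwt` and the toy spectrum are assumptions, and the input length is assumed to be even.

```python
import math

def haar_dwt(signal):
    # One Haar level: pairwise scaled sums (coarse part) and differences (fine part).
    scale = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * scale for i in range(0, len(signal) - 1, 2)]
    detail = [(signal[i] - signal[i + 1]) * scale for i in range(0, len(signal) - 1, 2)]
    return approx, detail

# Hypothetical four-band spectrum.
spectrum = [4.0, 4.0, 2.0, 0.0]
approx, detail = haar_dwt(spectrum)
energy_in = sum(v * v for v in spectrum)
energy_out = sum(v * v for v in approx) + sum(v * v for v in detail)
```

Each output has half the length of the input, which matches the downsampling described above; applying `haar_dwt` again to `approx` would give the next, coarser level.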

19.
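Lin Xiaoli, Chen Enhong, Ren Wanying. Computer Engineering (《计算机工程》), 2003, 29(19): 68-69, 179
Several representative current feature extraction algorithms are analyzed and compared. Building on the Bourgain algorithm, a feature extraction algorithm for complex data is proposed that uses heuristic information such as the number of data classes and the representative elements of each class. For complex data with M classes, the algorithm can extract […]-dimensional vectors to represent the data. Comparative experiments on real data evaluate the dimensionality-reduction performance of several algorithms, and the results show that the proposed algorithm extracts features well.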

20.
Recent research shows that rule-based models perform well when classifying large data sets such as data streams with concept drifts. A genetic algorithm is a strong rule-based classification algorithm, but it is used only for mining small static data sets. If the genetic algorithm can be made scalable and adaptable by reducing its I/O intensity, it will become an efficient and effective tool for mining large data sets such as data streams. In this paper, a scalable and adaptable online genetic algorithm is proposed to mine classification rules for data streams with concept drifts. Since data streams are generated continuously at a rapid rate, the proposed method does not use a fixed static data set for fitness calculation. Instead, it extracts a small snapshot of training examples from the current part of the data stream whenever data are required for the fitness calculation. The proposed method also builds rules for all the classes separately in a parallel, independent, iterative manner. This makes the proposed method scalable to data streams and adaptable to the concept drifts that occur in the stream in a fast and more natural way, without storing the whole stream or a part of it in a compressed form as other rule-based algorithms do. The results of the proposed method are comparable with those of other standard methods used for mining data streams.
