Similar Documents
1.
An intrusion detection system (IDS) monitors attacks occurring in a computer or network. Anomaly intrusion detection plays an important role in IDS because it can detect new attacks by identifying any deviation from the normal profile. This paper proposes an intelligent algorithm with feature selection and decision rules for anomaly intrusion detection. The key idea is to combine the advantages of support vector machines (SVM), decision trees (DT), and simulated annealing (SA). In the proposed algorithm, SVM and SA find the best subset of features to improve the accuracy of anomaly intrusion detection. By analyzing the KDD’99 dataset, DT and SA obtain decision rules for new attacks and improve classification accuracy. In addition, SA automatically adjusts the best parameter settings for the DT and SVM. The proposed algorithm outperforms other existing approaches, and simulation results demonstrate that it successfully detects anomalous intrusions.
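The abstract uses simulated annealing (SA) to search for a good feature subset. The sketch below shows SA over binary feature masks under stated assumptions: `toy_fitness` is a hypothetical stand-in for the SVM validation accuracy a wrapper would compute, and the single-bit-flip neighborhood, geometric cooling schedule, and parameter values are illustrative choices, not taken from the paper.

```python
import math
import random

def simulated_annealing_fs(fitness, n_features, t0=1.0, cooling=0.95,
                           steps=500, seed=0):
    """Maximize fitness(mask) over binary feature masks with simulated annealing."""
    rng = random.Random(seed)
    current = [rng.random() < 0.5 for _ in range(n_features)]
    current_fit = fitness(current)
    best, best_fit = current[:], current_fit
    t = t0
    for _ in range(steps):
        # Neighbor: flip one randomly chosen feature into or out of the subset.
        candidate = current[:]
        i = rng.randrange(n_features)
        candidate[i] = not candidate[i]
        cand_fit = fitness(candidate)
        delta = cand_fit - current_fit
        # Always accept improvements; accept worse moves with probability exp(delta / t).
        if delta >= 0 or rng.random() < math.exp(delta / t):
            current, current_fit = candidate, cand_fit
            if current_fit > best_fit:
                best, best_fit = current[:], current_fit
        t = max(t * cooling, 1e-6)  # geometric cooling with a small floor
    return best, best_fit

INFORMATIVE = {0, 2, 4}  # hypothetical "useful" features for the toy fitness

def toy_fitness(mask):
    # Hypothetical stand-in for classifier accuracy:
    # +1 per informative feature selected, -0.5 per irrelevant feature selected.
    score = 0.0
    for i, on in enumerate(mask):
        if on:
            score += 1.0 if i in INFORMATIVE else -0.5
    return score
```

In a real wrapper, `fitness` would train and validate a classifier on the selected columns, which is where nearly all of the run time goes.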

2.
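To address the shortcomings of the traditional support vector machine (SVM) in wrapper-based feature selection (low classification accuracy, redundant feature subsets, and poor computational efficiency), a metaheuristic optimization algorithm is used to optimize the SVM and feature selection simultaneously. To improve SVM classification performance and feature subset selection, first, the spotted hyena optimizer (SHO) is improved with adaptive differential evolution (DE), chaotic initialization, and a tournament selection strategy, which enhance its local search ability and improve its search efficiency and solution precision. Second, the improved algorithm is applied to the simultaneous optimization of feature selection and SVM parameter tuning. Finally, feature selection simulation experiments are conducted on UCI datasets, and the optimization performance of the proposed algorithm is evaluated comprehensively in terms of classification accuracy, number of selected features, fitness value, and running time. Experimental results show that the simultaneous optimization mechanism of the improved algorithm reduces the number of selected features while maintaining high classification accuracy; the algorithm is better suited than traditional algorithms to wrapper-based feature selection and has good application value.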
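Two of the SHO improvements named in the abstract, chaotic initialization and tournament selection, are generic components that can be sketched on their own. The version below is a minimal sketch under assumptions: it uses the logistic map with `mu = 4` and a seed value `x0 = 0.7` for the chaotic sequence, and treats lower fitness as better. It does not reproduce the paper's SHO or its adaptive DE step.

```python
import random

def logistic_map_init(pop_size, dim, lower, upper, x0=0.7, mu=4.0):
    """Chaotic initialization: iterate the logistic map x <- mu * x * (1 - x)
    and scale each value from (0, 1) into [lower, upper]."""
    x = x0
    population = []
    for _ in range(pop_size):
        individual = []
        for _ in range(dim):
            x = mu * x * (1.0 - x)
            individual.append(lower + x * (upper - lower))
        population.append(individual)
    return population

def tournament_select(population, fitness, k=3, rng=random):
    """Return the best of k distinct, randomly drawn individuals (lower fitness is better)."""
    contestants = rng.sample(range(len(population)), k)
    winner = min(contestants, key=lambda i: fitness[i])
    return population[winner]
```

Compared with uniform random sampling, the chaotic sequence is deterministic given `x0`, which is one reason it is used to spread the initial population; a larger `k` makes selection greedier.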

3.
The decision tree (DT) algorithm is a very popular and efficient data mining technique. It is non-parametric and computationally fast. Besides forming interpretable classification rules, it can select features on its own. In this article, the feature selection ability of DT and the impact of feature selection/extraction on DT with different training sample sizes were studied using AVIRIS hyperspectral data. DT was compared with three other feature selection methods; the results indicated that DT was an unstable feature selector and that the number of features it selected was strongly related to the sample size. Trees derived with and without feature selection/extraction were compared. The impact of feature selection on DT appeared mainly as a significant increase in the number of tree nodes (14.13–23.81%) and a moderate increase in tree accuracy (3.5–4.8%). Feature extraction methods such as Non-parametric Weighted Feature Extraction (NWFE) and Decision Boundary Feature Extraction (DBFE) enhanced tree accuracy more markedly (4.78–6.15%) while reducing the number of tree nodes (6.89–16.81%). When the training sample size was small, feature selection/extraction increased accuracy more dramatically (6.90–15.66%) without increasing the number of tree nodes.

4.
5.
This study proposes a knowledge discovery method that uses a multilayer perceptron (MLP) based neural rule extraction (NRE) approach for credit risk analysis (CRA) of real-life small and medium enterprises (SMEs) in Turkey. A feature selection and extraction stage is followed by neural classification that produces accurate rule sets. In the first stage, feature selection is performed with decision tree (DT) and recursive feature elimination with support vector machines (RFE-SVM) methods, and feature extraction is performed with factor analysis (FA) and principal component analysis (PCA). The RFE-SVM approach gave the best result in terms of classification accuracy and minimal input dimension. Among the classifiers, k-NN, MLP, and SVM are compared in classification experiments. Then, the Continuous/Discrete Rule Extractor via Decision Tree Induction (CRED) algorithm is used to extract rules from the hidden units of an MLP for knowledge discovery. Here, the MLP classifies customers as “good” or “bad” and reveals the rules behind the final decision. The Turkish SME database used in the experiments contains 512 samples. The results support the claim that the proposed approach is a viable alternative to other methods for knowledge discovery.
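RFE-SVM (recursive feature elimination with an SVM) repeatedly trains a linear SVM and discards the feature with the smallest weight magnitude. The sketch below is a dependency-free illustration under assumptions: the linear SVM is trained with a Pegasos-style subgradient method (a swapped-in solver with no bias term and a fixed regularization `lam`), and exactly one feature is removed per round. scikit-learn's `RFE` wrapped around `LinearSVC` is an off-the-shelf alternative.

```python
import random

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style subgradient descent for a linear SVM without bias.
    X: list of feature vectors; y: labels in {-1, +1}."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    t = 0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            # Regularization shrink, then the hinge-loss subgradient step if the margin is violated.
            w = [(1.0 - eta * lam) * wj for wj in w]
            if margin < 1:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w

def rfe_svm(X, y, n_keep):
    """Retrain on the remaining features and drop the one with the smallest |w_j|
    until n_keep features are left; returns the original column indices."""
    remaining = list(range(len(X[0])))
    while len(remaining) > n_keep:
        Xs = [[row[j] for j in remaining] for row in X]
        w = train_linear_svm(Xs, y)
        weakest = min(range(len(remaining)), key=lambda k: abs(w[k]))
        del remaining[weakest]
    return remaining
```

Removing one feature per round is the simplest schedule; on wide data, dropping a fixed fraction per round cuts the number of retrainings.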

6.
Decision trees have been widely used in data mining and machine learning as a comprehensible knowledge representation. While ant colony optimization (ACO) algorithms have been successfully applied to extract classification rules, decision tree induction with ACO algorithms remains an almost unexplored research area. In this paper we propose a novel ACO algorithm to induce decision trees, combining commonly used strategies from both traditional decision tree induction algorithms and ACO. The proposed algorithm is compared against three decision tree induction algorithms, namely C4.5, CART, and cACDT, on 22 publicly available data sets. The results show that the predictive accuracy of the proposed algorithm is statistically significantly higher than that of C4.5 and CART, two well-known conventional decision tree induction algorithms, and than that of the ACO-based cACDT decision tree algorithm.
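ACO-based tree and rule inducers share two generic building blocks: a probabilistic transition rule that weighs pheromone against a heuristic (for a tree, typically a split-quality measure of each candidate attribute), and a pheromone update that evaporates all trails and reinforces the choices made by good solutions. The sketch below shows only those blocks, under assumptions: the exponents `alpha` and `beta`, the evaporation rate `rho`, and the use of tree quality as the deposit amount are illustrative, not the paper's settings.

```python
import random

def aco_choose(candidates, pheromone, heuristic, alpha=1.0, beta=2.0, rng=random):
    """ACO transition rule: choose candidate c with probability proportional to
    pheromone[c]**alpha * heuristic[c]**beta (roulette-wheel selection)."""
    weights = [pheromone[c] ** alpha * heuristic[c] ** beta for c in candidates]
    r = rng.random() * sum(weights)
    acc = 0.0
    for c, w in zip(candidates, weights):
        acc += w
        if r < acc:
            return c
    return candidates[-1]  # reached only via round-off (or if all weights are zero)

def update_pheromone(pheromone, used, quality, rho=0.1):
    """Evaporate every trail by rate rho, then reinforce the attributes an ant used
    by an amount equal to the quality (e.g. accuracy) of the tree it built."""
    for c in pheromone:
        pheromone[c] *= 1.0 - rho
    for c in used:
        pheromone[c] += quality
```

A candidate with zero heuristic value is never chosen, and evaporation keeps early, lucky choices from dominating the colony indefinitely.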

7.

Feature subset selection (FSS) plays an essential role in data mining, particularly in high-dimensional medical data analysis, where it supports early detection by supplying essential features with high accuracy. Recent feature selection models exploit optimization algorithms to extract features with particular properties and achieve the highest possible accuracy. Many optimization algorithms, such as the genetic algorithm, have algorithm-specific parameters that must be adjusted for good results, and tuning these parameter values is a difficult challenge in the feature selection procedure. In this paper, a wrapper-based feature selection approach called binary teaching-learning-based optimization (BTLBO) is introduced. BTLBO is a sophisticated metaheuristic that involves no algorithm-specific parameters: it requires only common control parameters, such as population size and number of iterations, to extract a subset of features from data. Because parameter tuning is demanding, a method that does not depend on algorithm-specific control parameters is an attractive way to obtain the best possible feature set. This paper introduces a new modified binary teaching-learning-based optimization (NMBTLBO) technique for feature subset selection, using support vector machine (SVM) binary classification accuracy as the fitness function. NMBTLBO adds two modifications to BTLBO: a new updating procedure, and a new method for selecting the primary teacher in the teacher phase. NMBTLBO was used to classify rheumatic disease datasets collected from the Outpatient Rheumatology Clinic of Baghdad Teaching Hospital during 2016–2018.
Compared with the original BTLBO algorithm, NMBTLBO achieved a substantial improvement in accuracy. Validation was carried out by testing the accuracy of four classification methods: k-nearest neighbors, decision trees, support vector machines, and K-means. The results showed that the classification accuracy of all four methods increased with NMBTLBO feature selection compared with BTLBO. The SVM classifier achieved 89% accuracy with BTLBO-SVM and 95% with NMBTLBO-SVM. The decision tree classifier achieved 94% with BTLBO-SVM feature selection and 95% with NMBTLBO-SVM feature selection. The analysis indicates that the proposed method (NMBTLBO) enhances classification accuracy.

8.
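Based on the characteristics of medical image data, a new data mining method combining rough sets and decision trees is proposed. The method discretizes medical image features with a rough-set discretization method based on attribute significance, applies rough sets to reduce the attributes and obtain low-dimensional training data, and then uses the SLIQ decision tree algorithm to generate decision rules. Experiments show that combining rough set theory with SLIQ preserves the intrinsic characteristics of the original data while eliminating redundant features that are irrelevant or only weakly relevant to classification, thereby improving classification accuracy and efficiency.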
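The attribute-reduction step rests on the rough-set dependency degree: the fraction of objects whose equivalence class (under the chosen condition attributes) is consistent in the decision attribute. The sketch below computes it for already-discretized data and uses a greedy QuickReduct-style search, a standard heuristic swapped in here for illustration; the paper's exact reduction procedure, its discretization, and the SLIQ stage are not shown, and the toy table is hypothetical.

```python
from collections import defaultdict

def partition(rows, attrs):
    """Group row indices into equivalence classes by their values on attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())

def dependency(rows, cond_attrs, decision_attr):
    """gamma_C(D) = |POS_C(D)| / |U|: share of objects whose C-equivalence
    class contains a single decision value."""
    positive = 0
    for block in partition(rows, cond_attrs):
        if len({rows[i][decision_attr] for i in block}) == 1:
            positive += len(block)
    return positive / len(rows)

def quick_reduct(rows, cond_attrs, decision_attr):
    """Greedily add the attribute that raises the dependency most, until the
    subset reaches the dependency of the full attribute set."""
    target = dependency(rows, cond_attrs, decision_attr)
    reduct = []
    while dependency(rows, reduct, decision_attr) < target:
        best = max((a for a in cond_attrs if a not in reduct),
                   key=lambda a: dependency(rows, reduct + [a], decision_attr))
        reduct.append(best)
    return reduct

# Hypothetical discretized table: columns 0-2 are features, column 3 is the class.
rows = [
    [0, 1, 0, "yes"],
    [0, 0, 1, "no"],
    [1, 1, 0, "yes"],
    [1, 0, 0, "no"],
    [0, 1, 1, "yes"],
]
```

In this toy table the class depends only on column 1, so the reduct keeps that single attribute and the decision tree would be trained on one column instead of three.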

9.
In recent years, a few sequential covering algorithms for classification rule discovery based on the ant colony optimization (ACO) metaheuristic have been proposed. This paper proposes a new ACO-based classification algorithm called AntMiner-C. Its main feature is a heuristic function based on the correlation among attributes. Other highlights include the way class labels are assigned to rules before their discovery, a strategy for dynamically stopping the addition of terms to a rule's antecedent, and a strategy for pruning redundant rules from the rule set. We study the performance of the proposed approach on twelve commonly used data sets and compare it with the original AntMiner algorithm, the C4.5 decision tree builder, Ripper, logistic regression, and an SVM. Experimental results show that AntMiner-C achieves a higher accuracy rate than the compared algorithms. However, its average number of rules and average number of terms per rule are higher.

10.
This paper presents a particle swarm optimization (PSO)-based fuzzy expert system for the diagnosis of coronary artery disease (CAD). The system is designed using the Cleveland and Hungarian Heart Disease datasets. Because the datasets contain many input attributes, a decision tree (DT) was used to identify the attributes that contribute to the diagnosis. The output of the DT was converted into crisp if-then rules, which were then transformed into a fuzzy rule base. PSO was employed to tune the fuzzy membership functions (MFs). With the optimized MFs, the resulting fuzzy expert system achieved 93.27% classification accuracy. Compared with other approaches, the major advantage of this one is that the decisions made by the fuzzy expert system can be interpreted.
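Turning a crisp DT rule into a fuzzy one means replacing each threshold test with a membership function and combining the antecedents with a t-norm. The sketch below assumes triangular MFs and the min t-norm; the attribute names and the `(a, b, c)` values are hypothetical. In the approach described above, PSO would search over these `(a, b, c)` parameters to maximize classification accuracy; that search is not shown.

```python
def triangular(x, a, b, c):
    """Triangular membership function (a < b < c): 0 outside (a, c), 1 at the peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

def rule_strength(inputs, antecedents):
    """Firing strength of an AND-rule using the min t-norm.
    antecedents maps an input name to the (a, b, c) parameters of its fuzzy set."""
    return min(triangular(inputs[name], *abc) for name, abc in antecedents.items())

# Hypothetical fuzzified rule: IF age is "high" AND cholesterol is "high" THEN CAD.
rule = {"age": (45.0, 60.0, 75.0), "chol": (200.0, 260.0, 320.0)}
```

For a patient with age 55 and cholesterol 230, the memberships are about 0.67 and 0.5, so the rule fires with strength 0.5 instead of the all-or-nothing outcome of the crisp thresholds.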

11.
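To promote the use of domestic Gaofen (GF) high-resolution satellite data for forest tree species classification, six dates of Gaofen-2 imagery covering the main area of Badaling National Forest Park in Yanqing District, Beijing, were used as the data source. On the basis of hierarchical classification, three feature optimization methods (support vector machine recursive feature elimination, C5.0 decision tree, and FSO) were used to carry out object-based forest tree species classification with support vector machine (SVM) and random forest (RF) classifiers under four feature dimensions. Good results were obtained: the overall accuracy averaged 83.65%, the producer's accuracy for individual species ranged from 93.75% (Siberian apricot) to 38.10% (black locust), and the user's accuracy ranged from 100% (North China larch) to 44.74% (elm). The results show that the C5.0 decision tree was the fastest (0.01 h) and its selected features gave the highest overall classification accuracy (86.90%); across feature dimensions, the overall accuracy of SVM was on average 3.28% higher than that of RF; both SVM and RF were insensitive to feature dimension, but good feature optimization still had a large effect on SVM classification efficiency (improved by up to 86.98%) and RF classification accuracy (improved by up to 9.22%).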
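Overall, producer's, and user's accuracy all come from the same confusion matrix: producer's accuracy divides each correct count by the reference total for that class (omission view), and user's accuracy divides it by the total classified as that class (commission view). The sketch assumes rows are reference classes and columns are predicted classes; the 2×2 matrix used in the check is hypothetical, not data from the study.

```python
def accuracy_report(confusion):
    """confusion[i][j] = number of reference samples of class i classified as class j
    (rows = reference, columns = prediction).
    Returns (overall accuracy, producer's accuracies, user's accuracies)."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    diag = [confusion[i][i] for i in range(n)]
    row_tot = [sum(confusion[i]) for i in range(n)]
    col_tot = [sum(confusion[i][j] for i in range(n)) for j in range(n)]
    overall = sum(diag) / total
    producers = [diag[i] / row_tot[i] if row_tot[i] else 0.0 for i in range(n)]
    users = [diag[j] / col_tot[j] if col_tot[j] else 0.0 for j in range(n)]
    return overall, producers, users
```

A class can have high producer's accuracy and low user's accuracy at the same time, which is why the abstract reports the two ranges separately for each species.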

12.
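Decision tree algorithms build trees recursively, so their training efficiency is low, and an over-partitioned decision tree may overfit. This paper therefore proposes the model decision tree algorithm. First, an incomplete decision tree is grown recursively on the training set using the Gini index; then a simple classification model is used to classify the impure pseudo-leaf nodes (non-leaf nodes whose samples do not all belong to the same class), producing the final decision tree. Compared with the original decision tree algorithm, the resulting model decision tree improves training efficiency with no loss, or only a small loss, of accuracy. Experiments on standard datasets show that the proposed model decision tree is clearly faster than the decision tree algorithm and has some ability to resist overfitting.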
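The idea can be sketched as a Gini tree whose growth is cut off early, with every impure node left at the cutoff becoming a pseudo-leaf that delegates to a simple model. This is a minimal sketch under assumptions: the cutoff is a hypothetical `max_depth` (the abstract does not state the stopping rule), and the simple model is a nearest-centroid classifier chosen purely for illustration.

```python
from collections import Counter

def gini(labels):
    """Gini index of a label list: 1 - sum over classes of p_k**2."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y):
    """Exhaustive search for (feature, threshold, weighted Gini) minimizing child impurity."""
    n, best = len(y), None
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X}):
            left = [y[i] for i in range(n) if X[i][f] <= thr]
            right = [y[i] for i in range(n) if X[i][f] > thr]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (f, thr, score)
    return best

def fit_centroids(X, y):
    """Hypothetical simple model for pseudo-leaves: one mean vector per class."""
    groups = {}
    for row, label in zip(X, y):
        groups.setdefault(label, []).append(row)
    return {lab: [sum(col) / len(rows) for col in zip(*rows)] for lab, rows in groups.items()}

def build_mdt(X, y, depth=0, max_depth=1):
    """Grow a Gini tree only to max_depth; impure nodes left at that depth become
    pseudo-leaves holding a simple model instead of being split further."""
    if len(set(y)) == 1:
        return ("leaf", y[0])
    split = best_split(X, y) if depth < max_depth else None
    if split is None:
        return ("pseudo_leaf", fit_centroids(X, y))
    f, thr, _ = split
    left = [i for i in range(len(y)) if X[i][f] <= thr]
    right = [i for i in range(len(y)) if X[i][f] > thr]
    return ("node", f, thr,
            build_mdt([X[i] for i in left], [y[i] for i in left], depth + 1, max_depth),
            build_mdt([X[i] for i in right], [y[i] for i in right], depth + 1, max_depth))

def predict(tree, x):
    if tree[0] == "leaf":
        return tree[1]
    if tree[0] == "pseudo_leaf":
        # Nearest centroid by squared Euclidean distance.
        cents = tree[1]
        return min(cents, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, cents[c])))
    _, f, thr, left, right = tree
    return predict(left if x[f] <= thr else right, x)
```

The speed-up comes from skipping the deep, expensive recursive splits; the pseudo-leaf model only has to separate the few samples that reach it.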

15.
Credit scoring with a data mining approach based on support vector machines (total citations: 3; self-citations: 0; citations by others: 3)
The credit card industry has grown rapidly in recent years, so the credit departments of banks collect huge amounts of consumer credit data. Credit scoring managers often evaluate consumers' credit based on intuitive experience. With the support of a credit classification model, however, a manager can evaluate an applicant's credit score accurately. Support vector machine (SVM) classification is an active research area and has successfully solved classification problems in many domains. This study used three strategies to construct hybrid SVM-based credit scoring models that evaluate an applicant's credit score from the applicant's input features. Two credit datasets from the UCI repository were selected as experimental data to demonstrate the accuracy of the SVM classifier. Compared with neural network, genetic programming, and decision tree classifiers, the SVM classifier achieved identical classification accuracy with relatively few input features. Additionally, by combining genetic algorithms with the SVM classifier, the proposed hybrid GA-SVM strategy can perform feature selection and model parameter optimization simultaneously. Experimental results show that SVM is a promising addition to existing data mining methods.
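A GA-SVM that selects features and tunes parameters at the same time needs a chromosome that encodes both. The sketch below shows one plausible encoding, under assumptions: 8 bits each for the RBF-SVM parameters `C` and `gamma`, mapped onto the log2 ranges 2^-5..2^15 and 2^-15..2^3 (a commonly used search grid), followed by one bit per feature. The GA operators and the fitness function, which would train and validate an SVM with the decoded settings, are not shown.

```python
def decode_chromosome(bits, n_features, n_param_bits=8,
                      log2_c_range=(-5, 15), log2_gamma_range=(-15, 3)):
    """Split a binary chromosome into (C, gamma, selected feature indices).
    Assumed layout: [C bits | gamma bits | one bit per feature]."""
    def to_value(chunk, lo, hi):
        # Read the chunk as an unsigned integer and map it linearly onto [lo, hi] in log2 space.
        k = int("".join(str(b) for b in chunk), 2)
        return 2.0 ** (lo + (hi - lo) * k / (2 ** len(chunk) - 1))

    c_bits = bits[:n_param_bits]
    g_bits = bits[n_param_bits:2 * n_param_bits]
    f_bits = bits[2 * n_param_bits:2 * n_param_bits + n_features]
    C = to_value(c_bits, *log2_c_range)
    gamma = to_value(g_bits, *log2_gamma_range)
    selected = [i for i, b in enumerate(f_bits) if b]
    return C, gamma, selected
```

Encoding the parameters on a log scale lets a short bit string cover several orders of magnitude, which matches how SVM performance typically varies with `C` and `gamma`.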

16.
The C4.5 decision tree (DT) can be applied in various fields and discovers knowledge that humans can understand. However, different problems typically require different parameter settings. Rules of thumb or trial-and-error methods are generally used to determine these settings, but they may yield poor parameter settings and unsatisfactory results. Moreover, although a dataset can contain numerous features, not all of them are beneficial for classification with the C4.5 algorithm. Therefore, a novel scatter search-based approach (SS + DT) is proposed to acquire optimal parameter settings and to select a beneficial subset of features that yields better classification results. To evaluate the efficiency of the proposed SS + DT approach, datasets from the UCI (University of California, Irvine) Machine Learning Repository are used. Experimental results demonstrate that the parameter settings for the C4.5 algorithm obtained by the SS + DT approach are better than those obtained by other approaches. When feature selection is considered, classification accuracy increases on most datasets. Therefore, the proposed approach can effectively identify the best parameter settings for the C4.5 algorithm as well as useful features.

17.
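The development of big data places higher demands on accuracy in data classification, and the wide application of support vector machines (SVMs) calls for an efficient way to construct SVM classifiers with strong classification ability. The SVM kernel parameter, the penalty factor, and the feature subset all strongly affect the complexity and predictive accuracy of the prediction model. To improve SVM classification performance, this paper integrates the asymptotic behavior of SVM into the grey wolf optimization (GWO) algorithm and proposes a new SVM classifier model that optimizes the SVM parameters and the feature subset of the data simultaneously. New grey wolf individuals that incorporate the asymptotic behavior of SVM steer the GWO search toward the best region of the hyperparameter space, so the optimal solution is found faster. In addition, a new fitness function is proposed that combines classification accuracy, the number of selected features, and the number of support vectors; together with the asymptotics-enhanced GWO, it guides the search toward the optimal solution. The proposed model is validated on several classic UCI datasets and compared with grid search, GWO without the asymptotic behavior, and methods from other studies; its classification accuracy improves to varying degrees on the different datasets. Experimental results show that the proposed algorithm finds the optimal SVM parameters and the smallest feature subset, with higher classification accuracy and shorter average processing time.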
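The core of GWO is its position update: each wolf moves to the average of three estimates guided by the alpha, beta, and delta wolves, with the coefficient `a` decreasing linearly from 2 to 0 to shift from exploration to exploitation. The sketch below is the standard update only, under assumptions: it minimizes a generic objective (a sphere function in the check) rather than the paper's combined fitness of accuracy, feature count, and support-vector count, and it omits the paper's SVM-asymptotics enhancement. Leaders are kept as the best-so-far solutions.

```python
import random

def gwo(objective, dim, lower, upper, n_wolves=20, iters=200, seed=0):
    """Grey wolf optimizer sketch (minimization) with box constraints [lower, upper].
    Returns (best score, best position)."""
    rng = random.Random(seed)
    wolves = [[rng.uniform(lower, upper) for _ in range(dim)] for _ in range(n_wolves)]
    leaders = []  # best-so-far (score, position) pairs: alpha, beta, delta
    for t in range(iters):
        pool = leaders + [(objective(w), w) for w in wolves]
        pool.sort(key=lambda p: p[0])
        leaders = pool[:3]
        a = 2.0 - 2.0 * t / iters  # decreases linearly from 2 toward 0
        new_wolves = []
        for x in wolves:
            new_x = []
            for j in range(dim):
                estimates = []
                for _, leader in leaders:
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    D = abs(C * leader[j] - x[j])
                    estimates.append(leader[j] - A * D)
                # Average the three leader-guided estimates, then clip to the bounds.
                new_x.append(min(max(sum(estimates) / 3.0, lower), upper))
            new_wolves.append(new_x)
        wolves = new_wolves
    pool = leaders + [(objective(w), w) for w in wolves]
    return min(pool, key=lambda p: p[0])
```

For SVM tuning plus feature selection, a wolf's position would hold `C`, the kernel parameter, and one continuous entry per feature thresholded into a mask, and the objective would call the fitness function described above.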

18.
Precisely monitoring land cover/use is crucial for urban environmental assessment and management. Classification techniques such as pixel-based and object-based approaches each have advantages and disadvantages. In this article, using experimental data acquired by a lidar scanner and a camera carried on an unmanned platform, we explored and compared the classification accuracies of a pixel-based decision tree (DT) approach and an object-based support vector machine (SVM) approach. Lidar height information improved classification accuracy for both object-based SVM and pixel-based DT. In terms of total classification accuracy, object-based SVM outperformed pixel-based DT, with a total accuracy of 92.71% and a kappa coefficient of 0.899. However, pixel-based DT outperformed object-based SVM when classifying small, scattered trees along roads. To further evaluate the accuracy of the two methods, we also compared their classification results on ISPRS benchmark data. Object-based SVM classification combining aerial imagery with lidar height information can achieve higher classification accuracy, whereas accurately extracting the tree class in different landscape patterns requires selecting an appropriate machine learning algorithm. The comparison of the two methods provides a reference for selecting a classification approach suited to local conditions.
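The kappa coefficient reported alongside total accuracy corrects observed agreement for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e), where p_o is the overall accuracy and p_e is computed from the row and column totals. The sketch assumes rows are reference classes and columns are predicted classes; the matrices in the check are hypothetical. It is undefined when p_e equals 1.

```python
def cohen_kappa(confusion):
    """Cohen's kappa from a square confusion matrix (rows = reference, columns = prediction)."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    p_o = sum(confusion[i][i] for i in range(n)) / total  # observed agreement
    row_tot = [sum(confusion[i]) for i in range(n)]
    col_tot = [sum(confusion[i][j] for i in range(n)) for j in range(n)]
    p_e = sum(row_tot[i] * col_tot[i] for i in range(n)) / (total * total)  # chance agreement
    return (p_o - p_e) / (1.0 - p_e)
```

For the hypothetical matrix [[8, 2], [1, 9]], overall accuracy is 0.85 but chance agreement is 0.5, so kappa is 0.7.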

19.
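Parameter selection has a critical effect on the classification accuracy and generalization ability of support vector machines (SVMs), and swarm intelligence algorithms have been widely applied to parameter optimization in recent years. Against this background, a CSA-SVM model is proposed. The model uses classification accuracy as the objective function and applies the crow search algorithm (CSA) to find the optimal combination of SVM parameters. To verify the classification performance of CSA-SVM, the model was applied to six standard classification datasets and compared with SVM models optimized by a genetic algorithm (GA) and by particle swarm optimization (PSO). Experimental results show that CSA has better search ability and faster convergence in SVM parameter selection, and the CSA-SVM model achieves high classification accuracy.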
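In the crow search algorithm, each crow remembers its best hiding place and follows a randomly chosen crow toward that crow's memory; with awareness probability `ap` the followed crow notices and the follower jumps to a random position instead. The sketch below is a minimal version under assumptions: it minimizes a generic objective (a sphere function in the check, standing in for 1 minus SVM accuracy), uses the commonly quoted defaults `ap = 0.1` and flight length `fl = 2`, updates crows one at a time (a simplification of the batch update), and rejects moves that leave the bounds.

```python
import random

def crow_search(objective, dim, lower, upper, n_crows=20, iters=500,
                ap=0.1, fl=2.0, seed=0):
    """Crow search algorithm sketch (minimization) with box constraints.
    Returns (best memorized position, its objective value)."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lower, upper) for _ in range(dim)] for _ in range(n_crows)]
    mem = [p[:] for p in pos]                  # each crow's best-known hiding place
    mem_fit = [objective(m) for m in mem]
    for _ in range(iters):
        for i in range(n_crows):
            j = rng.randrange(n_crows)         # crow i follows a randomly chosen crow j
            if rng.random() >= ap:
                # j does not notice: i moves toward j's memorized position.
                r = rng.random()
                new = [pos[i][d] + r * fl * (mem[j][d] - pos[i][d]) for d in range(dim)]
            else:
                # j notices and fools i: i moves to a random position.
                new = [rng.uniform(lower, upper) for _ in range(dim)]
            if all(lower <= v <= upper for v in new):  # keep only feasible moves
                pos[i] = new
                f = objective(new)
                if f < mem_fit[i]:
                    mem[i], mem_fit[i] = new[:], f
    best = min(range(n_crows), key=lambda k: mem_fit[k])
    return mem[best], mem_fit[best]
```

With `fl` above 1 a crow can overshoot the memory it follows, which keeps some exploration going; `ap` controls how often a fully random jump occurs.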

20.
Medical data have a number of characteristics that make their classification a complex task. Yet the societal significance of the subject and the computational challenge it presents have made the classification of medical datasets a popular research area. A new hybrid metaheuristic is presented for classifying medical datasets. The hybrid ant-bee colonies (HColonies) algorithm consists of two phases: an ant colony optimization (ACO) phase and an artificial bee colony (ABC) phase. The food sources of the ABC are initialized as decision lists constructed during the ACO phase using different subsets of the training data. The task of the ABC is to optimize the obtained decision lists. New variants of the ABC operators are proposed to suit the classification task. Results on a number of benchmark real-world medical datasets show the usefulness of the proposed approach. The resulting classification models have good predictive accuracy and relatively small size.
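The model that HColonies builds and refines is a decision list: an ordered set of rules in which the first rule whose conditions all match decides the class, with a default class as fallback. The sketch below only evaluates such a list; the rule format, attribute names, and values are hypothetical, and the ACO construction and ABC optimization phases are not shown. An ABC food source would be one such list, scored by an accuracy-style fitness like the helper here.

```python
def predict_decision_list(rules, default, example):
    """Ordered decision list: return the label of the first rule whose conditions
    all match the example; otherwise return the default class.
    Each rule is (conditions, label) with conditions as {attribute: value}."""
    for conditions, label in rules:
        if all(example.get(attr) == value for attr, value in conditions.items()):
            return label
    return default

def list_accuracy(rules, default, examples, labels):
    """Fraction of examples whose predicted label matches the true label."""
    hits = sum(predict_decision_list(rules, default, x) == y for x, y in zip(examples, labels))
    return hits / len(labels)

# Hypothetical rules over discretized clinical attributes.
rules = [
    ({"chest_pain": "typical", "exercise_angina": "yes"}, "disease"),
    ({"thal": "normal"}, "healthy"),
]
```

Because rules are tried in order, an optimizer can change a list's behavior by reordering rules as well as by editing their conditions, and shorter lists keep the model small.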
