Similar literature
1.
The back-propagation network (BPN) is a popular tool with applications in a variety of fields. However, different problems may require different parameter settings for a given network architecture, and although a dataset may contain many features, not all of them benefit classification by the BPN. A particle-swarm-optimization-based approach, denoted PSOBPN, is therefore proposed to obtain suitable parameter settings for the BPN and to select the subset of beneficial features that yields a better classification accuracy rate. A set of 23 problems with a range of examples and features, drawn from the UCI (University of California, Irvine) machine learning repository, is used to test the proposed algorithm, and the results are compared with several well-known published algorithms. The comparative study shows that the proposed approach improves the classification accuracy rate on most test problems, and that taking feature selection into account increases the accuracy rate on most datasets. The proposed algorithm should thus be useful to both practitioners and researchers.
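The abstract does not say how PSOBPN encodes feature subsets. As a rough illustration only, the sketch below uses the classic binary PSO formulation with a sigmoid transfer function to search over 0/1 feature masks. The function names, the inertia and acceleration constants, and the toy fitness function are all assumptions; in a real setting the fitness would be something like the validation accuracy of a BPN trained on the selected features.

```python
import math
import random

USEFUL = {0, 2}  # hypothetical: pretend only features 0 and 2 help


def toy_fitness(mask):
    """Stand-in for classifier accuracy: reward useful features, penalize the rest."""
    gain = sum(1.0 for d, bit in enumerate(mask) if bit and d in USEFUL)
    penalty = sum(0.5 for d, bit in enumerate(mask) if bit and d not in USEFUL)
    return gain - penalty


def binary_pso(fitness, n_features, n_particles=10, n_iter=30, seed=0):
    """Maximize fitness(mask) over 0/1 masks with a binary PSO (assumed constants)."""
    rng = random.Random(seed)
    w, c1, c2 = 0.7, 1.5, 1.5  # inertia, cognitive and social weights (assumed)
    pos = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(n_features):
                v = (w * vel[i][d]
                     + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                     + c2 * rng.random() * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-6.0, min(6.0, v))  # clamp to keep the sigmoid stable
                prob = 1.0 / (1.0 + math.exp(-vel[i][d]))
                pos[i][d] = 1 if rng.random() < prob else 0
            fit = fitness(pos[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit


best_mask, best_fit = binary_pso(toy_fitness, n_features=5, seed=1)
print(best_mask, best_fit)
```

Network parameters such as the learning rate could be appended to the particle as extra continuous dimensions; that extension is left out here to keep the sketch short.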

2.
The support vector machine (SVM) is a state-of-the-art classification tool whose good accuracy stems from its ability to generate nonlinear models. However, these nonlinear models are typically regarded as incomprehensible black boxes, and this lack of explanatory ability is a serious problem for practical SVM applications that require comprehensibility. This study therefore applies a C5 decision tree (DT) to extract rules from the SVM results. Because both SVM and C5 DT are computationally expensive, and applying the two together to high-dimensional data increases the cost further, the artificial bee colony (ABC) optimization algorithm, a metaheuristic, is employed to select the important features. In the proposed ABC–SVM–DT algorithm, ABC performs feature selection and parameter optimization before SVM–DT extracts comprehensible rules from the SVM. The algorithm is evaluated on eight datasets. The results show that, compared with the genetic algorithm and particle swarm optimization, ABC–SVM–DT improves both the classification accuracy and the complexity of the final decision tree.

3.
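Decision trees are built recursively, so training is inefficient, and an over-grown tree may overfit. This paper therefore proposes a model decision tree algorithm. An incomplete decision tree is first grown recursively on the training set using the Gini index; a simple classification model is then used to classify the impure pseudo-leaf nodes (non-leaf nodes whose samples do not all belong to the same class), yielding the final decision tree. Compared with the original decision tree algorithm, the resulting model decision tree improves training efficiency with little or no loss of accuracy. Experiments on standard datasets show that the proposed model decision tree is markedly faster than the decision tree algorithm and has some resistance to overfitting.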
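For readers unfamiliar with the Gini index mentioned above, the following minimal sketch shows how node impurity and the quality of a binary split are usually computed, together with the purity check that would distinguish a true leaf from an impure pseudo-leaf. The function names are illustrative assumptions and are not taken from the paper.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a node: 1 - sum over classes of p_k squared."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_gini(left, right):
    """Size-weighted Gini impurity of the two children produced by a split."""
    n = len(left) + len(right)
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)


def is_pure(labels):
    """True when all samples at the node share one class label."""
    return len(set(labels)) <= 1


node = ["a", "a", "b", "b"]
print(gini_impurity(node))                  # impurity before splitting
print(split_gini(["a", "a"], ["b", "b"]))   # impurity after a perfect split
print(is_pure(node))                        # an impure node would go to the simple model
```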

4.
Predicting bank performance is an important issue: poor performance may lead to bankruptcy, which can eventually affect the national economy. Researchers have made such predictions since the early 1970s, but until recent years most used traditional statistics to build the prediction model. With the vigorous development of data mining techniques, many researchers have begun to apply them to various fields, including performance prediction systems. Data mining techniques, however, raise the problem of parameter settings. This study therefore applies particle swarm optimization (PSO) to obtain suitable parameter settings for the support vector machine (SVM) and decision tree (DT), and to select a subset of beneficial features without reducing the classification accuracy rate. The proposed approaches are evaluated on a dataset collected from Taiwanese commercial banks. The experimental results show that they obtain better parameter settings, remove unnecessary features, and significantly improve classification accuracy.

5.
An intrusion detection system (IDS) monitors attacks occurring in computers or networks. Anomaly intrusion detection plays an important role in an IDS because it detects new attacks by detecting any deviation from the normal profile. This paper proposes an intelligent algorithm with feature selection and decision rules for anomaly intrusion detection. The key idea is to combine the advantages of the support vector machine (SVM), decision tree (DT), and simulated annealing (SA). In the proposed algorithm, SVM and SA find the best selected features to raise the accuracy of anomaly intrusion detection. By analyzing the KDD’99 dataset, DT and SA obtain decision rules for new attacks and improve classification accuracy. In addition, SA automatically adjusts the parameter settings of the DT and SVM. Simulation results demonstrate that the proposed algorithm is successful in anomaly intrusion detection and outperforms other existing approaches.

6.
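Most domain adaptation methods assign static weights to the multiple metrics used during feature transformation, so their performance varies widely across tasks. This paper therefore proposes a dynamic parameter adjustment method for domain adaptation tasks. Based on a reproducing kernel Hilbert space model, the discriminative joint probability distribution discrepancy between domains is minimized to solve for a domain-invariant feature space. During this process, the A-distance is used to compute the proportions of same-label and different-label distribution discrepancy within the inter-domain discrepancy, and the weight parameters for discriminability and transferability are adjusted dynamically on that basis to achieve the best adaptation. Experiments on three image classification datasets demonstrate the effectiveness of the method.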

7.
Text classification (TC) is a crucial task in an era of high-volume text datasets, and feature selection (FS) is one of its most important stages. Numerous FS methods have been recommended for TC in the literature; filter-based methods are commonly used to select more informative feature subsets. Each method orders the features with a scoring system based on its own algorithm, and classification is then carried out on the top-N features. However, the feature order differs from method to method: each assigns high scores to the features that its algorithm considers critical, but does not assign low scores to unimportant features. In this paper, we propose a novel filter-based FS method, the brilliant probabilistic feature selector (BPFS), to assign fair scores and select informative features. While BPFS selects unique features, it also aims to select sparse features by assigning them higher scores than common features. Extensive experiments with three effective classifiers, decision tree (DT), support vector machine (SVM), and multinomial naive Bayes (MNB), on four widely used datasets with different characteristics (Reuters-21578, 20Newsgroup, Enron1, and Polarity) demonstrate the success of BPFS. Feature dimensions of 20, 50, 100, 200, 500, and 1000 were used. The experimental results show that BPFS is more successful than well-known and recent FS methods according to Micro-F1 and Macro-F1 scores.
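Micro-F1 and Macro-F1 aggregate per-class results differently, which matters on skewed text corpora. The sketch below is a plain-Python illustration of the standard definitions for single-label classification; the function name and the toy labels are assumptions and have no connection to the paper's experiments.

```python
def micro_macro_f1(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label multi-class predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp = {c: 0 for c in labels}
    fp = {c: 0 for c in labels}
    fn = {c: 0 for c in labels}
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Macro: average the per-class F1 scores, so every class counts equally.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    # Micro: pool the counts first, so frequent classes dominate.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return micro, macro


micro, macro = micro_macro_f1(["a", "a", "a", "b"], ["a", "a", "b", "b"])
print(micro, macro)
```

On this toy input the per-class F1 scores are 0.8 for class a and 2/3 for class b, so the macro average is lower than the pooled micro score of 0.75.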

8.
Linear discriminant analysis (LDA) is a commonly used classification method that provides important weight information for constructing a classification model. However, real-world datasets generally have many features, not all of which benefit the classification results. Without a feature selection algorithm, classification will be unsatisfactory owing to the high correlation between features and noise. This study shows by example that feature selection influences LDA. The methods traditionally used with LDA to determine the beneficial feature subset are not easy to apply, or cannot guarantee the best results, when problems have a large number of features. Particle swarm optimization (PSO) is a powerful meta-heuristic technique in the artificial intelligence field; this study therefore proposes a PSO-based approach, called PSOLDA, to identify the beneficial features and enhance the classification accuracy rate of LDA. Many public datasets are used to measure the classification accuracy rate of PSOLDA. Where exhaustive enumeration is feasible, PSOLDA obtains the same optimal result. Because exhaustive enumeration takes too long on problems with a large number of features, heuristic approaches such as forward feature selection, backward feature selection, and PCA-based feature selection are used instead. This study shows that PSOLDA achieves higher classification accuracy rates than these approaches on many public datasets.

9.
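In supervised learning, the dimensionality of the examples often far exceeds the number of samples. In such cases many features are irrelevant to the class labels, so algorithms that achieve sparse feature selection together with better classification performance are advantageous. A classification and feature selection algorithm based on a weighted kernel logistic nonlinear regression model is proposed. The diagonal elements of the diagonal weight matrix take values between 0 and 1 and are treated as learning parameters determined by the optimization process; the proposed fast alternating optimization algorithm is discussed. The algorithm was tested on ten real-world datasets, and the experimental results show that it compares favorably with L1-, L2-, and Lp-regularized logistic model classification algorithms.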

10.
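Hyperspectral images suffer from high dimensionality, strong inter-band correlation, and few samples, so classifying them directly with representation learning algorithms leads to a severe curse of dimensionality. Moreover, not all spectral bands of a hyperspectral image are useful for a given classification task. This paper therefore proposes a spatial-aware collaborative representation algorithm based on an enhanced spatial-spectral feature network. A hierarchical network based on spatial-spectral features is built according to the intrinsic low-dimensional manifold of hyperspectral images. The trained network extracts features from the high-dimensional data, and the spatial-aware collaborative representation algorithm performs the classification. Experiments on two hyperspectral datasets, Indian Pines and Pavia University, demonstrate the effectiveness of the algorithm.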

11.
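RSKNN is an improved k-nearest-neighbor algorithm based on variable precision rough set theory. It effectively reduces the computational cost of classification and improves efficiency while maintaining a given level of classification accuracy. However, RSKNN simply partitions the samples of each class into one core region and one boundary region, without taking the characteristics of the dataset into account, which is a major limitation. To address this problem, a multi-representative-point learning algorithm is proposed: structural risk minimization theory is used to analyze the factors that affect the expected risk of the classification model, and an unsupervised local clustering algorithm is used to learn and optimize the set of representative points. Experiments on public UCI datasets show that the algorithm achieves higher classification accuracy than RSKNN.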

12.
Classification of large datasets is an important data mining problem. Many classification algorithms have been proposed in the literature, but studies have shown that so far no algorithm uniformly outperforms all other algorithms in terms of quality. In this paper, we present a unifying framework called RainForest for classification tree construction that separates the scalability aspects of algorithms for constructing a tree from the central features that determine the quality of the tree. The generic algorithm is easy to instantiate with specific split selection methods from the literature (including C4.5, CART, CHAID, FACT, ID3 and extensions, SLIQ, SPRINT and QUEST). In addition to its generality, in that it yields scalable versions of a wide range of classification algorithms, our approach also offers performance improvements of over a factor of three over the SPRINT algorithm, the fastest scalable classification algorithm proposed previously. In contrast to SPRINT, however, our generic algorithm requires a certain minimum amount of main memory, proportional to the set of distinct values in a column of the input relation. Given current main memory costs, this requirement is readily met in most if not all workloads.

13.
Various feature extraction approaches are used in polarimetric synthetic aperture radar (PolSAR) terrain classification to represent the characteristics of the data, and relevant and effective feature fusion algorithms are needed to process the resulting complicated features. To address this issue, this article presents an algorithm based on a multimodal sparse representation (MSR) framework that fuses the different feature vectors from the complicated data space. Polarimetric data features, decomposition features, and texture features from the Pauli colour-coded image are selected to represent multimodal data in different observation modes. The corresponding multimodal manifold regularizations are added to the MSR framework to approximate the data structure. Considering the independence and correlation of features, the intrinsic affinity matrices are calculated from this framework and processed with the locality preserving projection algorithm, which projects the multimodal features into a low-dimensional intrinsic feature space for subsequent classification. Three Radarsat-2 C-band datasets are used in the experiments: Western Xi’an, Flevoland, and San Francisco Bay. The effects of the regularization parameters and of fused features of different dimensions are analysed both visually and quantitatively. The experimental results demonstrate that the proposed method is more effective than other state-of-the-art methods.

14.
15.
In this study, we propose a set of new algorithms to enhance the effectiveness of classifying the 5-year survivability of breast cancer patients from a massive, imbalanced dataset. The proposed classifiers combine the synthetic minority oversampling technique (SMOTE) and particle swarm optimization (PSO) with well-known classifiers such as logistic regression, the C5 decision tree (C5) model, and 1-nearest neighbor search. To assess the effectiveness of this new set of classifiers, the g-mean and accuracy are used as performance indexes, and the proposed classifiers are compared with those in the previous literature. Experimental results show that the SMOTE + PSO + C5 hybrid is the best of all the algorithm combinations for classifying the 5-year survivability of breast cancer patients. We conclude that combining SMOTE with appropriate search algorithms such as PSO and classifiers such as C5 can significantly improve classification effectiveness on massive imbalanced datasets.
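Two ingredients named in this abstract are easy to show in isolation. The sketch below, under the usual textbook definitions, computes the g-mean (the geometric mean of sensitivity and specificity) and generates one SMOTE-style synthetic point by interpolating between a minority sample and one of its neighbors. The function names are assumptions, and the neighbor search, PSO, and C5 stages of the paper's pipeline are not shown.

```python
import math
import random


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity for a binary problem."""
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)


def smote_sample(x, neighbor, rng=random):
    """One synthetic minority point on the segment between x and a minority neighbor."""
    gap = rng.random()  # uniform in [0, 1)
    return [xi + gap * (ni - xi) for xi, ni in zip(x, neighbor)]


y_true = [1, 1, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1]
print(g_mean(y_true, y_pred))
print(smote_sample([0.0, 0.0], [1.0, 2.0], random.Random(0)))
```

Unlike plain accuracy, the g-mean drops to zero if either class is entirely misclassified, which is why it is a common choice for imbalanced data.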

16.
This paper presents a multiple-criteria decision approach for trading weekly tool capacity between two semiconductor fabs. Because tools are expensive, a semiconductor company with multiple fabs (factories) may trade tool capacity between them weekly; that is, a lowly utilized workstation in one fab may sell capacity to its highly utilized counterpart in the other fab. Wu and Chang [Wu, M. C., & Chang, W. J. (2007). A short-term capacity trading method for semiconductor fabs with partnership. Expert Systems with Applications, 33(2), 476–483] proposed a method for making weekly trading decisions between two wafer fabs. Compared with no trading, their method could effectively increase the two fabs’ throughput over a longer period such as 8 weeks. However, their trading decisions are based on a single criterion, the number of weekly produced operations, which leaves room for improvement. We therefore propose a multiple-criteria trading decision approach to further increase the two fabs’ throughput. The three decision criteria are the number of operations, the number of layers, and the number of wafers. This research developed a method to find an optimal weighting vector for the three criteria: it first uses NN + GA (neural network + genetic algorithm) to find an optimal trading decision in each week, and then uses DOE + RSM (design of experiments + response surface method) to find an optimal weighting vector for a longer period, say 10 weeks. Experiments indicate that the multiple-criteria approach outperforms the previous method in terms of the fabs’ long-term throughput.

17.
This paper presents a hybrid approach to medical decision support systems based on feature selection, fuzzy weighted pre-processing, and the artificial immune recognition system (AIRS). The heart disease and hepatitis disease datasets from the UCI machine learning database are used as the medical datasets. AIRS has performed effectively on several problems, including machine learning benchmark problems and medical classification problems such as breast cancer, diabetes, and liver disorders classification. The proposed approach consists of three stages. In the first stage, the feature selection (FS) sub-program uses the C4.5 decision tree algorithm (CBA program) to reduce the dimensions of the heart disease and hepatitis disease datasets from 13 and 19, respectively, to 9. In the second stage, the datasets are normalized to the range [0,1] and weighted via fuzzy weighted pre-processing. In the third stage, the weighted input values are classified using the AIRS classifier. With a 50-50% training-test split, the system achieves classification accuracies of 92.59% and 81.82% on the heart disease and hepatitis disease datasets, respectively. These results indicate that the proposed method can be used in medical decision support systems.

18.
In classification, every feature of the dataset contributes to prediction accuracy and affects the cost of building the model. A suitable feature selector is therefore needed to extract the priority features for prediction. This paper proposes a novel memetic-based feature selection model named Shapely Value Embedded Genetic Algorithm (SVEGA). The relevance of each feature to prediction is measured by combining genetic algorithms with the Shapely value measures retrieved from SVEGA. The results are then evaluated using a support vector machine (SVM) with different kernel configurations on 11 + 11 benchmark datasets (both binary-class and multi-class), and SVEGA-SVM is compared with other existing feature selection models. The experimental results show that the proposed setup gives robust outcomes, making it an efficient approach for discovering knowledge via feature selection, with improved classification accuracy compared with conventional methods.

19.
Class imbalance has become a major problem that leads to inaccurate traffic classification. Accurate classification of traffic flows helps with security monitoring, IP management, intrusion detection, and so on. Machine learning (ML) approaches are widely used in the literature to address the traffic classification problem. In this paper, we propose an ML-based hybrid feature selection algorithm named WMI_AUC that uses two metrics, weighted mutual information (WMI) and the area under the ROC curve (AUC), to select effective features from a traffic flow. To then select robust features from among the selected ones, we propose a robust feature selection algorithm. The proposed approach increases the accuracy of ML classifiers and helps in detecting malicious traffic. We evaluate our work using 11 well-known ML classifiers on trace datasets from different network environments. The experimental results show that our algorithms achieve flow accuracy above 95%.

20.
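In research on passive network device identification based on network traffic analysis, traffic data is often high-dimensional, and some features contribute little to device identification or even seriously degrade classification results and performance. To address this problem, this paper proposes FSSA, a network traffic feature selection algorithm that combines the Filter and Wrapper approaches and is based on symmetric uncertainty (SU) and the approximate Markov blanket (AMB). The proposed method...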
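Symmetric uncertainty is commonly defined as SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)), which normalizes mutual information to the range [0, 1]. The sketch below computes it for discrete features under that standard definition. The function names are assumptions, and the approximate Markov blanket step and the Wrapper stage of FSSA are not shown because the truncated abstract does not describe them.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), with I computed from joint entropy."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both variables are constant, so there is nothing to share
    joint = entropy(list(zip(x, y)))
    mutual_information = hx + hy - joint
    return 2.0 * mutual_information / (hx + hy)


feature = [0, 0, 1, 1]
print(symmetric_uncertainty(feature, [0, 0, 1, 1]))  # feature identical to the label
print(symmetric_uncertainty(feature, [0, 1, 0, 1]))  # feature independent of the label
```

In a filter stage, features would typically be ranked by their SU with the class label, and continuous traffic features would need to be discretized first.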
