Similar Documents
1.
This paper proposes genetic algorithm feature selection (GAFS) for image retrieval and image classification systems. Three features are used: two texture features, the adaptive motifs co-occurrence matrix (AMCOM) and the gradient histogram for adaptive motifs (GHAM), and one color feature, the adaptive color histogram for K-means (ACH). Three feature selection strategies are compared: sequential forward selection (SFS), sequential backward selection (SBS), and GAFS. Image retrieval and classification are built on the three features ACH, AMCOM, and GHAM, with classification performed by a two-class SVM. The experimental results show that all of the feature extraction methods examined in this study contribute to better image retrieval and image classification, and that GAFS provides a more robust solution at the expense of increased computational effort. Applying GAFS to the image retrieval system not only effectively reduces the number of features but also yields higher retrieval accuracy.
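A minimal sketch of GA-based wrapper feature selection in the spirit described above, assuming a generic feature matrix X and labels y; the fitness function, population size, and genetic operators here are illustrative choices, not the paper's exact settings.

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def fitness(mask, X, y):
    # Fitness of a binary feature mask: cross-validated SVM accuracy.
    if not mask.any():
        return 0.0
    return cross_val_score(SVC(kernel="rbf"), X[:, mask], y, cv=3).mean()

def ga_feature_selection(X, y, pop_size=20, generations=30, p_mut=0.05):
    n_features = X.shape[1]
    pop = rng.random((pop_size, n_features)) < 0.5            # random binary chromosomes
    for _ in range(generations):
        scores = np.array([fitness(ind, X, y) for ind in pop])
        order = np.argsort(scores)[::-1]
        parents = pop[order[: pop_size // 2]]                  # truncation selection
        children = []
        for _ in range(pop_size - len(parents)):
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = rng.integers(1, n_features)                  # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            flip = rng.random(n_features) < p_mut              # bit-flip mutation
            children.append(np.where(flip, ~child, child))
        pop = np.vstack([parents, np.array(children)])
    scores = np.array([fitness(ind, X, y) for ind in pop])
    return pop[scores.argmax()]                                # best feature mask found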

2.
Liver biopsy is considered to be the gold standard for analyzing chronic hepatitis and fibrosis; however, it is an invasive and expensive approach, which is also difficult to standardize. Medical imaging techniques such as ultrasonography, computed tomography (CT), and magnetic resonance imaging are non-invasive and helpful methods to interpret liver texture, and may be good alternatives to needle biopsy. Recently, instead of visual inspection of these images, computer-aided image analysis based approaches have become more popular. In this study, a non-invasive, low-cost and relatively accurate method was developed to determine liver fibrosis stage by analyzing texture features of liver CT images. In this approach, suitable regions of interest were selected on CT images and a comprehensive set of texture features was obtained from these regions using different methods, such as the Gray Level Co-occurrence Matrix (GLCM), Laws' method, the Discrete Wavelet Transform (DWT), and Gabor filters. Afterwards, sequential floating forward selection and exhaustive search methods were used in various combinations to select the most discriminating features. Finally, the selected texture features were classified using two methods, namely Support Vector Machines (SVM) and k-nearest neighbors (k-NN). The mean classification accuracy in pairwise group comparisons was approximately 95% for both classification methods using only 5 features. The performance of our approach in classifying the liver fibrosis stage of subjects in the test set into 7 possible stages was also investigated; in this case, both the SVM and k-NN methods returned relatively low classification accuracies. Our pairwise group classification results showed that the DWT, Gabor, GLCM, and Laws' texture features were more successful than the others, so features extracted from these methods were used in the feature fusion process. Fusing features from these better performing families further improved the classification performance. The results show that our approach can be used as a decision support system, especially in pairwise fibrosis stage comparisons.
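A minimal sketch of extracting GLCM-style texture descriptors from a region of interest, assuming an 8-bit grayscale ROI array; the offset and the contrast/energy/homogeneity statistics are common Haralick-style choices, not necessarily the exact feature set used in the study.

import numpy as np

def glcm_features(roi, levels=32, offset=(0, 1)):
    # Co-occurrence matrix of a grayscale ROI plus a few texture statistics.
    q = (roi.astype(np.float64) / 256.0 * levels).astype(int)   # quantize to 0..levels-1
    dr, dc = offset
    glcm = np.zeros((levels, levels))
    rows, cols = q.shape
    for r in range(max(0, -dr), rows - max(0, dr)):
        for c in range(max(0, -dc), cols - max(0, dc)):
            glcm[q[r, c], q[r + dr, c + dc]] += 1
    glcm /= glcm.sum()                                           # normalize to joint probabilities
    i, j = np.indices(glcm.shape)
    return {
        "contrast": float(((i - j) ** 2 * glcm).sum()),
        "energy": float((glcm ** 2).sum()),
        "homogeneity": float((glcm / (1.0 + np.abs(i - j))).sum()),
    }

# Example on a synthetic ROI:
roi = (np.random.default_rng(0).random((64, 64)) * 255).astype(np.uint8)
print(glcm_features(roi))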

3.
Most widely used pattern classification algorithms, such as Support Vector Machines (SVM), are sensitive to the presence of irrelevant or redundant features in the training data. Automatic feature selection algorithms aim at selecting a subset of the features present in a given dataset so that the accuracy achieved by the subsequent classifier is maximized. Feature selection algorithms are generally categorized into two broad categories: algorithms that do not take the subsequent classifier into account (the filter approaches), and algorithms that evaluate the subsequent classifier for each considered feature subset (the wrapper approaches). Filter approaches are typically faster, but wrapper approaches deliver higher performance. In this paper, we present an algorithm, Predictive Forward Selection, based on the widely used wrapper approach of forward selection. Using ideas from meta-learning, it reduces the number of required evaluations of the target classifier by exploiting experience gained during past feature selection runs on other datasets. We have evaluated our approach on 59 real-world datasets with a focus on SVM as the target classifier. We present comparisons with state-of-the-art wrapper and filter approaches, as well as one embedded method for SVM, in terms of accuracy and run-time. The results show that the presented method reaches the accuracy of traditional wrapper approaches while requiring significantly fewer evaluations of the target algorithm. Moreover, our method achieves statistically significantly better results than the filter approaches as well as the embedded method.
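A minimal sketch of the plain forward-selection wrapper that this method builds on (the meta-learning component that predicts promising candidates is omitted); the feature matrix X, labels y, stopping rule, and feature budget are illustrative assumptions.

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def forward_selection(X, y, max_features=10, cv=5):
    # Greedy wrapper: repeatedly add the feature that most improves CV accuracy of the SVM.
    remaining = list(range(X.shape[1]))
    selected, best_score = [], 0.0
    while remaining and len(selected) < max_features:
        scores = []
        for f in remaining:
            cols = selected + [f]
            acc = cross_val_score(SVC(), X[:, cols], y, cv=cv).mean()
            scores.append((acc, f))
        acc, f = max(scores)                  # best candidate in this round
        if acc <= best_score:                 # stop when no candidate improves the score
            break
        selected.append(f)
        remaining.remove(f)
        best_score = acc
    return selected, best_score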

4.
In many pattern recognition applications, high-dimensional feature vectors impose a high computational cost as well as the risk of “overfitting”. Feature selection addresses the dimensionality reduction problem by determining a subset of available features that is most essential for classification. This paper presents a novel feature selection method named filtered and supported sequential forward search (FS_SFS) in the context of support vector machines (SVM). In comparison with conventional wrapper methods that employ the SFS strategy, FS_SFS has two important properties that reduce the time of computation. First, it dynamically maintains a subset of samples for the training of the SVM. Because not all the available samples participate in the training process, the computational cost of obtaining a single SVM classifier is decreased. Second, a new criterion, which takes into consideration both the discriminant ability of individual features and the correlation between them, is proposed to effectively filter out non-essential features. As a result, the total number of training runs is significantly reduced and the overfitting problem is alleviated. The proposed approach is tested on both synthetic and real data to demonstrate its effectiveness and efficiency.

5.
In conjunction with advances in computer technology, virtual screening of small molecules has come into use in drug discovery. Since thousands of compounds are considered in the early phase of drug discovery, a fast classification method that can distinguish between active and inactive molecules can be used for screening large compound collections. In this study, we used Support Vector Machines (SVM) for this type of classification task. SVM is a powerful classification tool that is becoming increasingly popular in various machine-learning applications. The data consist of 631 compounds for the training set and 216 compounds for a separate test set. In the data pre-processing step, Pearson's correlation coefficient was used as a filter to eliminate redundant features. After application of the correlation filter, a single SVM was applied to the reduced data set. Moreover, we investigated the performance of SVM with different feature selection strategies, including SVM-Recursive Feature Elimination, a Wrapper Method, and Subset Selection. All feature selection methods generally performed better than a single SVM, with Subset Selection outperforming the other feature selection methods. We tested SVM as a classification tool on a real-life drug discovery problem, and our results revealed that it can be a useful method for classification tasks in the early phase of drug discovery.
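A minimal sketch of the two steps this abstract describes, a Pearson correlation filter to drop redundant features followed by SVM-based recursive feature elimination; the correlation threshold and the number of retained features are illustrative assumptions.

import numpy as np
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

def correlation_filter(X, threshold=0.95):
    # Keep a feature only if its absolute correlation with every already-kept feature is below the threshold.
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for j in range(X.shape[1]):
        if all(corr[j, k] < threshold for k in keep):
            keep.append(j)
    return keep

def filter_then_rfe(X, y, threshold=0.95, n_features=20):
    keep = correlation_filter(X, threshold)
    rfe = RFE(SVC(kernel="linear"), n_features_to_select=n_features)
    rfe.fit(X[:, keep], y)
    # Map the RFE-selected columns back to the original feature indices.
    return [keep[i] for i in np.where(rfe.support_)[0]]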

6.
This paper presents a hybrid filter-wrapper feature subset selection algorithm based on particle swarm optimization (PSO) for support vector machine (SVM) classification. The filter model is based on mutual information and is a composite measure of feature relevance and redundancy with respect to the feature subset selected. The wrapper model is a modified discrete PSO algorithm. This hybrid algorithm, called maximum relevance minimum redundancy PSO (mr2PSO), is novel in the sense that it uses the mutual information available from the filter model to weight the bit selection probabilities in the discrete PSO. Hence, mr2PSO uniquely brings together the efficiency of filters and the greater accuracy of wrappers. The proposed algorithm is tested on several well-known benchmark datasets. Its performance is also compared with a recent hybrid filter-wrapper algorithm based on a genetic algorithm and a wrapper algorithm based on PSO. The results show that the mr2PSO algorithm is competitive in terms of both classification accuracy and computational performance.
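A minimal sketch of the filter half of such a hybrid: a mutual-information relevance-minus-redundancy score per feature, scaled into bit-selection probabilities that a discrete PSO could use to bias its position updates. This is an illustrative reading of the idea, not the paper's exact formulation; X, y, the discretization, and the probability range are assumptions.

import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.metrics import mutual_info_score

def mi_bit_probabilities(X, y, n_bins=10):
    # Relevance (MI with the label) minus mean redundancy (MI with the other features), mapped to [0.1, 0.9].
    Xd = np.stack([np.digitize(col, np.histogram_bin_edges(col, bins=n_bins))
                   for col in X.T], axis=1)                      # discretize for feature-feature MI
    relevance = mutual_info_classif(X, y, random_state=0)
    n = X.shape[1]
    redundancy = np.array([
        np.mean([mutual_info_score(Xd[:, i], Xd[:, j]) for j in range(n) if j != i])
        for i in range(n)
    ])
    score = relevance - redundancy
    score = (score - score.min()) / (score.max() - score.min() + 1e-12)
    return 0.1 + 0.8 * score                                     # keep every bit some chance of being selected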

7.
We present in this paper our winning solution to Dedicated Task 1 in the Nokia Mobile Data Challenge (MDC). MDC Task 1 is to infer the semantic category of a place based on the smartphone sensing data obtained at that place. We approach this task in a standard supervised learning setting: we extract discriminative features from the sensor data and use state-of-the-art classifiers (SVM, Logistic Regression, and the Decision Tree family) to build classification models. We have found that feature engineering, in other words constructing features using human heuristics, is very effective for this task. In particular, we have proposed a novel feature engineering technique, Conditional Feature (CF), a general framework for domain-specific feature construction. In total, we generated 2,796,200 features, and in our final five submissions we use feature selection to select 100 to 2000 features. One of our key findings is that features conditioned on fine-granularity time intervals, e.g. every 30 min, are most effective. Our best 10-fold CV accuracy on the training set is 75.1% by Gradient Boosted Trees, and the second best accuracy is 74.6% by L1-regularized Logistic Regression. Besides the good performance, we also briefly report our experience of using the F# language for large-scale (~70 GB of raw text data) conditional feature construction.
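A minimal sketch of the conditional-feature idea, computing a sensor statistic conditioned on fine-granularity time intervals (here, the mean of a WiFi access-point count per 30-minute time-of-day bucket). The column names, the aggregated statistic, and the toy data are hypothetical, chosen only to illustrate the construction; the challenge solution itself used F# rather than Python.

import pandas as pd

def conditional_features(df, value_col="wifi_ap_count", time_col="timestamp", freq="30min"):
    # One feature per 30-minute-of-day bucket: the mean of value_col observed in that bucket.
    bucket = df[time_col].dt.floor(freq).dt.time        # condition on the time-of-day interval
    feats = df.groupby(bucket)[value_col].mean()
    feats.index = [f"{value_col}_mean@{t}" for t in feats.index]
    return feats

# Hypothetical sensing log for one place visit:
df = pd.DataFrame({
    "timestamp": pd.to_datetime(["2012-01-01 09:05", "2012-01-01 09:20", "2012-01-01 18:40"]),
    "wifi_ap_count": [12, 15, 4],
})
print(conditional_features(df))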

8.
Kernel Function in SVM-RFE Based Hyperspectral Data Band Selection
Support vector machine recursive feature elimination (SVM-RFE) has low efficiency when applied to band selection for hyperspectral data, since it usually uses a non-linear kernel and retrains the SVM every time a band is deleted. Recent research shows that an SVM with a non-linear kernel does not always perform better than one with a linear kernel for SVM classification. Similarly, there is some uncertainty about which kernel is better in SVM-RFE based band selection. This paper compares the classification results of SVM-RFE using the two SVMs, then designs two optimization strategies for accelerating the band selection process: the percentage accelerated method and the fixed accelerated method. Through an experiment on AVIRIS hyperspectral data, this paper found that: ① the classification precision of SVM decreases slightly as redundant bands are added, which means SVM classification needs feature selection in terms of classification accuracy; ② the band collection selected by SVM-RFE with a linear SVM has higher classification accuracy and fewer effective bands than that selected with a non-linear SVM; ③ both optimization strategies improved the efficiency of the feature selection, and percentage elimination performed better than the fixed elimination method in terms of computational efficiency and classification accuracy.
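A minimal sketch of linear SVM-RFE with a percentage-style acceleration like the one described above (removing a fixed fraction of the lowest-ranked bands per iteration instead of one band at a time); the elimination fraction and the target number of bands are illustrative assumptions.

import numpy as np
from sklearn.svm import SVC

def svm_rfe_percentage(X, y, keep=30, drop_fraction=0.2):
    # Rank bands by the squared weights of a linear SVM and drop the weakest fraction each round.
    bands = np.arange(X.shape[1])
    while len(bands) > keep:
        svm = SVC(kernel="linear").fit(X[:, bands], y)
        weights = np.square(svm.coef_).sum(axis=0)          # RFE ranking criterion
        n_drop = max(1, int(drop_fraction * len(bands)))
        n_drop = min(n_drop, len(bands) - keep)             # do not overshoot the target size
        bands = bands[np.argsort(weights)[n_drop:]]         # keep the better-ranked bands
    return bands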

9.
Context: Several issues affect software defect data, including redundancy, correlation, feature irrelevance, and missing samples. It is also hard to ensure a balanced distribution between data pertaining to defective and non-defective software; in most experimental cases, data related to the latter class dominates the dataset. Objective: The objectives of this paper are to demonstrate the positive effects of combining feature selection and ensemble learning on the performance of defect classification. Along with efficient feature selection, a new two-variant (with and without feature selection) ensemble learning algorithm is proposed to provide robustness to both data imbalance and feature redundancy. Method: We carefully combine selected ensemble learning models with efficient feature selection to address these issues and mitigate their effects on defect classification performance. Results: Forward selection showed that only a few features contribute to a high area under the receiver-operating characteristic curve (AUC). On the tested datasets, the greedy forward selection (GFS) method outperformed other feature selection techniques such as Pearson's correlation, which suggests that the features are highly unstable. However, ensemble learners such as random forests and the proposed algorithm, average probability ensemble (APE), are not as affected by poor features as weighted support vector machines (W-SVMs). Moreover, the APE model combined with greedy forward selection (enhanced APE) achieved AUC values of approximately 1.0 for the NASA datasets PC2, PC4, and MC1. Conclusion: This paper shows that the features of a software dataset must be carefully selected for accurate classification of defective components. Furthermore, tackling the software data issues mentioned above with the proposed combined learning model resulted in remarkable classification performance, paving the way for successful quality control.
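A minimal sketch of an average-probability ensemble in the spirit of APE: class probabilities from several base learners are averaged and the highest-probability class wins. The particular base classifiers and their settings are illustrative, not necessarily those used in the paper.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

class AverageProbabilityEnsemble:
    # Soft-voting ensemble: average predict_proba over the base classifiers.
    def __init__(self, estimators=None):
        self.estimators = estimators or [
            RandomForestClassifier(n_estimators=100, random_state=0),
            LogisticRegression(max_iter=1000),
            SVC(probability=True, random_state=0),
        ]

    def fit(self, X, y):
        for est in self.estimators:
            est.fit(X, y)
        return self

    def predict_proba(self, X):
        return np.mean([est.predict_proba(X) for est in self.estimators], axis=0)

    def predict(self, X):
        return self.estimators[0].classes_[np.argmax(self.predict_proba(X), axis=1)]

The same behavior can be obtained with scikit-learn's VotingClassifier(voting="soft"); the explicit class above simply makes the probability averaging visible.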

10.
Since feature selection and parameter optimization have a major impact on the classification accuracy of support vector machines, the asymptotic behavior of the SVM is incorporated into a genetic algorithm and a feature chromosome is generated, so that the genetic algorithm's search is guided toward the line of optimized error in the hyperparameter space. On this basis, a new method based on a genetic algorithm with a feature chromosome is proposed to perform SVM feature selection and parameter optimization simultaneously. Compared with grid search, a genetic algorithm without the feature chromosome, and other methods, the proposed method achieves higher accuracy, smaller feature subsets, and less processing time.

11.
Support vector machine (SVM) is a novel pattern classification method that is valuable in many applications. Kernel parameter setting in the SVM training process, along with feature selection, significantly affects classification accuracy. The objective of this study is to obtain better parameter values while also finding a subset of features that does not degrade the SVM classification accuracy. This study develops a simulated annealing (SA) approach for parameter determination and feature selection in the SVM, termed SA-SVM. To evaluate the proposed SA-SVM approach, several datasets from the UCI machine learning repository are adopted to calculate the classification accuracy rate. The proposed approach was compared with grid search, a conventional method of performing parameter setting, and with various other methods. Experimental results indicate that the classification accuracy rates of the proposed approach exceed those of grid search and the other approaches. SA-SVM is thus useful for parameter determination and feature selection in the SVM.
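A minimal sketch of simulated annealing over an SVM configuration (RBF parameters plus a binary feature mask), assuming a feature matrix X and labels y; the neighborhood moves, cooling schedule, and parameter ranges are illustrative choices, not the paper's settings.

import math
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def score(state, X, y):
    C, gamma, mask = state
    if not mask.any():
        return 0.0
    return cross_val_score(SVC(C=C, gamma=gamma), X[:, mask], y, cv=3).mean()

def neighbor(state, n_features):
    C, gamma, mask = state
    mask = mask.copy()
    mask[rng.integers(n_features)] ^= True                     # flip one feature bit
    return (C * 2.0 ** rng.normal(), gamma * 2.0 ** rng.normal(), mask)

def sa_svm(X, y, iters=200, t0=1.0, cooling=0.97):
    n = X.shape[1]
    state = (1.0, 1.0 / n, rng.random(n) < 0.5)
    best, best_s = state, score(state, X, y)
    cur, cur_s, t = state, best_s, t0
    for _ in range(iters):
        cand = neighbor(cur, n)
        cand_s = score(cand, X, y)
        # Accept better states always, worse states with a temperature-dependent probability.
        if cand_s > cur_s or rng.random() < math.exp((cand_s - cur_s) / t):
            cur, cur_s = cand, cand_s
            if cur_s > best_s:
                best, best_s = cur, cur_s
        t *= cooling
    return best, best_s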

12.
Contemporary biological technologies produce extremely high-dimensional data sets from which to design classifiers, with 20,000 or more potential features being commonplace. In addition, sample sizes tend to be small. In such settings, feature selection is an inevitable part of classifier design. Heretofore, there have been a number of comparative studies of feature selection, but they have either considered settings with much smaller dimensionality than those occurring in current bioinformatics applications or constrained their study to a few real data sets. This study compares some basic feature-selection methods in settings involving thousands of features, using both model-based synthetic data and real data. It defines distribution models involving different numbers of markers (useful features) versus non-markers (useless features) and different kinds of relations among the features. Under this framework, it evaluates the performance of feature-selection algorithms for different distribution models and classifiers. Both the classification error and the number of discovered markers are computed. Although the results clearly show that none of the considered feature-selection methods performs best across all scenarios, there are some general trends relative to sample size and relations among the features. For instance, the classifier-independent univariate filter methods have similar trends. Filter methods such as the t-test have better or similar performance compared with wrapper methods for harder problems; this improved performance is usually accompanied by significant peaking. Wrapper methods have better performance when the sample size is sufficiently large. ReliefF, the classifier-independent multivariate filter method, has worse performance than the univariate filter methods in most cases; however, ReliefF-based wrapper methods show performance similar to their t-test-based counterparts.

13.
Simultaneous Feature Selection and SVM Parameter Optimization Based on a Binary PSO Algorithm
Feature selection and classifier parameter optimization are two important ways to improve classifier performance, and traditionally these two problems have been solved separately. In recent years, with the wide application of evolutionary optimization techniques in pattern recognition, the flexibility of their encodings has made simultaneous optimization of features and parameters possible and increasingly popular. To address this problem, this paper uses a binary PSO algorithm to perform feature selection and SVM parameter optimization simultaneously, and proposes a PSO-SVM algorithm. Experiments show that the method can effectively find a suitable feature subset and SVM parameters and achieve good classification results; compared with the GA-SVM algorithm proposed in [4], it offers greater feature reduction and higher run-time efficiency.

14.
15.
This work presents a global geometric similarity scheme (GGSS) for feature selection in fault diagnosis, which is composed of a global geometric model and a similarity metric. The global geometric model is formed to construct connections between disjoint clusters in fault diagnosis. The similarity metric of the global geometric model is applied to filter feature subsets. To evaluate the performance of GGSS, fault data from a wind turbine test rig are collected, and condition classification is carried out with classifiers established by Support Vector Machine (SVM) and General Regression Neural Network (GRNN). The classification results are compared with feature ranking methods and feature wrapper approaches. GGSS achieves higher classification accuracy than the feature ranking methods, and better time efficiency than the feature wrapper approaches. The hybrid scheme, GGSS with a wrapper, obtains optimal classification accuracy and time efficiency. The proposed scheme can be applied in feature selection to obtain better accuracy and efficiency in the condition classification of fault diagnosis.

16.
Syndromic surveillance can play an important role in protecting the public's health against infectious diseases. Infectious disease outbreaks can have a devastating effect on society as well as the economy, and global awareness is therefore critical to protecting against major outbreaks. By monitoring online news sources and developing an accurate news classification system for syndromic surveillance, public health personnel can be apprised of outbreaks and potential outbreak situations. In this study, we have developed a framework for automatic online news monitoring and classification for syndromic surveillance. The framework is unique, and none of the techniques adopted in this study have previously been used in the context of syndromic surveillance of infectious diseases. In our classification experiments, we compared the performance of different feature subsets with different machine learning algorithms. The results showed that combined feature subsets including Bag of Words, Noun Phrase, and Named Entity features outperformed the Bag of Words feature subset alone. Furthermore, feature selection improved the performance of the feature subsets in online news classification. The highest classification performance was achieved when using SVM on the selected combined feature subset.
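A minimal sketch of the bag-of-words part of such a pipeline, with univariate feature selection before an SVM. The chi-squared criterion, the feature count, and the toy documents are illustrative assumptions; the study also adds noun-phrase and named-entity features, which are not reproduced here.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical labeled headlines: 1 = outbreak-related, 0 = not.
docs = ["Avian flu outbreak reported in poultry farms",
        "New stadium opens downtown this weekend",
        "Hospital reports spike in influenza-like illness",
        "Local team wins championship game"]
labels = [1, 0, 1, 0]

pipeline = make_pipeline(
    CountVectorizer(ngram_range=(1, 2)),      # bag-of-words / bigram features
    SelectKBest(chi2, k=10),                  # keep the 10 most class-associated features
    LinearSVC(),
)
pipeline.fit(docs, labels)
print(pipeline.predict(["flu cases rising at city hospital"]))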

17.
The aim of this paper is to propose a new hybrid data mining model, based on a combination of various feature selection and ensemble learning classification algorithms, to support the decision making process. The model is built in several stages. In the first stage, the initial dataset is preprocessed; apart from applying different preprocessing techniques, we paid particular attention to feature selection. Five different feature selection algorithms were applied, and their results, evaluated with the ROC and accuracy measures of a logistic regression algorithm, were combined using different voting types. We also propose a new voting method, called if_any, that outperformed all other voting methods as well as the results of any single feature selection algorithm. In the next stage, four different classification algorithms, namely generalized linear model, support vector machine, naive Bayes, and decision tree, were trained on the dataset obtained in the feature selection process. These classifiers were combined into eight different ensemble models using the soft voting method. Using a real dataset, the experimental results show that the hybrid model based on features selected by the if_any voting method together with the ensemble GLM + DT model achieves the highest performance and outperforms all other ensemble and single-classifier models.
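A minimal sketch of combining several feature selectors by vote. The if_any rule is read here as a union vote that keeps a feature as soon as any selector picks it, which is one plausible interpretation of the name; both the rule and the particular selectors are illustrative assumptions.

import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif, mutual_info_classif

def if_any_selection(X, y, k=10):
    # Union vote: a feature is kept if at least one selector chooses it.
    selectors = [
        SelectKBest(f_classif, k=k),
        SelectKBest(mutual_info_classif, k=k),
    ]
    votes = np.zeros(X.shape[1], dtype=int)
    for sel in selectors:
        votes += sel.fit(X, y).get_support().astype(int)
    return np.where(votes >= 1)[0]        # raise the threshold for majority-style voting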

18.
This paper focuses on feature selection in classification. A new version of the support vector machine (SVM), named the p-norm support vector machine (p ∈ [0,1]), is proposed. Different from the standard SVM, the p-norm (p ∈ [0,1]) of the normal vector of the decision plane is used, which leads to a sparser solution. The new model can not only select fewer features but also improve the classification accuracy by adjusting the parameter p. Numerical experimental results show that the p-norm SVM is more effective in feature selection than several commonly used methods.
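As a reconstruction for illustration only (the paper's exact formulation and constraints may differ), the model can plausibly be written as the standard soft-margin SVM with the l2 regularizer replaced by a p-norm penalty on the normal vector w:

\min_{w, b, \xi} \; \|w\|_p^p + C \sum_{i=1}^{n} \xi_i
\quad \text{s.t.} \quad y_i \left( w^{\top} x_i + b \right) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \dots, n,
\qquad \text{where } \|w\|_p^p = \sum_{j=1}^{d} |w_j|^p, \; p \in [0, 1].

As p approaches 0 the penalty approaches a count of the nonzero components of w, which is what drives weights to exactly zero and yields the sparser, feature-selecting solutions reported above.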

19.
Spectral features of images, such as Gabor filters and the wavelet transform, can be used for texture image classification: a classifier is trained on labeled texture features as the training set to classify unlabeled texture features of images into pre-defined classes. The aim of this paper is twofold. First, it investigates the classification performance of using Gabor filters, the wavelet transform, and their combination, respectively, as the texture feature representation of scenery images (such as mountain, castle, etc.); a k-nearest neighbor (k-NN) classifier and a support vector machine (SVM) are also compared. Second, three k-NN classifiers and three SVMs are combined respectively, with each of the three combined classifiers using one of the above three texture feature representations, to see whether combining multiple classifiers can outperform a single classifier in terms of scenery image classification. The results show that a single SVM using Gabor filters provides higher classification accuracy than the other two spectral features and than the combined three k-NN classifiers and three SVMs.

20.
The selection and extraction of low-level features is the foundation of automatic image classification. On the one hand, the selected image features should represent a variety of image attributes and help distinguish images of different categories; on the other hand, noisy and redundant features need to be reduced to improve the computational efficiency of the subsequent model. This paper proposes an automatic image classification method based on feature weighting. The method measures the importance of a feature to a category according to the dispersion of the low-level feature's distribution, increasing the weights of highly relevant features and decreasing the weights of weakly relevant ones, so that the subsequent model is not dominated by weakly relevant or irrelevant features. The proposed feature weighting algorithm mainly examines how important a feature is to a specific category, so a suitable set of feature weights can be selected for each category. The weighted features are then embedded into a support vector machine for automatic image classification. Experimental results on the Corel image dataset show that the feature-weighted automatic image classification algorithm can effectively improve classification accuracy.
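A minimal sketch of per-class feature weighting driven by the spread of each feature's values, applied to the inputs before an SVM. The specific weight formula (distance of the class mean from the overall mean divided by the within-class spread, one weight vector per class in a one-vs-rest setup) is an illustrative interpretation, not necessarily the paper's formula.

import numpy as np
from sklearn.svm import SVC

def class_feature_weights(X, y, target_class, eps=1e-12):
    # Weight = |class mean - overall mean| / class standard deviation, normalized to sum to 1.
    Xc = X[y == target_class]
    spread = np.abs(Xc.mean(axis=0) - X.mean(axis=0)) / (Xc.std(axis=0) + eps)
    return spread / (spread.sum() + eps)

def weighted_one_vs_rest_svms(X, y):
    # One SVM per class, each trained on features rescaled by that class's weights.
    models = {}
    for c in np.unique(y):
        w = class_feature_weights(X, y, c)
        models[c] = (w, SVC(probability=True).fit(X * w, (y == c).astype(int)))
    return models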
