11 results found.
1.
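AdaBoost Face Detection Method Based on Support Vector Machines    Total citations: 4, self-citations: 3, citations by others: 1
Face detection and recognition technologies have attracted wide attention because of their great application value and market potential, and have become a research hotspot in computer vision. This paper introduces an AdaBoost face detection method based on support vector machines (SVMs). Compared with the original AdaBoost algorithm, the AdaBoostSVM algorithm addresses the over-fitting problem in AdaBoost classifier training by setting a minimum value for the kernel parameter σ and adaptively adjusting σ. The method reduces complexity and improves generalization. Experimental results show that it detects faces well and achieves a higher correct detection rate than AdaBoost alone.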
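The σ-adaptation idea can be sketched as a boosting loop that narrows the RBF width whenever a component classifier is worse than chance, and stops once σ falls to its minimum. This is a minimal sketch assuming the commonly described AdaBoostSVM scheme; the abstract does not give the exact procedure. `train_weak` is a hypothetical interface standing in for an RBF-kernel SVM trained with width σ, and `toy_trainer` is a made-up stand-in so the example runs without an SVM library.

```python
import math
from typing import Callable, List, Sequence, Tuple

# A predictor maps one sample to a label in {-1, +1}.
Predictor = Callable[[float], int]
# Weak-learner interface (an assumption): (X, y, weights, sigma) -> predictor.
# In AdaBoostSVM this would be an RBF-kernel SVM trained with kernel width sigma.
WeakTrainer = Callable[[Sequence[float], Sequence[int], Sequence[float], float], Predictor]


def adaboost_svm(X: Sequence[float], y: Sequence[int], train_weak: WeakTrainer,
                 sigma_init: float, sigma_min: float, sigma_step: float,
                 max_rounds: int = 50) -> List[Tuple[float, Predictor]]:
    """AdaBoost that narrows the RBF width whenever a weak learner is worse than chance."""
    n = len(X)
    weights = [1.0 / n] * n
    sigma = sigma_init
    ensemble: List[Tuple[float, Predictor]] = []
    while sigma > sigma_min and len(ensemble) < max_rounds:
        h = train_weak(X, y, weights, sigma)
        # Weighted training error; the weights always sum to 1.
        err = sum(w for w, xi, yi in zip(weights, X, y) if h(xi) != yi)
        if err > 0.5:
            # Too weak at this width: shrink sigma and retrain on the same weights.
            sigma -= sigma_step
            continue
        err = max(err, 1e-10)  # avoid log(0) when the learner is perfect
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, h))
        # Standard AdaBoost reweighting: misclassified samples gain weight.
        weights = [w * math.exp(-alpha * yi * h(xi)) for w, xi, yi in zip(weights, X, y)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble: List[Tuple[float, Predictor]], x: float) -> int:
    score = sum(alpha * h(x) for alpha, h in ensemble)
    return 1 if score >= 0 else -1


def toy_trainer(X, y, weights, sigma):
    # Hypothetical stand-in for an RBF-SVM: useless when sigma is large,
    # a correct threshold rule once sigma is small enough.
    if sigma > 2.0:
        return lambda x: 1 if x < 1.5 else -1
    return lambda x: 1 if x > 1.5 else -1


X = [0.0, 1.0, 2.0, 3.0]
y = [-1, -1, 1, 1]
model = adaboost_svm(X, y, toy_trainer, sigma_init=3.0, sigma_min=0.5,
                     sigma_step=1.0, max_rounds=3)
print([predict(model, x) for x in X])  # [-1, -1, 1, 1]
```

The `max_rounds` cap is an added safeguard: if every learner stays better than chance, σ never shrinks and the σ > σ_min test alone would not end the loop.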
2.
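Rough one-class support vector machine (ROCSVM) is a one-class SVM that, through kernel-function mapping, defines an upper-approximation hyperplane and a lower-approximation hyperplane, so that each training sample influences the decision hyperplane adaptively according to its position within the rough margin. Because the ROCSVM training set contains only positive samples, fully mining and exploiting the classification features of the training samples is important for improving ROCSVM's classification performance. To this end, a weighted Gaussian kernel function (λ-RBF) based on the contribution of the training samples' classification features is proposed: principal component analysis (PCA) is first applied to the training samples to obtain a set of vectors ordered by eigenvalue, and this vector set is used to construct the kernel so that dimensions with larger eigenvalues play a larger role in it. Experimental results on UCI benchmark datasets and simulated data show that, compared with ROCSVM using a standard RBF kernel, ROCSVM based on λ-RBF achieves better generalization and a higher recognition rate.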
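A PCA-weighted Gaussian kernel of this kind can be sketched as follows. The abstract does not state the exact weighting, so this sketch assumes each principal axis is weighted by its normalized eigenvalue λᵢ/Σλ, giving k(x, z) = exp(-Σᵢ wᵢ (uᵢ·(x−z))² / (2σ²)). Because the weights are non-negative, this is a Gaussian kernel under a positive semidefinite metric, so it remains a valid kernel. The PCA here uses plain power iteration to stay dependency-free; the function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def covariance(data: Sequence[Sequence[float]]) -> List[Vector]:
    """Sample covariance matrix of the rows in `data`."""
    n, d = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in data]
    return [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
            for a in range(d)]


def top_eigenpairs(cov: List[Vector], k: int, iters: int = 500) -> List[Tuple[float, Vector]]:
    """PCA via power iteration with deflation; adequate for small, well-separated spectra."""
    d = len(cov)
    m = [row[:] for row in cov]
    pairs: List[Tuple[float, Vector]] = []
    for _ in range(k):
        v = [1.0 + 0.1 * i for i in range(d)]  # deterministic, non-zero start vector
        for _ in range(iters):
            w = [sum(m[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue for the unit vector v.
        lam = sum(v[a] * sum(m[a][b] * v[b] for b in range(d)) for a in range(d))
        pairs.append((lam, v))
        # Deflate so the next pass finds the next-largest component.
        m = [[m[a][b] - lam * v[a] * v[b] for b in range(d)] for a in range(d)]
    return pairs


def lambda_rbf(x: Sequence[float], z: Sequence[float],
               pairs: List[Tuple[float, Vector]], sigma: float = 1.0) -> float:
    """Gaussian kernel where differences along high-variance axes count more.

    The weight lam_i / sum(lam) per principal axis is an assumption.
    """
    total = sum(lam for lam, _ in pairs)
    if total <= 0.0:
        raise ValueError("eigenvalues must have a positive sum")
    diff = [a - b for a, b in zip(x, z)]
    dist = 0.0
    for lam, u in pairs:
        proj = sum(ui * di for ui, di in zip(u, diff))  # component along axis u
        dist += (lam / total) * proj * proj
    return math.exp(-dist / (2.0 * sigma ** 2))


data = [(-2.0, 0.0), (2.0, 0.0), (0.0, -1.0), (0.0, 1.0)]
pairs = top_eigenpairs(covariance(data), k=2)  # eigenvalues 8/3 and 2/3
along_major = lambda_rbf((0.0, 0.0), (1.0, 0.0), pairs)  # exp(-0.4)
along_minor = lambda_rbf((0.0, 0.0), (0.0, 1.0), pairs)  # exp(-0.1)
print(along_major < along_minor)  # True
```

The same unit step lowers the kernel value more along the high-variance axis, which is the "larger eigenvalue, larger role" behaviour the abstract describes.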
3.
4.
Handling over-fitting in test cost-sensitive decision tree learning by feature selection, smoothing and pruning    Total citations: 1, self-citations: 0, citations by others: 1
Tao Wang, Journal of Systems and Software, 2010, 83(7): 1137-1147
Cost-sensitive learning algorithms are typically designed to minimize the total cost when multiple costs are taken into account. Like other learning algorithms, they face a significant challenge in applied settings: over-fitting. Specifically, they can produce good results on training data but often fail to yield an optimal model when applied to unseen data in real-world applications. This is called data over-fitting. This paper addresses data over-fitting by designing three simple and efficient strategies for the TCSDT (test cost-sensitive decision tree) method: feature selection, smoothing, and threshold pruning. Feature selection is used to pre-process the data set before the TCSDT algorithm is applied. Smoothing and threshold pruning are used within the TCSDT algorithm before the class probability estimate is calculated for each decision tree leaf. To evaluate our approaches, we conduct extensive experiments on selected UCI data sets across different cost ratios, and on a real-world data set, KDD-98, with real misclassification costs. The experimental results show that our algorithms outperform both the original TCSDT and other competing algorithms in reducing data over-fitting.
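Smoothing a leaf's class probability estimate matters because a leaf with only a few training examples reports extreme frequencies such as 1.0. The abstract does not say which smoothing or pruning rule TCSDT uses, so this sketch substitutes two standard estimators, Laplace smoothing and the m-estimate, plus a hypothetical threshold rule in which a leaf with too few examples falls back to its parent's estimate. The decision function covers misclassification costs only; test costs are not modeled here.

```python
def laplace_estimate(n_pos: int, n_total: int, n_classes: int = 2) -> float:
    """Laplace-smoothed probability of the positive class at a leaf."""
    return (n_pos + 1) / (n_total + n_classes)


def m_estimate(n_pos: int, n_total: int, base_rate: float, m: float) -> float:
    """m-estimate: pull the raw leaf frequency toward the overall base rate."""
    return (n_pos + base_rate * m) / (n_total + m)


def pruned_leaf_probability(n_pos: int, n_total: int, parent_estimate: float,
                            min_examples: int = 10) -> float:
    """Hypothetical threshold rule: leaves with too few examples reuse the parent's estimate."""
    if n_total < min_examples:
        return parent_estimate
    return laplace_estimate(n_pos, n_total)


def predict_positive(p_pos: float, cost_fp: float, cost_fn: float) -> bool:
    """Pick the label with the lower expected misclassification cost."""
    return p_pos * cost_fn > (1.0 - p_pos) * cost_fp


# A leaf holding 2 training examples, both positive: the raw frequency is 1.0.
print(laplace_estimate(2, 2))                              # 0.75
print(m_estimate(2, 2, base_rate=0.05, m=10))              # about 0.208
print(pruned_leaf_probability(2, 2, parent_estimate=0.3))  # 0.3
print(predict_positive(0.208, cost_fp=1.0, cost_fn=10.0))  # True
```

With a rare positive class (base rate 0.05), the m-estimate pulls the overconfident 1.0 down to about 0.21. A high false-negative cost can still justify predicting positive.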
5.
Petra A. Hoggarth, Carrie R.H. Innes, John C. Dalrymple-Alford, Richard D. Jones, Accident Analysis & Prevention, 2015
The prediction of on-road driving ability using off-road measures is a key aim in driving research. The primary goal in most classification models is to identify a small number of off-road variables that predict driving ability with high accuracy. Unfortunately, classification models are often over-fitted to the study sample, leading to inflated predictive accuracy, poor generalization to the relevant population and, thus, poor validity. Many driving studies do not report sufficient detail to determine the risk of model over-fitting, and few report any validation technique, which is critical for testing the generalizability of a model. After reviewing the literature, we generated a model using a moderately large sample (n = 279) and best-practice techniques for regression modelling. By then randomly selecting progressively smaller samples, we show that a low ratio of participants to independent variables can result in over-fitted models and spurious conclusions regarding model accuracy. We conclude that more stable models can be constructed by following a few guidelines.
6.
This study develops a neural network (NN) model to explore the nonlinear relationship between crash frequency and risk factors. To eliminate the possibility of over-fitting and to deal with the black-box nature of NNs, a network structure optimization algorithm and a rule extraction method are proposed. A case study compares the performance of the trained and modified NN models with that of the traditional negative binomial (NB) model for analyzing crash frequency on road segments in Hong Kong. The results indicate that the optimized NNs have somewhat better fitting and predictive performance than the NB models. Moreover, the smaller training and testing errors of the optimized NNs with pruned input and hidden nodes demonstrate that the structure optimization algorithm can identify insignificant factors and improve the model's generalization capacity. Furthermore, the rule set extracted from the optimized NN model reveals the effect of each explanatory variable on crash frequency under different conditions, and implies a nonlinear relationship between the factors and crash frequency. With the structure optimization algorithm and rule extraction method, the modified NN model has great potential for modeling crash frequency and may be considered a good alternative for road safety analysis.
7.
Expert Systems with Applications, 2014, 41(17): 8003-8015
The current research presents a classification methodology based on Mahalanobis Distance (MD) and association mining using Rough Set Theory (RST). MD has been used in the Mahalanobis Taguchi System (MTS) to develop a classification scheme for systems with dichotomous states or categories. In MTS, the important features or variables that improve classification accuracy are selected using Signal-to-Noise (S/N) ratios and Orthogonal Arrays (OAs). OAs have been reported to have limitations in handling a large number of variables. Second, no penalty for over-fitting (regularization) is included in the feature selection process for the MTS classifier. In addition, there is scope to extend MTS into a combined classification and causality analysis method by adding comprehensive information about the underlying process that generated the data. This paper proposes selecting variables by maximizing the degree of dependency between a Subset of System Variables (SSV) and the system classes or categories (R). The degree of dependency, which reflects the goodness of the model and hence of the SSV, is measured by the conditional probability of the system states given the subset of variables. Moreover, a regularization factor equivalent to the L0 norm is introduced into an optimization problem that jointly maximizes goodness of model and the effect of regularization. The dependency between SSVs and R is modeled via the equivalence sets of Rough Set Theory. Two new variants of the MTS classifier are developed, and their classification accuracy is evaluated on test datasets from five case studies. The proposed MTS variants are observed to perform better than existing MTS methods and other classification techniques found in the literature.
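Variable selection by degree of dependency with an L0-style penalty can be sketched with the standard rough-set definition γ_B(D) = |POS_B(D)| / |U|. In this definition, an equivalence class of objects that are indiscernible on the chosen variables belongs to the positive region exactly when the conditional probability of a single system state within it is 1. The score γ(S) − penalty·|S| penalizes the number of selected variables, as an L0 norm would. This is a sketch assuming already-discretized variables and exhaustive search over small variable sets; the paper's actual optimization procedure and penalty weight are not given in the abstract.

```python
from itertools import combinations
from typing import Dict, Hashable, List, Sequence, Tuple


def dependency_degree(rows: Sequence[Sequence[Hashable]], labels: Sequence[Hashable],
                      attrs: Sequence[int]) -> float:
    """Rough-set degree of dependency gamma_B(D) = |POS_B(D)| / |U|."""
    blocks: Dict[Tuple[Hashable, ...], List[int]] = {}
    for i, row in enumerate(rows):
        key = tuple(row[a] for a in attrs)  # equivalence class under the subset B
        blocks.setdefault(key, []).append(i)
    # A block is in the positive region when all its objects share one class.
    positive = sum(len(idx) for idx in blocks.values()
                   if len({labels[i] for i in idx}) == 1)
    return positive / len(rows)


def select_subset(rows, labels, penalty: float) -> Tuple[Tuple[int, ...], float]:
    """Exhaustively maximize gamma(S) - penalty * |S| (an L0-style penalty)."""
    n_attrs = len(rows[0])
    best: Tuple[int, ...] = ()
    best_score = float("-inf")
    for size in range(n_attrs + 1):
        for subset in combinations(range(n_attrs), size):
            score = dependency_degree(rows, labels, subset) - penalty * len(subset)
            if score > best_score:
                best, best_score = subset, score
    return best, best_score


# Discretized toy data: variable 0 fully determines the class, variable 1 is noise.
rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = ["normal", "normal", "abnormal", "abnormal"]
print(select_subset(rows, labels, penalty=0.1))  # ((0,), 0.9)
```

Both {0} and {0, 1} reach full dependency (γ = 1), but the penalty prefers the smaller subset. Exhaustive search is exponential in the number of variables, so larger problems would need a heuristic search.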
8.
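Rough one-class support vector machine (ROC-SVM) handles over-fitting by constructing rough upper and lower hyperplanes on the basis of rough set theory, but in searching for the optimal separating hyperplane it ignores a very important piece of prior knowledge: the within-class structure of the training samples. Therefore, a rough one-class SVM based on within-class scatter (WSROC-SVM) is proposed. The method optimizes the within-class structure of the training samples by minimizing their within-class scatter. On the one hand, it makes the margin between the training samples and the coordinate origin in the high-dimensional feature space as large as possible; on the other hand, it makes the training samples as compact as possible with respect to the rough upper hyperplane. Experimental results on synthetic and UCI datasets show that, compared with the original algorithm, the method achieves a higher recognition rate and better generalization performance, and is better suited to practical classification problems.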
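Within-class scatter measured in the kernel-induced feature space can be computed without forming the mapping φ explicitly. Expanding Σᵢ‖φ(xᵢ) − μ‖² gives trace(S_w) = Σᵢ k(xᵢ, xᵢ) − (1/n) Σᵢⱼ k(xᵢ, xⱼ). The sketch below computes only this quantity, to show what "compact" means in feature space; it is not the WSROC-SVM optimization problem, whose exact formulation the abstract does not give. The kernel choices and σ = 1 are illustrative assumptions.

```python
import math
from typing import Callable, Sequence

Kernel = Callable[[Sequence[float], Sequence[float]], float]


def linear(x: Sequence[float], z: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(x, z))


def rbf(x: Sequence[float], z: Sequence[float], sigma: float = 1.0) -> float:
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, z)) / (2.0 * sigma ** 2))


def feature_space_scatter(samples: Sequence[Sequence[float]], kernel: Kernel) -> float:
    """trace(S_w) = sum_i k(x_i, x_i) - (1/n) * sum_{i,j} k(x_i, x_j).

    Equals sum_i ||phi(x_i) - mu||^2, the spread of the mapped samples around
    their feature-space mean, computed only through kernel evaluations.
    """
    n = len(samples)
    diag = sum(kernel(x, x) for x in samples)
    total = sum(kernel(x, z) for x in samples for z in samples)
    return diag - total / n


# Linear kernel sanity check: points 0 and 2 have mean 1, so scatter = 1 + 1.
print(feature_space_scatter([(0.0,), (2.0,)], linear))  # 2.0
tight = feature_space_scatter([(0.0, 0.0), (0.1, 0.0)], rbf)
loose = feature_space_scatter([(0.0, 0.0), (3.0, 0.0)], rbf)
print(tight < loose)  # True
```

With the linear kernel the formula reduces to the ordinary sum of squared deviations. With the RBF kernel it measures spread in the high-dimensional space where the one-class SVM actually separates the samples from the origin.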
9.
Soft computing techniques have been widely used during the last two decades for nonlinear system modeling, particularly as predictive tools. In this study, the performance of two well-known soft computing predictive techniques, artificial neural networks (ANN) and genetic programming (GP), is evaluated against several criteria, including over-fitting potential. A case study in punching shear prediction of RC slabs is modeled using a hybrid ANN (which combines simulated annealing and a multi-layer perceptron) and an established GP variant called gene expression programming. The ANN and GP results are compared with values determined from several design codes. For further verification, external validation and parametric studies were also conducted. The results of this study indicate that model acceptance criteria should include engineering analysis from parametric studies.
10.
This paper proposes a multi-level approach to data clustering and provides a novel approach to the characterisation of clay soils by, effectively, looking at the same clay sample from different angles. It is shown that this approach can help avoid detecting spurious clusters or missing vital natural groupings in the data. Muscovite, illite and kaolinite were identified by X-ray diffraction (XRD) in the <4 μm fraction of soil samples obtained from the periphery of an abandoned manganese oxide mine, and were semi-quantified as major, minor and trace. Based on information inherent in the data attributes, useful rules for grouping the samples were generated and, with the aid of multiple data clustering, applied to characterise the clay mineral occurrences in the soils. The paper found that the presence of large quantities of illite and kaolinite heavily influences the formation of clusters. When the most influential variables, LJ and KJ, were taken out, the resulting model showed that muscovite traces play a vital role in initial cluster building, and the importance matrix of inputs suggested inter-dependence among muscovite, kaolinite and illite traces, as well as between them and minor quantities of illite. Drawing on aspects of clay mineralogy and the modelling sciences, the paper marks a significant departure from conventional approaches to clay characterisation by showing how effectively data mining methods can be adopted in the area. For a successful approach to the characterisation of clay minerals in African soils, the paper recommends setting up data repositories that will provide scientific data sources and forums in a multi-disciplinary environment. This is particularly important because capturing interesting patterns requires expert knowledge to describe the emerging natural groupings.