

Class-modelling techniques that optimize the probabilities of false noncompliance and false compliance
Authors: Sagrario Sánchez, Luis A. Sarabia
Affiliations: a Department of Chemistry, Faculty of Sciences, University of Burgos, Pza. Misael Bañuelos s/n, 09001 Burgos, Spain
b Department of Mathematics and Computation, Faculty of Sciences, University of Burgos, Pza. Misael Bañuelos s/n, 09001 Burgos, Spain
Abstract: This work presents two approaches for constructing empirical class-models for a given category C. Attention is centred on the information provided by sensitivity and specificity, the two parameters usually employed to qualify a class-model. Rather than a single class-model for C, a set of class-models that differ in their sensitivity and specificity is built. The range of jointly attainable values is therefore described, allowing the user to select the model that best suits a specific situation or particular need.

One of the approaches, PLS-CM (Partial Least Squares Class-Modelling), is based on modelling the distribution of the values computed by a PLS model fitted with a binary response (belongs / does not belong to C). The corresponding hypothesis test then permits the computation of the probabilities α and β of type I and type II errors when deciding whether a sample belongs to C. Expressed as percentages, these probabilities are 100 minus sensitivity and 100 minus specificity, respectively. The plot of β versus α is the risk curve, which describes the capability of PLS-CM to model category C.

The other approach poses the problem as a multi-objective optimization problem: simultaneously maximizing sensitivity and specificity, which usually behave in opposite ways.
The trade-off solutions (again, different class-models) are computed as Pareto-optimal solutions, that is, solutions for which neither objective can be improved without worsening the other; their set is known as the Pareto-optimal front (POF).

Additionally, a procedure to cross-validate the risk curve and the Pareto-optimal front is proposed for the first time, in order to evaluate the prediction ability of both methods.

Two case studies drive the discussion: (1) the characterization of wines that official wine-tasters regarded as compliant with the quality characteristics stated by a Denomination of Origin, and (2) the characterization of breast tumours defined as benign (the compliant class) from 9 cytological variables.

Finally, the performance of the methods is tested on several data sets from the literature.
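The following minimal Python sketch (standard library only) makes the risk curve and the Pareto-optimal front concrete. It is not the authors' implementation. It assumes that the PLS predictions for samples of C (coded 1) and for the remaining samples (coded 0) are each normally distributed, with means and standard deviations estimated beforehand. The abstract does not state which distribution PLS-CM fits. The helper names `norm_cdf`, `risk_curve` and `pareto_front` are hypothetical. A sample is accepted into C when its prediction ŷ is at least a threshold t, so α(t) = P(ŷ < t | C) and β(t) = P(ŷ ≥ t | not C). Sweeping t traces the risk curve. The last function keeps only the non-dominated (sensitivity, specificity) pairs, which is what "Pareto-optimal" means. It is a brute-force filter, not the genetic-algorithm search that the keywords suggest the paper uses.

```python
from math import erf, sqrt


def norm_cdf(x, mu, sigma):
    """Normal cumulative distribution function, computed via math.erf."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))


def risk_curve(mu_in, sd_in, mu_out, sd_out, thresholds):
    """Return (t, alpha%, beta%) for each threshold t.

    ASSUMPTION: PLS predictions are normal with (mu_in, sd_in) for samples of C
    and (mu_out, sd_out) for the rest; a sample is accepted into C if y >= t.
    alpha = P(y < t | C)       -> false noncompliance, 100 - sensitivity
    beta  = P(y >= t | not C)  -> false compliance,    100 - specificity
    """
    curve = []
    for t in thresholds:
        alpha = 100.0 * norm_cdf(t, mu_in, sd_in)
        beta = 100.0 * (1.0 - norm_cdf(t, mu_out, sd_out))
        curve.append((t, alpha, beta))
    return curve


def pareto_front(models):
    """Keep the (name, sensitivity, specificity) triples no other model dominates.

    A model is dominated if another one is at least as good in both objectives
    and strictly better in at least one. Brute force, O(n^2); illustrative only.
    """
    front = []
    for name, sens, spec in models:
        dominated = any(
            s2 >= sens and p2 >= spec and (s2 > sens or p2 > spec)
            for _, s2, p2 in models
        )
        if not dominated:
            front.append((name, sens, spec))
    return front


if __name__ == "__main__":
    # Hypothetical values: class C coded 1, other samples coded 0.
    curve = risk_curve(1.0, 0.2, 0.0, 0.2, [0.3, 0.5, 0.7])
    for t, alpha, beta in curve:
        print(f"t={t:.1f}  alpha={alpha:.3f}%  beta={beta:.3f}%")

    # Hypothetical candidate class-models: (name, sensitivity %, specificity %).
    candidates = [("A", 90, 80), ("B", 85, 85), ("C", 80, 80), ("D", 95, 70)]
    print(pareto_front(candidates))  # "C" is dominated by both "A" and "B"
```

With these made-up numbers, raising t lowers β (fewer false compliances) at the cost of a larger α (more false noncompliances). Every point of such a monotone curve is non-dominated, which is why the risk curve and the POF both describe a set of trade-off class-models rather than a single one.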
Keywords: Class-modelling; Partial least squares; Pareto-optimal front; Colour wines; Genetic algorithm; Neural network; Sensitivity; Specificity
This article is indexed in ScienceDirect and other databases.