Handling over-fitting in test cost-sensitive decision tree learning by feature selection, smoothing and pruning
Authors: Tao Wang
Affiliations:
a. Faculty of Engineering and Information Technology, University of Technology Sydney, P.O. Box 123, Broadway, Sydney, NSW 2007, Australia
b. Key Lab of High Confidence Software Technologies, School of Electronics Engineering and Computer Science, Peking University, Beijing, China
c. Department of Computer Science, Zhejiang Normal University, Jinhua, China
Abstract: Cost-sensitive learning algorithms are typically designed to minimize the total cost when multiple costs are taken into account. Like other learning algorithms, cost-sensitive learning algorithms face a significant challenge: over-fitting. Specifically, they can produce good results on training data yet fail to yield an optimal model when applied to unseen data in real-world applications; this phenomenon is called data over-fitting. This paper addresses data over-fitting with three simple and efficient strategies, feature selection, smoothing and threshold pruning, built on the TCSDT (test cost-sensitive decision tree) method. Feature selection is used to pre-process the data set before the TCSDT algorithm is applied; smoothing and threshold pruning are applied within the TCSDT algorithm before the class probability estimate is computed at each decision tree leaf. To evaluate these approaches, we conduct extensive experiments on selected UCI data sets across different cost ratios, and on a real-world data set, KDD-98, with real misclassification costs. The experimental results show that our algorithms outperform both the original TCSDT and other competing algorithms in reducing data over-fitting.
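The abstract does not give the exact smoothing formula the paper uses. A common realization of leaf-probability smoothing in cost-sensitive decision trees is the Laplace correction, and a cost-sensitive leaf then predicts the positive class when the smoothed probability exceeds the threshold FP/(FP+FN) that minimizes expected misclassification cost. The Python sketch below illustrates this idea under those assumptions; the function names (laplace_probability, predict_positive) are illustrative, not taken from the paper.

    # Illustrative sketch (not the paper's code): Laplace-smoothed class
    # probability at a decision tree leaf, plus a cost-sensitive decision rule.

    def laplace_probability(n_positive: int, n_total: int, n_classes: int = 2) -> float:
        """Laplace-corrected estimate P(+ | leaf) = (n_+ + 1) / (n + k).

        Raw frequencies n_+/n at small leaves are extreme (often 0 or 1)
        and over-fit the training data; the +1/+k correction pulls the
        estimate toward the uniform prior 1/k.
        """
        return (n_positive + 1) / (n_total + n_classes)

    def predict_positive(n_positive: int, n_total: int,
                         fp_cost: float, fn_cost: float) -> bool:
        """Predict positive iff the smoothed probability exceeds the
        cost-sensitive threshold FP / (FP + FN)."""
        p = laplace_probability(n_positive, n_total)
        return p >= fp_cost / (fp_cost + fn_cost)

    if __name__ == "__main__":
        # A pure leaf with only 3 training examples: the raw estimate is 1.0,
        # while the smoothed estimate is a more cautious (3+1)/(3+2) = 0.8.
        print(laplace_probability(3, 3))                          # 0.8
        print(predict_positive(3, 3, fp_cost=1.0, fn_cost=5.0))   # True

With a false-negative cost five times the false-positive cost, the decision threshold drops to 1/6, so even moderately confident leaves predict the expensive-to-miss class; smoothing keeps such thresholded decisions from being driven by a handful of training examples.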
Keywords: Classification; Cost-sensitive learning; Over-fitting
Indexed by ScienceDirect and other databases.