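Research on a Binary Decision Tree Generation Algorithm with High Intelligibility
Cite this article: JIANG Yan-Huang, YANG Xue-Jun, ZHAO Qiang-Li. Research on a binary decision tree generation algorithm with high intelligibility [J]. Journal of Software, 2003, 14(12): 1996-2005.
Authors: JIANG Yan-Huang, YANG Xue-Jun, ZHAO Qiang-Li
Affiliations: 1. School of Computer Science, National University of Defense Technology, Changsha, Hunan 410073
2. Tsinghua Deshi Technology Co., Ltd., Beijing 100085
Funding: Supported by the National Natural Science Foundation of China under Grant No. 69825104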
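Abstract: Binary discretization is the most common way to handle continuous attributes when generating decision trees, but for problems with many continuous attributes it produces large trees whose knowledge representation is hard to understand. For two-class classification problems, this paper proposes RCAT, a multi-interval discretization method based on attribute transformation. RCAT first transforms a continuous attribute into a probability attribute for one of the classes; a binary split of this probability attribute corresponds to a multi-interval partition of the original continuous attribute. It then optimizes the boundaries of these intervals to obtain the information gain of the original continuous attribute. Finally, it simplifies the RCAT decision tree with pessimistic pruning and lossless merge pruning. Experiments on datasets from several domains show that, compared with binary discretization, RCAT runs efficiently and generates decision trees that retain classification accuracy while being smaller and easier to understand.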
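The central idea in the abstract, that a single binary cut on a class-probability attribute corresponds to a multi-interval partition of the original continuous attribute, can be illustrated with a minimal Python sketch. It assumes a simple estimator of the probability attribute (the positive-class frequency within rank-based bins); the helper `probability_attribute` and its `n_bins` parameter are hypothetical and not taken from the paper, and the sketch omits RCAT's interval-edge optimization step.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def probability_attribute(xs, ys, positive, n_bins=4):
    """Hypothetical transform: replace each continuous value by the estimated
    P(class == positive) within its bin. Bins are built from the ranks of the
    distinct values (an assumed scheme; the paper's estimator may differ)."""
    distinct = sorted(set(xs))
    rank = {v: i for i, v in enumerate(distinct)}
    bin_of = [rank[x] * n_bins // len(distinct) for x in xs]
    counts = {}  # bin index -> (positive count, total count)
    for b, y in zip(bin_of, ys):
        pos, tot = counts.get(b, (0, 0))
        counts[b] = (pos + int(y == positive), tot + 1)
    return [counts[b][0] / counts[b][1] for b in bin_of]


def best_binary_split(values, ys):
    """Binarization: best threshold t (value <= t vs. value > t) by information gain."""
    base, best = entropy(ys), (0.0, None)
    cands = sorted(set(values))
    for lo, hi in zip(cands, cands[1:]):
        t = (lo + hi) / 2  # midpoint between adjacent distinct values
        left = [y for v, y in zip(values, ys) if v <= t]
        right = [y for v, y in zip(values, ys) if v > t]
        gain = base - (len(left) * entropy(left) + len(right) * entropy(right)) / len(ys)
        if gain > best[0]:
            best = (gain, t)
    return best


if __name__ == "__main__":
    # Toy data: the positive class "p" occupies the middle of the range.
    xs = [1, 2, 3, 4, 5, 6, 7, 8]
    ys = ["n", "n", "p", "p", "p", "p", "n", "n"]
    print(best_binary_split(xs, ys))  # best single cut directly on x
    p = probability_attribute(xs, ys, "p")
    gain, t = best_binary_split(p, ys)
    print(gain, [x for x, v in zip(xs, p) if v > t])  # 1.0 [3, 4, 5, 6]
```

On this toy data no single threshold on x can isolate the middle block, so direct binarization reaches a gain of only about 0.31, while one cut on the probability attribute separates x in 3..6 from the two outer intervals with a gain of 1.0. This is the kind of multi-interval split the abstract describes, obtained with plain binarization machinery.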

Keywords: machine learning; binary decision tree; information gain; pruning; RCAT algorithm
Received: 2002-10-29
Revised: 2002-12-31

Constructing Binary Classification Trees with High Intelligibility
Cite this article: JIANG Yan-Huang, YANG Xue-Jun, ZHAO Qiang-Li. Constructing Binary Classification Trees with High Intelligibility [J]. Journal of Software, 2003, 14(12): 1996-2005.
Authors: JIANG Yan-Huang, YANG Xue-Jun, ZHAO Qiang-Li
Abstract: Binarization is the most popular discretization method in decision tree generation, but in domains with many continuous attributes it tends to produce a large, incomprehensible tree that is hard to express as knowledge. To obtain more intelligible decision trees, this paper presents RCAT, a new discretization algorithm for continuous attributes in binary classification tree generation. RCAT maps each continuous attribute to a probability attribute derived from class statistics, so that simple binarization of the new attribute solves the multi-splitting problem for the original one. Two pruning methods are then used to simplify the constructed tree. Empirical results on several domains show that, for two-class problems with a preponderance of continuous attributes, RCAT efficiently generates much smaller and more intelligible decision trees than binarization while retaining predictive accuracy.
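The first abstract names the two pruning methods as pessimistic pruning and lossless merge pruning. The Python sketch below shows one plausible reading of each; it is not the paper's implementation. Pessimistic pruning follows Quinlan's classic formulation (training errors plus a 0.5 correction per leaf, with the subtree replaced by a leaf when the corrected leaf error is within one standard error of the corrected subtree error). "Lossless merge" is assumed here to mean collapsing a node whose children are all leaves predicting that node's own class, so no prediction changes. The `Node` structure, the helper names, and the example tree are hypothetical.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tree node; `counts` holds the training class counts reaching it."""
    counts: Counter
    children: list = field(default_factory=list)

    @property
    def label(self):
        # Majority class at this node (assumes counts is non-empty).
        return self.counts.most_common(1)[0][0]

    def is_leaf(self):
        return not self.children


def leaves(node):
    if node.is_leaf():
        return [node]
    return [lf for child in node.children for lf in leaves(child)]


def errors_as_leaf(node):
    # Training errors if this node predicted its majority class.
    return sum(node.counts.values()) - node.counts[node.label]


def pessimistic_prune(node):
    """Top-down pessimistic error pruning with the 0.5-per-leaf correction."""
    if node.is_leaf():
        return node
    n = sum(node.counts.values())
    lvs = leaves(node)
    e_tree = sum(errors_as_leaf(lf) for lf in lvs) + 0.5 * len(lvs)
    se = math.sqrt(max(e_tree * (n - e_tree), 0.0) / n)  # guard against e_tree > n
    if errors_as_leaf(node) + 0.5 <= e_tree + se:
        node.children = []  # replace the whole subtree by a leaf
        return node
    node.children = [pessimistic_prune(c) for c in node.children]
    return node


def lossless_merge(node):
    """Bottom-up: collapse a node whose leaf children all predict its own label."""
    if node.is_leaf():
        return node
    node.children = [lossless_merge(c) for c in node.children]
    if all(c.is_leaf() and c.label == node.label for c in node.children):
        node.children = []  # predictions are unchanged by this merge
    return node


def make_tree():
    # Hypothetical two-class example tree (classes "p" and "n").
    b = Node(Counter(p=1, n=9), [Node(Counter(p=1, n=5)), Node(Counter(p=0, n=4))])
    a = Node(Counter(p=9, n=1))
    return Node(Counter(p=10, n=10), [a, b])


if __name__ == "__main__":
    print(len(leaves(pessimistic_prune(make_tree()))))  # 2
    print(len(leaves(lossless_merge(make_tree()))))     # 2
```

In this example both methods remove the split under the "n" branch: pessimistic pruning because its corrected error as a leaf (1.5) is well within the subtree's bound, and lossless merging because both of its leaves already predict "n". The root split survives both, since its two children predict different classes.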
Keywords: machine learning; binary classification tree; information gain; pruning; range-splitting based on continuous attributes transform (RCAT) algorithm
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.