
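Imp-Chi2 Algorithm for Discretization of Continuous Attributes
Cite this article: SANG Yu, YAN De-qin, LIU Lei, LIANG Hong-xia. Imp-Chi2 Algorithm for Discretization of Continuous Attributes[J]. Computer Engineering (计算机工程), 2008, 34(17): 39-41
Authors: SANG Yu  YAN De-qin  LIU Lei  LIANG Hong-xia
Affiliation: College of Computer and Information Technology, Liaoning Normal University, Dalian 116029 (all four authors)
Funding: National Natural Science Foundation of China; Liaoning Provincial Department of Education funded project; Liaoning Normal University research and teaching reform project
Abstract: Discretization of continuous attributes is an important problem in machine learning and data mining; whether a discretization is reasonable determines how accurately the relevant information can be expressed and extracted. Based on a study of the Chi2 family of algorithms, this paper proposes Imp-Chi2, a new discretization method for continuous attributes based on attribute significance. The algorithm adjusts the order in which attributes are discretized according to their degree of significance, which allows continuous attributes to be discretized more accurately. The discretized data are evaluated experimentally with C4.5 and support vector machines. For these experiments, a class-proportional training set selection method is proposed, which avoids the uneven class distribution that random selection of the training set can produce. The experimental results demonstrate the effectiveness of the proposed algorithm.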

Keywords: discretization of continuous attributes  Chi2 algorithm  attribute significance  class-proportional training set selection
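
The Chi2 family of algorithms named in the abstract builds on ChiMerge, which repeatedly merges the pair of adjacent intervals with the lowest chi-square statistic. The abstract does not give the details of Imp-Chi2 or of its attribute significance measure, so the following is only a minimal sketch of that shared building block, not the authors' method. The function name `chi2_adjacent` and the list-of-class-counts representation are assumptions made for illustration.

```python
def chi2_adjacent(interval_a, interval_b):
    """Chi-square statistic for two adjacent intervals.

    Each argument is a list of class counts in the same class order,
    e.g. [3, 1] means 3 samples of class 0 and 1 sample of class 1.
    """
    rows = [interval_a, interval_b]
    n_classes = len(interval_a)
    row_totals = [sum(row) for row in rows]
    col_totals = [interval_a[j] + interval_b[j] for j in range(n_classes)]
    total = sum(row_totals)

    stat = 0.0
    for i in range(2):
        for j in range(n_classes):
            expected = row_totals[i] * col_totals[j] / total
            if expected == 0:
                # Simplification (assumption): skip empty cells instead of
                # substituting a small constant for the expected count.
                continue
            stat += (rows[i][j] - expected) ** 2 / expected
    return stat


# Intervals with identical class distributions give a statistic of 0,
# which makes them the first candidates for merging.
print(chi2_adjacent([3, 3], [3, 3]))
# Intervals with disjoint classes give a large statistic and stay separate.
print(chi2_adjacent([5, 0], [0, 5]))
```

A low statistic means the class distribution is similar on both sides of the cut point, so removing the cut point loses little class information.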

Imp-Chi2 Algorithm for Discretization of Real Value Attributes
SANG Yu,YAN De-qin,LIU Lei,LIANG Hong-xia. Imp-Chi2 Algorithm for Discretization of Real Value Attributes[J]. Computer Engineering, 2008, 34(17): 39-41
Authors:SANG Yu  YAN De-qin  LIU Lei  LIANG Hong-xia
Affiliation:(College of Computer and Information Technology, Liaoning Normal University, Dalian 116029)
Abstract:Discretization is an effective technique for handling continuous attributes in machine learning and data mining. Whether a discretization is reasonable determines how accurately the relevant information can be expressed and extracted. By analyzing the Chi2 family of algorithms, a new algorithm called Imp-Chi2, which is based on attribute significance, is proposed. The algorithm adjusts the order in which attributes are discretized according to their level of significance, and thereby discretizes real value attributes more accurately. Experiments are performed on the discretized data using C4.5 and SVM. For these experiments, a method for selecting the training set according to class proportion is presented, which overcomes the uneven class distribution that random selection of the training set can produce. Experimental results show that the presented algorithm is effective.
Keywords:discretization of real value attributes  Chi2 algorithm  attribute significance  selection of training set according to class proportion
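
Selecting a training set according to class proportion is commonly known as stratified sampling: each class contributes the same fraction of its samples to the training set. The abstract does not describe the authors' exact procedure, so the sketch below only illustrates the general idea. The function name `class_proportional_split`, its parameters, and the rounding rule are assumptions.

```python
import random
from collections import defaultdict


def class_proportional_split(labels, train_fraction, seed=0):
    """Split sample indices into train and test sets, class by class.

    Every class contributes roughly `train_fraction` of its samples to the
    training set, so the class ratios of the full data set are preserved.
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, test = [], []
    for indices in by_class.values():
        shuffled = indices[:]
        rng.shuffle(shuffled)
        # Number of training samples taken from this class (rounded).
        k = round(len(shuffled) * train_fraction)
        train.extend(shuffled[:k])
        test.extend(shuffled[k:])
    return sorted(train), sorted(test)


labels = ["a"] * 10 + ["b"] * 20
train_idx, test_idx = class_proportional_split(labels, train_fraction=0.5)
# Class "a" makes up one third of the training set, as in the full data.
print(sum(1 for i in train_idx if labels[i] == "a"), len(train_idx))
```

With purely random selection, a small class can end up over- or under-represented in the training set by chance; sampling within each class removes that source of variation.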
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.