Multivariate Discretization for Set Mining期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

Multivariate Discretization for Set Mining

Authors:	Stephen D Bay

Affiliation:	(1) Department of Information and Computer Science, University of California, Irvine, California, USA, US

Abstract:	Many algorithms in data mining can be formulated as a set-mining problem where the goal is to find conjunctions (or disjunctions) of terms that meet user-specified constraints. Set-mining techniques have been largely designed for categorical or discrete data where variables can only take on a fixed number of values. However, many datasets also contain continuous variables and a common method of dealing with these is to discretize them by breaking them into ranges. Most discretization methods are univariate and consider only a single feature at a time (sometimes in conjunction with a class variable). We argue that this is a suboptimal approach for knowledge discovery as univariate discretization can destroy hidden patterns in data. Discretization should consider the effects on all variables in the analysis and that two regions X and Y should only be in the same interval after discretization if the instances in those regions have similar multivariate distributions (F _x∼F _y) across all variables and combinations of variables. We present a bottom-up merging algorithm to discretize continuous variables based on this rule. Our experiments indicate that the approach is feasible, that it will not destroy hidden patterns and that it will generate meaningful intervals. Received 14 November 2000 / Revised 1 February 2001 / Accepted in revised form 1 May 2001

Keywords:	: Data mining Multivariate discretization Set mining
本文献已被 SpringerLink 等数据库收录！

设为首页 | 免责声明 | 关于勤云 | 加入收藏