Data Squashing by Empirical Likelihood

Authors:	Art Owen

Affiliation:	Department of Statistics, Stanford University, Sequoia Hall, Stanford, CA 94305, USA

Abstract:	Data squashing was introduced by W. DuMouchel, C. Volinsky, T. Johnson, C. Cortes, and D. Pregibon, in Proceedings of the 5th International Conference on KDD (1999). The idea is to scale data sets down to smaller representative samples instead of scaling up algorithms to very large data sets. They report success in learning model coefficients on squashed data. This paper presents a form of data squashing based on empirical likelihood. This method reweights a random sample of data to match certain expected values to the population. The computation required is a relatively easy convex optimization. There is also a theoretical basis to predict when it will and won't produce large gains. In a credit scoring example, empirical likelihood weighting also accelerates the rate at which coefficients are learned. We also investigate the extent to which these benefits translate into improved accuracy, and consider reweighting in conjunction with boosted decision trees.
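The reweighting step described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function name `el_weights`, the choice of matching only the feature means, and the damped Newton solver are all assumptions made here for illustration. The sketch computes empirical likelihood weights w_i that maximize sum_i log(w_i) subject to sum_i w_i = 1 and sum_i w_i z_i = mu, where mu holds the population expected values, by minimizing the convex dual in the Lagrange multiplier.

```python
import numpy as np

def el_weights(z, mu, iters=50, tol=1e-10):
    """Empirical likelihood weights for sample rows z (shape n x p) such that
    sum(w) = 1 and w @ z = mu. Illustrative sketch; assumes mu lies strictly
    inside the convex hull of the sample rows."""
    d = z - mu                          # centered features
    n, p = d.shape
    lam = np.zeros(p)                   # Lagrange multiplier for the moment constraints
    for _ in range(iters):
        t = 1.0 + d @ lam               # must stay positive for valid weights
        r = d / t[:, None]
        grad = -r.sum(axis=0)           # gradient of f(lam) = -sum(log(1 + d @ lam))
        if np.linalg.norm(grad) < tol:
            break
        hess = r.T @ r                  # Hessian of f, positive semidefinite
        step = np.linalg.solve(hess, -grad)
        f0 = -np.log(t).sum()
        s = 1.0
        for _ in range(60):             # halve the step until feasible and non-increasing
            t_new = 1.0 + d @ (lam + s * step)
            if t_new.min() > 0 and -np.log(t_new).sum() <= f0:
                break
            s *= 0.5
        lam = lam + s * step
    return 1.0 / (n * (1.0 + d @ lam))

# Hypothetical usage: a synthetic "population" and a small random sample of it.
rng = np.random.default_rng(0)
population = rng.normal(size=(100_000, 2))
sample = population[rng.choice(len(population), size=200, replace=False)]
w = el_weights(sample, population.mean(axis=0))
print("largest moment mismatch:", np.abs(w @ sample - population.mean(axis=0)).max())
```

The weights returned here could then be passed as case weights to a downstream fitting routine. Which expected values to match is a modeling choice that this sketch does not address.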

Keywords:	credit scoring; database abstraction; MART; misclassification loss; reweighting
This article is indexed in SpringerLink and other databases.