Sampling and Subsampling for Cluster Analysis in Data Mining: With Applications to Sky Survey Data
Authors: David M. Rocke, Jian Dai
Affiliation: (1) Center for Image Processing and Integrated Computing, University of California, Davis, CA 95616, USA
Abstract: This paper describes a clustering method for unsupervised classification of objects in large data sets. The new methodology combines the mixture likelihood approach with a sampling and subsampling strategy in order to cluster large data sets efficiently. This sampling strategy can be applied to a large variety of data mining methods to allow them to be used on very large data sets. The method is applied to the problem of automated star/galaxy classification for digital sky data and is tested using a sample from the Digitized Palomar Sky Survey (DPOSS) data. The method is quick and reliable and produces classifications comparable to previous work on these data using supervised clustering.
Keywords: clustering algorithm, mixture likelihood, sampling, star/galaxy classification
This article is indexed in SpringerLink and other databases.
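
The abstract describes the general idea of fitting a mixture model on a sample and then using it on the full data set. The following is a minimal sketch of that idea only, not the authors' algorithm: it assumes a one-dimensional, two-component Gaussian mixture fitted by a plain EM loop, and it uses synthetic data in place of DPOSS measurements. All function names, the sample size of 500, and the data-generating parameters are hypothetical choices made for illustration.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_two_component_mixture(sample, iterations=50):
    """Fit a 1-D two-component Gaussian mixture to `sample` with plain EM."""
    ordered = sorted(sample)
    half = len(ordered) // 2
    # Illustrative initialisation: means of the lower and upper halves.
    means = [sum(ordered[:half]) / half,
             sum(ordered[half:]) / (len(ordered) - half)]
    overall = sum(sample) / len(sample)
    var0 = sum((x - overall) ** 2 for x in sample) / len(sample)
    variances = [var0, var0]
    weights = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in sample:
            p = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in range(2)]
            total = (p[0] + p[1]) or 1e-300  # guard against underflow
            resp.append((p[0] / total, p[1] / total))
        # M-step: update weights, means and variances from the responsibilities.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(sample)
            means[k] = sum(r[k] * x for r, x in zip(resp, sample)) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, sample)) / nk
            variances[k] = max(var_k, 1e-6)
    return weights, means, variances


def classify(x, params):
    """Assign x to the component with the larger weighted density."""
    weights, means, variances = params
    scores = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in range(2)]
    return 0 if scores[0] >= scores[1] else 1


# Synthetic stand-in for a large data set: two well-separated groups.
rng = random.Random(0)
labels = [i % 2 for i in range(20000)]
data = [rng.gauss(10.0 * label, 1.0) for label in labels]

# Fit on a small random sample, then classify every point in the full set.
sample = rng.sample(data, 500)
params = fit_two_component_mixture(sample)
predicted = [classify(x, params) for x in data]
accuracy = sum(p == t for p, t in zip(predicted, labels)) / len(data)
```

In this sketch the EM iterations touch only the 500 sampled points, while the full data set is visited once for classification, which is where a sampling strategy saves work on large inputs.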