Mixture model selection via hierarchical BIC |
| |
Affiliation: | 1. Keck School of Medicine of USC, Marina del Rey, CA, United States; 2. Erasmus Medical Center, Rotterdam, The Netherlands; 3. Radboud University Medical Center, Nijmegen, The Netherlands; 4. King's College London, London, United Kingdom; 5. Queensland University of Technology (QUT), Brisbane, QLD, Australia; 6. University of Queensland, Brisbane, QLD, Australia; 7. QIMR Berghofer Medical Research Institute, Brisbane, QLD, Australia |
| |
Abstract: | The Bayesian information criterion (BIC) is one of the most popular criteria for model selection in finite mixture models. However, it penalizes the complexity of every component using the whole sample size, which is implausible because it ignores the clustered structure inherent in the data and therefore over-penalizes. To overcome this problem, a novel criterion called hierarchical BIC (HBIC) is proposed, which penalizes the complexity of each component using only its local sample size and so matches the clustered data structure. Theoretically, HBIC is a large-sample approximation of the variational Bayesian (VB) lower bound, whereas the widely used BIC is a less accurate approximation of the same bound. An empirical study is conducted to verify this theoretical result, and a series of experiments on simulated and real data sets compares HBIC with BIC. The results show that HBIC substantially outperforms BIC and that BIC tends to underestimate the number of components. |
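The difference between the two penalties can be illustrated with a minimal sketch. The abstract does not state the HBIC formula, so the form used here is an assumption reconstructed from its description: each component's parameters are penalized with the logarithm of its local sample size N·π_k, while the mixing weights are still penalized with log N. The function names `bic_score` and `hbic_score`, their arguments, and the numbers in the usage lines are hypothetical.

```python
import math


def bic_score(log_lik, n_samples, weights, dims_per_component):
    """Classical BIC in 'larger is better' form.

    Every free parameter is penalized with log(N), the whole sample size.
    """
    k = len(weights)
    # Component parameters plus the (k - 1) free mixing weights.
    n_params = sum(dims_per_component) + (k - 1)
    return log_lik - 0.5 * n_params * math.log(n_samples)


def hbic_score(log_lik, n_samples, weights, dims_per_component):
    """HBIC-style score (assumed form, not quoted from the paper).

    Mixing weights are penalized with log(N); the parameters of component k
    are penalized with log(N * pi_k), its local (expected) sample size.
    """
    k = len(weights)
    penalty = 0.5 * (k - 1) * math.log(n_samples)
    for pi_k, d_k in zip(weights, dims_per_component):
        penalty += 0.5 * d_k * math.log(n_samples * pi_k)
    return log_lik - penalty


# Hypothetical fit: two equally weighted components with 5 parameters each.
log_lik, n = -1234.0, 1000
weights, dims = [0.5, 0.5], [5, 5]
gap = hbic_score(log_lik, n, weights, dims) - bic_score(log_lik, n, weights, dims)
print(f"HBIC - BIC = {gap:.4f}")
```

Because N·π_k never exceeds N, the penalty in this sketch is never larger than the BIC penalty, and the gap widens as components become smaller. This is consistent with the abstract's claim that BIC over-penalizes.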
| |
Keywords: | Model selection; Mixture model; EM; Maximum likelihood estimation; BIC; Hierarchical BIC; Clustering |
This article is indexed in ScienceDirect and other databases. |
|