PREDICTIVE MODELING WITH MISSING DATA USING AN AUTOMATIC RELEVANCE DETERMINATION ENSEMBLE: A COMPARATIVE STUDY

Authors:	Mlungisi Duma, Bhekisipho Twala, Fulufhelo Nelwamondo, Tshilidzi Marwala

Affiliation:	1. Department of Electrical Engineering and the Built Environment, University of Johannesburg, Auckland Park, Johannesburg, South Africa (mlungisiduma@gmail.com); 3. Department of Electrical Engineering and the Built Environment, University of Johannesburg, Auckland Park, Johannesburg, South Africa; 4. Modelling and Digital Science, Council for Scientific and Industrial Research, Pretoria, South Africa

Abstract:	The objective of this article is to present an automatic relevance determination (ARD) ensemble as an effective variable extraction method for insurance datasets with large numbers of variables. ARD uses a Bayesian neural network and the evidence framework to rank variables in order of their relevance to the target variable. The current approach uses a single Bayesian neural network, whose training can only locate a local minimum or maximum. In large datasets with numerous variables this is a concern, because we cannot be certain that the outcome is optimal. To address this issue, this study uses an ARD ensemble whose members are Bayesian neural networks with different configurations (or structures). The outcome of each ensemble member is determined using a confidence factor rather than by scrutinizing the most probable weight values or hyperparameters directly. The extraction method is evaluated with repeated incremental pruning to produce error reduction (RIPPER), logistic discriminant analysis, and k-nearest neighbor models. Furthermore, the datasets contain increasing proportions of missing data, in order to measure the accuracy and resilience of the models when used with the proposed ensemble. The ensemble is compared with principal component analysis (PCA). The results show that the models achieve higher accuracy with the ARD ensemble than with PCA, and that the models are also more resilient and robust when using the ensemble than when using PCA.
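The abstract does not specify how the ensemble members are built or how the confidence factor is computed, so the following is only a minimal Python sketch under stated assumptions, not the authors' method. A Bayesian linear-regression model replaces the Bayesian neural network, with each input's prior precision α_i re-estimated by MacKay's evidence-framework updates (a large α_i marks an irrelevant input). Bootstrap resamples with different initial α values stand in for the different network configurations, and the assumed confidence factor is the fraction of members that rank a variable among their top_k most relevant inputs. The names invert, ard_linear, ard_ensemble and top_k are hypothetical and chosen for illustration.

```python
import random


def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ard_linear(X, t, iters=50, alpha0=1.0, beta0=1.0):
    """Evidence-framework ARD for Bayesian linear regression (assumed stand-in for a BNN).

    Each input i has its own prior precision alpha[i]; a large alpha[i] shrinks
    the corresponding weight towards zero, marking input i as irrelevant.
    """
    n, d = len(X), len(X[0])
    alpha, beta = [alpha0] * d, beta0
    XtX = [[sum(x[i] * x[j] for x in X) for j in range(d)] for i in range(d)]
    Xtt = [sum(x[i] * tk for x, tk in zip(X, t)) for i in range(d)]
    for _ in range(iters):
        # Posterior covariance S = (beta X^T X + diag(alpha))^-1 and mean m = beta S X^T t.
        A = [[beta * XtX[i][j] + (alpha[i] if i == j else 0.0) for j in range(d)] for i in range(d)]
        S = invert(A)
        m = [beta * sum(S[i][j] * Xtt[j] for j in range(d)) for i in range(d)]
        # gamma[i] = 1 - alpha[i] * S[i][i]: how well the data determine weight i.
        gamma = [max(1.0 - alpha[i] * S[i][i], 1e-10) for i in range(d)]
        # MacKay re-estimation; alpha is capped so irrelevant inputs stay finite.
        alpha = [min(g / (mi * mi + 1e-12), 1e8) for g, mi in zip(gamma, m)]
        resid = sum((tk - sum(xi * mi for xi, mi in zip(x, m))) ** 2 for x, tk in zip(X, t))
        beta = max(n - sum(gamma), 1e-6) / (resid + 1e-12)
    return alpha


def ard_ensemble(X, t, members=10, top_k=2, seed=0):
    """Hypothetical ARD ensemble returning an assumed confidence factor per variable.

    Each member refits ARD on a bootstrap resample with a different initial alpha;
    a variable's confidence is the fraction of members ranking it in their top_k
    (smallest alpha = most relevant).
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    votes = [0] * d
    for b in range(members):
        idx = [rng.randrange(n) for _ in range(n)]
        alpha = ard_linear([X[k] for k in idx], [t[k] for k in idx], alpha0=10.0 ** (b % 3 - 1))
        for i in sorted(range(d), key=alpha.__getitem__)[:top_k]:
            votes[i] += 1
    return [v / members for v in votes]


# Usage: 4 candidate inputs, of which only inputs 0 and 2 drive the target.
rng = random.Random(1)
X = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(200)]
t = [2.0 * x[0] - 1.5 * x[2] + rng.gauss(0.0, 0.1) for x in X]
conf = ard_ensemble(X, t)
print(conf)
```

Because each member only votes for its top_k inputs, this assumed confidence factor does not depend on the absolute scale of the α values, which can differ between members started from different initial hyperparameters.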
