首页 | 官方网站   微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 46 毫秒
1.
Credit scoring focuses on the development of empirical models to support the financial decision‐making processes of financial institutions and credit industries. It makes use of applicants' historical data and statistical or machine learning techniques to assess the risk associated with an applicant. However, the historical data may consist of redundant and noisy features that affect the performance of credit scoring models. The main focus of this paper is to develop a hybrid model, combining feature selection and a multilayer ensemble classifier framework, to improve the predictive performance of credit scoring. The proposed hybrid credit scoring model is modeled in three phases. The initial phase constitutes preprocessing and assigns ranks and weights to classifiers. In the next phase, the ensemble feature selection approach is applied to the preprocessed dataset. Finally, in the last phase, the dataset with the selected features is used in a multilayer ensemble classifier framework. In addition, a classifier placement algorithm based on the Choquet integral value is designed, as the classifier placement affects the predictive performance of the ensemble framework. The proposed hybrid credit scoring model is validated on real‐world credit scoring datasets, namely, Australian, Japanese, German‐categorical, and German‐numerical datasets.  相似文献   

2.
Credit scoring aims to assess the risk associated with lending to individual consumers. Recently, ensemble classification methodology has become popular in this field. However, most researches utilize random sampling to generate training subsets for constructing the base classifiers. Therefore, their diversity is not guaranteed, which may lead to a degradation of overall classification performance. In this paper, we propose an ensemble classification approach based on supervised clustering for credit scoring. In the proposed approach, supervised clustering is employed to partition the data samples of each class into a number of clusters. Clusters from different classes are then pairwise combined to form a number of training subsets. In each training subset, a specific base classifier is constructed. For a sample whose class label needs to be predicted, the outputs of these base classifiers are combined by weighted voting. The weight associated with a base classifier is determined by its classification performance in the neighborhood of the sample. In the experimental study, two benchmark credit data sets are adopted for performance evaluation, and an industrial case study is conducted. The results show that compared to other ensemble classification methods, the proposed approach is able to generate base classifiers with higher diversity and local accuracy, and improve the accuracy of credit scoring.  相似文献   

3.
The One-vs-One strategy is among the most used techniques to deal with multi-class problems in Machine Learning. This way, any binary classifier can be used to address the original problem, since one classifier is learned for each possible pair of classes. As in every ensemble method, classifier combination becomes a vital step in the classification process. Even though many combination models have been developed in the literature, none of them have dealt with the possibility of reducing the number of generated classifiers after the training phase, i.e., ensemble pruning, since every classifier is supposed to be necessary.On this account, our objective in this paper is two-fold: (1) We propose a transformation of the aggregation step, which lead us to a new combination strategy where instances are classified on the basis of the similarities among score-matrices. (2) This fact allows us to introduce the possibility of reducing the number of binary classifiers without affecting the final accuracy. We will show that around 50% of classifiers can be removed (depending on the base learner and the specific problem) and that the confidence degrees obtained by these base classifiers have a strong influence on the improvement in the final accuracy.A thorough experimental study is carried out in order to show the behavior of the proposed approach in comparison with the state-of-the-art combination models in the One-vs-One strategy. Different classifiers from various Machine Learning paradigms are considered as base classifiers and the results obtained are contrasted with the proper statistical analysis.  相似文献   

4.
Many techniques have been proposed for credit risk assessment, from statistical models to artificial intelligence methods. During the last few years, different approaches to classifier ensembles have successfully been applied to credit scoring problems, demonstrating to be more accurate than single prediction models. However, it is still a question what base classifiers should be employed in each ensemble in order to achieve the highest performance. Accordingly, the present paper evaluates the performance of seven individual prediction techniques when used as members of five different ensemble methods. The ultimate aim of this study is to suggest appropriate classifiers for each ensemble approach in the context of credit scoring. The experimental results and statistical tests show that the C4.5 decision tree constitutes the best solution for most ensemble methods, closely followed by the multilayer perceptron neural network and logistic regression, whereas the nearest neighbour and the naive Bayes classifiers appear to be significantly the worst.  相似文献   

5.
基于粗集理论的选择性支持向量机集成   总被引:1,自引:0,他引:1       下载免费PDF全文
集成分类器的性能很大程度决定于各成员分类器的构造和对各成员分类器的组合方法。提出一种基于粗集理论的选择性支持向量机集成算法,该算法首先利用粗集技术产生一个属性约简集合,然后以各约简集为样本属性空间构造各成员分类器,其次通过对各成员分类器精度与差异度的计算,选择既满足个体的精度要求,又满足个体差异性要求的成员分类器进行集成。最后通过对UCI上一组实验数据的测试,证实该方法能够有效提高支持向量机的推广性能。  相似文献   

6.
Many techniques have been proposed for credit risk assessment, from statistical models to artificial intelligence methods. During the last few years, different approaches to classifier ensembles have successfully been applied to credit scoring problems, demonstrating to be generally more accurate than single prediction models. The present paper goes one step beyond by introducing composite ensembles that jointly use different strategies for diversity induction. Accordingly, the combination of data resampling algorithms (bagging and AdaBoost) and attribute subset selection methods (random subspace and rotation forest) for the construction of composite ensembles is explored with the aim of improving the prediction performance. The experimental results and statistical tests show that this new two-level classifier ensemble constitutes an appropriate solution for credit scoring problems, performing better than the traditional single ensembles and very significantly better than individual classifiers.  相似文献   

7.
针对原有集成学习多样性不足而导致的集成效果不够显著的问题,提出一种基于概率校准的集成学习方法以及两种降低多重共线性影响的方法。首先,通过使用不同的概率校准方法对原始分类器给出的概率进行校准;然后使用前一步生成的若干校准后的概率进行学习,从而预测最终结果。第一步中使用的不同概率校准方法为第二步的集成学习提供了更强的多样性。接下来,针对校准概率与原始概率之间的多重共线性问题,提出了选择最优(choose-best)和有放回抽样(bootstrap)的方法。选择最优方法对每个基分类器,从原始分类器和若干校准分类器之间选择最优的进行集成;有放回抽样方法则从整个基分类器集合中进行有放回的抽样,然后对抽样出来的分类器进行集成。实验表明,简单的概率校准集成学习对学习效果的提高有限,而使用了选择最优和有放回抽样方法后,学习效果得到了较大的提高。此结果说明,概率校准为集成学习提供了更强的多样性,其伴随的多重共线性问题可以通过抽样等方法有效地解决。  相似文献   

8.
在集成学习中使用平均法、投票法作为结合策略无法充分利用基分类器的有效信息,且根据波动性设置基分类器的权重不精确、不恰当。以上问题会降低集成学习的效果,为了进一步提高集成学习的性能,提出将证据推理(evidence reasoning, ER)规则作为结合策略,并使用多样性赋权法设置基分类器的权重。首先,由多个深度学习模型作为基分类器、ER规则作为结合策略,构建集成学习的基本结构;然后,通过多样性度量方法计算每个基分类器相对于其他基分类器的差异性;最后,将差异性归一化实现基分类器的权重设置。通过多个图像数据集的分类实验,结果表明提出的方法较实验选取的其他方法准确率更高且更稳定,证明了该方法可以充分利用基分类器的有效信息,且多样性赋权法更精确。  相似文献   

9.
A data driven ensemble classifier for credit scoring analysis   总被引:2,自引:0,他引:2  
This study focuses on predicting whether a credit applicant can be categorized as good, bad or borderline from information initially supplied. This is essentially a classification task for credit scoring. Given its importance, many researchers have recently worked on an ensemble of classifiers. However, to the best of our knowledge, unrepresentative samples drastically reduce the accuracy of the deployment classifier. Few have attempted to preprocess the input samples into more homogeneous cluster groups and then fit the ensemble classifier accordingly. For this reason, we introduce the concept of class-wise classification as a preprocessing step in order to obtain an efficient ensemble classifier. This strategy would work better than a direct ensemble of classifiers without the preprocessing step. The proposed ensemble classifier is constructed by incorporating several data mining techniques, mainly involving optimal associate binning to discretize continuous values; neural network, support vector machine, and Bayesian network are used to augment the ensemble classifier. In particular, the Markov blanket concept of Bayesian network allows for a natural form of feature selection, which provides a basis for mining association rules. The learned knowledge is represented in multiple forms, including causal diagram and constrained association rules. The data driven nature of the proposed system distinguishes it from existing hybrid/ensemble credit scoring systems.  相似文献   

10.
Ensemble pruning deals with the selection of base learners prior to combination in order to improve prediction accuracy and efficiency. In the ensemble literature, it has been pointed out that in order for an ensemble classifier to achieve higher prediction accuracy, it is critical for the ensemble classifier to consist of accurate classifiers which at the same time diverse as much as possible. In this paper, a novel ensemble pruning method, called PL-bagging, is proposed. In order to attain the balance between diversity and accuracy of base learners, PL-bagging employs positive Lasso to assign weights to base learners in the combination step. Simulation studies and theoretical investigation showed that PL-bagging filters out redundant base learners while it assigns higher weights to more accurate base learners. Such improved weighting scheme of PL-bagging further results in higher classification accuracy and the improvement becomes even more significant as the ensemble size increases. The performance of PL-bagging was compared with state-of-the-art ensemble pruning methods for aggregation of bootstrapped base learners using 22 real and 4 synthetic datasets. The results indicate that PL-bagging significantly outperforms state-of-the-art ensemble pruning methods such as Boosting-based pruning and Trimmed bagging.  相似文献   

11.
为了提高分类器集成性能,提出了一种基于聚类算法与排序修剪结合的分类器集成方法。首先将混淆矩阵作为量化基分类器间差异度的工具,通过聚类将分类器划分为若干子集;然后提出一种排序修剪算法,以距离聚类中心最近的分类器为起点,根据分类器的距离对差异度矩阵动态加权,以加权差异度作为排序标准对子集中的分类器进行按比例修剪;最后使用投票法对选出的基分类器进行集成。同时与多种集成方法在UCI数据库中的10组数据集上进行对比与分析,实验结果表明基于聚类与排序修剪的分类器选择方法有效提升了集成系统的分类能力。  相似文献   

12.
Hybrid models based on feature selection and machine learning techniques have significantly enhanced the accuracy of standalone models. This paper presents a feature selection‐based hybrid‐bagging algorithm (FS‐HB) for improved credit risk evaluation. The 2 feature selection methods chi‐square and principal component analysis were used for ranking and selecting the important features from the datasets. The classifiers were built on 5 training and test data partitions of the input data set. The performance of the hybrid algorithm was compared with that of the standalone classifiers: feature selection‐based classifiers and bagging. The hybrid FS‐HB algorithm performed best for qualitative dataset with less features and tree‐based unstable base classifier. Its performance on numeric data was also better than other standalone classifiers, whereas comparable to bagging with only selected features. Its performance was found better on 70:30 data partition and the type II error, which is very significant in risk evaluation was also reduced significantly. The improved performance of FS‐HB is attributed to the important features used for developing the classifier thereby reducing the complexity of the algorithm and the use of ensemble methodology, which added to the classical bias variance trade‐off and performed better than standalone classifiers.  相似文献   

13.
Financial distress prediction (FDP) is of great importance to both inner and outside parts of companies. Though lots of literatures have given comprehensive analysis on single classifier FDP method, ensemble method for FDP just emerged in recent years and needs to be further studied. Support vector machine (SVM) shows promising performance in FDP when compared with other single classifier methods. The contribution of this paper is to propose a new FDP method based on SVM ensemble, whose candidate single classifiers are trained by SVM algorithms with different kernel functions on different feature subsets of one initial dataset. SVM kernels such as linear, polynomial, RBF and sigmoid, and the filter feature selection/extraction methods of stepwise multi discriminant analysis (MDA), stepwise logistic regression (logit), and principal component analysis (PCA) are applied. The algorithm for selecting SVM ensemble's base classifiers from candidate ones is designed by considering both individual performance and diversity analysis. Weighted majority voting based on base classifiers’ cross validation accuracy on training dataset is used as the combination mechanism. Experimental results indicate that SVM ensemble is significantly superior to individual SVM classifier when the number of base classifiers in SVM ensemble is properly set. Besides, it also shows that RBF SVM based on features selected by stepwise MDA is a good choice for FDP when individual SVM classifier is applied.  相似文献   

14.
Predicting future stock index price movement has always been a fascinating research area both for the investors who wish to yield a profit by trading stocks and for the researchers who attempt to expose the buried information from the complex stock market time series data. This prediction problem can be addressed as a binary classification problem with two class labels, one for the increasing movement and other for the decreasing movement. In literature, a wide range of classifiers has been tested for this application. As the performance of individual classifier varies for a diverse dataset with respect to different performance measures, it is impractical to acknowledge a specific classifier to be the best one. Hence, designing an efficient classifier ensemble instead of an individual classifier is fetching increasing attention from many researchers. Again selection of base classifiers and deciding their preferences in ensemble with respect to a variety of performance criteria can be considered as a Multi Criteria Decision Making (MCDM) problem. In this paper, an integrated TOPSIS Crow Search based weighted voting classifier ensemble is proposed for stock index price movement prediction. Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS), one of the popular MCDM techniques, is suggested for ranking and selecting a set of base classifiers for the ensemble whereas the weights of the classifiers used in the ensemble are tuned by the Crow Search method. The proposed ensemble model is validated for prediction of stock index price over the historical prices of BSE SENSEX, S&P500 and NIFTY 50 stock indices. The model has shown better performance compared to individual classifiers and other ensemble models such as majority voting, weighted voting, differential evolution and particle swarm optimization based classifier ensemble.  相似文献   

15.
Banks provide a financial intermediary service by channeling funds efficiently between borrowers and lenders. Bank lending is subject to credit risk when loans are not paid back on a timely basis or are in default. The ability or possessing a methodology to evaluate the creditworthiness of a borrower is therefore crucial to managing the bank’s risk management and profitability.The aim of the paper is dichotomous classification of the individual borrowers to the groups of creditworthy or non-creditworthy clients. The recognition of borrowers is provided applying single and aggregated classification trees.Classification trees are a powerful alternative to the more traditional statistical models. This model has the advantage of being able to detect non-linear relationships and showing a good performance in presence of qualitative information as it happens in the creditworthiness evaluation of individual borrowers. As a result, they are widely used as base classifiers for ensemble methods.Aggregated classification trees are constructed employing two ensemble methods: Adaboost and bagging. AdaBoost constructs its base classifiers in sequence, updating a distribution over the training examples to create each base classifier. Bagging combines the individual classifiers built in bootstrap replicates of the training set.The research is conducted employing actual data regarding the individual borrowers that got a mortgage credit in one of the commercial banks that operate in Poland. Each of the clients is described by 11 variables. The grouping variable informs if the client pays off the credit regularly due to the credit agreement or he is back in loan redemption. Diagnostic variables describe the clients in terms of demographic features and characterize the credits that are to be paid back (i.e. value and currency of the credit, credit rate, etc.).  相似文献   

16.
The problem addressed in this study concerns mining data streams with concept drift. The goal of the article is to propose and validate a new approach to mining data streams with concept-drift using the ensemble classifier constructed from the one-class base classifiers. It is assumed that base classifiers of the proposed ensemble are induced from incoming chunks of the data stream. Each chunk consists of prototypes and information about whether the class prediction of these instances, carried-out at earlier steps, has been correct. Each data chunk can be updated by using the instance selection technique when new data arrive. When a new data chunk is formed, the ensemble model is also updated on the basis of weights assigned to each one-class classifier. In this article, two well-known instance-based learning algorithms—the CNN and the ENN—have been adopted to solve the one-class classification problems and, consequently, update the proposed classifier ensemble. The proposed approaches have been validated experimentally, and the computational experiment results are shown and discussed. The experiment results prove that the proposed approach using the ensemble classifier constructed from the one-class base classifiers with instance selection for chunk updating can outperform well-known approaches for data streams with concept drift.  相似文献   

17.
主要目的是寻找到一种Bagging的快速修剪方法,以缩小算法占用的存储空间、提高运算的速度和实现提高分类精度的潜力.传统的选择性集成方法研究的重点是基学习器之间的差异化,从同质化的角度采研究这一问题,提出了一种全新的选择性集成思路.通过选择基学习器集合中的最差者来对Bagging集成进行快速层次修剪,获得了一种学习速度接近Bagging性能在其基础上得到提高的新算法.新算法的训练时间明显小于GASEN而性能与其相近.该算法同时还保留了与Bagging相同的并行处理能力.  相似文献   

18.
传统的雷电数据预测方法往往采用单一最优机器学习算法,较少考虑气象数据的时空变化等现象。针对该现象,提出一种基于集成策略的多机器学习短时雷电预报算法。首先,对气象数据进行属性约简,降低数据维度;其次,在数据集上训练多种异构机器学习分类器,并基于预测质量筛选最优基分类器;最后,通过对最优基分类器训练权重,并结合集成策略产生最终分类器。实验表明,该方法优于传统单最优方法,其平均预测准确率提高了9.5%。  相似文献   

19.
基于动态加权的粗糙子空间集成   总被引:1,自引:0,他引:1       下载免费PDF全文
提出一种基于动态加权的粗糙子空间集成方法EROS-DW。利用粗糙集属性约简方法获得多个特征约简子集,并据此训练基分类器。在分类阶段,根据给定待测样本的具体特征动态地为每个基分类器指派相应的权重,采用加权投票组合规则集成各分类器的输出结果。利用UCI标准数据集对该方法的性能进行测试。实验结果表明,相较于经典的集成方法,EROS-DW方法可以获得更高的分类准确率。  相似文献   

20.
Due to the wide variety of fusion techniques available for combining multiple classifiers into a more accurate classifier, a number of good studies have been devoted to determining in what situations some fusion methods should be preferred over other ones. However, the sample size behavior of the various fusion methods has hitherto received little attention in the literature of multiple classifier systems. The main contribution of this paper is thus to investigate the effect of training sample size on their relative performance and to gain more insight into the conditions for the superiority of some combination rules.A large experiment is conducted to study the performance of some fixed and trainable combination rules for executing one- and two-level classifier fusion for different training sample sizes. The experimental results yield the following conclusions: when implementing one-level fusion to combine homogeneous or heterogeneous base classifiers, fixed rules outperform trainable ones in nearly all cases, with only one exception of merging heterogeneous classifiers for large sample size. Moreover, the best classification for any considered sample size is generally achieved by a second level of combination (namely, utilizing one fusion rule to further combine a set of ensemble classifiers with each of them constructed by fusing base classifiers). Under these circumstances, it seems that adopting different types of fusion rules (fixed or trainable) as the combiners for two levels of fusion is appropriate.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司    京ICP备09084417号-23

京公网安备 11010802026262号