Similar literature
Found 18 similar documents; search took 78 ms
1.
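To address class imbalance and cost sensitivity in real-world credit scoring, and the practical need of financial institutions to view a loan applicant's credit risk intuitively as a score, this paper proposes a class-imbalanced credit scoring model based on an Ext-GBDT ensemble. Undersampling is used to randomly draw, from the "good" customers (the majority class), multiple samples each equal in size to the full set of "bad" customers (the minority class); each sample is combined with the entire minority class to form a training subset. Multiple diverse Ext-GBDT sub-models are trained on the different training subsets, together with feature sampling and parameter perturbation. The predicted probabilities of the sub-models are then combined by simple averaging, and finally the credit probability is converted into a credit score. On the UCI German credit dataset, with AUC and cost-sensitive error rate as evaluation metrics, the model is compared with the most commonly used credit scoring models, including decision trees, logistic regression, naive Bayes, support vector machines, random forests, and their ensembles, which verifies its effectiveness.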
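
The sampling-and-averaging scheme described in the abstract above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' Ext-GBDT implementation: the function names `balanced_subsets`, `average_probability`, and `probability_to_score` are hypothetical, the sub-models themselves are left out, and the log-odds score scaling (base score 600, 20 points to double the odds) is a common scorecard convention assumed here because the abstract does not say how probabilities are converted to scores.

```python
import math
import random


def balanced_subsets(good, bad, n_subsets, seed=0):
    """Draw n_subsets random samples of the majority ("good") class, each the
    same size as the full minority ("bad") class, and pair each with all bad.
    Requires len(good) >= len(bad)."""
    rng = random.Random(seed)
    subsets = []
    for _ in range(n_subsets):
        sampled_good = rng.sample(good, len(bad))  # undersample without replacement
        subsets.append((sampled_good, list(bad)))
    return subsets


def average_probability(probabilities):
    """Simple averaging of the sub-models' predicted probabilities."""
    return sum(probabilities) / len(probabilities)


def probability_to_score(p_bad, base_score=600.0, pdo=20.0, eps=1e-6):
    """Assumed scorecard scaling: score = base + pdo / ln(2) * ln(good odds)."""
    p_bad = min(max(p_bad, eps), 1.0 - eps)  # keep the logarithm finite
    odds_good = (1.0 - p_bad) / p_bad
    return base_score + pdo / math.log(2) * math.log(odds_good)
```

Each `(sampled_good, bad)` pair would be used to train one sub-model; the averaged default probability of an applicant is then passed to `probability_to_score`, so that a lower default probability yields a higher score.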

2.
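Xiang Xin, Lu Gehao. 《计算机应用研究》, 2021, 38(12): 3604-3610
To address class imbalance and cost sensitivity in real-world credit evaluation, and to reduce the misclassification loss of credit risk assessment, this paper proposes an ensemble credit evaluation model based on DESMID-AD dynamic selection, which dynamically selects suitable base classifiers for each test sample according to that sample's characteristics. To improve the model's ability to identify customers with poor credit (the minority class), the training data are class-balanced by oversampling before the base classifiers are trained. Base classifier performance is evaluated on multiple indicators through meta-learning, and a weighting mechanism is designed at this stage to strengthen the influence of the minority class. On three public credit evaluation datasets, with AUC, Type I and Type II error rates, and misclassification cost as evaluation metrics, a comparison with nine commonly used credit evaluation models demonstrates the effectiveness and feasibility of the method in credit evaluation.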

3.
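Han Fang, Sun Limin. 《福建电脑》, 2014, (12): 16-18
Support vector machines classify balanced sample sets very well, but their performance on imbalanced sample sets is unsatisfactory. A close analysis identifies two causes of imbalance: imbalance in the number of samples, and spatial overlap of the sample points. Taking both into account, this paper proposes an improved undersampling algorithm that reduces both the spatial overlap and the numerical imbalance of the sample set. Simulation results show that the algorithm classifies imbalanced sample sets well.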

4.
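Random undersampling ignores potentially useful majority-class samples, a problem that is more pronounced in multiclass classification. This paper proposes a multiclass class-imbalance learning algorithm, EasyEnsemble.M. By randomly sampling the majority classes multiple times, the algorithm makes full use of the potentially useful majority-class samples that random undersampling ignores, learns multiple sub-classifiers, and uses a hybrid ensemble technique to obtain a strong classifier with better performance. Experimental results show that, compared with commonly used multiclass class-imbalance learning algorithms, EasyEnsemble.M effectively improves the classifier's G-mean.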
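
For readers unfamiliar with the G-mean metric mentioned above, here is a minimal sketch, assuming the common multiclass definition (the geometric mean of the per-class recalls). The abstract does not spell out the exact formula it uses, and the function name `g_mean` is hypothetical.

```python
import math
from collections import Counter


def g_mean(y_true, y_pred):
    """Geometric mean of per-class recall over the classes present in y_true."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    recalls = [hits[c] / totals[c] for c in totals]
    return math.prod(recalls) ** (1.0 / len(recalls))
```

Because the recalls are multiplied, a classifier that ignores any single class gets a G-mean of 0 regardless of its overall accuracy, which is why the metric is popular for imbalanced problems.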

5.
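In real-world problems, the sample pairs used for similarity learning are imbalanced: similar pairs are far fewer than dissimilar pairs. To address this, the paper proposes two methods for constructing sample pairs: dissimilar K-nearest neighbors with similar K-nearest neighbors (DKNN-SKNN), and dissimilar K-nearest neighbors with similar K-farthest neighbors (DKNN-SKFN). These methods select sample pairs for similarity learning in a targeted way, which not only speeds up support vector machine training but also alleviates, to some extent, the imbalance between sample pairs. Comparative experiments against classical resampling methods on multiple datasets show that DKNN-SKNN and DKNN-SKFN perform well.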

6.
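《信息与电脑》, 2021, (1): 45-49
Traditional oversampling algorithms tend to ignore important information in boundary samples and to generate highly similar new samples. To address this, the paper proposes a new method, DB-BMCSMOTE. First, the algorithm clusters the minority class with DBSCAN, identifies and removes noise, and then probabilistically marks the boundary minority samples. Second, a density function is generated for each sample cluster produced by the clustering, and the cluster's density and sampling weight are computed; oversampling is then performed at the midpoints between the probabilistically marked minority samples in each cluster and more distant samples, in order to improve model accuracy. Experimental results show that the algorithm improves on other algorithms by 3.8% on average and by up to 5.92%, and that it is effectively applied to credit evaluation.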

7.
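An improved AdaBoost algorithm for imbalanced data classification   Total citations: 3, self-citations: 1, citations by others: 3
Many real-world classification problems are class-imbalanced. Traditional machine learning algorithms such as AdaBoost focus on overall classifier performance and do not give the minority class extra attention. Research on class-imbalance learning algorithms is therefore an important direction in machine learning. AsymBoost, an improved variant of AdaBoost, sacrifices recognition accuracy on majority-class samples to improve classification performance on minority-class samples when used for class-imbalanced learning. However, AsymBoost may still suffer from overfitting caused by excessively large sample weights. This paper therefore proposes a new improved AdaBoost algorithm. By processing the weights and labels of hard-to-classify majority-class samples, the method enables the classifier to achieve good precision and recall at the same time. Experimental results show that it effectively improves classification performance on imbalanced datasets.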
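
As background for the weight handling discussed above, one round of the standard discrete AdaBoost weight update can be sketched as follows. This is the textbook baseline only, not AsymBoost and not the improved algorithm the abstract proposes, whose details are not given. The function name `adaboost_round` is hypothetical, labels are assumed to be in {-1, +1}, and the incoming weights are assumed to sum to 1.

```python
import math


def adaboost_round(weights, y_true, y_pred):
    """One round of standard AdaBoost for labels in {-1, +1}.

    Returns the weak learner's vote alpha and the renormalised sample weights.
    """
    error = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    if not 0.0 < error < 0.5:
        raise ValueError("weighted error must lie in (0, 0.5)")
    alpha = 0.5 * math.log((1.0 - error) / error)
    # Misclassified samples (t * p == -1) are scaled up, correct ones down.
    updated = [w * math.exp(-alpha * t * p) for w, t, p in zip(weights, y_true, y_pred)]
    total = sum(updated)
    return alpha, [w / total for w in updated]
```

After the update, the misclassified samples together carry half of the total weight. Repeated over many rounds, this is how a few persistently hard samples can accumulate very large weights, the overfitting risk the abstract refers to.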

8.
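Online-lending user data in internet finance are class-imbalanced, which severely degrades the performance of traditional classifiers. The random balanced sampling algorithm treats all samples equally when resampling the original dataset. This paper instead takes the characteristics of each sample point into account during balanced sampling and divides the samples into three types: safe, borderline, and noisy. A corresponding sampling method is applied to each type to obtain a balanced new dataset, on which a Bagging ensemble is then built to improve generalization. The results show that the Improved Random Balanced Sampling (IRBS) Bagging algorithm classifies online-lending users well.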

9.
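Jiang Hua, Jiang Richen, Wang Xin, Wang Huijiao. 《计算机仿真》, 2020, 37(3): 254-258, 420
When a traditional support vector machine (SVM) performs binary classification on imbalanced data, its decision boundary tends to shift. Current approaches to imbalanced data work at either the data level or the algorithm level. This paper proposes a data-level method that uses ADASYN and SMOTE jointly to generate minority-class sample points. The method uses the K-nearest-neighbor algorithm to count minority-class and majority-class sample points, categorizes the minority sample points accordingly, and then synthesizes new minority-class points with ADASYN or SMOTE depending on the category. Finally, experiments validate the algorithm: ROC curves are used to compare it with using SMOTE or ADASYN alone, and the proposed algorithm achieves the highest AUC, which indicates that it improves the classification of imbalanced data.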
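
The interpolation step that SMOTE-style oversampling relies on can be sketched as below. This is a minimal sketch of plain SMOTE only, not the combined ADASYN and SMOTE method of the abstract. The function name `smote_samples` is hypothetical, points are assumed to be tuples of floats, and at least two minority points are required. ADASYN uses the same interpolation but generates more synthetic points around minority samples that have more majority-class neighbors.

```python
import random


def smote_samples(minority, k, n_new, seed=0):
    """Generate n_new synthetic points by interpolating between a random
    minority point and one of its k nearest minority neighbours."""
    rng = random.Random(seed)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: sq_dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic
```

Every synthetic point lies on the line segment between two existing minority points, so the new samples stay inside the region the minority class already occupies.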

10.
11.
Credit scoring focuses on the development of empirical models to support the financial decision-making processes of financial institutions and credit industries. It makes use of applicants' historical data and statistical or machine learning techniques to assess the risk associated with an applicant. However, the historical data may contain redundant and noisy features that affect the performance of credit scoring models. The main focus of this paper is to develop a hybrid model, combining feature selection and a multilayer ensemble classifier framework, to improve the predictive performance of credit scoring. The proposed hybrid credit scoring model is built in three phases. The initial phase consists of preprocessing and assigns ranks and weights to classifiers. In the next phase, the ensemble feature selection approach is applied to the preprocessed dataset. In the last phase, the dataset with the selected features is used in a multilayer ensemble classifier framework. In addition, a classifier placement algorithm based on the Choquet integral value is designed, as the classifier placement affects the predictive performance of the ensemble framework. The proposed hybrid credit scoring model is validated on real-world credit scoring datasets, namely, the Australian, Japanese, German-categorical, and German-numerical datasets.

12.
Although microfinance organizations play an important role in developing economies, decision support models for microfinance credit scoring have not been sufficiently covered in the literature, particularly for microcredit enterprises. The aim of this paper is to create a three-class model that can improve credit risk assessment in the microfinance context. The real-world microcredit data set used in this study includes data from retail, micro, and small enterprises. To the best of the authors' knowledge, existing research on microfinance credit scoring has been limited to regression and genetic algorithms, thereby excluding novel machine learning algorithms. This research aims to close that gap. The proposed models predict default events by analysing different ensemble classification methods that strengthen the effects of the synthetic minority oversampling technique (SMOTE) used in the preprocessing of the imbalanced microcredit data set. Initial results showed improved predictions for certain classes when the oversampling technique was applied with homogeneous and heterogeneous ensemble classifier methods. A prediction improvement for all classes was achieved by applying SMOTE and the Consolidated Trees Construction algorithm together with Rotation Forest. To obtain a complete view of all aspects, an additional set of metrics is used in the evaluation of performance.

13.
Credit scoring has recently become a very important task because credit cards are now widely used. A method that predicts credit scores accurately is greatly needed. One powerful classifier, the support vector machine (SVM), has been successfully applied to a wide range of domains. In recent years, researchers have applied SVM-based methods to credit scoring prediction, and the results have shown them to be effective. In this study, two real-world credit datasets from the University of California Irvine Machine Learning Repository were selected. SVM and a new classifier, clustering-launched classification (CLC), were employed for credit scoring, and their prediction accuracy was compared. The advantages of CLC are that it classifies data efficiently and that only one parameter needs to be decided. The results show that CLC is better than SVM. Therefore, CLC is an effective tool for credit scoring prediction.

14.
A credit scoring model is an important tool for assessing risk in the financial industry; consequently, the majority of financial institutions actively develop credit scoring models for the credit approval assessment of new customers and the credit risk management of existing customers. Nonetheless, most past research used a one-dimensional credit scoring model to measure customer risk. In this study, we select important variables with a genetic algorithm (GA) and combine the bank's internal behavioral scoring model with the external credit bureau scoring model to construct a dual scoring model for credit risk management of mortgage accounts. The dual model supports more accurate risk judgment and segmentation, which helps identify the parts of the mortgage portfolio that require stronger management or control. The results show that the predictive ability of the dual scoring model outperforms both the one-dimensional behavioral scoring model and the credit bureau scoring model. Moreover, this study proposes credit strategies such as on-lending retaining and collection actions for the corresponding customers, in order to benefit banking credit practice.

15.
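Application of the BP algorithm in credit risk analysis   Total citations: 6, self-citations: 0, citations by others: 6
A neural-network credit risk evaluation model based on the BP (backpropagation) algorithm is built and used for two-class pattern classification of 80 enterprises that held loans from a state-owned commercial bank in China in 2001. The enterprises are divided into two groups, "good credit" and "poor credit", according to their financial condition, operating condition, and past credit records. For each borrowing enterprise, 7 financial ratios reflecting its repayment ability, profitability, operating efficiency, capital structure, and so on are used as the analysis variables. The BP network was trained for 100, 390, and 800 iterations. Simulation results show that after 800 training iterations the network reaches a stable state, the objective function value reaches its optimum, and the classification accuracy reaches 98.75%. The learning algorithm and steps of the BP network are also given.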

16.
Credit scoring is now one of the most important topics in the banking sector. Credit scoring models have been widely used to facilitate credit assessment. In this paper, the locally linear model tree algorithm (LOLIMOT) is applied to evaluate how well it predicts a customer's credit status. The algorithm is adapted to the credit scoring domain by means of data fusion and feature selection techniques. Two real-world credit data sets, Australian and German, from the UCI machine learning database were selected to demonstrate the performance of the new classifier. The analytical results indicate that the improved LOLIMOT significantly increases prediction accuracy.

17.
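The local support vector machine is a widely used classifier that has attracted increasing attention in both theoretical research and practical applications. Many traditional local SVM algorithms share a problem: the class proportions of the samples in the model are imbalanced, which prevents classification accuracy from improving. Inspired by the weighted support vector machine, this paper proposes WFalk-SVM, which applies the weighting idea to the local support vector machine Falk-SVM. Experiments verify the feasibility and effectiveness of WFalk-SVM, and the paper concludes with an analysis and summary of the algorithm.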

18.
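MFCC, which is based on human auditory perception, offers stronger noise robustness and higher recognition rates than other speaker features. However, the structure of the Mel filter bank gives it high resolution only in the low-frequency region and low resolution in the high-frequency region, so some important information in the high-frequency region is inevitably lost. Exploiting the respective strengths of MFCC and of R-MFCC, a feature computed in the reversed Mel domain, the paper combines the two so that they complement each other. It presents the Fisher criterion for measuring the discriminative power of feature parameters and uses it to construct a new hybrid feature parameter. A support vector machine is used for speaker recognition with MFCC, R-MFCC, and the newly constructed hybrid feature as parameters; experiments show that the hybrid feature selected by the Fisher criterion is feasible as a speaker recognition feature.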
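
The Fisher criterion mentioned above scores how well a single feature dimension separates classes. A minimal sketch follows, assuming one common form of the ratio (squared deviations of the class means from the overall mean, divided by the sum of the within-class variances). The abstract does not give the exact formula used in the paper, and the function name `fisher_ratio` is hypothetical.

```python
from statistics import fmean


def fisher_ratio(values_by_class):
    """Fisher ratio of one feature dimension: between-class scatter of the
    class means divided by the summed within-class variances."""
    all_values = [v for values in values_by_class for v in values]
    overall_mean = fmean(all_values)
    class_means = [fmean(values) for values in values_by_class]
    between = sum((m - overall_mean) ** 2 for m in class_means)
    within = sum(
        fmean([(v - m) ** 2 for v in values])
        for values, m in zip(values_by_class, class_means)
    )
    return between / within  # raises ZeroDivisionError if every class is constant
```

Computing this ratio for each MFCC and R-MFCC dimension and keeping the highest-scoring dimensions is one plausible way to build a hybrid feature of the kind the abstract describes.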

