1.

Credit scoring with a data mining approach based on support vector machines

《Expert systems with applications》2008,34(4):847-856

The credit card industry has grown rapidly in recent years, and banks' credit departments therefore collect large volumes of consumer credit data. The credit scoring manager often evaluates a consumer's credit using intuitive experience. With the support of a credit classification model, however, the manager can evaluate the applicant's credit score accurately. Support Vector Machine (SVM) classification is currently an active research area and has successfully solved classification problems in many domains. This study used three strategies to construct hybrid SVM-based credit scoring models that evaluate the applicant's credit score from the applicant's input features. Two credit datasets from the UCI database are selected as the experimental data to demonstrate the accuracy of the SVM classifier. Compared with neural networks, genetic programming, and decision tree classifiers, the SVM classifier achieved identical classification accuracy with relatively few input features. Additionally, by combining genetic algorithms with the SVM classifier, the proposed hybrid GA-SVM strategy can perform feature selection and model parameter optimization simultaneously. Experimental results show that SVM is a promising addition to the existing data mining methods.

2.

A novel hybrid credit scoring model based on ensemble feature selection and multilayer ensemble classification

Diwakar Tripathi Damodar Reddy Edla Ramalingaswamy Cheruku Venkatanareshbabu Kuppili 《Computational Intelligence》2019,35(2):371-394

Credit scoring focuses on the development of empirical models to support the financial decision‐making processes of financial institutions and credit industries. It makes use of applicants' historical data and statistical or machine learning techniques to assess the risk associated with an applicant. However, the historical data may consist of redundant and noisy features that affect the performance of credit scoring models. The main focus of this paper is to develop a hybrid model, combining feature selection and a multilayer ensemble classifier framework, to improve the predictive performance of credit scoring. The proposed hybrid credit scoring model is modeled in three phases. The initial phase constitutes preprocessing and assigns ranks and weights to classifiers. In the next phase, the ensemble feature selection approach is applied to the preprocessed dataset. Finally, in the last phase, the dataset with the selected features is used in a multilayer ensemble classifier framework. In addition, a classifier placement algorithm based on the Choquet integral value is designed, as the classifier placement affects the predictive performance of the ensemble framework. The proposed hybrid credit scoring model is validated on real‐world credit scoring datasets, namely, Australian, Japanese, German‐categorical, and German‐numerical datasets.

3.

Rule extraction algorithm from support vector machines and its application to credit screening

Chao-Ton Su Yan-Cheng Chen 《Soft Computing - A Fusion of Foundations, Methodologies and Applications》2012,16(4):645-658

Developing rule extraction algorithms from machine learning techniques such as artificial neural networks and support vector machines (SVMs), which are considered incomprehensible black-box models, is an important topic in current research. This study proposes a rule extraction algorithm from SVMs that uses a kernel-based clustering algorithm to integrate all support vectors and genetic algorithms into extracted rule sets. This study uses measurements of accuracy, sensitivity, specificity, coverage, fidelity and comprehensibility to evaluate the performance of the proposed method on the public credit screening data sets. Results indicate that the proposed method performs better than other rule extraction algorithms. Thus, the proposed algorithm is an essential analysis tool that can be effectively used in data mining fields.
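The evaluation measures named in this abstract can be illustrated with a minimal sketch. It assumes the usual textbook definitions of accuracy, sensitivity and specificity for binary labels, and takes fidelity to mean the fraction of samples on which the extracted rules agree with the SVM; the function names `screening_metrics` and `fidelity` are hypothetical and are not taken from the paper.

```python
def screening_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity and specificity for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Share of true positives that were recognised as positive.
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        # Share of true negatives that were recognised as negative.
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }


def fidelity(model_pred, rule_pred):
    """Fraction of samples where the rules reproduce the black-box prediction."""
    agree = sum(1 for m, r in zip(model_pred, rule_pred) if m == r)
    return agree / len(model_pred)


if __name__ == "__main__":
    truth = [1, 1, 0, 0, 1, 0]
    svm_out = [1, 0, 0, 1, 1, 0]
    rules_out = [1, 0, 1, 1, 1, 0]
    print(screening_metrics(truth, rules_out))
    print(fidelity(svm_out, rules_out))
```

Fidelity is computed against the SVM's predictions rather than the true labels, so a rule set can score high on fidelity while inheriting the SVM's mistakes.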

4.

Neighborhood rough set and SVM based hybrid credit scoring classifier

Yao Ping Lu Yongheng 《Expert systems with applications》2011,38(9):11300-11304

The development of credit scoring models has become a very important issue, as the credit industry is highly competitive. Numerous credit scoring models have therefore been studied in the field of statistics over the past few years to improve the accuracy of credit scoring. This study constructs a hybrid SVM-based credit scoring model to evaluate the applicant's credit score according to the applicant's input features: (1) using neighborhood rough set to select input features; (2) using grid search to optimize RBF kernel parameters; (3) using the hybrid optimal input features and model parameters to solve the credit scoring problem with 10-fold cross validation; (4) comparing the accuracy of the proposed method with other methods. Experimental results demonstrate that the neighborhood rough set and SVM based hybrid classifier has the best credit scoring capability compared with other hybrid classifiers. It also outperforms linear discriminant analysis, logistic regression and neural networks.

5.

Remote sensing image classification based on fuzzy twin support vector machines

丁胜锋 孙劲光 陈东莉 姜晓林 《遥感技术与应用》2012,27(3):353-358

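Remote sensing image classification is the basis for studying land-use change. Traditional remote sensing image classification methods suffer from slow computation, relatively low accuracy, and difficulty converging. A multi-class classification method based on the fuzzy twin support vector machine is proposed: fuzzy techniques are introduced into the twin support vector machine so that different samples are assigned different fuzzy membership degrees; the fuzzy twin support vector machine is then extended to multi-class classification; and finally the new method is applied to remote sensing image classification. Experiments show that the new method achieves higher classification accuracy than traditional multi-class support vector machine methods, is more robust to noise, and is feasible in terms of running time. The fuzzy twin support vector machine is an effective method for remote sensing image classification.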

6.

Computational time reduction for credit scoring: An integrated approach based on support vector machine and stratified sampling method

Akhil Bandhu Hens Manoj Kumar Tiwari 《Expert systems with applications》2012,39(8):6774-6781

With the rapid growth of the credit industry, credit scoring models have become important for issuing credit cards to applicants at minimum risk, and credit scoring is therefore very important to financial firms such as banks. A model is built from historical data, and that model is used to decide whether an applicant will be granted a loan or credit card or will be rejected. There are several methodologies for constructing credit scoring models, e.g., neural network models, statistical classification techniques, genetic programming, and support vector models. The computational time needed to run a model is of great importance: algorithms or models that need less computational time are more efficient and thus more profitable for banks or firms. In this study, we propose a new strategy to reduce the computational time for credit scoring. The approach uses an SVM combined with feature reduction based on the F score and builds the credit scoring model from a sample rather than the whole dataset. We run our method on two real datasets to assess its performance and compare its results with those obtained from other well-known methods. The new method is shown to be highly competitive with the other methods in accuracy while requiring less computational time.
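The two data-reduction ideas in this abstract, F-score feature ranking and stratified sampling, can be sketched as follows. This is only an illustration under stated assumptions: the F score is taken to be the commonly used two-class definition (squared distances of the class means from the overall mean, divided by the sum of the within-class sample variances), and the helper names `f_score` and `stratified_sample` are hypothetical. The abstract does not give the authors' exact formula or sampling scheme.

```python
import random
from collections import defaultdict


def f_score(values, labels, positive=1):
    """Two-class F score of one feature; larger means more discriminative.

    Assumes each class has at least two samples.
    """
    pos = [v for v, y in zip(values, labels) if y == positive]
    neg = [v for v, y in zip(values, labels) if y != positive]
    mean_all = sum(values) / len(values)
    mean_pos = sum(pos) / len(pos)
    mean_neg = sum(neg) / len(neg)
    between = (mean_pos - mean_all) ** 2 + (mean_neg - mean_all) ** 2
    within = (sum((v - mean_pos) ** 2 for v in pos) / (len(pos) - 1)
              + sum((v - mean_neg) ** 2 for v in neg) / (len(neg) - 1))
    return between / within if within else float("inf")


def stratified_sample(labels, fraction, seed=0):
    """Pick row indices so that each class keeps roughly the same proportion."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    chosen = []
    for idx in by_class.values():
        k = max(1, round(fraction * len(idx)))  # at least one row per class
        chosen.extend(rng.sample(idx, k))
    return sorted(chosen)


if __name__ == "__main__":
    labels = [0, 0, 0, 1, 1, 1]
    print(f_score([1, 2, 3, 7, 8, 9], labels))   # well-separated feature
    print(f_score([1, 9, 5, 5, 1, 9], labels))   # uninformative feature
    print(stratified_sample([0] * 8 + [1] * 4, fraction=0.5))
```

In a pipeline of this kind, features would be ranked by `f_score`, the top few kept, and the classifier trained only on the rows returned by `stratified_sample`. Both steps shrink the training problem, which is where the time saving would come from.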

7.

A data driven ensemble classifier for credit scoring analysis (Total citations: 2; self-citations: 0; citations by others: 2)

Nan-Chen Hsieh Lun-Ping Hung 《Expert systems with applications》2010,37(1):534-545

This study focuses on predicting whether a credit applicant can be categorized as good, bad or borderline from information initially supplied. This is essentially a classification task for credit scoring. Given its importance, many researchers have recently worked on an ensemble of classifiers. However, to the best of our knowledge, unrepresentative samples drastically reduce the accuracy of the deployed classifier. Few have attempted to preprocess the input samples into more homogeneous cluster groups and then fit the ensemble classifier accordingly. For this reason, we introduce the concept of class-wise classification as a preprocessing step in order to obtain an efficient ensemble classifier. This strategy would work better than a direct ensemble of classifiers without the preprocessing step. The proposed ensemble classifier is constructed by incorporating several data mining techniques, mainly involving optimal associate binning to discretize continuous values; neural network, support vector machine, and Bayesian network are used to augment the ensemble classifier. In particular, the Markov blanket concept of Bayesian network allows for a natural form of feature selection, which provides a basis for mining association rules. The learned knowledge is represented in multiple forms, including causal diagram and constrained association rules. The data driven nature of the proposed system distinguishes it from existing hybrid/ensemble credit scoring systems.

8.

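A speech endpoint detection algorithm combining wavelet analysis and support vector machines (Total citations: 1; self-citations: 0; citations by others: 1)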

朱恒军 于泓博 王发智 《计算机科学》2012,39(6):244-246

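To improve the adaptability and robustness of speech endpoint detection, a speech endpoint detection algorithm based on wavelet analysis and support vector machines is proposed. First, the wavelet transform is used to extract features from the speech signal; these features are then used as input to a support vector machine for training and modeling; finally, the class of the signal is determined. Simulation experiments show that, compared with traditional speech endpoint detection algorithms, the algorithm based on wavelet analysis and support vector machines improves detection accuracy, effectively reduces the false detection and missed detection rates, has better adaptability and robustness, and detects well on signals with different signal-to-noise ratios.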

9.

A computational experiment on deducing phase diagrams from spatial thermodynamic data using machine learning techniques

《Calphad》2021

Derivation and discovery of the physical dynamics inherent in big data is one of the major purposes of machine learning (ML) in modern natural science. In materials science, phase diagrams are often called “road maps” for understanding the conditions for phase formation and/or transformation in any material system caused by the associated thermodynamics. In this paper, we report a numerical experiment investigating whether the underlying thermodynamics can be derived, with the help of ML, from big data consisting of local spatial composition and phase distribution data. The artificial data analysed were created assuming a steel composition, based on calculation of phase diagrams (CALPHAD) thermodynamics combined with an order-statistics-based sampling model. The hypothetical procedures of data acquisition assumed in this numerical experiment are as follows: (i) obtaining local analysis data on the composition and phase distribution in the same observation area using instruments such as the electron probe micro analyser (EPMA) and electron backscattering diffraction (EBSD), and (ii) training the classification model based on an ML algorithm with compositional data as input and the phase data as output. The accuracies of the reconstructed phase diagrams have been estimated for three ML algorithms, i.e. support vector machine (SVM), random forest, and multilayer perceptron (MLP). The phase diagrams predicted using SVM and MLP are found to be adequately consistent with those of the CALPHAD method. We have also investigated the regression performance for the continuous data involved in the CALPHAD thermodynamics, such as the phase fractions of body-centred cubic, face-centred cubic, and cementite phases. Compared with the ML algorithms, the CALPHAD method is found to show superior predictive performance since it is based on a sophisticated physical model.

10.

Research on an image retrieval algorithm based on improved learning to rank

谭光兴 刘臻晖 《计算机科学》2015,42(12):275-277, 306

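Image retrieval is an important research topic in image-sharing social networks. Traditional image retrieval methods usually retrieve images by matching the keywords entered by the user against the textual descriptions of images. Because textual information is ambiguous and describing images in text is very difficult, the retrieval results have low accuracy. To improve the accuracy of image retrieval, an image retrieval method based on learning to rank is proposed. Each image is described by multiple feature descriptors; when the user's input is an image, retrieval is performed by comparing the similarity between the query image and the images in the image library. Two learning methods, support vector machines and association rules, are used to learn the weight combination of the feature descriptors, and corresponding learning algorithms are proposed. Experiments show that the proposed learning-based image retrieval method is more accurate than related image retrieval methods. In addition, when support vector machines and association rules are used to learn the classification function, the results obtained are correlated, because both algorithms learn the descriptor weights from the same data instances.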

11.

A Universal Image Steganalysis System Based On Double Sparse Representation Classification (DSRC)

Arash Jalali Hassan Farsi Shahrokh Ghaemmaghami 《Multimedia Tools and Applications》2018,77(13):16347-16366

Achieving high detection rates at low embedding rates is still a challenging problem in many steganalysis systems. The recently proposed steganalysis system based on a sparse representation classifier has shown remarkable detection rates at low embedding rates. In this paper, we propose a new steganalysis system based on a double sparse representation classifier. We compare our proposed method with other steganalysis systems that use different classifiers (including nearest neighbor, support vector machine, ensemble support vector machine and sparse representation). In all of our experiments, the input features to the classifier are fixed and the ability of the classifier is examined. We also provide a complexity analysis in terms of execution time for the different classifiers. In most experiments, our proposed method shows superior performance in terms of detection rate and complexity for low embedding rates.

12.

A new fuzzy support vector machine to evaluate credit risk (Total citations: 7; self-citations: 0; citations by others: 7)

Yongqiao Wang Shouyang Wang Lai K.K. 《Fuzzy Systems, IEEE Transactions on》2005,13(6):820-831

Due to recent financial crises and regulatory concerns, financial intermediaries' credit risk assessment is an area of renewed interest in both the academic world and the business community. In this paper, we propose a new fuzzy support vector machine to discriminate good creditors from bad ones. Because in credit scoring we usually cannot label a customer as absolutely good (sure to repay on time) or absolutely bad (certain to default), our new fuzzy support vector machine treats every sample as belonging to both the positive and the negative class, but with different memberships. In this way we expect the new fuzzy support vector machine to have better generalization ability while preserving the merit of insensitivity to outliers, as in the fuzzy support vector machine (SVM) proposed in previous papers. We reformulate this kind of two-group classification problem as a quadratic programming problem. Empirical tests on three public datasets show that it can have better discriminatory power than the standard support vector machine and the fuzzy support vector machine if an appropriate kernel and membership generation method are chosen.
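One way to write down the idea of treating every sample as both positive and negative with different memberships is the quadratic program sketched below. The abstract does not state the formulation, so the notation is an assumption: $m_k \in [0,1]$ is the membership of sample $x_k$ in the positive class, $\xi_k$ and $\eta_k$ are the slack variables for its positive and negative copies, $\phi$ is the kernel feature map, and $C$ is the usual regularization constant.

```latex
\begin{aligned}
\min_{w,\,b,\,\xi,\,\eta}\quad
  & \tfrac{1}{2}\,\lVert w \rVert^{2}
    + C \sum_{k=1}^{N} \bigl[\, m_k\,\xi_k + (1 - m_k)\,\eta_k \,\bigr] \\
\text{s.t.}\quad
  & w^{\top}\phi(x_k) + b \;\ge\; 1 - \xi_k, \\
  & w^{\top}\phi(x_k) + b \;\le\; -1 + \eta_k, \\
  & \xi_k \ge 0,\quad \eta_k \ge 0, \qquad k = 1,\dots,N .
\end{aligned}
```

With $m_k = 1$ for every good creditor and $m_k = 0$ for every bad one, the weighted penalties reduce to those of the standard soft-margin SVM, which is why a formulation of this shape can be read as a generalization of it.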

13.

Orthogonal support vector machine for credit scoring

Lu Han Liyan Han Hongwei Zhao 《Engineering Applications of Artificial Intelligence》2013,26(2):848-862

The most commonly used technique for credit scoring is logistic regression, and more recent research has proposed that the support vector machine is a more effective method. However, both logistic regression and the support vector machine suffer from the curse of dimensionality. In this paper, we introduce a new way to address this problem, which is defined as orthogonal dimension reduction. We discuss the related properties of this method in detail and test it against other common statistical approaches (principal component analysis and hybridizing logistic regression) to better solve and evaluate the data. In experiments on the German data set, there is also an interesting phenomenon with respect to the use of the support vector machine, which we define as ‘Dimensional interference’ and discuss in general. Based on the results of cross-validation, it can be found that, by using logistic regression to filter the dummy variables and orthogonal feature extraction, the support vector machine not only reduces complexity and accelerates convergence, but also achieves better performance.

14.

Technology credit scoring model with fuzzy logistic regression

《Applied Soft Computing》2016

Technology credit scoring models have been used to screen loan applicant firms based on their technology. Typically a logistic regression model is employed to relate the probability of a firm's loan default to several evaluation attributes associated with technology. However, these attributes are evaluated in linguistic expressions represented by fuzzy numbers. Besides, the possibility of loan default can be described in verbal terms as well. To handle these fuzzy input and output data, we proposed a fuzzy credit scoring model that can be applied to predict the default possibility of a loan for a firm that is approved based on its technology. The method of fuzzy logistic regression was presented in this study as an appropriate prediction approach for credit scoring with fuzzy input and output. The performance of the model is improved compared to that of typical logistic regression. This study is expected to contribute to practical utilization of technology credit scoring with linguistic evaluation attributes.

15.

Mining the customer credit using hybrid support vector machine technique

Weimin Chen Chaoqun Ma Lin Ma 《Expert systems with applications》2009,36(4):7611-7616

Credit scoring has become a critical and challenging management science issue, as the credit industry has been facing fiercer competition in recent years. Many methods have been suggested in the literature to tackle this problem. In this paper, we proposed a hybrid support vector machine technique based on three strategies: (1) using CART to select input features, (2) using MARS to select input features, (3) using grid search to optimize model parameters. In order to verify the feasibility and effectiveness of the proposed hybrid SVM model, one credit card dataset provided by a local bank in China is used in this study. Analytic results demonstrate that the hybrid SVM technique not only has the best classification rate, but also has the lowest Type II error in comparison with CART, MARS and SVM, and support the presumption that SVM has a better capability of capturing nonlinear relationships among variables.

16.

Development of a quick credibility scoring decision support system using fuzzy TOPSIS (Total citations: 1; self-citations: 0; citations by others: 1)

Yusuf Tansel İç Mustafa Yurdakul 《Expert systems with applications》2010,37(1):12462

In this study, a quick credibility scoring decision support system is developed for banks to determine the credibility of manufacturing firms in Turkey. The proposed decision support system is expected to be used by banks when they want to determine whether an applicant firm is worth a detailed credit check or not. Using such a quick credit scoring decision model reduces the banks' workload. The proposed credit scoring model is based on financial ratios and the fuzzy TOPSIS approach. It obtains two separate scores, which reflect the attractiveness of manufacturing industries within the overall economy and a manufacturing firm's performance with respect to its competitors in the same industry. These two scores are then used to determine the credibility of applicant manufacturing firms. The developed decision support system is tested with various real cases and satisfactory results are obtained. An application is also provided in the paper for illustrative purposes.
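The ranking step behind a TOPSIS-style score can be illustrated with a small sketch. Note the simplification: this is crisp (non-fuzzy) TOPSIS with plain numbers, not the fuzzy variant used in the paper, and the criteria, weights and function name `topsis` are made up for illustration.

```python
import math


def topsis(matrix, weights, benefit):
    """Crisp TOPSIS closeness scores in [0, 1]; higher is better.

    matrix  -- rows are alternatives (firms), columns are criteria
    weights -- one weight per criterion
    benefit -- True if larger is better for that criterion, False if smaller
    Assumes no all-zero column and at least two distinct alternatives.
    """
    n_crit = len(weights)
    # Vector-normalise each column, then apply the criterion weight.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n_crit)] for row in matrix]
    cols = list(zip(*v))
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(cols)]
    anti = [min(c) if benefit[j] else max(c) for j, c in enumerate(cols)]
    scores = []
    for row in v:
        d_pos = math.dist(row, ideal)  # distance to the ideal solution
        d_neg = math.dist(row, anti)   # distance to the anti-ideal solution
        scores.append(d_neg / (d_pos + d_neg))
    return scores


if __name__ == "__main__":
    # Hypothetical firms scored on (profitability, debt ratio).
    firms = [[0.9, 0.2], [0.5, 0.6], [0.7, 0.4]]
    print(topsis(firms, weights=[0.5, 0.5], benefit=[True, False]))
```

A fuzzy version would replace each number with a fuzzy number (for example a triangular one) and use a distance between fuzzy numbers, but the closeness ratio at the end has the same shape.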

17.

Fuzzy support vector network control based on genetic algorithms (Total citations: 3; self-citations: 0; citations by others: 3)

袁小芳 王耀南 孙炜 《信息与控制》2005,34(2):205-208

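Fuzzy control is combined with support vector networks to design a fuzzy support vector network controller. The controller combines the advantages of fuzzy control and support vector networks: it does not depend on a model of the controlled plant and has strong generalization ability. A genetic algorithm is used to optimize the support vector machine parameters and the controller's scaling-factor parameters in order to achieve optimal control performance. Simulation results show that the control system has excellent control performance.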

18.

Gaussian kernel-based fuzzy inference systems for high dimensional regression

Qianfeng Cai Zhifeng Hao 《Neurocomputing》2012,77(1):197-204

We propose a novel architecture for a higher order fuzzy inference system (FIS) and develop a learning algorithm to build the FIS. The consequent part of the proposed FIS is expressed as a nonlinear combination of the input variables, which can be obtained by introducing an implicit mapping from the input space to a high dimensional feature space. The proposed learning algorithm consists of two phases. In the first phase, the antecedent fuzzy sets are estimated by the kernel-based fuzzy c-means clustering. In the second phase, the consequent parameters are identified by support vector machine whose kernel function is constructed by fuzzy membership functions and the Gaussian kernel. The performance of the proposed model is verified through several numerical examples generally used in fuzzy modeling. Comparative analysis shows that, compared with the zero-order fuzzy model, first-order fuzzy model, and polynomial fuzzy model, the proposed model exhibits higher accuracy, better generalization performance, and satisfactory robustness.

19.

Text classification method based on statistical manifold diffusion kernels

李侃 周世斌 刘玉树 《模式识别与人工智能》2012,25(2):339-345

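The Dirichlet compound multinomial (DCM) manifold is proposed. Using the property that the DCM manifold can be related to the positive hemisphere manifold through a homeomorphism and an isometry, the geodesic distance on the positive hemisphere manifold is mapped to a geodesic distance on the DCM manifold via the pullback map. This establishes a distance metric on the DCM manifold, from which the Dirichlet compound multinomial diffusion kernel and the Dirichlet compound multinomial inverse document frequency (DCMIDF) diffusion kernel on the statistical manifold are constructed. Experiments on the WebKBTop4 and 20Newsgroups corpora show that the DCM manifold describes text more accurately than Euclidean space. Compared with the multinomial-kernel support vector machine algorithm and the negative geodesic distance kernel support vector machine algorithm, the experimental results show that the support vector machine algorithms based on the DCM diffusion kernel and the DCMIDF diffusion kernel achieve good text classification performance.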

20.

Application of support vector machines to the prediction of air pollutant concentrations

陈俏 曹根牛 陈柳 《微机发展》2010,(1):250-252,F0003

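The support vector machine is a new generation of machine learning technology based on statistical learning theory, and its nonlinear regression prediction performance is superior to that of traditional statistical methods. A prediction model for air pollutant concentrations is proposed that applies support vector machines to this prediction task. First, the various influencing factors are analyzed and used for modeling and prediction; principal component analysis is then used to reduce the dimensionality of the input factors, forming the training sample set for the support vector machine; on this basis, an air pollution prediction model based on support vector regression with an RBF kernel is built. An air pollution prediction example shows that the method has strong generalization ability, high prediction accuracy, fast training, good stability, and ease of modeling, and has good application prospects.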