1.

Incorporating feature selection method into support vector regression for stock index forecasting

Wensheng Dai Yuehjen E. Shao Chi-Jie Lu 《Neural computing & applications》2013,23(6):1551-1561

Stock index forecasting is one of the most difficult tasks that financial organizations, firms and private investors face. Support vector regression (SVR) has become a popular alternative in stock index forecasting tasks because of its generalization capability and the uniqueness of its solution. However, a major limitation of SVR is that it cannot capture the relative importance of the independent variables to the dependent variable when many potential independent variables are considered. This study combines a feature selection method with SVR to build a stock index forecasting model. The proposed model uses multivariate adaptive regression splines (MARS), an effective nonlinear and nonparametric regression methodology, to identify important forecasting variables. The significant predictor variables obtained in this way then serve as the inputs for the SVR model. Experimental results reveal that the important variables obtained from MARS can improve the forecasting performance of the SVR models. Moreover, the MARS results provide useful information about the relationship between the selected predictor variables and the stock index through the obtained basis functions, the important predictor variables and the MARS prediction function. Hence, the proposed stock index forecasting model can generate good forecasting performance and can identify significant predictor variables, which provide valuable information for further investment decisions and strategies.
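A MARS prediction function is a weighted sum of hinge basis functions of the form max(0, x - t) or max(0, t - x). The sketch below evaluates an additive (degree-1) MARS model and lists the variables that appear in its basis functions, which in a two-stage MARS-then-SVR scheme are the candidates passed on as SVR inputs. It is a minimal illustration only: the coefficients, knots and variable indices are hypothetical, and the forward/backward passes that actually fit MARS are not shown.

```python
def hinge(x, knot, direction):
    """MARS hinge basis: max(0, x - knot) if direction == +1, max(0, knot - x) if direction == -1."""
    return max(0.0, direction * (x - knot))


def mars_predict(x, intercept, terms):
    """Evaluate an additive MARS model.

    terms: list of (coefficient, variable_index, knot, direction) tuples.
    """
    total = intercept
    for coef, var, knot, direction in terms:
        total += coef * hinge(x[var], knot, direction)
    return total


def selected_variables(terms):
    """Indices of the predictors that appear in at least one basis function."""
    return sorted({var for _, var, _, _ in terms})


# Hypothetical fitted model: two basis functions on predictors 0 and 2 (predictor 1 was pruned).
model_terms = [(0.8, 0, 100.0, +1), (-0.5, 2, 2.0, -1)]
print(mars_predict([110.0, 7.0, 1.0], intercept=5.0, terms=model_terms))
print(selected_variables(model_terms))  # these columns would feed the SVR stage
```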

2.

Financial distress prediction using support vector machines: Ensemble vs. individual

Jie Sun Hui Li 《Applied Soft Computing》2012,12(8):2254-2265

Financial distress prediction (FDP) is of great importance to both insiders and outsiders of companies. Although many studies have comprehensively analyzed single-classifier FDP methods, ensemble methods for FDP have emerged only in recent years and need further study. Support vector machines (SVMs) show promising performance in FDP compared with other single-classifier methods. The contribution of this paper is a new FDP method based on an SVM ensemble, whose candidate single classifiers are trained by SVM algorithms with different kernel functions on different feature subsets of one initial dataset. Linear, polynomial, RBF and sigmoid SVM kernels are used, together with the filter feature selection/extraction methods of stepwise multiple discriminant analysis (MDA), stepwise logistic regression (logit) and principal component analysis (PCA). The algorithm for selecting the SVM ensemble's base classifiers from the candidates considers both individual performance and diversity. Weighted majority voting, with weights based on the base classifiers' cross-validation accuracy on the training dataset, is used as the combination mechanism. Experimental results indicate that the SVM ensemble is significantly superior to an individual SVM classifier when the number of base classifiers in the ensemble is set properly. The results also show that, when an individual SVM classifier is used, an RBF SVM based on features selected by stepwise MDA is a good choice for FDP.
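The combination mechanism described above can be sketched as weighted majority voting: each base classifier's vote counts with a weight equal to its cross-validation accuracy, and the label with the largest total weight wins. The predictions and accuracies below are made-up numbers for illustration, and training the base SVMs is out of scope.

```python
from collections import defaultdict


def weighted_majority_vote(predictions, weights):
    """Return the label whose supporting classifiers have the largest total weight.

    predictions: one predicted label per base classifier.
    weights: one non-negative weight per base classifier (e.g. CV accuracy).
    """
    scores = defaultdict(float)
    for label, weight in zip(predictions, weights):
        scores[label] += weight
    return max(scores, key=scores.get)


# Hypothetical: three base SVMs vote on one firm (1 = distressed, 0 = healthy).
votes = [1, 0, 0]
cv_accuracies = [0.90, 0.70, 0.65]
print(weighted_majority_vote(votes, cv_accuracies))  # 0.70 + 0.65 > 0.90, so label 0 wins
```

Note that a single strong classifier can still outvote several weak ones when their combined weight is smaller.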

3.

Predicting business failure using forward ranking-order case-based reasoning

Hui Li Jie Sun 《Expert systems with applications》2011,38(4):3075-3084

With the rapid development of business computing for Chinese listed companies, increasing attention has been paid to the use of case-based reasoning (CBR) in business failure prediction (BFP). Ranking-order case-based reasoning (RCBR) uses ranking-order information among cases to calculate similarity within the k-nearest-neighbor framework. RCBR is sensitive to the choice of features, so an optimal feature subset can help it perform better. In this research, we use a wrapper approach to find the optimal feature subset for RCBR in BFP. Forward feature selection and RCBR are combined into a new method, forward RCBR (FRCBR), in which RCBR serves as the wrapped module inside forward feature selection. The hold-out method is used to assess the performance of the classifier. Empirical data were collected from Chinese companies listed on the Shenzhen Stock Exchange and the Shanghai Stock Exchange. For comparison, we employed standalone RCBR, classical CBR based on the Euclidean metric, inductive CBR, two statistical methods (logistic regression and multivariate discriminant analysis (MDA)), and support vector machines. For the comparative methods, stepwise MDA was used to select the optimal feature subset. Empirical results indicate that FRCBR produces dominant performance in short-term BFP for Chinese listed companies.
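A wrapper-style forward feature selection loop can be sketched generically: start from the empty set, repeatedly add the feature that most improves a scoring function, and stop when no addition helps. In FRCBR the score would be the hold-out accuracy of RCBR; here `evaluate` is a hypothetical stand-in, and the feature names are invented for illustration.

```python
def forward_selection(features, evaluate):
    """Greedy forward wrapper selection.

    features: candidate feature names.
    evaluate: callable(subset) -> score, higher is better (e.g. hold-out accuracy).
    Returns (selected_features, best_score).
    """
    selected = []
    best_score = float("-inf")
    remaining = list(features)
    while remaining:
        # Try adding each remaining feature and keep the best candidate.
        score, feature = max((evaluate(selected + [f]), f) for f in remaining)
        if score <= best_score:
            break  # no candidate improves the current subset
        selected.append(feature)
        remaining.remove(feature)
        best_score = score
    return selected, best_score


# Hypothetical scoring function: two informative ratios, any other feature slightly hurts.
usefulness = {"roa": 0.3, "debt_ratio": 0.2}


def toy_evaluate(subset):
    return 0.5 + sum(usefulness.get(f, -0.05) for f in subset)


print(forward_selection(["roa", "debt_ratio", "noise"], toy_evaluate))
```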

4.

Dimensionality reduction for knowledge discovery in medical claims database: application to antidepressant medication utilization study

Huang SH Wulsin LR Li H Guo J 《Computer methods and programs in biomedicine》2009,93(2):115-123

Data mining, through its capacity to discover knowledge embedded in large databases and thereby improve organizational decision-making, has the potential to contribute to efficiency and cost savings in the increasingly costly healthcare industry. One important aspect of mining medical databases is reducing dimensionality through feature selection. Traditionally, feature selection is accomplished through stepwise regression, which tends to produce an unnecessarily large number of "significant" variables. This paper applies a filter-based feature selection method, using an inconsistency-rate measure and discretization, to a medical claims database to predict the adequacy of the duration of antidepressant medication utilization. Traditional stepwise logistic regression selected seven of nine potential explanatory variables to characterize patients with inadequate antidepressant medication utilization, whereas the filter-based method selected only two variables (age and number of claims) and achieved similar prediction accuracy. This comparison suggests that it may be feasible and efficient to apply the filter-based feature selection method to reduce the dimensionality of healthcare databases.
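The inconsistency rate used by such filters is commonly defined on discretized data: group the instances that share the same values on the selected features, count in each group the instances outside the majority class, and divide the total by the number of instances. A filter then looks for a small subset whose rate is no worse than that of the full feature set. The sketch below assumes this standard definition (the paper's exact variant is not given), and the discretized records are hypothetical.

```python
from collections import Counter, defaultdict


def inconsistency_rate(rows, labels, feature_idx):
    """Inconsistency rate of a feature subset on discretized data.

    rows: sequences of discrete feature values.
    labels: class label per row.
    feature_idx: indices of the features in the candidate subset.
    """
    groups = defaultdict(Counter)
    for row, label in zip(rows, labels):
        pattern = tuple(row[i] for i in feature_idx)
        groups[pattern][label] += 1
    # Within each matching pattern, everything outside the majority class is inconsistent.
    inconsistent = sum(sum(c.values()) - max(c.values()) for c in groups.values())
    return inconsistent / len(rows)


# Hypothetical discretized claims records: (age band, claims band, gender); 1 = inadequate duration.
rows = [("young", "low", "F"), ("young", "low", "M"), ("old", "high", "F"), ("old", "high", "M")]
labels = [1, 0, 0, 0]
print(inconsistency_rate(rows, labels, (0, 1)))     # age and claims only
print(inconsistency_rate(rows, labels, (0, 1, 2)))  # all three features
```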

5.

LTV model in consultant sector. Case study: mental health clinic

Mohammad J. Tarokh Aydin Akbari Sekhavat 《Behaviour & Information Technology》2006,25(5):399-405

Since the early 1980s, customer relationship management (CRM) has been important in the new competitive business environment. As competition in business has intensified, enterprises' need to create and retain effective relationships with customers has become increasingly prominent. Customer scoring applications make it possible to identify the most profitable customers. In this paper, we categorized the clinic's customers by three types of value using logistic regression as a data-mining technique, and calculated customer defection and future purchase probabilities in a mental health clinic of the University of Tehran. The model was verified and validated using a lift chart, and customer segmentation and analysis are presented together with appropriate marketing strategies.
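A lift chart, as used for validation above, compares the response rate among the top-scored customers with the overall response rate. The sketch computes lift for one cut-off fraction; evaluating it at several fractions gives the points of the chart. The scores and outcomes are hypothetical, not data from the study.

```python
def lift_at(scores, responses, fraction):
    """Lift of the top `fraction` of customers ranked by predicted score.

    scores: predicted probabilities (e.g. from logistic regression).
    responses: observed outcomes, 1 = responded/defected, 0 = otherwise.
    """
    ranked = sorted(zip(scores, responses), key=lambda pair: pair[0], reverse=True)
    k = max(1, int(len(ranked) * fraction))
    top_rate = sum(r for _, r in ranked[:k]) / k
    base_rate = sum(responses) / len(responses)
    return top_rate / base_rate


# Hypothetical scores for ten customers and their observed outcomes.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.15, 0.1, 0.05, 0.01]
responses = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
for fraction in (0.2, 0.5, 1.0):
    print(fraction, round(lift_at(scores, responses, fraction), 3))
```

A lift of 1.0 at the full population is expected by construction; a useful model shows lift well above 1 in the top fractions.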

6.

Application of the Logistic Regression Model to Cloud Detection in Satellite Cloud Images

费文龙 吕红 韦志辉《计算机工程与应用 (Computer Engineering and Applications)》2012,48(4):18-21

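Automatic cloud detection and classification is the first step in every application of satellite remote sensing data. A cloud image processing method based on the logistic regression model is applied to FY-2C satellite cloud images. Stepwise regression is used to extract gray-level and texture features of the cloud images and to compute a regression coefficient for each feature; the extracted features are then used in cloud detection experiments. Comparison of the experimental results with ground observation data shows that the logistic regression model is effective for cloud image processing and detects clouds better than the traditional dynamic threshold segmentation method.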
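Once the regression coefficients are estimated, applying a logistic model to a pixel reduces to a weighted sum of its features passed through the logistic function, followed by a threshold. The coefficients, the intercept, the 0.5 threshold and the two features (gray level and one texture measure) below are hypothetical placeholders, not values from the paper.

```python
import math


def cloud_probability(features, coefficients, intercept):
    """Logistic model: P(cloud) = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    z = intercept + sum(b * x for b, x in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-z))


def is_cloud(features, coefficients, intercept, threshold=0.5):
    """Label a pixel as cloud when its predicted probability reaches the threshold."""
    return cloud_probability(features, coefficients, intercept) >= threshold


# Hypothetical coefficients for (gray level, texture measure).
coef = [0.05, 2.0]
b0 = -10.0
print(is_cloud([220.0, 0.1], coef, b0))  # bright pixel
print(is_cloud([60.0, 0.3], coef, b0))   # dark pixel
```

For very large negative `z`, `math.exp(-z)` can overflow; a production version would clip `z` or use a numerically stable formulation.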

7.

LTV model in consultant sector. Case study: mental health clinic

《Behaviour & Information Technology》2012,31(5):399-405

Since the early 1980s, customer relationship management (CRM) has been important in the new competitive business environment. Today, due to development of competitive factors in the business, the enterprise's need to create and retain effective relations with customers has been highlighted more and more. With the aim of customer scoring applications, the most profitable customers can be identified. In this paper, we categorized customers by three types of values for the clinic by using logistic regression as a data-mining technique, and calculated the customer defection and future purchase probability in a mental health clinic of the university of Tehran. Model verification and validation (using lift chart) was done and customer segmentation and analysis presented with proper marketing strategies. 相似文献

8.

Customer Churn Prediction Based on Kernel Principal Component Analysis Feature Extraction

夏国恩《计算机应用 (Journal of Computer Applications)》2008,28(1):149-151

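Kernel principal component analysis (KPCA) is introduced into customer churn prediction, and a corresponding feature extraction algorithm is proposed. KPCA is combined with logistic regression to build a prediction model. Experiments on customer churn prediction for a telecom company show that the hit rate, coverage rate, accuracy and lift coefficient achieved by this method are higher than those obtained with the original attribute set and with principal component analysis (PCA) feature extraction. This indicates that KPCA can extract nonlinear features from customer data and is an effective method for customer churn prediction.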
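A core step of KPCA is building a kernel matrix and centering it in feature space, Kc = K - 1n K - K 1n + 1n K 1n, where 1n is the n x n matrix whose entries are all 1/n; the principal components then come from the eigenvectors of Kc. The sketch below covers only the kernel and centering steps with an RBF kernel (the kernel and `gamma` are assumptions, since the paper's choice is not stated here); the eigen-decomposition needs a linear-algebra library and is omitted.

```python
import math


def rbf_kernel_matrix(X, gamma):
    """K[i][j] = exp(-gamma * ||x_i - x_j||^2)."""
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y))) for y in X] for x in X]


def center_kernel(K):
    """Center K in feature space: Kc = K - 1n K - K 1n + 1n K 1n."""
    n = len(K)
    row_means = [sum(row) / n for row in K]
    col_means = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    grand_mean = sum(row_means) / n
    return [[K[i][j] - row_means[i] - col_means[j] + grand_mean for j in range(n)]
            for i in range(n)]


# Hypothetical customer feature vectors (already scaled).
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
Kc = center_kernel(rbf_kernel_matrix(X, gamma=0.5))
print([round(sum(row), 12) for row in Kc])  # each row of a centered kernel sums to zero
```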

9.

A hybrid device for the solution of sampling bias problems in the forecasting of firms’ bankruptcy

Fernando Sánchez-Lasheras Pedro Lorca 《Expert systems with applications》2012,39(8):7512-7523

This paper proposes a new approach to forecasting firms’ bankruptcy. Our proposal is a hybrid method in which sound companies are divided into clusters using Self-Organizing Maps (SOM), and each cluster is then replaced by a director vector that summarizes its members. Once the companies in the clusters have been replaced by director vectors, we estimate a classification model using Multivariate Adaptive Regression Splines (MARS). To test the model, we used a real dataset of Spanish enterprises from the construction sector. With this procedure we intend to overcome the sampling-bias problems from which matched-pairs models often suffer. We estimated two benchmark models: a back-propagation neural network and a simple MARS model. Our results show that the proposed hybrid approach is much more accurate than the benchmark techniques in identifying bankrupt companies.
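The cluster-replacement step can be sketched as follows, assuming that each SOM cluster is summarized by the mean of its members (the abstract does not define the director vector precisely, so the mean is an assumption). The cluster assignments are taken as given; training the SOM itself is not shown, and the financial ratios are hypothetical.

```python
def director_vectors(companies, cluster_ids):
    """Summarize each cluster of sound companies by its mean vector (assumed summary).

    companies: list of equal-length feature vectors.
    cluster_ids: cluster label per company (e.g. its winning SOM unit).
    Returns {cluster_id: mean_vector}.
    """
    sums, counts = {}, {}
    for vec, cid in zip(companies, cluster_ids):
        if cid not in sums:
            sums[cid] = [0.0] * len(vec)
            counts[cid] = 0
        sums[cid] = [s + v for s, v in zip(sums[cid], vec)]
        counts[cid] += 1
    return {cid: [s / counts[cid] for s in sums[cid]] for cid in sums}


# Hypothetical ratios for three sound firms mapped to two SOM units.
sound = [[1.0, 2.0], [3.0, 4.0], [10.0, 10.0]]
units = [0, 0, 1]
print(director_vectors(sound, units))
```

The reduced training set would then consist of these summary vectors (labeled sound) plus the bankrupt firms, which is the input to the MARS classifier.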

10.

Customer Churn Prediction Based on Spectral Regression Feature Dimensionality Reduction

李国祥 蒋怡琳 马文斌 夏国恩《计算机系统应用 (Computer Systems & Applications)》2021,30(9):62-68

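For customer churn prediction on large-sample data, and from the perspective of effective feature representation, a prediction model based on spectral regression feature reduction is proposed. Starting from the original customer features, the model uses spectral-regression-based manifold dimensionality reduction to build a discriminative low-dimensional feature space, on which a support vector machine performs binary classification of customer churn. Large-sample experiments on two different datasets, online customers and traditional telecom customers, together with comparisons against different classifiers and different feature reduction or selection methods, demonstrate the effectiveness of the method.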

11.

A Global Approximation Algorithm Based on Improved Multivariate Adaptive Regression Splines

罗小玲 薛河儒《微计算机信息 (Microcomputer Information)》2012,(4):170-171,81

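Global approximation methods for complex models can be applied in many areas, such as parametric experiments, sensitivity analysis, real-time simulation, and design or control optimization. Methods commonly used for the global approximation of multidimensional models include PRS (polynomial response surface), Kriging, RBF (radial basis functions), SVR (support vector regression) and MARS (multivariate adaptive regression splines). Although traditional MARS has undeniable advantages, its shortcomings limit its range of application. This paper proposes an improved MARS method for multidimensional global approximation: the golden-section method is used to optimize the vector points, which speeds up the construction of sample points and thus improves the approximation efficiency of MARS.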
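For reference, the golden-section method itself is a derivative-free search that shrinks a bracketing interval of a unimodal function by the factor 1/φ ≈ 0.618 per step, reusing one interior point each time. The sketch shows only this generic 1-D minimization; how the paper applies it to the vector points of MARS is not specified in the abstract, so the test function here is purely illustrative.

```python
import math


def golden_section_min(f, a, b, tol=1e-6):
    """Approximate the minimizer of a unimodal function f on [a, b]."""
    invphi = (math.sqrt(5) - 1) / 2  # 1/phi ≈ 0.618
    c = b - invphi * (b - a)
    d = a + invphi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:
            # Minimum lies in [a, d]; the old c becomes the new d.
            b, d, fd = d, c, fc
            c = b - invphi * (b - a)
            fc = f(c)
        else:
            # Minimum lies in [c, b]; the old d becomes the new c.
            a, c, fc = c, d, fd
            d = a + invphi * (b - a)
            fd = f(d)
    return (a + b) / 2


print(golden_section_min(lambda x: (x - 2.0) ** 2, 0.0, 5.0))
```

Because one function value is reused per iteration, each step costs a single new evaluation, which is why the method is attractive when evaluations are expensive.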

12.

SBFS: A Search-Based Feature Selection Framework for Software Defect Prediction

陈翔 陆凌姣 吉人 魏世鑫《计算机应用研究 (Application Research of Computers)》2017,34(4)

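Software defect prediction identifies potentially defective program modules in the project under test in advance, which helps allocate testing resources sensibly and ultimately improves the quality of the software product under test. However, when defect prediction datasets are collected, a large number of metrics related to code complexity or the development process are considered, which introduces the curse of dimensionality into the datasets. Drawing on the idea of search-based software engineering, we propose a novel search-based wrapper feature selection framework, SBFS. The framework first uses the SMOTE method to alleviate class imbalance in the dataset, and then uses a genetic-algorithm-based feature selection method to select the optimal feature subset on the training set. In the empirical study, the NASA datasets are used as evaluation subjects, and three baselines are used: a wrapper feature selection method with a forward selection strategy (FW), a wrapper feature selection method with a backward selection strategy (BW), and no feature selection (Origin). The empirical results show that SBFS is no worse than Origin in 90% of cases, no worse than BW in 82.3% of cases, and no worse than FW in 69.3% of cases. In addition, we find that with a decision tree classifier, applying SMOTE improves model performance in 71% of cases, whereas with naive Bayes and logistic regression classifiers, applying SMOTE improves predictive performance in only 47% and 43% of cases, respectively.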
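SMOTE creates synthetic minority-class samples by picking a minority sample, choosing one of its k nearest minority neighbours, and interpolating a random fraction of the way between them. The sketch below is a simplified variant that draws the base sample at random (classic SMOTE iterates over every minority sample); `k`, `seed` and the toy module metrics are assumptions. It needs at least two minority samples.

```python
import random


def smote(minority, n_synthetic, k=3, seed=0):
    """Generate n_synthetic points by interpolating between minority samples and their neighbours."""
    rng = random.Random(seed)

    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base sample (excluding itself).
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: dist2(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical metrics (e.g. scaled complexity, churn) of four defective modules.
defective = [[1.0, 1.0], [2.0, 1.0], [1.0, 2.0], [2.0, 2.0]]
print(smote(defective, 3))
```

Because each synthetic point lies on a segment between two existing minority samples, it stays inside their convex hull rather than extrapolating beyond the observed data.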

13.

Global optimization with multivariate adaptive regression splines.

Scott Crino Donald E Brown 《IEEE transactions on systems, man, and cybernetics. Part B, Cybernetics》2007,37(2):333-340

This paper presents a novel procedure for approximating the global optimum in structural design by combining multivariate adaptive regression splines (MARS) with response surface methodology (RSM). MARS is a flexible regression technique that uses a modified recursive partitioning strategy to simplify high-dimensional problems into smaller yet highly accurate models. Combining MARS with RSM improves conventional RSM by handling highly nonlinear, high-dimensional problems that can be simplified into lower dimensions, while maintaining a low computational cost and better interpretability than neural networks and generalized additive models. MARS/RSM is also compared with simulated annealing and genetic algorithms in terms of computational efficiency and accuracy. The MARS/RSM procedure is applied to a set of low-dimensional test functions to demonstrate its convergence and limiting properties.

14.

Strategies for preventing defection based on the mean time to defection and their implementations on a self-organizing map

Young Ae Kim Hee Seok Song Soung Hie Kim 《Expert Systems》2005,22(5):265-278

Customer retention is a critical issue for the survival of any business in today's competitive marketplace. In this paper, we propose a dynamic procedure that uses self-organizing maps and a Markov process, applied to data on past and current customer behavior, to detect and prevent customer defection. The basic concept originates from empirical observations that a customer tends to change behavior (i.e., trim usage volumes) before eventual withdrawal and defection. Our explanatory model predicts when potential defectors are likely to withdraw. Based on this anticipated time of departure, two strategies are suggested to answer the question of where to lead potential defectors in the next stage. Our model predicts potential defectors with little loss of prediction accuracy compared with a multilayer perceptron neural network and decision trees. Moreover, it performs reasonably well in a controlled experiment using an online game.
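The "mean time to defection" idea can be illustrated with an absorbing Markov chain: if defection is an absorbing state, the expected number of steps t_i until defection from a transient state i satisfies t_i = 1 + Σ_j P_ij t_j, with t = 0 at the absorbing state. The sketch solves this by fixed-point iteration; the states and transition probabilities are hypothetical (in the paper the states come from SOM segments), and convergence assumes defection is reachable from every transient state.

```python
def mean_time_to_absorption(P, absorbing, tol=1e-10, max_iter=100000):
    """Expected steps until an absorbing state is reached, for every starting state.

    P: row-stochastic transition matrix as a list of lists.
    absorbing: set of absorbing state indices (e.g. {defected}).
    """
    n = len(P)
    t = [0.0] * n
    for _ in range(max_iter):
        new_t = [0.0 if i in absorbing
                 else 1.0 + sum(P[i][j] * t[j] for j in range(n))
                 for i in range(n)]
        if max(abs(a - b) for a, b in zip(new_t, t)) < tol:
            return new_t
        t = new_t
    raise RuntimeError("fixed-point iteration did not converge")


# Hypothetical monthly transitions: 0 = active, 1 = at risk (usage trimmed), 2 = defected.
P = [[0.7, 0.2, 0.1],
     [0.3, 0.3, 0.4],
     [0.0, 0.0, 1.0]]
print(mean_time_to_absorption(P, absorbing={2}))
```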

15.

A sequential feature extraction approach for naïve Bayes classification of microarray data

Liwei Fan Kim-Leng Poh Peng Zhou 《Expert systems with applications》2009,36(6):9919-9923

Accurate classification of microarray data plays a vital role in cancer prediction and diagnosis. Previous studies have demonstrated the usefulness of the naïve Bayes classifier in solving various classification problems. In microarray data analysis, however, the conditional independence assumption embedded in the classifier and the characteristics of microarray data, such as extremely high dimensionality, may severely degrade the classification performance of the naïve Bayes classifier. This paper presents a sequential feature extraction approach for naïve Bayes classification of microarray data. The proposed approach consists of feature selection by stepwise regression followed by feature transformation by class-conditional independent component analysis. Experimental results on five microarray datasets demonstrate the effectiveness of the proposed approach in improving the performance of the naïve Bayes classifier in microarray data analysis.
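The conditional independence assumption mentioned above means the class likelihood factorizes into a product of per-feature likelihoods. A Gaussian naïve Bayes classifier makes this concrete: fit a mean and variance per feature and class, then pick the class with the largest log prior plus summed log densities. This sketch assumes Gaussian feature distributions and already-extracted features; the stepwise regression and class-conditional ICA stages are not shown, and the data are toy values.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y, var_floor=1e-9):
    """Per class: (log prior, per-feature means, per-feature variances)."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        variances = [max(sum((v - m) ** 2 for v in col) / n, var_floor)
                     for col, m in zip(zip(*rows), means)]
        model[label] = (math.log(n / len(X)), means, variances)
    return model


def predict_gaussian_nb(model, x):
    """Class maximizing log prior + sum of independent Gaussian log-likelihoods."""
    def log_posterior(label):
        log_prior, means, variances = model[label]
        return log_prior + sum(-0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
                               for xi, m, v in zip(x, means, variances))
    return max(model, key=log_posterior)


# Toy two-feature "expression" data.
X = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [5.0, 6.0], [5.2, 5.8], [4.8, 6.2]]
y = ["normal"] * 3 + ["tumor"] * 3
nb = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(nb, [1.1, 2.0]), predict_gaussian_nb(nb, [5.1, 6.1]))
```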

16.

Bayesian Treed Models

Chipman Hugh A. George Edward I. McCulloch Robert E. 《Machine Learning》2002,48(1-3):299-320

When simple parametric models such as linear regression fail to approximate a relationship adequately across an entire data set, an alternative is to partition the data and use a separate simple model within each subset of the partition. A treed model provides such an alternative by using a binary tree to identify the partition. Treed models go further than conventional trees (e.g. CART, C4.5) by fitting models, rather than a simple mean or proportion, within each subset. In this paper, we propose a Bayesian approach for finding and fitting parametric treed models, focusing in particular on Bayesian treed regression. The potential of this approach is illustrated by a cross-validation comparison of predictive performance with neural nets, MARS and conventional trees on simulated and real data sets.
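To make the model class concrete, the sketch fits the simplest possible treed regression model: a single split on one input, with a separate least-squares line in each leaf, choosing the split that minimizes total squared error. This is a greedy least-squares illustration of what a treed model represents, not the paper's Bayesian search over trees; `min_leaf` and the piecewise-linear data are assumptions.

```python
def fit_line(points):
    """Ordinary least squares y = a + b*x; returns (intercept, slope, sse)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    slope = 0.0 if sxx == 0 else sum((x - mx) * (y - my) for x, y in points) / sxx
    intercept = my - slope * mx
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in points)
    return intercept, slope, sse


def fit_stump_treed_model(points, min_leaf=2):
    """Best single split with a linear model per leaf.

    Returns (total_sse, threshold, (a_left, b_left), (a_right, b_right)).
    """
    pts = sorted(points)
    best = None
    for i in range(min_leaf, len(pts) - min_leaf + 1):
        left, right = pts[:i], pts[i:]
        lfit, rfit = fit_line(left), fit_line(right)
        total = lfit[2] + rfit[2]
        if best is None or total < best[0]:
            threshold = (left[-1][0] + right[0][0]) / 2
            best = (total, threshold, lfit[:2], rfit[:2])
    return best


# Toy data: y = x for x < 5, y = 20 - 2x for x >= 5.
data = [(x, float(x)) for x in range(5)] + [(x, 20.0 - 2 * x) for x in range(5, 10)]
print(fit_stump_treed_model(data))
```

A single global line would fit this data poorly, while two leaf models recover both pieces exactly, which is the motivation the abstract gives for treed models.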

17.

A convex version of multivariate adaptive regression splines

《Computational statistics & data analysis》2015

Multivariate adaptive regression splines (MARS) provide a flexible statistical modeling method that employs forward and backward search algorithms to identify the combination of basis functions that best fits the data while simultaneously conducting variable selection. In optimization, MARS has been used successfully to estimate the unknown functions in stochastic dynamic programming (SDP), stochastic programming and Markov decision processes, and it could be useful in many real-world optimization problems where objective (or other) functions must be estimated from data, such as surrogate optimization. Many optimization methods depend on convexity, but a MARS approximation can be non-convex because interaction terms are products of univariate terms. In this paper, a convex MARS modeling algorithm is described. To ensure MARS convexity, two major modifications are made: (1) coefficients are constrained so that pairs of basis functions are guaranteed to jointly form convex functions, and (2) the form of the interaction terms is altered to eliminate the inherent non-convexity. Convexity of the overall MARS model then follows because a sum of convex functions is convex. Convex-MARS is applied to four- and nine-dimensional inventory forecasting SDP problems and to a ground-level ozone air quality problem.
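The coefficient constraint on pairs of basis functions can be illustrated in one dimension. For a reflected pair c1·max(0, x - t) + c2·max(0, t - x), the slope is -c2 to the left of the knot and c1 to the right, so the pair is convex exactly when c1 + c2 ≥ 0. The sketch builds univariate hinge models and checks midpoint convexity on a grid; it is an illustration under these assumptions, not the paper's constrained fitting algorithm, and the coefficients are hypothetical.

```python
def additive_hinge_model(terms, intercept=0.0):
    """f(x) = intercept + sum(c * max(0, d * (x - k))) over terms (c, k, d) with d in {+1, -1}."""
    def f(x):
        return intercept + sum(c * max(0.0, d * (x - k)) for c, k, d in terms)
    return f


def is_midpoint_convex(f, lo, hi, steps=200, eps=1e-12):
    """Check f((a+b)/2) <= (f(a)+f(b))/2 for all grid pairs in [lo, hi]."""
    xs = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return all(f((a + b) / 2) <= (f(a) + f(b)) / 2 + eps for a in xs for b in xs)


# Reflected pair at knot 0 with c1 = 2, c2 = -1: c1 + c2 >= 0, so convex despite a negative coefficient.
pair_ok = additive_hinge_model([(2.0, 0.0, +1), (-1.0, 0.0, -1)])
# A single hinge with a negative coefficient is concave at the knot.
pair_bad = additive_hinge_model([(-1.0, 0.0, +1)])
print(is_midpoint_convex(pair_ok, -3.0, 3.0), is_midpoint_convex(pair_bad, -3.0, 3.0))
```

A grid check like this can only refute convexity, not prove it; the slope argument above is what guarantees it for a reflected pair.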

18.

Determinant elements of customer relationship management in e-business

D. Horn R. Feinberg G. Salvendy 《Behaviour & Information Technology》2005,24(2):101-109

This study investigates the composition of customer relationship management (CRM) in e-business by examining the possible elements that determine different aspects of the relationship between customers and e-businesses. A web-based CRM survey of 38 items, constructed from SERVQUAL (service quality instrument), SITEQUAL (website service quality instrument) and findings in the literature, was completed by 200 customer contact professionals. Factor analysis indicated three main customer relationship attributes of e-business: general CRM (accounting for 51% of the total variance), personalization (9% of the total variance) and privacy (7% of the total variance). Stepwise regression indicated that these customer relationship attributes significantly predict customer attitude (83% of the explained variance). Within the general CRM dimension, website content organization correlated highly with customer attitude (65% of the explained variance). The results indicate that customers perceive three main dimensions of relationship attributes of e-business (general CRM, personalization and privacy) and that all three contribute significantly to customer attitude. These findings support the importance of including relational e-business attributes when investigating interactions between customers and e-businesses. The study concludes with related implications and design guidelines for enhancing customer perception of e-business.

19.

CARTMAP: a neural network method for automated feature selection in financial time series forecasting

Charles Wong Massimiliano Versace 《Neural computing & applications》2012,21(5):969-977

In the past two decades, there has been much interest in applying neural networks to financial time series forecasting. Yet, there has been relatively little attention paid to selecting the input features for training these networks. This paper presents a novel CARTMAP neural network based on Adaptive Resonance Theory that incorporates automatic, intuitive, transparent, and parsimonious feature selection with fast learning. On average, over three separate 4-year simulations spanning 2004–2009 of Dow Jones Industrial Average stocks, CARTMAP outperformed related and classical alternatives. The alternatives were an industry standard random walk, a regression model, a general purpose ARTMAP, and ARTMAP with stepwise feature selection. This paper also discusses why the novel feature selection scheme outperforms the alternatives and how it can represent a step toward more transparency in financial modeling.

20.

A recommender system to avoid customer churn: A case study

Yi-Fan Wang Ding-An Chiang Mei-Hua Hsu Cheng-Jung Lin I-Long Lin 《Expert systems with applications》2009,36(4):8071-8075

A major concern for modern enterprises is to promote customer value, loyalty and contribution through services that help establish a long-term, honest relationship with customers. For better customer relationship management, data mining technology is commonly used to analyze large quantities of data about customer transactions, purchase preferences, customer churn, etc. This paper proposes a recommender system that helps wireless network companies understand and avoid customer churn. To ensure the accuracy of the analysis, we use a decision tree algorithm to analyze data on more than 60,000 transactions and more than 4,000 members over a period of three months. Data from the first nine weeks are used as training data, and data from the last month as testing data. The experimental results prove very useful for making strategy recommendations to avoid customer churn.
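The evaluation design above is a temporal split: records before a cut-off date train the model and later records test it, so the model is never trained on the future. The sketch computes the cut-off as the start date plus nine weeks; the record structure (a dict with a `"date"` key) and the dates are hypothetical, and the decision tree itself is not shown.

```python
from datetime import date, timedelta


def temporal_split(transactions, start, train_weeks=9):
    """Split records into (train, test) at start + train_weeks.

    transactions: iterable of dicts with a datetime.date under "date" (assumed structure).
    """
    cutoff = start + timedelta(weeks=train_weeks)
    train = [t for t in transactions if t["date"] < cutoff]
    test = [t for t in transactions if t["date"] >= cutoff]
    return train, test


# Hypothetical records; the cut-off here is 2024-01-01 + 9 weeks = 2024-03-04.
records = [{"date": date(2024, 1, 1)}, {"date": date(2024, 3, 3)}, {"date": date(2024, 3, 5)}]
train, test = temporal_split(records, start=date(2024, 1, 1))
print(len(train), len(test))
```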