Similar Documents
20 similar documents found.
1.
Journal of Process Control, 2014, 24(7): 1068-1075
This paper developed a new variable selection method for soft sensor applications using the nonnegative garrote (NNG) and an artificial neural network (ANN). The proposed method first trains an ANN and then uses the NNG to shrink the ANN's input weights precisely. The Bayesian information criterion was used as the model evaluation criterion, and the optimal garrote parameter s was determined by v-fold cross-validation. The performance of the proposed algorithm was compared with that of existing state-of-the-art variable selection methods on two artificial datasets and a real industrial application to an air separation process. The experimental results showed that the proposed method achieved better model accuracy with fewer selected variables than the other state-of-the-art methods.
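
The garrote shrinkage step can be illustrated with a minimal NumPy sketch. It assumes the classic linear setting, with the garrote applied to ordinary least-squares coefficients, rather than the ANN input weights used in the paper. It also solves the penalized (Lagrangian) form, with a penalty `lam` standing in for the budget s, by coordinate descent. The function name, the solver choice and the toy data are illustrative assumptions.

```python
import numpy as np

def nonnegative_garrote(X, y, lam, n_iter=200):
    """Penalized nonnegative garrote (illustrative sketch, linear case).

    Minimizes ||y - Z c||^2 + lam * sum(c) subject to c >= 0, where
    Z[:, j] = X[:, j] * beta_ols[j]. A larger lam plays the role of a
    smaller garrote budget s in the constrained form sum(c) <= s.
    """
    beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    Z = X * beta_ols                 # scale column j by its OLS coefficient
    c = np.ones(X.shape[1])          # c = 1 reproduces the OLS fit
    r = y - Z @ c                    # current residual
    for _ in range(n_iter):
        for j in range(Z.shape[1]):
            zj = Z[:, j]
            denom = zj @ zj
            if denom == 0.0:
                c_new = 0.0
            else:
                rho = zj @ (r + zj * c[j])        # fit of z_j to the partial residual
                c_new = max(0.0, (rho - lam / 2.0) / denom)
            r += zj * (c[j] - c_new)              # keep the residual consistent
            c[j] = c_new
    return c, c * beta_ols           # shrinkage factors, garrote coefficients

# Toy data: only the first two inputs matter.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.1 * rng.standard_normal(200)
c, beta_nng = nonnegative_garrote(X, y, lam=2.0)
print(np.round(c, 3))   # third shrinkage factor is exactly 0, so x2 is dropped
```

Inputs whose shrinkage factor reaches zero are removed from the model. In the paper's setting the budget s is chosen by v-fold cross-validation, whereas here `lam` is fixed by hand.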

2.
Data-driven soft sensors have been widely used in both academic research and industrial applications for predicting hard-to-measure variables or replacing physical sensors to reduce cost. It has been shown that the performance of these data-driven soft sensors could be greatly improved by selecting only the vital variables that strongly affect the primary variables, rather than using all the available process variables. In this work, a comprehensive evaluation of different variable selection methods for PLS-based soft sensor development is presented, and a new metric is proposed to assess the performance of different variable selection methods. The following seven variable selection methods are compared: stepwise regression (SR), partial least squares with regression coefficients (PLS-BETA), PLS with variable importance in projection (PLS-VIP), uninformative variable elimination with PLS (UVE-PLS), genetic algorithm with PLS (GA-PLS), least absolute shrinkage and selection operator (Lasso), and competitive adaptive reweighted sampling with PLS (CARS-PLS). Their strengths and limitations for soft sensor development are demonstrated by a simulated case study and an industrial case study.
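
Of the seven methods compared, PLS-VIP is the easiest to sketch. The NumPy code below fits a single-response PLS model with the NIPALS algorithm and computes VIP scores. It then applies the common rule of thumb of keeping variables with VIP > 1. This is an illustrative assumption, not the implementation evaluated in the paper, and the function name and toy data are hypothetical.

```python
import numpy as np

def pls1_vip(X, y, n_comp):
    """VIP scores from a single-response PLS (NIPALS) fit; illustrative sketch."""
    Xa = X - X.mean(axis=0)          # center predictors
    ya = y - y.mean()                # center response
    n, p = Xa.shape
    W = np.zeros((p, n_comp))        # unit-norm weight vectors
    ss = np.zeros(n_comp)            # response variation explained per component
    for a in range(n_comp):
        w = Xa.T @ ya
        w /= np.linalg.norm(w)
        t = Xa @ w                   # scores
        tt = t @ t
        loading = Xa.T @ t / tt      # X loadings
        q = ya @ t / tt              # y loading
        Xa = Xa - np.outer(t, loading)   # deflate X
        ya = ya - q * t                  # deflate y
        W[:, a] = w
        ss[a] = q ** 2 * tt
    # VIP_j = sqrt(p * sum_a ss_a * w_ja^2 / sum_a ss_a), using ||w_a|| = 1
    return np.sqrt(p * (W ** 2) @ ss / ss.sum())

rng = np.random.default_rng(1)
X = rng.standard_normal((150, 6))
y = 2.0 * X[:, 0] + 2.0 * X[:, 1] + 0.5 * rng.standard_normal(150)
vip = pls1_vip(X, y, n_comp=2)
selected = np.flatnonzero(vip > 1.0)   # rule-of-thumb VIP threshold
```

Because each weight vector has unit norm, the squared VIP scores always sum to the number of variables. That is why 1 serves as a natural "average importance" threshold.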

3.
In this study, a soft-sensor modeling algorithm based on an adaptive partial least squares nonnegative garrote is developed that accounts for nonstationary disturbances. The approach can monitor both the stationary and the nonstationary behavior of the process dynamics. The adaptive variable-selection procedure ensures that a compact and robust input–output relation is obtained online. Hence, beyond simply tracking predictions, the model can be used to detect structural model changes and the emergence of disturbances. The advantages of the proposed method are demonstrated with a simulation example and two industrial applications: predicting the temperature of a blast furnace hearth wall and estimating the impurity composition of a distillation column.

4.
Fuzzy regression models are developed to construct the relationship between explanatory variables and responses in a fuzzy environment. To increase the explanatory performance of the model, the least-squares method is applied to determine the numeric coefficients based on the concept of distance. Unlike most existing approaches, the proposed model allows the numeric coefficients to take negative values. The proposed model minimizes the total estimation error, defined as the sum of the average squared distances between the observed and estimated responses over a few $\alpha$-cuts. The proposed approach is not limited to triangular fuzzy numbers, and because it is based on traditional statistical methods, it can handle a large number of fuzzy observations efficiently. Comparisons with existing methods, based on the total estimation error measured by the mean squared error and by Kim and Bishu's criterion, show that the explanatory performance of the proposed model is satisfactory.
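
The distance-based error criterion can be illustrated for triangular fuzzy numbers. The sketch makes three assumptions: a triangular number (l, m, u) has the α-cut [l + α(m − l), u − α(u − m)]; the per-observation distance is the mean, over a few α levels, of the averaged squared differences between the lower and upper α-cut endpoints; and the α levels are 0, 0.5 and 1. The paper's exact distance and α levels may differ, and the function names are hypothetical.

```python
def alpha_cut(tfn, alpha):
    """Alpha-cut [lower, upper] of a triangular fuzzy number (l, m, u)."""
    l, m, u = tfn
    return l + alpha * (m - l), u - alpha * (u - m)

def avg_sq_distance(a, b, alphas=(0.0, 0.5, 1.0)):
    """Mean over alpha levels of the squared endpoint differences (assumed form)."""
    total = 0.0
    for alpha in alphas:
        al, au = alpha_cut(a, alpha)
        bl, bu = alpha_cut(b, alpha)
        total += ((al - bl) ** 2 + (au - bu) ** 2) / 2.0
    return total / len(alphas)

def total_estimation_error(observed, estimated, alphas=(0.0, 0.5, 1.0)):
    """Sum of average squared distances between observed and estimated responses."""
    return sum(avg_sq_distance(o, e, alphas) for o, e in zip(observed, estimated))

obs = [(1.0, 2.0, 3.0), (2.0, 4.0, 5.0)]
est = [(1.5, 2.5, 3.5), (2.0, 4.0, 5.0)]   # first estimate is obs[0] shifted by 0.5
err = total_estimation_error(obs, est)      # 0.5**2 from the shifted pair, 0 from the other
```

A least-squares fit in this setting would choose the regression coefficients that minimize `total_estimation_error` over the training observations.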

5.
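In modern industrial process modeling, the multivariable, nonlinear and dynamic nature of production processes increases model complexity and reduces modeling accuracy. To address this problem, the nonnegative garrote (NNG) algorithm is embedded in a long short-term memory (LSTM) neural network, and a dynamic soft-sensor algorithm based on an LSTM network with input-variable selection is proposed. First, a trained LSTM network is obtained through parameter optimization, and its strong ability to retain historical information is used to handle dynamics and time delays in industrial processes. Second, the NNG algorithm shrinks the input weights of the LSTM network to remove redundant variables and improve model accuracy; its hyperparameters are tuned by grid search with blocked cross-validation. Finally, the algorithm is applied to soft-sensor modeling of the SO₂ concentration in flue gas emitted from the desulfurization process of a thermal power plant and compared with other state-of-the-art algorithms. Experimental results show that the proposed algorithm effectively removes redundant variables, reduces model complexity and improves predictive performance.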
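
Blocked cross-validation, used in this work to tune the NNG hyperparameters, keeps each validation fold as one contiguous block of time. A time-ordered model is therefore never validated on randomly scattered samples. Below is a minimal sketch of the fold generation and a grid search over one hyperparameter. The scoring function is a hypothetical placeholder for "train the LSTM-NNG model with this garrote parameter and return the validation error", and all names are assumptions.

```python
import numpy as np

def blocked_kfold(n_samples, n_folds):
    """Yield (train_idx, val_idx) with each validation fold a contiguous block."""
    blocks = np.array_split(np.arange(n_samples), n_folds)
    for k, val_idx in enumerate(blocks):
        train_idx = np.concatenate([b for i, b in enumerate(blocks) if i != k])
        yield train_idx, val_idx

def grid_search(param_grid, n_samples, n_folds, score_fn):
    """Return the parameter value with the lowest mean validation score."""
    mean_scores = {}
    for param in param_grid:
        scores = [score_fn(param, tr, va)
                  for tr, va in blocked_kfold(n_samples, n_folds)]
        mean_scores[param] = float(np.mean(scores))
    best = min(mean_scores, key=mean_scores.get)
    return best, mean_scores

# Hypothetical stand-in for training and validating the real model.
def toy_score(s, train_idx, val_idx):
    return (s - 2.0) ** 2 + 0.01 * len(val_idx)

best_s, scores = grid_search([0.5, 1.0, 2.0, 4.0], n_samples=100, n_folds=5,
                             score_fn=toy_score)
```

Some blocked variants also discard a gap of samples on each side of the validation block to limit leakage through autocorrelation. That refinement is omitted here.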

6.
The problem of variable selection within the class of generalized additive models, when there are many covariates to choose from but the number of predictors is still somewhat smaller than the number of observations, is considered. Two very simple but effective shrinkage methods and an extension of the nonnegative garrote estimator are introduced. The proposals avoid having to use nonparametric testing methods for which there is no general reliable distributional theory. Moreover, component selection is carried out in one single step as opposed to many selection procedures which involve an exhaustive search of all possible models. The empirical performance of the proposed methods is compared to that of some available techniques via an extensive simulation study. The results show under which conditions one method can be preferred over another, hence providing applied researchers with some practical guidelines. The procedures are also illustrated analysing data on plasma beta-carotene levels from a cross-sectional study conducted in the United States.

7.
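High-dimensional data often have more variables than observations and are often heterogeneous. For such data, a robust and efficient feature selection method is proposed based on modal regression and variable-selection dimension reduction. Using the local quadratic approximation (LQA) algorithm and the expectation-maximization (EM) algorithm, an estimation algorithm and a method for choosing the optimal tuning parameter are given. Simulation studies show that the proposed feature selection method generally outperforms regularized estimators based on least squares and on the median. Especially when the errors are non-normal, it offers higher predictive ability and robustness than existing methods.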
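
The EM step of modal regression can be sketched for the unpenalized linear case. The E-step weights each observation by a Gaussian kernel of its current residual, and the M-step solves the resulting weighted least-squares problem. This minimal NumPy sketch assumes a Gaussian kernel with a hand-picked bandwidth `h = 1` and omits the LQA penalty and tuning-parameter selection described in the abstract. The function name and data are hypothetical.

```python
import numpy as np

def modal_linear_regression(X, y, h=1.0, n_iter=100, tol=1e-10):
    """Unpenalized modal linear regression via an EM-type iteration (sketch).

    E-step: weight each observation by a Gaussian kernel of its residual.
    M-step: weighted least squares with those weights.
    """
    A = np.column_stack([np.ones(len(y)), X])       # add intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)    # OLS starting value
    for _ in range(n_iter):
        resid = y - A @ beta
        w = np.exp(-0.5 * (resid / h) ** 2)         # E-step kernel weights
        w /= w.sum()
        sw = np.sqrt(w)
        beta_new, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)  # M-step
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta

rng = np.random.default_rng(2)
x = rng.uniform(-2.0, 2.0, size=200)
y = 1.0 + 2.0 * x + 0.3 * rng.standard_normal(200)
y[:40] += 10.0                                      # 20% gross outliers
beta_modal = modal_linear_regression(x[:, None], y, h=1.0)
beta_ols, *_ = np.linalg.lstsq(np.column_stack([np.ones(200), x]), y, rcond=None)
```

The outliers pull the OLS intercept far from 1, while the kernel weights give them negligible influence in the modal fit. This is the robustness property the abstract relies on for non-normal errors.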

8.
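A Partial Least Squares Regression Method Based on Principal Variable Selection   Total citations: 1, self-citations: 0, citations by others: 1
To analyze and process small-sample multivariate data more effectively, a partial least squares regression method based on principal-variable selection is proposed, and its basic principle and computational steps are described. The method first selects the principal variables in the data sample according to the correlation coefficient matrix. It then applies principal component analysis, canonical correlation analysis and multiple linear regression to these variables. A case study shows that, compared with standard partial least squares regression, the method requires fewer regression iterations and achieves higher accuracy when analyzing small-sample multivariate data with multicollinearity.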

9.
We propose a new artificial neural network (ANN) method in which a set of variables is selected as input variables to the ANN. The selection is made so that the input variables are as informative as possible about a target variable. The proposed method compared favorably with existing ANN methods in prediction accuracy when their performance was evaluated on 488 stocks in the S&P 500.

10.
Fuzzy regression models have been applied to operational research (OR) applications such as forecasting. Some previous studies on fuzzy regression analysis obtain crisp regression coefficients to eliminate a known problem: the spreads of the estimated fuzzy responses increase as the magnitude of the independent variable increases. However, these studies still cannot cope with decreasing or variable spreads. This paper proposes a three-phase method that resolves this problem by constructing a fuzzy regression model with variable spreads. In the first phase, the membership functions of the least-squares estimates of the regression coefficients are constructed on the basis of the extension principle, so that the fuzziness of the observations is completely preserved. In the second phase, these membership functions are defuzzified by the center of gravity method to obtain crisp regression coefficients. In the third phase, the error terms of the proposed model are determined by setting each estimated spread equal to its corresponding observed spread. Furthermore, the Mamdani fuzzy inference system is adopted to improve forecast accuracy. Results from five examples and an application to Japanese house prices show that, compared with previous studies, the proposed fuzzy linear regression model has higher explanatory power and forecasting performance.

11.
In recent years, computationally efficient surrogate models have become increasingly important as the use of high-fidelity simulation models grows. However, surrogate modeling of high-dimensional models requires many samples. To reduce this computational burden, we propose an integrated algorithm that combines accurate variable selection with surrogate modeling. A main strength of the proposed method is that, by excluding dispensable variables while maintaining model accuracy, it requires fewer samples than conventional surrogate modeling methods. In the proposed method, the importance of the selected variables is evaluated by the quality of a model approximated using only those variables. Nonparametric probabilistic regression is adopted as the modeling method to deal with the inaccuracy caused by using only the selected variables during modeling. In particular, Gaussian process regression (GPR) is used because its model performance indices are well suited for use in the variable selection criterion. Outstanding variables, that is, those yielding distinctly superior model performance, are finally selected as essential variables. The algorithm uses a conservative selection criterion and appropriate sequential sampling to prevent incorrect variable selection and excessive sampling. Its performance is verified on two test problems with challenging properties such as high dimensionality, nonlinearity and interaction terms. A numerical study shows that the proposed algorithm becomes more effective as the fraction of dispensable variables increases.

12.
In many practical situations, the quality of a process or product is better characterized and summarized by the relationship between a response variable and one or more explanatory variables. Such a relationship is called a profile. Profile monitoring has recently become a fertile research field in statistical process control (SPC). To handle nonlinear profile data, this paper proposes breaking the entire curve into several segments of data points. Each segment fits a linear model statistically and can therefore be monitored separately with existing linear-profile SPC methods. A new method is proposed that determines the locations of change points based on changes in slope. Two goodness-of-fit criteria are used to determine the best number of change points and avoid over-fitting. Two nonlinear profile examples from the literature illustrate the proposed change-point model. The monitoring performance of existing T² and EWMA-based approaches is presented when the nonlinear profile data are fitted with the proposed change-point model.
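
As a simplified illustration of splitting a profile into linear segments, the sketch below locates a single change point by exhaustive search. It chooses the split that minimizes the total residual sum of squares (SSE) of two separately fitted straight lines. This SSE search is a generic stand-in, not the slope-change rule or the goodness-of-fit criteria proposed in the paper. The minimum segment length `min_seg` and the toy profile are assumptions.

```python
import numpy as np

def line_sse(x, y):
    """Residual sum of squares of a least-squares straight-line fit."""
    A = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(np.sum((y - A @ coef) ** 2))

def best_single_change_point(x, y, min_seg=3):
    """Index k splitting the profile into x[:k] and x[k:] with minimal total SSE."""
    best_k, best_sse = None, np.inf
    for k in range(min_seg, len(x) - min_seg + 1):
        sse = line_sse(x[:k], y[:k]) + line_sse(x[k:], y[k:])
        if sse < best_sse:
            best_k, best_sse = k, sse
    return best_k, best_sse

# Noise-free toy profile: slope 1 before x = 5, slope -2 from x = 5 on (with a jump).
x = np.arange(11, dtype=float)
y = np.where(x < 5, x, 20.0 - 2.0 * x)
k, sse = best_single_change_point(x, y)   # k = 5: segments x[:5] and x[5:]
```

Repeating the search inside each segment would give multiple change points. A goodness-of-fit criterion would then decide when to stop splitting.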

13.
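Variable Screening Based on Regression Coefficients for Near-Infrared Spectral Analysis   Total citations: 1, self-citations: 0, citations by others: 1
A stepwise variable-screening method based on regression coefficients is proposed. The regression coefficient of each variable in the spectrum is computed first, and the variables are then sorted by the absolute value of their coefficients in descending order. The optimal variable subset is selected step by step through forward selection with PLS cross-validation. The method was applied to near-infrared spectral data of corn and diesel fuel. It selected 14, 12 and 30 optimal variables for modeling corn protein, diesel cetane number and diesel viscosity, respectively, and in each case the predictions were better than those of models built on the full spectrum. The method is therefore an effective and practical variable-selection method for near-infrared spectroscopy.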
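
The ranking-plus-forward-selection procedure can be sketched as follows. For brevity, the sketch swaps PLS for ordinary least squares in both steps:

- variables are standardized and ranked by the absolute value of their OLS coefficients;
- each nested subset of the ranking is scored by the k-fold cross-validated RMSE of an OLS fit.

The paper uses PLS for both steps, which matters for spectra where variables outnumber samples. Function names, the fold count and the toy data are assumptions.

```python
import numpy as np

def cv_rmse(X, y, n_folds=5):
    """k-fold cross-validated RMSE of an OLS model with intercept."""
    n = len(y)
    sq_err = 0.0
    for val in np.array_split(np.arange(n), n_folds):
        train = np.setdiff1d(np.arange(n), val)
        A_tr = np.column_stack([np.ones(len(train)), X[train]])
        coef, *_ = np.linalg.lstsq(A_tr, y[train], rcond=None)
        A_va = np.column_stack([np.ones(len(val)), X[val]])
        sq_err += float(np.sum((y[val] - A_va @ coef) ** 2))
    return np.sqrt(sq_err / n)

def ranked_forward_selection(X, y, n_folds=5):
    """Rank variables by |coefficient|, then pick the best prefix of that ranking by CV."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)        # put coefficients on one scale
    A = np.column_stack([np.ones(len(y)), Xs])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    order = np.argsort(-np.abs(coef[1:]))            # largest |coefficient| first
    errors = [cv_rmse(Xs[:, order[:m]], y, n_folds) for m in range(1, X.shape[1] + 1)]
    m_best = int(np.argmin(errors)) + 1
    return order[:m_best], order, errors

rng = np.random.default_rng(3)
X = rng.standard_normal((120, 8))
y = 3.0 * X[:, 2] - 2.0 * X[:, 5] + 0.3 * rng.standard_normal(120)
selected, order, errors = ranked_forward_selection(X, y)
```

Restricting the search to prefixes of a fixed ranking keeps the cost linear in the number of variables, instead of exploring all subsets.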

14.
Variable selection for Poisson regression when the response variable is potentially underreported is considered. A logistic regression model is used to model the latent underreporting probabilities. An efficient MCMC sampling scheme is designed, incorporating uncertainty about which explanatory variables affect the dependent variable and which affect the underreporting probabilities. Validation data is required in order to identify and estimate all parameters. A simulation study illustrates favorable results both in terms of variable selection and parameter estimation. Finally, the procedure is applied to a real data example concerning deaths from cervical cancer.

15.
A fuzzy regression model is developed to construct the relationship between the response and explanatory variables in fuzzy environments. The aim is to enhance explanatory power and to account for the uncertainty of the formulated model and its parameters. To this end, a new operator called the fuzzy product core (FPC) is proposed for establishing fuzzy regression models with fuzzy parameters from fuzzy observations, including fuzzy response and explanatory variables. In addition, the signs of the parameters can be determined during model building. Compared with existing approaches, the proposed approach reduces the amount of unnecessary or unimportant information arising from fuzzy observations and determines the signs of the model parameters to increase model performance. This addresses a weakness of related approaches, in which the model parameters are fuzzy and must be predetermined during formulation. The proposed approach outperforms existing models in terms of distance, mean similarity and credibility measures, even when crisp explanatory variables are used.

16.
The problem of regression analysis in a fuzzy setting is discussed. A general linear regression model for studying the dependence of an LR fuzzy response variable on a set of crisp explanatory variables is introduced, along with a suitable iterative least-squares estimation procedure. This model is then framed within a wider analysis strategy capable of managing various types of uncertainty, including the imprecision of the regression coefficients and the choice of a specific parametric model within a given class of models. The first source of uncertainty is handled by exploiting the implicit fuzzy arithmetic relationships between the spreads of the regression coefficients and the spreads of the response variable. For the second kind of uncertainty, a suitable selection procedure is illustrated, which consists of maximizing an appropriately defined goodness-of-fit index within the given class of parametric models. The strategy is illustrated in detail through an application to real data collected in an environmental study. The final remarks underline some critical points and give a few indications for future research in this field.

17.
The goal of this paper is to handle large variation in fuzzy data by constructing a variable-spread multivariate adaptive regression splines (MARS) fuzzy regression model with crisp parameter estimates and fuzzy error terms. The model handles imprecise measurements of the response variable and crisp measurements of the explanatory variables. The proposed method is a two-phase procedure that applies the MARS technique in phase one and solves an optimization problem in phase two to estimate the center and fuzziness of the response variable. It therefore addresses two problems simultaneously: large variation and varying spreads in fuzzy observations. A realistic application is also presented in which suspended load is modeled from discharge in a hydrology engineering problem. Empirical results show that the proposed approach is more efficient and more realistic than several well-known least-squares fuzzy regression models.

18.
The problem of selecting variables or features in a regression model in the presence of both additive (vertical) and leverage outliers is addressed. Since variable selection and the detection of anomalous data are not separable problems, the focus is on methods that select variables and outliers simultaneously. For selection, the fast forward selection algorithm, least angle regression (LARS), is used, but it is not robust. To achieve robustness to additive outliers, a dummy variable identity matrix is appended to the design matrix allowing both real variables and additive outliers to be in the selection set. For leverage outliers, these selection methods are used on samples of elemental sets in a manner similar to that used in high breakdown robust estimation. These results are compared to several other selection methods of varying computational complexity and robustness. The extension of these methods to situations where the number of variables exceeds the number of observations is discussed.

19.
This paper introduces two types of nonsmooth optimization methods for selecting model hyperparameters in primal SVM models based on cross-validation. Unlike common grid-search approaches to model selection, these approaches scale both in the number of hyperparameters and in the number of data points. Inspired by linear-time primal SVM algorithms, they achieve scalable model selection by working directly with the primal variables, without introducing any dual variables. The proposed implicit primal gradient descent (ImpGrad) method can use existing SVM solvers. Unlike prior methods for gradient descent in hyperparameter space, all work is done in the primal space, so no inversion of the kernel matrix is required. The proposed explicit penalized bilevel programming (PBP) approach optimizes the hyperparameters and parameters simultaneously. It solves the original cross-validation problem by solving a series of least-squares regression problems with simple constraints in both the hyperparameter and parameter spaces. Computational results on least-squares support vector regression problems with multiple hyperparameters show that both the implicit and explicit methods perform well in terms of generalization and computational time. These methods apply directly to other learning tasks with differentiable loss and regularization functions. Both algorithms represent powerful new approaches to solving large bilevel programs involving nonsmooth loss functions.

20.
Term and variable selection for non-linear system identification   Total citations: 1, self-citations: 0, citations by others: 1
The purpose of variable selection is to pre-select a subset of significant variables, or to eliminate redundant variables, from all the candidate variables of a system under study before model term detection. The selected significant variables alone should represent the system sufficiently. Generally, not all model terms produced by combining different variables contribute equally to the system output, and terms that contribute little can be omitted. By eliminating redundant terms, a parsimonious representation containing only the significant terms can often be obtained without loss of representational accuracy. Based on these observations, this paper proposes a new variable and term selection algorithm. The term detection algorithm applies to the general class of non-linear modelling problems that can be expressed in linear-in-the-parameters form. The variable selection procedure is based on locally linear and cross-bilinear models. These are used together with the forward orthogonal least squares (OLS) and error reduction ratio (ERR) approach to determine the significant terms and pre-select the important variables for both time series and input–output systems. Several numerical examples illustrate the applicability and effectiveness of the new approach.
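
The forward OLS search with the error reduction ratio can be sketched in a few lines.

1. Orthogonalize every remaining candidate term against the terms already chosen, using classical Gram-Schmidt.
2. Add the term with the largest ERR, (wᵀy)² / ((wᵀw)(yᵀy)), i.e. the largest fraction of the output energy yᵀy that the term explains.
3. Stop when the unexplained fraction falls below a tolerance.

The tolerance, the candidate term set and the function name are illustrative assumptions. The locally linear and cross-bilinear variable pre-selection step from the abstract is not shown.

```python
import numpy as np

def forward_ols_err(P, y, max_terms=None, tol=1e-6):
    """Forward orthogonal least squares term selection ranked by ERR (sketch).

    P: (n, M) matrix whose columns are candidate model terms.
    Returns selected column indices and their ERR values, in selection order.
    """
    n, M = P.shape
    max_terms = M if max_terms is None else max_terms
    yy = y @ y
    selected, errs, basis = [], [], []
    for _ in range(max_terms):
        best_j, best_err, best_w = None, 0.0, None
        for j in range(M):
            if j in selected:
                continue
            w = P[:, j].astype(float)
            for q in basis:                       # Gram-Schmidt against chosen terms
                w = w - (q @ P[:, j]) / (q @ q) * q
            ww = w @ w
            if ww < 1e-12 * (P[:, j] @ P[:, j]):  # numerically dependent: skip
                continue
            err = (w @ y) ** 2 / (ww * yy)        # error reduction ratio
            if err > best_err:
                best_j, best_err, best_w = j, err, w
        if best_j is None:
            break
        selected.append(best_j)
        errs.append(best_err)
        basis.append(best_w)
        if 1.0 - sum(errs) < tol:                 # remaining unexplained fraction
            break
    return selected, errs

rng = np.random.default_rng(4)
x1 = rng.standard_normal(300)
x2 = rng.standard_normal(300)
# Candidate terms: x1, x2, x1^2, x1*x2, x2^2
P = np.column_stack([x1, x2, x1 ** 2, x1 * x2, x2 ** 2])
y = 3.0 * x1 + 2.0 * x1 * x2                      # noise-free toy system
terms, errs = forward_ols_err(P, y)               # picks x1, then x1*x2
```

Because the chosen orthogonal terms are mutually orthogonal, their ERR values add up. Their running sum is therefore a direct measure of how much of the output the model explains.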
