Similar literature
1.
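Starting from the model selected by the elastic net, the paper constructs the exact distribution of the coefficients conditional on the model selection event and uses it for inference, obtaining p-values for testing coefficient significance and confidence intervals for the model coefficients. The approach allows the model chosen by the conventional elastic net to be adjusted further. A simulation study demonstrates the applicability of the proposed method to variable selection, for example its strong ability to identify noise variables. In the empirical analysis, the elastic net method based on the variable selection event is used to screen the factors that affect the wage income of workers in China. The analysis shows that, among the explanatory variables selected by the conventional elastic net, frequency of religious activity, length of service, physical health, and height are not the main determinants of labor income. These variables can be dropped as circumstances warrant, which reduces research cost, improves the efficiency of the analysis, and offers some reference value in practice.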

2.
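The quantile varying-coefficient model is a robust nonparametric modeling approach. When data are analyzed with a varying-coefficient model, a natural question is how to select the important variables and, at the same time, identify which of them have constant effects. Based on quantile methods, this paper studies a robust and efficient estimation and variable selection procedure. Using local smoothing and adaptive group variable selection, and imposing a double penalty on the quantile loss function, we obtain penalized estimators. With the tuning parameters chosen appropriately by a BIC criterion, the proposed variable selection method has the oracle property. A simulation study and an analysis of a real fat data set illustrate the usefulness of the new method. The numerical results show that, without requiring any information about the variables or the error distribution, the proposed method can identify unimportant variables and distinguish the constant-effect variables.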
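
The quantile loss mentioned in this abstract can be illustrated with a minimal Python sketch. It is not taken from the paper: the function names and the toy data are assumptions made for this example, and it covers only the unpenalized check loss $\rho_\tau(u) = u(\tau - 1\{u < 0\})$, not the double-penalized estimator. The sketch shows that the constant minimizing the total check loss is a sample quantile, which is why an outlier has little effect on the fit at $\tau = 0.5$.

```python
def check_loss(residual, tau):
    """Quantile (check) loss: rho_tau(u) = u * (tau - 1{u < 0})."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def total_check_loss(values, candidate, tau):
    """Total check loss when every observation is fitted by one constant."""
    return sum(check_loss(v - candidate, tau) for v in values)


def best_constant(values, tau):
    """Search the observed values for the constant with the smallest total loss.

    Restricting the search to observed values is enough here because the
    total check loss is piecewise linear with kinks at the data points.
    """
    return min(sorted(values), key=lambda c: total_check_loss(values, c, tau))


sample = [1.0, 2.0, 3.0, 4.0, 100.0]   # hypothetical data with one outlier
median_fit = best_constant(sample, 0.5)  # fit at the median level
upper_fit = best_constant(sample, 0.9)   # fit at the 0.9 quantile level
```

With this toy sample the fit at $\tau = 0.5$ is the sample median, so the value 100.0 does not pull it upward the way it would pull a mean.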

3.
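Bias-variance analysis is a method that, during model selection, balances how well a model explains the available sample against how accurately it predicts unseen samples, with the aim of making the test error of the selected model as small as possible. Effective variable screening in classification or regression can yield a more accurate model, but it also introduces some error. The paper proposes the concept of "selection error" to characterize the error caused by a particular variable selection method in classification problems that involve variable selection. The error of a classification problem is decomposed into bias, variance, and selection error, and the influence of each component on the total error is examined.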

4.
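This paper studies statistical inference theory and methods for a class of semiparametric varying-coefficient partially linear models under several types of complex data. First, under complex data such as longitudinal data and data with measurement error, it studies empirical likelihood inference for the model and proposes a grouped empirical likelihood method and a bias-corrected empirical likelihood method, respectively. The method effectively handles the difficulty that within-group correlation in longitudinal data poses for constructing the empirical likelihood ratio function. Second, under complex data such as measurement error data and missing data, it studies variable selection for the model and proposes a "bias-corrected" variable selection method and an imputation-based variable selection method, respectively. The variable selection method simultaneously selects the important variables in the parametric and nonparametric components, and it carries out variable selection and estimation of the regression coefficients at the same time. With an appropriate choice of the penalty parameter, the method is shown to identify the true model consistently, and the resulting regularized estimators have the oracle property.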

5.
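Longitudinal data are often analyzed with normal mixed-effects models. However, violations of the normality assumption often lead to invalid inference. Compared with conventional mean regression, quantile regression gives a complete characterization of the conditional distribution of the response and yields robust estimates under non-normal error distributions. This paper considers quantile regression estimation and variable selection for longitudinal mixed-effects models with right-censored responses. First, inverse censoring probability weighting is used to obtain the parameter estimates. Second, variable selection is addressed by combining inverse censoring probability weighting with the LASSO penalty. Monte Carlo simulations show that the proposed method has advantages over estimation that simply discards the censored observations. Finally, an AIDS data set is analyzed to illustrate the method in practice.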

6.
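This paper studies adaptive LASSO (least absolute shrinkage and selection operator) variable selection and coefficient estimation for measurement error models. First, adaptive LASSO estimators are given for the linear model and the partially linear model when the covariates are measured with error. The asymptotic properties of the estimators are studied under some regularity conditions, and the adaptive LASSO estimators are shown to have the oracle property when the tuning parameter is chosen appropriately. Second, the algorithm for computing the estimates and the choice of the penalty and smoothing parameters are discussed. Finally, the performance of adaptive LASSO variable selection is studied through simulations and a real data analysis; the results show that both variable selection and parameter estimation perform well.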
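
To make the adaptive weighting idea concrete, here is a minimal Python sketch. It is not the estimator from the paper: it ignores measurement error entirely and assumes an orthonormal design, in which case the adaptive LASSO problem separates by coordinate and each coefficient is a soft-thresholded initial estimate with its own threshold $\lambda / |\hat\beta_j^{\text{init}}|^\gamma$. The function names, the initial estimates, and the values of `lam` and `gamma` are all illustrative assumptions.

```python
def soft_threshold(z, t):
    """Minimiser of 0.5 * (z - b)**2 + t * |b| over b."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def adaptive_lasso_orthonormal(initial, lam, gamma=1.0):
    """Adaptive LASSO under an assumed orthonormal design.

    Each initial estimate b gets the weight 1 / |b|**gamma, so small initial
    estimates are penalised heavily and large ones only lightly.
    """
    estimates = []
    for b in initial:
        if b == 0.0:
            estimates.append(0.0)  # infinite weight: the coefficient stays at zero
            continue
        weight = 1.0 / abs(b) ** gamma
        estimates.append(soft_threshold(b, lam * weight))
    return estimates


initial_fit = [3.0, -2.0, 0.2, -0.1]  # hypothetical least-squares estimates
fit = adaptive_lasso_orthonormal(initial_fit, lam=0.5)
```

In this toy run the two small coefficients are set exactly to zero while the two large ones are shrunk only slightly, which is the behavior that motivates data-dependent weights.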

7.
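The paper considers the FIC criterion for censored quantile varying-coefficient regression models and, based on the FIC, gives model selection and model averaging estimation for the parameter of interest. To reflect the distributional information of the response fully and to cope with outliers and heavy-tailed model errors, the paper models the response at different quantile levels, which makes the approach more robust than ordinary least squares. Under fairly general conditions, the asymptotic properties of the proposed estimators are proved. The finite-sample properties of the estimators are studied by simulation, and the proposed method is applied to data on the gaming time of mobile phone users.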

8.
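Research in plant genetics and genomics shows that, for many important agronomic traits, the influential gene loci are not sparse: the traits are affected by a large number of minor-effect genes and also by gene interaction terms. Based on flowering-time data for rapeseed, an important oil crop, this paper studies gene selection under moderate sparsity and proposes a two-step Bayesian model selection method. Once interactions between genes are considered, the dimension of the model grows sharply; together with the special structure of the data, this makes the usual variable selection methods perform poorly. The proposed two-step method first uses Kolmogorov feature screening to filter out clearly unimportant variables and so reduce the dimension, and then considers interactions among the selected loci. To overcome the slow computation of Bayesian methods, indicator variables are introduced into the model, and the model is selected by estimating the posterior distribution of the indicators. Simulation results show that the proposed method performs well in prediction accuracy and computational stability and, compared with the Bayesian method without indicator variables, greatly improves prediction accuracy. Finally, the method is used to analyze a rapeseed flowering-time data set, and some loci with interaction effects are found.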

9.
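To address variable selection for high-dimensional, strongly correlated data, this paper proposes an improved variable selection method. The method first fits a model to the original strongly correlated data with the adaptive elastic net (Aenet), selecting the grouped variables and individual variables that matter for the response. It then uses partial least squares (PLS) to estimate the model on the selected variables. Finally, it forms a linear combination of the coefficient estimates from the two methods and builds the regression model with the combined coefficients. The new model has high accuracy and good interpretability, and numerical experiments confirm the effectiveness of the method.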

10.
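This paper considers Bayesian analysis of the accelerated failure time model. In the model, the distribution of the error term is unknown and is approximated by a Pólya tree distribution. The Bayesian Lasso and Markov chain Monte Carlo methods are used for parameter estimation and variable selection. Simulation results show that the proposed method accurately identifies the important factors in the model and gives accurate parameter estimates. Finally, the model is used to identify important risk factors for the survival time of patients with type II diabetes.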

11.
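The high dimensionality and high correlation of current credit risk data for listed companies seriously affect the accuracy of credit risk models. This paper therefore designs a new nonparametric variable selection method that combines existing algorithms with the characteristics of credit risk models. Analyzing and screening the credit-risk-related variables of listed companies with this method removes the noise variables and linearly correlated variables contained in the data set. The paper also designs an algorithm for finding the optimal solution of the method when the number of variables is large. Taking the logistic model as an example, an empirical analysis of the credit risk of listed companies shows that, compared with earlier variable selection methods, the proposed method effectively reduces the data dimension, removes correlation among variables, and improves the reliability and predictive accuracy of the model.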

12.
Many problems in genomics are related to variable selection where high-dimensional genomic data are treated as covariates. Such genomic covariates often have certain structures and can be represented as vertices of an undirected graph. Biological processes also vary as functions depending upon some biological state, such as time. High-dimensional variable selection where the covariates are graph-structured and the underlying model is nonparametric presents an important but largely unaddressed statistical challenge. Motivated by the problem of regression-based motif discovery, we consider the problem of variable selection for high-dimensional nonparametric varying-coefficient models and introduce a sparse structured shrinkage (SSS) estimator based on basis function expansions and a novel smoothed penalty function. We present an efficient algorithm for computing the SSS estimator. Results on model selection consistency and estimation bounds are derived. Moreover, finite-sample performance is studied via simulations, and the effects of high dimensionality and of the structural information of the covariates are especially highlighted. We apply our method to the motif finding problem using a yeast cell-cycle gene expression dataset and word counts in genes' promoter sequences. Our results demonstrate that the proposed method can result in better variable selection and prediction for high-dimensional regression when the underlying model is nonparametric and the covariates are structured. Supplemental materials for the article are available online.

14.
When the data are heavy-tailed or contain outliers, conventional variable selection methods based on penalized least squares or likelihood functions perform poorly. Using Bayesian inference, we study the Bayesian variable selection problem for median linear models. A Bayesian estimation method is proposed that draws on Bayesian model selection theory and places a spike-and-slab prior on the regression coefficients, and an effective Gibbs sampling procedure for the posterior is also given. Extensive numerical simulations and an analysis of the Boston house price data illustrate the effectiveness of the proposed method.

15.
A general methodology for selecting predictors for Gaussian generative classification models is presented. The problem is regarded as a model selection problem. Three different roles for each possible predictor are considered: a variable can be a relevant classification predictor or not, and the irrelevant classification variables can be linearly dependent on a part of the relevant predictors or independent variables. This variable selection model was inspired by previous work on variable selection in model-based clustering. A BIC-like model selection criterion is proposed. It is optimized through two embedded forward stepwise variable selection algorithms for classification and linear regression. The model identifiability and the consistency of the variable selection criterion are proved. Numerical experiments on simulated and real data sets illustrate the value of this variable selection methodology. In particular, it is shown that this well-grounded variable selection model can be of great interest for improving the classification performance of quadratic discriminant analysis in a high-dimensional context.

16.
In this article, we study variable selection for the partially linear single-index model (PLSIM). Based on minimized average variance estimation, variable selection for the PLSIM is done by minimizing the average variance with an adaptive l1 penalty. An implementation algorithm is given. Under some regularity conditions, we demonstrate the oracle properties of the aLASSO procedure for the PLSIM. Simulations are used to investigate the effectiveness of the proposed method for variable selection in the PLSIM.

17.
We consider the problem of variable selection for the single-index varying-coefficient model and present a regularized variable selection procedure that combines basis function approximations with the SCAD penalty. The proposed procedure simultaneously selects significant covariates with functional coefficients and locally significant variables with parametric coefficients. With appropriate selection of the tuning parameters, the consistency of the variable selection procedure and the oracle property of the estimators are established. The proposed method can naturally be applied to the pure single-index model and the varying-coefficient model. The finite-sample performance of the proposed method is illustrated by a simulation study and a real data analysis.
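
For readers unfamiliar with the SCAD penalty named in this abstract, the following minimal Python sketch evaluates its standard piecewise form: linear up to $\lambda$, quadratic between $\lambda$ and $a\lambda$, and constant beyond $a\lambda$. It is only an illustration of the penalty function, not of the paper's estimation procedure; the function name and the default `a=3.7` (a commonly used value) are choices made for this example.

```python
def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty evaluated at a single coefficient (requires a > 2, lam > 0).

    |theta| <= lam          : lam * |theta|                      (LASSO-like)
    lam < |theta| <= a*lam  : quadratic transition
    |theta| > a*lam         : constant lam**2 * (a + 1) / 2      (no extra shrinkage)
    """
    t = abs(theta)
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    return lam * lam * (a + 1) / 2


small = scad_penalty(0.5, lam=1.0)   # linear region
large = scad_penalty(10.0, lam=1.0)  # flat region: same penalty for any larger value
```

Because the penalty is flat for large coefficients, large effects are not shrunk further, while small coefficients are penalized like the LASSO; this is the property usually cited for SCAD's oracle behavior.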

18.
A commonly used semiparametric model is considered. We adopt two difference-based estimators of the linear component of the model and propose corresponding thresholding estimators that can be used for variable selection. For each thresholding estimator, variable selection in the linear component is developed and consistency of the variable selection procedure is shown. We evaluate our method in a simulation study and apply it to a real data set.

19.
The main challenge in working with gene expression microarrays is that the sample size is small compared to the large number of variables (genes). In many studies, the main focus is on finding a small subset of the genes that are the most important for differentiating between types of cancer, with a view to simpler and cheaper diagnostic arrays. In this paper, a sparse Bayesian variable selection method in the probit model is proposed for gene selection and classification. We assign a sparse prior to the regression parameters and perform variable selection by indexing the covariates of the model with a binary vector. The correlation prior assigned to the binary vector in this paper is able to distinguish between models of the same size. The performance of the proposed method is demonstrated with one simulated data set and two well-known real data sets, and the results show that our method is comparable with other existing methods in variable selection and classification.
