Similar literature
 20 similar records found
1.
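Imputation is a method of adjusting for missing data. Multiple imputation remedies the shortcomings of single imputation by filling each missing value with a set of plausible values, thereby reflecting the uncertainty about the missing data. This paper introduces three imputation methods used in multiple imputation procedures: regression prediction, propensity scores, and Markov chain Monte Carlo (MCMC). It also discusses inference on the results of multiple imputation and points out problems with the approach.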
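
The abstract does not say how the results from the multiply imputed data sets are combined for inference. A common choice is Rubin's combining rules, sketched below in Python as an assumption rather than as the paper's own procedure. The function name `pool_rubin` is hypothetical.

```python
from statistics import fmean, variance


def pool_rubin(estimates, variances):
    """Pool m completed-data analyses with Rubin's rules (assumed here, not from the source).

    estimates: point estimate from each of the m imputed data sets
    variances: squared standard error of each of those estimates
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    pooled = fmean(estimates)        # average of the m point estimates
    within = fmean(variances)        # average within-imputation variance
    between = variance(estimates)    # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total
```

The between-imputation term is what single imputation leaves out: it carries the extra uncertainty caused by not knowing the missing values.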

2.
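Based on an analysis of missing-value handling methods, this paper proposes three class-based methods for handling missing values: class-based mean imputation, class-based multiple imputation, and class-based K-means. Each method first classifies respondents by their scores on the key satisfaction fields of the questionnaire, then replaces missing values within a class with that class's mean, multiply imputed values, or cluster center, respectively. Finally, taking a food company as the case, an empirical analysis of a customer satisfaction measurement model with missing values is carried out. The results show that the three class-based methods outperform plain mean imputation, multiple imputation, and K-means, providing practical methods for handling missing values in customer satisfaction index measurement.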
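
The class-based mean imputation step described above can be sketched as follows. This is a minimal illustration, assuming each record already carries a class label derived from the key satisfaction score. The field names and the function `class_mean_impute` are hypothetical, and the paper's actual classification rule is not given in the abstract.

```python
from collections import defaultdict
from statistics import fmean


def class_mean_impute(records, class_key, target_key):
    """Fill None in target_key with the mean of observed values in the same class.

    Records whose class has no observed value are left unchanged.
    The input list is not modified; a new list of dicts is returned.
    """
    observed = defaultdict(list)
    for rec in records:
        if rec[target_key] is not None:
            observed[rec[class_key]].append(rec[target_key])
    class_means = {cls: fmean(vals) for cls, vals in observed.items()}

    imputed = []
    for rec in records:
        row = dict(rec)  # copy so the caller's data stays intact
        if row[target_key] is None and row[class_key] in class_means:
            row[target_key] = class_means[row[class_key]]
        imputed.append(row)
    return imputed
```

Replacing the class mean with a draw from a class-specific model would give the multiple imputation variant, and replacing it with a cluster center would give the K-means variant.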

3.
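Multiple imputation of missing income values based on chained equations   Cited by: 2 (self-citations: 0, citations by others: 2)
Liu Fengqin (刘凤芹). 《统计研究》 (Statistical Research), 2009, 26(1): 71-77
 Missing values in income variables are a common and difficult problem in econometric analysis. Traditional treatments often lead to systematically biased results. This paper proposes handling missing income values with multiple imputation based on chained equations. The method is applied to a real data set; its properties are then discussed by analysing the imputed data sets, and it is compared with other multiple imputation methods. The results show that multiple imputation based on chained equations can correct the systematic bias of inference to some extent and yields appropriate standard deviation estimates.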

4.
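Viewed through the missing-data mechanism of the matched-model method, the quality bias of price indices compiled with that method can be split into two parts: in-sample quality bias and out-of-sample quality bias. This paper examines the data patterns and missing-data mechanisms of quality bias within the matched-model framework and, on that basis, proposes three quality adjustment methods for the different missing-data mechanisms: mean imputation, overlap imputation, and hedonic imputation.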

5.
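Nonresponse occurs frequently in big-data applications. In practice the nonresponse rate is usually low, and in that case using a propensity score model to match nonresponding units with responding units tends to degrade the performance of propensity score matching imputation markedly. To address this, the idea of the synthetic minority over-sampling technique (SMOTE) is incorporated into propensity score matching imputation, giving a propensity score matching imputation method based on minority over-sampling. Statistical simulation and an empirical study demonstrate the statistical properties and practical performance of the new method under different nonresponse rates, numbers of imputations, and error distributions. The simulations show that the new method is clearly more accurate than propensity score matching imputation and that its statistical properties are little affected by the nonresponse rate, the number of imputations, or the error distribution. The empirical results show that the new method performs well on real data. The proposed method offers a new approach to the nonresponse problem and extends well.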

6.
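An exploration of the theoretical basis of multiple imputation for handling missing data   Cited by: 4 (self-citations: 0, citations by others: 4)
Building on a comparison of single imputation and multiple imputation, this paper examines in depth the theoretical basis of multiple imputation and introduces the basic idea behind using it to handle missing data.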

7.
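Missing values are a pervasive problem in surveys, and imputation is a good way to handle them. If the variables are correlated, a normal linear model can use the variables without missing values to impute the variables with missing values. Compared with single imputation, multiple imputation estimates the population variance more effectively and is therefore more widely used. With the help of the Bootstrap, this paper draws the model parameters and residuals from least squares estimates on Bootstrap samples of the completely observed cases, which further improves the estimate of the population variance. Extensive simulation experiments show that Bootstrap multiple imputation builds wider confidence intervals than single imputation and ordinary multiple imputation, and hence achieves more accurate coverage of the population parameter; the advantage is more pronounced when the proportion of missing data is large.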
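
The Bootstrap imputation idea above can be illustrated with a small Python sketch. It assumes a single fully observed predictor `x`, a response `y` with missing entries marked as `None`, and a simple linear model; none of these specifics come from the abstract, and the function names are hypothetical.

```python
import random
from statistics import fmean


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return y_bar - b * x_bar, b


def bootstrap_impute(xs, ys, m=5, seed=0):
    """Return m completed copies of ys, one per Bootstrap sample of the complete cases."""
    rng = random.Random(seed)
    complete = [(x, y) for x, y in zip(xs, ys) if y is not None]
    if len({x for x, _ in complete}) < 2:
        raise ValueError("need complete cases with at least two distinct x values")
    datasets = []
    for _ in range(m):
        # Resample complete cases with replacement; redraw degenerate samples.
        while True:
            sample = [rng.choice(complete) for _ in complete]
            if len({x for x, _ in sample}) > 1:
                break
        a, b = fit_line([x for x, _ in sample], [y for _, y in sample])
        residuals = [y - (a + b * x) for x, y in sample]
        # Impute each missing y as fitted value plus a resampled residual.
        datasets.append([
            y if y is not None else a + b * x + rng.choice(residuals)
            for x, y in zip(xs, ys)
        ])
    return datasets
```

Because both the coefficients and the residuals change from one Bootstrap sample to the next, the spread across the `m` completed data sets reflects parameter uncertainty as well as residual noise.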

8.
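This paper applies multiple imputation to multivariate samples with different missing rates and missing patterns and studies how the multiple imputation error depends on the missing rate and missing pattern. The results show that when the missing rate is between 0 and 15%, the imputation error is linear in the missing rate; when the missing rate exceeds 15%, the relationship departs from linearity. The imputation error is positively correlated with the variance-to-mean ratio of the missing pattern: the larger the ratio, the larger the error.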

9.
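Multiple imputation methods for missing data under stratified random sampling   Cited by: 1 (self-citations: 0, citations by others: 1)
This paper introduces the basic idea of handling missing data with multiple imputation under stratified random sampling, analyses common methods for constructing multiple imputations in stratified random sampling with ignorable nonresponse, and illustrates them with an example.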

10.
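By comparing customer satisfaction index models in China and abroad and analysing the current situation in China, this paper proposes an extended model for customer satisfaction index measurement, in which the perceived quality latent variable is refined into perceived product quality and perceived service quality. On this basis, it studies the steps of measuring a customer satisfaction index with missing values. For the extended model, and building on mean imputation, it proposes a new missing-value method, class-mean imputation: the satisfaction fields in respondents' questionnaires are first classified by score, and missing values within a class are then replaced with that class's mean.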

11.
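Early identification of emergency department crowding is of great value for emergency management, and data-based statistical research has become the main way of discovering patterns in emergency crowding. Based on five arrival characteristics of respiratory patients admitted through the emergency department (sex, age, mode of arrival, triage level, and arrival time), factor analysis is used to extract three common factors representing the arrival characteristics of these patients: illness urgency, patient attributes, and arrival time. Based on the factor analysis results and a crowding indicator (emergency department length of stay, LOS), Q-type cluster analysis divides the patient data into five classes. From the differences in characteristics and LOS between classes, the paper explores how patient characteristics affect the grading of emergency department crowding and proposes a crowding grading pathway based on patients' initial arrival information. A grading pathway that combines the crowding indicator with patients' initial characteristics is of practical significance for early intervention in emergency department crowding.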

12.
With the development of modern society and the widespread use of modern science and technology, the probability of an outbreak of crisis or emergency will be the highest in human history, and the consequences will cause great suffering. Natural and political disasters can strike nearly anywhere. Therefore, people around the world now pay more attention to effective crisis and emergency management than ever before. This article attempts to improve the efficiency and effectiveness of crisis and emergency management from the point of view of quality, using the fuzzy comprehensive evaluation method. The article consists of six parts: introduction, literature review, the system of factors that might affect quality in crisis and emergency management, a model for evaluating quality in crisis and emergency management, a numerical example, and, finally, discussion and suggestions.

13.
14.
Assuming stratified simple random sampling, a confidence interval for a finite population quantile may be desired. Using a confidence interval with endpoints given by order statistics from the combined stratified sample, several procedures to obtain lower bounds (and approximations for the lower bounds) for the confidence coefficients are presented. The procedures differ with respect to the amount of prior information assumed about the variate values in the finite population, and the extent to which sample data are used to estimate the lower bounds.
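
For orientation, the sketch below computes the confidence coefficient of an order-statistic interval in the much simpler textbook setting of independent draws from a continuous distribution. This is a deliberate simplification and not the stratified finite-population procedure of the paper; the function name `order_stat_coverage` is hypothetical.

```python
from math import comb


def order_stat_coverage(n, p, r, s):
    """Coverage of [X_(r), X_(s)] for the p-quantile under i.i.d. continuous sampling.

    The interval covers the quantile exactly when the number of observations
    below it, a Binomial(n, p) count, lies between r and s - 1 inclusive.
    """
    if not (1 <= r < s <= n) or not (0 < p < 1):
        raise ValueError("require 1 <= r < s <= n and 0 < p < 1")
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(r, s))
```

With `n = 10`, `p = 0.5`, `r = 2`, and `s = 9`, the interval between the second and ninth order statistics covers the median with probability 1002/1024. Under stratified sampling from a finite population this binomial argument no longer holds exactly, which is why the paper works with lower bounds instead.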

15.
The issue of modelling the joint distribution of survival time and of prognostic variables measured periodically has recently become of interest in the AIDS literature but is of relevance in other applications. The focus of this paper is on clinical trials where follow-up measurements of potentially prognostic variables are often collected but not routinely used. These measurements can be used to study the biological evolution of the disease of interest; in particular the effect of an active treatment can be examined by comparing the time profiles of patients in the active and placebo groups. It is proposed to use multilevel regression analysis to model the individual repeated observations as a function of time and, possibly, treatment. To address the problem of informative drop-out, which may arise if deaths (or any other censoring events) are related to the unobserved values of the prognostic variables, we analyse sequentially overlapping portions of the follow-up information. An example arising from a randomized clinical trial for the treatment of primary biliary cirrhosis is examined in detail.

16.
We develop the score test for the hypothesis that a parameter of a Markov sequence is constant over time, against the alternative that it varies over time, i.e., $\theta_t = \theta + U_t$, $t = 1, 2, \ldots$, where $\{U_t;\ t = 1, 2, \ldots\}$ is a sequence of independently and identically distributed random variables with mean zero and variance $\sigma_u^2$, and $\theta$ is a fixed constant. The asymptotic null distribution of the test statistic is proved to be normal. We illustrate our procedure by examples and a real life data analysis.

17.
Methods for comparing designs for a random (or mixed) linear model have focused primarily on criteria based on single-valued functions. In general, these functions are difficult to use, because of their complex forms, in addition to their dependence on the model's unknown variance components. In this paper, a graphical approach is presented for comparing designs for random models. The one-way model is used for illustration. The proposed approach is based on using quantiles of an estimator of a function of the variance components. The dependence of these quantiles on the true values of the variance components is depicted by plotting the so-called quantile dispersion graphs (QDGs), which provide a comprehensive picture of the quality of estimation obtained with a given design. The QDGs can therefore be used to compare several candidate designs. Two methods of estimation of variance components are considered, namely analysis of variance and maximum-likelihood estimation.

18.
The authors study the estimation of domain totals and means under survey‐weighted regression imputation for missing items. They use two different approaches to inference: (i) design‐based with uniform response within classes; (ii) model‐assisted with ignorable response and an imputation model. They show that the imputed domain estimators are biased under (i) but approximately unbiased under (ii). They obtain a bias‐adjusted estimator that is approximately unbiased under (i) or (ii). They also derive linearization variance estimators. They report the results of a simulation study on the bias ratio and efficiency of alternative estimators, including a complete case estimator that requires knowledge of the response indicators.

19.
20.
The exact distribution of a nonparametric test statistic for ordered alternatives, the rank 2 statistic, is computed for small sample sizes. The exact distribution is compared to an approximation.
