Similar Articles
1.
Multiple imputation (MI) has become a standard statistical technique for dealing with missing values. The CDC Anthrax Vaccine Research Program (AVRP) dataset created new challenges for MI due to the large number of variables of different types and the limited sample size. A common method for imputing missing data in such complex studies is to specify, for each of J variables with missing values, a univariate conditional distribution given all other variables, and then to draw imputations by iterating over the J conditional distributions. Such fully conditional imputation strategies have the theoretical drawback that the conditional distributions may be incompatible. When the missingness pattern is monotone, a theoretically valid approach is to specify, for each variable with missing values, a conditional distribution given the variables with fewer or the same number of missing values and sequentially draw from these distributions. In this article, we propose the “multiple imputation by ordered monotone blocks” approach, which combines these two basic approaches by decomposing any missingness pattern into a collection of smaller “constructed” monotone missingness patterns, and iterating. We apply this strategy to impute the missing data in the AVRP interim data. Supplemental materials, including all source code and a synthetic example dataset, are available online.
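As a rough illustration of the fully conditional (chained-equations) strategy that the abstract contrasts with monotone-block imputation, the following Python sketch uses scikit-learn's IterativeImputer; the simulated variables and settings are purely illustrative and are not the authors' AVRP procedure.

```python
# A minimal sketch of fully conditional (chained-equations) imputation, the generic
# strategy the abstract contrasts with monotone-block imputation. Variable names and
# settings are illustrative, not taken from the AVRP study.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(0)
X = rng.multivariate_normal([0, 0, 0],
                            [[1, .5, .3], [.5, 1, .4], [.3, .4, 1]], size=200)
X_miss = X.copy()
X_miss[rng.random(X.shape) < 0.2] = np.nan   # ~20% of values missing at random

# Each incomplete variable is modeled conditionally on all the others, and the
# J conditional draws are iterated until the imputations stabilize.
imputer = IterativeImputer(max_iter=20, sample_posterior=True, random_state=0)
X_completed = imputer.fit_transform(X_miss)
print(X_completed[:3])
```

Refitting with different random seeds would yield the multiple completed data sets needed for MI inference.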

2.
Principled techniques for incomplete-data problems are increasingly part of mainstream statistical practice. Among the many techniques proposed so far, inference by multiple imputation (MI) has emerged as one of the most popular. While many strategies leading to inference by MI are available in cross-sectional settings, the same richness does not exist in multilevel applications. The limited methods available for multilevel applications rely on multivariate adaptations of mixed-effects models. This approach preserves the mean structure across clusters and incorporates distinct variance components into the imputation process. In this paper, I add to these methods by considering a random covariance structure and develop computational algorithms. The attraction of this new imputation modeling strategy is that it correctly reflects the mean and variance structure of the joint distribution of the data and allows the covariances to differ across clusters. Using Markov Chain Monte Carlo techniques, a predictive distribution of missing data given observed data is simulated, leading to the creation of multiple imputations. To circumvent the large sample size requirement needed to support independent covariance estimates for the level-1 error term, I consider distributional impositions that mimic random-effects distributions assigned a priori. These techniques are illustrated in an example exploring relationships between victimization and individual- and contextual-level factors that raise the risk of violent crime.

3.
Objective: To impute and analyze the missing data that commonly occur in hospital discharge patient survey forms, in order to safeguard the quality of the statistical survey forms and to provide technical support and quality assurance for hospitals and higher-level health authorities in understanding the current situation, forecasting, and decision making. Methods: Using SAS 9.1, missing data were imputed multiple times with the multiple imputation Markov Chain Monte Carlo (MCMC) model and the results were combined for analysis. Results: MCMC imputation with 10 imputations performed best. Conclusion: Multiple imputation (MI) has clear advantages in handling missing data in hospital discharge patient survey forms, offers considerable flexibility, and achieves high imputation efficiency.
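The study above pools results from 10 MCMC imputations. Below is a minimal Python sketch of Rubin's combining rules, the standard way such pooled analyses are formed (the study itself used SAS PROC MI); the per-imputation estimates are made-up placeholders.

```python
# Rubin's rules for combining estimates from m completed data sets, as used when
# pooling the 10 MCMC imputations described above. The numbers are illustrative only.
import numpy as np

est = np.array([2.10, 2.05, 2.22, 2.15, 2.08, 2.18, 2.11, 2.04, 2.20, 2.13])   # per-imputation estimates
var = np.array([0.040, 0.042, 0.039, 0.041, 0.043, 0.040, 0.038, 0.044, 0.041, 0.042])  # their variances
m = len(est)

q_bar = est.mean()              # pooled point estimate
W = var.mean()                  # within-imputation variance
B = est.var(ddof=1)             # between-imputation variance
T = W + (1 + 1 / m) * B         # total variance
print(f"pooled estimate {q_bar:.3f}, std. error {np.sqrt(T):.3f}")
```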

4.
We establish computationally flexible methods and algorithms for the analysis of multivariate skew normal models when missing values occur in the data. To facilitate the computation and simplify the theoretical derivation, two auxiliary permutation matrices are incorporated into the model to determine the observed and missing components of each observation. Under missing at random mechanisms, we formulate an analytically simple ECM algorithm for computing parameter estimates and retrieving each missing value with a single-valued imputation. Gibbs sampling is used to perform Bayesian inference on the model parameters and to create multiple imputations for the missing values. The proposed methodologies are illustrated through a real data set, and comparisons are made with results obtained from fitting the normal counterparts.
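As a simplified sketch of the single-valued imputation idea under the normal counterpart (not the paper's skew-normal ECM/Gibbs machinery), the missing block of an observation can be replaced by its conditional mean given the observed block; all numbers below are illustrative.

```python
# Single-valued imputation of the missing block of one observation under a
# multivariate normal model: E[x_m | x_o] = mu_m + S_mo S_oo^{-1} (x_o - mu_o).
# This is the "normal counterpart" benchmark; the skew-normal ECM/Gibbs steps of
# the paper are not reproduced here.
import numpy as np

mu = np.array([1.0, 2.0, 0.5])
Sigma = np.array([[1.0, 0.6, 0.3],
                  [0.6, 1.5, 0.4],
                  [0.3, 0.4, 0.8]])

x = np.array([0.7, np.nan, 1.2])   # one observation with a missing component
m = np.isnan(x)                    # missing indicator (plays the role of the permutation matrices)
o = ~m

cond_mean = mu[m] + Sigma[np.ix_(m, o)] @ np.linalg.solve(Sigma[np.ix_(o, o)], x[o] - mu[o])
x_imputed = x.copy()
x_imputed[m] = cond_mean
print(x_imputed)
```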

5.
Most current implementations of multiple imputation (MI) assume that data are missing at random (MAR), but this assumption is generally untestable. We performed analyses to test the effects of auxiliary variables on MI when the data are missing not at random (MNAR), using both simulated data and real data. In the analyses we varied (a) the correlation, (b) the level of missing data, (c) the pattern of missing data, and (d) the sample size. Results showed that MI performed adequately without auxiliary variables, but auxiliary variables had a modest impact on bias in the real data and improved efficiency in both data sets. The results of this study suggest that, counter to concerns about violation of the MAR assumption, MI appears to be quite robust to missing data that are MNAR in analytic situations such as the ones presented here. Further, results can be improved through the use of auxiliary variables, particularly when efficiency is a primary concern.
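A toy version of this simulation design might look as follows, assuming an MNAR mechanism on Y and an auxiliary variable A correlated with Y; the sketch uses scikit-learn's IterativeImputer as a stand-in MI engine, and all settings are illustrative rather than those of the study.

```python
# Toy simulation: Y is MNAR (larger values more likely to be missing), A is an
# auxiliary variable correlated with Y. Compare the imputed mean of Y when the
# imputation model includes vs. excludes A. Settings are illustrative.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(1)
n = 2000
a = rng.normal(size=n)
y = 0.8 * a + rng.normal(scale=0.6, size=n)      # Y strongly correlated with A
x = 0.3 * y + rng.normal(size=n)                 # analysis covariate

p_miss = 1 / (1 + np.exp(-(y - 0.5)))            # MNAR: depends on Y itself
y_obs = np.where(rng.random(n) < p_miss, np.nan, y)

def mi_mean(columns, m=20):
    """Average the imputed mean of Y over m imputations."""
    data = np.column_stack(columns)
    means = []
    for k in range(m):
        imp = IterativeImputer(sample_posterior=True, random_state=k)
        means.append(imp.fit_transform(data)[:, 0].mean())
    return np.mean(means)

print("true mean            :", y.mean().round(3))
print("MI without auxiliary :", round(mi_mean([y_obs, x]), 3))
print("MI with auxiliary    :", round(mi_mean([y_obs, x, a]), 3))
```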

6.
We employed both chance-constrained data envelopment analysis (CCDEA) and stochastic frontier analysis (SFA) to measure the technical efficiency of 39 banks in Taiwan. The estimated results show that there are significant differences in efficiency scores between chance-constrained DEA and the stochastic frontier production function. The advanced setting of the chance-constrained mechanism in DEA does not change the intrinsic differences between the DEA and SFA approaches. We further find that the ownership variable remains significant in explaining technical efficiency in Taiwan, irrespective of whether a DEA, CCDEA or SFA approach is used.
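As a point of reference for the DEA side of this comparison, here is a minimal sketch of the plain input-oriented, constant-returns DEA score solved as a linear program; this is not the chance-constrained variant used in the paper, and the data are made up.

```python
# Input-oriented CRS (CCR) DEA score for one decision-making unit, solved as an LP:
# min theta  s.t.  X @ lam <= theta * x_o,  Y @ lam >= y_o,  lam >= 0.
# Plain DEA only (no chance constraints); the small data set is made up.
import numpy as np
from scipy.optimize import linprog

X = np.array([[2.0, 3.0, 4.0, 6.0],   # inputs  (rows = inputs, cols = banks)
              [1.0, 2.0, 3.0, 2.0]])
Y = np.array([[1.0, 2.0, 3.0, 3.0]])  # outputs (rows = outputs, cols = banks)

def dea_score(o):
    n_inputs, n_units = X.shape
    n_outputs = Y.shape[0]
    c = np.r_[1.0, np.zeros(n_units)]            # decision vector z = [theta, lam_1..lam_n]
    A_in = np.c_[-X[:, o], X]                    # X @ lam - theta * x_o <= 0
    A_out = np.c_[np.zeros(n_outputs), -Y]       # -Y @ lam <= -y_o
    res = linprog(c,
                  A_ub=np.r_[A_in, A_out],
                  b_ub=np.r_[np.zeros(n_inputs), -Y[:, o]],
                  bounds=[(0, None)] * (1 + n_units))
    return res.fun                               # optimal theta = efficiency score

print([round(dea_score(o), 3) for o in range(X.shape[1])])
```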

7.
This paper uses both the non-parametric method of data envelopment analysis (DEA) and the econometric method of stochastic frontier analysis (SFA) to study the production technology and cost efficiency of the US dental care industry using practice level data. The American Dental Association 2006 survey data for a number of general dental practices in the state of Colorado in the US are used for the empirical analysis. The findings suggest that the cost efficiency score is between 0.79 and 0.87, on average, and the cost inefficiency is mostly due to allocative rather than technical inefficiency. The optimal output level for a dental practice to fully exploit the economies of scale is estimated to be at $1.68 million. Average cost at this level of output is 50.6 cents for each dollar of gross billing generated. The DEA and SFA approaches provide generally consistent results.

8.
Isotonic nonparametric least squares (INLS) is a regression method for estimating a monotonic function by fitting a step function to data. In the literature of frontier estimation, the free disposal hull (FDH) method is similarly based on the minimal assumption of monotonicity. In this paper, we link these two separately developed nonparametric methods by showing that FDH is a sign-constrained variant of INLS. We also discuss the connections to related methods such as data envelopment analysis (DEA) and convex nonparametric least squares (CNLS). Further, we examine alternative ways of applying isotonic regression to frontier estimation, analogous to corrected and modified ordinary least squares (COLS/MOLS) methods known in the parametric stream of frontier literature. We find that INLS is a useful extension to the toolbox of frontier estimation both in the deterministic and stochastic settings. In the absence of noise, the corrected INLS (CINLS) has a higher discriminating power than FDH. In the case of noisy data, we propose to apply the method of non-convex stochastic envelopment of data (non-convex StoNED), which disentangles inefficiency from noise based on the skewness of the INLS residuals. The proposed methods are illustrated by means of simulated examples.
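A small sketch of the monotonicity link discussed here, assuming a single input and a single output: isotonic regression (the INLS fit) via scikit-learn next to a free disposal hull frontier computed directly. This is not the authors' CINLS or StoNED code, and the data are simulated.

```python
# Monotone (isotonic) least-squares fit vs. the FDH frontier for one input and one
# output. FDH envelopes the data from above under monotonicity alone, while INLS
# fits a monotone step function through the middle of the cloud. Data are simulated.
import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(2)
x = np.sort(rng.uniform(1, 10, size=50))
y = np.log(x) + rng.normal(scale=0.2, size=50)   # noisy monotone technology

inls_fit = IsotonicRegression(increasing=True).fit_transform(x, y)

# FDH frontier: the largest observed output among units using no more input.
fdh_frontier = np.array([y[x <= xi].max() for xi in x])

print("INLS fitted values (first 5):", np.round(inls_fit[:5], 3))
print("FDH frontier       (first 5):", np.round(fdh_frontier[:5], 3))
```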

9.
We propose a new method to impute missing values in mixed data sets. It is based on a principal component method, factorial analysis for mixed data, which balances the influence of all the variables, continuous and categorical, in the construction of the principal components. Because the imputation uses the principal axes and components, the prediction of the missing values is based on the similarity between individuals and on the relationships between variables. The properties of the method are illustrated via simulations and the quality of the imputation is assessed using real data sets. The method is compared to a recent method (Stekhoven and Buhlmann Bioinformatics 28:113–118, 2011) based on random forests and shows better performance, especially for the imputation of categorical variables and in situations with highly linear relationships between continuous variables.
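Below is a simplified sketch of the iterative principal-component imputation idea, restricted to continuous variables (the paper's FAMD-based method additionally balances categorical variables); the rank and the simulated data are illustrative.

```python
# Simplified iterative PCA imputation for continuous variables only: initialize with
# column means, then alternate a rank-k SVD reconstruction with re-imputing the
# missing cells. The paper's method extends this idea to mixed data via FAMD.
import numpy as np

def iterative_pca_impute(X, rank=2, n_iter=50):
    X = X.astype(float)
    miss = np.isnan(X)
    filled = np.where(miss, np.nanmean(X, axis=0), X)    # mean initialization
    for _ in range(n_iter):
        center = filled.mean(axis=0)
        U, s, Vt = np.linalg.svd(filled - center, full_matrices=False)
        low_rank = U[:, :rank] * s[:rank] @ Vt[:rank] + center
        filled[miss] = low_rank[miss]                     # update only the missing cells
    return filled

rng = np.random.default_rng(3)
scores = rng.normal(size=(100, 2))
X = scores @ rng.normal(size=(2, 5)) + rng.normal(scale=0.1, size=(100, 5))
X_miss = X.copy()
X_miss[rng.random(X.shape) < 0.15] = np.nan
print(np.abs(iterative_pca_impute(X_miss) - X)[np.isnan(X_miss)].mean().round(3))
```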

10.
The 2004 Basel II Accord pointed out the benefits of credit risk management through internal models that use internal data to estimate risk components: probability of default (PD), loss given default, exposure at default and maturity. Internal data are the primary data source for PD estimates; banks are permitted to use statistical default prediction models to estimate the borrowers’ PD, subject to some requirements concerning accuracy, completeness and appropriateness of the data. However, in practice, internal records are usually incomplete or do not contain adequate history to estimate the PD. Missing data are particularly critical with regard to low default portfolios, characterised by inadequate default records, making it difficult to design statistically significant prediction models. Several methods might be used to deal with missing data, such as list-wise deletion, application-specific list-wise deletion, substitution techniques or imputation models (simple and multiple variants). List-wise deletion is an easy-to-use method widely applied by social scientists, but it loses substantial data and reduces the diversity of information, resulting in bias in the model's parameters, results and inferences. The choice of the best method to solve the missing data problem largely depends on the nature of the missing values (MCAR, MAR and MNAR processes), but there is a lack of empirical analysis of their effect on credit risk, which limits the validity of the resulting models. In this paper, we analyse the nature and effects of missing data in credit risk modelling (MCAR, MAR and MNAR processes), taking into account a currently scarce data set on consumer borrowers that includes different percentages and distributions of missing data. The findings are used to analyse the performance of several methods for dealing with missing data, such as list-wise deletion, simple imputation methods, MLE models and advanced multiple imputation (MI) alternatives based on Markov Chain Monte Carlo and re-sampling methods. Results are evaluated and discussed across models in terms of robustness, accuracy and complexity. In particular, MI models are found to provide very valuable solutions with regard to credit risk missing data.

11.
Incomplete data models typically involve strong untestable assumptions about the missing data distribution. As inference may critically depend on them, the importance of sensitivity analysis is well recognized. Molenberghs, Kenward, and Goetghebeur proposed a formal frequentist approach to sensitivity analysis which distinguishes ignorance due to unintended incompleteness from imprecision due to finite sampling by design. They combine both sources of variation into uncertainty. This article develops estimation tools for ignorance and uncertainty concerning regression coefficients in a complete data model when some of the intended outcome values are missing. Exhaustive enumeration of all possible imputations for the missing data requires enormous computational resources. In contrast, when the boundary of the occupied region is of greatest interest, reasonable computational efficiency may be achieved via the imputation towards directional extremes (IDE) algorithm. This is a special imputation method designed to mark the boundary of the region by maximizing the direction of change of the complete data estimator caused by perturbations to the imputed outcomes. For multi-dimensional parameters, a dimension reduction approach is considered. Additional insights are obtained by considering structures within the region, and by introducing external knowledge to narrow the boundary to useful proportions. Special properties hold for the generalized linear model. Examples from a Kenyan HIV study will illustrate the points.

12.
A first systematic attempt to use data containing missing values in data envelopment analysis (DEA) is presented. It is formally shown that allowing missing values into the data set can only improve estimation of the best-practice frontier. Technically, DEA can automatically exclude the missing data from the analysis if blank data entries are coded by appropriate numerical values.

13.
This article deals with the comparison of technical efficiency results obtained from various stochastic frontier analysis models. The effects of model type, a possible frontier shift, the distribution of the inefficiency term, the choice of output variable, and estimator selection were explored. For this purpose, aggregated annual data for the EU construction sector from 2000 to 2015 were used for the efficiency estimation. The resulting efficiency values were compared using correlation coefficients. Among other findings, it is shown that estimator selection may strongly affect the efficiency estimates for particular models.

14.
Quantile regression for robust bank efficiency score estimation
We discuss quantile regression techniques as a robust and easy-to-implement alternative for estimating Farrell technical efficiency scores. The quantile regression approach estimates the production process for benchmark banks located at top conditional quantiles. Monte Carlo simulations reveal that even when data are generated according to the assumptions of the stochastic frontier model (SFA), efficiency estimates obtained from quantile regressions resemble SFA efficiency estimates. We apply the SFA and the quantile regression approach to German bank data for three banking groups (commercial banks, savings banks and cooperative banks) to estimate efficiency scores based on a simple value-added function and a multiple-input–multiple-output cost function. The results reveal that the efficient (benchmark) banks have production and cost elasticities which differ considerably from the elasticities obtained from conditional mean functions and stochastic frontier functions.
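A hedged sketch of the general approach: estimate a log-linear production function at a high conditional quantile with statsmodels' QuantReg and score each unit against that benchmark. The data, functional form, and the 0.95 quantile are illustrative assumptions, not the paper's German bank specification.

```python
# Estimate a Cobb-Douglas-style production function at the 0.95 conditional quantile
# and score each unit relative to that benchmark. Data and functional form are
# illustrative; the paper uses German bank data and richer cost/value-added functions.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
n = 300
df = pd.DataFrame({"labor": rng.lognormal(size=n), "capital": rng.lognormal(size=n)})
inefficiency = np.abs(rng.normal(scale=0.3, size=n))
df["output"] = np.exp(0.6 * np.log(df["labor"]) + 0.4 * np.log(df["capital"])
                      + rng.normal(scale=0.1, size=n) - inefficiency)

model = smf.quantreg("np.log(output) ~ np.log(labor) + np.log(capital)", df)
fit = model.fit(q=0.95)                              # frontier-like benchmark quantile
residuals = np.log(df["output"]) - fit.predict(df)
efficiency = np.exp(residuals - residuals.max())     # normalize so the best unit scores 1
print(fit.params.round(3))
print("median efficiency score:", round(float(efficiency.median()), 3))
```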

15.
Deterministic models of technical efficiency assume that all deviations from the production frontier are due to inefficiency. Critics argue that no allowance is made for measurement error and other statistical noise, so that the resulting efficiency measure will be contaminated. The stochastic frontier model is an alternative that allows for both inefficiency and measurement error. Advocates argue that stochastic frontier models should be used despite other potential limitations because of their superior conceptual treatment of noise. As will be demonstrated in this paper, however, the assumed shape of the error distributions is used to identify a key production function parameter. Therefore, the stochastic frontier models, like the deterministic models, cannot produce absolute measures of efficiency. Moreover, we show that the rankings of firm-specific inefficiency estimates produced by traditional stochastic frontier models do not differ from the rankings of the composed errors. As a result, the performance of the deterministic models is qualitatively similar to that of the stochastic frontier models.
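The ranking claim can be illustrated numerically: under the usual normal/half-normal composed error, the JLMS conditional mean E[u | ε] is a monotone function of the composed residual, so it ranks firms exactly as the residuals do. The parameter values below are made up.

```python
# The JLMS conditional-mean inefficiency E[u | eps] under the normal/half-normal
# stochastic frontier is a monotone (decreasing) function of the composed residual
# eps = v - u, so it ranks firms exactly as the residuals do. Parameters are made up.
import numpy as np
from scipy.stats import norm, spearmanr

rng = np.random.default_rng(5)
sigma_v, sigma_u = 0.2, 0.4
n = 500
v = rng.normal(scale=sigma_v, size=n)
u = np.abs(rng.normal(scale=sigma_u, size=n))   # half-normal inefficiency
eps = v - u                                     # composed error

sigma2 = sigma_u**2 + sigma_v**2
mu_star = -eps * sigma_u**2 / sigma2
sigma_star = sigma_u * sigma_v / np.sqrt(sigma2)
jlms = mu_star + sigma_star * norm.pdf(mu_star / sigma_star) / norm.cdf(mu_star / sigma_star)

# Perfect (negative) rank correlation: ranking firms by E[u | eps] is the same as
# ranking them by the composed residuals themselves.
print(spearmanr(eps, jlms)[0])
```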

16.
Critics of the deterministic approach to efficiency measurement argue that no allowance is made for measurement error and other statistical noise. Without controlling for measurement error, the resulting measure of efficiency will be distorted due to the contamination of noise. The stochastic frontier models purportedly allow both inefficiency and measurement error. Some proponents argue that the stochastic frontier models should be used despite the limitations because of the superior conceptual treatment of noise. However, the ultimate value of the stochastic frontier depends on its ability to properly decompose noise and inefficiency. This paper tests the validity of the stochastic frontier cross-sectional models using a Monte Carlo analysis. The results suggest that the technique does not accurately decompose the total error into inefficiency and noise components. Further, the results suggest that at best, the stochastic frontier is only as good as the deterministic model.

17.
In many applications, some covariates may be missing for various reasons. Regression quantiles can be either biased or under-powered when the missing data are ignored. Multiple imputation and EM-based augmentation approaches have been proposed to fully utilize the data with missing covariates for quantile regression. Both methods, however, are computationally expensive. We propose a fast imputation algorithm (FI) to handle missing covariates in quantile regression, which is an extension of fractional imputation in likelihood-based regressions. FI and the modified imputation algorithms (FIIPW and MIIPW) are compared to existing MI and IPW approaches in simulation studies, and applied to part of the National Collaborative Perinatal Project study.

18.
This paper deals with the issue of estimating a production frontier and measuring efficiency from a panel data set. First, it proposes an alternative method for the estimation of a production frontier on a short panel data set. The method is based on the so-called mean-and-covariance structure analysis, which is closely related to the generalized method of moments. One advantage of the method is that it allows us to investigate the presence of correlations between individual effects and exogenous variables without requiring instruments uncorrelated with the individual effects, as in instrumental variable estimation. Another advantage is that the method is well suited to a panel data set with a small number of periods. Second, the paper considers the question of recovering individual efficiency levels from the estimates obtained from the mean-and-covariance structure analysis. Since individual effects are here viewed as latent variables, they can be estimated as factor scores, i.e., weighted sums of the observed variables. We illustrate the proposed methods with the estimation of a stochastic production frontier on a short panel data set of French fruit growers.

19.
In data analysis problems where the data are represented by vectors of real numbers, it is often the case that some of the data-points will have “missing values”, meaning that one or more of the entries of the vector that describes the data-point is not observed. In this paper, we propose a new approach to the imputation of missing binary values. The technique we introduce employs a “similarity measure” introduced by Anthony and Hammer (2006) [1]. We compare experimentally the performance of our technique with ones based on the usual Hamming distance measure and multiple imputation.
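Below is a minimal sketch of the Hamming-distance baseline mentioned in the abstract (the Anthony–Hammer similarity measure itself is not reproduced): each missing bit is copied from the nearest neighbour under the normalized Hamming distance on the commonly observed coordinates. The small data matrix is made up.

```python
# Nearest-neighbour imputation of missing binary entries using the plain Hamming
# distance on commonly observed coordinates - the baseline the paper compares its
# similarity-measure-based technique against. Data are made up; the Anthony-Hammer
# similarity measure itself is not implemented here.
import numpy as np

def hamming_impute(data):
    """data: 2-D float array of 0/1 values with np.nan for missing entries."""
    filled = data.copy()
    for i, row in enumerate(data):
        miss = np.isnan(row)
        if not miss.any():
            continue
        best_j, best_d = None, np.inf
        for j, other in enumerate(data):
            both = ~np.isnan(row) & ~np.isnan(other)
            if j == i or not both.any() or np.isnan(other[miss]).any():
                continue
            d = np.mean(row[both] != other[both])     # normalized Hamming distance
            if d < best_d:
                best_j, best_d = j, d
        if best_j is not None:
            filled[i, miss] = data[best_j, miss]      # copy the neighbour's bits
    return filled

X = np.array([[1, 0, 1, np.nan],
              [1, 0, 1, 1],
              [0, 1, 0, 0],
              [np.nan, 1, 0, 0]], dtype=float)
print(hamming_impute(X))
```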

20.
Summary. The main purpose of this paper is a comparison of several imputation methods within the simple additive model y = f(x) + ε, where the independent variable X is subject to values missing completely at random. Besides the well-known complete-case analysis, mean imputation plus random noise, single imputation and two kinds of nearest-neighbor imputation are used. A short introduction to the model, the missingness mechanism, the inference, the imputation methods and their implementation is followed by the main focus: the simulation experiment. The methods are compared within the experiment on the basis of the sample mean squared error, estimated variances and estimated biases of f(x) at the knots.
