Similar Literature
20 similar documents found (search time: 46 ms)
1.
This paper is concerned with density estimation based on the stagewise minimization of the U-divergence. The U-divergence is a general divergence measure involving a convex function U which includes the Kullback-Leibler divergence and the L2 norm as special cases. The algorithm to yield the density estimator is closely related to the boosting algorithm and it is shown that the usual kernel density estimator can also be seen as a special case of the proposed estimator. Non-asymptotic error bounds of the proposed estimators are developed and numerical experiments show that the proposed estimators often perform better than several existing methods for density estimation.
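The usual kernel density estimator mentioned above as a special case can be written in a few lines. The sketch below is purely illustrative: it assumes a univariate sample, a Gaussian kernel and a hand-picked bandwidth, none of which come from the paper.

```python
# A minimal Gaussian kernel density estimator (the special case noted in the
# abstract). Function names, data and bandwidth are illustrative assumptions.
import numpy as np

def gaussian_kde_estimate(x_eval, data, bandwidth):
    """Evaluate a Gaussian kernel density estimate at the points x_eval."""
    x_eval = np.asarray(x_eval)[:, None]          # shape (m, 1)
    data = np.asarray(data)[None, :]              # shape (1, n)
    z = (x_eval - data) / bandwidth
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    return kernels.mean(axis=1) / bandwidth       # average of rescaled kernels

# Example: estimate the density of a small bimodal sample.
rng = np.random.default_rng(0)
sample = np.concatenate([rng.normal(-2, 1, 100), rng.normal(3, 0.5, 100)])
grid = np.linspace(-6, 6, 200)
density = gaussian_kde_estimate(grid, sample, bandwidth=0.4)
```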

2.
It is now well known that the minimum Hellinger distance estimation approach introduced by Beran (Beran, R., 1977. Minimum Hellinger distance estimators for parametric models. Ann. Statist. 5, 445-463) produces estimators that achieve efficiency at the model density and simultaneously have excellent robustness properties. However, computational difficulties and algorithmic convergence problems associated with this method have hampered its application in practice, particularly when the method is applied to models with high-dimensional parameter spaces. A one-step minimum Hellinger distance (MHD) procedure is investigated in this paper to overcome the computational drawbacks of the fully iterative MHD method. The idea is to start with an initial estimator and then apply one Newton-Raphson iteration to the Hellinger distance objective. The resulting estimator can be considered a one-step MHD estimator. We show that the proposed one-step MHD estimator has the same asymptotic behavior as the MHD estimator, as long as the initial estimator is reasonably good. Furthermore, our theoretical and numerical studies demonstrate that the proposed one-step MHD estimator also retains the excellent robustness properties of the MHD estimators. A real data example is analyzed as well.
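A minimal sketch of the one-step idea for a toy one-parameter normal location model: start from a robust initial estimate and take a single Newton-Raphson step on the squared Hellinger distance between a kernel density estimate and the model density. The KDE, the grid-based integration and the finite-difference derivatives are illustrative choices, not the paper's implementation.

```python
# One-step minimum Hellinger distance sketch for a N(theta, 1) location model.
import numpy as np
from scipy.stats import gaussian_kde, norm

def hellinger_sq(theta, kde_vals, grid, dx):
    """Squared Hellinger distance between the KDE and the N(theta, 1) density."""
    f_theta = norm.pdf(grid, loc=theta, scale=1.0)
    return np.sum((np.sqrt(kde_vals) - np.sqrt(f_theta)) ** 2) * dx

def one_step_mhd(data, theta0, eps=1e-4):
    """One Newton-Raphson step on the Hellinger objective, started at theta0."""
    grid = np.linspace(data.min() - 4, data.max() + 4, 1000)
    dx = grid[1] - grid[0]
    kde_vals = gaussian_kde(data)(grid)
    h = lambda t: hellinger_sq(t, kde_vals, grid, dx)
    grad = (h(theta0 + eps) - h(theta0 - eps)) / (2 * eps)      # finite differences
    hess = (h(theta0 + eps) - 2 * h(theta0) + h(theta0 - eps)) / eps**2
    return theta0 - grad / hess

rng = np.random.default_rng(1)
x = rng.normal(2.0, 1.0, 300)
x[:15] = 10.0                      # gross outliers in the sample
theta_init = np.median(x)          # a reasonably good, robust initial estimator
theta_hat = one_step_mhd(x, theta_init)
```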

3.
The estimation of density functions for positive multivariate data is discussed. The proposed approach is semiparametric. The estimator combines gamma kernels or local linear kernels, also called boundary kernels, for the estimation of the marginal densities with parametric copulas to model the dependence. This semiparametric approach is robust both to the well-known boundary bias problem and to the curse of dimensionality. Mean integrated squared error properties, including the rate of convergence, the uniform strong consistency and the asymptotic normality, are derived. A simulation study investigates the finite sample performance of the estimator. The proposed estimator performs very well, even for data without boundary bias problems. For bandwidth choice in practice, the univariate least squares cross validation method for the bandwidth of the marginal density estimators is investigated. Applications in the field of finance are provided.

4.
The estimators most widely used to evaluate the prediction error of a non-linear regression model are examined. An extensive simulation study allowed the comparison of the performance of these estimators for different non-parametric methods, and with varying signal-to-noise ratio and sample size. Estimators based on resampling methods such as Leave-one-out, parametric and non-parametric Bootstrap, as well as repeated Cross-Validation methods and Hold-out, were considered. The methods used are Regression Trees, Projection Pursuit Regression and Neural Networks. The repeated-corrected 10-fold Cross-Validation estimator and the Parametric Bootstrap estimator obtained the best performance in the simulations.
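For concreteness, here is a short sketch of one of the compared estimators, repeated 10-fold cross-validation for a regression tree, using scikit-learn. The data, tree depth and number of repeats are illustrative, and the correction term of the repeated-corrected variant is not included.

```python
# Repeated 10-fold cross-validation estimate of prediction error for a
# regression tree. Toy data and settings are illustrative assumptions.
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import RepeatedKFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 3))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + rng.normal(0, 0.3, 200)

cv = RepeatedKFold(n_splits=10, n_repeats=20, random_state=0)
scores = cross_val_score(DecisionTreeRegressor(max_depth=4), X, y,
                         scoring="neg_mean_squared_error", cv=cv)
cv_mse = -scores.mean()   # repeated 10-fold CV estimate of the prediction error
```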

5.
A simultaneously efficient and robust approach for distribution-free parametric inference, called the simulated minimum Hellinger distance (SMHD) estimator, is proposed. In SMHD estimation, the Hellinger distance between the nonparametrically estimated density of the observed data and that of the simulated samples from the model is minimized. The method is applicable to situations where the closed-form expression of the model density is intractable but simulating random variables from the model is possible. The robustness properties of the SMHD estimator are equivalent to those of the minimum Hellinger distance estimator. The finite sample efficiency of the proposed methodology is found to be comparable to that of the Bayesian Markov chain Monte Carlo and maximum likelihood Monte Carlo methods and to outperform the efficient method of moments estimators. The robustness of the method is demonstrated by a simulation study of a stochastic volatility model. An empirical application to weekly observations of foreign exchange rates is presented.
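A rough sketch of the SMHD recipe under strong simplifying assumptions (a toy location model standing in for the intractable simulator, a fixed evaluation grid, and common random numbers across parameter values):

```python
# Simulated minimum Hellinger distance, illustrative version: minimise the
# Hellinger distance between a KDE of the data and a KDE of model simulations.
import numpy as np
from scipy.stats import gaussian_kde
from scipy.optimize import minimize_scalar

def simulate_model(theta, n, rng):
    """Stand-in simulator; in practice the model density would be intractable."""
    return theta + rng.standard_normal(n)

def smhd_objective(theta, data, grid, dx, n_sim=5000):
    rng = np.random.default_rng(123)        # common random numbers across theta values
    sim = simulate_model(theta, n_sim, rng)
    f_data = gaussian_kde(data)(grid)
    f_sim = gaussian_kde(sim)(grid)
    return np.sum((np.sqrt(f_data) - np.sqrt(f_sim)) ** 2) * dx

rng = np.random.default_rng(7)
data = simulate_model(1.5, 400, rng)
grid = np.linspace(data.min() - 4, data.max() + 4, 800)
dx = grid[1] - grid[0]
res = minimize_scalar(smhd_objective, bounds=(-10, 10), args=(data, grid, dx),
                      method="bounded")
theta_hat = res.x
```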

6.
Penalized B-splines combined with the composite link model are used to estimate a bivariate density from a histogram with wide bins. The goals are multiple: they include the visualization of the dependence between the two variates as well as the estimation of derived quantities like Kendall’s tau, conditional moments and quantiles. Two strategies are proposed: the first one is semiparametric, with flexible margins modeled using B-splines and a parametric copula for the dependence structure; the second one is nonparametric and is based on Kronecker products of the marginal B-spline bases. Frequentist and Bayesian estimations are described. A large simulation study quantifies the performances of the two methods under different dependence structures and for varying strengths of dependence, sample sizes and amounts of grouping. It suggests that Schwarz’s BIC is a good tool for classifying the competing models. The density estimates are used to evaluate conditional quantiles in two applications in the social and medical sciences.

7.
The problem of clustering time series is studied for a general class of non-parametric autoregressive models. The dissimilarity between two time series is based on comparing their full forecast densities at a given horizon. In particular, two functional distances are considered: L1 and L2. As the forecast densities are unknown, they are approximated using a bootstrap procedure that mimics the underlying generating processes without assuming any parametric model for the true autoregressive structure of the series. The estimated forecast densities are then used to construct the dissimilarity matrix and hence to perform clustering. Asymptotic properties of the proposed method are provided and an extensive simulation study is carried out. The results show the good behavior of the procedure for a wide variety of nonlinear autoregressive models and its robustness to non-Gaussian innovations. Finally, the proposed methodology is applied to a real dataset involving economic time series.
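Assuming the bootstrap forecast samples for each series have already been produced, the clustering step itself is straightforward. The snippet below, with toy forecast samples, computes L2 distances between kernel estimates of the forecast densities and feeds the resulting dissimilarity matrix to an agglomerative clustering; all names and data are illustrative.

```python
# Cluster series by the Lp distance between their estimated forecast densities.
import numpy as np
from scipy.stats import gaussian_kde
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def forecast_density_distance(samples_a, samples_b, grid, order=2):
    """Lp distance between kernel estimates of two forecast densities."""
    dx = grid[1] - grid[0]
    fa, fb = gaussian_kde(samples_a)(grid), gaussian_kde(samples_b)(grid)
    return (np.sum(np.abs(fa - fb) ** order) * dx) ** (1.0 / order)

rng = np.random.default_rng(0)
# Toy bootstrap forecast samples for five series (two groups of behaviour).
forecasts = [rng.normal(0, 1, 500) for _ in range(3)] + \
            [rng.normal(3, 0.5, 500) for _ in range(2)]
grid = np.linspace(-5, 6, 400)

n = len(forecasts)
D = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        D[i, j] = D[j, i] = forecast_density_distance(forecasts[i], forecasts[j], grid)

# Hierarchical clustering on the dissimilarity matrix.
labels = fcluster(linkage(squareform(D), method="average"), t=2, criterion="maxclust")
```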

8.
The importance of suitable distance measures between intuitionistic fuzzy sets (IFSs) arises because of the role they play in inference problems. A concept closely related to distance measures is that of divergence measures, based on the idea of information-theoretic entropy first introduced in communication theory by Shannon (1949). It is known that the J-divergence is an important family of divergences. In this paper, we construct a J-divergence between IFSs. The proposed J-divergence can induce some useful distance and similarity measures between IFSs. Numerical examples demonstrate that the proposed measures perform well in clustering and pattern recognition.
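The abstract does not give the paper's definition, so the snippet below only illustrates one natural construction: treat each element's (membership, non-membership, hesitation) triple as a probability vector and average the classical Jeffreys J-divergence over the universe. The construction, names and data are assumptions made for illustration, not the authors' measure.

```python
# Illustrative J-divergence between two intuitionistic fuzzy sets, built from
# the classical Jeffreys (symmetric KL) divergence on each element's triple.
import numpy as np

def jeffreys(p, q, eps=1e-12):
    """Symmetric (Jeffreys) J-divergence between two probability vectors."""
    p, q = np.clip(p, eps, 1.0), np.clip(q, eps, 1.0)
    return float(np.sum((p - q) * np.log(p / q)))

def ifs_j_divergence(mu_a, nu_a, mu_b, nu_b):
    """Average Jeffreys divergence over the elements of two IFSs."""
    pi_a, pi_b = 1 - mu_a - nu_a, 1 - mu_b - nu_b     # hesitation degrees
    triples_a = np.stack([mu_a, nu_a, pi_a], axis=1)
    triples_b = np.stack([mu_b, nu_b, pi_b], axis=1)
    return float(np.mean([jeffreys(a, b) for a, b in zip(triples_a, triples_b)]))

# Two IFSs over a three-element universe (membership and non-membership degrees).
A_mu, A_nu = np.array([0.7, 0.2, 0.5]), np.array([0.2, 0.6, 0.3])
B_mu, B_nu = np.array([0.6, 0.3, 0.4]), np.array([0.3, 0.5, 0.4])
d = ifs_j_divergence(A_mu, A_nu, B_mu, B_nu)
```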

9.
GeD spline estimation of multivariate Archimedean copulas
A new multivariate Archimedean copula estimation method is proposed in a non-parametric setting. The method uses the so-called Geometrically Designed splines (GeD splines) to represent the cdf of a random variable Wθ, obtained through the probability integral transform of an Archimedean copula with parameter θ. Sufficient conditions for the GeD spline estimator to possess the properties of the underlying theoretical cdf, K(θ,t), of Wθ are given. These conditions allow for defining a three-step estimation procedure for solving the resulting non-linear regression problem with linear inequality constraints. In the proposed procedure, finding the number and location of the knots and the coefficients of the unconstrained GeD spline estimator is separated from solving the constrained least-squares optimisation problem. The resulting spline estimator is then used to recover the generator and the related Archimedean copula by solving an ordinary differential equation. The proposed method is truly multivariate, it brings about numerical efficiency and, as a result, can be applied to large volumes of data and for dimensions d≥2, as illustrated by the numerical examples presented.

10.
The density-weighted averaged derivative estimator gives a computationally convenient, consistent and asymptotically normal (CAN) estimate of the parametric component of a semiparametric single index model. This model includes some important parametric models as special cases, such as linear regression, Logit/Probit, Tobit and Box–Cox and other transformation models. The estimator involves a nonparametric kernel density estimate and thus faces the problem of bandwidth selection. A reasonable bandwidth for point estimation is one minimizing the mean squared error. Alternatively, for the purposes of hypothesis testing and confidence interval estimation, we may prefer to choose it so that it minimizes the normal approximation error. The purpose of this paper is to propose a new bandwidth suitable for these purposes by minimizing the normal approximation error in the tail of the exact distribution of the statistic, using the higher-order asymptotic theory of Edgeworth expansions or the bootstrap method.
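A sketch of the density-weighted averaged derivative estimator in one dimension, in its familiar leave-one-out kernel form delta_hat = -(2/n) * sum_i Y_i * fhat'_{-i}(X_i). The Gaussian kernel and the rule-of-thumb bandwidth are illustrative; the paper's contribution is precisely how that bandwidth should instead be tuned for testing and interval estimation.

```python
# Density-weighted averaged derivative estimate with a Gaussian kernel (1-D).
import numpy as np

def density_weighted_ade(x, y, h):
    n = len(x)
    diff = (x[:, None] - x[None, :]) / h                 # pairwise scaled gaps
    kprime = -diff * np.exp(-0.5 * diff**2) / np.sqrt(2 * np.pi)  # K'(u), Gaussian K
    np.fill_diagonal(kprime, 0.0)                        # leave-one-out
    f_deriv = kprime.sum(axis=1) / ((n - 1) * h**2)      # fhat'_{-i}(X_i)
    return -2.0 * np.mean(y * f_deriv)

rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = np.tanh(1.5 * x) + rng.normal(0, 0.2, 500)           # single-index style response
h = 1.06 * x.std() * len(x) ** (-1 / 5)                  # rule-of-thumb bandwidth
delta_hat = density_weighted_ade(x, y, h)
```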

11.
For finite mixtures, consistent estimation of the unknown number of components, called the mixture complexity, is considered based on a random sample of counts, when the exact form of the component probability mass functions is unknown but is postulated to belong to some parametric family. Following a recent approach of Woo and Sriram [2006. Robust estimation of mixture complexity. J. Amer. Statist. Assoc., to appear], we develop an estimator of mixture complexity as a by-product of minimizing a Hellinger information criterion, when all the parameters associated with the mixture model are unknown. The estimator is shown to be consistent. Monte Carlo simulations illustrate the ability of our estimator to correctly determine the mixture complexity when the postulated Poisson mixture model is correct. When the postulated model is a Poisson mixture but the data come from a negative binomial mixture with moderate to more extreme overdispersion in one of its components, simulation results show that our estimator continues to perform well. These results confirm the efficiency of the estimator when the model is correctly specified and its robustness when the model is misspecified. A count dataset with overdispersion and possible zero inflation is analyzed to further illustrate the ability of our estimator to determine the number of components.
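A hedged sketch of the selection idea: for each candidate k, a k-component Poisson mixture is fitted by minimum Hellinger distance to the empirical pmf, and k is chosen by the distance plus a complexity penalty. The penalty form, the optimiser and the toy data are illustrative and not the criterion or threshold used in the paper.

```python
# Choose the number of Poisson mixture components by a Hellinger-type criterion.
import numpy as np
from scipy.stats import poisson
from scipy.optimize import minimize
from scipy.special import softmax

def mixture_pmf(params, k, support):
    """k-component Poisson mixture pmf; weights via softmax, rates via exp."""
    w = softmax(params[:k])
    rates = np.exp(params[k:])
    return sum(w[j] * poisson.pmf(support, rates[j]) for j in range(k))

def hellinger_sq(p, q):
    return float(np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

def fit_k(counts, k):
    """Minimum Hellinger distance fit of a k-component mixture; returns the distance."""
    support = np.arange(counts.max() + 1)
    emp = np.bincount(counts, minlength=len(support)) / len(counts)
    x0 = np.concatenate([np.zeros(k),
                         np.log(np.quantile(counts, np.linspace(0.2, 0.8, k)) + 0.5)])
    res = minimize(lambda p: hellinger_sq(emp, mixture_pmf(p, k, support)),
                   x0, method="Nelder-Mead")
    return res.fun

rng = np.random.default_rng(2)
lam = rng.choice([1.0, 8.0], size=500, p=[0.6, 0.4])      # true 2-component mixture
counts = rng.poisson(lam)
n = len(counts)
# Illustrative BIC-style penalty on the number of free parameters (2k - 1).
crit = {k: fit_k(counts, k) + (2 * k - 1) * np.log(n) / n for k in range(1, 5)}
k_hat = min(crit, key=crit.get)    # estimated mixture complexity
```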

12.
There have been many changes in statistical theory in the past 30 years, including increased evidence that non-robust methods may fail to detect important results. The statistical advice available to software engineering researchers needs to be updated to address these issues. This paper aims both to explain the new results in the area of robust analysis methods and to provide a large-scale worked example of the new methods. We summarise the results of analyses of the Type 1 error efficiency and power of standard parametric and non-parametric statistical tests when applied to non-normal data sets. We identify parametric and non-parametric methods that are robust to non-normality. We present an analysis of a large-scale software engineering experiment to illustrate their use. We illustrate the use of kernel density plots, and parametric and non-parametric methods, using four different software engineering data sets. We explain why the methods are necessary and the rationale for selecting a specific analysis. We suggest using kernel density plots rather than box plots to visualise data distributions. For parametric analysis, we recommend trimmed means, which can support reliable tests of the differences between the central location of two or more samples. When the distribution of the data differs among groups, or we have ordinal scale data, we recommend non-parametric methods such as Cliff’s δ or a robust rank-based ANOVA-like method.
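Two of the recommended tools in minimal form, a 20% trimmed mean and Cliff's δ; the example data are illustrative.

```python
# Trimmed mean (via scipy) and Cliff's delta for two samples.
import numpy as np
from scipy.stats import trim_mean

def cliffs_delta(x, y):
    """Cliff's delta: P(X > Y) - P(X < Y) over all cross-group pairs."""
    x, y = np.asarray(x), np.asarray(y)
    greater = (x[:, None] > y[None, :]).sum()
    less = (x[:, None] < y[None, :]).sum()
    return (greater - less) / (len(x) * len(y))

a = np.array([12.0, 14.1, 13.2, 55.0, 12.7, 13.9])    # contains one extreme value
b = np.array([10.2, 11.5, 10.9, 11.8, 12.1, 10.4])

loc_a = trim_mean(a, proportiontocut=0.2)              # robust central location
delta = cliffs_delta(a, b)                             # ordinal effect size
```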

13.
We consider multivariate density estimation with identically distributed observations. We study a density estimator which is a convex combination of functions in a dictionary and the convex combination is chosen by minimizing the L2 empirical risk in a stagewise manner. We derive the convergence rates of the estimator when the estimated density belongs to the L2 closure of the convex hull of a class of functions which satisfies entropy conditions. The L2 closure of a convex hull is a large non-parametric class, but under suitable entropy conditions the convergence rates of the estimator do not depend on the dimension, and density estimation is feasible also in high dimensional cases. The variance of the estimator does not increase when the number of components of the estimator increases. Instead, we control the bias-variance trade-off by the choice of the dictionary from which the components are chosen.
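A compact sketch of a stagewise L2-risk estimator of this type: the estimate is a convex combination of dictionary densities, and each stage greedily adds the component and step size that most reduce the empirical L2 risk R(f) = ∫ f^2 - (2/n) Σ f(X_i). The Gaussian dictionary, the grid-based integral and the small step-size grid are illustrative simplifications.

```python
# Greedy stagewise density estimation over a dictionary, minimising the L2 risk.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-1, 0.5, 150), rng.normal(2, 0.7, 150)])
grid = np.linspace(-5, 6, 600)
dx = grid[1] - grid[0]

# Dictionary: Gaussian bumps on a coarse grid of means and a few scales.
dictionary = [norm(loc=m, scale=s)
              for m in np.linspace(-4, 5, 19) for s in (0.3, 0.7, 1.5)]
dict_grid = np.array([g.pdf(grid) for g in dictionary])   # values on the grid
dict_data = np.array([g.pdf(data) for g in dictionary])   # values at the sample

def l2_risk(f_grid, f_data):
    """Empirical L2 risk: integral of f^2 minus twice the sample mean of f."""
    return np.sum(f_grid**2) * dx - 2.0 * f_data.mean()

f_grid, f_data = dict_grid[0].copy(), dict_data[0].copy() # start from one component
for stage in range(30):
    best = None
    for j in range(len(dictionary)):
        for alpha in (0.1, 0.3, 0.5):                     # small grid of step sizes
            cand_grid = (1 - alpha) * f_grid + alpha * dict_grid[j]
            cand_data = (1 - alpha) * f_data + alpha * dict_data[j]
            risk = l2_risk(cand_grid, cand_data)
            if best is None or risk < best[0]:
                best = (risk, cand_grid, cand_data)
    f_grid, f_data = best[1], best[2]                     # greedy convex update
```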

14.
The present paper deals with the problem of fault detection in highly coupled large-scale industrial systems typically operating in noisy environments. The detection algorithm is based on a non-parametric approximation of a modified Kullback–Leibler divergence. Since the dissimilarity between multidimensional probability densities is usually quantified by a scalar quantity belonging to the f-divergence family, the problem of multidimensional process monitoring can be reduced to the simpler task of detecting deviations from normal operation in this one-dimensional signal. With the modified Kullback–Leibler distance, faults can be detected directly, without a normality assumption or the joint monitoring of related test statistics in different subspaces. However, such metrics depend strongly on density estimates, which are inherently difficult and cumbersome to obtain, especially in high-dimensional problems, and may therefore be unreliable; an alternative way to tackle the problem is to estimate the ratio of densities rather than the densities themselves. The objective of this work is to exploit such an approach in detecting abnormalities in real industrial systems and to confirm its applicability using the industrial benchmark of the Tennessee Eastman process.
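One standard way to estimate a Kullback-Leibler divergence through the density ratio rather than the densities themselves is to train a probabilistic classifier to separate reference data from monitored data: the classifier's odds estimate the ratio r(x) = p_test(x)/p_ref(x), and KL(p_test || p_ref) is approximated by the mean log-ratio over the monitored window. This classifier-based construction is only an illustration; the paper's specific ratio estimator may differ.

```python
# KL divergence estimated via a classifier-based density ratio (illustrative).
import numpy as np
from sklearn.linear_model import LogisticRegression

def kl_via_density_ratio(x_ref, x_test):
    X = np.vstack([x_ref, x_test])
    z = np.concatenate([np.zeros(len(x_ref)), np.ones(len(x_test))])
    clf = LogisticRegression(max_iter=1000).fit(X, z)
    p = clf.predict_proba(x_test)[:, 1]
    prior_correction = np.log(len(x_ref) / len(x_test))   # correct for class sizes
    log_ratio = np.log(p / (1 - p)) + prior_correction    # log p_test(x) / p_ref(x)
    return float(np.mean(log_ratio))

rng = np.random.default_rng(0)
x_ref = rng.normal(0, 1, size=(2000, 5))                  # normal operation data
x_fault = rng.normal(0.4, 1.2, size=(500, 5))             # shifted, noisier regime
kl_score = kl_via_density_ratio(x_ref, x_fault)           # large value flags a fault
```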

15.
In random effects meta-analysis, an overall effect is estimated using a weighted mean, with weights based on estimated marginal variances. The variance of the overall effect is often estimated using the inverse of the sum of the estimated weights, and inference about the overall effect is typically conducted using this 'usual' variance estimator, which is not robust to errors in the estimated marginal variances. In this paper, robust estimation of the asymptotic variance of a weighted overall effect estimate is explored by considering a robust variance estimator in comparison with the usual variance estimator and another, less frequently used, estimator: a weighted version of the sample variance. Three illustrative examples are presented to demonstrate and compare the three estimation methods. Furthermore, a simulation study is conducted to assess the robustness of the three variance estimators using estimated weights. The simulation results show that the robust variance estimator and the weighted sample variance estimator both estimate the variance of an overall effect more accurately than the usual variance estimator when the weights are imprecise due to the use of estimated marginal variances, as is typically the case in practice. Therefore, we argue that inference about an overall effect should be based on the robust variance estimator or the weighted sample variance, which provide protection against the consequences of using estimated weights in meta-analytical inference.
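The three variance estimators compared above, in minimal form, for a weighted overall effect mu_hat = sum(w*y)/sum(w) with inverse-variance weights; the toy study data and the exact form taken for the weighted sample variance are illustrative.

```python
# Usual, robust (sandwich-type) and weighted-sample-variance estimators of the
# variance of a weighted overall effect in a meta-analysis.
import numpy as np

y = np.array([0.30, 0.12, 0.45, 0.26, 0.05])    # study effect estimates
v = np.array([0.04, 0.09, 0.06, 0.05, 0.12])    # estimated marginal variances
w = 1.0 / v                                      # inverse-variance weights
mu_hat = np.sum(w * y) / np.sum(w)               # overall effect estimate

k = len(y)
var_usual = 1.0 / np.sum(w)                                       # 'usual' estimator
var_robust = np.sum(w**2 * (y - mu_hat) ** 2) / np.sum(w) ** 2    # robust estimator
# One common form of a weighted sample variance for the overall effect.
var_weighted_sample = np.sum(w * (y - mu_hat) ** 2) / ((k - 1) * np.sum(w))
```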

16.
A new multivariate density estimator suitable for pattern classifier design is proposed. The data are first transformed so that the pattern vector components with the most non-Gaussian structure are separated from the Gaussian components. Nonparametric density estimation is then used to capture the non-Gaussian structure of the data while parametric Gaussian conditional density estimation is applied to the rest of the components. Both simulated and real data sets are used to demonstrate the potential usefulness of the proposed approach.

17.
A crucial part of an adaptive control system is the estimation of the unknown parameters of the process. The estimation is often done using a Kalman filter or an extended Kalman filter. These estimators give good results if the parameters are not varying too fast. When the parameters vary quickly, the estimator has difficulty following the variations. This paper outlines a new approach to the estimation problem. The new estimator consists of two parts: one conventional Kalman filter for fine estimation, and one estimator for coarse estimation. The coarse estimator consists of a finite number of fixed a priori models and a decision mechanism which points out the model that best fits the data. The paper describes the two-level estimator and discusses its properties. Some numerical examples illustrate the behavior of the estimator.
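A hedged sketch of the two-level idea for a scalar parameter: a conventional Kalman filter tracks the parameter for fine estimation, while the coarse estimator picks, from a small bank of fixed a priori values, the one with the smallest recent prediction error and re-initialises the filter when that model fits clearly better. The models, thresholds and reset rule are illustrative, not the paper's design.

```python
# Two-level parameter estimation: scalar Kalman filter plus a fixed model bank.
import numpy as np

rng = np.random.default_rng(0)
T = 400
theta_true = np.where(np.arange(T) < 200, 0.5, 3.0)      # abrupt parameter jump
u = rng.normal(1.0, 0.3, T)                              # known regressor
y = theta_true * u + rng.normal(0, 0.1, T)               # measurements y_t = theta_t*u_t + v_t

bank = np.array([0.0, 0.5, 1.0, 2.0, 3.0, 4.0])          # fixed a priori models
Q, R, window = 1e-5, 0.1**2, 20                          # process/measurement noise, window

theta, P = 0.0, 1.0
estimates = np.zeros(T)
for t in range(T):
    # Fine level: one step of a scalar Kalman filter (random-walk state).
    P += Q
    K = P * u[t] / (u[t] ** 2 * P + R)
    theta += K * (y[t] - u[t] * theta)
    P *= (1.0 - K * u[t])
    # Coarse level: best fixed model over a sliding window, with a reset rule.
    if t >= window:
        sl = slice(t - window, t)
        errs = [np.mean((y[sl] - c * u[sl]) ** 2) for c in bank]
        best = bank[int(np.argmin(errs))]
        kf_err = np.mean((y[sl] - theta * u[sl]) ** 2)
        if min(errs) < 0.25 * kf_err:                    # coarse model clearly better
            theta, P = best, 1.0                         # re-initialise the filter
    estimates[t] = theta
```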

18.
Modelling of changes in atmospheric radiation within the last decade is considered. First, vertical atmospheric radiation profiles are treated as a sample of functional variables and the dependence on time is estimated by non-parametric regression (kernel smoothing). As a main result, parametric functional multiplicative regression models are provided. In particular, the non-periodic models are motivated in a straightforward way by the observed data, while the periodic proposals respect a hypothetical relation between atmospheric radiation and the 11-year solar cycle. Finally, some remarks on computational aspects and the choice of a suitable function space are given.
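A compact illustration of the first, non-parametric step: treating each vertical profile as a functional observation and estimating the profile at a given time by a kernel (Nadaraya-Watson) smooth over the observation times. The synthetic profiles and the bandwidth are assumptions.

```python
# Kernel smoothing of functional observations (profiles) over time.
import numpy as np

def kernel_smooth_profiles(t_eval, times, profiles, h):
    """Weighted average of whole profiles, with Gaussian weights in time."""
    w = np.exp(-0.5 * ((t_eval - times) / h) ** 2)
    return (w[:, None] * profiles).sum(axis=0) / w.sum()

rng = np.random.default_rng(0)
altitudes = np.linspace(0, 50, 60)                       # altitude grid of each profile
times = np.linspace(2000, 2010, 120)                     # observation times (years)
profiles = (np.exp(-altitudes / 20)[None, :]
            * (1 + 0.02 * (times - 2000))[:, None]
            + rng.normal(0, 0.01, (120, 60)))            # toy radiation profiles

profile_2005 = kernel_smooth_profiles(2005.0, times, profiles, h=0.8)
```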

19.
Stoica, P., and Ganesan, G., Linear Regression Constrained to a Ball, Digital Signal Processing 11 (2001), 80–90. A worst-case lower bound (WCLB) result obtained by Nemirovskii suggests that a potentially significant enhancement in estimation accuracy may be achieved provided the true parameter vector is known to belong to a ball. In this paper we discuss the many facets and implications of Nemirovskii's result using linear regression as a vehicle for illustration. In particular, we briefly address such issues as biased versus unbiased estimation, minimax optimal estimation, tightness of the WCLB, and comparison of the WCLB with the performance of the least squares estimator constrained to the ball and that of the linear minimax estimator.
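A sketch of the least squares estimator constrained to a Euclidean ball, ||beta|| <= r: if the unconstrained solution already lies inside the ball it is kept; otherwise a ridge-type Lagrange multiplier is found by bisection so that the solution lands on the boundary. The data and the radius are illustrative.

```python
# Least squares constrained to the ball ||beta|| <= radius.
import numpy as np

def ls_in_ball(X, y, radius, tol=1e-10):
    beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    if np.linalg.norm(beta_ols) <= radius:
        return beta_ols                                   # constraint inactive
    XtX, Xty = X.T @ X, X.T @ y
    beta_lam = lambda lam: np.linalg.solve(XtX + lam * np.eye(X.shape[1]), Xty)
    lo, hi = 0.0, 1.0
    while np.linalg.norm(beta_lam(hi)) > radius:          # bracket the multiplier
        hi *= 2.0
    while hi - lo > tol:                                  # bisection on lambda
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if np.linalg.norm(beta_lam(mid)) > radius else (lo, mid)
    return beta_lam(hi)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
beta_true = np.array([1.5, -0.5, 0.8, 0.0])
y = X @ beta_true + rng.normal(0, 0.5, 50)
beta_hat = ls_in_ball(X, y, radius=1.0)                   # satisfies ||beta_hat|| <= 1
```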

20.
The unknown error density of a nonparametric regression model is approximated by a mixture of Gaussian densities with means being the individual error realizations and variance a constant parameter. Such a mixture density has the form of a kernel density estimator of error realizations. An approximate likelihood and posterior for bandwidth parameters in the kernel-form error density and the Nadaraya–Watson regression estimator are derived, and a sampling algorithm is developed. A simulation study shows that when the true error density is non-Gaussian, the kernel-form error density is often favored against its parametric counterparts including the correct error density assumption. The proposed approach is demonstrated through a nonparametric regression model of the Australian All Ordinaries daily return on the overnight FTSE and S&P 500 returns. With the estimated bandwidths, the one-day-ahead posterior predictive density of the All Ordinaries return is derived, and a distribution-free value-at-risk is obtained. The proposed algorithm is also applied to a nonparametric regression model involved in state-price density estimation based on S&P 500 options data.
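A small sketch of the two ingredients described above: a Nadaraya-Watson regression estimator and the kernel-form error density, i.e. a mixture of Gaussians centred at the fitted residuals with a common bandwidth. Fixed bandwidth values stand in for the posterior sampling of bandwidths developed in the paper.

```python
# Nadaraya-Watson regression and a kernel-form (mixture-of-Gaussians) error density.
import numpy as np

def nadaraya_watson(x_eval, x, y, h):
    """Gaussian-kernel weighted local average of y at the points x_eval."""
    w = np.exp(-0.5 * ((x_eval[:, None] - x[None, :]) / h) ** 2)
    return (w * y).sum(axis=1) / w.sum(axis=1)

def kernel_form_error_density(e_grid, residuals, b):
    """Mixture of Gaussians centred at the residuals, common bandwidth b."""
    z = (e_grid[:, None] - residuals[None, :]) / b
    return (np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)).mean(axis=1) / b

rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, 300)
y = np.sin(2 * x) + rng.standard_t(df=4, size=300) * 0.3   # non-Gaussian errors
h, b = 0.25, 0.15                                          # illustrative bandwidths

fitted = nadaraya_watson(x, x, y, h)
residuals = y - fitted
e_grid = np.linspace(-2, 2, 200)
error_density = kernel_form_error_density(e_grid, residuals, b)
```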


