Similar literature
20 similar documents found (search time: 31 ms)
1.
Statistical learning is emerging as a promising field where a number of algorithms from machine learning are interpreted as statistical methods and vice versa. Due to its good practical performance, boosting is one of the most studied machine learning techniques. We propose algorithms for multivariate density estimation and classification. They are generated by using the traditional kernel techniques as weak learners in boosting algorithms. Our algorithms take the form of multistep estimators, whose first step is a standard kernel method. Some strategies for bandwidth selection are also discussed with regard both to the standard kernel density classification problem and to our 'boosted' kernel methods. Extensive experiments, using real and simulated data, show an encouraging practical relevance of the findings. Standard kernel methods are often outperformed by the first boosting iterations, and this holds for several bandwidth values. In addition, the practical effectiveness of our classification algorithm is confirmed by a comparative study on two real datasets, the competitors being tree-based methods, including AdaBoost with trees.

2.
As conventional cross-validation bandwidth selection methods do not work properly when the data are serially dependent time series, alternative bandwidth selection methods are necessary. In recent years, Bayesian-based methods for global bandwidth selection have been studied. Our experience shows, however, that a global bandwidth is less suitable than a localized bandwidth in kernel density estimation based on serially dependent time series data. Nonetheless, a difficult issue is how to consistently estimate a localized bandwidth. This paper presents a nonparametric localized bandwidth estimator, for which we establish a completely new asymptotic theory. Applications of this new bandwidth estimator to the kernel density estimation of the Eurodollar deposit rate and the S&P 500 daily return demonstrate the effectiveness and competitiveness of the proposed localized bandwidth.

3.
In the context of estimating local modes of a conditional density based on kernel density estimators, we show that existing bandwidth selection methods developed for kernel density estimation are unsuitable for mode estimation. We propose two methods to select bandwidths tailored for mode estimation in the regression setting. Numerical studies using synthetic data and a real-life dataset are carried out to demonstrate the performance of the proposed methods in comparison with several well-received bandwidth selection methods for density estimation.

4.
A crucial problem in kernel density estimates of a probability density function is the selection of the bandwidth. The aim of this study is to propose a procedure for selecting both fixed and variable bandwidths. The present study also addresses the question of how different variable bandwidth kernel estimators perform in comparison with each other and with the fixed type of bandwidth estimators. The appropriate algorithms for implementation of the proposed method are given along with a numerical simulation. The numerical results serve as a guide to determine which bandwidth selection method is most appropriate for a given type of estimator over a wide class of probability density functions. Also, we obtain a numerical comparison of the different types of kernel estimators under various types of bandwidths.

5.
A bandwidth selection method that combines the concept of least-squares cross-validation and the plug-in approach is introduced in connection with kernel density estimation. A simulation study reveals that this hybrid methodology outperforms some commonly used bandwidth selection rules. It is shown that the proposed approach can also be readily employed in the context of variable kernel density estimation. We conclude with two illustrative examples.
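For readers unfamiliar with the least-squares cross-validation (LSCV) ingredient mentioned in this abstract, the following is a minimal sketch of plain LSCV for a univariate Gaussian-kernel estimator. It is not the hybrid selector of the paper, whose details are not given here. The function names, the `sample` values, and the candidate `grid` are all hypothetical and chosen only for illustration.

```python
import math


def normal_pdf(u, s):
    """Density of N(0, s^2) evaluated at u."""
    return math.exp(-0.5 * (u / s) ** 2) / (s * math.sqrt(2.0 * math.pi))


def lscv_score(data, h):
    """Least-squares cross-validation criterion for a Gaussian-kernel KDE.

    LSCV(h) = integral of fhat^2  -  (2/n) * sum_i fhat_{-i}(x_i),
    where fhat_{-i} is the estimate computed without observation i.
    """
    n = len(data)
    integral_term = 0.0  # closed form: two Gaussian kernels convolved
    leave_one_out = 0.0  # sum over i != j of K_h(x_i - x_j)
    for i in range(n):
        for j in range(n):
            d = data[i] - data[j]
            integral_term += normal_pdf(d, math.sqrt(2.0) * h)
            if i != j:
                leave_one_out += normal_pdf(d, h)
    return integral_term / n**2 - 2.0 * leave_one_out / (n * (n - 1))


def lscv_bandwidth(data, candidates):
    """Return the candidate bandwidth with the smallest LSCV score."""
    return min(candidates, key=lambda h: lscv_score(data, h))


# Hypothetical illustration data and candidate grid (assumptions).
sample = [-1.3, -0.7, -0.4, -0.1, 0.0, 0.2, 0.5, 0.9, 1.4, 2.1]
grid = [0.1 * k for k in range(1, 21)]
h_lscv = lscv_bandwidth(sample, grid)
```

A grid search is used here only for transparency; a one-dimensional numerical optimizer would normally replace it, and the O(n²) double loop would need binning or vectorization for large samples.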

6.
Kernel Density Estimation on a Linear Network (total citations: 1; self-citations: 0; citations by others: 1)
This paper develops a statistically principled approach to kernel density estimation on a network of lines, such as a road network. Existing heuristic techniques are reviewed, and their weaknesses are identified. The correct analogue of the Gaussian kernel is the ‘heat kernel’, the occupation density of Brownian motion on the network. The corresponding kernel estimator satisfies the classical time‐dependent heat equation on the network. This ‘diffusion estimator’ has good statistical properties that follow from the heat equation. It is mathematically similar to an existing heuristic technique, in that both can be expressed as sums over paths in the network. However, the diffusion estimate is an infinite sum, which cannot be evaluated using existing algorithms. Instead, the diffusion estimate can be computed rapidly by numerically solving the time‐dependent heat equation on the network. This also enables bandwidth selection using cross‐validation. The diffusion estimate with automatically selected bandwidth is demonstrated on road accident data.

7.
Integrated squared density derivatives are important to the plug-in type of bandwidth selector for kernel density estimation. Conventional estimators of these quantities are inefficient when there is a non-smooth boundary in the support of the density. We introduce estimators that utilize density derivative estimators obtained from local polynomial fitting. They retain the rates of convergence in mean-squared error that are familiar from non-boundary cases, and the constant coefficients have similar forms. The estimators and the formula for their asymptotically optimal bandwidths, which depend on integrated products of density derivatives, are applied to automatic bandwidth selection for local linear density estimation. Simulation studies show that the constructed bandwidth rule and the Sheather–Jones bandwidth are competitive in non-boundary cases, but the former overcomes boundary problems whereas the latter does not.

8.
In kernel density estimation, a criticism of bandwidth selection techniques which minimize squared error expressions is that they perform poorly when estimating tails of probability density functions. Techniques minimizing absolute error expressions are thought to result in more uniform performance and be potentially superior. An asymptotic mean absolute error expression for nonparametric kernel density estimators from right-censored data is developed here. This expression is used to obtain local and global bandwidths that are optimal in the sense that they minimize asymptotic mean absolute error and integrated asymptotic mean absolute error, respectively. These estimators are illustrated for eight data sets from known distributions. Computer simulation results are discussed, comparing the estimation methods with squared-error-based bandwidth selection for right-censored data.

9.
The problem of selecting the bandwidth for optimal kernel density estimation at a point is considered. A class of local bandwidth selectors which minimize smoothed bootstrap estimates of mean-squared error in density estimation is introduced. It is proved that the bandwidth selectors in the class achieve optimal relative rates of convergence, dependent upon the local smoothness of the target density. Practical implementation of the bandwidth selection methodology is discussed. The use of Gaussian-based kernels to facilitate computation of the smoothed bootstrap estimate of mean-squared error is proposed. The performance of the bandwidth selectors is investigated empirically.

10.
This paper demonstrates that cross-validation (CV) and Bayesian adaptive bandwidth selection can be applied in the estimation of associated kernel discrete functions. This idea was originally proposed by Brewer [A Bayesian model for local smoothing in kernel density estimation, Stat. Comput. 10 (2000), pp. 299–309] to derive variable bandwidths in adaptive kernel density estimation. Our approach considers the adaptive binomial kernel estimator and treats the variable bandwidths as parameters with a beta prior distribution. The best variable bandwidth selector is estimated by the posterior mean in the Bayesian sense under squared error loss. Monte Carlo simulations are conducted to examine the performance of the proposed Bayesian adaptive approach in comparison with the performance of the asymptotic mean integrated squared error estimator and the CV technique for selecting a global (fixed) bandwidth proposed in Kokonendji and Senga Kiessé [Discrete associated kernels method and extensions, Stat. Methodol. 8 (2011), pp. 497–516]. The Bayesian adaptive bandwidth estimator performs better than the global bandwidth, in particular for small and moderate sample sizes.

11.
We propose a modification to the regular kernel density estimation method that uses asymmetric kernels to circumvent the spill-over problem for densities with positive support. First, a pivoting method is introduced for placement of the data relative to the kernel function. This yields a strongly consistent density estimator that integrates to one for each fixed bandwidth, in contrast to most density estimators based on asymmetric kernels proposed in the literature. Then a data-driven Bayesian local bandwidth selection method is presented, and lognormal, gamma, Weibull and inverse Gaussian kernels are discussed as useful special cases. Simulation results and a real-data example illustrate the advantages of the new methodology.
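To make the spill-over issue concrete, the sketch below shows a standard gamma-kernel density estimator for positive data, in which the kernel at evaluation point x is a Gamma(x/b + 1, b) density evaluated at each observation. This is a well-known asymmetric-kernel estimator from the earlier literature, not the pivoting modification proposed in the abstract, and like most such estimators it does not integrate exactly to one. The function names and data values are hypothetical.

```python
import math


def gamma_pdf(t, shape, scale):
    """Gamma(shape, scale) density at t; returns 0 for t <= 0
    (observations are assumed strictly positive)."""
    if t <= 0.0:
        return 0.0
    log_pdf = (
        (shape - 1.0) * math.log(t)
        - t / scale
        - math.lgamma(shape)
        - shape * math.log(scale)
    )
    return math.exp(log_pdf)


def gamma_kernel_density(x, data, b):
    """Gamma-kernel estimate at x >= 0 with smoothing parameter b.

    The kernel shape depends on x, so every kernel lives on (0, inf)
    and no probability mass leaks onto the negative half-line.
    """
    shape = x / b + 1.0
    return sum(gamma_pdf(xi, shape, b) for xi in data) / len(data)


# Hypothetical positive-valued observations (assumption).
positive_sample = [0.2, 0.5, 0.6, 1.1, 1.8, 2.4, 3.0]
estimate_at_boundary = gamma_kernel_density(0.0, positive_sample, 0.3)
estimate_interior = gamma_kernel_density(1.0, positive_sample, 0.3)
```

A symmetric Gaussian kernel evaluated at the same data would place part of each bump below zero, which is the spill-over the abstract refers to.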

12.
Length-biased data are a particular case of weighted data, which arise in many situations: biomedicine, quality control or epidemiology, among others. In this paper we study the theoretical properties of kernel density estimation in the context of length-biased data, proposing two consistent bootstrap methods that we use for bandwidth selection. Apart from the bootstrap bandwidth selectors, we suggest a rule of thumb. These bandwidth selection proposals are compared with a least-squares cross-validation method. A simulation study is carried out to understand the behaviour of the procedures in finite samples.

13.
Abstract

In this work, we propose a beta prime kernel estimator for estimating probability density functions defined on nonnegative support. For the proposed estimator, the beta prime probability density function is used as the kernel. It is free of boundary bias and nonnegative, with a naturally varying shape. We obtain the optimal rate of convergence for the mean squared error (MSE) and the mean integrated squared error (MISE). Also, we use an adaptive Bayesian bandwidth selection method with the Lindley approximation for heavy-tailed distributions and compare its performance with the global least-squares cross-validation bandwidth selection method. Monte Carlo simulation studies are performed to evaluate the average integrated squared error (ISE) of the proposed kernel estimator against some asymmetric competitors. Moreover, real data sets are presented to illustrate the findings.

14.
Kernel smoothing of spatial point data can often be improved using an adaptive, spatially varying bandwidth instead of a fixed bandwidth. However, computation with a varying bandwidth is much more demanding, especially when edge correction and bandwidth selection are involved. This paper proposes several new computational methods for adaptive kernel estimation from spatial point pattern data. A key idea is that a variable-bandwidth kernel estimator for d-dimensional spatial data can be represented as a slice of a fixed-bandwidth kernel estimator in \((d+1)\)-dimensional scale space, enabling fast computation using Fourier transforms. Edge correction factors have a similar representation. Different values of global bandwidth correspond to different slices of the scale space, so that bandwidth selection is greatly accelerated. Potential applications include estimation of multivariate probability density and spatial or spatiotemporal point process intensity, relative risk, and regression functions. The new methods perform well in simulations and in two real applications concerning the spatial epidemiology of primary biliary cirrhosis and the alarm calls of capuchin monkeys.

15.
Bandwidth selection is an important problem in kernel density estimation. Traditional simple and quick bandwidth selectors usually oversmooth the density estimate. Existing sophisticated selectors usually have computational difficulties and occasionally do not exist. Besides, they may not be robust against outliers in the sample data, and some are highly variable, tending to undersmooth the density. In this paper, a highly robust, simple and quick bandwidth selector is proposed, which adapts to different types of densities.

16.
Abstract. The problem of choosing the bandwidth h for kernel density estimation is considered. All the plug-in-type bandwidth selection methods require the use of a pilot bandwidth g. The usual way to make an h-dependent choice of g is by obtaining their asymptotic expressions separately and solving the two equations. In contrast, we obtain the asymptotically optimal value of g for every fixed h, thus making our selection 'less asymptotic'. Exact error expressions show that some usually assumed hypotheses have to be discarded in the asymptotic study in this case. Two versions of a new bandwidth selector based on this idea are proposed, and their properties are analysed through theoretical results and a simulation study.

17.
A great deal of research has focused on improving the bias properties of kernel estimators. One proposal involves removing the restriction of non-negativity on the kernel to construct “higher-order” kernels that eliminate additional terms in the Taylor's series expansion of the bias. This paper considers an alternative that uses a local approach to bandwidth selection to not only reduce the bias, but to eliminate it entirely. These so-called “zero-bias bandwidths” are shown to exist for univariate and multivariate kernel density estimation as well as kernel regression. Implications of the existence of such bandwidths are discussed. An estimation strategy is presented, and the extent of the reduction or elimination of bias in practice is studied through simulation and example.

18.
Abstract. The performance of multivariate kernel density estimates depends crucially on the choice of bandwidth matrix, but progress towards developing good bandwidth matrix selectors has been relatively slow. In particular, previous studies of cross-validation (CV) methods have been restricted to biased and unbiased CV selection of diagonal bandwidth matrices. However, for certain types of target density the use of full (i.e. unconstrained) bandwidth matrices offers the potential for significantly improved density estimation. In this paper, we generalize earlier work from diagonal to full bandwidth matrices, and develop a smooth cross-validation (SCV) methodology for multivariate data. We consider optimization of the SCV technique with respect to a pilot bandwidth matrix. All the CV methods are studied using asymptotic analysis, simulation experiments and real data analysis. The results suggest that SCV for full bandwidth matrices is the most reliable of the CV methods. We also observe that experience from the univariate setting can sometimes be a misleading guide for understanding bandwidth selection in the multivariate case.

19.
This paper studies bandwidth selection for kernel estimation of derivatives of multidimensional conditional densities, a non-parametric realm unexplored in the literature. This paper extends Baird [Cross validation bandwidth selection for derivatives of multidimensional densities. RAND Working Paper series, WR-1060; 2014] in its examination of conditional multivariate densities, derives and presents criteria for arbitrary kernel order and density dimension, shows consistency of the estimators, and investigates a minimization criterion which jointly estimates numerator and denominator bandwidths. I conduct a Monte Carlo simulation study for various orders of kernels in the Gaussian family and compare the new cross validation criterion with those implied by Baird [Cross validation bandwidth selection for derivatives of multidimensional densities. RAND Working Paper series, WR-1060; 2014]. The paper finds that higher order kernels become increasingly important as the dimension of the distribution increases. I find that the cross validation criterion developed in this paper that jointly estimates the derivative of the joint density (numerator) and the marginal density (denominator) does orders of magnitude better than criteria that estimate the bandwidths separately. I further find that using the infinite order Dirichlet kernel tends to have the best results.

20.
A data-driven bandwidth choice for a kernel density estimator called critical bandwidth is investigated. This procedure allows the estimate to have as many modes as are assumed for the density being estimated. Both Gaussian and uniform kernels are considered. For the Gaussian kernel, asymptotic results are given. For the uniform kernel, an argument against these properties is mentioned. These theoretical results are illustrated with a simulation study that compares the kernel estimators that rely on critical bandwidth with another one that uses a plug-in method to select its bandwidth. An estimator that consists of estimates of density contour clusters and takes assumptions on the number of modes into account is also considered. Finally, the methodology is illustrated using environment monitoring data.
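As a rough illustration of the critical bandwidth idea, the sketch below searches for the smallest bandwidth at which a Gaussian-kernel estimate has at most k modes. It assumes the usual definition (for the Gaussian kernel the number of modes does not increase as the bandwidth grows, which justifies bisection) and counts modes on a finite grid, so the result is only approximate. The function names, grid size, tolerance, and data are hypothetical choices, not taken from the paper.

```python
import math


def kde(x, data, h):
    """Gaussian-kernel density estimate at x."""
    c = 1.0 / (len(data) * h * math.sqrt(2.0 * math.pi))
    return c * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data)


def count_modes(data, h, grid_size=400):
    """Count local maxima of the estimate on an evenly spaced grid."""
    lo, hi = min(data) - 3.0 * h, max(data) + 3.0 * h
    step = (hi - lo) / (grid_size - 1)
    values = [kde(lo + k * step, data, h) for k in range(grid_size)]
    return sum(
        1
        for k in range(1, grid_size - 1)
        if values[k - 1] < values[k] >= values[k + 1]
    )


def critical_bandwidth(data, k, tol=1e-3):
    """Approximate the smallest h whose estimate has at most k modes."""
    hi = max(data) - min(data)  # assumes the data are not all equal
    while count_modes(data, hi) > k:  # grow until at most k modes
        hi *= 2.0
    lo = hi
    while count_modes(data, lo) <= k and lo > 1e-6:  # shrink until > k modes
        lo /= 2.0
    while hi - lo > tol * hi:  # bisection; hi always has at most k modes
        mid = 0.5 * (lo + hi)
        if count_modes(data, mid) <= k:
            hi = mid
        else:
            lo = mid
    return hi


# Hypothetical two-cluster data (assumption).
two_clusters = [-2.2, -2.0, -1.8, 1.8, 2.0, 2.2]
h_one_mode = critical_bandwidth(two_clusters, 1)
h_two_modes = critical_bandwidth(two_clusters, 2)
```

For clearly bimodal data such as this, forcing a single mode requires a much larger bandwidth than allowing two, which is the quantity on which mode-based procedures of this kind rely.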
