Similar literature
Found 20 similar documents; search time: 31 ms
1.
The k-Nearest Neighbor (k-NN) technique has become extremely popular for a variety of forest inventory mapping and estimation applications. Much of this popularity may be attributed to the non-parametric, multivariate features of the technique, its intuitiveness, and its ease of use. When used with satellite imagery and forest inventory plot data, the technique has been shown to produce useful estimates of many forest attributes including forest/non-forest, volume, and basal area. However, variance estimators for quantifying the uncertainty of means or sums of k-NN pixel-level predictions for areas of interest (AOI) consisting of multiple pixels have not been reported. The primary objectives of the study were to derive variance estimators for AOI estimates obtained from k-NN predictions and to compare precision estimates resulting from different approaches to k-NN prediction and different interpretations of those predictions. The approaches were illustrated by estimating proportion forest area, tree volume per unit area, tree basal area per unit area, and tree density per unit area for 10-km AOIs. Estimates obtained using k-NN approaches and traditional inventory approaches were compared and found to be similar. Further, variance estimates based on different interpretations of k-NN predictions were similar. The results facilitate small area estimation and simultaneous and consistent mapping and estimation of multiple forest attributes.

2.
In this paper, the k-NN approach is used for the purpose of estimating the multiclass, 1-NN Bayes error bounds. We derive an estimator which is asymptotically unbiased, and whose variance can be controlled by the choice of k. The estimator appears to be very economical in its use of samples, and quite stable even in very small sample cases.

3.
Though the k-nearest neighbor (k-NN) pattern classifier is an effective learning algorithm, it can result in large model sizes. To compensate, a number of variant algorithms have been developed that condense the model size of the k-NN classifier at the expense of accuracy. To increase the accuracy of these condensed models, we present a direct boosting algorithm for the k-NN classifier that creates an ensemble of models with locally modified distance weighting. An empirical study conducted on 10 standard databases from the UCI repository shows that this new Boosted k-NN algorithm has increased generalization accuracy in the majority of the datasets and never performs worse than standard k-NN.

4.
Intrusion detection is a necessary step to identify unusual access or attacks to secure internal networks. In general, intrusion detection can be approached by machine learning techniques. In the literature, advanced techniques based on hybrid learning or ensemble methods have been considered, and related work has shown that they are superior to models using a single machine learning technique. This paper proposes a hybrid learning model based on triangle area based nearest neighbors (TANN) in order to detect attacks more effectively. In TANN, k-means clustering is first used to obtain cluster centers corresponding to the attack classes. Then, the area of the triangle formed by two cluster centers and one data point from the given dataset is calculated and used to form a new feature signature of the data. Finally, the k-NN classifier is used to classify similar attacks based on the new feature represented by triangle areas. Using KDD-Cup ’99 as the simulation dataset, the experimental results show that TANN can effectively detect intrusion attacks and provides higher accuracy and detection rates and a lower false alarm rate than three baseline models based on support vector machines, k-NN, and the hybrid centroid-based classification model combining k-means and k-NN.
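The last two TANN steps described above (triangle-area features followed by k-NN) can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation: the cluster centers are assumed to be already available from k-means, the area is computed with Heron's formula from Euclidean side lengths (an assumption, since the abstract does not say how the area is obtained), and all function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def triangle_area(a, b, c):
    """Area of the triangle with vertices a, b, c (Heron's formula; assumed here)."""
    ab, bc, ca = euclidean(a, b), euclidean(b, c), euclidean(c, a)
    s = (ab + bc + ca) / 2.0
    # max(..., 0.0) guards against tiny negative values caused by rounding
    return math.sqrt(max(s * (s - ab) * (s - bc) * (s - ca), 0.0))


def triangle_area_features(x, centers):
    """One area per pair of cluster centers, each pair combined with the point x."""
    return [triangle_area(x, ci, cj) for ci, cj in combinations(centers, 2)]


def knn_predict(train_feats, train_labels, query_feat, k=3):
    """Majority vote among the k training points nearest in feature space."""
    order = sorted(range(len(train_feats)),
                   key=lambda i: euclidean(train_feats[i], query_feat))
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]
```

With m cluster centers, each data point is mapped to m(m-1)/2 triangle areas, and the k-NN vote is then taken in that new feature space rather than in the original one.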

5.
In a small case study of mixed hardwood Hyrcanian forests of Iran, three non-parametric methods, namely k-nearest neighbour (k-NN), support vector machine regression (SVR) and tree regression based on random forest (RF), were used in plot-level estimation of volume/ha, basal area/ha and stems/ha using field inventory and Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) data. Relevant pre-processing and processing steps were applied to the ASTER data for geometric and atmospheric correction and for enhancing quantitative forest parameters. After collecting terrestrial information on trees in the 101 samples, the volume, basal area and tree number per hectare were calculated in each plot. In the k-NN implementation using different distance measures and k, the cross-validation method was used to find the best distance measure and optimal k. In SVR, the best regularized parameters of four kernel types were obtained using leave-one-out cross-validation. RF was implemented using a bootstrap learning method with regularized parameters for decision tree model and stopping. The validity of performances was examined using unused test samples by absolute and relative root mean square error (RMSE) and bias metrics. In volume/ha estimation, the results showed that all the three algorithms had similar performances. However, SVR and RF produced better results than k-NN with relative RMSE values of 28.54, 25.86 and 26.86 (m³ ha⁻¹), respectively, using k-NN, SVR and RF algorithms, but RF could generate unbiased estimation. In basal area/ha and stems/ha estimation, the implementation results of RF showed that RF was slightly superior in relative RMSE (18.39, 20.64) to SVR (19.35, 22.09) and k-NN (20.20, 21.53), but k-NN could generate unbiased estimation compared with the other two algorithms used.
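The validation metrics named in this abstract can be written down in a few lines. The sketch below assumes common definitions that the abstract itself does not spell out: bias is the mean of (predicted minus observed), and a "relative" metric is the absolute metric expressed as a percentage of the mean observed value. The function names are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root mean square error over paired observations and predictions."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n)


def bias(observed, predicted):
    """Mean signed error; the sign convention (predicted - observed) is an assumption."""
    return sum(p - o for o, p in zip(observed, predicted)) / len(observed)


def relative(metric_value, observed):
    """Express an absolute metric as a percentage of the mean observed value."""
    return 100.0 * metric_value / (sum(observed) / len(observed))
```

For example, `relative(rmse(obs, pred), obs)` gives the relative RMSE of predictions `pred` on held-out test plots `obs`.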

6.
This study was part of an interdisciplinary research project on soil carbon and phytomass dynamics of boreal and arctic permafrost landscapes. The 45 ha study area was a catchment located in the forest tundra in northern Siberia, approximately 100 km north of the Arctic Circle. The objective of this study was to estimate aboveground carbon (AGC) and assess and model its spatial variability. We combined multi-spectral high resolution remote sensing imagery and sample based field inventory data by means of the k-nearest neighbor (k-NN) technique and linear regression. Field data were collected by stratified systematic sampling in August 2006 with a total sample size of n = 31 circular nested sample plots of 154 m² for trees and shrubs and 1 m² for ground vegetation. Destructive biomass samples were taken on a sub-sample for fresh weight and moisture content. Species-specific allometric biomass models were constructed to predict dry biomass from diameter at breast height (dbh) for trees and from elliptic projection areas for shrubs. Quickbird data (standard imagery product), acquired shortly before the field campaign, and archived ASTER data (Level-1B product) of 2001 were geo-referenced, converted to calibrated radiances at sensor and used as carrier data. Spectral information of the pixels located in the inventory plots was extracted and analyzed as the reference set. Stepwise multiple linear regression was applied to identify suitable predictors from the set of variables of the original satellite bands, vegetation indices and texture metrics. To produce thematic carbon maps, carbon values were predicted for all pixels of the investigated satellite scenes. For this prediction, we compared the k-NN distance-weighted classifier and multiple linear regression with respect to their predictions. The estimated mean value of aboveground carbon from stratified sampling in the field is 15.3 t/ha (standard error SE = 1.50 t/ha, SE% = 9.8%). Zonal prediction from the k-NN method for the Quickbird image as carrier is 14.7 t/ha (root mean square error RMSE = 6.42 t/ha, RMSEr = 44%, resulting from leave-one-out cross-validation). The k-NN approach allows mapping and analysis of the spatial variability of AGC. The results show high spatial variability with AGC predictions ranging from 4.3 t/ha to 28.8 t/ha, reflecting the highly heterogeneous conditions in those permafrost-influenced landscapes. The means and totals of linear regression and k-NN predictions revealed only small differences but some regional distinctions were recognized in the maps.

7.
The k-nearest neighbors classifier is one of the most widely used methods of classification due to several interesting features, such as good generalization and easy implementation. Although simple, it is usually able to match, and even beat, more sophisticated and complex methods. However, no successful method has been reported so far to apply boosting to k-NN. As boosting methods have proved very effective in improving the generalization capabilities of many classification algorithms, proposing an appropriate application of boosting to k-nearest neighbors is of great interest. Ensemble methods rely on the instability of the classifiers to improve their performance. Because k-NN is fairly stable with respect to resampling, these methods fail in their attempt to improve the performance of the k-NN classifier. On the other hand, k-NN is very sensitive to input selection, so ensembles based on subspace methods are able to improve the performance of single k-NN classifiers. In this paper we make use of the sensitivity of k-NN to input space for developing two methods for boosting k-NN. The two approaches modify the view of the data that each classifier receives so that the accurate classification of difficult instances is favored. The two approaches are compared with the classifier alone and with bagging and random subspace methods, showing a marked and significant improvement of the generalization error. The comparison is performed using a large test set of 45 problems from the UCI Machine Learning Repository. A further study on noise tolerance shows that the proposed methods are less affected by class label noise than the standard methods.

8.
This paper addresses the problem of reinforcing the ability of the k-NN classification of handwritten characters via distortion-tolerant template matching techniques with a limited quantity of data. We compare three kinds of matching techniques: the conventional simple correlation, the tangent distance, and the global affine transformation (GAT) correlation. Although the k-NN classification method is straightforward and powerful, it consumes a lot of time. Therefore, to reduce the computational cost of matching in k-NN classification, we propose accelerating the GAT correlation method by reformulating its computational model and adopting efficient lookup tables. Recognition experiments performed on the IPTP CDROM1B handwritten numerical database show that the matching techniques of the simple correlation, the tangent distance, and the accelerated GAT correlation achieved recognition rates of 97.07%, 97.50%, and 98.70%, respectively. The computation time ratios of the tangent distance and the accelerated GAT correlation to the simple correlation are 26.3 and 36.5 to 1.0, respectively.

9.
The problem of estimating the error probability of a given classification system is considered. Statistical properties of the empirical error count (C) and the average conditional error (R) estimators are studied. It is shown that in the large sample case the R estimator is unbiased and its variance is less than that of the C estimator. In contrast to conventional methods of Bayes error estimation the unbiasedness of the R estimator for a given classifier can be obtained only at the price of an additional set of classified samples. On small test sets the R estimator may be subject to a pessimistic bias caused by the averaging phenomenon characterizing the functioning of conditional error estimators.

10.
For a linear multilevel model with 2 levels, with equal numbers of level-1 units per level-2 unit and a random intercept only, different empirical Bayes estimators of the random intercept are examined. Studied are the classical empirical Bayes estimator, the Morris version of the empirical Bayes estimator and Rao's estimator. It is unclear which of these estimators performs best in terms of Bayes risk. Of these three, the Rao estimator is optimal in case the covariance matrix of random coefficients may be negative definite. However, in the multilevel model this matrix is restricted to be positive semi-definite. The Morris version replaces the weights of the empirical Bayes estimator by unbiased estimates. This correction, however, is based on known level-1 variances, which in many empirical settings are unknown. A fourth estimator is proposed, a variant of Rao's estimator which restricts the estimated covariance matrix of random coefficients to be positive semi-definite. Since there are no closed-form expressions for estimators involved in the empirical Bayes estimators (except for the Rao estimator), Monte Carlo simulations are done to evaluate the performance of these different empirical Bayes estimators. Clear differences between these estimators appear only for small sample sizes. As a consequence, for larger sample sizes the formula for the Bayes risk of the Rao estimator can be used to calculate the Bayes risk for the other estimators proposed.

11.
Matrix models are often used to model the dynamics of age-structured or size-structured populations. The Usher model is an important particular case that relies on the following hypothesis: between time steps t and t+1, individuals either remain in the same class, move up to the following class, or die. There are then two ways of handling data that do not meet this condition: either remove them prior to data analysis or rectify them. These two ways correspond to two estimators of transition parameters. The former, which corresponds to the classical estimator, is obtained from the latter by a data trimming. The two estimators of transition parameters are compared on the basis of their robustness in order to obtain a criterion of choice between the two estimators. The influence curve of both estimators is first computed, then their gross sensitivity and their asymptotic variance. The untrimmed estimator is more robust than the classical one. Its asymptotic variance can be lower or greater than that of the classical estimator depending on the boundaries used for data trimming. The results are applied to a tropical rain forest in French Guiana, with a discussion on the role of the class width.
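The Usher hypothesis and the trimming approach described above can be illustrated with a short sketch. The abstract gives no estimator formulas, so this example assumes the simplest choice: after discarding observations that violate the hypothesis, the probability of staying in class i and of moving up from class i are estimated as plain proportions. The `DEAD` marker, the data layout and the function name are all hypothetical.

```python
from collections import Counter

DEAD = None  # hypothetical marker: the individual died between t and t+1


def estimate_usher_transitions(observations, n_classes):
    """Estimate (stay, move-up) probabilities per class from trimmed data.

    observations: list of (class_at_t, class_at_t_plus_1 or DEAD) pairs,
    with classes numbered 0 .. n_classes - 1.
    """
    stay, up, total = Counter(), Counter(), Counter()
    for i, j in observations:
        # Usher hypothesis: die, stay in class i, or move up to class i + 1
        admissible = j is DEAD or (j in (i, i + 1) and j < n_classes)
        if not admissible:
            continue  # data trimming: drop observations that violate the hypothesis
        total[i] += 1
        if j is DEAD:
            continue
        if j == i:
            stay[i] += 1
        else:
            up[i] += 1
    estimates = []
    for i in range(n_classes):
        n = total[i]
        if n == 0:
            estimates.append((float("nan"), float("nan")))  # no usable data
        else:
            estimates.append((stay[i] / n, up[i] / n))
    return estimates
```

The estimated mortality for class i is then one minus the sum of the two proportions. The alternative of rectifying inadmissible observations instead of dropping them is not shown, because the abstract does not say how the rectification is done.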

12.
k-nearest neighbor (k-NN) classification is a well-known decision rule that is widely used in pattern classification. However, the traditional implementation of this method is computationally expensive. In this paper we develop two effective techniques, namely, template condensing and preprocessing, to significantly speed up k-NN classification while maintaining the level of accuracy. Our template condensing technique aims at “sparsifying” dense homogeneous clusters of prototypes of any single class. This is implemented by iteratively eliminating patterns which exhibit high attractive capacities. Our preprocessing technique filters a large portion of prototypes which are unlikely to match against the unknown pattern. This again accelerates the classification procedure considerably, especially in cases where the dimensionality of the feature space is high. One of our case studies shows that the incorporation of these two techniques to k-NN rule achieves a seven-fold speed-up without sacrificing accuracy.

13.
Chao Sima 《Pattern recognition》2006,39(9):1763-1780
A cross-validation error estimator is obtained by repeatedly leaving out some data points, deriving classifiers on the remaining points, computing errors for these classifiers on the left-out points, and then averaging these errors. The 0.632 bootstrap estimator is obtained by averaging the errors of classifiers designed from points drawn with replacement and then taking a convex combination of this “zero bootstrap” error with the resubstitution error for the designed classifier. This gives a convex combination of the low-biased resubstitution and the high-biased zero bootstrap. Another convex error estimator suggested in the literature is the unweighted average of resubstitution and cross-validation. This paper treats the following question: given a feature-label distribution and classification rule, what is the optimal convex combination of two error estimators, i.e. what are the optimal weights for the convex combination? This problem is considered by finding the weights to minimize the MSE of a convex estimator. It also considers optimality under the constraint that the resulting estimator be unbiased. Owing to the large number of results coming from the various feature-label models and error estimators, a portion of the results are presented herein and the main body of results appears on a companion website. In the tabulated results, each table treats the classification rules considered for the model, various Bayes errors, and various sample sizes. Each table includes the optimal weights, mean errors and standard deviations for the relevant error measures, and the MSE and MAE for the optimal convex estimator. Many observations can be made by considering the full set of experiments. Some general trends are outlined in the paper. The general conclusion is that optimizing the weights of a convex estimator can provide substantial improvement, depending on the classification rule, data model, sample size and component estimators. Optimal convex bootstrap estimators are applied to feature-set ranking to illustrate their potential advantage over non-optimized convex estimators.
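The construction of a convex error estimator described above can be sketched as follows. This is an illustration under stated assumptions, not the paper's code: the classification rule is a toy 1-D nearest-mean rule, the zero bootstrap error pools the mistakes on left-out points over all replicates (one of several conventions), and the function names and the `n_boot` and `seed` parameters are hypothetical. The fixed weight 0.632 on the zero bootstrap term is the standard choice for the 0.632 bootstrap; the paper's optimized weights are not reproduced here.

```python
import random


def nearest_mean_rule(xs, ys):
    """Toy 1-D classification rule: predict the label with the closest class mean."""
    means = {}
    for label in set(ys):
        vals = [x for x, y in zip(xs, ys) if y == label]
        means[label] = sum(vals) / len(vals)
    return lambda x: min(means, key=lambda label: abs(x - means[label]))


def resubstitution_error(rule, xs, ys):
    """Error of the designed classifier on its own training data (low-biased)."""
    clf = rule(xs, ys)
    return sum(clf(x) != y for x, y in zip(xs, ys)) / len(xs)


def zero_bootstrap_error(rule, xs, ys, n_boot=200, seed=0):
    """Error on points left out of each bootstrap sample (high-biased)."""
    rng = random.Random(seed)
    n = len(xs)
    wrong = tested = 0
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # draw with replacement
        chosen = set(idx)
        held_out = [i for i in range(n) if i not in chosen]
        if not held_out:
            continue  # nothing left out in this replicate
        clf = rule([xs[i] for i in idx], [ys[i] for i in idx])
        wrong += sum(clf(xs[i]) != ys[i] for i in held_out)
        tested += len(held_out)
    return wrong / tested


def convex_estimate(weight, low_biased, high_biased):
    """Convex combination (1 - w) * low + w * high with 0 <= w <= 1."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return (1.0 - weight) * low_biased + weight * high_biased


def bootstrap_632(rule, xs, ys, n_boot=200, seed=0):
    """0.632 bootstrap: weight 0.632 on zero bootstrap, 0.368 on resubstitution."""
    return convex_estimate(0.632,
                           resubstitution_error(rule, xs, ys),
                           zero_bootstrap_error(rule, xs, ys, n_boot, seed))
```

Replacing the fixed 0.632 with a weight chosen to minimize the MSE for a given model, rule and sample size is the optimization the abstract refers to; `convex_estimate` accepts any such weight.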

14.
Estimation of Hurst exponent revisited   Cited by: 1 (self-citations: 0, citations by others: 1)
In order to estimate the Hurst exponent of long-range dependent time series, numerous estimators, based e.g. on the rescaled range statistic (R/S) or detrended fluctuation analysis (DFA), are traditionally employed. Motivated by the empirical behaviour of the bias of the R/S estimator, its bias-corrected version is proposed. It has smaller mean squared error than DFA and behaves comparably to the wavelet estimator for traces of size as large as 2^15 drawn from some commonly considered long-range dependent processes. It is also shown that several variants of R/S and DFA estimators are possible depending on the way they are defined and that they differ greatly in their performance.

15.
A practical problem related to the estimation of quantiles in double sampling with arbitrary sampling designs in each of the two phases is investigated. In practice, this scheme is commonly used for official surveys, in which quantile estimation is often required when the investigation deals with variables such as income or expenditure. A class of estimators for quantiles is proposed and some important properties, such as asymptotic unbiasedness and asymptotic variance, are established. The optimal estimator, in the sense of minimizing the asymptotic variance, is also presented. The proposed class contains several known types of estimators, such as ratio and regression estimators, which are of practical application and are therefore derived. Assuming several populations, the proposed estimators are compared with the direct estimator via an empirical study. Results show that a gain in efficiency can be obtained.

16.
A practical problem related to the estimation of quantiles in double sampling with arbitrary sampling designs in each of the two phases is investigated. In practice, this scheme is commonly used for official surveys, in which quantile estimation is often required when the investigation deals with variables such as income or expenditure. A class of estimators for quantiles is proposed and some important properties, such as asymptotic unbiasedness and asymptotic variance, are established. The optimal estimator, in the sense of minimizing the asymptotic variance, is also presented. The proposed class contains several known types of estimators, such as ratio and regression estimators, which are of practical application and are therefore derived. Assuming several populations, the proposed estimators are compared with the direct estimator via an empirical study. Results show that a gain in efficiency can be obtained.

17.
New model-based estimators of the uncertainty of pixel-level and areal k-nearest neighbour (knn) predictions of attribute Y from remotely-sensed ancillary data X are presented. Non-parametric functions predict Y from scalar ‘Single Index Model’ transformations of X. Variance functions generate estimates of the variance of Y. Three case studies, with data from the Forest Inventory and Analysis program of the U.S. Forest Service, the Finnish National Forest Inventory, and Landsat ETM+ ancillary data, demonstrate applications of the proposed estimators. Nearly unbiased knn predictions of three forest attributes were obtained. Estimates of mean square error indicate that knn is an attractive technique for integrating remotely-sensed and ground data for the provision of forest attribute maps and areal predictions.

18.
Risk difference (RD) has played an important role in many biological and epidemiological investigations that compare the risks of developing a certain disease or tumor under two drugs or treatments. When the disease is rare and acute, inverse sampling (rather than binomial sampling) is usually recommended to collect the binary outcomes. In this paper, we derive an asymptotic confidence interval estimator for RD based on the score statistic. To compare its performance with three existing confidence interval estimators, we employ Monte Carlo simulation to evaluate their coverage probabilities, expected confidence interval widths, and the mean difference of the coverage probabilities from the nominal confidence level. Our simulation results suggest that the score-test-based confidence interval estimator is generally more appealing than the Wald, uniformly minimum variance unbiased estimator and likelihood ratio confidence interval estimators, because it maintains the coverage probability close to the desired confidence level and yields the shortest expected width in most cases. We illustrate these confidence interval construction methods with real data sets from a drug comparison study and a congenital heart disease study.

19.
In this communication, sample measures of kurtosis adopted by various software packages are compared for data from normal and non-normal populations. Further, two improved estimators of population kurtosis are proposed and their performance is compared with the currently used measures. The suggested estimators have considerably lower mean squared error (MSE) for various sampling designs in our simulation study. Two empirical examples are given to illustrate the usefulness of the suggested estimators in practice.

20.
We consider the problem of using sampling to estimate the result of an aggregation operation over a subset-based SQL query, where a subquery is correlated to an outer query by a NOT EXISTS, NOT IN, EXISTS or IN clause. We design an unbiased estimator for our query and prove that it is indeed unbiased. We then provide a second, biased estimator that makes use of the superpopulation concept from statistics to minimize the mean squared error of the resulting estimate. The two estimators are tested over an extensive set of experiments. Material in this paper is based upon work supported by the National Science Foundation via grants 0347408 and 0612170.
