Similar Documents
20 similar documents retrieved.
1.
Given a large set of potential features, it is usually necessary to find a small subset with which to classify. The task of finding an optimal feature set is inherently combinatorial and therefore suboptimal algorithms are typically used to find feature sets. If feature selection is based directly on classification error, then a feature-selection algorithm must base its decision on error estimates. This paper addresses the impact of error estimation on feature selection using two performance measures: comparison of the true error of the optimal feature set with the true error of the feature set found by a feature-selection algorithm, and the number of features among the truly optimal feature set that appear in the feature set found by the algorithm. The study considers seven error estimators applied to three standard suboptimal feature-selection algorithms and exhaustive search, and it considers three different feature-label model distributions. It draws two conclusions for the cases considered: (1) depending on the sample size and the classification rule, feature-selection algorithms can produce feature sets whose corresponding classifiers possess errors far in excess of the classifier corresponding to the optimal feature set; and (2) for small samples, differences in performances among the feature-selection algorithms are less significant than performance differences among the error estimators used to implement the algorithms. Moreover, keeping in mind that results depend on the particular classifier-distribution pair, for the error estimators considered in this study, bootstrap and bolstered resubstitution usually outperform cross-validation, and bolstered resubstitution usually performs as well as or better than bootstrap.
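A minimal sketch of the kind of experiment described above, assuming synthetic Gaussian feature-label data and scikit-learn's LDA classifier (both my choices, not the paper's models): exhaustive search over feature subsets scored by a resubstitution versus a cross-validated error estimate.

```python
# Sketch: how the choice of error estimator steers feature selection (assumed setup).
import itertools
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n, d, k = 60, 8, 3                        # sample size, total features, subset size
mu = np.zeros(d); mu[:3] = 1.0            # only the first three features carry class information
y = rng.integers(0, 2, n)
X = rng.normal(size=(n, d)) + np.outer(2 * y - 1, mu / 2)

def subset_error(cols, estimator):
    clf = LinearDiscriminantAnalysis()
    Xs = X[:, list(cols)]
    if estimator == "resub":              # resubstitution: train and test on the same data
        return 1.0 - clf.fit(Xs, y).score(Xs, y)
    return 1.0 - cross_val_score(clf, Xs, y, cv=5).mean()   # 5-fold cross-validation error

for est in ("resub", "cv"):               # exhaustive search scored by each error estimator
    best = min(itertools.combinations(range(d), k),
               key=lambda cols: subset_error(cols, est))
    print(est, "selects features", best)
```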

2.
Estimation of Hurst exponent revisited
In order to estimate the Hurst exponent of long-range dependent time series, numerous estimators, based e.g. on the rescaled range statistic (R/S) or detrended fluctuation analysis (DFA), are traditionally employed. Motivated by the empirical behaviour of the bias of the R/S estimator, a bias-corrected version of it is proposed. It has a smaller mean squared error than DFA and behaves comparably to the wavelet estimator for traces of size as large as 2^15 drawn from some commonly considered long-range dependent processes. It is also shown that several variants of the R/S and DFA estimators are possible, depending on the way they are defined, and that they differ greatly in their performance.
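For illustration, a bare-bones R/S estimator of the Hurst exponent, without the bias correction proposed above; the window sizes and the white-noise test signal are arbitrary choices.

```python
# Sketch: classical rescaled-range (R/S) estimate of the Hurst exponent.
import numpy as np

def hurst_rs(x, window_sizes):
    """Estimate H as the slope of log E[R/S] versus log window size."""
    x = np.asarray(x, dtype=float)
    log_n, log_rs = [], []
    for n in window_sizes:
        rs_vals = []
        for start in range(0, len(x) - n + 1, n):       # non-overlapping windows
            w = x[start:start + n]
            z = np.cumsum(w - w.mean())                  # cumulative deviations from the window mean
            r = z.max() - z.min()                        # range
            s = w.std(ddof=1)                            # sample standard deviation
            if s > 0:
                rs_vals.append(r / s)
        log_n.append(np.log(n))
        log_rs.append(np.log(np.mean(rs_vals)))
    slope, _ = np.polyfit(log_n, log_rs, 1)
    return slope

x = np.random.default_rng(1).standard_normal(2**14)      # white noise: true H = 0.5
print(hurst_rs(x, window_sizes=[16, 32, 64, 128, 256, 512]))
```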

3.
4.
Relaxed Lasso     
The Lasso is an attractive regularisation method for high-dimensional regression. It combines variable selection with an efficient computational procedure. However, the rate of convergence of the Lasso is slow for some sparse high-dimensional data, where the number of predictor variables grows fast with the number of observations. Moreover, many noise variables are selected if the estimator is chosen by cross-validation. It is shown that the conflicting demands of an efficient computational procedure and fast convergence rates of the ℓ2-loss can be reconciled by a two-stage procedure, termed the relaxed Lasso. For orthogonal designs, the relaxed Lasso provides a continuum of solutions that includes both soft- and hard-thresholding of estimators. The relaxed Lasso solutions include all regular Lasso solutions, and computing all relaxed Lasso solutions is often no more expensive than computing all regular Lasso solutions. Theoretical and numerical results demonstrate that the relaxed Lasso produces sparser models with equal or lower prediction loss than the regular Lasso estimator for high-dimensional data.
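A hedged sketch of the two-stage idea using scikit-learn's Lasso: select the active set with penalty alpha, then refit on that set with the relaxed penalty phi*alpha (phi = 1 recovers the ordinary Lasso, phi = 0 an ordinary least-squares refit). The orthogonal-design and solution-path results above are not reproduced here.

```python
# Sketch: two-stage relaxed Lasso (assumed simplification of the estimator described above).
import numpy as np
from sklearn.linear_model import Lasso

def relaxed_lasso(X, y, alpha, phi):
    """Lasso selection with penalty alpha, then a relaxed refit on the active set."""
    active = np.flatnonzero(Lasso(alpha=alpha).fit(X, y).coef_)
    coef = np.zeros(X.shape[1])
    if active.size == 0:
        return coef
    if phi > 0:                                    # relaxed refit with penalty phi * alpha
        coef[active] = Lasso(alpha=phi * alpha).fit(X[:, active], y).coef_
    else:                                          # phi = 0: plain least squares on the active set
        coef[active], *_ = np.linalg.lstsq(X[:, active], y, rcond=None)
    return coef

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))
beta = np.zeros(50); beta[:3] = 2.0                # sparse truth: only three relevant predictors
y = X @ beta + rng.normal(size=100)
print("relaxed Lasso support:", np.flatnonzero(relaxed_lasso(X, y, alpha=0.2, phi=0.3)))
```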

5.
This article is about testing the equality of several normal means when the variances are unknown and arbitrary, i.e., the setup of one-way ANOVA. Even though several tests are available in the literature, none of them perform well in terms of Type I error probability under various sample size and parameter combinations. In fact, Type I errors can be highly inflated for some of the commonly used tests; a serious issue that appears to have been overlooked. We propose a parametric bootstrap (PB) approach and compare it with three existing location-scale invariant tests: the Welch test, the James test and the generalized F (GF) test. The Type I error rates and powers of the tests are evaluated using Monte Carlo simulation. Our studies show that the PB test is the best among the four tests with respect to Type I error rates. The PB test performs very satisfactorily even for small samples, while the Welch test and the GF test exhibit poor Type I error properties when the sample sizes are small and/or the number of means to be compared is moderate to large. The James test performs better than the Welch test and the GF test. It is also noted that the same tests can be used to test the significance of the random effect variance component in a one-way random model under unequal error variances. Such models are widely used to analyze data from inter-laboratory studies. The methods are illustrated using some examples.
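A generic parametric-bootstrap sketch for this testing problem; the Welch-type weighted sum of squares below is a plausible choice of statistic, not necessarily the exact one used in the article.

```python
# Sketch: parametric bootstrap test of equal means under unequal variances (assumed statistic).
import numpy as np

def pb_test(samples, n_boot=10000, seed=0):
    rng = np.random.default_rng(seed)
    n = np.array([len(s) for s in samples])
    xbar = np.array([np.mean(s) for s in samples])
    s2 = np.array([np.var(s, ddof=1) for s in samples])

    def stat(xbar, s2, n):
        w = n / s2                                     # precision weights
        grand = np.sum(w * xbar) / np.sum(w)
        return np.sum(w * (xbar - grand) ** 2)         # weighted between-group sum of squares

    t_obs = stat(xbar, s2, n)
    count = 0
    for _ in range(n_boot):
        # simulate group means and variances under H0, plugging in the observed variances
        xb = rng.normal(0.0, np.sqrt(s2 / n))
        s2b = s2 * rng.chisquare(n - 1) / (n - 1)
        count += stat(xb, s2b, n) >= t_obs
    return count / n_boot                              # bootstrap p-value

rng = np.random.default_rng(1)
groups = [rng.normal(0, sd, size) for sd, size in [(1, 8), (3, 12), (5, 10)]]
print("p-value:", pb_test(groups))
```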

6.
In most pattern recognition (PR) applications, it is advantageous if the accuracy (or error rate) of the classifier can be evaluated or bounded prior to testing it in a real-life setting. It is also well known that if the two class-conditional distributions have a large overlapping volume (almost all the available work on "overlapping of classes" deals with the case where there are only two classes), the classification accuracy is poor. This is because, if we intend to use the classification accuracy as a criterion for evaluating a PR system, the points within the overlapping volume tend to lead to maximal misclassification. Unfortunately, computing the indices that quantify the overlapping volume is expensive. In this vein, we propose a strategy of using a prototype reduction scheme (PRS) to compute the latter approximately, but quickly. In this paper, we demonstrate, first of all, that this is an extremely expedient proposition. Indeed, we show that by completely discarding the points not included by the PRS (we are not aware of any reported scheme which discards "irrelevant" sample (training) points while simultaneously attaining an almost comparable accuracy), we can obtain a reduced set of sample points from which, in turn, the measures of the overlapping volume can be computed. The resulting values are comparable to those obtained with the original training set (i.e., the one which considers all the data points), even though the computations required to obtain the prototypes and the corresponding measures are significantly smaller. The proposed method has been rigorously tested on artificial and real-life datasets, and the results obtained are, in our opinion, quite impressive, being sometimes faster by two orders of magnitude.
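As one concrete PRS, a sketch of Hart's condensed nearest-neighbour rule (my choice of reduction scheme; the paper evaluates its own PRSs), which keeps only the prototypes needed for 1-NN consistency; overlap measures would then be computed on the reduced set instead of the full training set.

```python
# Sketch: condensed nearest neighbour as an example prototype reduction scheme.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def condense(X, y, seed=0):
    rng = np.random.default_rng(seed)
    keep = [rng.integers(len(X))]                       # start from one random prototype
    changed = True
    while changed:
        changed = False
        for i in rng.permutation(len(X)):
            knn = KNeighborsClassifier(n_neighbors=1).fit(X[keep], y[keep])
            if knn.predict(X[i:i + 1])[0] != y[i]:      # misclassified points become prototypes
                keep.append(i)
                changed = True
    return np.array(keep)

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(1.5, 1, (200, 2))])  # overlapping classes
y = np.repeat([0, 1], 200)
idx = condense(X, y)
print(f"kept {len(idx)} of {len(X)} points")
```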

7.
For high-order interpolation at both end points of two rational Bézier curves, we introduce the concept of C(v,u)-continuity and give a matrix expression of a necessary and sufficient condition for satisfying it. We then propose, in a unified approach, three new algorithms: for the degree reduction of Bézier curves, for approximating rational Bézier curves by Bézier curves, and for the degree reduction of rational Bézier curves; all work in the L2 norm and satisfy C(v,u)-continuity. The algorithms for the first and second problems produce the best approximation results, while the third, resorting to the steepest descent method of numerical optimization, iteratively obtains a series of degree-reduced curves with decreasing approximation errors. Compared to some well-known algorithms for the degree reduction of rational Bézier curves, such as the uniformizing-weights algorithm, the algorithm cancelling the best linear common divisor and the shifted Chebyshev polynomials algorithm, the one presented here can give a smaller approximation error, reduce multiple degrees at a time and preserve high-order interpolation at both end points.
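A much-simplified degree-reduction sketch by discrete least squares in the Bernstein basis; it ignores the C(v,u) end-point constraints and the rational weights that the algorithms above handle.

```python
# Sketch: least-squares degree reduction of a polynomial Bezier curve (assumed simplification).
import numpy as np
from math import comb

def bernstein_matrix(n, t):
    """Rows: parameter values t; columns: Bernstein basis polynomials B_{i,n}(t)."""
    t = np.asarray(t)[:, None]
    i = np.arange(n + 1)[None, :]
    binom = np.array([comb(n, k) for k in range(n + 1)])[None, :]
    return binom * t**i * (1.0 - t)**(n - i)

def reduce_degree(ctrl, target_degree, samples=200):
    """Fit lower-degree control points to sampled points of the original curve."""
    t = np.linspace(0.0, 1.0, samples)
    curve = bernstein_matrix(len(ctrl) - 1, t) @ ctrl         # points on the original curve
    B = bernstein_matrix(target_degree, t)
    reduced, *_ = np.linalg.lstsq(B, curve, rcond=None)       # control points of the reduced curve
    return reduced

ctrl = np.array([[0, 0], [1, 3], [3, 3], [5, 1], [6, 4]], dtype=float)   # a degree-4 curve
print(reduce_degree(ctrl, target_degree=3))
```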

8.
Although there has been much research on cluster analysis considering feature (or variable) weights, little effort has been devoted to sample weights in clustering. In practice, not every sample in a data set has the same importance in cluster analysis. It is therefore interesting to obtain proper sample weights for clustering a data set. In this paper, we consider a probability distribution over a data set to represent its sample weights. We then apply the maximum entropy principle to compute these sample weights automatically for clustering. This method can generate sample-weighted versions of most clustering algorithms, such as k-means, fuzzy c-means (FCM) and expectation-maximization (EM). The proposed sample-weighted clustering algorithms are robust for data sets with noise and outliers. Furthermore, we analyze the convergence properties of the proposed algorithms. The study uses numerical and real data sets for demonstration and comparison, and the experimental results demonstrate that the proposed sample-weighted clustering algorithms are effective and robust clustering methods.
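A hedged sketch of sample-weighted k-means in which the weights follow an entropy-regularised update w_j ∝ exp(-d_j/gamma); this is one standard consequence of a maximum-entropy penalty on the weight distribution and may differ from the paper's exact formulation.

```python
# Sketch: k-means with maximum-entropy sample weights (assumed update rule, ad hoc gamma).
import numpy as np

def sample_weighted_kmeans(X, k, gamma=1.0, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    w = np.full(len(X), 1.0 / len(X))                       # start from uniform sample weights
    for _ in range(iters):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(1)
        dmin = d2[np.arange(len(X)), labels]
        w = np.exp(-dmin / gamma)                           # entropy-regularised weights
        w /= w.sum()
        for j in range(k):                                  # weighted centroid update
            mask = labels == j
            if mask.any():
                centers[j] = np.average(X[mask], axis=0, weights=w[mask])
    return centers, labels, w

X = np.vstack([np.random.default_rng(3).normal(m, 0.3, (100, 2)) for m in (0, 2, 4)])
centers, labels, w = sample_weighted_kmeans(X, k=3)
print(centers)
```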

9.
A Bayesian approach with iteratively reweighted least squares is used to incorporate historical control information into quantal bioassays to estimate the dose-response relationship, where the logit of the historical control responses is assumed to have a normal distribution. The parameters of this normal distribution are estimated by both empirical and full Bayesian approaches, with the marginal likelihood function approximated by Laplace's method. A comparison is made, using real data, between estimates that include the historical control information and those that do not. It was found that the inclusion of the historical control information improves the efficiency of the estimators. In addition, this logit-normal formulation is compared with the traditional beta-binomial in terms of the improvement in parameter estimates. Consequently, the estimated dose-response relationship is used to formulate the point estimator and confidence bands for ED(100p) for various values of the risk rate p, and the potency for any dose level.

10.
The Voronoi diagram of a point set has been extensively used in various disciplines ever since it was first proposed. Its application realms have been even further extended to estimate the shape of point clouds when Edelsbrunner and Mücke introduced the concept of α-shape based on the Delaunay triangulation of a point set. In this paper, we present the theory of β-shape for a set of three-dimensional spheres as the generalization of the well-known α-shape for a set of points. The proposed β-shape fully accounts for the size differences among spheres and therefore it is more appropriate for the efficient and correct solution for applications in biological systems such as proteins. Once the Voronoi diagram of spheres is given, the corresponding β-shape can be efficiently constructed and various geometric computations on the sphere complex can be efficiently and correctly performed. It turns out that many important problems in biological systems such as proteins can be easily solved via the Voronoi diagram of atoms in proteins and β-shapes transformed from the Voronoi diagram.

11.
Consider the situation where the Structuration des Tableaux à Trois Indices de la Statistique (STATIS) methodology is applied to a series of studies, each study being represented by data and weight matrices. Relations between studies may be captured by the Hilbert-Schmidt product of these matrices. Specifically, the eigenvalues and eigenvectors of the Hilbert-Schmidt matrix S may be used to obtain a geometrical representation of the studies. The studies in a series may further be considered to have a common structure whenever their corresponding points lie along the first axis. The matrix S can be expressed as the sum of a rank-1 matrix λuu^T and an error matrix E; therefore, the components of the vector u are sufficient to locate the points associated with the studies. Earlier models for S, formulated in terms of vec(E), are mathematically tractable and yet do not take into account the symmetry of the matrix S. Thus a new symmetric model is proposed, as well as the corresponding tests for a common structure. It is further shown how to assess the goodness of fit of such models. An application to human immunodeficiency virus (HIV) infection is used for assessing the proposed model.
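A small sketch of the inter-study step: Hilbert-Schmidt inner products between study matrices, followed by the leading eigenvector that locates the studies on the first axis (synthetic cross-product matrices stand in for real studies here).

```python
# Sketch: Hilbert-Schmidt matrix of a series of studies and its first-axis representation.
import numpy as np

def hilbert_schmidt_matrix(mats):
    k = len(mats)
    S = np.empty((k, k))
    for a in range(k):
        for b in range(k):
            S[a, b] = np.sum(mats[a] * mats[b])        # <A, B>_HS = trace(A B^T) for real matrices
    return S

rng = np.random.default_rng(0)
studies = []
for _ in range(4):
    X = rng.normal(size=(30, 5))
    studies.append(X @ X.T / X.shape[1])               # one cross-product matrix per study

S = hilbert_schmidt_matrix(studies)
eigval, eigvec = np.linalg.eigh(S)
u = eigvec[:, -1]                                      # eigenvector of the largest eigenvalue
print("study coordinates on the first axis:", np.sqrt(eigval[-1]) * u)
```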

12.
In conjoint experiments, each respondent receives a set of profiles to rate. Sometimes, the profiles are expensive prototypes that respondents have to test before rating them. Designing these experiments involves determining how many and which profiles each respondent has to rate and how many respondents are needed. To that end, the set of profiles offered to a respondent is treated as a separate block in the design and a random respondent effect is used in the model because profile ratings from the same respondent are correlated. Optimal conjoint designs are then obtained by means of an adapted version of an algorithm for finding D-optimal split-plot designs. A key feature of the design construction algorithm is that it returns the optimal number of respondents and the optimal number of profiles each respondent has to evaluate for a given number of profiles. The properties of the optimal designs are described in detail and some practical recommendations are given.
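A hedged sketch of the design criterion involved: the D-criterion log det of the information matrix for a blocked design with a compound-symmetric (random respondent effect) covariance; the paper's design-construction algorithm itself is not reproduced.

```python
# Sketch: evaluating the D-criterion of a blocked design under a random respondent effect.
import numpy as np

def d_criterion(X_blocks, eta2=1.0, sigma2=1.0):
    """X_blocks: list of model matrices, one per respondent (block)."""
    p = X_blocks[0].shape[1]
    info = np.zeros((p, p))
    for Xb in X_blocks:
        m = Xb.shape[0]
        V = sigma2 * np.eye(m) + eta2 * np.ones((m, m))   # compound-symmetric covariance per block
        info += Xb.T @ np.linalg.solve(V, Xb)             # X' V^{-1} X accumulated over blocks
    _, logdet = np.linalg.slogdet(info)
    return logdet                                         # larger log det = more informative design

rng = np.random.default_rng(0)
design_a = [rng.choice([-1, 1], size=(4, 3)) for _ in range(10)]   # 10 respondents x 4 profiles
design_b = [rng.choice([-1, 1], size=(8, 3)) for _ in range(5)]    # 5 respondents x 8 profiles
print(d_criterion(design_a), d_criterion(design_b))
```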

13.
14.
We show that, for some special functions (called k-multigrid equidistributed functions), we can compute the limit of the frequency of patterns in the discretization of their graph as the resolution tends to zero. This result is applied to parabolas. We also deduce that local length estimators almost never converge to the true length for parabolas.
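To make the last claim concrete, a small experiment (my construction, not the paper's) that digitises y = x^2 on an interval where the slope stays below 1 and applies a naive local length estimator (weight 1 per axis step, sqrt(2) per diagonal step); the gap to the true arc length persists as the resolution h shrinks.

```python
# Sketch: a local length estimator on the digitised parabola does not approach the true length.
import numpy as np

def digitize_parabola(h, x_max=0.45):
    """Grid cells of y = x^2 sampled at step h; for x_max <= 0.45 the slope stays below 1,
    so consecutive cells differ by an axis step (1,0) or a diagonal step (1,1)."""
    xs = np.arange(0.0, x_max + h / 2, h)
    return np.column_stack([np.round(xs / h), np.round(xs**2 / h)]).astype(int)

def local_length(cells, h):
    steps = np.abs(np.diff(cells, axis=0))
    diag = np.all(steps == 1, axis=1)                  # diagonal moves weigh sqrt(2), axis moves 1
    return h * (np.sqrt(2) * diag.sum() + (~diag).sum())

def true_length(x_max=0.45):                           # exact arc length of y = x^2 on [0, x_max]
    f = lambda x: 0.5 * x * np.sqrt(1 + 4 * x**2) + 0.25 * np.arcsinh(2 * x)
    return f(x_max) - f(0.0)

for h in (1e-2, 1e-3, 1e-4):
    print(h, local_length(digitize_parabola(h), h), true_length())
```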

15.
We propose a general method for error estimation that displays low variance and generally low bias as well. This method is based on "bolstering" the original empirical distribution of the data. It has a direct geometric interpretation and can be easily applied to any classification rule and any number of classes. This method can be used to improve the performance of any error-counting estimation method, such as resubstitution and all cross-validation estimators, particularly in small-sample settings. We point out some similarities between our method and a previously proposed technique, known as smoothed error estimation. In some important cases, such as a linear classification rule with a Gaussian bolstering kernel, the integrals in the bolstered error estimate can be computed exactly. In the general case, the bolstered error estimate may be computed by Monte Carlo sampling; however, our experiments show that a very small number of Monte Carlo samples is needed. This results in a fast error estimator, in contrast to other resampling techniques, such as the bootstrap. We provide an extensive simulation study comparing the proposed method with resubstitution, cross-validation, and bootstrap error estimation, for three popular classification rules (linear discriminant analysis, k-nearest-neighbor, and decision trees), using several sample sizes, from small to moderate. The results indicate that the proposed method vastly improves on resubstitution and cross-validation, especially for small samples, in terms of bias and variance. In that respect, it is competitive with, and on many occasions superior to, bootstrap error estimation, while being tens to hundreds of times faster. We provide a companion web site, which contains: (1) the complete set of tables and plots regarding the simulation study, and (2) C source code used to implement the bolstered error estimators proposed in this paper, as part of a larger library for classification and error estimation, with full documentation and examples. The companion web site can be accessed at the URL http://ee.tamu.edu/~edward/bolster.
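A minimal Monte-Carlo version of the bolstered-resubstitution idea for an arbitrary scikit-learn classifier; the spherical Gaussian kernel width sigma is fixed ad hoc here, whereas the method described above estimates it from the data.

```python
# Sketch: Monte-Carlo bolstered resubstitution with a spherical Gaussian kernel (assumed sigma).
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def bolstered_resub_error(clf, X, y, sigma=0.5, n_mc=25, seed=0):
    rng = np.random.default_rng(seed)
    clf.fit(X, y)
    errors = 0.0
    for xi, yi in zip(X, y):
        # replace each training point by samples from its bolstering kernel and count errors
        pts = rng.normal(xi, sigma, size=(n_mc, X.shape[1]))
        errors += np.mean(clf.predict(pts) != yi)
    return errors / len(X)

rng = np.random.default_rng(4)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(1.5, 1, (20, 2))])
y = np.repeat([0, 1], 20)
clf = LinearDiscriminantAnalysis()
print("bolstered resubstitution:", bolstered_resub_error(clf, X, y))
print("plain resubstitution:    ", 1 - clf.fit(X, y).score(X, y))
```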

16.
In this paper, the concepts of Janowski functions and the conic regions are combined to define a new domain which represents the conic type regions. Different views of this modified conic domain for specific values are shown graphically for better understanding of the behavior of this domain. The class of such functions which map the open unit disk E onto this modified conic domain is defined. Also the classes of k-uniformly Janowski convex and k-Janowski starlike functions are defined and their coefficient inequalities are formulated. The coefficient bound for a certain class of analytic functions, proved by Owa et al. (2006) in [16], has also been improved.

17.
In this paper, the optimal strategies for discrete-time linear quadratic zero-sum games related to the H-infinity optimal control problem are solved in forward time without knowing the system dynamical matrices. The idea is to solve for an action-dependent value function Q(x,u,w) of the zero-sum game instead of solving for the state-dependent value function V(x), which satisfies a corresponding game algebraic Riccati equation (GARE). Since the state and action spaces are continuous, two action networks and one critic network are used and adaptively tuned in forward time using adaptive critic methods. The result is a Q-learning approximate dynamic programming (ADP) model-free approach that solves the zero-sum game forward in time. It is shown that the critic converges to the game value function and the action networks converge to the Nash equilibrium of the game. Proofs of convergence of the algorithm are given, and it is proven that the algorithm is, in effect, a model-free iterative algorithm for solving the GARE of the linear quadratic discrete-time zero-sum game. The effectiveness of this method is shown by performing an H-infinity control autopilot design for an F-16 aircraft.

18.
We study a scheduling problem with rejection on a set of two machines in a flow-shop scheduling system. We evaluate the quality of a solution by two criteria: the first is the makespan and the second is the total rejection cost. We show that the problem of minimizing the makespan plus total rejection cost is NP-hard, and for its solution we provide two different approximation algorithms, a pseudo-polynomial time optimization algorithm and a fully polynomial time approximation scheme (FPTAS). We also study the problem of finding the entire set of Pareto-optimal points (this problem is NP-hard due to the NP-hardness of the same problem variation on a single machine [20]). We show that this problem can be solved in pseudo-polynomial time. Moreover, we show how to provide an FPTAS that, given that there exists a Pareto-optimal schedule with a total rejection cost of at most R and a makespan of at most K, finds a solution with a total rejection cost of at most (1+ε)R and a makespan of at most (1+ε)K. This is done by defining a set of auxiliary problems and providing an FPTAS algorithm for each one of them.

19.
Discrete classification problems abound in pattern recognition and data mining applications. One of the most common discrete rules is the discrete histogram rule. This paper presents exact formulas for the computation of the bias, variance, and RMS of the resubstitution and leave-one-out error estimators for the discrete histogram rule. We also describe an algorithm to compute the exact probability distribution of resubstitution and leave-one-out, as well as their deviations from the true error rate. Using a parametric Zipf model, we compute the exact performance of resubstitution and leave-one-out for varying expected true error, number of samples, and classifier complexity (number of bins). We compare this to approximate performance measures, computed by Monte Carlo sampling, of 10-repeated 4-fold cross-validation and the 0.632 bootstrap error estimator. Our results show that resubstitution is low-biased but much less variable than leave-one-out, and is effectively the superior error estimator of the two, provided classifier complexity is low. In addition, our results indicate that the overall performance of resubstitution, as measured by the RMS, can be substantially better than that of the 10-repeated 4-fold cross-validation estimator, and even comparable to the 0.632 bootstrap estimator, provided that classifier complexity is low and the expected error rates are moderate. In addition to the results discussed in the paper, we provide an extensive set of plots that can be accessed on a companion website, at the URL http://ee.tamu.edu/edward/exact_discrete.
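For concreteness, a direct-counting sketch of the discrete histogram rule with its resubstitution and leave-one-out estimates; the paper derives exact formulas for their moments and distributions, and the uniform cell distribution used below replaces the paper's Zipf model.

```python
# Sketch: discrete histogram rule with resubstitution and leave-one-out error estimates.
import numpy as np

def histogram_rule_errors(cells, labels, n_bins):
    counts = np.zeros((n_bins, 2), dtype=int)
    for c, y in zip(cells, labels):
        counts[c, y] += 1
    predict = counts.argmax(1)                                  # majority label per bin
    resub = np.mean(predict[cells] != labels)                   # resubstitution error

    loo_errors = 0
    for c, y in zip(cells, labels):                             # remove one sample, re-predict its bin
        counts[c, y] -= 1
        loo_errors += counts[c].argmax() != y
        counts[c, y] += 1
    return resub, loo_errors / len(labels)

rng = np.random.default_rng(5)
n_bins = 8
cells = rng.integers(0, n_bins, 100)                            # uniform bins for brevity
labels = (rng.random(100) < 0.3 + 0.05 * cells).astype(int)
resub, loo = histogram_rule_errors(cells, labels, n_bins)
print("resubstitution:", resub, "leave-one-out:", loo)
```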

20.
A string-based negative selection algorithm is an immune-inspired classifier that infers a partitioning of a string space Σ^ℓ into "normal" and "anomalous" partitions from a training set S containing only samples from the "normal" partition. The algorithm generates a set of patterns, called "detectors", to cover regions of the string space containing none of the training samples. Strings that match at least one of these detectors are then classified as "anomalous". A major problem with existing implementations of this approach is that the detector-generation step needs exponential time in the worst case. Here we show that for the two most widely used kinds of detectors, the r-chunk and r-contiguous detectors based on partial matching to substrings of length r, negative selection can be implemented more efficiently by avoiding generating detectors altogether: for each detector type, training set S ⊆ Σ^ℓ and parameter r ≤ ℓ, one can construct an automaton whose acceptance behaviour is equivalent to the algorithm's classification outcome. The resulting runtime is O(|S|·ℓ·r·|Σ|) for constructing the automaton in the training phase and O(ℓ) for classifying a string.
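A small sketch of the detector-free viewpoint for r-chunk detectors: a string is classified as anomalous exactly when one of its length-r chunks never occurs at that position in the training set, which is equivalent to matching some r-chunk detector. The work above achieves this outcome with an automaton construction; a plain hash-set lookup is used here instead.

```python
# Sketch: r-chunk negative selection without enumerating detectors.
from collections import defaultdict

def train_chunks(S, r):
    chunks = defaultdict(set)
    for s in S:
        for i in range(len(s) - r + 1):
            chunks[i].add(s[i:i + r])                 # chunks observed at position i in "normal" data
    return chunks

def is_anomalous(s, chunks, r):
    # anomalous iff some chunk of s is unseen at its position, i.e. it matches an r-chunk detector
    return any(s[i:i + r] not in chunks[i] for i in range(len(s) - r + 1))

normal = ["abcde", "abcdf", "abxde"]
chunks = train_chunks(normal, r=3)
for s in ["abcde", "abzde", "abcdz"]:
    print(s, "anomalous" if is_anomalous(s, chunks, r=3) else "normal")
```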
