Similar Literature
20 similar records were found.
1.
In most pattern recognition (PR) applications, it is advantageous if the accuracy (or error rate) of the classifier can be evaluated or bounded prior to testing it in a real-life setting. It is also well known that if the two class-conditional distributions have a large overlapping volume (almost all the available work on “overlapping of classes” deals with the case when there are only two classes), the classification accuracy is poor. This is because, if we intend to use the classification accuracy as a criterion for evaluating a PR system, the points within the overlapping volume tend to lead to maximal misclassification. Unfortunately, the computation of the indices which quantify the overlapping volume is expensive. In this vein, we propose a strategy of using a prototype reduction scheme (PRS) to approximately, but quickly, compute the latter. In this paper, we demonstrate, first of all, that this is an extremely expedient proposition. Indeed, we show that by completely discarding the points not retained by the PRS (we are not aware of any reported scheme which discards “irrelevant” sample (training) points while simultaneously attaining an almost comparable accuracy), we can obtain a reduced set of sample points from which, in turn, the measures for the overlapping volume can be computed. The values of the corresponding figures are comparable to those obtained with the original training set (i.e., the one which considers all the data points), even though the computations required to obtain the prototypes and the corresponding measures are significantly less. The proposed method has been rigorously tested on artificial and real-life datasets, and the results obtained are, in our opinion, quite impressive: the computation is sometimes faster by two orders of magnitude.
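The abstract does not specify which PRS or which overlap indices are used; as a minimal sketch of the idea, the following assumes a condensed-nearest-neighbour-style reduction as the PRS and a simple nearest-neighbour disagreement fraction as a stand-in overlap measure (both choices, and all function names, are illustrative rather than the authors' exact procedure):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def condense(X, y, seed=0):
    """Tiny condensed-nearest-neighbour style PRS (illustrative only):
    keep a point as a prototype only if the current prototypes misclassify it."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    keep = [order[0]]
    for i in order[1:]:
        knn = KNeighborsClassifier(n_neighbors=1).fit(X[keep], y[keep])
        if knn.predict(X[i:i + 1])[0] != y[i]:
            keep.append(i)
    return X[keep], y[keep]

def overlap_fraction(X, y):
    """Stand-in overlap index: fraction of points whose nearest neighbour
    (excluding themselves) belongs to a different class."""
    neigh = KNeighborsClassifier(n_neighbors=2).fit(X, y).kneighbors(
        X, return_distance=False)[:, 1]
    return float(np.mean(y[neigh] != y))

# Two overlapping Gaussian classes; the index on the reduced prototype set is
# obtained at a fraction of the cost of the full-set computation - whether it
# tracks the full-set value is exactly what the paper evaluates.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(size=(500, 2)), rng.normal(size=(500, 2)) + 1.5])
y = np.repeat([0, 1], 500)
Xr, yr = condense(X, y)
print(len(Xr), overlap_fraction(X, y), overlap_fraction(Xr, yr))
```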

2.
The subspace method of pattern recognition is a classification technique in which pattern classes are specified in terms of linear subspaces spanned by their respective class-based basis vectors. To overcome the limitations of the linear methods, kernel-based nonlinear subspace (KNS) methods have recently been proposed in the literature. In KNS, kernel principal component analysis (kPCA) has been employed to obtain principal components, not in the input space, but in a high-dimensional space whose components are nonlinearly related to the input variables. The lengths of the projections onto the basis vectors in kPCA are computed using a kernel matrix K, whose dimension equals the number of sample data points. Clearly, this is problematic, especially for large data sets. In this paper, we suggest a computationally superior mechanism to solve the problem. Rather than define the matrix K with the whole data set and compute the principal components, we propose that the data be reduced into a smaller representative subset using a prototype reduction scheme (PRS). Since a PRS has the capability of extracting vectors that satisfactorily represent the global distribution structure, we demonstrate that data points which are ineffective in the classification can be eliminated to obtain a reduced kernel matrix, K, without degrading the performance. Our experimental results demonstrate that the proposed mechanism dramatically reduces the computation time without sacrificing the classification accuracy for samples involving real-life data sets as well as artificial data sets. The results especially demonstrate the computational advantage for large data sets, such as those involved in data mining and text categorization applications.
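A minimal sketch of the computational point, assuming scikit-learn's KernelPCA and a random subsample as a crude stand-in for a proper PRS (the class-wise subspace construction and the final KNS classification step are omitted):

```python
import numpy as np
from sklearn.decomposition import KernelPCA

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 10))          # full training data for one class

# Stand-in for a PRS: any scheme that returns a small representative subset.
idx = rng.choice(len(X), size=200, replace=False)
X_proto = X[idx]

# The kernel matrix handled by kPCA is now 200 x 200 instead of 5000 x 5000.
kpca = KernelPCA(n_components=10, kernel="rbf", gamma=0.1).fit(X_proto)

# Projection lengths of a query onto the kernel principal components;
# in a KNS classifier these would be computed per class subspace.
q = rng.normal(size=(1, 10))
proj = kpca.transform(q)
score = float(np.sum(proj ** 2))
print(score)
```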

3.
Various prototype reduction schemes have been reported in the literature. Foremost among these are the prototypes for nearest neighbor (PNN), the vector quantization (VQ), and the support vector machines (SVM) methods. In this paper, we shall show that these schemes can be enhanced by the introduction of a post-processing phase that is related, but not identical, to the LVQ3 process. Although post-processing with LVQ3 has been reported for the SOM and the basic VQ methods, in this paper we shall show that an analogous philosophy can be used in conjunction with the SVM and PNN rules. Our essential modification to LVQ3 first entails a partitioning of the respective training sets into two sets, called the Placement set and the Optimizing set, which are instrumental in determining the LVQ3 parameters. Such a partitioning is novel to the literature. Our experimental results demonstrate that the proposed enhancement yields the best prototype condensation scheme reported to date, both for artificial data sets and for samples involving real-life data sets.
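For reference, a minimal sketch of the standard LVQ3 update on which the post-processing is built; the Placement/Optimizing split used to tune the parameters, as well as the SVM/PNN initialization of the prototypes, are omitted, and the parameter values shown are illustrative only:

```python
import numpy as np

def lvq3_epoch(protos, proto_labels, X, y, alpha=0.05, w=0.25, eps=0.3):
    """One pass of the standard LVQ3 update over a training (Placement) set X, y."""
    for x, label in zip(X, y):
        d = np.linalg.norm(protos - x, axis=1)
        i, j = np.argsort(d)[:2]                      # two nearest prototypes
        di, dj = d[i], d[j]
        in_window = min(di / dj, dj / di) > (1 - w) / (1 + w)
        li, lj = proto_labels[i], proto_labels[j]
        if li == label and lj == label:
            # both prototypes correct: move both slightly towards the sample
            protos[i] += eps * alpha * (x - protos[i])
            protos[j] += eps * alpha * (x - protos[j])
        elif in_window and (li == label) != (lj == label):
            # one correct, one wrong, sample inside the window:
            # attract the correct prototype, repel the wrong one
            for k in (i, j):
                sign = 1.0 if proto_labels[k] == label else -1.0
                protos[k] += sign * alpha * (x - protos[k])
    return protos
```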

4.
5.
The aim of this paper is to present a strategy by which a new philosophy for pattern classification, namely that pertaining to dissimilarity-based classifiers (DBCs), can be efficiently implemented. This methodology, proposed by Duin and his co-authors (see Refs. [Experiments with a featureless approach to pattern recognition, Pattern Recognition Lett. 18 (1997) 1159-1166; Relational discriminant analysis, Pattern Recognition Lett. 20 (1999) 1175-1181; Dissimilarity representations allow for building good classifiers, Pattern Recognition Lett. 23 (2002) 943-956; Dissimilarity representations in pattern recognition, Concepts, theory and applications, Ph.D. Thesis, Delft University of Technology, Delft, The Netherlands, 2005; Prototype selection for dissimilarity-based classifiers, Pattern Recognition 39 (2006) 189-208]), is a way of defining classifiers between the classes that is based not on the feature measurements of the individual patterns, but rather on a suitable dissimilarity measure between them. The advantage of this methodology is that, since it does not operate on the class-conditional distributions, the accuracy can exceed the Bayes error bound. The problem with this strategy is, however, the need to compute, store and process the inter-pattern dissimilarities for all the training samples, and thus the accuracy of the classifier designed in the dissimilarity space is dependent on the methods used to achieve this. In this paper, we suggest a novel strategy to enhance the computation for all families of DBCs. Rather than compute, store and process the DBC based on the entire data set, we advocate that the training set first be reduced into a smaller representative subset. Also, rather than determine this subset on the basis of random selection, clustering, etc., we advocate the use of a prototype reduction scheme (PRS), whose output yields the points to be utilized by the DBC. The rationale for this is explained in the paper. Apart from utilizing PRSs, we also propose simultaneously employing the Mahalanobis distance as the dissimilarity-measurement criterion to increase the DBCs' classification accuracy. Our experimental results demonstrate that the proposed mechanism increases the classification accuracy when compared with the “conventional” approaches for samples involving real-life as well as artificial data sets, even though the resulting dissimilarity criterion is not symmetric.
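A minimal sketch of the overall pipeline, assuming a random subsample as a stand-in for the PRS and a linear SVM as the classifier built in the dissimilarity space (all names are illustrative; the paper's specific PRS and classifier choices may differ):

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Stand-in for a PRS: a small representative subset of the training data.
proto_idx = rng.choice(len(X), size=50, replace=False)
P = X[proto_idx]

# Mahalanobis dissimilarity to each prototype defines the new representation.
VI = np.linalg.inv(np.cov(X, rowvar=False))
D_train = cdist(X, P, metric="mahalanobis", VI=VI)   # n_samples x n_prototypes

clf = SVC(kernel="linear").fit(D_train, y)           # classifier in dissimilarity space

X_new = rng.normal(size=(5, 8))
D_new = cdist(X_new, P, metric="mahalanobis", VI=VI)
print(clf.predict(D_new))
```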

6.
This paper concerns the use of prototype reduction schemes (PRS) to optimize the computations involved in typical k-nearest neighbor (k-NN) rules. These rules have been successfully used for decades in statistical pattern recognition (PR) [1], [15] applications and are particularly effective for density estimation, classification, and regression because of the known error bounds that they possess. For a given data point of unknown identity, the k-NN rule combines the information about the a priori target classes (or values) of the selected neighbors to predict the target class of the tested sample, or to estimate the density function value of the queried sample. Recently, an implementation of the k-NN, named the locally linear reconstruction (LLR) [2], has been proposed. The salient feature of the latter is that, by invoking a quadratic optimization process, it is capable of systematically setting model parameters such as the number of neighbors (specified by the parameter k) and the weights. However, the LLR takes more time than other conventional methods when it has to be applied to classification tasks. To overcome this problem, we propose a strategy of using a PRS to efficiently compute the optimization problem. In this paper, we demonstrate, first of all, that by completely discarding the points not included by the PRS, we can obtain a reduced set of sample points, using which, in turn, the quadratic optimization problem can be solved far more expediently. The values of the corresponding indices are comparable to those obtained with the original training set (i.e., the one which considers all the data points), even though the computations required to obtain the prototypes and the corresponding classification accuracies are noticeably less. The proposed method has been tested on artificial and real-life data sets, and the results obtained are very promising and show potential for PR applications.
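A minimal sketch of the reconstruction-weight idea behind LLR, using the closed-form, LLE-style solution of the equality-constrained least-squares problem; the actual LLR determines k and imposes non-negativity via a quadratic program, which is omitted here, and all names are illustrative:

```python
import numpy as np

def reconstruction_weights(x, neighbors, reg=1e-3):
    """Locally linear reconstruction weights for a query x:
    minimise ||x - sum_i w_i n_i||^2 subject to sum_i w_i = 1 (closed form)."""
    Z = neighbors - x                        # shift neighbors to the query
    G = Z @ Z.T                              # local Gram matrix
    G += reg * np.trace(G) * np.eye(len(G))  # regularisation for stability
    w = np.linalg.solve(G, np.ones(len(G)))
    return w / w.sum()

def llr_predict(x, X_train, y_train, k=10):
    """Predict the target of x as the weighted average of the targets of its
    k nearest points; with a PRS, X_train/y_train are the reduced prototypes."""
    idx = np.argsort(np.linalg.norm(X_train - x, axis=1))[:k]
    w = reconstruction_weights(x, X_train[idx])
    return float(w @ y_train[idx])
```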

7.
MGRS: A multi-granulation rough set
The original rough set model was developed by Pawlak, and is mainly concerned with the approximation of sets described by a single binary relation on the universe. In the view of granular computing, the classical rough set theory is established through a single granulation. This paper extends Pawlak's rough set model to a multi-granulation rough set model (MGRS), where the set approximations are defined using multiple equivalence relations on the universe. A number of important properties of MGRS are obtained. It is shown that some of the properties of Pawlak's rough set theory are special instances of those of MGRS. Moreover, several important measures, such as the accuracy measure α, the quality of approximation γ and the precision of approximation π, are presented, and are re-interpreted in terms of a classical set-based measure, the Marczewski-Steinhaus metric and the inclusion degree measure. A concept of approximation reduct is also introduced to describe the smallest attribute subset that preserves the lower and upper approximations of all decision classes in MGRS. Finally, we discuss how to extract decision rules using MGRS. Unlike the decision rules (“AND” rules) from Pawlak's rough set model, the form of decision rules in MGRS is “OR”. Several pivotal algorithms are also designed, which are helpful for applying this theory to practical issues. The multi-granulation rough set model provides an effective approach for problem solving in the context of multiple granulations.
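A minimal sketch of the "OR"-style multi-granulation approximations described above, with ad hoc helper names: an object belongs to the lower approximation if its equivalence class under at least one relation is contained in the target set, and to the upper approximation if its class under every relation intersects the target set:

```python
def equivalence_classes(universe, key):
    """Partition the universe by a key function (one equivalence relation)."""
    classes = {}
    for x in universe:
        classes.setdefault(key(x), set()).add(x)
    return {x: classes[key(x)] for x in universe}

def mgrs_approximations(universe, keys, target):
    """Lower/upper approximation of `target` under multiple equivalence relations."""
    blocks = [equivalence_classes(universe, k) for k in keys]
    lower = {x for x in universe if any(b[x] <= target for b in blocks)}
    upper = {x for x in universe if all(b[x] & target for b in blocks)}
    return lower, upper

# Toy example: objects described by two attributes; each attribute induces
# one granulation (equivalence relation).
universe = {(0, 0), (0, 1), (1, 0), (1, 1), (2, 1)}
target = {(0, 0), (0, 1), (2, 1)}
keys = [lambda o: o[0], lambda o: o[1]]      # one relation per attribute
print(mgrs_approximations(universe, keys, target))
```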

8.
Most of the prototype reduction schemes (PRS) which have been reported in the literature process the data in its entirety to yield a subset of prototypes that are useful in nearest-neighbor-like classification. Foremost among these are the prototypes for nearest neighbor classifiers, the vector quantization technique, and the support vector machines. These methods suffer from a major disadvantage, namely, the excessive computational burden encountered by processing all the data. In this paper, we suggest a recursive and computationally superior mechanism referred to as adaptive recursive partitioning (ARP_PRS). Rather than process all the data using a PRS, we propose that the data be recursively subdivided into smaller subsets. This recursive subdivision can be arbitrary, and need not utilize any underlying clustering philosophy. The advantage of ARP_PRS is that the PRS processes subsets of data points that effectively sample the entire space to yield smaller subsets of prototypes. These prototypes are then, in turn, gathered and processed by the PRS to yield more refined prototypes. In this manner, prototypes which are in the interior of the Voronoi spaces, and thus ineffective in the classification, are eliminated at the subsequent invocations of the PRS. We are unaware of any PRS that employs such a recursive philosophy. Although we marginally forfeit accuracy in return for computational efficiency, our experimental results demonstrate that the proposed recursive mechanism yields classification comparable to the best prototype condensation schemes reported to date. Indeed, this is true both for artificial data sets and for samples involving real-life data sets. The results especially demonstrate that a fair computational advantage can be obtained by using such a recursive strategy for "large" data sets, such as those involved in data mining and text categorization applications.
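Schematically, the recursion can be wrapped around any PRS as follows (`prs` is any function that returns a reduced (X, y) pair; the function name, the halving strategy and the size threshold are illustrative only):

```python
import numpy as np

def arp_prs(X, y, prs, max_size=500, rng=None):
    """Recursive wrapper around a prototype reduction scheme `prs`:
    split large data arbitrarily, reduce each part, pool the resulting
    prototypes, and reduce the pool again (a sketch of the ARP idea)."""
    rng = rng or np.random.default_rng(0)
    if len(X) <= max_size:
        return prs(X, y)
    order = rng.permutation(len(X))
    halves = np.array_split(order, 2)                 # arbitrary subdivision
    parts = [arp_prs(X[h], y[h], prs, max_size, rng) for h in halves]
    Xp = np.vstack([p[0] for p in parts])
    yp = np.concatenate([p[1] for p in parts])
    return prs(Xp, yp)                                # refine the pooled prototypes
```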

9.
In this paper, a scale-orientation histogram is defined for analyzing “directionality” and “periodicity”, which are two of the most important deterministic dimensions in human texture perception. This histogram is applied to texture retrieval in a case study, and the experimental results illustrate its effectiveness.

10.
Model selection for support vector machines via uniform design
Choosing a good parameter setting for better generalization performance in a learning task is the so-called model selection problem. A nested uniform design (UD) methodology is proposed for efficient, robust and automatic model selection for support vector machines (SVMs). The proposed method is applied to select the candidate set of parameter combinations and to carry out a k-fold cross-validation to evaluate the generalization performance of each parameter combination. In contrast to conventional exhaustive grid search, this method can be treated as a deterministic analog of random search. It can dramatically cut down the number of parameter trials and also provides the flexibility to adjust the candidate set size under a computational time constraint. The key theoretical advantage of UD model selection over grid search is that the UD points are “far more uniform” and “far more space filling” than lattice grid points. The better uniformity and space-filling properties make the UD selection scheme more efficient by avoiding wasteful function evaluations of close-by patterns. The proposed method is evaluated on different learning tasks, different data sets, as well as different SVM algorithms.
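A minimal sketch of the search procedure, using a Sobol low-discrepancy point set as a rough stand-in for the paper's uniform-design table (the nested refinement stage is omitted, and the parameter ranges are illustrative):

```python
import numpy as np
from scipy.stats import qmc
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=400, n_features=10, random_state=0)

# Stand-in for a uniform-design table: a low-discrepancy (Sobol) point set
# over log2(C) in [-5, 15] and log2(gamma) in [-15, 3].
points = qmc.scale(qmc.Sobol(d=2, seed=0).random(16), [-5, -15], [15, 3])

# k-fold cross-validation at each design point; keep the best combination.
best = max(
    ((np.mean(cross_val_score(SVC(C=2.0 ** c, gamma=2.0 ** g), X, y, cv=5)), c, g)
     for c, g in points),
    key=lambda t: t[0],
)
print("best CV accuracy %.3f at log2C=%.2f, log2gamma=%.2f" % best)
```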

11.
A string-based negative selection algorithm is an immune-inspired classifier that infers a partitioning of a string space Σ^ℓ into “normal” and “anomalous” partitions from a training set S containing only samples from the “normal” partition. The algorithm generates a set of patterns, called “detectors”, to cover regions of the string space containing none of the training samples. Strings that match at least one of these detectors are then classified as “anomalous”. A major problem with existing implementations of this approach is that the detector generation step needs exponential time in the worst case. Here we show that for the two most widely used kinds of detectors, the r-chunk and r-contiguous detectors based on partial matching to substrings of length r, negative selection can be implemented more efficiently by avoiding generating detectors altogether: for each detector type, training set S ⊆ Σ^ℓ and parameter r ≤ ℓ, one can construct an automaton whose acceptance behaviour is equivalent to the algorithm's classification outcome. The resulting runtime is O(|S| · ℓ · r · |Σ|) for constructing the automaton in the training phase and O(ℓ) for classifying a string.
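For the r-chunk case, the equivalence that makes detector generation unnecessary is easy to state: a string is matched by some r-chunk detector iff, at some position, its length-r chunk never occurs at that position in the training set. A minimal sketch (a per-position chunk dictionary standing in for the paper's automaton construction):

```python
def train_chunks(S, r):
    """For each position i, record the length-r chunks that occur at i in the
    'normal' training strings S (all strings assumed to have equal length)."""
    ell = len(next(iter(S)))
    return [{s[i:i + r] for s in S} for i in range(ell - r + 1)]

def is_anomalous(x, seen, r):
    """x is matched by some r-chunk detector iff, at some position, its chunk
    does not occur in any training string at that position."""
    return any(x[i:i + r] not in seen[i] for i in range(len(seen)))

S = {"AAAB", "AABB", "ABBB"}
seen = train_chunks(S, r=2)
print(is_anomalous("AABB", seen, 2), is_anomalous("BBAA", seen, 2))
```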

12.
A brief taxonomy and ranking of creative prototype reduction schemes
Various Prototype Reduction Schemes (PRS) have been reported in the literature. Based on their operating characteristics, these schemes fall into two fairly distinct categories: those which are of a creative sort, and those which are essentially selective. The norms for evaluating these methods are, typically, the reduction rate and the classification accuracy. It is generally believed that the former class of methods is superior to the latter. In this paper, we report the results of executing various creative PRSs and attempt to comparatively quantify their capabilities. The paper presents a brief taxonomy of the various reported PRS schemes. Our experimental results for three artificial data sets, and for samples involving real-life data sets, demonstrate that no single method is uniformly superior to the others for all kinds of applications. This result, though consistent with the findings of Bezdek and Kuncheva [1], is, in one sense, counter-intuitive, because the various researchers have presented their specific PRS with the hope that it would be superior to the previously reported methods. However, the fact is that while one method is superior in certain domains, it is inferior to another method when dealing with a data set with markedly different characteristics. The conclusion of this study is that the question of determining when one method is superior to another remains open. Indeed, it appears as if the designers of the pattern recognition system will have to choose the appropriate PRS based on the specific characteristics of the data that they are studying. The paper also suggests answers to various hypotheses that relate to the accuracies and reduction rates of families of PRS.

13.
Many companies have adopted Process-aware Information Systems (PAIS) to support their business processes in some form. On the one hand, these systems typically log events (e.g., in transaction logs or audit trails) related to the actual business process executions. On the other hand, explicit process models describing how the business process should (or is expected to) be executed are frequently available. Together with the data recorded in the log, this situation raises the interesting question “Do the model and the log conform to each other?”. Conformance checking, also referred to as conformance analysis, aims at the detection of inconsistencies between a process model and its corresponding execution log, and at their quantification by means of metrics. This paper proposes an incremental approach to check the conformance of a process model and an event log. First of all, the fitness between the log and the model is measured (i.e., “Does the observed process comply with the control flow specified by the process model?”). Second, the appropriateness of the model can be analyzed with respect to the log (i.e., “Does the model describe the observed process in a suitable way?”). Appropriateness can be evaluated from both a structural and a behavioral perspective. To operationalize the ideas presented in this paper, a Conformance Checker has been implemented within the ProM framework, and it has been evaluated using artificial and real-life event logs.
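As an illustration of how fitness is typically quantified in token-based replay conformance checking (the replay of the log on the model is omitted here, and the per-trace token counts are assumed to be given; this is a sketch of the usual metric, not necessarily the paper's exact formulation):

```python
def token_replay_fitness(traces):
    """Token-based fitness: each entry holds (produced, consumed, missing,
    remaining) token counts obtained by replaying one log trace on the model."""
    p = sum(t[0] for t in traces)
    c = sum(t[1] for t in traces)
    m = sum(t[2] for t in traces)
    r = sum(t[3] for t in traces)
    return 0.5 * (1 - m / c) + 0.5 * (1 - r / p)

# Two replayed traces: the first fits perfectly; the second needed one
# artificially created (missing) token and left one token behind.
print(token_replay_fitness([(5, 5, 0, 0), (6, 6, 1, 1)]))
```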

14.
We consider the problem of using a stochastic approximation algorithm to perform online tracking in a non-stationary environment characterised by abrupt “regime changes”. The primary contribution of this paper is a new approach for adaptive stepsize selection that is suitable for this type of non-stationarity. Our approach is pre-emptive rather than reactive, and is based on a strategy of maximising the rate of adaptation, subject to a constraint on the probability that the iterates fall outside a pre-determined range of acceptable error. The basis for our approach is provided by the theory of weak convergence for stochastic approximation algorithms.
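The abstract does not spell out the stepsize rule; as a generic illustration of the trade-off that the stepsize controls (not the paper's constrained-adaptation scheme), a constant-stepsize stochastic approximation iterate tracking a piecewise-constant target:

```python
import numpy as np

rng = np.random.default_rng(0)

def track(step, n=2000):
    """Constant-stepsize stochastic approximation x_{k+1} = x_k + a (y_k - x_k)
    tracking a target that jumps abruptly halfway through (a regime change)."""
    target = np.where(np.arange(n) < n // 2, 0.0, 5.0)
    y = target + rng.normal(scale=1.0, size=n)        # noisy observations
    x = np.zeros(n)
    for k in range(1, n):
        x[k] = x[k - 1] + step * (y[k] - x[k - 1])
    return np.abs(x - target)

# Larger steps adapt faster after the jump but wander more in steady state;
# choosing the step to balance exactly this is the paper's concern.
for a in (0.01, 0.1, 0.5):
    err = track(a)
    print(a, err[:1000].mean().round(3), err[1000:1100].mean().round(3))
```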

15.
Process mining allows for the automated discovery of process models from event logs. These models provide insights and enable various types of model-based analysis. This paper demonstrates that the discovered process models can be extended with information to predict the completion time of running instances. There are many scenarios where it is useful to have reliable time predictions. For example, when a customer phones her insurance company for information about her insurance claim, she can be given an estimate of the remaining processing time. In order to do this, we provide a configurable approach to construct a process model, augment this model with time information learned from earlier instances, and use this to predict, for example, the completion time. To provide meaningful time predictions we use a configurable set of abstractions that allow for a good balance between “overfitting” and “underfitting”. The approach has been implemented in ProM and, through several experiments using real-life event logs, we demonstrate its applicability.
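A toy sketch of the annotate-and-predict idea (the abstraction chosen here, the helper names and the data are illustrative; the paper's configurable abstractions and the ProM implementation are considerably richer):

```python
from collections import defaultdict
from statistics import mean

def learn_annotations(traces, abstraction):
    """Each trace is a list of (activity, timestamp) pairs (timestamps in hours).
    Every visited state (under the chosen abstraction) is annotated with the
    remaining times observed from that state until case completion."""
    remaining = defaultdict(list)
    for trace in traces:
        end = trace[-1][1]
        for i in range(1, len(trace) + 1):
            state = abstraction([a for a, _ in trace[:i]])
            remaining[state].append(end - trace[i - 1][1])
    return {s: mean(v) for s, v in remaining.items()}

def predict_remaining(prefix, annotations, abstraction, default=None):
    """Prediction for a running case = average remaining time of its state."""
    return annotations.get(abstraction([a for a, _ in prefix]), default)

# The abstraction controls over-/underfitting: the multiset of activities seen
# so far is coarser than the full prefix, finer than "last activity only".
abstraction = lambda acts: frozenset((a, acts.count(a)) for a in set(acts))

log = [[("register", 0), ("check", 4), ("decide", 10)],
       [("register", 0), ("check", 6), ("decide", 14)]]
ann = learn_annotations(log, abstraction)
print(predict_remaining([("register", 0), ("check", 5)], ann, abstraction))
```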

16.
In this paper, we propose a modification to the BitTorrent protocol related to its peer unchoking policy. In particular, we apply a novel optimistic unchoking approach that improves the quality of inter-connections amongst peers, i.e., increases the number of directly-connected and interested-in-cooperation peers without penalizing underutilized and/or idle peers. Our optimistic unchoking policy takes into consideration the number of clients currently interested in downloading from a peer that is to be unchoked. Our conjecture is that peers having few clients interested in downloading data from them should be favored with optimistic unchoke intervals. This enables the peers in question to receive data, since they become unchoked faster; in turn, they will trigger the interest of additional clients. In contrast, peers with plenty of “interested” clients should be given a lower priority for being selected as a planned optimistic unchoke, since these peers likely have enough data to forward; moreover, they receive enough data through tit-for-tat peer reciprocation and are not in need of optimistic unchoking slots. Armed with this realization, we establish an analytical model and prove a significant performance improvement under our modified BitTorrent protocol. Experimental results also indicate that our approach significantly outperforms the existing optimistic unchoking policy in three important aspects: first, there is a higher number of interested-in-cooperation and directly-connected peers; second, since leechers now act as data intermediaries, the load on seeders eases up considerably; and last, a shorter bootstrapping period for fresh peers is achieved. Hence, we claim that our approach helps implement an enhanced BitTorrent protocol, which we name “EnhancedBit”.
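A minimal sketch of the selection step only: the peer to optimistically unchoke is drawn with a probability that decreases with its current number of interested clients (the weighting function and names are illustrative, not the paper's exact policy or analytical model):

```python
import random

def pick_optimistic_unchoke(peers, interested_count):
    """Choose the peer to optimistically unchoke, favoring peers that currently
    have few interested clients: they most need a chance to obtain data and
    thereby become interesting to other clients."""
    weights = [1.0 / (1 + interested_count[p]) for p in peers]
    return random.choices(peers, weights=weights, k=1)[0]

# A peer with no interested clients is far more likely to be picked than one
# that already serves many.
peers = ["p1", "p2", "p3"]
interested = {"p1": 0, "p2": 3, "p3": 12}
print(pick_optimistic_unchoke(peers, interested))
```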

17.
Object-based image retrieval using semi-supervised multiple-instance learning
Li Daxiang (李大湘), Control and Decision (《控制与决策》), 2010, 25(7): 981-986
To address the problem of object-based image retrieval, a new semi-supervised multiple-instance learning (MIL) algorithm is proposed. The algorithm treats each image as a bag and the visual features of its segmented regions as the instances in that bag. Following a maximum "point density" principle, "visual semantics" are extracted to construct a projection space; a defined nonlinear function then maps each bag to a single point in this projection space, yielding the image's "projection features", whose attributes are reduced using a rough set (RS) method. Finally, a transductive support vector machine (TSVM) performs semi-supervised learning to obtain the classifier. Experimental results show that the method is effective and outperforms the other methods compared.
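A minimal sketch of a bag-to-point embedding of the kind described (one coordinate per extracted "visual semantic"); the paper's point-density prototype extraction, its exact nonlinear mapping, the rough-set attribute reduction and the TSVM training are not reproduced here, and all names are illustrative:

```python
import numpy as np

def bag_embedding(bag, prototypes, sigma=1.0):
    """Map a bag (array of region feature vectors) to a single point in the
    projection space spanned by the 'visual semantic' prototypes: one
    coordinate per prototype, taken as the best (max) instance response."""
    d2 = ((bag[:, None, :] - prototypes[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2)).max(axis=0)   # shape: (n_prototypes,)

rng = np.random.default_rng(0)
prototypes = rng.normal(size=(5, 8))          # stand-ins for extracted semantics
bag = rng.normal(size=(12, 8))                # 12 segmented regions, 8-D features
print(bag_embedding(bag, prototypes))
```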

18.
Malaria transmission is highly influenced by environmental and climatic conditions, but their effects are often not linear. The climate-malaria relation is unlikely to be the same over large areas covered by different agro-ecological zones. Similarly, spatial correlation in malaria transmission, arising mainly from spatially structured covariates (environmental and human-made factors), could vary across the agro-ecological zones, introducing non-stationarity. Malaria prevalence data from West Africa extracted from the “Mapping Malaria Risk in Africa” database were analyzed to produce regional parasitaemia risk maps. A non-stationary geostatistical model was developed, assuming that the underlying spatial process is a mixture of separate stationary processes within each zone. Non-linearity in the environmental effects was modeled by separate P-splines in each agro-ecological zone. The model allows smoothing at the borders between the zones. The P-splines approach has better predictive ability than categorizing the covariates, the usual alternative for modeling non-linearity. Model fit and prediction were handled within a Bayesian framework, using Markov chain Monte Carlo (MCMC) simulations.
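A minimal sketch of the P-spline ingredient only, fitting one non-linear covariate effect by penalized least squares (a degree-1 B-spline "hat" basis plus a second-order difference penalty; the geostatistical, zonal and Bayesian/MCMC parts of the model are omitted, and all names are illustrative):

```python
import numpy as np

def hat_basis(x, knots):
    """Degree-1 B-spline (hat function) design matrix: one column per knot."""
    B = np.zeros((len(x), len(knots)))
    for j, k in enumerate(knots):
        left = knots[j - 1] if j > 0 else k
        right = knots[j + 1] if j < len(knots) - 1 else k
        B[:, j] = np.clip(np.minimum((x - left) / max(k - left, 1e-12),
                                     (right - x) / max(right - k, 1e-12)), 0, 1)
    return B

def pspline_fit(x, y, n_knots=20, lam=1.0):
    """P-spline: B-spline basis plus a penalty on second differences of the
    coefficients, solved as penalized least squares."""
    knots = np.linspace(x.min(), x.max(), n_knots)
    B = hat_basis(x, knots)
    D = np.diff(np.eye(n_knots), n=2, axis=0)        # second-difference matrix
    coef = np.linalg.solve(B.T @ B + lam * D.T @ D, B.T @ y)
    return knots, coef

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 300)
y = np.sin(x) + rng.normal(scale=0.3, size=300)      # non-linear covariate effect
knots, coef = pspline_fit(x, y, lam=5.0)
print(hat_basis(np.array([2.5, 7.5]), knots) @ coef)  # fitted effect at two points
```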

19.
A detailed computational study is presented of the flow pattern around the Esso Osaka with rudder in simple maneuvering conditions: “static rudder” and “pure drift”. The objectives are: (1) apply RANS for maneuvering simulation; (2) perform verification and validation on field quantities; (3) characterize the flow pattern; and (4) correlate the behavior of the integral quantities with the flow field. The general-purpose code CFDSHIP-IOWA is used. The free surface is neglected and the two-equation k-ω turbulence model is used. The levels of verification of the velocity components for the “straight-ahead”, “static rudder” and “pure drift” conditions show ranges from 5.5% to 28.3% of the free stream, U0, for the axial velocity U, and 2.5-29.1% of U0 for the cross-flow components (V, W). Qualitative validation against limited experimental data shows encouraging results with respect to trends and levels. The flow pattern is characterized by fore- and aft-body bilge and side vortices, which are similar for the “straight-ahead” and “static rudder” conditions, except in the close vicinity of the rudder. The “pure drift” condition shows strong asymmetry between the windward and leeward sides and a more complex vortex system with additional bilge vortices. Similarities and differences with data for other tanker, container, and surface combatant hulls, and the relation between the flow pattern and the forces and moments, are discussed. Future work focuses on the influence of the propeller.

20.
Gaussian mixture models (GMM) are commonly employed in nonparametric supervised classification. In high-dimensional problems it is often the case that information relevant to the separation of the classes is contained in a few directions. A GMM fitting procedure oriented to supervised classification is proposed, with the aim of reducing the number of free parameters. It resorts to projection pursuit as a dimension reduction method and combines it with GM modelling of class-conditional densities. In its derivation, issues regarding the forward and backward projection pursuit algorithms are discussed. The proposed procedure avoids the “curse of dimensionality”, is able to model structure in subspaces and regularizes the classification model. Its performance is illustrated on a simulation experiment and on a real data set, in comparison with other GMM-based classification methods.
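A minimal sketch of the overall structure, using plain PCA as a stand-in for the projection pursuit step (purely for illustration) and one Gaussian mixture per class in the reduced space:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

# Stand-in for projection pursuit: any low-dimensional projection capturing
# the class-relevant directions (here plain PCA).
Z = PCA(n_components=2).fit_transform(X)

# One Gaussian mixture per class models the class-conditional density in the
# reduced space; classification is by maximum prior-weighted density.
classes = np.unique(y)
priors = np.array([np.mean(y == c) for c in classes])
gmms = [GaussianMixture(n_components=2, random_state=0).fit(Z[y == c]) for c in classes]

log_dens = np.column_stack([g.score_samples(Z) for g in gmms]) + np.log(priors)
pred = classes[np.argmax(log_dens, axis=1)]
print("training accuracy:", float(np.mean(pred == y)))
```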
