Similar Documents
 20 similar documents found (search time: 31 ms)
1.
The aim of this paper is to present a strategy by which a new philosophy for pattern classification, namely that pertaining to dissimilarity-based classifiers (DBCs), can be efficiently implemented. This methodology, proposed by Duin and his co-authors (see Refs. [Experiments with a featureless approach to pattern recognition, Pattern Recognition Lett. 18 (1997) 1159-1166; Relational discriminant analysis, Pattern Recognition Lett. 20 (1999) 1175-1181; Dissimilarity representations allow for building good classifiers, Pattern Recognition Lett. 23 (2002) 943-956; Dissimilarity representations in pattern recognition: concepts, theory and applications, Ph.D. Thesis, Delft University of Technology, Delft, The Netherlands, 2005; Prototype selection for dissimilarity-based classifiers, Pattern Recognition 39 (2006) 189-208]), is a way of defining classifiers between the classes that is based not on the feature measurements of the individual patterns, but rather on a suitable dissimilarity measure between them. The advantage of this methodology is that, since it does not operate on the class-conditional distributions, the accuracy can exceed the Bayes' error bound. The problem with this strategy, however, is the need to compute, store and process the inter-pattern dissimilarities for all the training samples, and thus the accuracy of the classifier designed in the dissimilarity space depends on the methods used to achieve this. In this paper, we suggest a novel strategy to enhance the computation for all families of DBCs. Rather than compute, store and process the DBC based on the entire data set, we advocate that the training set first be reduced to a smaller representative subset. Also, rather than determine this subset by random selection, clustering, etc., we advocate the use of a prototype reduction scheme (PRS), whose output yields the points to be utilized by the DBC. The rationale for this is explained in the paper.
Apart from utilizing PRSs, we also propose simultaneously employing the Mahalanobis distance as the dissimilarity-measurement criterion to increase the DBCs' classification accuracy. Our experimental results demonstrate that the proposed mechanism increases the classification accuracy when compared with the "conventional" approaches on both real-life and artificial data sets, even though the resulting dissimilarity criterion is not symmetric.
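The dissimilarity-space idea underlying DBCs can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the function names are hypothetical, a hand-picked prototype list stands in for the PRS output, and a plain Euclidean dissimilarity stands in for the paper's Mahalanobis criterion. Each object is represented by its vector of dissimilarities to the prototypes, and classification is done by 1-NN in that space:

```python
import math

def euclidean(a, b):
    """Plain Euclidean distance between two numeric tuples."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def dissimilarity_representation(x, prototypes, d):
    """Map a raw object x to the vector of its dissimilarities to the prototypes."""
    return [d(x, p) for p in prototypes]

def dbc_1nn(train, labels, prototypes, d, x):
    """Classify x with 1-NN computed *in the dissimilarity space*
    spanned by the prototypes (not in the original feature space)."""
    rx = dissimilarity_representation(x, prototypes, d)
    reps = [dissimilarity_representation(t, prototypes, d) for t in train]
    best = min(range(len(train)), key=lambda i: euclidean(reps[i], rx))
    return labels[best]
```

With only a few prototypes, the representation vectors are short, which is exactly the computational saving a PRS is meant to buy.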

2.
A conventional way to discriminate between objects represented by dissimilarities is the nearest neighbor method. A more efficient and sometimes more accurate solution is offered by other dissimilarity-based classifiers. They construct a decision rule based on the entire training set, but they need just a small set of prototypes, the so-called representation set, as a reference for classifying new objects. Such alternative approaches may be especially advantageous for non-Euclidean or even non-metric dissimilarities. The choice of a proper representation set for dissimilarity-based classifiers is not yet fully investigated. It appears that a random selection may work well. In this paper, a number of experiments have been conducted on various metric and non-metric dissimilarity representations and prototype selection methods. Several procedures, such as traditional feature selection methods (here effectively searching for prototypes), mode seeking and linear programming, are compared to random selection. In general, we find that systematic approaches lead to better results than random selection, especially for a small number of prototypes. Although there is no single winner, as the outcome depends on data characteristics, the k-centres method works well in general. For two-class problems, an important observation is that our dissimilarity-based discrimination functions relying on significantly reduced prototype sets (3-10% of the training objects) offer a similar or much better classification accuracy than the best k-NN rule on the entire training set. This may be reached for multi-class data as well; however, such problems are more difficult.
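A rough stand-in for the k-centres selection compared above is a greedy farthest-first traversal over a precomputed dissimilarity matrix. This minimal sketch (hypothetical function name, arbitrary starting object, no tie-breaking or refinement passes) returns k prototype indices such that every object is reasonably close to some chosen prototype:

```python
def k_centres(D, k):
    """Greedy farthest-first selection of k prototype indices from an
    n-by-n dissimilarity matrix D. Each step adds the object whose
    distance to its nearest already-chosen prototype is largest."""
    n = len(D)
    chosen = [0]                      # start from an arbitrary object
    while len(chosen) < k:
        nxt = max(range(n), key=lambda i: min(D[i][c] for c in chosen))
        chosen.append(nxt)
    return chosen
```

Because only the dissimilarity matrix is touched, the same routine works for non-Euclidean and non-metric dissimilarities.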

3.
In this paper, classification on dissimilarity representations is applied to medical imaging data with the task of discriminating between normal images and images with signs of disease. We show that dissimilarity-based classification is a beneficial approach for dealing with weakly labeled data, i.e. when the location of disease in an image is unknown and therefore local feature-based classifiers cannot be trained. A modification to the standard dissimilarity-based approach is proposed that makes a dissimilarity measure multi-valued and hence able to retain more information. A multi-valued dissimilarity between an image and a prototype becomes an image representation vector in classification. Several classification outputs with respect to different prototypes are then integrated into a final image decision. Both the standard and the proposed methods are evaluated on data sets of chest radiographs with textural abnormalities and compared to several feature-based region classification approaches applied to the same data. On a tuberculosis data set the multi-valued dissimilarity-based classification performs as well as the best region classification method applied to the fully labeled data, with an area under the receiver operating characteristic (ROC) curve (Az) of 0.82. The standard dissimilarity-based classification yields Az = 0.80. On a data set with interstitial abnormalities both dissimilarity-based approaches achieve Az = 0.98, which is just behind the best region classification method.

4.
Prototype-based classification relies on the distances between the examples to be classified and carefully chosen prototypes. A small set of prototypes is of interest to keep the computational complexity low while maintaining high classification accuracy. An experimental study of some old and new prototype optimisation techniques is presented, in which the prototypes are either selected or generated from the given data. These condensing techniques are evaluated on real data, represented in vector spaces, by comparing their resulting reduction rates and classification performance. Usually the determination of prototypes is studied in relation to the nearest neighbour rule. We will show that the use of more general dissimilarity-based classifiers can be more beneficial. An important point in our study is that the adaptive condensing schemes discussed here allow the user to choose the number of prototypes freely according to their needs. If such techniques are combined with linear dissimilarity-based classifiers, they provide the best trade-off between small condensed sets and high classification accuracy.

5.
Multidimensional scaling (MDS) typically measures the dissimilarity (similarity) between objects by the distance between points in a Euclidean space. When objects have nominal attributes such as gender or color, the usual practice is to quantify them numerically and then apply the Euclidean distance; this treatment is clearly unreasonable. This paper introduces the Heterogeneous Value Difference Metric (HVDM) into the computation of distances between objects with nominal attributes, so as to make the MDS computation under nominal attributes more reasonable. Experiments on the UCI Abalone data set show that this method outperforms the traditional quantification approach in both reconstruction ability and reconstruction accuracy.
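The nominal-attribute part of HVDM is a value difference metric built from class-conditional value frequencies: two attribute values are close when they distribute similarly over the classes. The sketch below is an illustrative reconstruction (hypothetical function name; it assumes class labels are available, as HVDM requires, and covers only one nominal attribute):

```python
from collections import Counter, defaultdict

def hvdm_nominal(attr_vals, classes, q=2):
    """Build a value-difference distance for one nominal attribute:
    d(v1, v2) compares the class-conditional frequencies of the two values."""
    counts = defaultdict(Counter)      # value -> class -> count
    totals = Counter()                 # value -> count
    for v, c in zip(attr_vals, classes):
        counts[v][c] += 1
        totals[v] += 1
    labels = set(classes)

    def d(v1, v2):
        return sum(abs(counts[v1][c] / totals[v1] - counts[v2][c] / totals[v2]) ** q
                   for c in labels) ** (1 / q)
    return d
```

The continuous attributes of HVDM are handled separately (normalized absolute differences) and combined with the nominal distances into one overall metric.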

6.
In solving pattern recognition problems, many classification methods, such as the nearest-neighbor (NN) rule, need to determine prototypes from a training set. To improve the performance of these classifiers in finding an efficient set of prototypes, this paper introduces a training sample sequence planning method. In particular, by estimating the relative nearness of the training samples to the decision boundary, the approach proposed here incrementally increases the number of prototypes until the desired classification accuracy has been reached. This approach has been tested with an NN classification method and a neural network training approach. Studies based on both artificial and real data demonstrate that higher classification accuracy can be achieved with fewer prototypes.

7.
Wang Lin, Guo Nana. Journal of Computer Applications, 2017, 37(4): 1032-1037
To address the inadequate ability of traditional classification techniques to identify churned customers in imbalanced telecom customer data sets, an improved dissimilarity-based classification algorithm for imbalanced data (IDBC) is proposed. Building on the dissimilarity-based classification (DBC) algorithm, it improves the prototype selection strategy. In the prototype selection stage, an improved sample-subset optimization method selects the most informative prototype set from the whole data set, avoiding the uncertainty introduced by random selection. In the classification stage, feature spaces are constructed from the dissimilarities between the training set and the prototype set, and between the test set and the prototype set, and traditional classification algorithms are then trained on the dissimilarity data mapped into these spaces. The algorithm was validated on a telecom customer data set from the UCI repository and six other ordinary imbalanced data sets. Compared with traditional feature-based classification of imbalanced data, the DBC algorithm improved the recognition rate of rare classes by 8.3% on average, and the IDBC algorithm by 11.3%. The experimental results show that the proposed IDBC algorithm is insensitive to the class distribution and identifies rare classes in imbalanced data sets better than existing state-of-the-art classification techniques.

8.
To address the mislabeling of newly selected samples that occurs when the tri_training co-training algorithm is used for semi-supervised classification of hyperspectral remote-sensing images with small training sets, a semi-supervised co-training algorithm based on spatial neighborhood information, tri_training_SNI (tri_training based on Spatial Neighborhood Information), is proposed. First, the disagreement measure and a newly proposed disagreement-accuracy measure are used to select, from the four classifiers MLR (Multinomial Logistic Regression), KNN (k-Nearest Neighbor), ELM (Extreme Learning Machine), and RF (Random Forest), the three whose classification performance differs most. Then, during sample selection, whenever two of the three chosen classifiers agree, the 8-neighborhood information of the initial training samples is additionally used for a second screening of the unlabeled samples and for label determination, which improves the sample-selection accuracy of the semi-supervised learning. Classification experiments on the AVIRIS and ROSIS hyperspectral remote-sensing images show that, compared with the traditional tri_training co-training algorithm, the proposed algorithm clearly improves classification accuracy.

9.
Most classification problems concern applications with objects lying in a Euclidean space, but in some situations only dissimilarities between objects are known. We are concerned with supervised classification from an observed dissimilarity table, where the task is to classify new unobserved or implicit objects (known only through their dissimilarities to the previously classified objects forming the training data set) into predefined classes. This work concentrates on developing model-based classifiers for dissimilarities which take into account the measurement error with respect to the Euclidean distance. Basically, it is assumed that the unobserved objects are unknown parameters to be estimated in a Euclidean space, and that the observed dissimilarity table is a random Gaussian perturbation of their Euclidean distances. Allowing the distribution of these perturbations to vary across pairs of classes in the population leads to more flexible classification methods than the usual algorithms. Model parameters are estimated from the training data set via the maximum likelihood (ML) method, and allocation is done by assigning a new implicit object to the group in the population, and the position in the Euclidean space, that maximize the conditional group likelihood with the estimated parameters. This point of view can be expected to be useful in classifying dissimilarity tables that are no longer Euclidean due to measurement error or instabilities of various types. Two possible structures are postulated for the error, resulting in two different model-based classifiers. First results on real and simulated data sets show interesting behavior of the two proposed algorithms, and the respective effects of the dissimilarity type and of the intrinsic data dimension are investigated. For these latter two aspects, one of the constructed classifiers appears to be very promising.
Interestingly, the intrinsic data dimension seems to have a much less adverse effect on our classifiers than initially feared, at least for small to moderate dimensions.

10.
In this paper, a new matching pursuits dissimilarity measure (MPDM) is presented that compares two signals using the information provided by their matching pursuits (MP) approximations, without requiring any prior domain knowledge. MPDM is a flexible and differentiable measure that can be used to perform shape-based comparisons and fuzzy clustering of very high-dimensional, possibly compressed, data. A novel prototype based classification algorithm, which is termed the computer aided minimization procedure (CAMP), is also proposed. The CAMP algorithm uses the MPDM with the competitive agglomeration (CA) fuzzy clustering algorithm to build reliable shape based prototypes for classification. MP is a well known sparse signal approximation technique, which is commonly used for video and image coding. The dictionary and coefficient information produced by MP has previously been used to define features to build discrimination and prototype based classifiers. However, existing MP based classification applications are quite problem domain specific, thus making their generalization to other problems quite difficult. The proposed CAMP algorithm is the first MP based classification system that requires no assumptions about the problem domain and builds a bridge between the MP and fuzzy clustering algorithms. Experimental results also show that the CAMP algorithm is more resilient to outliers in test data than the multilayer perceptron (MLP) and support-vector-machine (SVM) classifiers, as well as prototype-based classifiers using the Euclidean distance as their dissimilarity measure.

11.
Conventional Fuzzy C-means (FCM) algorithm uses Euclidean distance to describe the dissimilarity between data and cluster prototypes. Since the Euclidean distance based dissimilarity measure only characterizes the mean information of a cluster, it is sensitive to noise and cluster divergence. In this paper, we propose a novel fuzzy clustering algorithm for image segmentation, in which the Mahalanobis distance is utilized to define the dissimilarity measure. We add a new regularization term to the objective function of the proposed algorithm, reflecting the covariance of the cluster. We experimentally demonstrate the effectiveness of the proposed algorithm on a generated 2D dataset and a subset of Berkeley benchmark images.
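The Mahalanobis dissimilarity that replaces the Euclidean distance above can be illustrated for the 2-D case, where the inverse of the covariance matrix has a closed form. This is a sketch only (hypothetical function name); a real implementation would handle arbitrary dimensions and guard against near-singular covariances:

```python
def mahalanobis_2d(x, mu, cov):
    """Squared Mahalanobis distance (x - mu)^T cov^{-1} (x - mu) for a
    2-D point, using the analytic inverse of a 2x2 covariance matrix."""
    (a, b), (c, d) = cov
    det = a * d - b * c               # assumed non-zero here
    inv = ((d / det, -b / det), (-c / det, a / det))
    dx = (x[0] - mu[0], x[1] - mu[1])
    return (dx[0] * (inv[0][0] * dx[0] + inv[0][1] * dx[1])
            + dx[1] * (inv[1][0] * dx[0] + inv[1][1] * dx[1]))
```

With the identity covariance this reduces to the squared Euclidean distance; an elongated covariance down-weights deviations along the cluster's spread, which is what makes the measure less sensitive to cluster divergence.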

12.
This paper proposes a non-parametric method for the classification of thin-layer chromatography (TLC) images from patterns represented in a dissimilarity space. Each pattern corresponds to a mixture-of-Gaussians approximation of the intensity profile. The methodology comprises several phases, including image processing and analysis steps to extract the chromatographic profiles, and a classification phase to discriminate between two groups, one corresponding to normal cases and the other to three pathological classes. We present an extensive study of several dissimilarity-based approaches, analysing the influence of the dissimilarity measure and the prototype selection method on classification performance. The main conclusions are that the Match and Profile-difference dissimilarity measures give better results, and that a new prototype selection methodology achieves performance similar to, or even better than, conventional methods. Furthermore, we also conclude that the simplest classifiers, such as k-NN and linear discriminant classifiers (LDCs), perform well, with an overall classification error below 10% for the four-class problem.

13.
A common approach in structural pattern classification is to define a dissimilarity measure on patterns and apply a distance-based nearest-neighbor classifier. In this paper, we introduce an alternative method for classification using kernel functions based on edit distance. The proposed approach is applicable to both string and graph representations of patterns. By means of the kernel functions introduced in this paper, string and graph classification can be performed in an implicit vector space using powerful statistical algorithms. The validity of the kernel method cannot be established for edit distance in general. However, by evaluating theoretical criteria we show that the kernel functions are nevertheless suitable for classification, and experiments on various string and graph datasets clearly demonstrate that nearest-neighbor classifiers can be outperformed by support vector machines using the proposed kernel functions.
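One common way to build a kernel from an edit distance is an exponential embedding of the Levenshtein distance, sketched below for strings. This is an illustration of the general construction, not the specific kernels of the paper, and, as the abstract notes, such functions are not guaranteed to be positive definite:

```python
import math

def edit_distance(s, t):
    """Classic Levenshtein distance between two strings (row-by-row DP)."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,            # deletion
                           cur[j - 1] + 1,         # insertion
                           prev[j - 1] + (cs != ct)))  # substitution
        prev = cur
    return prev[-1]

def edit_kernel(s, t, gamma=0.5):
    """Similarity derived from edit distance, usable as an SVM kernel
    (not guaranteed positive definite for general edit distances)."""
    return math.exp(-gamma * edit_distance(s, t))
```

In practice the kernel matrix is precomputed over the training strings and handed to an SVM that accepts precomputed kernels.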

14.
To address the low accuracy, high overhead, and limited applicability of traditional network traffic classification methods, a semi-supervised network traffic classification method based on support vector machines (SVM) is proposed. During SVM training, incremental learning is used to determine the support vectors dynamically from the initial and newly added sample sets, avoiding unnecessary retraining and mitigating the loss of classification accuracy and the long classification times caused by the arrival of new samples. An improved semi-supervised Tri-training method is used to co-train the classifiers, repeatedly refining them with a large number of unlabeled samples and a small number of labeled ones; this reduces the noisy data introduced by the auxiliary classifiers and overcomes the strict requirements that traditional co-validation places on the classification algorithm and sample types. Experimental results show that the method clearly improves the accuracy and efficiency of network traffic classification.

15.
Case-based reasoning (CBR) is used when generalized knowledge is lacking. The method works on a set of cases formerly processed and stored in the case base. A new case is interpreted based on its similarity to cases in the case base. The closest case, with its associated result, is selected and presented as the output of the system. Recently, dissimilarity-based classification (DSC) has been introduced due to the curse of dimensionality of feature spaces and the problem that arises when trying to make image features explicit. The approach classifies samples based on their dissimilarity values to all training samples. In this paper we review the basic properties of these two approaches. We show the similarity of dissimilarity-based classification to case-based reasoning. Finally, we conclude that dissimilarity-based classification is a variant of case-based reasoning and that most of the open problems in dissimilarity-based classification are research topics of case-based reasoning.

16.
In case-based reasoning (CBR) classification systems, the similarity metrics play a key role and directly affect the system's performance. Based on our previous work on learning pseudo-metrics (LPM), we propose a case-based reasoning method for pattern classification in which the widely used Euclidean distance is replaced by the LPM to measure the closeness between the target case and each source case. Cases of the same type as the target case can be retrieved, and the category of the target case can be determined by majority vote during the reuse step. Experimental results on several benchmark datasets and on a fault diagnosis of the Tennessee-Eastman (TE) process demonstrate that the proposed reasoning techniques can effectively improve classification accuracy, and that the LPM-based retrieval method can substantially improve the quality and learning ability of CBR classifiers.

17.
Semi-supervised classification based on the Laplacian support vector machine (LapSVM) must add all unlabeled samples to the training set in order to train the classifier, which results in high time and space complexity and prevents effective large-scale image classification. To address this, a LapSVM image classification method with fuzzy C-means (FCM) sample preselection is proposed. The FCM algorithm clusters the unlabeled samples, and based on the clustering results the unlabeled points that are likely to lie near the optimal separating hyperplane are added to the training set. These samples may be support vectors carrying information useful for classification, and they constitute only a small fraction of the unlabeled data, so the training set is reduced. Computer simulation results show that the method makes full use of the discriminative information contained in the unlabeled samples, effectively improves the classification accuracy, and reduces the time and space complexity of the algorithm.

18.

Classification is one of the data mining processes used to predict predetermined target classes by learning accurately from data. This study discusses data classification using a fuzzy soft set method to predict target classes accurately, and aims to formulate a data classification algorithm based on fuzzy soft sets. In this study, the fuzzy soft sets were computed using the normalized Hamming distance. Each parameter in this method is mapped to a power set of a subset of the fuzzy set using a fuzzy approximation function. In the classification step, a generalized normalized Euclidean distance is used to determine the similarity between two fuzzy soft sets. The experiments used University of California, Irvine (UCI) Machine Learning datasets to assess the accuracy of the proposed data classification method. The dataset samples were divided into training (75% of samples) and test (25% of samples) sets, and the experiments were performed in MATLAB R2010a. The experiments showed that (1) the fastest sequence is matching function, distance measure, similarity, normalized Euclidean distance, and (2) the proposed approach improves accuracy and recall by up to 10.3436% and 6.9723%, respectively, compared with baseline techniques. Hence, the fuzzy soft set method is appropriate for classifying data.
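The two distances named above can be sketched on plain fuzzy membership vectors. This is a strong simplification of the paper's method (which applies them parameter-wise to fuzzy soft sets), with hypothetical function names:

```python
def normalized_hamming(a, b):
    """Normalized Hamming distance between two fuzzy membership vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def normalized_euclidean(a, b):
    """Normalized Euclidean distance between two fuzzy membership vectors."""
    return (sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)) ** 0.5

def similarity(a, b):
    """Similarity derived from the normalized Euclidean distance;
    a test vector is assigned to the class whose set maximizes this."""
    return 1 - normalized_euclidean(a, b)
```

Both distances lie in [0, 1] for membership values in [0, 1], so the derived similarity is directly comparable across classes.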


19.
The following two-stage approach to learning from dissimilarity data is described: (1) embed both labeled and unlabeled objects in a Euclidean space; then (2) train a classifier on the labeled objects. The use of linear discriminant analysis for (2), which naturally invites the use of classical multidimensional scaling for (1), is emphasized. The choice of the dimension of the Euclidean space in (1) is a model selection problem; too few or too many dimensions can degrade classifier performance. The question of how the inclusion of unlabeled objects in (1) affects classifier performance is investigated. In the case of spherical covariances, including unlabeled objects in (1) is demonstrably superior. Several examples are presented.
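Step (1), classical multidimensional scaling, begins by double-centring the squared dissimilarities into a Gram matrix; the embedding coordinates are then the leading eigenvectors of that matrix scaled by the square roots of the eigenvalues. The sketch below shows just the centring step (hypothetical function name, eigendecomposition omitted):

```python
def double_centre(D2):
    """Turn an n-by-n matrix of squared dissimilarities into the
    inner-product (Gram) matrix B = -1/2 * J * D2 * J, where J is the
    centring matrix. This is the first step of classical MDS."""
    n = len(D2)
    row = [sum(r) / n for r in D2]                          # row means
    col = [sum(D2[i][j] for i in range(n)) / n for j in range(n)]  # column means
    tot = sum(row) / n                                      # grand mean
    return [[-0.5 * (D2[i][j] - row[i] - col[j] + tot) for j in range(n)]
            for i in range(n)]
```

If the dissimilarities are Euclidean, B is positive semi-definite and the rank of B bounds the useful embedding dimension, which connects directly to the model selection problem mentioned above.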

20.
The image Euclidean distance (IMED) considers the spatial relationships between the pixels of different images and can easily be embedded in existing image recognition algorithms based on the Euclidean distance. IMED uses the prior knowledge that pixels located near one another have little variance in gray-scale values, and defines a metric matrix according to the spatial distance between pixels. In this paper, we propose an adaptive image Euclidean distance (AIMED), which considers not only the prior spatial knowledge but also the prior gray-level knowledge of the images. The most important advantage of AIMED over IMED is that AIMED makes the metric matrix adaptive to the content of the images concerned. Two ways of using gray-level information are proposed: one based on gray-level distances, the other on the cosine dissimilarity of gray levels. Experiments on two facial databases and a handwritten digit database show that AIMED achieves the highest classification accuracy when embedded in nearest neighbor classifiers, principal component analysis, and support vector machines.
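The IMED metric matrix can be sketched directly from its standard formulation: entry G[i][j] is a Gaussian weight on the spatial distance between pixels i and j, and the squared distance between two flattened images x and y is (x - y)^T G (x - y). Treat this as an illustration of that definition, not the authors' code (function names and the default sigma are assumptions):

```python
import math

def imed_metric(width, height, sigma=1.0):
    """Metric matrix G for IMED: G[i][j] decays with the spatial distance
    between pixel i and pixel j (Gaussian weighting)."""
    coords = [(r, c) for r in range(height) for c in range(width)]
    n = len(coords)
    return [[math.exp(-((coords[i][0] - coords[j][0]) ** 2
                        + (coords[i][1] - coords[j][1]) ** 2) / (2 * sigma ** 2))
             / (2 * math.pi * sigma ** 2)
             for j in range(n)] for i in range(n)]

def imed2(x, y, G):
    """Squared IMED between two flattened gray-scale images x and y."""
    d = [xi - yi for xi, yi in zip(x, y)]
    n = len(d)
    return sum(d[i] * G[i][j] * d[j] for i in range(n) for j in range(n))
```

Because G is fixed by pixel geometry alone, it can be precomputed once per image size; AIMED's contribution, per the abstract, is to make G depend on the gray-level content as well.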
