Similar literature
1.
As an extension of multi-class classification, machine learning algorithms have been proposed that can deal with situations in which the class labels are defined in a non-crisp way. In that sense, objects exhibit a degree of membership in several classes. In a similar setting, models are developed here for classification problems where an order relation is specified on the classes (i.e., non-crisp ordinal regression problems). As for traditional (crisp) ordinal regression problems, it is argued that the order relation on the classes should be reflected by the model structure as well as by the performance measure used to evaluate the model. These arguments lead to a natural extension of the well-known proportional odds model for non-crisp ordinal regression problems, in which the underlying latent variable is not necessarily restricted to the class of linear models (by using kernel methods).
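
For orientation, the standard (crisp) proportional odds model that this abstract builds on can be sketched as follows. This is a minimal illustration, not the authors' non-crisp extension: the function names are hypothetical, the latent value `latent` stands for whatever f(x) a linear or kernel model produces, and the thresholds are assumed to be given rather than learned.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def proportional_odds_probs(latent, thresholds):
    """Class probabilities for K ordered classes under a proportional odds model.

    latent     -- value f(x) of the latent variable (assumed computed elsewhere)
    thresholds -- strictly increasing cut points theta_1 < ... < theta_{K-1}

    The model sets P(y <= k | x) = sigmoid(theta_k - f(x)); each class
    probability is the difference of two consecutive cumulative probabilities.
    """
    if any(b <= a for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    cumulative = [sigmoid(t - latent) for t in thresholds] + [1.0]
    probs = []
    previous = 0.0
    for c in cumulative:
        probs.append(c - previous)
        previous = c
    return probs
```

Because the thresholds are ordered, every class probability is positive and the classes keep their order along the latent axis, which is how the model structure reflects the order relation.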

2.
Many classification tasks can be viewed as ordinal. Numeric information usually allows more powerful analysis than ordinal data. On the other hand, ordinal data allows more powerful analysis than nominal data. It is therefore important not to overlook knowledge about ordinal dependencies in data sets used in data mining. This paper investigates the data mining support available from ordinal data. The effect of considering ordinal dependencies in the data set on the overall results of constructing decision trees and induction rules is illustrated. The degree of improved prediction of ordinal over nominal data is demonstrated. When the data were highly representative and consistent, using ordinal information reduced the number of final rules while achieving a lower error rate. Data treatment alternatives are presented to deal with data sets having greater imperfections.

3.
Correlation is usually used in the context of real-valued sequences but, in data mining, the values of fields may be of various types: real, nominal or ordinal. Techniques for measuring correlation between any two sequences of data are reviewed, regardless of their type. In particular, a new technique for measuring the correlation between real-valued data and nominal data is proposed. The technique relies on the definition of an assignment of the nominal values to real values and hence is called A-correlation (A for assignment). The proposed assignment is defined to be the most favourable of all such assignments and can be efficiently computed. Moreover, it is shown that the resulting correlation coefficient has a natural interpretation independent of the assignment. With nominal/nominal data, Cramér's V statistic can be used or, alternatively, a new statistic based on matching. These can also be used in the ordinal/nominal case. With just ordinal data, correlations based on rank are appropriate and these can also be used in the ordinal/ordinal and the ordinal/real cases.
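
As a point of reference for the nominal/nominal case mentioned above, Cramér's V can be computed from the chi-squared statistic of the contingency table. The sketch below uses the textbook definition without bias correction; the function name is hypothetical, and it does not implement the paper's A-correlation or its matching-based statistic.

```python
import math
from collections import Counter


def cramers_v(xs, ys):
    """Cramér's V between two equally long sequences of nominal values.

    V = sqrt(chi2 / (n * min(r - 1, c - 1))), where r and c are the numbers
    of distinct values in xs and ys. Returns 0.0 when either sequence has
    only one distinct value, because V is undefined in that case.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    k = min(len(row_totals), len(col_totals)) - 1
    if k == 0:
        return 0.0
    chi2 = 0.0
    for r, n_r in row_totals.items():
        for c, n_c in col_totals.items():
            expected = n_r * n_c / n
            observed = joint.get((r, c), 0)
            chi2 += (observed - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * k))
```

A value of 1 indicates that each value of one variable determines the other, and 0 indicates no association in the observed table.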

4.
Ordinal regression (OR), or ordinal classification, is a machine learning paradigm for ordinal labels. To date, a variety of methods have been proposed, including kernel-based and neural-network-based methods with significant performance. However, existing OR methods rarely consider latent structures of the given data, particularly the interaction among covariates, thus losing interpretability to some extent. To compensate for this, in this paper we present a new OR method, the ordinal factorization machine with hierarchical sparsity (OFMHS), which combines a factorization machine with hierarchical sparsity to explore the hierarchical structure behind the input variables. For the sake of optimization, we formulate OFMHS as a convex optimization problem and solve it with the efficient alternating direction method of multipliers (ADMM) algorithm. Experimental results on synthetic and real datasets demonstrate the superiority of our method in both performance and significant variable selection.

5.
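Currently, there are only a few ordinal regression methods for multi-task learning. These methods assume that different tasks have the same weight and contribute equally to the overall model. In real applications, however, different tasks often contribute differently to the overall model. To address this, a multi-task ordinal regression algorithm based on automatic optimization of task weights is proposed. First, a multi-task ordinal regression model based on support vector machines is proposed, in which information is transferred between tasks by sharing classifier parameters. Second, because different tasks may contribute differently to the overall model, each task is assigned a weight, and these weights are optimized automatically during learning. Finally, a heuristic framework is adopted that alternates between building the multi-task ordinal regression model and optimizing the task weights. Experimental results show that, compared with other multi-task ordinal regression methods, the proposed method reduces the mean 0-1 error by 3.8% to 12.3% and the mean absolute error by 4.1% to 11%. By accounting for the different weight of each task and optimizing these weights automatically, the method reduces the classification error of the multi-task ordinal regression model.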
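
The two evaluation measures reported above can be illustrated with a short sketch. It assumes that class labels are encoded as consecutive integers (1, 2, 3, ...), so that the absolute difference between labels counts how many ranks apart a prediction is; the function names are hypothetical.

```python
def mean_zero_one_error(y_true, y_pred):
    """Fraction of samples whose predicted rank differs from the true rank."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average distance, in ranks, between predicted and true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
```

The mean 0-1 error treats every mistake alike, whereas the mean absolute error penalizes a prediction more the further it lies from the true rank, which is why ordinal regression studies usually report both.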

6.
Regression for ordinal variables without underlying continuous variables (total citations: 2; self-citations: 0; citations by others: 2)
Several techniques exist nowadays for continuous (i.e. numerical) data analysis and modeling. However, although part of the information gathered by companies, statistical offices and other institutions is numerical, a large part of it is represented using categorical variables in ordinal or nominal scales. Techniques for model building on categorical data are required to take advantage of such a wealth of information. In this paper, current approaches to regression for ordinal data are reviewed and a new proposal is described which has the advantage of not assuming any latent continuous variable underlying the dependent ordinal variable. Estimation in the new approach can be implemented using genetic algorithms. An artificial example is presented to illustrate the feasibility of the proposal.

7.
Regression problems aim to estimate a continuous variable from a number of characteristics or predictors. Several proposals have been made for regression models based on fuzzy rules; however, all these proposals use rule models that do not take into account the irrelevance of input variables to the variable being approximated. Regression problems share with ordinal classification the existence of an explicit order relationship between the values of the variable to be predicted. In a recent paper, the authors proposed an ordinal classification algorithm that takes into account the detection of irrelevant input variables. This algorithm extracts a set of fuzzy rules from an example set, using as the basic model a sequential covering strategy along with a genetic algorithm. In this paper, a proposal for a regression algorithm based on this ordinal classification algorithm is presented. The proposed model can be interpreted as a multiclassifier and multilevel system that learns at each stage using the knowledge gained in previous stages. Due to the similarities between regression and ordinal problems, as well as the use of a set of ordinal algorithms, an error interval can be returned with the regression output value. Experimental results show the good behavior of the proposal as well as the results of the error interval.

8.
In many data mining tasks, the goal is to classify entities into a set of pre-defined groups (classes). A second and equally important goal is the interpretation, i.e. understanding the nature of the population aggregated in each class. These tasks are rendered even more complex when there is no a-priori information regarding the right classification. The current paper is based on two concepts: (1) Bounded-Rationality theory, which implements an S-shaped function that represents human logic as a saliency measure to determine the substantial features that characterize each potential group, and (2) Classification by clustering (CBC), which applies Decision Tree-like classification in unsupervised clustering problems, where neither an a-priori classification nor target-attributes are known in advance. In the context of these two concepts, the current research contributes: (1) by expanding the saliency measure to all possible types of variables (nominal as well as numerical), (2) by evaluating, using five datasets, a composite model that combines the CBC method and the saliency concept. The findings show that by using clustering algorithms for classification tasks (CBC method) the results are as accurate as those obtained by conventional Decision Trees, but with a better saliency factor.

9.
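付红伟 (Fu Hongwei). 《计算机科学》 (Computer Science), 2016, 43(Z6): 452-453, 460
Classification rule mining differs from regression in that the target attribute in classification rule mining takes discrete nominal values, whereas the target attribute in regression takes continuous, ordered values. This paper introduces the two main methods of implementing classification rule mining with GEP and analyzes how to improve the fitness function in order to mine classification rules that are easy to understand.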

10.
When items are classified according to whether they have more or less of a characteristic, the scale used is referred to as an ordinal scale. The main characteristic of the ordinal scale is that the categories have a logical or ordered relationship to each other. Ordinal scale data processing is thus very common in marketing, satisfaction and attitudinal research. This study proposes a new data mining method, using a rough set-based association rule, to analyze ordinal scale data; the method can handle uncertainty in the data classification/sorting process. The induction of rough-set rules is presented as a method of dealing with data uncertainty while creating predictive if-then rules that generalize data values, for the beverage market in Taiwan. Empirical evaluation reveals that the proposed Rough Set Associational Rule (RSAR), combined with rough set theory, is superior to existing methods of data classification and can more effectively address the problems associated with ordinal scale data, for exploration of a beverage product spectrum.

11.
An approach to classification using a multi-agent system founded on an Argumentation from Experience paradigm is proposed. The technique is based on the idea that classification can be conducted as a process whereby a group of agents "argue" about the classification of a given case according to their experience as recorded in individual local data sets. The paper describes mechanisms whereby this can be achieved, which have been realised in the PISA framework. The framework allows both the possibility of agents operating in groups (coalitions) and migrating between groups. The proposed multi-agent classification using the Argumentation from Experience paradigm has been used to address standard, ordinal and unbalanced classification problems with good results. A full evaluation, in the context of these applications, is presented.

12.
Stochastic dominance-based rough set model for ordinal classification (total citations: 1; self-citations: 0; citations by others: 1)
In order to discover interesting patterns and dependencies in data, an approach based on rough set theory can be used. In particular, the dominance-based rough set approach (DRSA) has been introduced to deal with the problem of ordinal classification with monotonicity constraints (also referred to as multicriteria classification in decision analysis). However, in real-life problems, in the presence of noise, the notions of rough approximations were found to be excessively restrictive. In this paper, we introduce a probabilistic model for ordinal classification problems with monotonicity constraints. Then, we generalize the notion of lower approximations to the stochastic case. We estimate the probabilities with the maximum likelihood method, which leads to the isotonic regression problem for a two-class (binary) case. The approach is easily generalized to a multi-class case. Finally, we show the equivalence of the variable consistency rough sets to the specific empirical risk-minimizing decision rule in statistical decision theory.
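
To make the isotonic regression step concrete, the sketch below implements the pool-adjacent-violators algorithm for the simplest setting: objects that are already sorted along a single criterion (a total order) with binary labels. This is an assumption made for illustration; the paper works with a dominance relation, which is in general only a partial order and requires a more general solver. The function name is hypothetical.

```python
def pool_adjacent_violators(values, weights=None):
    """Least-squares non-decreasing fit to a sequence (total order assumed).

    values  -- observations in the order imposed by the single criterion,
               e.g. 0/1 class labels
    weights -- optional positive weights, defaulting to 1 for every item

    Returns a list of fitted values of the same length. For 0/1 labels the
    fitted values can be read as monotone estimates of P(class = 1).
    """
    if weights is None:
        weights = [1.0] * len(values)
    blocks = []  # each block is [mean, total weight, number of items]
    for v, w in zip(values, weights):
        blocks.append([float(v), float(w), 1])
        # Merge backwards while the last two blocks violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, c1 + c2])
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted
```

For example, labels `[0, 1, 0, 1, 1]` contain one violation (a 1 followed by a 0), which the algorithm resolves by pooling the two items to 0.5 each.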

13.
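Data classification is an important application of data mining in medical data analysis. After analyzing the characteristics of medical data, and taking diagnostic data for early colorectal cancer as an example, the paper proposes classifying such data with a counting nearest-neighbor algorithm. Based on an analysis of that algorithm's performance, it then proposes a new counting nearest-neighbor algorithm based on a retrieval tree and sample density to analyze the data: constructing a retrieval tree improves the computational efficiency of the original algorithm, and improvements based on global density and K-density raise its accuracy. Experiments show that, for early colorectal cancer data, the new algorithm substantially improves computational complexity, storage requirements, and classification accuracy, and that it is suitable for classifying numerical, text, and mixed data.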

14.
The current paper presents a novel approach to bitmap-indexing for data mining purposes. Currently, bitmap-indexing enables efficient data storage and retrieval, but is limited in terms of similarity measurement, and hence as regards classification, clustering and data mining. Bitmap-indexes mainly fit nominal discrete attributes and are thus unattractive for widespread use, which requires the ability to handle continuous data in a raw format. The current research describes a scheme for representing ordinal and continuous data by applying the concept of "padding", where each discrete nominal data value is transformed into a range of nominal-discrete values. This "padding" is done by adding adjacent bits "around" the original value (bin). The padding factor, i.e., the number of adjacent bits added, is calculated from the first and second derivative degrees of each attribute's domain-distribution. The padded representation better supports similarity measures, and therefore improves the accuracy of clustering and mining. The advantages of padding bitmaps are demonstrated on Fisher's Iris dataset.
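
The padding idea can be illustrated with a small sketch. Two simplifications are assumed here and are not taken from the paper: the padding factor `pad` is passed in as a fixed number rather than derived from the attribute's distribution, and Jaccard similarity stands in for whatever similarity measure the authors use. All names are hypothetical.

```python
def padded_bitmap(bin_index, n_bins, pad):
    """Bitmap of length n_bins with the value's bin and `pad` neighbours set.

    With pad = 0 this is the ordinary one-hot bitmap index entry; a larger
    pad sets the adjacent bits on both sides, clipped at the domain edges.
    """
    if not 0 <= bin_index < n_bins:
        raise ValueError("bin_index out of range")
    lo = max(0, bin_index - pad)
    hi = min(n_bins - 1, bin_index + pad)
    return [1 if lo <= i <= hi else 0 for i in range(n_bins)]


def jaccard(a, b):
    """Jaccard similarity between two equally long bit lists."""
    intersection = sum(x & y for x, y in zip(a, b))
    union = sum(x | y for x, y in zip(a, b))
    return intersection / union if union else 0.0
```

Without padding, two values in neighbouring bins share no bits and look as dissimilar as values at opposite ends of the domain. With `pad = 1`, neighbouring bins overlap in two of four set bits, so the similarity reflects their closeness on the ordinal or continuous scale.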

15.
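Sentence sentiment analysis based on CRFs with multiple redundant labels (total citations: 2; self-citations: 0; citations by others: 2)
This paper proposes CRFs with multiple redundant labels and applies them to sentiment analysis. The method not only handles classification with ordered labels effectively, but also unifies subjective/objective classification, polarity (positive/negative) classification, and polarity-strength classification in a single model, while still allowing each subtask to use different features. It seeks a joint optimum across the subtasks and limits the error propagation that occurs when they are performed step by step. Experiments show that the method effectively improves the accuracy of sentence sentiment analysis. In theory, the method also offers a way for algorithms trained by maximum likelihood to solve ordinal regression problems.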

16.
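周庆, 牟超, 杨丹. 《软件学报》 (Journal of Software), 2015, 26(11): 3026-3042
Educational data mining (EDM) applies theories and techniques from multiple disciplines, including education, computer science, psychology, and statistics, to solve problems in educational research and teaching practice. In the era of big data, EDM research will reach a new turning point. To help readers follow progress in EDM or engage in related research and practice, this paper first introduces the overview, characteristics, and development history of EDM research, and then presents and analyzes recent EDM research results. In the presentation of results, most of the selected work was published after 2013 and includes several new educational technologies that were rarely covered before. In the analysis, typical recent cases are classified, tallied, and compared, and the characteristics, shortcomings, and development trends of EDM research are summarized and forecast. Finally, the opportunities and challenges facing EDM in the big data era are discussed.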

17.
In many applications of information systems, learning algorithms have to act in dynamic environments where data are collected in the form of transient data streams. Compared to static data mining, processing streams imposes new computational requirements for algorithms to incrementally process incoming examples while using limited memory and time. Furthermore, due to the non-stationary characteristics of streaming data, prediction models are often also required to adapt to concept drifts. Among the newly proposed stream algorithms, ensembles play an important role, in particular for non-stationary environments. This paper surveys research on ensembles for data stream classification as well as regression tasks. Besides presenting a comprehensive spectrum of ensemble approaches for data streams, we also discuss advanced learning concepts such as imbalanced data streams, novelty detection, active and semi-supervised learning, complex data representations and structured outputs. The paper concludes with a discussion of open research problems and lines of future research.

18.
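To improve the performance of software defect severity prediction, ORESP, a prediction method based on ordinal regression, is proposed that takes full account of the ordering among defect severity labels. The method first uses Spearman-based feature selection to identify and remove redundant features in the dataset, and then builds the prediction model with a neural network based on the proportional odds model. Compared with five classical classification methods, ORESP achieves higher prediction performance under four different types of metrics: in terms of mean zero-one error (MZE), model performance improves by up to 10.3%; in terms of mean absolute error (MAE), by up to 12.3%. In addition, the Spearman-based feature selection method is found to effectively improve the prediction performance of ORESP.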
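
One plausible reading of the Spearman-based feature selection step is sketched below. The abstract does not give the selection rule, so the greedy procedure, the 0.9 threshold, and all names are assumptions for illustration only: a feature is treated as redundant when its absolute Spearman rank correlation with an already kept feature reaches the threshold.

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no rank information
    return cov / math.sqrt(vx * vy)


def drop_redundant(features, threshold=0.9):
    """Greedily keep features not strongly rank-correlated with kept ones.

    features  -- dict mapping feature name to its column of values
    threshold -- hypothetical cut-off on |rho|, not taken from the paper
    """
    kept = []
    for name, column in features.items():
        if all(abs(spearman(column, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept
```

Because Spearman correlation depends only on ranks, a feature that is a monotone transformation of another (for example its square on positive data) is detected as fully redundant even though the relationship is not linear.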

19.
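Ordinal regression is a special machine learning paradigm whose goal is to classify patterns using the intrinsic ordinal labels among classes. Although many ordinal learning methods have been proposed, their performance is often limited by scarce training samples. Drawing on the recently proposed idea of marginalized feature corruption, the inputs of the training samples are perturbed by random noise from a known distribution and the outputs by controllable perturbations with a fixed bias, to compensate for the limited samples. On this basis, least squares ordinal regression using doubly corrupted features (LSOR-DCF) is developed from least squares ordinal regression. Experimental results show that LSOR-DCF outperforms both no corruption and corruption of the input or output alone, especially on small datasets.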

20.
Predicting corporate credit ratings using statistical and artificial intelligence (AI) techniques has received considerable research attention in the literature. In recent years, multi-class support vector machines (MSVMs) have become a very appealing machine-learning approach due to their good performance. Because SVMs were originally devised for binary classification, researchers have proposed a variety of techniques for adapting them to multi-class classification. However, most of these techniques focus only on classifying samples into nominal categories; thus, the unique characteristic of credit rating, ordinality, has seldom been considered in the proposed approaches. This study proposes a new type of MSVM classifier (named OMSVM) that is designed to extend binary SVMs by applying an ordinal pairwise partitioning (OPP) strategy. Our model can efficiently and effectively handle multiple ordinal classes. To validate OMSVM, we applied it to a real-world case of bond rating. We compared the results of our model with those of conventional MSVM approaches and other AI techniques including MDA, MLOGIT, CBR, and ANNs. The results showed that our proposed model improves classification performance in comparison with other typical multi-class classification techniques and uses fewer computational resources.
