Similar literature
 20 similar documents found.
1.
In multi-label classification, examples can be associated with multiple labels simultaneously. The task of learning from multi-label data can be addressed by methods that transform the multi-label classification problem into several single-label classification problems. The binary relevance approach is one of these methods: the multi-label learning task is decomposed into several independent binary classification problems, one for each label in the label set, and the final labels for each example are determined by aggregating the predictions of all binary classifiers. However, this approach does not consider any dependency among the labels. Aiming to accurately predict label combinations, in this paper we propose a simple approach that enables the binary classifiers to discover existing label dependencies by themselves. An experimental study using decision trees, a kernel method and Naïve Bayes as base-learning techniques shows the potential of the proposed approach to improve multi-label classification performance.
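The binary relevance decomposition described above can be sketched as follows. This is a minimal illustration, not the method proposed in the abstract: the function names and the 1-nearest-neighbour base learner are assumptions chosen only to keep the example self-contained.

```python
def fit_binary_relevance(X, Y, fit_base):
    """Train one independent binary model per label column of Y."""
    n_labels = len(Y[0])
    return [fit_base(X, [row[j] for row in Y]) for j in range(n_labels)]


def predict_binary_relevance(models, x):
    """Aggregate the per-label binary predictions into one label vector."""
    return [model(x) for model in models]


def fit_1nn(X, y):
    """Hypothetical base learner: 1-nearest neighbour, squared Euclidean distance."""
    def predict(x):
        dists = [sum((a - b) ** 2 for a, b in zip(xi, x)) for xi in X]
        return y[dists.index(min(dists))]
    return predict


# Toy data: two features, two labels.
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
Y = [[0, 0], [0, 1], [1, 0], [1, 1]]

models = fit_binary_relevance(X, Y, fit_1nn)
prediction = predict_binary_relevance(models, [0.9, 0.1])
```

Each model sees only its own label column, which is exactly why plain binary relevance cannot use dependencies between labels.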

2.
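Traditional multi-label classification algorithms are built on binary label prediction. Because a binary label only indicates whether an instance belongs to a category, it carries little semantic information and cannot fully represent label semantics. To fully exploit the semantic information of the label space, a multi-label classification algorithm based on non-negative matrix factorization and sparse representation (MLNS) is proposed. The algorithm combines non-negative matrix factorization with sparse representation to convert binary labels into real-valued labels, enriching label semantics and improving classification performance. First, non-negative matrix factorization is applied to the label space to obtain a latent label semantic space, which is combined with the original feature space to form a new feature space. Then, sparse coding is performed on this feature space to obtain the global similarity relations among samples. Finally, these similarity relations are used to reconstruct the binary label vectors, thereby converting binary labels into real-valued labels. The proposed algorithm is compared with MLBGM, ML2, LIFT, MLRWKNN and other algorithms on 5 standard multi-label datasets and 5 evaluation metrics. The experimental results show that MLNS outperforms the compared multi-label classification algorithms, ranking first in 50% of cases, in the top two in 76% of cases, and in the top three in all cases.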

3.
Liu Haiyang, Wang Zhihai, Sun Yange. Neural Computing & Applications, 2020, 32(22): 16763-16774

Exploiting dependencies between labels is key to improving the performance of multi-label classification. In this paper, we divide the methods for exploiting label dependence into two groups according to how they transform the problem: label grouping methods and feature space extending methods. For feature space extending methods, we find that the common problem is how to measure the dependencies between labels and how to select suitable labels to add to the original feature space. Therefore, we propose a ReliefF-based pruning model for multi-label classification (ReliefF-based stacking, RFS). RFS measures the dependencies between labels from a feature selection perspective and then adds the more relevant labels to the original feature space. Experimental results on 9 multi-label benchmark datasets show that RFS is more effective than other advanced multi-label classification algorithms.


4.
Statistical topic models for multi-label document classification   Cited by: 2 (self-citations: 0, citations by others: 2)
Machine learning approaches to multi-label document classification have to date largely relied on discriminative modeling techniques such as support vector machines. A drawback of these approaches is that performance rapidly drops off as the total number of labels and the number of labels per document increase. This problem is amplified when the label frequencies exhibit the type of highly skewed distributions that are often observed in real-world datasets. In this paper we investigate a class of generative statistical topic models for multi-label documents that associate individual word tokens with different labels. We investigate the advantages of this approach relative to discriminative models, particularly with respect to classification problems involving large numbers of relatively rare labels. We compare the performance of generative and discriminative approaches on document labeling tasks ranging from datasets with several thousand labels to datasets with tens of labels. The experimental results indicate that probabilistic generative models can achieve competitive multi-label classification performance compared to discriminative methods, and have advantages for datasets with many labels and skewed label frequencies.

5.
Multi-label classification exhibits several challenges not present in the binary case. The labels may be interdependent, so that the presence of a certain label affects the probability of other labels’ presence. Thus, exploiting dependencies among the labels could be beneficial for the classifier’s predictive performance. Surprisingly, only a few of the existing algorithms address this issue directly by identifying dependent labels explicitly from the dataset. In this paper we propose new approaches for identifying and modeling existing dependencies between labels. One principal contribution of this work is a theoretical confirmation of the reduction in sample complexity that is gained from unconditional dependence. Additionally, we develop methods for identifying conditionally and unconditionally dependent label pairs; clustering them into several mutually exclusive subsets; and finally, performing multi-label classification incorporating the discovered dependencies. We compare these two notions of label dependence (conditional and unconditional) and evaluate their performance on various benchmark and artificial datasets. We also compare and analyze labels identified as dependent by each of the methods. Moreover, we define an ensemble framework for the new methods and compare it to existing ensemble methods. An empirical comparison of the new approaches to existing baseline and state-of-the-art methods on 12 benchmark datasets demonstrates that in many cases the proposed single-classifier and ensemble methods outperform many multi-label classification algorithms. Perhaps surprisingly, we discover that the weaker notion of unconditional dependence plays the decisive role.
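One common way to flag unconditionally dependent label pairs is a chi-square test of independence on each pair's 2x2 contingency table. The abstract does not state which test the authors use, so the sketch below is an assumption for illustration; the function names are hypothetical, and labels are assumed to be encoded as the integers 0 and 1.

```python
def chi_square_2x2(Y, a, b):
    """Pearson chi-square statistic for the 2x2 contingency table of labels a and b."""
    n = len(Y)
    observed = [[0, 0], [0, 0]]
    for row in Y:
        observed[row[a]][row[b]] += 1  # labels must be 0/1 integers
    row_totals = [sum(observed[i]) for i in range(2)]
    col_totals = [observed[0][j] + observed[1][j] for j in range(2)]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                stat += (observed[i][j] - expected) ** 2 / expected
    return stat


# Chi-square critical value for 1 degree of freedom at significance level 0.05.
CRITICAL_VALUE_05 = 3.841


def dependent_pairs(Y, threshold=CRITICAL_VALUE_05):
    """Return label pairs whose statistic exceeds the threshold."""
    n_labels = len(Y[0])
    return [(a, b)
            for a in range(n_labels)
            for b in range(a + 1, n_labels)
            if chi_square_2x2(Y, a, b) > threshold]


# Toy label matrix: labels 0 and 1 always agree, label 2 is unrelated.
Y = [[1, 1, 0], [1, 1, 1], [0, 0, 0], [0, 0, 1]]
pairs = dependent_pairs(Y)
```

Four rows are far too few for the chi-square approximation to be reliable; the toy data only shows the mechanics. The resulting pairs could then be clustered into disjoint label subsets, as the abstract describes.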

6.
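Researchers currently study label correlations mostly through the correlation information among annotated labels, without considering how the relationship between unannotated and annotated labels affects the quality of the label set. Inspired by K-nearest neighbors, this paper proposes a non-equilibrium label completion algorithm in the neighborhood label space (NeLC-NLS), which aims to make full use of the correlations among elements in the neighborhood space and improve the quality of the neighborhood label space, thereby improving multi-label classification performance. First, the information entropy between labels is used to measure the strength of the relationship between labels, yielding a basic label confidence matrix. Then the proposed non-equilibrium label confidence matrix computation is applied to obtain a non-equilibrium label confidence matrix containing more information. Next, the similarity of samples in feature space is measured to obtain the k nearest samples in the neighborhood label space, and the non-equilibrium label confidence matrix is used to compute the label completion matrix of the neighborhood label space. Finally, an extreme learning machine is used as a linear classifier for classification. Experiments on 8 public benchmark multi-label datasets show that NeLC-NLS has certain advantages, and hypothesis testing and stability analysis further demonstrate the effectiveness of the algorithm.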

7.
The hybrid strategy, which generalizes a specific single-label algorithm while applying one or two data decomposition tricks implicitly or explicitly, has become an effective and efficient tool for designing and implementing multi-label classification algorithms. In this paper, we extend the traditional binary support vector machine by introducing an approximate ranking loss as its empirical loss term to build a novel support vector machine for multi-label classification, resulting in a quadratic programming problem with different upper bounds on the variables to characterize the label correlation of individual instances. Further, our optimization problem can be solved by combining the one-versus-rest data decomposition trick with a modified binary support vector machine, which dramatically reduces computational cost. An experimental study on ten multi-label data sets illustrates that our method is a powerful candidate for multi-label classification, compared with four state-of-the-art multi-label classification approaches.
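For reference, the exact ranking loss of a single instance, as it is commonly defined in multi-label evaluation, can be computed as below. This is not the approximate loss used inside the proposed support vector machine, whose form the abstract does not give; the function name is hypothetical, and ties are counted as errors here by assumption.

```python
def ranking_loss(scores, relevant):
    """Fraction of (relevant, irrelevant) label pairs that are ordered wrongly.

    A pair counts as wrong when the irrelevant label scores at least as high
    as the relevant one (ties are treated as errors in this sketch).
    """
    pos = [s for s, r in zip(scores, relevant) if r]
    neg = [s for s, r in zip(scores, relevant) if not r]
    if not pos or not neg:
        return 0.0  # no pairs to compare
    wrong = sum(1 for p in pos for q in neg if p <= q)
    return wrong / (len(pos) * len(neg))


# Four labels; labels 0 and 3 are relevant.
loss = ranking_loss([0.9, 0.2, 0.6, 0.4], [1, 0, 0, 1])
```

Of the four relevant/irrelevant pairs, only the pair (label 3, label 2) is misordered, so the loss is 1/4.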

8.
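In multi-label classification, the goal of a multi-label classifier is to predict the set of labels associated with an instance. A typical approach transforms the multi-label classification problem into multiple binary classification problems, and these binary classifiers may be related to one another. Simply taking label dependencies into account can improve classification performance to some extent, but computational complexity must also be considered. This paper proposes an ordered classifier ensemble algorithm that exploits dependencies among labels. The algorithm uses a heuristic search strategy to find an ordering of the classifiers that better reflects the label dependencies. The experiments use datasets from different domains and multiple evaluation metrics, and the results show that the proposed algorithm achieves better classification performance than general multi-label classification algorithms.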

9.
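With the rapid development of big data technology, multi-label text classification has given rise to many applications in the judicial domain. Legal texts usually contain multiple element labels that are often interdependent or correlated, and identifying them accurately requires multi-label classification methods. This paper therefore proposes a multi-label classification method for legal texts that incorporates label relations. The method builds a label co-occurrence matrix, uses a graph convolutional network to capture the dependencies between labels, and combines a label attention mechanism to compute the degree of relevance between each word in the legal text and the labels, yielding label-specific semantic representations of the legal text. Finally, the dependencies built from the label graph are fused with the label-specific semantic representations to form a comprehensive representation of the text, which is used for multi-label classification. Experiments on a legal dataset show that the method achieves good classification accuracy and stability.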
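The label co-occurrence matrix that such a method starts from can be built as in the sketch below. The row normalisation into conditional probabilities is a common choice for label graphs and is an assumption here, as are the function names; the abstract does not say how the matrix is normalised before it is passed to the graph convolutional network.

```python
def cooccurrence_matrix(Y):
    """C[a][b] = number of examples in which labels a and b are both present."""
    n_labels = len(Y[0])
    C = [[0] * n_labels for _ in range(n_labels)]
    for row in Y:
        active = [j for j, v in enumerate(row) if v]
        for a in active:
            for b in active:
                if a != b:
                    C[a][b] += 1
    return C


def conditional_probabilities(C, Y):
    """P[a][b] = estimated probability of label b given label a (assumed normalisation)."""
    n_labels = len(C)
    counts = [sum(row[j] for row in Y) for j in range(n_labels)]
    return [[C[a][b] / counts[a] if counts[a] else 0.0 for b in range(n_labels)]
            for a in range(n_labels)]


# Toy label matrix with three labels.
Y = [[1, 1, 0], [1, 0, 1], [1, 1, 0], [0, 0, 1]]
C = cooccurrence_matrix(Y)
P = conditional_probabilities(C, Y)
```

Note that P is not symmetric: label 1 never occurs without label 0, but label 0 occurs without label 1, which gives a directed label graph.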

10.
The current consensus in multi-label classification is to exploit label correlations for performance improvement. Many approaches build one classifier for each label based on the one-versus-all strategy, and integrate the classifiers by enforcing a regularization term on the global weights to exploit label correlations. However, this strategy might be suboptimal, since only part of the global weights may support the assumption. This paper proposes clustered intrinsic label correlations for multi-label classification (CILC), which extends the traditional support vector machine to the multi-label setting. The predictive function of each classifier consists of two components: one captures the information common to all labels, and the other is a label-specific component that depends strongly on the corresponding label. The label-specific component, which represents the intrinsic label correlations, is regularized by a clustered structure assumption. The appealing features of the proposed method are that it separates the common information from the label-specific information and utilizes the clustered structures among labels represented by the label-specific parts. Practical multi-label classification problems such as text categorization, image annotation and sentiment analysis can be solved directly by the proposed CILC method. Experiments across five data sets validate the effectiveness of CILC, compared with six well-established multi-label classification algorithms.

11.
Multi-label learning deals with problems where each example is represented by a single instance while being associated with multiple class labels simultaneously. Binary relevance is arguably the most intuitive solution for learning from multi-label examples. It works by decomposing the multi-label learning task into a number of independent binary learning tasks (one per class label). In view of its potential weakness in ignoring correlations between labels, many correlation-enabling extensions to binary relevance have been proposed in the past decade. In this paper, we aim to review the state of the art of binary relevance from three perspectives. First, basic settings for multi-label learning and binary relevance solutions are briefly summarized. Second, representative strategies to provide binary relevance with label correlation exploitation abilities are discussed. Third, some of our recent studies on binary relevance aimed at issues other than label correlation exploitation are introduced. In conclusion, we provide suggestions on future research directions.

12.
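In recent years, multi-label classification (MLC) has received wide attention. Traditional sentiment prediction is treated as single-label supervised learning, which ignores the fact that multiple emotions may coexist in the same instance. Previous multi-label sentiment prediction methods either fail to extract both local features and global semantic information from the text, or do not consider correlations between labels. To address this, this paper proposes a neural-network-based multi-label sentiment prediction model that incorporates label correlations (Label-CNNLSTMAttention, L-CLA). Word vectors are trained with Word2Vec, and CNN and LSTM are combined: the CNN layer mines deeper word-level features of the text, the LSTM layer learns long-term dependencies between words, and an attention mechanism assigns higher weights to sentiment word features. In addition, a label correlation matrix is used to complete the label feature vector, which is fed to the classifier together with the text features so that correlations between labels are taken into account. Experimental results show that the L-CLA model achieves good classification performance on the re-annotated NLP&CC2013 dataset.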

13.
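In multi-label classification, the correlation between labels is an important factor affecting classification performance. Classic multi-label classification methods such as BR and ML-KNN ignore the influence of label correlations on the actual classification, and their results have remained unsatisfactory. For multi-label classification of objectionable content, whose categories are highly correlated, performance degrades even further. To address these problems, this paper improves a classic multi-label classification algorithm, RAkEL. First, similarity coefficients between labels are computed from the training texts. Then a combined label similarity coefficient matrix is computed according to a user-defined hierarchy of objectionable content. Finally, during the RAkEL voting process, the final set of result labels is re-determined according to the combined label similarity and the center label. In a comparison with traditional classification methods on a real-world corpus, experiments show that the method performs well in classifying objectionable content.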

14.
Cheng Yusheng, Song Fan, Qian Kun. Applied Intelligence, 2021, 51(10): 6997-7015

In a multi-label learning framework, each instance may belong to multiple labels simultaneously. Classification accuracy can be improved significantly by exploiting various correlations, such as label correlations, feature correlations, or the correlations between features and labels. There are few studies on how to combine feature and label correlations, and they mostly deal with complete data sets. However, missing labels and similar phenomena often occur because of cost or technical limitations in the data acquisition process. The few label completion algorithms currently suitable for missing multi-label learning ignore the noise interference of the feature space. At the same time, the threshold of the discriminant function often affects the classification results, especially those of the labels near the threshold. All these factors pose considerable difficulties in dealing with missing labels using label correlations. Therefore, we propose a non-equilibrium missing multi-label learning algorithm based on a two-level autoencoder. First, label density is introduced to enlarge the classification margin of the label space. Then, a new supplementary label matrix is augmented from the missing label matrix with the non-equilibrium label completion method. Finally, considering feature space noise, a two-level kernel extreme learning machine autoencoder is constructed to exploit feature information and label correlations. The effectiveness of the proposed algorithm is verified by many experiments on both missing and complete label data sets. Statistical hypothesis testing validates our approach.


15.
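The Rakel (Random k-labelsets) algorithm randomly selects subsets of labels from the original label set and uses the LP (Label Powerset) algorithm to train the corresponding multi-label sub-classifiers. Because the labels are selected at random, the LP sub-classifiers predict poorly. This paper selects label pairs based on label co-occurrence to train the LP classifiers, and proposes the PwRakel (Pairwise Random k-labelsets) algorithm. The algorithm extends the training set by mining label correlations and effectively improves classification performance. Experimental results show that the proposed algorithm achieves higher classification accuracy than Rakel and other algorithms.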
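The two ingredients named above, choosing label pairs by co-occurrence and applying the Label Powerset transformation to a chosen labelset, can be sketched as follows. The abstract does not give PwRakel's actual selection rule, so ranking pairs by raw co-occurrence count is an assumption, and the function names are hypothetical.

```python
from itertools import combinations


def top_cooccurring_pairs(Y, n_pairs):
    """Rank label pairs by how often both labels appear together (assumed criterion)."""
    n_labels = len(Y[0])
    counts = {pair: sum(1 for row in Y if row[pair[0]] and row[pair[1]])
              for pair in combinations(range(n_labels), 2)}
    # Sort by descending count, breaking ties by label index for determinism.
    return sorted(counts, key=lambda p: (-counts[p], p))[:n_pairs]


def label_powerset_targets(Y, labelset):
    """Map each example's label combination on `labelset` to a single class id."""
    class_ids = {}
    targets = []
    for row in Y:
        key = tuple(row[j] for j in labelset)
        targets.append(class_ids.setdefault(key, len(class_ids)))
    return targets, class_ids


# Toy label matrix with three labels.
Y = [[1, 1, 0], [1, 1, 0], [0, 1, 1], [1, 0, 0]]
pairs = top_cooccurring_pairs(Y, n_pairs=1)
targets, class_ids = label_powerset_targets(Y, pairs[0])
```

Each labelset yields an ordinary multi-class problem over `targets`; in a RAkEL-style ensemble, the predictions of the per-labelset classifiers are then combined by voting per label.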

16.
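In a multi-label learning system, each sample is associated with multiple class labels simultaneously but is described by a single feature vector. Most existing multi-label classification algorithms share the strategy of using the same feature set to predict all class labels, but this is not the best choice, because each label may correlate most strongly with features specific to itself. To address this problem, IML-kNN, a k-nearest-neighbor multi-label classification algorithm that incorporates label-specific features, is proposed. First, the feature vectors of the multi-label data are preprocessed to construct, for each label, the features that are most discriminative for that label. Then an improved ML-kNN algorithm performs classification based on the resulting features. Experimental results show that IML-kNN clearly outperforms ML-kNN and three other commonly used multi-label classification algorithms on the yeast and image datasets.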

17.
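Traditional multi-label text classification algorithms fall short in mining label correlation information and in extracting discriminative information between text and labels. This paper therefore proposes a multi-label text classification algorithm based on a label-combination pre-training model and multi-granularity fused attention. A text encoder that captures label correlations is obtained by training the label-combination pre-training model. A gated fusion strategy fuses the pre-trained language model with word vectors to obtain word embeddings, which are fed into the pre-trained encoder to generate text representations based on label semantics. Self-attention and label attention enhanced by multi-layer dilated convolution yield global information and fine-grained semantic information respectively; these are adaptively fused and fed into a multi-layer perceptron for multi-label prediction. Experimental results on a specific-threat identification dataset and two general multi-label text classification datasets show that the proposed method effectively captures the correlation information between labels and text, and achieves clear improvements in F1 score, Hamming loss and recall.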

18.
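To address dynamic multi-label text classification, in which labels change over time, a dynamic multi-label text classification algorithm based on label semantic similarity is proposed. In the training stage, a multi-label text classifier based on a convolutional neural network is first trained with a fixed label set, and the output of the classifier's penultimate layer is then taken as the feature vector of a text. Because this feature vector is learned under label supervision, it carries label semantic information, unlike representations based on strings, i.e. the raw text content. In the testing stage, the test document is fed into the multi-label text classifier from the training stage to obtain its feature vector; similarity is then computed and multiplied by a time decay factor as a correction, so that more recent texts receive higher similarity. Finally, the nearest neighbor algorithm is used for classification. Experimental results show that the algorithm performs well on dynamic multi-label text classification.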
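The test-stage step, similarity multiplied by a time decay factor followed by a nearest neighbor lookup, might look like the sketch below. The abstract does not specify the similarity measure or the form of the decay, so cosine similarity and exponential decay are assumptions, as are the function names and the `decay_rate` parameter; the feature vectors stand in for the penultimate-layer outputs.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def decayed_similarity(query, doc_vec, age, decay_rate):
    """Similarity corrected by an exponential time decay (assumed form)."""
    return cosine(query, doc_vec) * math.exp(-decay_rate * age)


def nearest_labels(query, corpus, decay_rate=0.1):
    """Return the labels of the stored text with the highest decayed similarity.

    corpus: list of (feature_vector, age, labels) tuples, where age is the
    time elapsed since the text was stored.
    """
    best = max(corpus,
               key=lambda d: decayed_similarity(query, d[0], d[1], decay_rate))
    return best[2]


corpus = [
    ([1.0, 0.0], 30.0, {"old_topic"}),  # identical direction, but old
    ([0.9, 0.1], 1.0, {"new_topic"}),   # slightly different, but recent
]
query = [1.0, 0.0]
with_decay = nearest_labels(query, corpus, decay_rate=0.1)
without_decay = nearest_labels(query, corpus, decay_rate=0.0)
```

With decay switched off the older, exactly matching text wins; with decay the recent text wins, which is the behavior the abstract describes.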

19.
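In multi-label learning, each sample is represented by a single instance and associated with multiple class labels. Most existing multi-label learning algorithms exploit label correlations globally, that is, they assume that all samples share the same positive correlations between class labels. In practice, however, different samples share different label correlations, and labels exhibit not only positive correlation but also mutual exclusion, i.e. negative correlation. To address this problem, PNLC, a k-nearest-neighbor multi-label classification algorithm based on local positive and negative pairwise label correlations, is proposed. First, the feature vectors of the multi-label data are preprocessed to construct, for each label, the features that are most discriminative for that label. Then, in the training stage, PNLC builds positive and negative local pairwise correlation matrices between labels from the true labels of the k nearest neighbors of every training sample. Finally, in the testing stage, the k nearest neighbors of each test instance and their corresponding positive and negative pairwise label relations are obtained, and these relations are used to compute the maximum a posteriori probability to predict the test instance. Experimental results show that PNLC achieves clearly higher classification accuracy than other commonly used multi-label classification algorithms on the yeast and image datasets.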

20.
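Multi-instance multi-label learning is a new machine learning framework in which an object is represented by multiple instances and associated with multiple class labels. The MIMLSVM+ algorithm transforms the multi-instance multi-label problem into a series of independent binary classification problems, but the information about connections between labels is lost in this degeneration process; the E-MIMLSVM+ algorithm improves on MIMLSVM+ by introducing multi-task learning. To make full use of unlabeled samples to improve classification accuracy, the semi-supervised support vector machine TSVM is used to improve E-MIMLSVM+. The algorithm is compared experimentally with other multi-instance multi-label algorithms, and the results show that the improved algorithm achieves good classification performance.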
