首页 | 官方网站   微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 21 毫秒
1.
以R-S不等保护码为研究对象,在分析码空间特性的基础上,着重研究编译码算法。编码时,利用分离最小理想的方法构造出信息元距离的不均匀性。译码时,如果接收码字中错误码元的数目小于或等于码空间的最低保护能力,则利用一般译码算法译码;否则对高保护等级信息元的值进行假设,并利用低保护等级的子空间验证该假设,用试探法找到满足验证条件的高保护等级信息元的值。仿真显示,该编译码算法对R-S不等保护码是有效的,它可以在不改变编码效率的前提下为高保护等级的信息元提供更好的误码性能。  相似文献   

2.
Classifier chains for multi-label classification   总被引:5,自引:0,他引:5  
The widely known binary relevance method for multi-label classification, which considers each label as an independent binary problem, has often been overlooked in the literature due to the perceived inadequacy of not directly modelling label correlations. Most current methods invest considerable complexity to model interdependencies between labels. This paper shows that binary relevance-based methods have much to offer, and that high predictive performance can be obtained without impeding scalability to large datasets. We exemplify this with a novel classifier chains method that can model label correlations while maintaining acceptable computational complexity. We extend this approach further in an ensemble framework. An extensive empirical evaluation covers a broad range of multi-label datasets with a variety of evaluation metrics. The results illustrate the competitiveness of the chaining method against related and state-of-the-art methods, both in terms of predictive performance and time complexity.  相似文献   

3.
Ensemble methods have been shown to be an effective tool for solving multi-label classification tasks. In the RAndom k-labELsets (RAKEL) algorithm, each member of the ensemble is associated with a small randomly-selected subset of k labels. Then, a single label classifier is trained according to each combination of elements in the subset. In this paper we adopt a similar approach, however, instead of randomly choosing subsets, we select the minimum required subsets of k labels that cover all labels and meet additional constraints such as coverage of inter-label correlations. Construction of the cover is achieved by formulating the subset selection as a minimum set covering problem (SCP) and solving it by using approximation algorithms. Every cover needs only to be prepared once by offline algorithms. Once prepared, a cover may be applied to the classification of any given multi-label dataset whose properties conform with those of the cover. The contribution of this paper is two-fold. First, we introduce SCP as a general framework for constructing label covers while allowing the user to incorporate cover construction constraints. We demonstrate the effectiveness of this framework by proposing two construction constraints whose enforcement produces covers that improve the prediction performance of random selection by achieving better coverage of labels and inter-label correlations. Second, we provide theoretical bounds that quantify the probabilities of random selection to produce covers that meet the proposed construction criteria. The experimental results indicate that the proposed methods improve multi-label classification accuracy and stability compared to the RAKEL algorithm and to other state-of-the-art algorithms.  相似文献   

4.
Decision trees for hierarchical multi-label classification   总被引:3,自引:0,他引:3  
Hierarchical multi-label classification (HMC) is a variant of classification where instances may belong to multiple classes at the same time and these classes are organized in a hierarchy. This article presents several approaches to the induction of decision trees for HMC, as well as an empirical study of their use in functional genomics. We compare learning a single HMC tree (which makes predictions for all classes together) to two approaches that learn a set of regular classification trees (one for each class). The first approach defines an independent single-label classification task for each class (SC). Obviously, the hierarchy introduces dependencies between the classes. While they are ignored by the first approach, they are exploited by the second approach, named hierarchical single-label classification (HSC). Depending on the application at hand, the hierarchy of classes can be such that each class has at most one parent (tree structure) or such that classes may have multiple parents (DAG structure). The latter case has not been considered before and we show how the HMC and HSC approaches can be modified to support this setting. We compare the three approaches on 24 yeast data sets using as classification schemes MIPS’s FunCat (tree structure) and the Gene Ontology (DAG structure). We show that HMC trees outperform HSC and SC trees along three dimensions: predictive accuracy, model size, and induction time. We conclude that HMC trees should definitely be considered in HMC tasks where interpretable models are desired.  相似文献   

5.
Label embedding (LE) is an important family of multi-label classification algorithms that digest the label information jointly for better performance. Different real-world applications evaluate performance by different cost functions of interest. Current LE algorithms often aim to optimize one specific cost function, but they can suffer from bad performance with respect to other cost functions. In this paper, we resolve the performance issue by proposing a novel cost-sensitive LE algorithm that takes the cost function of interest into account. The proposed algorithm, cost-sensitive label embedding with multidimensional scaling (CLEMS), approximates the cost information with the distances of the embedded vectors by using the classic multidimensional scaling approach for manifold learning. CLEMS is able to deal with both symmetric and asymmetric cost functions, and effectively makes cost-sensitive decisions by nearest-neighbor decoding within the embedded vectors. We derive theoretical results that justify how CLEMS achieves the desired cost-sensitivity. Furthermore, extensive experimental results demonstrate that CLEMS is significantly better than a wide spectrum of existing LE algorithms and state-of-the-art cost-sensitive algorithms across different cost functions.  相似文献   

6.
Fan  Xiaodong  Chen  Xiangyue  Wang  Changzhong  Wang  Yang  Zhang  Ying 《Applied Intelligence》2022,52(6):6079-6092

Multi-label classification is a typical supervised machine learning problem and widely applied in text classification and image recognition. When there are redundant attributes in the data, the efficiency of classification will be reduced. However, the existing attribute reduction algorithms have high computational complexity. This paper aims to design an efficient attribute reduction algorithm. The k pairs of boundary samples were selected from the positive and negative classes respectively, and the distance between each pair was calculated as the evaluation of attributes. By maximizing the evaluation function, the definition of reduction and the design of the algorithm were established. The comparison experiment is carried out on eight generic multi-label data. The experimental results show that the attribute importance evaluation defined in this paper can better represent the classification performance of the attribute for multi-label classification. The boundary samples can better reflect the classification effect of attributes. The proposed model avoids the point-by-point statistics of all samples’ information and improves the computational efficiency.

  相似文献   

7.
In classic pattern recognition problems, classes are mutually exclusive by definition. Classification errors occur when the classes overlap in the feature space. We examine a different situation, occurring when the classes are, by definition, not mutually exclusive. Such problems arise in semantic scene and document classification and in medical diagnosis. We present a framework to handle such problems and apply it to the problem of semantic scene classification, where a natural scene may contain multiple objects such that the scene can be described by multiple class labels (e.g., a field scene with a mountain in the background). Such a problem poses challenges to the classic pattern recognition paradigm and demands a different treatment. We discuss approaches for training and testing in this scenario and introduce new metrics for evaluating individual examples, class recall and precision, and overall accuracy. Experiments show that our methods are suitable for scene classification; furthermore, our work appears to generalize to other classification problems of the same nature.  相似文献   

8.
Statistical topic models for multi-label document classification   总被引:2,自引:0,他引:2  
Machine learning approaches to multi-label document classification have to date largely relied on discriminative modeling techniques such as support vector machines. A?drawback of these approaches is that performance rapidly drops off as the total number of labels and the number of labels per document increase. This problem is amplified when the label frequencies exhibit the type of highly skewed distributions that are often observed in real-world datasets. In this paper we investigate a class of generative statistical topic models for multi-label documents that associate individual word tokens with different labels. We investigate the advantages of this approach relative to discriminative models, particularly with respect to classification problems involving large numbers of relatively rare labels. We compare the performance of generative and discriminative approaches on document labeling tasks ranging from datasets with several thousand labels to datasets with tens of labels. The experimental results indicate that probabilistic generative models can achieve competitive multi-label classification performance compared to discriminative methods, and have advantages for datasets with many labels and skewed label frequencies.  相似文献   

9.
It has recently been suggested that assuming independence between labels is not suitable for real-world multi-label classification. To account for label dependencies, this paper proposes a supervised topic modeling algorithm, namely labelset topic model (LsTM). Our algorithm uses two labelset layers to capture label dependencies. LsTM offers two major advantages over existing supervised topic modeling algorithms: it is straightforward to interpret and it allows words to be assigned to combinations of labels, rather than a single label. We have performed extensive experiments on several well-known multi-label datasets. Experimental results indicate that the proposed model achieves performance on par with and often exceeding that of state-of-the-art methods both qualitatively and quantitatively.  相似文献   

10.
Several meta-learning techniques for multi-label classification (MLC), such as chaining and stacking, have already been proposed in the literature, mostly aimed at improving predictive accuracy through the exploitation of label dependencies. In this paper, we propose another technique of that kind, called dependent binary relevance (DBR) learning. DBR combines properties of both, chaining and stacking. We provide a careful analysis of the relationship between these and other techniques, specifically focusing on the underlying dependency structure and the type of training data used for model construction. Moreover, we offer an extensive empirical evaluation, in which we compare different techniques on MLC benchmark data. Our experiments provide evidence for the good performance of DBR in terms of several evaluation measures that are commonly used in MLC.  相似文献   

11.
In many important application domains such as text categorization, biomolecular analysis, scene classification and medical diagnosis, examples are naturally associated with more than one class label, giving rise to multi-label classification problems. This fact has led, in recent years, to a substantial amount of research on feature selection methods that allow the identification of relevant and informative features for multi-label classification. However, the methods proposed for this task are scattered in the literature, with no common framework to describe them and to allow an objective comparison. Here, we revisit a categorization of existing multi-label classification methods and, as our main contribution, we provide a comprehensive survey and novel categorization of the feature selection techniques that have been created for the multi-label classification setting. We conclude this work with concrete suggestions for future research in multi-label feature selection which have been derived from our categorization and analysis.  相似文献   

12.
Feature selection for multi-label naive Bayes classification   总被引:4,自引:0,他引:4  
In multi-label learning, the training set is made up of instances each associated with a set of labels, and the task is to predict the label sets of unseen instances. In this paper, this learning problem is addressed by using a method called Mlnb which adapts the traditional naive Bayes classifiers to deal with multi-label instances. Feature selection mechanisms are incorporated into Mlnb to improve its performance. Firstly, feature extraction techniques based on principal component analysis are applied to remove irrelevant and redundant features. After that, feature subset selection techniques based on genetic algorithms are used to choose the most appropriate subset of features for prediction. Experiments on synthetic and real-world data show that Mlnb achieves comparable performance to other well-established multi-label learning algorithms.  相似文献   

13.
Currently a consensus on multi-label classification is to exploit label correlations for performance improvement. Many approaches build one classifier for each label based on the one-versus-all strategy, and integrate classifiers by enforcing a regularization term on the global weights to exploit label correlations. However, this strategy might be suboptimal since it may be only part of the global weights that support the assumption. This paper proposes clustered intrinsic label correlations for multi-label classification (CILC), which extends traditional support vector machine to the multi-label setting. The predictive function of each classifier consists of two components: one component is the common information among all labels, and the other component is a label-specific one which highly depends on the corresponding label. The label-specific one representing the intrinsic label correlations is regularized by clustered structure assumption. The appealing features of the proposed method are that it separates the common information and the label-specific information of the labels and utilizes clustered structures among labels represented by the label-specific parts. The practical multi-label classification problems can be directly solved by the proposed CILC method, such as text categorization, image annotation and sentiment analysis. Experiments across five data sets validate the effectiveness of CILC, compared with six well-established multi-label classification algorithms.  相似文献   

14.
针对基于概率统计的ML-kNN算法只能对每个独立的标签进行分析,忽略了真实世界中标签间的相关性,提出了一种联系标签相关性的ML-kNN算法(S-ML-kNN).该方法对训练集进行扩展,并按照标签间的二阶组合来构造新的标签,融合了标签之间的相关性.实验结果表明,S-ML-kNN算法优于ML-kNN算法.  相似文献   

15.
Chu  Hong-Min  Huang  Kuan-Hao  Lin  Hsuan-Tien 《Machine Learning》2019,108(8-9):1193-1230

We study multi-label classification (MLC) with three important real-world issues: online updating, label space dimension reduction (LSDR), and cost-sensitivity. Current MLC algorithms have not been designed to address these three issues simultaneously. In this paper, we propose a novel algorithm, cost-sensitive dynamic principal projection (CS-DPP) that resolves all three issues. The foundation of CS-DPP is an online LSDR framework derived from a leading LSDR algorithm. In particular, CS-DPP is equipped with an efficient online dimension reducer motivated by matrix stochastic gradient, and establishes its theoretical backbone when coupled with a carefully-designed online regression learner. In addition, CS-DPP embeds the cost information into label weights to achieve cost-sensitivity along with theoretical guarantees. Experimental results verify that CS-DPP achieves better practical performance than current MLC algorithms across different evaluation criteria, and demonstrate the importance of resolving the three issues simultaneously.

  相似文献   

16.
Multi-label text classification is an increasingly important field as large amounts of text data are available and extracting relevant information is important in many application contexts. Probabilistic generative models are the basis of a number of popular text mining methods such as Naive Bayes or Latent Dirichlet Allocation. However, Bayesian models for multi-label text classification often are overly complicated to account for label dependencies and skewed label frequencies while at the same time preventing overfitting. To solve this problem we employ the same technique that contributed to the success of deep learning in recent years: greedy layer-wise training. Applying this technique in the supervised setting prevents overfitting and leads to better classification accuracy. The intuition behind this approach is to learn the labels first and subsequently add a more abstract layer to represent dependencies among the labels. This allows using a relatively simple hierarchical topic model which can easily be adapted to the online setting. We show that our method successfully models dependencies online for large-scale multi-label datasets with many labels and improves over the baseline method not modeling dependencies. The same strategy, layer-wise greedy training, also makes the batch variant competitive with existing more complex multi-label topic models.  相似文献   

17.
Li  Zhao  Lu  Wei  Sun  Zhanquan  Xing  Weiwei 《Multimedia Tools and Applications》2018,77(5):6079-6094
Multimedia Tools and Applications - Multi-label classification is one of the most challenging tasks in the computer vision community, owing to different composition and interaction (e.g. partial...  相似文献   

18.
The problem of automated video categorization in large datasets is considered in the paper. A new Iterative Multi-label Propagation (IMP) algorithm for relational learning in multi-label data is proposed. Based on the information of the already categorized videos and their relations to other videos, the system assigns suitable categories—multiple labels to the unknown videos. The MapReduce approach to the IMP algorithm described in the paper enables processing of large datasets in parallel computing. The experiments carried out on 5-million videos dataset revealed the good efficiency of the multi-label classification for videos categorization. They have additionally shown that classification of all unknown videos required only several parallel iterations.  相似文献   

19.
Many challenging real world problems involve multi-label data streams. Efficient methods exist for multi-label classification in non-streaming scenarios. However, learning in evolving streaming scenarios is more challenging, as classifiers must be able to deal with huge numbers of examples and to adapt to change using limited time and memory while being ready to predict at any point. This paper proposes a new experimental framework for learning and evaluating on multi-label data streams, and uses it to study the performance of various methods. From this study, we develop a multi-label Hoeffding tree with multi-label classifiers at the leaves. We show empirically that this method is well suited to this challenging task. Using our new framework, which allows us to generate realistic multi-label data streams with concept drift (as well as real data), we compare with a selection of baseline methods, as well as new learning methods from the literature, and show that our Hoeffding tree method achieves fast and more accurate performance.  相似文献   

20.
Hybrid strategy, which generalizes a specific single-label algorithm while one or two data decomposition tricks are applied implicitly or explicitly, has become an effective and efficient tool to design and implement various multi-label classification algorithms. In this paper, we extend traditional binary support vector machine by introducing an approximate ranking loss as its empirical loss term to build a novel support vector machine for multi-label classification, resulting into a quadratic programming problem with different upper bounds of variables to characterize label correlation of individual instance. Further, our optimization problem can be solved via combining one-versus-rest data decomposition trick with modified binary support vector machine, which dramatically reduces computational cost. Experimental study on ten multi-label data sets illustrates that our method is a powerful candidate for multi-label classification, compared with four state-of-the-art multi-label classification approaches.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司    京ICP备09084417号

京公网安备 11010802026262号