Found 19 similar articles (search time: 839 ms)
1.
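Comparison of Hierarchy Construction Methods Based on the Confusion Matrix (total citations: 1; self-citations: 0; citations by others: 1)
Starting from the confusion matrix, document category hierarchies are constructed using two different strategies, hierarchical clustering and confusion classes, and experiments are then run with a hierarchical classification method. The results show that the confusion-class strategy outperforms the hierarchical-clustering strategy and improves both recall and precision relative to flat classification.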
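The hierarchical-clustering strategy mentioned in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the symmetric confusion rate used as a similarity, the average-linkage rule, and the toy 4-class matrix are all hypothetical choices, not details taken from the paper.

```python
from itertools import combinations

def confusion_similarity(cm, i, j):
    """Symmetric confusion rate between classes i and j (an assumed measure)."""
    total = sum(cm[i]) + sum(cm[j])
    return (cm[i][j] + cm[j][i]) / total if total else 0.0

def average_link(cm, members_a, members_b):
    """Mean pairwise confusion similarity between two clusters of classes."""
    sims = [confusion_similarity(cm, i, j) for i in members_a for j in members_b]
    return sum(sims) / len(sims)

def build_hierarchy(cm, labels):
    """Agglomerative clustering: repeatedly merge the two most-confused clusters.

    Returns the hierarchy as a nested tuple of labels.
    """
    # Each cluster is (subtree, list of member class indices).
    clusters = [(labels[i], [i]) for i in range(len(labels))]
    while len(clusters) > 1:
        a, b = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_link(cm, clusters[p[0]][1], clusters[p[1]][1]),
        )
        merged = ((clusters[a][0], clusters[b][0]), clusters[a][1] + clusters[b][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return clusters[0][0]

# Hypothetical confusion matrix (rows: true class, columns: predicted class).
labels = ["sports", "fitness", "finance", "economy"]
cm = [
    [50, 8, 1, 1],
    [9, 48, 2, 1],
    [1, 1, 45, 13],
    [0, 2, 12, 46],
]
tree = build_hierarchy(cm, labels)
print(tree)
```

Classes that the flat classifier confuses most often end up as siblings, so a later hierarchical classifier can devote a dedicated sub-classifier to separating them.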
2.
3.
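Research on a Chinese Text Automatic Classification Model Based on Feature Relevance (total citations: 17; self-citations: 1; citations by others: 17)
This paper proposes an automatic classification algorithm based on the relevance between predefined categories and text features, and describes in detail the design and implementation of a Chinese text classification model. To test the model's performance, a 12-category classification scheme was established and a test set of nearly 500 Chinese news articles was constructed. Experimental results show that the two key metrics for evaluating automatic classification algorithms, recall and precision, are both fairly satisfactory.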
4.
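In Web text classification, hierarchical classification is an effective approach when the number of categories is large or the categories are complex. One of its shortcomings, however, is that even when the top-level category is assigned correctly, accuracy drops because the subcategories share many common features. The nature of a hierarchy means that subcategories under the same parent have overlapping features. To address this limitation, and drawing on the strong performance of KNN, a Web text classification method combining a hierarchical structure with KNN is proposed. The method builds a hierarchical structure model (a tree). At classification time it first obtains the k0 most similar categories from the hierarchical model, then extracts a subset of representative samples from the training documents of those k0 categories and applies the KNN algorithm, and finally uses an improved similarity measure to decide the final category. Experiments show that the combined method achieves good classification results in Web text classification.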
5.
6.
7.
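Traditional category-discriminating-word feature selection algorithms use inter-class dispersion and intra-class importance as metrics, but overlook the fact that the two metrics often contribute to the feature scoring function with different weights, which degrades feature selection to some extent. Building on the category-discriminating-word feature selection algorithm, a balance factor is introduced; adjusting it changes the contribution weights of the two metrics in the feature evaluation function, yielding more efficient feature selection and in turn better text classification. Using naive Bayes for text classification, the improved algorithm achieves considerable gains over mainstream feature selection algorithms in classification accuracy, precision, recall, and F1.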
8.
9.
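This work analyzes the confusion-class phenomenon in text classification and mainly studies techniques for discriminating confusion classes in order to improve classification performance. First, a confusion-class identification technique based on the distribution of classification errors is proposed to identify the set of confusion classes among the predefined categories. To discriminate confusion classes effectively, a feature selection technique based on discriminative power is proposed, which selects features by evaluating how well each feature discriminates between categories. Finally, within a two-stage classifier design framework, the initial classifier and the confusion-class classifier are integrated, and the results of the two stages are combined as the final output. The confusion-class classifier is activated when the initial classifier labels a test text with a confusion-class category, in which case the confusion-class classifier re-classifies it. Comparative experiments used the Newsgroup corpus and the 863 Chinese evaluation corpus with single-label, multi-class classifiers. The results show that the technique effectively improves classification performance.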
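The two-stage framework and its activation condition can be illustrated with a minimal sketch. The error-count threshold, the category names, and the keyword-based toy classifiers below are hypothetical stand-ins; the abstract does not specify how the error distribution is thresholded or which classifiers are used.

```python
def find_confusion_classes(errors, threshold):
    """Collect classes involved in frequent misclassifications.

    errors maps (true_label, predicted_label) to an error count.
    The fixed count threshold is an assumed criterion.
    """
    confused = set()
    for (true_label, predicted_label), count in errors.items():
        if true_label != predicted_label and count >= threshold:
            confused.update((true_label, predicted_label))
    return confused

def two_stage_predict(doc, initial_clf, confusion_clf, confusion_classes):
    """Run the initial classifier; re-classify only if it outputs a confusion class."""
    label = initial_clf(doc)
    if label in confusion_classes:
        return confusion_clf(doc)
    return label

# Hypothetical error distribution of the initial classifier on held-out data.
errors = {
    ("pc_hardware", "mac_hardware"): 30,
    ("mac_hardware", "pc_hardware"): 25,
    ("sports", "politics"): 2,
}
confusion_classes = find_confusion_classes(errors, threshold=10)

# Toy stand-ins for trained classifiers.
def initial_clf(doc):
    return "sports" if "match" in doc else "pc_hardware"

def confusion_clf(doc):
    return "mac_hardware" if "macbook" in doc else "pc_hardware"

first = two_stage_predict("macbook battery issue", initial_clf, confusion_clf, confusion_classes)
second = two_stage_predict("football match tonight", initial_clf, confusion_clf, confusion_classes)
print(first, second)
```

Only documents that land in a confusion class pay the cost of the second classifier; everything else keeps the initial label.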
10.
11.
12.
Ricardo Cerri, Rodrigo C. Barros, André C.P.L.F. de Carvalho. Journal of Computer and System Sciences, 2014
Hierarchical multi-label classification is a complex classification task where the classes involved in the problem are hierarchically structured and each example may simultaneously belong to more than one class in each hierarchical level. In this paper, we extend our previous works, where we investigated a new local-based classification method that incrementally trains a multi-layer perceptron for each level of the classification hierarchy. Predictions made by a neural network in a given level are used as inputs to the neural network responsible for the prediction in the next level. We compare the proposed method with one state-of-the-art decision-tree induction method and two decision-tree induction methods, using several hierarchical multi-label classification datasets. We perform a thorough experimental analysis, showing that our method obtains competitive results to a robust global method regarding both precision and recall evaluation measures.
13.
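Convolutional neural networks (CNNs) are widely used in image classification. Most existing CNN models are trained as N-way classifiers. However, differences that always exist between categories limit the classification ability of N-way classifiers. To address this, the proposed model combines a confusion tree (CT) structure with a CNN, yielding a stronger confusion-tree-based convolutional neural network (CT-CNN). The model first builds a confusion tree to model the confusability between categories; it then embeds the confusion tree's hierarchical structure into the CNN, which guides training to focus more on sets of highly confusable categories. The model was evaluated on public datasets, and the experimental results show that CT-CNN overcomes the limitation of unevenly distributed classification difficulty across categories in large-scale data and achieves consistently strong performance on complex large-scale classification tasks.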
14.
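Acoustic scene classification can infer the recording environment from audio recorded in public areas and plays an important role in daily life. Unlike traditional classification problems, in which classes are unrelated, acoustic scene classes have a hierarchical (parent-child) relationship; for example, the parent class of both airport and shopping mall is indoor. Existing methods were not designed with this property in mind and ignore the dependency between parent and child classes. This paper therefore exploits the hierarchical relationship among acoustic scene classes and proposes an acoustic scene classification method based on hierarchical information fusion. The method designs separate classifiers for parent and child classes, fuses parent-class information into child-class classification, and introduces a hierarchical dependency loss that penalizes mismatches between the predicted parent and child classes. Experiments on the TAU Urban Acoustic Scenes 2020 Mobile development dataset show that the method effectively improves model performance, raising classification accuracy by 1.1%.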
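One way to picture a loss that penalizes parent-child mismatches is the sketch below. It is not the paper's formulation, which the abstract does not give: the additive fixed penalty, the `penalty` parameter, the class map, and the probabilities are all assumptions made for illustration, and a trainable model would need a differentiable variant.

```python
import math

# Hypothetical child-to-parent map; only airport and shopping mall under
# "indoor" come from the abstract's example.
PARENT_OF = {
    "airport": "indoor",
    "shopping_mall": "indoor",
    "park": "outdoor",
    "street": "outdoor",
}

def cross_entropy(probs, target):
    """Negative log-likelihood of the target class."""
    return -math.log(probs[target])

def hierarchical_loss(parent_probs, child_probs, true_child, penalty=1.0):
    """Parent loss + child loss + a fixed penalty when predictions disagree."""
    true_parent = PARENT_OF[true_child]
    loss = cross_entropy(parent_probs, true_parent) + cross_entropy(child_probs, true_child)
    predicted_parent = max(parent_probs, key=parent_probs.get)
    predicted_child = max(child_probs, key=child_probs.get)
    if PARENT_OF[predicted_child] != predicted_parent:
        loss += penalty  # predicted child does not belong to the predicted parent
    return loss

parent_probs = {"indoor": 0.7, "outdoor": 0.3}
consistent = {"airport": 0.6, "shopping_mall": 0.2, "park": 0.1, "street": 0.1}
inconsistent = {"airport": 0.2, "shopping_mall": 0.1, "park": 0.6, "street": 0.1}

loss_consistent = hierarchical_loss(parent_probs, consistent, "airport")
loss_inconsistent = hierarchical_loss(parent_probs, inconsistent, "airport")
print(loss_consistent, loss_inconsistent)
```

In the second call the child classifier favors "park" while the parent classifier favors "indoor", so the extra penalty term is added on top of the two cross-entropy terms.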
15.
Hierarchical classification of protein function with ensembles of rules and particle swarm optimisation (total citations: 1; self-citations: 1; citations by others: 0)
Nicholas Holden, Alex A. Freitas. Soft Computing - A Fusion of Foundations, Methodologies and Applications, 2009, 13(3): 259-272
This paper focuses on hierarchical classification problems where the classes to be predicted are organized in the form of
a tree. The standard top-down divide and conquer approach for hierarchical classification consists of building a hierarchy
of classifiers where a classifier is built for each internal (non-leaf) node in the class tree. Each classifier discriminates
only between its child classes. After the tree of classifiers is built, the system uses them to classify test examples one
class level at a time, so that when the example is assigned a class at a given level, only the child classes need to be considered
at the next level. This approach has the drawback that, if a test example is misclassified at a certain class level, it will
be misclassified at deeper levels too. In this paper we propose hierarchical classification methods to mitigate this drawback.
More precisely, we propose a method called hierarchical ensemble of hierarchical rule sets (HEHRS), where different ensembles
are built at different levels in the class tree and each ensemble consists of different rule sets built from training examples
at different levels of the class tree. We also use a particle swarm optimisation (PSO) algorithm to optimise the rule weights
used by HEHRS to combine the predictions of different rules into a class to be assigned to a given test example. In addition,
we propose a variant of a method to mitigate the aforementioned drawback of top-down classification. These three types of
methods are compared against the standard top-down hierarchical classification method in six challenging bioinformatics datasets,
involving the prediction of protein function. Overall HEHRS with the rule weights optimised by the PSO algorithm obtains the
best predictive accuracy out of the four types of hierarchical classification method.
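The standard top-down approach described in this abstract, and the error-propagation drawback it mentions, can be sketched as follows. The class tree, the feature names, and the rule-like toy classifiers are hypothetical; this shows the baseline only, not the HEHRS method itself.

```python
# Hypothetical class tree: each internal node maps to its child classes.
CLASS_TREE = {
    "root": ["enzyme", "transporter"],
    "enzyme": ["kinase", "protease"],
    "transporter": ["ion_channel", "carrier"],
}

def top_down_classify(example, classifiers, tree=CLASS_TREE, root="root"):
    """Classify one level at a time, descending only into the chosen child."""
    path = []
    node = root
    while node in tree:  # stop when a leaf class is reached
        node = classifiers[node](example)  # local classifier picks one child
        path.append(node)
    return path

# One toy classifier per internal (non-leaf) node.
classifiers = {
    "root": lambda x: "transporter" if x["membrane"] else "enzyme",
    "enzyme": lambda x: "kinase" if x["binds_atp"] else "protease",
    "transporter": lambda x: "ion_channel" if x["gated"] else "carrier",
}

correct = top_down_classify(
    {"membrane": False, "binds_atp": True, "gated": False}, classifiers
)
# Same protein, but with a noisy first-level feature: the error made at the
# top level cannot be undone further down, so "kinase" is unreachable.
propagated = top_down_classify(
    {"membrane": True, "binds_atp": True, "gated": False}, classifiers
)
print(correct, propagated)
```

Because each local classifier only discriminates between the children of its own node, a wrong choice at one level restricts every deeper decision to the wrong subtree.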
16.
Tao Li, Shenghuo Zhu, Mitsunori Ogihara. Journal of Intelligent Information Systems, 2007, 29(2): 211-230
Automated text categorization has witnessed a booming interest with the exponential growth of information and the ever-increasing
needs for organizations. The underlying hierarchical structure identifies the relationships of dependence between different
categories and provides valuable sources of information for categorization. Although considerable research has been conducted
in the field of hierarchical document categorization, little has been done on automatic generation of topic hierarchies. In
this paper, we propose the method of using linear discriminant projection to generate more meaningful intermediate levels
of hierarchies in large flat sets of classes. The linear discriminant projection approach first transforms all documents onto
a low-dimensional space and then clusters the categories into hierarchies accordingly. The paper also investigates the effect
of using generated hierarchical structure for text classification. Our experiments show that generated hierarchies improve
classification performance in most cases.
17.
Fingerprint classification is still a challenging problem due to large intra-class variability, small inter-class variability and the presence of noise. To deal with these difficulties, we propose a regularized orientation diffusion model for fingerprint orientation extraction and a hierarchical classifier for fingerprint classification in this paper. The proposed classification algorithm is composed of five cascading stages. The first stage rapidly distinguishes a majority of Arch by using complex filter responses. The second stage distinguishes a majority of Whorl by using core points and ridge line flow classifier. In the third stage, K-NN classifier finds the top two categories by using orientation field and complex filter responses. In the fourth stage, ridge line flow classifier is used to distinguish Loop from other classes except Whorl. SVM is adopted to make the final classification in the last stage. The regularized orientation diffusion model has been evaluated on a web-based automated evaluation system FVC-onGoing, and a promising result is obtained. The classification method has been evaluated on the NIST SD 4. It achieved a classification accuracy of 95.9% for five-class classification and 97.2% for four-class classification without rejection.
18.
19.
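An audio hierarchical classification algorithm combining rules with hidden Markov models is proposed. Rules are first used to divide the audio in news programs into three classes: silence, speech, and music. Hidden Markov models are then used to further subdivide speech and music into six classes: male anchor speech, female anchor speech, alternating reports, monologue speech, on-site speech, and music. Experimental results show that male anchor speech, female anchor speech, and music are classified best, with precision and recall both above 90%; alternating reports are classified worst, with precision of 57.5% and recall of 79.3%; the remaining classes fall in between, at roughly 70% to 90%. Compared with similar algorithms, the proposed algorithm achieves higher classification performance.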