首页 | 官方网站   微博 | 高级检索  
相似文献
 共查询到10条相似文献,搜索用时 140 毫秒
1.
Text categorization presents unique challenges to traditional classification methods due to the large number of features inherent in the datasets from real-world applications of text categorization, and a great deal of training samples. In high-dimensional document data, the classes are typically categorized only by subsets of features, which are typically different for the classes of different topics. This paper presents a simple but effective classifier for text categorization using class-dependent projection based method. By projecting onto a set of individual subspaces, the samples belonging to different document classes are separated such that they are easily to be classified. This is achieved by developing a new supervised feature weighting algorithm to learn the optimized subspaces for all the document classes. The experiments carried out on common benchmarking corpuses showed that the proposed method achieved both higher classification accuracy and lower computational costs than some distinguishing classifiers in text categorization, especially for datasets including document categories with overlapping topics.  相似文献   

2.
何丽  刘军 《计算机工程》2006,32(20):4-6
提出了一种基于概念特征向量的NB文档分类方法。该方法在未标注文档集上通过SOM(Self-Organizing Maps)聚类产生若干初始文档类,并为每个文档类分配一个类标签,使用最大信息熵的方法建立每个文档类的概念特征向量。在概念特征向量空间上建立最终的文档分类器:CFB-NB。  相似文献   

3.
Grouping images into semantically meaningful categories using low-level visual features is a challenging and important problem in content-based image retrieval. Based on these groupings, effective indices can be built for an image database. In this paper, we show how a specific high-level classification problem (city images vs landscapes) can be solved from relatively simple low-level features geared for the particular classes. We have developed a procedure to qualitatively measure the saliency of a feature towards a classification problem based on the plot of the intra-class and inter-class distance distributions. We use this approach to determine the discriminative power of the following features: color histogram, color coherence vector, DCT coefficient, edge direction histogram, and edge direction coherence vector. We determine that the edge direction-based features have the most discriminative power for the classification problem of interest here. A weighted k-NN classifier is used for the classification which results in an accuracy of 93.9% when evaluated on an image database of 2716 images using the leave-one-out method. This approach has been extended to further classify 528 landscape images into forests, mountains, and sunset/sunrise classes. First, the input images are classified as sunset/sunrise images vs forest & mountain images (94.5% accuracy) and then the forest & mountain images are classified as forest images or mountain images (91.7% accuracy). We are currently identifying further semantic classes to assign to images as well as extracting low level features which are salient for these classes. Our final goal is to combine multiple 2-class classifiers into a single hierarchical classifier.  相似文献   

4.
We demonstrate a text-mining method, called associative Naïve Bayes (ANB) classifier, for automated linking of MEDLINE documents to gene ontology (GO). The approach of this paper is a nontrivial extension of document classification methodology from a fixed set of classes C={c1,c2,…,cn} to a knowledge hierarchy like GO. Due to the complexity of GO, we use a knowledge representation structure. With that structure, we develop the text mining classifier, called ANB classifier, which automatically links Medline documents to GO. To check the performance, we compare our datasets under several well-known classifiers: NB classifier, large Bayes classifier, support vector machine and ANB classifier. Our results, described in the following, indicate its practical usefulness.  相似文献   

5.
快速成型切片数据的优化算法研究   总被引:4,自引:0,他引:4  
为了能够顺利地进行 STL模型切片轮廓数据的进一步处理 ,提出了对切片数据进行优化处理的算法 .对由于STL模型的缺陷造成切片之后的轮廓信息数据有大量的冗余数据 ,提出了一种冗余数据的滤除算法 ;针对切片轮廓的不封闭 ,给出了有效的修正算法 ;同时给出了对切片轮廓的内外边界进行自动识别的算法 .该算法高效简单 ,提高了后续的数据处理的效率和成型件的加工质量 ,改善了零件成型的加工性能  相似文献   

6.
文档图像分割的研究对于打印、传真以及这样的数据处理工作具有十分重要的意义 .提出了一个文档图像分割的新算法 .分割算法的特征是基于根据文档图像中各种图像类型直方图的不同特性 .算法中重要的特征是通过小波图像来加强原始图像的特征 ,从而使得精确度提高  相似文献   

7.
提出一种基于关系权重的文本表示方法.通过优化关系权重,在文本向量中体现了不同特征项在不同类别中重要程度的差异,使得在此权重下不同类别的文本得到更准确的区分.运用SVM分类实验表明,基于关系权重的文本表示方法,较之传统的 TF-IDF 文本表示法,有更高的准确率和召回率.  相似文献   

8.
A new differential LSI space-based probabilistic document classifier   总被引:1,自引:0,他引:1  
We have developed a new effective probabilistic classifier for document classification by introducing the concept of differential document vectors and DLSI (differential latent semantic indexing) spaces. A combined use of the projections on and the distances to the DLSI spaces introduced from the differential document vectors improves the adaptability of the LSI (latent semantic indexing) method by capturing unique characteristics of documents. Using the intra- and extra-document statistics, both a simple posteriori calculation on a small example and an experiment on a large Reuters-21578 database demonstrate the advantage of the DLSI space-based probabilistic classifier over the LSI space-based classifier in classification performance.  相似文献   

9.
基于隐马尔可夫模型的文本分类算法   总被引:2,自引:0,他引:2  
杨健  汪海航 《计算机应用》2010,30(9):2348-2350
自动文本分类领域近年来已经产生了若干成熟的分类算法,但这些算法主要基于概率统计模型,没有与文本自身的语法和语义建立起联系。提出了将隐马尔可夫序列分析模型(HMM)用于自动文本分类的算法,首先构造表示文档类别的特征词集合,并以文档类别的特征词序列作为不同HMM分类器的观察序列,而HMM的状态转换序列则隐含地表示了不同类别文档内容的形成演化过程。分类时,具有最大生成概率的HMM分类器类标即为测试文档的分类结果。该算法构造的分类器模型一定程度上体现了不同类别文档的语法和语义特征,并可以实现多类别的自动文本分类,分类效率较高。  相似文献   

10.
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司    京ICP备09084417号-23

京公网安备 11010802026262号