Similar literature
 20 similar documents found (search time: 140 ms)
1.
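Proposes a new Web text clustering algorithm, WTCA, based on the self-organizing feature map (SOM) neural network. The algorithm has two stages: training the SOM network and cluster analysis. It is self-stabilizing and needs no externally supplied evaluation function; it can identify the most meaningful features in the concept space and is robust to noise. Applied to a modern distance-education network, the algorithm automatically clusters text materials collected from various distance-education sites, so that important knowledge can be extracted quickly and effectively from massive Web text sources.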

2.
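Based on latent semantic analysis and the self-organizing feature map neural network (LSA-SOM), this paper proposes a text clustering method. Text feature vectors are represented using latent semantic analysis, which captures the semantic relations among feature terms and reduces the dimensionality of the feature vectors. The SOM network algorithm performs unsupervised self-organizing learning, and text clustering is achieved by repeatedly adjusting the weight vectors between network nodes. The method does not require the number of clusters to be given in advance and can generate a new class at any suitable position, overcoming the drawback of traditional methods that the number of text categories must be specified beforehand.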

3.
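The core problem of text clustering is finding an optimized clustering algorithm for text vectors, which is a typical high-dimensional clustering task. A two-stage text clustering algorithm, TCBSA, is proposed, based on the self-organizing map (SOM) neural network and the artificial immune network aiNet. The new algorithm first clusters with the SOM neural network, mapping the high-dimensional text data onto a two-dimensional plane, and then clusters the text with aiNet. The method exploits the SOM's strength at reducing high-dimensional data and overcomes the artificial immune network's weak clustering ability on high-dimensional data. Simulation results show that the algorithm is feasible, has some adaptive capability, and clusters well.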

4.
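Implementation of Text Clustering Based on the SOM Algorithm   Total citations: 2, self-citations: 0, citations by others: 2
Using the self-organizing map (Self-organizing Map, SOM) algorithm as the theoretical basis, text clustering is implemented and the result is visualized with a U-matrix. Analysis of the clustering results shows that the SOM algorithm clusters well.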
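As a rough illustration of the two ingredients this abstract names (SOM training and the U-matrix), the following is a minimal sketch and not the paper's implementation. It assumes the documents have already been encoded as numeric vectors (the encoding is not shown), and the function names, the linear decay schedules, and the 4-neighbour U-matrix are all assumptions made for the example.

```python
import math
import random

def best_matching_unit(weights, x):
    """Return the grid position whose weight vector is closest to x."""
    best, best_dist = None, float("inf")
    for pos, w in weights.items():
        d = sum((wi - xi) ** 2 for wi, xi in zip(w, x))
        if d < best_dist:
            best, best_dist = pos, d
    return best

def train_som(data, rows, cols, epochs=50, lr0=0.5, sigma0=None, seed=0):
    """Train a rows x cols SOM; returns {(row, col): weight_vector}."""
    rng = random.Random(seed)
    sigma0 = sigma0 or max(rows, cols) / 2.0
    # Assumed initialisation: each unit starts as a copy of a random sample.
    weights = {(r, c): list(rng.choice(data))
               for r in range(rows) for c in range(cols)}
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = data[:]
        rng.shuffle(order)
        for x in order:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                  # learning rate decays to 0
            sigma = max(sigma0 * (1.0 - frac), 0.3)  # neighbourhood shrinks
            br, bc = best_matching_unit(weights, x)
            for (r, c), w in weights.items():
                grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma ** 2))
                for i in range(len(w)):
                    w[i] += lr * h * (x[i] - w[i])
            step += 1
    return weights

def u_matrix(weights, rows, cols):
    """Mean distance from each unit to its 4-connected grid neighbours."""
    u = {}
    for r in range(rows):
        for c in range(cols):
            dists = [math.dist(weights[(r, c)], weights[nb])
                     for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1))
                     if nb in weights]
            u[(r, c)] = sum(dists) / len(dists) if dists else 0.0
    return u
```

High U-matrix values mark units that lie far from their neighbours in the input space, which is how cluster boundaries show up in the visualization.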

5.
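The traditional K-means algorithm is fast and easy to implement for text clustering, but it depends equally on all variables, and its clustering results are often unsatisfactory. To overcome this drawback, an improved K-means text clustering algorithm is proposed: during K-means clustering it automatically computes and assigns a weight to each keyword in each cluster, giving important keywords larger weights. Experimental testing yielded an improved algorithm, based on automatic subspace variable weighting, that is suited to clustering text data; it clusters large-scale, high-dimensional, sparse text data effectively and produces high-quality clusters. The experimental results show that the K-means text clustering algorithm with automatic subspace variable weighting is an effective algorithm for large-scale text data.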
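The abstract does not state how the per-cluster keyword weights are computed. Purely as an illustration of the idea, the sketch below assumes an entropy-style rule in which a feature with low dispersion inside a cluster receives a high weight for that cluster; `gamma`, the function names, and the rule itself are assumptions, not the paper's formula.

```python
import math

def subspace_feature_weights(members, gamma=1.0):
    """Assumed weighting rule: features that vary little within the cluster
    get larger weights. The weights for one cluster sum to 1."""
    dim = len(members[0])
    centroid = [sum(x[j] for x in members) / len(members) for j in range(dim)]
    dispersion = [sum((x[j] - centroid[j]) ** 2 for x in members)
                  for j in range(dim)]
    d_min = min(dispersion)  # shift before exp() for numerical stability
    raw = [math.exp(-(d - d_min) / gamma) for d in dispersion]
    total = sum(raw)
    return [r / total for r in raw]

def weighted_distance(x, centroid, weights):
    """Squared distance in which each feature counts according to its weight."""
    return sum(w * (xi - ci) ** 2 for w, xi, ci in zip(weights, x, centroid))
```

In a full algorithm these two functions would sit inside the usual K-means loop: assign each document to the cluster with the smallest `weighted_distance`, then recompute centroids and weights per cluster.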

6.
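With the number of microblog users growing and information on microblog platforms updated frequently, a microblog topic discovery method based on SOM clustering is proposed to address the data sparsity, abundance of new words, and non-standard language of microblog text. First, the raw corpus is preprocessed and features are extracted from the short texts with a word-vector model, which reduces the heavy computation caused by excessively high vector dimensionality. Then an improved SOM clusters the topics; the algorithm remedies shortcomings of traditional text clustering and can therefore discover topics effectively. Experiments show that the algorithm's overall F-measure is clearly higher than that of traditional text clustering algorithms.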

7.
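Web Text Mining System and Cluster Analysis Algorithm   Total citations: 2, self-citations: 0, citations by others: 2
朱克斌, 唐菁, 杨炳儒. 《计算机工程》 (Computer Engineering), 2004, 30(13): 138-139, 183
Presents the overall architecture of the Web text mining system WTMS, and develops and implements a SOM-based hierarchical clustering algorithm for Web documents. A prototype Web text mining system is also implemented against the background of modern distance education. The system automatically clusters and mines text materials collected from various distance-education sites, helping people navigate text information quickly and obtain important knowledge.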

8.
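Because text clustering uses the vector space model, the dimensionality of the text features must be reduced. The method first computes the similarity between features from their probability distributions and clusters the features on that basis; it then computes the information gain of each feature on the text clustering results; finally, it takes a certain proportion of the most important features from each feature cluster to achieve feature selection. Experiments show that the improved algorithm is more accurate in clustering than earlier methods and can be used effectively for automatic text clustering.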

9.
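The sentiment polarity of blog authors is analyzed with a concept-based clustering method. Building on the HowNet (知网) sentiment lexicon, concepts are introduced into the vector space model. First, sentiment words are extracted from the blog text, and the text is represented with a vector space model based on the concepts of those sentiment words. Then the k-means algorithm clusters the blog texts to complete the sentiment polarity analysis. Using concepts as feature terms in the vector space model improves the precision of the analysis. Experiments show that the concept-based vector space model performs better than the traditional word-based vector space model for sentiment clustering of blog text.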

10.
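To address the creation and maintenance of knowledge trees in current knowledge management systems, a new knowledge tree construction method based on text clustering is designed. Because it is hard to extract the concepts and vocabulary lists corresponding to knowledge-tree nodes from the results of traditional text clustering methods such as K-means and SOM, the PLSA method is chosen for clustering and for building the knowledge hierarchy tree. Experiments show that the new method is more accurate in clustering than traditional methods, and that its clustering results also contain the probabilistic relations between document topics and vocabulary, so the concept or set of concepts for each node of the knowledge tree can be extracted conveniently during clustering.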

11.
The self-organizing map (SOM) is an efficient tool for visualizing high-dimensional data. In this paper, the clustering and visualization capabilities of the SOM, especially in the analysis of textual data, i.e., document collections, are reviewed and further developed. A novel clustering and visualization approach based on the SOM is proposed for the task of text mining. The proposed approach first transforms the document space into a multidimensional vector space by means of document encoding. Afterwards, a growing hierarchical SOM (GHSOM) is trained and used as a baseline structure to automatically produce maps with various levels of detail. Following the GHSOM training, the new projection method, namely the ranked centroid projection (RCP), is applied to project the input vectors to a hierarchy of 2-D output maps. The RCP is used as a data analysis tool as well as a direct interface to the data. In a set of simulations, the proposed approach is applied to an illustrative data set and two real-world scientific document collections to demonstrate its applicability.

12.
Statistical pattern recognition techniques, supervised and unsupervised classification techniques being two good examples here, rely on the computation of similarity and distance metrics. The distances are computed in a multi-dimensional space. The axes of this space in principle relate to the features inherent in the input data. Usually, such features are chosen by neural network developers, thereby introducing a possible bias. A method of automatically generating feature sets is discussed, with specific reference to the categorisation of streams of free-text news items. The feature sets were generated by a procedure that automatically selects a group of keywords based on a lexico-semantic analysis. Three different types of text streams (headlines only, news summaries, and full news items including the body of the text) have been categorised using Self-Organising Feature Maps (SOFM). A method for assessing the discrimination ability of a SOFM, based on Fisher's Linear Discriminant Rule, suggests that the maps trained on vectors related to summaries only provide fairly accurate clusters when compared with vectors related to full text. The use of summaries as document surrogates for document categorisation is suggested.

13.
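何丽, 刘军. 《计算机工程》 (Computer Engineering), 2006, 32(20): 4-6
Proposes an NB document classification method based on concept feature vectors. On an unlabeled document set, the method generates several initial document classes through SOM (Self-Organizing Maps) clustering and assigns each class a label, then builds a concept feature vector for each document class using the maximum information entropy method. The final document classifier, CFB-NB, is built on the concept feature vector space.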

14.
Rainfall prediction model using soft computing technique   Total citations: 6, self-citations: 0, citations by others: 6
Rainfall prediction in this paper is a spatial interpolation problem that makes use of daily rainfall information to predict the volume of rainfall at unknown locations within the area covered by existing observations. This paper proposes the use of a self-organising map (SOM), backpropagation neural networks (BPNN) and fuzzy rule systems to perform rainfall spatial interpolation based on a local method. The SOM is first used to separate the whole data space into local surfaces automatically, without any knowledge from the analyst. In each sub-surface, the complexity of the whole data space is reduced to something more homogeneous. After classification, BPNNs are used to learn the generalization characteristics from the data within each cluster. Fuzzy rules for each cluster are then extracted, and the fuzzy rule base is used for rainfall prediction. This method is compared with an established method, which uses radial basis function networks and orographic effect. Results show that this method provides results similar to those of the established method. However, this method has the advantage of allowing the analyst to understand and interact with the model using fuzzy rules.

15.
Unlike conventional unsupervised classification methods, such as K‐means and ISODATA, which are based on partitional clustering techniques, the methodology proposed in this work attempts to take advantage of the properties of Kohonen's self‐organizing map (SOM) together with agglomerative hierarchical clustering methods to perform the automatic classification of remotely sensed images. The key point of the proposed method is to execute the cluster analysis process by means of a set of SOM prototypes, instead of working directly with the original patterns of the image. This strategy significantly reduces the complexity of the data analysis, making it possible to use techniques that have not normally been considered viable in the processing of remotely sensed images, such as hierarchical clustering methods and cluster validation indices. Through the use of the SOM, the proposed method maps the original patterns of the image to a two‐dimensional neural grid, attempting to preserve the probability distribution and topology of the input space. Afterwards, an agglomerative hierarchical clustering method with restricted connectivity is applied to the trained neural grid, generating a simplified dendrogram for the image data. Utilizing SOM statistic properties, the method employs modified versions of cluster validation indices to automatically determine the ideal number of clusters for the image. The experimental results show examples of the application of the proposed methodology and compare its performance to the K‐means algorithm.
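The second stage described here, agglomerative clustering of SOM prototypes with merges restricted to grid neighbours, can be sketched as follows. This is an illustration under assumptions, not the authors' method: it assumes single linkage (the abstract does not name a linkage criterion), takes the number of clusters as an input instead of choosing it with validation indices, and expects prototypes as a hypothetical `{(row, col): vector}` mapping from an already trained SOM.

```python
import math

def cluster_som_prototypes(weights, n_clusters):
    """Single-linkage agglomeration in which only prototypes that are
    adjacent on the SOM grid may be merged. Returns sorted member lists."""
    parent = {pos: pos for pos in weights}

    def find(p):
        # Union-find lookup with path halving.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    # Candidate merges: one edge per pair of 4-connected grid neighbours.
    edges = []
    for (r, c) in weights:
        for nb in ((r + 1, c), (r, c + 1)):
            if nb in weights:
                edges.append((math.dist(weights[(r, c)], weights[nb]), (r, c), nb))
    edges.sort()

    remaining = len(weights)
    for _, a, b in edges:  # always merge the closest neighbouring clusters
        if remaining <= n_clusters:
            break
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra
            remaining -= 1

    clusters = {}
    for pos in weights:
        clusters.setdefault(find(pos), []).append(pos)
    return sorted(sorted(members) for members in clusters.values())
```

Because the merge candidates are grid edges only, the procedure works on a few hundred prototypes at most, regardless of how many original patterns the SOM was trained on.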

16.
A map of text documents arranged using the Self-Organizing Map (SOM) algorithm (1) is organized in a meaningful manner so that items with similar content appear at nearby locations of the 2-dimensional map display, and (2) clusters the data, resulting in an approximate model of the data distribution in the high-dimensional document space. This article describes how a document map that is automatically organized for browsing and visualization can be successfully utilized also in speeding up document retrieval. Furthermore, experiments on the well-known CISI collection [3] show significantly improved performance compared to Salton's vector space model, measured by average precision (AP) when retrieving a small, fixed number of best documents. Regarding comparison with Latent Semantic Indexing the results are inconclusive. This revised version was published online in August 2006 with corrections to the Cover Date.

17.
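Taking the short texts of case reports in a smart-city management application system as the object of study, effective feature generation and feature selection methods are investigated so that cases can be classified automatically, quickly, and accurately. Based on the characteristics of case-description short texts, a mutual-neighbor feature combination algorithm is proposed to generate combined features with stronger descriptive power. To further reduce the features and optimize the feature space, a new membership function is proposed to build a category feature domain for each category in the classification scheme; the category feature domains are then used to further select among the original and combined features, yielding the set of feature representations that contributes most to classification. The proposed feature generation and selection methods are validated on case classification in the “城管通” App of Qingxiu District, Nanning. Experiments show that the proposed method classifies cases more accurately than document frequency, mutual information, and information gain, and that introducing combined features significantly improves classification accuracy.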

18.
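To improve the accuracy of text classification, this paper proposes a new text classification method based on quantum PSO and an RBF neural network. First, a keyword set describing the sample categories is established, and a fuzzy vector space model is used to build the feature vector of each class of samples. An RBF neural network then classifies the text automatically, with an improved quantum PSO optimizing the RBF network's parameters to raise its approximation ability. A selection of documents from 中国期刊网 (China Journal Net) serves as the experimental data, and the results indicate that the proposed method is clearly more accurate than comparable methods.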

19.
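Software Component Classification Method Based on SOM Clustering   Total citations: 1, self-citations: 0, citations by others: 1
Faceted classification is widely adopted by major software component library systems, but the traditional faceted method requires manually building and maintaining a huge term space, which increases the workload of building the library and adding components to it. Clustering based on the SOM neural network enables automatic classification of software components without building a term space. In addition, given the characteristics of software components and two drawbacks of SOM clustering (the topology must be fixed in advance, and the clustering result depends on the order of the input samples), the SOM training process is improved to meet the requirements of software component clustering.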

20.
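Bayesian Text Classification Method Based on the Vector Space Model   Total citations: 2, self-citations: 0, citations by others: 2
A Bayesian text classification method based on the vector space model is proposed. First, the feature words of the text training set are extracted and a feature vector space model is built. The Bayesian text classification method then classifies documents of unknown category. A detailed description of the procedure and a test example of text classification are given.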
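The two steps in this abstract (build a term-count representation from the training set, then classify an unseen document with Bayes' rule) can be sketched as below. The abstract does not specify the variant used, so this assumes a multinomial Naive Bayes model with add-one (Laplace) smoothing over pre-tokenized documents; the function names and toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """docs: list of (tokens, label). Collects per-class term counts."""
    class_docs = Counter()              # number of training docs per class
    term_counts = defaultdict(Counter)  # term frequencies per class
    vocab = set()
    for tokens, label in docs:
        class_docs[label] += 1
        term_counts[label].update(tokens)
        vocab.update(tokens)
    return class_docs, term_counts, vocab

def classify_nb(model, tokens):
    """Return the class with the highest log posterior for the tokens."""
    class_docs, term_counts, vocab = model
    n_docs = sum(class_docs.values())
    best_label, best_score = None, float("-inf")
    for label in class_docs:
        total = sum(term_counts[label].values())
        score = math.log(class_docs[label] / n_docs)  # log prior
        for t in tokens:
            if t in vocab:  # words never seen in training are ignored
                # Add-one smoothing avoids log(0) for unseen class/term pairs.
                score += math.log((term_counts[label][t] + 1)
                                  / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Working in log space keeps the product of many small probabilities from underflowing on longer documents.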
