Similar literature
 Found 20 similar documents (search time: 15 ms)
1.
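Application of Self-Organizing Maps in Web Structure Mining   Cited by: 1 (self-citations: 0, citations by others: 1)
This paper discusses the basic method of using self-organizing maps (SOM) for Web structure mining. A SOM can represent data similarity intuitively and perform classification, makes cluster analysis of data convenient, and can locate useful information such as authoritative pages in Web mining.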
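
As a rough illustration of the kind of map this abstract refers to, the following is a minimal sketch of a standard rectangular SOM in plain Python. It is not the paper's method: the grid size, the decay schedules, and the names `train_som` and `best_matching_unit` are assumptions chosen for this example.

```python
import math
import random

def best_matching_unit(weights, x):
    """Return the grid coordinate whose weight vector is closest to x."""
    return min(weights, key=lambda n: sum((a - b) ** 2 for a, b in zip(weights[n], x)))

def train_som(data, rows=3, cols=3, epochs=50, lr0=0.5, sigma0=1.5, seed=0):
    """Train a small rectangular SOM and return {grid coordinate: weight vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    # One weight vector per grid node, initialised uniformly in [0, 1).
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):        # shuffled pass over the data
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                   # linearly decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.3)   # shrinking neighbourhood radius
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma ** 2))  # Gaussian neighbourhood
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])    # pull the node toward the sample
            step += 1
    return weights
```

After training, inputs that are similar tend to share a best-matching unit or land on neighbouring grid nodes, which is what makes the map usable for visual similarity inspection and clustering.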

2.
Correlation-Based Web Document Clustering for Adaptive Web Interface Design   Cited by: 2 (self-citations: 2, citations by others: 2)
A great challenge for web site designers is how to ensure that users can reach important web pages easily and efficiently. In this paper we present a clustering-based approach to address this problem. Our approach is to perform efficient and effective correlation analysis based on web logs and construct clusters of web pages that reflect the co-visit behavior of web site users. We present a novel approach for adapting previous clustering algorithms, designed for databases, to the problem domain of web page clustering, and show that our new methods can generate high-quality clusters for very large web logs when previous methods fail. Based on the high-quality clustering results, we then apply the data-mined clustering knowledge to the problem of adapting web interfaces to improve users' performance. We develop an automatic method for web interface adaptation that introduces index pages that minimize overall user browsing costs. The index pages are aimed at providing shortcuts so that users reach their target web pages quickly, and we solve a previously open problem of how to determine an optimal number of index pages. We empirically show that our approach performs better than many of the previous algorithms based on experiments on several realistic web log files. Received 25 November 2000 / Revised 15 March 2001 / Accepted in revised form 14 May 2001
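
The abstract does not give the correlation measure or the clustering algorithm, so the following is only a sketch of the general idea of grouping pages by co-visit behavior. It assumes sessions have already been extracted from the web log, uses Jaccard similarity as a stand-in correlation measure, and merges pages with a simple threshold; `covisit_similarity`, `cluster_pages`, and the threshold value are hypothetical names and choices.

```python
from collections import defaultdict
from itertools import combinations

def covisit_similarity(sessions):
    """Jaccard similarity between pages, based on the sessions that visit them."""
    visits = defaultdict(set)                  # page -> ids of sessions containing it
    for sid, pages in enumerate(sessions):
        for page in set(pages):
            visits[page].add(sid)
    sim = {}
    for a, b in combinations(sorted(visits), 2):
        shared = len(visits[a] & visits[b])
        if shared:
            sim[(a, b)] = shared / len(visits[a] | visits[b])
    return sim

def cluster_pages(sessions, threshold=0.5):
    """Group pages linked by similarity >= threshold (connected components)."""
    sim = covisit_similarity(sessions)
    parent = {p: p for s in sessions for p in s}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]      # path halving
            p = parent[p]
        return p

    for (a, b), s in sim.items():
        if s >= threshold:
            parent[find(a)] = find(b)          # merge the two groups
    groups = defaultdict(set)
    for p in parent:
        groups[find(p)].add(p)
    return sorted(sorted(g) for g in groups.values())
```

With the sessions `[["/a", "/b"], ["/a", "/b", "/c"], ["/c", "/d"], ["/d", "/c"]]`, the pages `/a` and `/b` always appear together and end up in one group, while `/c` and `/d` form another.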

3.
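Data mining is an indispensable tool for e-commerce systems, and the combination of the two has long-term promise. This paper makes a preliminary study of applying an SOM-based text clustering method to an extended e-commerce system, using it to cluster registered customers, which helps make full use of network resources and improves the efficiency of the network system.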

4.
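Research on Web Content Mining Techniques   Cited by: 10 (self-citations: 4, citations by others: 10)
Briefly introduces the concept, categories and functions of Web mining, and explains the relationship between Web mining, traditional data mining and Web information retrieval. Gives different classification schemes for Web content mining, and the definitions, categories and applications of text and multimedia text data mining. The paper focuses on methods for Web text mining, including feature representation and extraction for text and text classification and clustering, and discusses methods for multimedia text classification mining.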

5.
In this paper, a new algorithm named polar self-organizing map (PolSOM) is proposed. PolSOM is constructed on a 2-D polar map with two variables, radius and angle, which represent data weight and feature, respectively. Compared with traditional algorithms that project data onto a Cartesian map using the Euclidean distance as the only variable, PolSOM not only preserves the data topology and the inter-neuron distance but also visualizes the differences among clusters in terms of weight and feature. In PolSOM, the visualization map is divided into tori and circular sectors by radial and angular coordinates, and neurons are set on the boundary intersections of circular sectors and tori as benchmarks to attract data with similar attributes. Every datum is projected on the map with polar coordinates that are trained towards the winning neuron. As a result, similar data group together, and data characteristics are reflected by their positions on the map. Simulations and comparisons with Sammon's mapping, SOM and ViSOM are provided based on four data sets. The results demonstrate the effectiveness of the PolSOM algorithm for multidimensional data visualization.

6.
In data mining, the usefulness of a data pattern depends on the user of the database and does not solely depend on the statistical strength of the pattern. Based on the premise that heuristic search in combinatorial spaces built on computer and human cognitive theories is useful for effective knowledge discovery, this study investigates how the use of self-organizing maps as a tool of data visualization in data mining plays a significant role in human–computer interactive knowledge discovery. This article presents the conceptual foundations of the integration of data visualization and query processing for knowledge discovery, and proposes a set of query functions for the validation of self-organizing maps in data mining. Received 1 November 1999 / Revised 2 March 2000 / Accepted in revised form 20 October 2000

7.
Self-Organizing Map (SOM) networks have been successfully applied as a clustering method to numeric datasets. However, it is not feasible to directly apply SOM for clustering transactional data. This paper proposes the Transactions Clustering using SOM (TCSOM) algorithm for clustering binary transactional data. In the TCSOM algorithm, a normalized Dot Product norm based dissimilarity measure is used to measure the distance between the input vector and an output neuron, and a modified weight adaptation function is employed to adjust the weights of the winner and its neighbors. More importantly, TCSOM is a one-pass algorithm, which makes it well suited to data mining applications. Experimental results on real datasets show that the TCSOM algorithm is superior to state-of-the-art transactional data clustering algorithms with respect to clustering accuracy.

8.
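Web Text Mining System and Clustering Analysis Algorithm   Cited by: 2 (self-citations: 0, citations by others: 2)
朱克斌, 唐菁, 杨炳儒. Computer Engineering (《计算机工程》), 2004, 30(13): 138-139, 183
Presents the overall architecture of the Web text mining system WTMS and develops and implements an SOM-based hierarchical clustering algorithm for Web documents. A prototype Web text mining system is implemented in the context of modern distance education. The system can automatically cluster text material collected from distance education sites, helping people navigate textual information quickly and acquire important knowledge.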

9.
Several techniques have been recently proposed to automatically generate Web wrappers, i.e., programs that extract data from HTML pages, and transform them into a more structured format, typically in XML. These techniques automatically induce a wrapper from a set of sample pages that share a common HTML template. An open issue, however, is how to collect suitable classes of sample pages to feed the wrapper inducer. Presently, the pages are chosen manually. In this paper, we tackle the problem of automatically discovering the main classes of pages offered by a site by exploring only a small yet representative portion of it. We propose a model to describe abstract structural features of HTML pages. Based on this model, we have developed an algorithm that accepts the URL of an entry point to a target Web site, visits a limited yet representative number of pages, and produces an accurate clustering of pages based on their structure. We have developed a prototype, which has been used to perform experiments on real-life Web sites.

10.
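胡伟. Microcomputer Information (《微计算机信息》), 2012, (1): 159-160, 144
To address the inability of common clustering methods to handle noisy data effectively, and drawing on the adaptivity of neural networks, this paper proposes a neural-network-based clustering model (NN_Cluster) and designs an adaptive-resonance-theory-based neural network clustering model (ARTNN_Cluster) and a self-organizing-feature-map-based neural network clustering model (SOMNN_Cluster). Experiments on standard datasets show that, compared with the traditional K_means clustering method, the proposed models effectively overcome the noise problem of traditional methods and achieve better clustering results.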

11.
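Combines the self-organizing map (SOM) neural network with FCM. The parallel computation of SOM reduces the clustering time of the fuzzy C-means algorithm on massive data and can improve the speed and quality of clustering. The algorithm is also applied to mining campus-network Web logs, which allows user behavior to be analyzed and corresponding measures to be proposed to improve service efficiency and management quality.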
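
The abstract does not say how the SOM and FCM stages are coupled. The sketch below shows only the standard fuzzy C-means update, with the initial centers passed in by the caller; seeding them from trained SOM weights would be one conceivable coupling, but that is an assumption here. The function name `fcm` and its parameters are illustrative.

```python
import math

def fcm(points, centers, m=2.0, iters=50, eps=1e-9):
    """Standard fuzzy C-means: return (centers, memberships) after `iters` updates."""
    centers = [list(c) for c in centers]
    n_clusters = len(centers)
    exponent = 2.0 / (m - 1.0)
    u = []
    for _ in range(iters):
        # Membership update: u[j][i] is the degree to which point j belongs to cluster i.
        u = []
        for x in points:
            d = [max(math.dist(x, c), eps) for c in centers]  # eps avoids division by zero
            u.append([1.0 / sum((d[i] / d[k]) ** exponent for k in range(n_clusters))
                      for i in range(n_clusters)])
        # Center update: mean of the points weighted by membership ** m.
        for i in range(n_clusters):
            w = [row[i] ** m for row in u]
            total = sum(w)
            centers[i] = [sum(wj * x[k] for wj, x in zip(w, points)) / total
                          for k in range(len(points[0]))]
    return centers, u
```

Each row of the returned membership matrix sums to 1, so a point between two clusters is shared between them rather than forced into one, which is the main difference from hard K-means assignment.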

12.
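Research on a Clustering Algorithm Based on Multi-Dimensional Self-Organizing Feature Maps   Cited by: 2 (self-citations: 1, citations by others: 1)
江波, 张黎. Computer Science (《计算机科学》), 2008, 35(6): 181-182
As a neural network method, self-organizing feature maps are widely used in data mining, pattern classification and machine learning. This paper discusses in detail the working principle and concrete implementation of the clustering algorithm based on self-organizing feature maps. System simulation experiments show that the SOFMF algorithm overcomes many problems found in other clustering algorithms and performs well in terms of time complexity.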

13.
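A Fast and Effective Web Document Clustering Method   Cited by: 2 (self-citations: 0, citations by others: 2)
Using the vector space model (VSM) to represent Web text, this paper proposes a Web document clustering method based on association rules. Experiments show that the method remains efficient while maintaining high clustering accuracy, and that its clustering performance is clearly better than that of traditional Web document clustering algorithms.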
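
For readers unfamiliar with the vector space model mentioned here, the following minimal sketch represents each document as a term-frequency vector and compares two documents by cosine similarity. It covers only the representation step, not the association-rule clustering the paper proposes; the whitespace tokenizer and the names `tf_vector` and `cosine` are assumptions made for this example.

```python
import math
from collections import Counter

def tf_vector(text):
    """Represent a document as a sparse term-frequency vector."""
    return Counter(text.lower().split())

def cosine(u, v):
    """Cosine similarity between two sparse term vectors (0.0 if either is empty)."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm = (math.sqrt(sum(c * c for c in u.values()))
            * math.sqrt(sum(c * c for c in v.values())))
    return dot / norm if norm else 0.0
```

Two documents that share no terms score 0, and a document compared with itself scores 1; real systems would typically add stop-word removal and TF-IDF weighting on top of this.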

14.
Fast-growing cellular mobile systems demand more efficient and faster channel allocation techniques. Borrowing channel assignment (BCA) is a compromise between fixed channel allocation (FCA) and dynamic channel allocation (DCA). However, under patterned traffic load, BCA cannot further enhance performance because some heavy-traffic cells are unable to borrow channels from neighboring cells that do not have unused nominal channels. The performance of the whole system can be raised if the short-term traffic load can be predicted and the nominal channels can be re-assigned for all cells. This paper describes an improved BCA scheme using traffic load prediction. The prediction is obtained by using the short-term forecasting ability of the cellular probabilistic self-organizing map (CPSOM). This paper shows that the proposed CPSOM-based BCA method is able to enhance performance under patterned traffic load compared with traditional BCA methods. Simulation results corroborate that the proposed method delivers significantly better performance than BCA for patterned traffic load situations, and is virtually as good as BCA in the other situations analyzed.

15.
The World Wide Web (WWW) has been recognized as the ultimate and unique source of information for the information retrieval and knowledge discovery communities. A tremendous amount of knowledge is recorded using various types of media, producing an enormous number of web pages in the WWW. Retrieval of required information from the WWW is thus an arduous task. Different schemes for retrieving web pages have been used by the WWW community. One of the most widely used schemes is to traverse predefined web directories to reach a user's goal. These web directories are compiled or classified folders of web pages and are usually organized into hierarchical structures. The classification of web pages into proper directories and the organization of directory hierarchies are generally performed by human experts. In this work, we provide a corpus-based method that applies a text mining technique to a corpus of web pages to automatically create web directories and organize them into hierarchies. The method is based on the self-organizing map learning algorithm and requires no human intervention during the construction of web directories and hierarchies. The experiments show that our method can produce comprehensible and reasonable web directories and hierarchies.

16.
This study presents an unsupervised feature selection and learning approach for the discovery and intuitive imaging of significant temporal patterns in seismic single-station or network recordings. For this purpose, the data are parametrized by real-valued feature vectors for short time windows using standard analysis tools for seismic data, such as frequency-wavenumber, polarization, and spectral analysis. We use Self-Organizing Maps (SOMs) for a data-driven feature selection, visualization and clustering procedure, which is particularly suitable for high-dimensional data sets. Our feature selection method is based on significance testing using the Wald–Wolfowitz runs test for individual features and on correlation hunting with SOMs in feature subsets. Using synthetics composed of Rayleigh and Love waves and real-world data, we show the robustness and the improved discriminative power of that approach compared to feature subsets manually selected from individual wavefield parametrization methods. Furthermore, the capability of the clustering and visualization techniques to investigate the discrimination of wave phases is shown by means of synthetic waveforms and regional earthquake recordings.

17.
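Self-Organizing Feature Map Neural Network Based on Particle Swarm Optimization and Its Application   Cited by: 5 (self-citations: 1, citations by others: 5)
吕强, 俞金寿. Control and Decision (《控制与决策》), 2005, 20(10): 1115-1119
Particle swarm optimization (PSO) is used to optimize the weighted distortion index (LWDI), and a PSO-based SOM (PSO-SOM) training algorithm is proposed. The algorithm replaces the heuristic training algorithm proposed by Kohonen, and a kernel function is introduced to strengthen the nonlinear clustering ability of PSO-SOM. Using data from an acrylonitrile reactor at a plant as the clustering application, the results show that, compared with the heuristic training algorithm, PSO-SOM obtains better clusters, is simple to implement and convenient for engineering use, and provides significant guidance for adjusting acrylonitrile reactor parameters and monitoring yield.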

18.
This work focuses on the usage analysis of a citizen web portal, Infoville XXI (http://www.infoville.es), by means of Self-Organizing Maps (SOM). In this paper, a variant of the classical SOM has been used, the so-called Growing Hierarchical SOM (GHSOM). The GHSOM is able to find an optimal architecture of the SOM in a few iterations. There are other variants that also allow an optimal architecture to be found, but they tend to need a long time for training, especially in the case of complex data sets. Another relevant contribution of the paper is the new visualization of the patterns in the hierarchical structure. Results show that GHSOM is a powerful and versatile tool to extract relevant and straightforward knowledge from the vast amount of information involved in a real citizen web portal.

19.
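A Localized Linear-Manifold Self-Organizing Map   Cited by: 1 (self-citations: 0, citations by others: 1)
郑慧诚, 沈伟. Acta Automatica Sinica (《自动化学报》), 2008, 34(10): 1298-1304
This paper proposes a localized linear-manifold self-organizing map that can autonomously learn an ordered set of low-dimensional linear manifolds in a high-dimensional vector space. Compared with the existing Kohonen-based adaptive-subspace self-organizing map (ASSOM) method, the proposed method effectively overcomes the data confusion that arises in manifold representation: each neuron in the network asymptotically learns the mean vector and principal subspace of the sample data in its own region, giving a clearer and more distinguishable data representation. In experiments, the new method's classification accuracy on data clusters is clearly better than that of the three other methods compared, and its handwritten digit recognition accuracy on the MNIST training and test sets reaches 98.26% and 97.46%, respectively.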

20.
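Research on Clustering in Data Mining   Cited by: 16 (self-citations: 0, citations by others: 16)
Clustering is an important research topic in data mining. This article introduces clustering, discusses the data types in cluster analysis and their dissimilarity measures, and summarizes clustering methods commonly used in data mining. Finally, it proposes several future trends in clustering research.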
