1.
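Application of Self-Organizing Maps to Web Structure Mining  Total citations: 1, self-citations: 0, citations by others: 1
This paper discusses the basic method of using self-organizing maps (SOM) for Web structure mining. An SOM can represent data similarity intuitively and classify data, makes cluster analysis of the data convenient, and can help find useful information such as authority pages in Web mining.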
2.
Zhong Su Qiang Yang Hongjiang Zhang Xiaowei Xu Yu-Hen Hu Shaoping Ma 《Knowledge and Information Systems》2002,4(2):151-167
A great challenge for web site designers is how to ensure users' easy access to important web pages efficiently. In this
paper we present a clustering-based approach to address this problem. Our approach to this challenge is to perform efficient
and effective correlation analysis based on web logs and construct clusters of web pages to reflect the co-visit behavior
of web site users. We present a novel approach for adapting previous clustering algorithms that are designed for databases
to the problem domain of web page clustering, and show that our new methods can generate high-quality clusters for very large
web logs when previous methods fail. Based on the high-quality clustering results, we then apply the data-mined clustering
knowledge to the problem of adapting web interfaces to improve users' performance. We develop an automatic method for web
interface adaptation that introduces index pages to minimize overall user browsing costs. The index pages are aimed at providing
short cuts for users to ensure that users get to their objective web pages fast, and we solve a previously open problem of
how to determine an optimal number of index pages. We empirically show that our approach performs better than many of the
previous algorithms based on experiments on several realistic web log files.
Received 25 November 2000 / Revised 15 March 2001 / Accepted in revised form 14 May 2001
3.
4.
5.
Lu Xu, Tommy W.S. Chow 《Pattern recognition》2010,43(4):1668-1675
In this paper, a new algorithm named polar self-organizing map (PolSOM) is proposed. PolSOM is constructed on a 2-D polar map with two variables, radius and angle, which represent data weight and feature, respectively. Compared with the traditional algorithms that project data on a Cartesian map using the Euclidean distance as the only variable, PolSOM not only preserves the data topology and the inter-neuron distance but also visualizes the differences among clusters in terms of weight and feature. In PolSOM, the visualization map is divided into tori and circular sectors by radial and angular coordinates, and neurons are set on the boundary intersections of circular sectors and tori as benchmarks to attract data with similar attributes. Every datum is projected on the map with polar coordinates that are trained towards the winning neuron. As a result, similar data group together, and data characteristics are reflected by their positions on the map. Simulations and comparisons with Sammon's mapping, SOM and ViSOM are provided based on four data sets. The results demonstrate the effectiveness of the PolSOM algorithm for multidimensional data visualization.
6.
Knowledge Discovery Through Self-Organizing Maps: Data Visualization and Query Processing  Total citations: 2, self-citations: 1, citations by others: 2
In data mining, the usefulness of a data pattern depends on the user of the database and does not solely depend on the statistical
strength of the pattern. Based on the premise that heuristic search in combinatorial spaces built on computer and human cognitive
theories is useful for effective knowledge discovery, this study investigates how the use of self-organizing maps as a tool
of data visualization in data mining plays a significant role in human–computer interactive knowledge discovery. This article
presents the conceptual foundations of the integration of data visualization and query processing for knowledge discovery,
and proposes a set of query functions for the validation of self-organizing maps in data mining.
Received 1 November 1999 / Revised 2 March 2000 / Accepted in revised form 20 October 2000
7.
Self-Organizing Map (SOM) networks have been successfully applied as a clustering method to numeric datasets. However, it
is not feasible to directly apply SOM for clustering transactional data. This paper proposes the Transactions Clustering using
SOM (TCSOM) algorithm for clustering binary transactional data. In the TCSOM algorithm, a dissimilarity measure based on the normalized dot product
is used to measure the distance between an input vector and an output neuron, and a modified weight
adaptation function adjusts the weights of the winner and its neighbors. More importantly, TCSOM is a one-pass algorithm, which makes it extremely suitable for data mining applications. Experimental results on real datasets show that the TCSOM
algorithm is superior to state-of-the-art transactional data clustering algorithms with respect to clustering accuracy.
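The abstract above does not give the exact TCSOM formula, so the following is only a minimal sketch of one plausible reading of a "normalized dot product" dissimilarity: one minus the cosine similarity between a binary transaction vector and a neuron weight vector. The function names `encode` and `normalized_dot_dissimilarity` and the item vocabulary are hypothetical, introduced purely for illustration.

```python
import math


def encode(transaction, items):
    """Encode a transaction (a set of items) as a binary vector over `items`."""
    return [1.0 if item in transaction else 0.0 for item in items]


def normalized_dot_dissimilarity(x, w):
    """Return 1 - cosine similarity; an assumed stand-in for the TCSOM measure."""
    dot = sum(a * b for a, b in zip(x, w))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_w = math.sqrt(sum(b * b for b in w))
    if norm_x == 0.0 or norm_w == 0.0:
        # An empty transaction or a zero weight vector shares nothing with anything.
        return 1.0
    return 1.0 - dot / (norm_x * norm_w)


if __name__ == "__main__":
    items = ["bread", "milk", "eggs", "tea"]  # hypothetical item vocabulary
    x = encode({"bread", "milk"}, items)
    w = [0.9, 0.8, 0.1, 0.0]                  # hypothetical neuron weights
    print(normalized_dot_dissimilarity(x, w))
```

Under this assumption, the winning neuron for a transaction would be the one with the smallest dissimilarity, and transactions with no items in common are maximally dissimilar.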
8.
9.
Several techniques have been recently proposed to automatically generate Web wrappers, i.e., programs that extract data from HTML pages, and transform them into a more structured format, typically in XML. These techniques automatically induce a wrapper from a set of sample pages that share a common HTML template. An open issue, however, is how to collect suitable classes of sample pages to feed the wrapper inducer. Presently, the pages are chosen manually. In this paper, we tackle the problem of automatically discovering the main classes of pages offered by a site by exploring only a small yet representative portion of it. We propose a model to describe abstract structural features of HTML pages. Based on this model, we have developed an algorithm that accepts the URL of an entry point to a target Web site, visits a limited yet representative number of pages, and produces an accurate clustering of pages based on their structure. We have developed a prototype, which has been used to perform experiments on real-life Web sites.
10.
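To address the inability of common clustering methods to handle noisy data effectively, this paper exploits the adaptivity of neural networks and proposes a neural-network-based clustering (NN_Cluster) model. It designs two instances: a neural network clustering model based on adaptive resonance theory (ARTNN_Cluster) and one based on the self-organizing feature map (SOMNN_Cluster). Experimental results on standard data sets show that, compared with the traditional K_means clustering method, the proposed neural-network-based clustering models effectively overcome the noise problem of traditional methods and achieve better clustering results.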
11.
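Zhai Jianfeng (翟剑锋) 《电脑编程技巧与维护》2012,14(14):40-42
The self-organizing map (SOM) neural network is combined with FCM. The parallel computation of the SOM reduces the clustering time of the fuzzy C-means algorithm on massive data and can improve both the speed and the quality of clustering. Applying the algorithm to mine campus-network Web logs also makes it possible to analyze user behavior and propose corresponding measures, improving service efficiency and management quality.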
12.
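Research on a Clustering Algorithm Based on Multi-Dimensional Self-Organizing Feature Maps  Total citations: 2, self-citations: 1, citations by others: 1
As a neural network method, the self-organizing feature map has been widely applied in data mining, pattern classification, and machine learning. This paper discusses in detail the working principle and concrete implementation of the clustering algorithm based on self-organizing feature maps. System simulation experiments show that the SOFMF algorithm overcomes problems found in many clustering algorithms and performs well in terms of time complexity.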
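For readers unfamiliar with how a self-organizing feature map clusters data, the following is a minimal sketch of the classical Kohonen training loop in pure Python. It is not the algorithm of the paper above; the function names (`train_som`, `best_matching_unit`), the linear decay schedules, and the Gaussian neighbourhood are assumptions chosen for brevity.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return (row, col) of the neuron whose weight vector is closest to x."""
    best, best_d = (0, 0), float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d = sum((a - b) ** 2 for a, b in zip(x, w))
            if d < best_d:
                best, best_d = (r, c), d
    return best


def train_som(data, rows, cols, epochs=50, lr0=0.5, sigma0=None, seed=0):
    """Train a rows x cols SOM on `data` (a list of equal-length numeric vectors)."""
    rng = random.Random(seed)
    dim = len(data[0])
    sigma0 = sigma0 or max(rows, cols) / 2.0
    # Initialise every neuron from a randomly chosen training sample.
    weights = [[list(rng.choice(data)) for _ in range(cols)] for _ in range(rows)]
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                  # assumed linear learning-rate decay
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # assumed shrinking neighbourhood
            br, bc = best_matching_unit(weights, x)
            for r in range(rows):
                for c in range(cols):
                    grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                    h = math.exp(-grid_d2 / (2.0 * sigma * sigma))
                    w = weights[r][c]
                    # Pull the neuron towards x, scaled by its grid distance to the winner.
                    for i in range(dim):
                        w[i] += lr * h * (x[i] - w[i])
            step += 1
    return weights
```

After training, calling `best_matching_unit(weights, x)` for each sample assigns it to a map cell; samples that share a cell (or neighbouring cells) can be read as one cluster.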
13.
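A Fast and Effective Web Document Clustering Method  Total citations: 2, self-citations: 0, citations by others: 2
Using the vector space model (VSM) to represent Web text, this paper proposes a Web document clustering method based on association rules. Experiments show that the method remains efficient while ensuring high clustering precision, and that its clustering performance is clearly better than that of traditional Web document clustering algorithms.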
14.
The fast growing cellular mobile systems demand more efficient and faster channel allocation techniques. Borrowing channel
assignment (BCA) is a compromise between fixed channel allocation (FCA) and dynamic channel allocation (DCA).
However, in the case of patterned traffic load, BCA cannot further enhance performance efficiently because some heavy-traffic
cells are unable to borrow channels from neighboring cells that do not have unused nominal channels. The performance of the
whole system can be raised if the short-term traffic load can be predicted and the nominal channels can be re-assigned for
all cells. This paper describes an improved BCA scheme using traffic load prediction. The prediction is obtained by using
the short-term forecasting ability of cellular probabilistic self-organizing map (CPSOM). This paper shows that the proposed
CPSOM-based BCA method is able to enhance performance under patterned traffic load compared with the traditional BCA methods.
Simulation results corroborate that the proposed method delivers significantly better performance than BCA for patterned traffic
load situations, and is virtually as good as BCA in the other situations analyzed.
15.
The World Wide Web (WWW) has been recognized as the ultimate and unique source of information for the information retrieval and knowledge discovery communities. A tremendous amount of knowledge is recorded using various types of media, producing an enormous number of web pages in the WWW. Retrieval of required information from the WWW is thus an arduous task. Different schemes for retrieving web pages have been used by the WWW community. One of the most widely used schemes is to traverse predefined web directories to reach a user's goal. These web directories are compiled or classified folders of web pages and are usually organized into hierarchical structures. The classification of web pages into proper directories and the organization of directory hierarchies are generally performed by human experts. In this work, we provide a corpus-based method that applies text mining techniques to a corpus of web pages to automatically create web directories and organize them into hierarchies. The method is based on the self-organizing map learning algorithm and requires no human intervention during the construction of web directories and hierarchies. The experiments show that our method can produce comprehensible and reasonable web directories and hierarchies.
16.
This study presents an unsupervised feature selection and learning approach for the discovery and intuitive imaging of significant temporal patterns in seismic single-station or network recordings. For this purpose, the data are parametrized by real-valued feature vectors for short time windows using standard analysis tools for seismic data, such as frequency-wavenumber, polarization, and spectral analysis. We use Self-Organizing Maps (SOMs) for a data-driven feature selection, visualization and clustering procedure, which is in particular suitable for high-dimensional data sets. Our feature selection method is based on significance testing using the Wald–Wolfowitz runs test for individual features and on correlation hunting with SOMs in feature subsets. Using synthetics composed of Rayleigh and Love waves and real-world data, we show the robustness and the improved discriminative power of that approach compared to feature subsets manually selected from individual wavefield parametrization methods. Furthermore, the capability of the clustering and visualization techniques to investigate the discrimination of wave phases is shown by means of synthetic waveforms and regional earthquake recordings.
17.
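Self-Organizing Feature Map Neural Network Based on Particle Swarm Optimization and Its Application  Total citations: 5, self-citations: 1, citations by others: 5
The particle swarm optimization (PSO) algorithm is used to optimize the weighted distortion index (LWDI), and a PSO-based SOM (PSO-SOM) training algorithm is proposed. This algorithm replaces the heuristic training algorithm proposed by Kohonen, and a kernel function is introduced to strengthen the nonlinear clustering ability of PSO-SOM. With data from an acrylonitrile reactor at a factory as the clustering case study, the results show that, compared with the heuristic training algorithm, PSO-SOM obtains better clusters. The algorithm is also simple to implement and convenient for engineering use, and it provides significant guidance for acrylonitrile reactor parameter adjustment and yield monitoring.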
18.
Web mining based on Growing Hierarchical Self-Organizing Maps: Analysis of a real citizen web portal  Total citations: 1, self-citations: 0, citations by others: 1
Antonio Soriano-Asensi, José D. Martín-Guerrero, Emilio Soria-Olivas, Alberto Palomares, Rafael Magdalena-Benedito, Antonio J. Serrano-López 《Expert systems with applications》2008,34(4):2988-2994
This work is focused on the usage analysis of a citizen web portal, Infoville XXI (http://www.infoville.es), by means of Self-Organizing Maps (SOM). In this paper, a variant of the classical SOM has been used, the so-called Growing Hierarchical SOM (GHSOM). The GHSOM is able to find an optimal architecture of the SOM in a few iterations. Other variants can also find an optimal architecture, but they tend to need a long time for training, especially in the case of complex data sets. Another relevant contribution of the paper is the new visualization of the patterns in the hierarchical structure. Results show that GHSOM is a powerful and versatile tool to extract relevant and straightforward knowledge from the vast amount of information involved in a real citizen web portal.
19.
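A localized linear manifold self-organizing map method is proposed that can autonomously learn an ordered set of low-dimensional linear manifolds in a high-dimensional vector space. Compared with the existing Kohonen-based adaptive-subspace self-organizing map (ASSOM) method, the proposed method effectively overcomes the data confusion that arises in manifold representation: each neuron in the network asymptotically learns the mean vector and principal subspace of the sample data in its own region, giving a clearer and more distinguishable data representation. In the experiments, the classification accuracy of the new method on data clusters is clearly better than that of the other three methods compared, and its handwritten digit recognition accuracy reaches 98.26% and 97.46% on the MNIST training and test sets, respectively.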
20.
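Research on Clustering in Data Mining  Total citations: 16, self-citations: 0, citations by others: 16
Clustering is an important research topic in data mining. This article introduces clustering, discusses the data types in cluster analysis and their dissimilarity measures, and summarizes the clustering methods commonly used in data mining. Finally, it outlines several future trends in clustering research.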