The self-organizing map (SOM) is an excellent tool in the exploratory phase of data mining. It projects the input space onto prototypes of a low-dimensional regular grid that can be effectively utilized to visualize and explore properties of the data. When the number of SOM units is large, similar units need to be grouped, i.e., clustered, to facilitate quantitative analysis of the map and the data. In this paper, different approaches to clustering of the SOM are considered. In particular, the use of hierarchical agglomerative clustering and partitive clustering using K-means is investigated. The two-stage procedure (first using the SOM to produce the prototypes, which are then clustered in the second stage) is found to perform well when compared with direct clustering of the data and to reduce the computation time.
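The two-stage procedure described in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the function names (`train_som`, `kmeans`, `two_stage_cluster`), the linear decay schedules, the rectangular grid, and the random initialisation are all assumptions chosen for brevity.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def nearest(vectors, x):
    """Index of the vector closest to x."""
    return min(range(len(vectors)), key=lambda i: sq_dist(vectors[i], x))


def train_som(data, rows, cols, epochs=20, lr0=0.5, seed=0):
    """Stage 1: online SOM training on a rows x cols rectangular grid."""
    rng = random.Random(seed)
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    # Initialise every prototype as a copy of a randomly drawn sample.
    protos = [list(rng.choice(data)) for _ in grid]
    sigma0 = max(rows, cols) / 2.0
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # shuffled pass over the data
            decay = 1.0 - step / total_steps   # assumed linear decay
            lr = lr0 * decay
            sigma = max(sigma0 * decay, 0.5)
            br, bc = grid[nearest(protos, x)]  # best-matching unit position
            for proto, (r, c) in zip(protos, grid):
                grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma ** 2))
                for k in range(len(proto)):
                    proto[k] += lr * h * (x[k] - proto[k])
            step += 1
    return protos


def kmeans(points, k, iters=50, seed=0):
    """Stage 2: plain K-means (Lloyd's algorithm) on the SOM prototypes."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [nearest(centers, p) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster goes empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def two_stage_cluster(data, rows, cols, k):
    """Cluster the prototypes, then let each sample inherit its unit's cluster."""
    protos = train_som(data, rows, cols)
    proto_labels = kmeans(protos, k)
    return protos, [proto_labels[nearest(protos, x)] for x in data]
```

In this sketch the K-means loop runs over only `rows * cols` prototypes rather than over every sample, which is where the reduction in computation time would come from. A hierarchical agglomerative method could be substituted for `kmeans` in the second stage, since the abstract considers both.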
Data clustering is aimed at finding groups of data that share common hidden properties. These kinds of techniques are especially critical at early stages of data analysis, where no information about the dataset is available. One of the major shortcomings of clustering algorithms is the difficulty non-expert users have in configuring them and, in some cases, interpreting the results. In this work, a computational approach with a two-layer structure based on the Self-Organizing Map (SOM) is presented for cluster analysis. In the first level, the data samples are quantized, using topology-preserving metrics to automatically determine the number of units in the SOM. In the second level, the obtained SOM prototypes are clustered by means of a connectivity analysis to explore the quality of the partitioning with different numbers of clusters. The most important benefit of this two-layer procedure is that the computational load decreases considerably in comparison with data-based clustering methods, making it possible to cluster large data sets and to consider several different clustering alternatives in a limited time. This methodology produces a two-dimensional map representation of the usually high-dimensional input space, along with quantitative information on viable clustering alternatives, which facilitates the exploration of the possible partitions in a dataset. The efficiency and interpretation of the methodology are illustrated by its application to artificial, benchmark and real complex biological datasets. The experimental results demonstrate the ability of the method to identify possible segmentations in a dataset, compared to algorithms that only yield a single clustering solution. The proposed algorithm tackles the intrinsic limitations of SOM and the parameter settings associated with the clustering methodology, without requiring the number of clusters or the SOM architecture as a prerequisite, among others. In this way, it can be applied even by researchers with limited expertise in machine learning.
The Self-Organising Map (SOM) is an Artificial Neural Network (ANN) model consisting of a regular grid of processing units. A model of some multidimensional observation, e.g. a class of digital images, is associated with each unit. The map attempts to represent all the available observations using a restricted set of models. In unsupervised learning, the models become ordered on the grid so that similar models are close to each other. We review here the objective functions and learning rules related to the SOM, starting from vector coding based on a Euclidean metric and extending the theory to arbitrary metrics and to a subspace formalism, in which each SOM unit represents a subspace of the observation space. It is shown that this Adaptive-Subspace SOM (ASSOM) is able to create sets of wavelet- and Gabor-type filters when randomly displaced or moving input patterns are used as training data. No analytical functional form for these filters is thereby postulated. The same kind of adaptive system can create many other kinds of invariant visual filters, like rotation or scale-invariant filters, if there exist corresponding transformations in the training data. The ASSOM system can act as a learning feature-extraction stage for pattern recognisers, being able to adapt to arbitrary sensory environments. We then show that the invariant Gabor features can be effectively used in face recognition, whereby the sets of Gabor filter outputs are coded with the SOM and a face is represented by the histogram over the SOM units.
In this article, we have proposed a methodology for making a radial basis function network (RBFN) robust with respect to additive and multiplicative input noises. This is achieved by properly selecting the centers and widths for the radial basis function (RBF) units of the hidden layer. For this purpose, firstly, a set of self-organizing map (SOM) networks is trained for center selection. For training a SOM network, random Gaussian noise is injected into the samples of each class of the data set. The number of SOM networks is the same as the number of classes present in the data set, and each of the SOM networks is trained separately on the samples belonging to a particular class. The weight vector associated with a unit in the output layer of the SOM network corresponding to a class is used as the center of an RBF unit for that class. To determine the widths of the RBF units, the p-nearest neighbor algorithm is used class-wise. Proper selection of centers and widths makes the RBFN robust with respect to input perturbation and outliers present in the data set. The weights between the hidden and output layers of the RBFN are obtained by the pseudo-inverse method. To test the robustness of the proposed method in additive and multiplicative noise scenarios, ten standard data sets have been used for classification. The proposed method has been compared with three existing methods, where the centers have been generated in three ways: randomly, using the k-means algorithm, and based on a SOM network. Simulation results show the superiority of the proposed method compared to those methods. The Wilcoxon signed-rank test also shows that the proposed method is statistically better than those methods.
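As a rough illustration of the class-wise width selection mentioned above, the sketch below sets the width of each RBF unit to the mean distance from its center to its p nearest centers of the same class. The abstract does not give the exact formula, so this rule, the Gaussian form of the activation, and the names `pnn_widths` and `rbf_activations` are assumptions. The centers are taken as given (in the paper they come from the per-class SOM prototypes) and are assumed to be distinct, with at least two per class.

```python
import math


def euclid(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def pnn_widths(centers, p):
    """Assumed rule: width = mean distance to the p nearest same-class centers."""
    widths = []
    for i, c in enumerate(centers):
        dists = sorted(euclid(c, o) for j, o in enumerate(centers) if j != i)
        closest = dists[:p]
        widths.append(sum(closest) / len(closest))
    return widths


def rbf_activations(x, centers, widths):
    """Gaussian hidden-layer outputs for one input vector x."""
    return [
        math.exp(-euclid(x, c) ** 2 / (2.0 * w * w))
        for c, w in zip(centers, widths)
    ]
```

Stacking these activation vectors for all training samples gives the hidden-layer matrix whose pseudo-inverse, multiplied by the target matrix, would yield the output weights described in the abstract.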
Manual visual inspections often cause bottlenecks in industrial production processes. Therefore, it is desirable to
mechanize and automate them. In order to satisfy these requirements, we apply image recognition using a self-organizing map (SOM)
to visual inspection equipment. The SOM maps high-dimensional input data onto a low-dimensional (typically two-dimensional)
space. Through the mapping, the data are automatically clustered based on their similarity. Any unknown data which are input
onto the self-organized map are also mapped onto it according to their similarity. The categories of the unknown data are
thus recognized based on their positions on the map. The reason we use a SOM for inspections is that users can then know the
similarity distribution of all data at a glance on the map, and understand the mechanism of the recognition visually. We have
developed a visual inspection system using a SOM, and have evaluated it using actual product images. We have obtained high
recognition accuracies of 98% and 96% for one- and two-inspection-point tests, respectively, for a real industrial product.
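Recognition by map position, as described in the abstract above, might look roughly like the following sketch. It assumes an already trained list of SOM prototypes and labels each unit by majority vote over the training samples mapped to it. The voting rule, the function names, and the example category labels are illustrative assumptions rather than details taken from the paper.

```python
from collections import Counter, defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def best_matching_unit(protos, x):
    """Index of the SOM unit whose prototype is closest to x."""
    return min(range(len(protos)), key=lambda i: sq_dist(protos[i], x))


def label_units(protos, samples, labels):
    """Give each unit the most common label among the samples mapped to it."""
    votes = defaultdict(Counter)
    for x, y in zip(samples, labels):
        votes[best_matching_unit(protos, x)][y] += 1
    return {unit: count.most_common(1)[0][0] for unit, count in votes.items()}


def classify(protos, unit_labels, x, unknown=None):
    """Category of unknown input x = label of its best-matching unit."""
    return unit_labels.get(best_matching_unit(protos, x), unknown)
```

For example, with hypothetical labels such as `"ok"` and `"defect"`, an unseen feature vector is assigned whichever label dominates the unit it lands on. Units that received no training samples return the `unknown` value.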
This paper presents a methodology to estimate the future success of a collaborative recommender in a citizen web portal. This methodology consists of four stages, three of which are developed in this study. First of all, a user model, which takes into account some usual characteristics of web data, is developed to produce artificial data sets. These data sets are used to carry out a clustering algorithm comparison in the second stage of our approach. This comparison provides information about the suitability of each algorithm in different scenarios. The benchmarked clustering algorithms are the ones that are most commonly used in the literature: c-Means, Fuzzy c-Means, a set of hierarchical algorithms, Gaussian mixtures trained by the expectation-maximization algorithm, and Kohonen's self-organizing maps (SOM). The most accurate clustering is yielded by SOM. Afterwards, we turn to real data. The users of a citizen web portal (Infoville XXI, http://www.infoville.es) are clustered. The clustering achieved enables us to study the future success of a collaborative recommender by means of a prediction strategy. New users receive recommendations according to the cluster in which they have been classified. The suitability of the recommendation is evaluated by checking whether or not the recommended objects correspond to those actually selected by the user. The results show the relevance of the information provided by clustering algorithms in this web portal, and therefore, the relevance of developing a collaborative recommender for this web site.
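The evaluation idea in the abstract above (recommend according to the user's cluster, then check the recommendations against what the user actually selected) can be sketched as below. The abstract does not specify the recommendation rule or the score, so the "most frequently accessed items in the cluster" rule, the hit-rate measure, and the function names are assumptions made for illustration.

```python
from collections import Counter


def top_items(cluster_histories, n):
    """The n items accessed most often by users already in the cluster."""
    counts = Counter(item for history in cluster_histories for item in history)
    # Sort by descending count, then by name so that ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:n]]


def hit_rate(recommended, selected):
    """Fraction of recommended items that the user actually selected."""
    if not recommended:
        return 0.0
    hits = sum(1 for item in recommended if item in selected)
    return hits / len(recommended)
```

A new user would first be assigned to a cluster (for instance by the best-matching SOM unit), after which `top_items` supplies the recommendation and `hit_rate` scores it against that user's real selections.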